From fyang at openjdk.org Tue Apr 1 01:42:15 2025
From: fyang at openjdk.org (Fei Yang)
Date: Tue, 1 Apr 2025 01:42:15 GMT
Subject: RFR: 8353219: RISC-V: Fix client builds after JDK-8345298
In-Reply-To:
References:
Message-ID:

On Sat, 29 Mar 2025 03:16:37 GMT, Feilong Jiang wrote:

>> Hi, please review this trivial change fixing a client build issue.
>> The definitions of both `generate_float16ToFloat()` and `generate_floatToFloat16()` should be moved out of `COMPILER2_OR_JVMCI` macro scope. Testing: client builds fine on linux-riscv64 with this change.
>
> Marked as reviewed by fjiang (Committer).

@feilongjiang @robehn : Thanks!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24307#issuecomment-2767815574

From fyang at openjdk.org Tue Apr 1 01:42:16 2025
From: fyang at openjdk.org (Fei Yang)
Date: Tue, 1 Apr 2025 01:42:16 GMT
Subject: Integrated: 8353219: RISC-V: Fix client builds after JDK-8345298
In-Reply-To:
References:
Message-ID:

On Sat, 29 Mar 2025 02:01:17 GMT, Fei Yang wrote:

> Hi, please review this trivial change fixing a client build issue.
> The definitions of both `generate_float16ToFloat()` and `generate_floatToFloat16()` should be moved out of `COMPILER2_OR_JVMCI` macro scope. Testing: client builds fine on linux-riscv64 with this change.

This pull request has now been integrated.
Changeset: 860a789e
Author: Fei Yang
URL: https://git.openjdk.org/jdk/commit/860a789e9153448345f19d70dd07e294a0b62223
Stats: 4 lines in 1 file changed: 2 ins; 2 del; 0 mod

8353219: RISC-V: Fix client builds after JDK-8345298

Reviewed-by: fjiang, rehn

-------------

PR: https://git.openjdk.org/jdk/pull/24307

From qamai at openjdk.org Tue Apr 1 02:17:14 2025
From: qamai at openjdk.org (Quan Anh Mai)
Date: Tue, 1 Apr 2025 02:17:14 GMT
Subject: RFR: 8347645: C2: XOR bounded value handling blocks constant folding [v46]
In-Reply-To:
References:
Message-ID:

On Mon, 31 Mar 2025 22:28:49 GMT, Vladimir Ivanov wrote:

>> Johannes Graham has updated the pull request incrementally with one additional commit since the last revision:
>>
>> add missing import
>
> Thanks.
>
>> The naming of that method evolved during the course of the review of this PR. I believe the thinking was that the check was not necessarily an overall upper bound, and a simpler name would imply it was more general.
>
> There's usually a lot of invariants a function assumes and it's simply impractical to encode everything in the name. Speaking of this particular case (`calc_xor_upper_bound_of_non_neg`):
> * `calc_` is redundant and IMO only adds noise;
> * `_non_neg` part is confusing; I'd stress instead that it works on **ranges**.
>
> So, `xor_upper_bound_for_ranges` then? (And, please, explain in the comment what's the correspondence between `S` and `U` template type parameters.)
>
>> `addnodeXorUtil.hpp`
>
> I'm fine with placing it under `opto`. Please, rename the file into `src/hotspot/share/opto/utilities/xor.hpp`.

@iwanowww

> `_non_neg` part is confusing; I'd stress instead that it works on ranges.

I find it easier to think of it as calculating the upper bound of the xor of 2 non-negative integers whose upper bounds are given in the parameters.
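
The reading given just above (an upper bound on the xor of two non-negative integers, given an upper bound for each) can be illustrated with a small Java sketch. This is not the HotSpot code from the PR: the class and method names are hypothetical, and the formula is simply one way to obtain a valid bound, assuming both inputs are non-negative. It relies on the fact that `a ^ b` cannot set any bit above the highest bit set in either bound.

```java
// Hypothetical illustration only; names and formula are assumptions,
// not the implementation reviewed in the pull request.
final class XorUpperBoundSketch {

    /** Returns a value u such that (a ^ b) <= u for all 0 <= a <= hiA, 0 <= b <= hiB. */
    static int xorUpperBound(int hiA, int hiB) {
        if (hiA < 0 || hiB < 0) {
            throw new IllegalArgumentException("bounds must be non-negative");
        }
        int m = hiA | hiB;
        if (m == 0) {
            return 0; // both operands can only be 0
        }
        // Set every bit at or below the highest set bit of either bound.
        return -1 >>> Integer.numberOfLeadingZeros(m);
    }

    /** Brute-force check over every pair of bounds that fits in 4 bits. */
    static boolean holdsForAll4BitRanges() {
        for (int hiA = 0; hiA < 16; hiA++) {
            for (int hiB = 0; hiB < 16; hiB++) {
                int bound = xorUpperBound(hiA, hiB);
                for (int a = 0; a <= hiA; a++) {
                    for (int b = 0; b <= hiB; b++) {
                        if ((a ^ b) > bound) {
                            return false;
                        }
                    }
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(xorUpperBound(5, 3));
        System.out.println(holdsForAll4BitRanges());
    }
}
```

The exhaustive loop over 4-bit bounds mirrors, in spirit only, the kind of small-range testing the PR description mentions for `test_xor_node.cpp`.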
-------------

PR Comment: https://git.openjdk.org/jdk/pull/23089#issuecomment-2767875213

From duke at openjdk.org Tue Apr 1 02:28:15 2025
From: duke at openjdk.org (Johannes Graham)
Date: Tue, 1 Apr 2025 02:28:15 GMT
Subject: RFR: 8347645: C2: XOR bounded value handling blocks constant folding [v47]
In-Reply-To:
References:
Message-ID:

> An interaction between xor bounds optimization and constant folding resulted in xor over constants not being optimized. This has a noticeable effect on `Long.expand` with a constant mask, on architectures that don't have instructions equivalent to `PDEP` to be used in an intrinsic.
>
> This change moves logic from the `Xor(L|I)Node::Value` methods into the `add_ring` methods, and gives priority to constant-folding. A static method was separated out to facilitate direct unit-testing. It also (subjectively) simplified the calculation of the upper bound and added an explanation of the reasoning behind it.
>
> In addition to testing for constant folding over xor, IR tests were added to `XorINodeIdealizationTests` and `XorLNodeIdealizationTests` to cover these related items:
> - Bounds optimization of xor
> - A check for `x ^ x = 0`
> - Explicit testing of xor over booleans.
>
> Also `test_xor_node.cpp` was added to more extensively test the correctness of the bounds optimization. It exhaustively tests ranges of 4-bit numbers as well as at the high and low end of the affected types.
Johannes Graham has updated the pull request incrementally with one additional commit since the last revision:

  address review comments

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/23089/files
  - new: https://git.openjdk.org/jdk/pull/23089/files/94a32dba..59875d54

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=23089&range=46
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23089&range=45-46

  Stats: 96 lines in 4 files changed: 47 ins; 41 del; 8 mod
  Patch: https://git.openjdk.org/jdk/pull/23089.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23089/head:pull/23089

PR: https://git.openjdk.org/jdk/pull/23089

From duke at openjdk.org Tue Apr 1 02:44:03 2025
From: duke at openjdk.org (Johannes Graham)
Date: Tue, 1 Apr 2025 02:44:03 GMT
Subject: RFR: 8347645: C2: XOR bounded value handling blocks constant folding [v48]
In-Reply-To:
References:
Message-ID: <1JYbwRdMBDikLGt3iXx87YRTWrF6NwzbFDH916UuoSA=.1fb10eab-4963-4d4c-a8ae-97ec3cecdfe2@github.com>

> An interaction between xor bounds optimization and constant folding resulted in xor over constants not being optimized. This has a noticeable effect on `Long.expand` with a constant mask, on architectures that don't have instructions equivalent to `PDEP` to be used in an intrinsic.
>
> This change moves logic from the `Xor(L|I)Node::Value` methods into the `add_ring` methods, and gives priority to constant-folding. A static method was separated out to facilitate direct unit-testing. It also (subjectively) simplified the calculation of the upper bound and added an explanation of the reasoning behind it.
>
> In addition to testing for constant folding over xor, IR tests were added to `XorINodeIdealizationTests` and `XorLNodeIdealizationTests` to cover these related items:
> - Bounds optimization of xor
> - A check for `x ^ x = 0`
> - Explicit testing of xor over booleans.
>
> Also `test_xor_node.cpp` was added to more extensively test the correctness of the bounds optimization.
It exhaustively tests ranges of 4-bit numbers as well as at the high and low end of the affected types.

Johannes Graham has updated the pull request incrementally with one additional commit since the last revision:

  remove unused methods

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/23089/files
  - new: https://git.openjdk.org/jdk/pull/23089/files/59875d54..50d35dcd

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=23089&range=47
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23089&range=46-47

  Stats: 12 lines in 2 files changed: 0 ins; 12 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/23089.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23089/head:pull/23089

PR: https://git.openjdk.org/jdk/pull/23089

From duke at openjdk.org Tue Apr 1 02:52:17 2025
From: duke at openjdk.org (Johannes Graham)
Date: Tue, 1 Apr 2025 02:52:17 GMT
Subject: RFR: 8347645: C2: XOR bounded value handling blocks constant folding [v46]
In-Reply-To:
References:
Message-ID:

On Tue, 1 Apr 2025 02:14:35 GMT, Quan Anh Mai wrote:

>> Thanks.
>>
>>> The naming of that method evolved during the course of the review of this PR. I believe the thinking was that the check was not necessarily an overall upper bound, and a simpler name would imply it was more general.
>>
>> There's usually a lot of invariants a function assumes and it's simply impractical to encode everything in the name. Speaking of this particular case (`calc_xor_upper_bound_of_non_neg`):
>> * `calc_` is redundant and IMO only adds noise;
>> * `_non_neg` part is confusing; I'd stress instead that it works on **ranges**.
>>
>> So, `xor_upper_bound_for_ranges` then? (And, please, explain in the comment what's the correspondence between `S` and `U` template type parameters.)
>>
>>> `addnodeXorUtil.hpp`
>>
>> I'm fine with placing it under `opto`. Please, rename the file into `src/hotspot/share/opto/utilities/xor.hpp`.
>
> @iwanowww
>
>> `_non_neg` part is confusing; I'd stress instead that it works on ranges.
>
> I find it easier to think of it as calculating the upper bound of the xor of 2 non-negative integers whose upper bounds are given in the parameters.

Renamed to `xor_upper_bound_for_ranges` before I saw your comment, @merykitty. I'd be ok with another name though. With the last changes, the method is no longer a member of the class, so it's no longer going to get as many eyes on it without context, so maybe it matters less now.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23089#issuecomment-2767917005

From duke at openjdk.org Tue Apr 1 04:33:34 2025
From: duke at openjdk.org (Anjian-Wen)
Date: Tue, 1 Apr 2025 04:33:34 GMT
Subject: RFR: 8329887: RISC-V: C2: Support Zvbb Vector And-Not instruction [v3]
In-Reply-To: <1KHNbMIgOO7jSZ1Fm4HzxadYaNzE4Xbq4nTitlKy3Po=.17d7860b-10de-4f19-87d8-87fc17313ce2@github.com>
References: <1KHNbMIgOO7jSZ1Fm4HzxadYaNzE4Xbq4nTitlKy3Po=.17d7860b-10de-4f19-87d8-87fc17313ce2@github.com>
Message-ID:

> support Zvbb Vector And-Not vandn.vv match rule and add test

Anjian-Wen has updated the pull request incrementally with one additional commit since the last revision:

  RISC-V: C2: Support Zvbb Vector And-Not instruction fix match rule for format

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/24129/files
  - new: https://git.openjdk.org/jdk/pull/24129/files/7fc67099..a15d58dc

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=24129&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24129&range=01-02

  Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/24129.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24129/head:pull/24129

PR: https://git.openjdk.org/jdk/pull/24129

From galder at openjdk.org Tue Apr 1 04:56:44 2025
From: galder at openjdk.org (Galder Zamarreño)
Date: Tue, 1 Apr 2025 04:56:44 GMT
Subject: RFR: 8348887: Create IR framework
test for JDK-8347997
In-Reply-To:
References:
Message-ID:

On Mon, 31 Mar 2025 13:41:17 GMT, Marc Chevalier wrote:

> As the ticket says:
>> Create IR framework test which checks that allocations are eliminated in the regression test included in [JDK-8347997](https://bugs.openjdk.org/browse/JDK-8347997) fix.
>
> So here it is! We can see that in case of inlining, indeed, no allocation happens. The second part is some sanity check to emphasize the difference: of course, there is an allocation without inlining. The benefit of this second part is arguable. From my point of view, it's mostly to point out the difference to a future reader. But yes, there is nothing very surprising.
>
> Thanks,
> Marc

Changes requested by galder (Author).

test/hotspot/jtreg/compiler/c2/irTests/TestContinuationPinningAndEA.java line 118:

> 116:
> 117: @DontInline
> 118: public CrashesNoInline() throws Throwable {

It's probably my own ignorance, but in case others are in the same boat, why does this crash? Could you add a brief javadoc for future readers? Same with other Crashes cases.

-------------

PR Review: https://git.openjdk.org/jdk/pull/24328#pullrequestreview-2731106771
PR Review Comment: https://git.openjdk.org/jdk/pull/24328#discussion_r2022140499

From hgreule at openjdk.org Tue Apr 1 06:27:49 2025
From: hgreule at openjdk.org (Hannes Greule)
Date: Tue, 1 Apr 2025 06:27:49 GMT
Subject: RFR: 8353359: C2: Or(I|L)Node::Ideal is missing AddNode::Ideal call
Message-ID:

Hi, this simple change adds a missing AddNode::Ideal call to Or(I|L)Node::Ideal. See the added tests for examples of optimizations that don't apply without this change.

Please let me know what you think.
-------------

Commit messages:
 - Call AddNode::Ideal in Or(I|L)Node::Ideal
 - Test AddNode::Ideal optimizations for Or(I|L)

Changes: https://git.openjdk.org/jdk/pull/24348/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24348&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8353359
  Stats: 37 lines in 3 files changed: 33 ins; 0 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/24348.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24348/head:pull/24348

PR: https://git.openjdk.org/jdk/pull/24348

From epeter at openjdk.org Tue Apr 1 07:06:32 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 1 Apr 2025 07:06:32 GMT
Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v6]
In-Reply-To:
References:
Message-ID:

> We should extend the functionality of Verify.checkEQ:
> - Allow different NaN encodings to be seen as equal (by default).
> - Compare VectorAPI vectors.
> - Compare Exceptions, and their messages.
> - Compare arbitrary Objects via Reflection.
>
> Note: this is a prerequisite for the Template Library [JDK-8352861](https://bugs.openjdk.org/browse/JDK-8352861) / https://github.com/openjdk/jdk/pull/23418.
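
The first bullet in the description above (treating different NaN encodings as equal) can be pictured with a short Java sketch. It does not use or reproduce the `Verify` library API: the class and method names are hypothetical, and the choice to compare all non-NaN values by raw bits (so that `0.0f` and `-0.0f` differ) is an assumption of this sketch rather than something stated in the PR.

```java
// Hypothetical illustration only; not the Verify.checkEQ implementation.
final class NaNEqualitySketch {

    /** True if a and b are both NaN (any encoding) or have identical raw bits. */
    static boolean sameFloat(float a, float b) {
        if (Float.isNaN(a) && Float.isNaN(b)) {
            return true; // NaN payload bits are deliberately ignored
        }
        // Assumption of this sketch: everything else is compared bit for bit.
        return Float.floatToRawIntBits(a) == Float.floatToRawIntBits(b);
    }

    public static void main(String[] args) {
        float canonicalNaN = Float.NaN;
        float otherNaN = Float.intBitsToFloat(0x7fc00001); // a NaN built from a different bit pattern
        System.out.println(sameFloat(canonicalNaN, otherNaN));
        System.out.println(sameFloat(1.5f, 1.5f));
        System.out.println(sameFloat(1.0f, 2.0f));
    }
}
```

A plain `==` comparison would report any two NaN values as unequal, and a pure raw-bits comparison could report two NaNs with different payloads as unequal, which is why the NaN case is handled first here.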
Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision:

 - update copyright
 - Apply suggestions from code review

   Co-authored-by: Christian Hagedorn

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/24224/files
  - new: https://git.openjdk.org/jdk/pull/24224/files/d46c45de..4ca42699

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=24224&range=05
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24224&range=04-05

  Stats: 7 lines in 1 file changed: 0 ins; 0 del; 7 mod
  Patch: https://git.openjdk.org/jdk/pull/24224.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24224/head:pull/24224

PR: https://git.openjdk.org/jdk/pull/24224

From epeter at openjdk.org Tue Apr 1 07:06:33 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 1 Apr 2025 07:06:33 GMT
Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v5]
In-Reply-To:
References:
Message-ID:

On Mon, 31 Mar 2025 14:04:29 GMT, Christian Hagedorn wrote:

>> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision:
>>
>> - Merge branch 'master' into JDK-8352869-Verify-NaN-Vector-Objects
>> - Verify.Options refactor for Galder
>> - Update test/hotspot/jtreg/compiler/lib/verify/Verify.java
>>
>> Co-authored-by: Galder Zamarreño
>> - Merge branch 'master' into JDK-8352869-Verify-NaN-Vector-Objects
>> - clean up test
>> - JDK-8352869
>
> Nice extensions! Some initial comments.

@chhagedorn Thanks for the suggestions and questions! I think I addressed them all :)

> test/hotspot/jtreg/compiler/lib/verify/Verify.java line 25:
>
>> 23:
>> 24: package compiler.lib.verify;
>> 25:
>
> You should update the copyright year.
done :)

> test/hotspot/jtreg/compiler/lib/verify/Verify.java line 209:
>
>> 207: print(a, b, field, aParent, bParent);
>> 208: throw new VerifyException("Object type not supported: " + ca.getName() + " -- did you mean to 'enableCheckWithArbitraryClasses'?");
>> 209: }
>
> What's the reason behind throwing instead of just comparing two arbitrary objects by default? If a user calls `Verify.checkEQ()` and sees this exception, I would guess he then just passes the additional option and we have the same result. But maybe I'm missing something.

Good question. I think my reasoning was that comparing arbitrary classes requires reflection. And that is rather slow. So by default it would be good if that feature is not enabled, so the user tries to avoid it, and is aware when they enable it explicitly. But if you think that is not useful, I can remove the feature. @chhagedorn what do you think?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24224#issuecomment-2768381692
PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2022262263
PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2022265798

From epeter at openjdk.org Tue Apr 1 07:08:27 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 1 Apr 2025 07:08:27 GMT
Subject: RFR: 8352893: C2: OrL/INode::add_ring optimize (x | -1) to -1 [v3]
In-Reply-To:
References:
Message-ID:

On Mon, 31 Mar 2025 14:31:43 GMT, Manuel Hässig wrote:

>> # Issue Summary
>>
>> The `add_ring()` implementations of `OrINode` and `OrLNode` are missing the optimization that an or with a value where all bits are ones (since we have signed integers in this case `~0 == -1`) will always yield all ones.
>>
>> # Changes
>>
>> This PR makes the following straightforward changes:
>> - `Or(I|L)Node::add_ring()` returns `-1` if one of the two inputs is `-1`.
>> - Add `Or(I|L)` nodes to the IR framework.
>> - Add a regression IR test for the implemented optimization.
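
The identity behind the change listed above, `x | -1 == -1` for both `int` and `long`, can be checked at the Java level with a short sketch. The class and method names are hypothetical, and this only demonstrates the arithmetic fact; it says nothing about how C2 represents or tests the optimization in its IR.

```java
import java.util.Random;

// Hypothetical illustration only; it checks the arithmetic identity,
// not the C2 add_ring() change itself.
final class OrAllOnesSketch {

    /** OR with an int whose bits are all ones (-1 in two's complement). */
    static int orWithAllOnes(int x) {
        return x | -1;
    }

    /** OR with a long whose bits are all ones (-1L in two's complement). */
    static long orWithAllOnes(long x) {
        return x | -1L;
    }

    public static void main(String[] args) {
        Random random = new Random();
        for (int i = 0; i < 1_000; i++) {
            // Every bit of the result is set regardless of the random input.
            if (orWithAllOnes(random.nextInt()) != -1 || orWithAllOnes(random.nextLong()) != -1L) {
                throw new AssertionError("x | -1 should always be -1");
            }
        }
        System.out.println("x | -1 == -1 held for all sampled values");
    }
}
```

Because `-1` already has every bit set, OR-ing any value into it cannot change the result, which is why the result is known without looking at the other input.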
>>
>> # Testing
>>
>> - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14110978686)
>> - Ran tier1 through tier3 and Oracle internal testing
>
> Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision:
>
> Remove loop in test and instead use random values

Thanks for the updates, looks good to me now :)

-------------

Marked as reviewed by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/24289#pullrequestreview-2731657942

From epeter at openjdk.org Tue Apr 1 07:12:21 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 1 Apr 2025 07:12:21 GMT
Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v7]
In-Reply-To:
References:
Message-ID:

On Mon, 31 Mar 2025 10:38:21 GMT, Roland Westrelin wrote:

>> This is primarily motivated by 8275202 (C2: optimize out more
>> redundant conditions). In the following code snippet:
>>
>>
>> int[] array = new int[arraySize];
>> if (j <= arraySize) {
>>   if (i >= 0) {
>>     if (i < j) {
>>       int v = array[i];
>>
>>
>> (`arraySize` is a constant)
>>
>> at the range check, `j` is known to be in `[min, arraySize]` as a
>> consequence, `i` is known to be `[0, arraySize-1]`. The range check
>> can be eliminated.
>>
>> Now, if later, `i` constant folds to some value that's positive but
>> out of range for the array:
>>
>> - if that happens when the new pass runs, then it can prove that:
>>
>> if (i < j) {
>>
>> is never taken.
>>
>> - if that happens during IGVN or CCP however, that condition is not
>> constant folded. And because the range check was removed, there's no
>> guard protecting the range check `CastII`. It becomes `top` and, as
>> a result, the graph can become broken.
>>
>> What I propose here is that when the `CastII` becomes dead, any CFG
>> paths that use the `CastII` node are made unreachable.
So in pseudo code:
>>
>>
>> int[] array = new int[arraySize];
>> if (j <= arraySize) {
>>   if (i >= 0) {
>>     if (i < j) {
>>       halt();
>>
>>
>> Finding the CFG paths is implemented in the patch by following the
>> uses of the node until a CFG node or a `Phi` is encountered.
>>
>> The patch applies this to all `Type` nodes as with 8275202, I also ran
>> in some rare corner cases with other types of nodes. The exception is
>> `Phi` nodes which may not be as easy to handle (and for which I had no
>> issue with 8275202).
>>
>> Finally, the patch includes a test case that's unrelated to the
>> discussion of 8275202 above. In that test case, a `CastII` becomes top
>> but the test that guards it doesn't constant fold. The root cause is a
>> transformation of:
>>
>>
>> (CastII (AddI
>>
>>
>> into
>>
>>
>> (AddI (CastII ) (CastII)
>>
>>
>> which causes the resulting node to have a wider type. The `CastII`
>> captures a type before the transformation above happens. Once it has
>> happened, the guard for the `CastII` can't be constant folded when an
>> out of bound value occurs.
>>
>> This is likely fixable some other way (even though it doesn't seem
>> straightforward). Given the long history of similar issues (and the
>> test case that shows that they are more hiding), I think it would
>> make sense to try some other way of approaching them.
>
> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision:
>
> review

src/hotspot/share/opto/node.cpp line 3096:

> 3094: // paths. The dead paths are then replaced by a Halt node.
> 3095: void TypeNode::make_paths_from_here_dead(PhaseIterGVN* igvn, PhaseIdealLoop* loop, const char* phase_str) {
> 3096: Unique_Node_List wq;

Should there be a `ResourceMark` here?
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23468#discussion_r2022275763

From epeter at openjdk.org Tue Apr 1 07:18:45 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 1 Apr 2025 07:18:45 GMT
Subject: RFR: 8344942: Template-Based Testing Framework [v7]
In-Reply-To:
References:
Message-ID: <0b56TIXbIwSy7Zo77WAx4uweu2kM8iAmjPMomeT3sts=.06d78493-7b94-4386-a7be-4fb65837926b@github.com>

> **Goal**
> We want to generate Java source code:
> - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc.
> - Enable the generation of domain specific fuzzers (e.g. random expressions and statements).
>
> Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC).
>
> **How to get started**
> When reviewing, please start by looking at:
> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76
>
> We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework.
>
> Second, look at this advanced test:
> https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119
>
> And then for a "tutorial", look at:
> `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java`
>
> It shows these features:
> - The `body` of a Template is essentially a list of `Token`s that are concatenated.
> - Templates can be nested: a `TemplateWithArgs` is also a `Token`.
> - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature.
> - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use.
> - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand.
> - The use of recursive templates, and `fuel` to limit the recursion.
> - `Name`s: useful to register field and variable names in code scopes.
>
> Next, look at the documentation in the file linked below. This file is the heart of the Template Framework, and describes all the important features.
> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76
>
> For a better experience, you may want to generate the `javadocs`:
> `javadoc -sourcepath test/hotspot/j...

Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:

  typo

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/24217/files
  - new: https://git.openjdk.org/jdk/pull/24217/files/77079807..be1c0ee9

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=06
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=05-06

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/24217.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217

PR: https://git.openjdk.org/jdk/pull/24217

From dskantz at openjdk.org Tue Apr 1 07:28:22 2025
From: dskantz at openjdk.org (Daniel Skantz)
Date: Tue, 1 Apr 2025 07:28:22 GMT
Subject: RFR: 8282053: IGV: refine schedule approximation
Message-ID:

This patch refines the schedule approximation in IGV by 1) placing parm.
and projection nodes in the same block as their predecessors, and 2) no longer erroneously considering machine nodes such as prefetchAlloc and rep_stos as CFG nodes.

The reader may refer to the corresponding JBS issue where graphs sampled before and after the change are attached.

Testing: T1-T3 with no failures. Opened graphs before and after the change and saw no obvious problems. Opened a large number of graphs in CFG view and observed no unexpected IGV warnings, errors or assert failures.

-------------

Commit messages:
 - fix

Changes: https://git.openjdk.org/jdk/pull/24350/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24350&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8282053
  Stats: 21 lines in 1 file changed: 20 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/24350.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24350/head:pull/24350

PR: https://git.openjdk.org/jdk/pull/24350

From roland at openjdk.org Tue Apr 1 07:31:12 2025
From: roland at openjdk.org (Roland Westrelin)
Date: Tue, 1 Apr 2025 07:31:12 GMT
Subject: RFR: 8341976: C2: use_mem_state != load->find_exact_control(load->in(0)) assert failure [v7]
In-Reply-To:
References:
Message-ID:

> The `arraycopy` writes to a non escaping array so its `ArrayCopy` node
> is marked as having a narrow memory effect. One of the loads from the
> destination after the copy is transformed into a load from the source
> array (the rationale being that if there's no load from the
> destination of the copy, the `arraycopy` is not needed). The load from
> the source has the input memory state of the `ArrayCopy` as memory
> input. That load is then sunk out of the loop and its control is
> updated to be after the `ArrayCopy`. That's legal because the
> `ArrayCopy` only has a narrow memory effect and can't modify the
> source. The `ArrayCopy` can't be eliminated and is expanded. In the
> process, a `MemBar` that has a wide memory effect is added.
The load
> from the source has control after the membar but memory state before
> and because the membar has a wide memory effect, the load is anti
> dependent on the membar: the graph is broken (the load can't be pinned
> after the membar and anti dependent on it).
>
> In short, the problem is that the graph is transformed under the
> assumption that the `ArrayCopy` has a narrow effect but the
> `ArrayCopy` is expanded to a subgraph that has a wide memory
> effect. The fix I propose is to not insert a membar with a wide memory
> effect. We still need a membar when the destination is non escaping
> because the expanded `ArrayCopy`, if it writes to a tightly allocated
> array, writes to raw memory and not to the destination memory slice.

Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 16 additional commits since the last revision:

 - review
 - Merge branch 'master' into JDK-8341976
 - review
 - review
 - Merge branch 'master' into JDK-8341976
 - -XX:+TraceLoopOpts fix
 - review
 - more
 - Merge branch 'master' into JDK-8341976
 - more
 - ...
and 6 more: https://git.openjdk.org/jdk/compare/47f2dbd6...9b21648d

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/23465/files
  - new: https://git.openjdk.org/jdk/pull/23465/files/9f79e0b0..9b21648d

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=23465&range=06
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23465&range=05-06

  Stats: 8742 lines in 156 files changed: 4824 ins; 3469 del; 449 mod
  Patch: https://git.openjdk.org/jdk/pull/23465.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23465/head:pull/23465

PR: https://git.openjdk.org/jdk/pull/23465

From roland at openjdk.org Tue Apr 1 07:32:47 2025
From: roland at openjdk.org (Roland Westrelin)
Date: Tue, 1 Apr 2025 07:32:47 GMT
Subject: RFR: 8341976: C2: use_mem_state != load->find_exact_control(load->in(0)) assert failure [v2]
In-Reply-To: <9cGlvzZnXc8B5tNxXSE2Eqi2FDJzP26U7c-yan4ZdCc=.3f6b0821-8b6c-453d-87ee-91205cc6627a@github.com>
References: <9cGlvzZnXc8B5tNxXSE2Eqi2FDJzP26U7c-yan4ZdCc=.3f6b0821-8b6c-453d-87ee-91205cc6627a@github.com>
Message-ID:

On Mon, 31 Mar 2025 12:26:42 GMT, Christian Hagedorn wrote:

>> Right. So maybe, we could treat that `Opaque` node the way we do for `OpaqueZeroTripGuard` and have it constant fold when the backedge is never taken.
>>
>> So I should revert the change to the `IdealLoopTree::dump_head()` and the test run with `TraceLoopOpts`?
>
>> So maybe, we could treat that Opaque node the way we do for OpaqueZeroTripGuard and have it constant fold when the backedge is never taken.
>
> Right, that sounds like a good solution.
>
>> So I should revert the change to the IdealLoopTree::dump_head() and the test run with TraceLoopOpts?
>
> Yes, that would be great. We can make a comment in [JDK-8297752](https://bugs.openjdk.org/browse/JDK-8297752) to add `-XX:+TraceLoopOpts` as additional run to this test when we fix it.

Done.
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23465#discussion_r2022305281

From rcastanedalo at openjdk.org Tue Apr 1 07:44:27 2025
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Tue, 1 Apr 2025 07:44:27 GMT
Subject: RFR: 8282053: IGV: refine schedule approximation
In-Reply-To:
References:
Message-ID:

On Tue, 1 Apr 2025 07:23:04 GMT, Daniel Skantz wrote:

> This patch refines the schedule approximation in IGV by 1) placing parm. and projection nodes in the same block as their predecessors, and 2) no longer erroneously considering machine nodes such as prefetchAlloc and rep_stos as CFG nodes.
>
> The reader may refer to the corresponding JBS issue where graphs sampled before and after the change are attached.
>
> Testing: T1-T3 with no failures. Opened graphs before and after the change and saw no obvious problems. Opened a large number of graphs in CFG view and observed no unexpected IGV warnings, errors or assert failures.

Thanks for working on this, Daniel. Looks good!

-------------

Marked as reviewed by rcastanedalo (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/24350#pullrequestreview-2731778708

From shade at openjdk.org Tue Apr 1 07:58:27 2025
From: shade at openjdk.org (Aleksey Shipilev)
Date: Tue, 1 Apr 2025 07:58:27 GMT
Subject: RFR: 8353188: C1: Clean up x86 backend after 32-bit x86 removal [v2]
In-Reply-To: <-iwh_5JGpt-TAVpfZQjwbnIG_c8hvirNKCcmiZoLNls=.3b34bf15-51fc-42bf-a294-1c23ca99754c@github.com>
References: <-iwh_5JGpt-TAVpfZQjwbnIG_c8hvirNKCcmiZoLNls=.3b34bf15-51fc-42bf-a294-1c23ca99754c@github.com>
Message-ID:

> Piece-wise cleanup of C1_LIRAssembler_x86, C1_MacroAssembler and related classes. C1 implements the bulk of arch-specific backend there. Major parts of this backend are already removed by #24274, this cleans up another large bulk, and hopefully most of it.
>
> Additional testing:
> - [x] Linux x86_64 server fastdebug, `all`
> - [x] Linux x86_64 server fastdebug, `all` + `-XX:TieredStopAtLevel=1`

Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:

  Review comments

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/24301/files
  - new: https://git.openjdk.org/jdk/pull/24301/files/47f239c2..527854ec

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=24301&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24301&range=00-01

  Stats: 12 lines in 2 files changed: 0 ins; 11 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/24301.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24301/head:pull/24301

PR: https://git.openjdk.org/jdk/pull/24301

From shade at openjdk.org Tue Apr 1 07:58:27 2025
From: shade at openjdk.org (Aleksey Shipilev)
Date: Tue, 1 Apr 2025 07:58:27 GMT
Subject: RFR: 8353188: C1: Clean up x86 backend after 32-bit x86 removal [v2]
In-Reply-To:
References: <-iwh_5JGpt-TAVpfZQjwbnIG_c8hvirNKCcmiZoLNls=.3b34bf15-51fc-42bf-a294-1c23ca99754c@github.com>
Message-ID: <5WuuW8GQhWOxXqYgEsVG0DZAjsu8DTjOdJZKWaae7vU=.be96f09d-9e3e-4472-94d1-3d92b487eb33@github.com>

On Mon, 31 Mar 2025 21:14:45 GMT, Vladimir Ivanov wrote:

>> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Review comments
>
> src/hotspot/cpu/x86/c1_FrameMap_x86.cpp line 45:
>
>> 43: Register reg = r_1->as_Register();
>> 44: if (r_2->is_Register() && (type == T_LONG || type == T_DOUBLE)) {
>> 45: Register reg2 = r_2->as_Register();
>
> FTR `reg2` is unused. (Moreover, `r_2` and `r_2->is_Register()` are redundant on x64.)

Right. Cleaned those up too.

> src/hotspot/cpu/x86/c1_LIRAssembler_x86.cpp line 827:
>
>> 825: // compressed klass ptrs: T_METADATA can be a compressed klass
>> 826: // ptr or a 64 bit method pointer.
>> 827: ShouldNotReachHere(); > > Alternatively, you could drop the whole `T_METADATA` case and defer the handling to default case. I initially thought leaving the comment there as meaningful, but now I think that comment only relates to 32-bit x86, so now is redundant. So I dropped the `T_METADATA` case completely. > src/hotspot/cpu/x86/c1_LIRAssembler_x86.cpp line 3063: > >> 3061: ExternalAddress((address)double_signflip_pool), >> 3062: rscratch1); >> 3063: > > Is it intentional or just a leftover? Merge leftover, removed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24301#discussion_r2022343408 PR Review Comment: https://git.openjdk.org/jdk/pull/24301#discussion_r2022344553 PR Review Comment: https://git.openjdk.org/jdk/pull/24301#discussion_r2022344878 From roland at openjdk.org Tue Apr 1 08:06:09 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 1 Apr 2025 08:06:09 GMT Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v7] In-Reply-To: References: Message-ID: <09Q1vDaTXq3VlLU4xxQl_E7wDM2FT7tqR_Bc8ky8RNc=.4e11f2f8-75c3-49a1-b0b3-20eac17c4b39@github.com> On Tue, 1 Apr 2025 07:09:46 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > src/hotspot/share/opto/node.cpp line 3096: > >> 3094: // paths. The dead paths are then replaced by a Halt node. >> 3095: void TypeNode::make_paths_from_here_dead(PhaseIterGVN* igvn, PhaseIdealLoop* loop, const char* phase_str) { >> 3096: Unique_Node_List wq; > > Should there be a `ResourceMark` here? The callers have the `ResourceMark`. This is because it's code I extracted from 8275202: I think it used to not be safe to call `PhaseIdealLoop::register_new_node` from within the `ResourceMark` but I see there were changes in that area (data structures used by `PhaseIdealLoop` no longer allocated in the resource area). 
So it looks like it could be changed now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23468#discussion_r2022357322 From epeter at openjdk.org Tue Apr 1 08:25:21 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 1 Apr 2025 08:25:21 GMT Subject: RFR: 8347449: C2: UseLoopPredicate off should also turn UseProfiledLoopPredicate off [v6] In-Reply-To: References: Message-ID: On Fri, 28 Mar 2025 09:09:59 GMT, Manuel Hässig wrote: >> # Issue Summary >> >> When running with `-XX:-UseLoopPredicate` C2 still inserts profiled loop parse predicates, despite those being a form of loop parse predicate. Further, the loop predicate code is not always consistent about when to insert/expect profiled parse predicates. >> >> # Change Summary >> >> Following the rationale that profiled predicates are a subset of loop predicates, this PR disables profiled predicates whenever loop predicates are disabled. They are disabled on the level of arguments. Further, before any checks for whether profiled predicates are enabled, this PR inserts a check that loop predicates are enabled such that the code is consistent in its intention. >> >> Concretely, this PR >> - adds parse predicate nodes to the IR testing framework, >> - turns off `UseProfiledLoopPredicate` if `UseLoopPredicate` is turned off, >> - predicates all checks for `UseProfiledLoopPredicate` on `UseLoopPredicate` first for consistency, >> - adds a regression test. >> >> >> # Testing >> >> The changes passed the following testing: >> - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14078750038) >> - tier1 through tier3 and Oracle internal testing > > Manuel Hässig has updated the pull request incrementally with two additional commits since the last revision: > > - idealKit::loop: always call add_parse_predicates > > It was constrained on UseParsePredicate, but this is incorrect, since > all parse predicates are added in that function.
> - Improve description of UseLoopPredicate argument Looks good to me now, thanks for the updates! ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24248#pullrequestreview-2731884935 From chagedorn at openjdk.org Tue Apr 1 08:30:34 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 1 Apr 2025 08:30:34 GMT Subject: RFR: 8347449: C2: UseLoopPredicate off should also turn UseProfiledLoopPredicate off [v6] In-Reply-To: References: Message-ID: On Fri, 28 Mar 2025 09:09:59 GMT, Manuel Hässig wrote: >> # Issue Summary >> >> When running with `-XX:-UseLoopPredicate` C2 still inserts profiled loop parse predicates, despite those being a form of loop parse predicate. Further, the loop predicate code is not always consistent about when to insert/expect profiled parse predicates. >> >> # Change Summary >> >> Following the rationale that profiled predicates are a subset of loop predicates, this PR disables profiled predicates whenever loop predicates are disabled. They are disabled on the level of arguments. Further, before any checks for whether profiled predicates are enabled, this PR inserts a check that loop predicates are enabled such that the code is consistent in its intention. >> >> Concretely, this PR >> - adds parse predicate nodes to the IR testing framework, >> - turns off `UseProfiledLoopPredicate` if `UseLoopPredicate` is turned off, >> - predicates all checks for `UseProfiledLoopPredicate` on `UseLoopPredicate` first for consistency, >> - adds a regression test.
>> >> >> # Testing >> >> The changes passed the following testing: >> - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14078750038) >> - tier1 through tier3 and Oracle internal testing > > Manuel Hässig has updated the pull request incrementally with two additional commits since the last revision: > > - idealKit::loop: always call add_parse_predicates > > It was constrained on UseParsePredicate, but this is incorrect, since > all parse predicates are added in that function. > - Improve description of UseLoopPredicate argument Marked as reviewed by chagedorn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24248#pullrequestreview-2731900566 From chagedorn at openjdk.org Tue Apr 1 08:35:27 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 1 Apr 2025 08:35:27 GMT Subject: RFR: 8341976: C2: use_mem_state != load->find_exact_control(load->in(0)) assert failure [v7] In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 07:31:12 GMT, Roland Westrelin wrote: >> The `arraycopy` writes to a non escaping array so its `ArrayCopy` node >> is marked as having a narrow memory effect. One of the loads from the >> destination after the copy is transformed into a load from the source >> array (the rationale being that if there's no load from the >> destination of the copy, the `arraycopy` is not needed). The load from >> the source has the input memory state of the `ArrayCopy` as memory >> input. That load is then sunk out of the loop and its control is >> updated to be after the `ArrayCopy`. That's legal because the >> `ArrayCopy` only has a narrow memory effect and can't modify the >> source. The `ArrayCopy` can't be eliminated and is expanded. In the >> process, a `MemBar` that has a wide memory effect is added.
The load >> from the source has control after the membar but memory state before >> and because the membar has a wide memory effect, the load is anti >> dependent on the membar: the graph is broken (the load can't be pinned >> after the membar and anti dependent on it). >> >> In short, the problem is that the graph is transformed under the >> assumption that the `ArrayCopy` has a narrow effect but the >> `ArrayCopy` is expanded to a subgraph that has a wide memory >> effect. The fix I propose is to not insert a membar with a wide memory >> effect. We still need a membar when the destination is non escaping >> because the expanded `ArrayCopy`, if it writes to a tightly allocated >> array, writes to raw memory and not to the destination memory slice. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 16 additional commits since the last revision: > > - review > - Merge branch 'master' into JDK-8341976 > - review > - review > - Merge branch 'master' into JDK-8341976 > - -XX:+TraceLoopOpts fix > - review > - more > - Merge branch 'master' into JDK-8341976 > - more > - ... and 6 more: https://git.openjdk.org/jdk/compare/c777fe68...9b21648d Update looks good, thanks! ------------- Marked as reviewed by chagedorn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/23465#pullrequestreview-2731911587 From chagedorn at openjdk.org Tue Apr 1 08:35:27 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 1 Apr 2025 08:35:27 GMT Subject: RFR: 8341976: C2: use_mem_state != load->find_exact_control(load->in(0)) assert failure [v2] In-Reply-To: References: <9cGlvzZnXc8B5tNxXSE2Eqi2FDJzP26U7c-yan4ZdCc=.3f6b0821-8b6c-453d-87ee-91205cc6627a@github.com> Message-ID: On Tue, 1 Apr 2025 07:30:47 GMT, Roland Westrelin wrote: >>> So maybe, we could treat that Opaque node the way we do for OpaqueZeroTripGuard and have it constant fold when the backedge is never taken. >> >> Right, that sounds like a good solution. >> >>> So I should revert the change to the IdealLoopTree::dump_head() and the test run with TraceLoopOpts? >> >> Yes, that would be great. We can make a comment in [JDK-8297752](https://bugs.openjdk.org/browse/JDK-8297752) to add `-XX:+TraceLoopOpts` as additional run to this test when we fix it. > > Done. Thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23465#discussion_r2022403189 From chagedorn at openjdk.org Tue Apr 1 08:53:32 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 1 Apr 2025 08:53:32 GMT Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v7] In-Reply-To: <09Q1vDaTXq3VlLU4xxQl_E7wDM2FT7tqR_Bc8ky8RNc=.4e11f2f8-75c3-49a1-b0b3-20eac17c4b39@github.com> References: <09Q1vDaTXq3VlLU4xxQl_E7wDM2FT7tqR_Bc8ky8RNc=.4e11f2f8-75c3-49a1-b0b3-20eac17c4b39@github.com> Message-ID: On Tue, 1 Apr 2025 08:03:51 GMT, Roland Westrelin wrote: >> src/hotspot/share/opto/node.cpp line 3096: >> >>> 3094: // paths. The dead paths are then replaced by a Halt node. >>> 3095: void TypeNode::make_paths_from_here_dead(PhaseIterGVN* igvn, PhaseIdealLoop* loop, const char* phase_str) { >>> 3096: Unique_Node_List wq; >> >> Should there be a `ResourceMark` here? > > The callers have the `ResourceMark`. 
This is because it's code I extracted from 8275202: I think it used to not be safe to call `PhaseIdealLoop::register_new_node` from within the `ResourceMark` but I see there were changes in that area (data structures used by `PhaseIdealLoop` no longer allocated in the resource area). So it looks like it could be changed now. I assume that JDK-8275202 also calls this method with a non-null `PhaseIdealLoop` pointer? Now we only pass in null, so the `loop` parameter could be removed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23468#discussion_r2022418989 From chagedorn at openjdk.org Tue Apr 1 08:53:33 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 1 Apr 2025 08:53:33 GMT Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v7] In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 10:38:21 GMT, Roland Westrelin wrote: >> This is primarily motivated by 8275202 (C2: optimize out more >> redundant conditions). In the following code snippet: >> >> >> int[] array = new int[arraySize]; >> if (j <= arraySize) { >> if (i >= 0) { >> if (i < j) { >> int v = array[i]; >> >> >> (`arraySize` is a constant) >> >> at the range check, `j` is known to be in `[min, arraySize]` as a >> consequence, `i` is known to be `[0, arraySize-1]`. The range check >> can be eliminated. >> >> Now, if later, `i` constant folds to some value that's positive but >> out of range for the array: >> >> - if that happens when the new pass runs, then it can prove that: >> >> if (i < j) { >> >> is never taken. >> >> - if that happens during IGVN or CCP however, that condition is not >> constant folded. And because the range check was removed, there's no >> guard protecting the range check `CastII`. It becomes `top` and, as >> a result, the graph can become broken. >> >> What I propose here is that when the `CastII` becomes dead, any CFG >> paths that use the `CastII` node is made unreachable. 
So in pseudo code: >> >> >> int[] array = new int[arraySize]; >> if (j <= arraySize) { >> if (i >= 0) { >> if (i < j) { >> halt(); >> >> >> Finding the CFG paths is implemented in the patch by following the >> uses of the node until a CFG node or a `Phi` is encountered. >> >> The patch applies this to all `Type` nodes because, with 8275202, I also ran >> into some rare corner cases with other types of nodes. The exception is >> `Phi` nodes which may not be as easy to handle (and for which I had no >> issue with 8275202). >> >> Finally, the patch includes a test case that's unrelated to the >> discussion of 8275202 above. In that test case, a `CastII` becomes top >> but the test that guards it doesn't constant fold. The root cause is a >> transformation of: >> >> >> (CastII (AddI >> >> >> into >> >> >> (AddI (CastII ) (CastII) >> >> >> which causes the resulting node to have a wider type. The `CastII` >> captures a type before the transformation above happens. Once it has >> happened, the guard for the `CastII` can't be constant folded when an >> out of bound value occurs. >> >> This is likely fixable some other way (even though it doesn't seem >> straightforward). Given the long history of similar issues (and the >> test case that shows that there are more hiding), I think it would >> make sense to try some other way of approaching them. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review src/hotspot/share/opto/node.cpp line 3155: > 3153: > 3154: > 3155: Suggestion: src/hotspot/share/opto/phaseX.cpp line 1836: > 1834: _type_nodes.push(n); > 1835: } > 1836: const Type* new_type = n->Value(this); Could we also only add `n` to `_type_nodes` if `new_type` is not top? Then we could also rename `_type_nodes` to `_maybe_top_type_nodes` or something like that.
test/hotspot/jtreg/compiler/c2/TestGuardOfCastIIDoesntFold.java line 31: > 29: * -XX:CompileCommand=dontinline,TestGuardOfCastIIDoesntFold::notInlined > 30: * TestGuardOfCastIIDoesntFold > 31: * @run main/othervm TestGuardOfCastIIDoesntFold You can use `main` since you don't pass any flags: Suggestion: * @run main TestGuardOfCastIIDoesntFold ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23468#discussion_r2022428891 PR Review Comment: https://git.openjdk.org/jdk/pull/23468#discussion_r2022428263 PR Review Comment: https://git.openjdk.org/jdk/pull/23468#discussion_r2022422961 From chagedorn at openjdk.org Tue Apr 1 09:14:43 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 1 Apr 2025 09:14:43 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v6] In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 07:06:32 GMT, Emanuel Peter wrote: >> We should extend the functionality of Verify.checkEQ: >> - Allow different NaN encodings to be seen as equal (by default). >> - Compare VectorAPI vectors. >> - Compare Exceptions, and their messages. >> - Compare arbitrary Objects via Reflection. >> >> Note: this is a prerequisite for the Template Library [JDK-8352861](https://bugs.openjdk.org/browse/JDK-8352861) / https://github.com/openjdk/jdk/pull/23418. 
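To illustrate the first bullet of the quoted description (different NaN encodings treated as equal by default, versus a raw-bits comparison), here is a minimal Java sketch. It is not the `Verify` implementation; the class `NanCompareSketch` and both helper names are hypothetical, and only standard `java.lang.Double` methods are used.

```java
final class NanCompareSketch {
    // Value-style comparison: Double.doubleToLongBits collapses every NaN
    // to one canonical bit pattern, so all NaN encodings compare equal.
    static boolean sameValue(double a, double b) {
        return Double.doubleToLongBits(a) == Double.doubleToLongBits(b);
    }

    // Raw comparison: the bit patterns must match exactly, so two NaNs
    // with different payloads are reported as different.
    static boolean sameRawBits(double a, double b) {
        return Double.doubleToRawLongBits(a) == Double.doubleToRawLongBits(b);
    }

    public static void main(String[] args) {
        // Two quiet NaNs that differ only in the payload bits.
        double nan1 = Double.longBitsToDouble(0x7ff8000000000000L);
        double nan2 = Double.longBitsToDouble(0x7ff8000000000001L);
        System.out.println("sameValue:   " + sameValue(nan1, nan2));
        System.out.println("sameRawBits: " + sameRawBits(nan1, nan2));
    }
}
```

Note that both helpers distinguish `0.0` from `-0.0`, since those have different bit patterns.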
> > Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: > > - update copyright > - Apply suggestions from code review > > Co-authored-by: Christian Hagedorn I'll have a closer look at the code later again :-) ------------- PR Review: https://git.openjdk.org/jdk/pull/24224#pullrequestreview-2732020875 From chagedorn at openjdk.org Tue Apr 1 09:14:43 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 1 Apr 2025 09:14:43 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v5] In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 07:02:11 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/lib/verify/Verify.java line 209: >> >>> 207: print(a, b, field, aParent, bParent); >>> 208: throw new VerifyException("Object type not supported: " + ca.getName() + " -- did you mean to 'enableCheckWithArbitraryClasses'?"); >>> 209: } >> >> What's the reason behind throwing instead of just comparing two arbitrary objects by default? If a user calls `Verify.checkEQ()` and sees this exception, I would guess he then just passes the additional option and we have the same result. But maybe I'm missing something. > > Good question. I think my reasoning was that comparing arbitrary classes requires reflection. And that is rather slow. So by default it would be good if that feature is not enabled, so the user tries to avoid it, and is aware when they enable it explicitly. > > But if you think that is not useful, I can remove the feature. > > @chhagedorn what do you think? I think the intention to let the user double check is good. I'm not sure though if the user is really aware of the potential slow down without diving deeper into the implementation. All they know is that `checkEQ` somehow does not support some of their objects but there is a simple workaround to still use it.
So, the real question is: How many users will then consider doing something different when facing this exception and not just enable it anyway? I guess enabling is probably the most natural thing to do. Given that, I would probably just drop this. It would also simplify the API usage in the following way: We would only have checks with NaNs being all equals and comparing raw bits (i.e. NaNs not equal). Then you could offer `checkEQ()` (default) and `checkRawBitsEQ()` or something like that. Then users do not need to worry about creating and passing in an `Options`. What do you think about these suggestions? What we could do either way at the `checkEQ()` API method: Describe the potential slow down with reflection when not using certain classes. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2022461294 From chagedorn at openjdk.org Tue Apr 1 10:09:15 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 1 Apr 2025 10:09:15 GMT Subject: RFR: 8350563: C2 compilation fails because PhaseCCP does not reach a fixpoint [v7] In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 16:08:57 GMT, Liam Miller-Cushon wrote: >> Hello, please consider this fix for [JDK-8350563](https://bugs.openjdk.org/browse/JDK-8350563) contributed by my colleague Matthias Ernst. >> >> https://github.com/openjdk/jdk/pull/22856 introduced a new `Value()` optimization for the pattern `AndIL(Con, Mask)`. >> This optimization can look through CastNodes, and therefore requires additional logic in CCP to push these >> transitive uses to the worklist. >> >> The optimization is closely related to analogous optimizations for SHIFT nodes, and we also extend the existing logic for >> CCP worklist handling: the current logic is "if the shift input to a SHIFT node changes, push indirect AND node uses to the CCP worklist". >> We extend it by adding "if the (new) type of a node is an IntegerType that `is_con, ...` to the predicate. 
> > Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision: > > Explicitly check for OP_Con instead of TypeInteger::is_con. > > 322 Phi === 303 119 255 [[ 399 388 351 751 366 377 ]] #int:-256..127 !jvms: Integer::parseInt @ bci:151 (line 625) > > While this Phi dumps as "#int:-256..127", `phase->type(expr)` returns a type that is_con -256. Thanks Matthias for having a look at the issue and proposing a fix! While this fix seems to work, I think we should address it slightly differently with an explicit bailout, though. Let's step back a bit: CCP first sets all types to top and then tries to widen them (i.e. an optimistic approach) while IGVN does the opposite: We start by setting all types to bottom and then try to narrow them (i.e. a pessimistic approach). The assert we've faced in CCP complains that we tried to narrow some type again which is against the rules of CCP - we can only widen types. Now when CCP runs, we start with every type of every node at top. When visiting `AndI` at some point, we see what you reported above: > What I observe for the Integer.parseInt reproducer is that expr dumps as a phi node with type #int:-256...127, but phase->type(expr) returns a type that is_con() with value -256. That is perfectly fine. What happened here is that only one input of the phi with type `#int:-256` is non-top. The other inputs are still top (i.e. not processed in CCP, yet). Therefore, the phi's type is set to `#int:-256`. Note that the `TypeNode::_type` field of the phi is still set to the type we had before CCP, i.e. ` #int:-256...127` . In CCP, we use `PhaseValues::_types` which are set to top in the beginning and we leave `TypeNode::_type` unchanged during the analysis. As a consequence this can happen when having a phi and only looking at the currently tracked CCP types: > In consequence, the AND(phi-node, mask) gets optimized to zero. 
Let's look at the output of the failure: 304 ConI === 0 [[ 506 ]] #int:255 996 CastII === 461 453 [[ 557 546 535 524 1034 506 ]] #int:-256..127 extra types: {0:int:-256} strong dependency !orig=[478] !jvms: Integer::parseInt @ bci:144 (line 550) 506 AndI === _ 996 304 [[ 507 ]] !jvms: Integer::parseInt @ bci:170 (line 552) told = int:0 tnew = top it looks like we first optimized `AndI` to zero (i.e. `told`) and then set it to top again in a later `Value()` call in CCP (i.e. `tnew`). This is a violation of the rules for CCP. When we suddenly see top again, it suggests that we prematurely applied an optimization while one of the involved inputs was actually still top. This looks wrong and we should have waited until all the involved inputs are non-top. When looking at the code, we check that `mask` is an integer type and thus non-top here: https://github.com/openjdk/jdk/blob/f25f701652900d02858c905f4cd0bb43208c13d5/src/hotspot/share/opto/mulnode.cpp#L2255-L2260 But it looks like we miss that for `expr` when it is a cast node (which is `996 CastII` in the failing test). We pass `expr` to `AndIL_min_trailing_zeros()` and then uncast it and only then check if it is a proper integer type: https://github.com/openjdk/jdk/blob/f25f701652900d02858c905f4cd0bb43208c13d5/src/hotspot/share/opto/mulnode.cpp#L2180-L2185 So, if the type of `996 CastII` in CCP is still top, we skip it with `uncast()` and then check the phi above which has first the constant type `#int:-256`. We can apply the optimization to return type zero. When later updating the type of the phi to `#int:-256...127`, we can no longer apply the optimization and fall back to `MulNode::Value()` where we return top because the input `996 CastII` is still top: https://github.com/openjdk/jdk/blob/f25f701652900d02858c905f4cd0bb43208c13d5/src/hotspot/share/opto/mulnode.cpp#L185-L187 We find top which is narrower than type zero and we fail with the assert. 
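The arithmetic behind the premature fold can be checked in isolation: the constant `-256` has eight trailing zero bits and the mask `255` fits in eight bits, so their AND is zero, whereas other values in `-256..127` do not satisfy that condition. The following Java sketch mirrors only that bit-level condition; it is not the HotSpot code, and `andIsProvablyZero` is a hypothetical helper name.

```java
final class TrailingZerosAndMask {
    // Returns true when every bit that may be set in mask lies strictly
    // below the lowest set bit of expr, which implies (expr & mask) == 0.
    static boolean andIsProvablyZero(int expr, int mask) {
        if (mask < 0) {
            return false; // sign bit set: the mask spans all 32 bit positions
        }
        // Number of low-order bits needed to represent the mask.
        int maskBits = 32 - Integer.numberOfLeadingZeros(mask);
        return Integer.numberOfTrailingZeros(expr) >= maskBits;
    }

    public static void main(String[] args) {
        // The constant seen while the other phi inputs are still top.
        System.out.println(andIsProvablyZero(-256, 255) + " " + (-256 & 255));
        // A value from the phi's full range -256..127.
        System.out.println(andIsProvablyZero(127, 255) + " " + (127 & 255));
    }
}
```

The check is conservative: a `false` result means the zero cannot be proven from trailing zeros alone, not that the AND is necessarily non-zero.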
Long story short, you should check for `expr` being top before uncasting it. This was hard to see and is only a problem in CCP. I suggest adding the small reproducer as an additional test case. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23871#issuecomment-2768852982 From thartmann at openjdk.org Tue Apr 1 11:42:17 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 1 Apr 2025 11:42:17 GMT Subject: RFR: 8352893: C2: OrL/INode::add_ring optimize (x | -1) to -1 [v3] In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 14:31:43 GMT, Manuel Hässig wrote: >> # Issue Summary >> >> The `add_ring()` implementations of `OrINode` and `OrLNode` are missing the optimization that an or with a value where all bits are ones (since we have signed integers in this case `~0 == -1`) will always yield all ones. >> >> # Changes >> >> This PR makes the following straightforward changes: >> - `Or(I|L)Node::add_ring()` returns `-1` if one of the two inputs is `-1`. >> - Add `Or(I|L)` nodes to the IR framework. >> - Add a regression IR test for the implemented optimization. >> >> # Testing >> >> - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14110978686) >> - Ran tier1 through tier3 and Oracle internal testing > > Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: > > Remove loop in test and instead use random values Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer).
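The identity this change relies on, `x | -1 == -1` for both `int` and `long`, can be confirmed with a few lines of plain Java. This is only a sketch of the arithmetic, not the regression test from the PR; the class and method names are hypothetical.

```java
final class OrAllOnesSketch {
    // -1 has every bit set in two's complement, so OR-ing with it
    // yields -1 regardless of the other operand.
    static int orWithAllOnes(int x) {
        return x | -1;
    }

    static long orWithAllOnes(long x) {
        return x | -1L;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 42, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int x : samples) {
            System.out.println(x + " | -1 = " + orWithAllOnes(x));
        }
        System.out.println("~0 == -1: " + (~0 == -1));
    }
}
```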
PR Review: https://git.openjdk.org/jdk/pull/24289#pullrequestreview-2732401043 From roland at openjdk.org Tue Apr 1 12:50:14 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 1 Apr 2025 12:50:14 GMT Subject: RFR: 8352418: Add verification code to check that the associated loop nodes of useless Template Assertion Predicates are dead In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 12:28:59 GMT, Christian Hagedorn wrote: > As already suggested in https://github.com/openjdk/jdk/pull/23823, I want to do the following additional verification: > > After `eliminate_useless_predicates()` all now useless `OpaqueTemplateAssertionPredicate` nodes should not have any references to `CountedLoop` nodes that are still in the graph (otherwise, they would have been marked useful). This verification did not work reliably without the full Assertion Predicates fix [JDK-8350577](https://bugs.openjdk.org/browse/JDK-8350577). Since JDK-8350577 is now integrated, I propose to add this additional verification code. > > Thanks, > Christian Looks good to me. ------------- Marked as reviewed by roland (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24326#pullrequestreview-2732570055 From chagedorn at openjdk.org Tue Apr 1 12:50:14 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 1 Apr 2025 12:50:14 GMT Subject: RFR: 8352418: Add verification code to check that the associated loop nodes of useless Template Assertion Predicates are dead In-Reply-To: References: Message-ID: <7r2XMglIgMjvCYaPfESV79PvYsGTo8vojzPadFN-Hu4=.4d2e576e-fb9b-4dd0-add4-a60248fa03f5@github.com> On Mon, 31 Mar 2025 12:28:59 GMT, Christian Hagedorn wrote: > As already suggested in https://github.com/openjdk/jdk/pull/23823, I want to do the following additional verification: > > After `eliminate_useless_predicates()` all now useless `OpaqueTemplateAssertionPredicate` nodes should not have any references to `CountedLoop` nodes that are still in the graph (otherwise, they would have been marked useful). This verification did not work reliably without the full Assertion Predicates fix [JDK-8350577](https://bugs.openjdk.org/browse/JDK-8350577). Since JDK-8350577 is now integrated, I propose to add this additional verification code. > > Thanks, > Christian Thanks Roland for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24326#issuecomment-2769245625 From mchevalier at openjdk.org Tue Apr 1 13:04:23 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Tue, 1 Apr 2025 13:04:23 GMT Subject: RFR: 8348887: Create IR framework test for JDK-8347997 In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 04:52:53 GMT, Galder Zamarreño wrote: >> As the ticket says: >>> Create IR framework test which checks that allocations are eliminated in the regression test included in [JDK-8347997](https://bugs.openjdk.org/browse/JDK-8347997) fix. >> >> So here it is! We can see that in case of inlining, indeed, no allocation happens. The second part is some sanity check to emphasize the difference: of course, there is an allocation without inlining.
The benefit of this second part is arguable. From my point of view, it's mostly to point out the difference to a future reader. But yes, there is nothing very surprising. >> >> Thanks, >> Marc > > test/hotspot/jtreg/compiler/c2/irTests/TestContinuationPinningAndEA.java line 118: > >> 116: >> 117: @DontInline >> 118: public CrashesNoInline() throws Throwable { > > It's probably my own ignorance, but just in case others are in the same boat, why does this crash? Could you add a brief javadoc for future readers? Same with other Crashes cases. It's rather bad (uninspired) naming. I based this test on the test introduced by [JDK-8347997](https://bugs.openjdk.org/browse/JDK-8347997), which (I suspect) is based on the reproducer mentioned in JBS. There are 2 cases: one made EA crash, the other made it fail (not detect the non-escaping allocation, as far as I understand). From Vladimir's comment on PR 23284, it used to crash because of a corrupted memory graph. Honestly, I'm not quite clear on that. There is already a test (from said ticket and PR) making sure it doesn't crash. The point of the test I'm adding is to check that the allocation is gone (thanks to EA). Maybe the best is rather to rename the cases "Crashes" and "FailEA": the names made sense in the context of the original bug, but they are not very useful for the future. But I'm not sure what would be fitting.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24328#discussion_r2022810119 From dfenacci at openjdk.org Tue Apr 1 13:19:26 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 1 Apr 2025 13:19:26 GMT Subject: RFR: 8341976: C2: use_mem_state != load->find_exact_control(load->in(0)) assert failure [v7] In-Reply-To: References: Message-ID: <-2HR8vsW5xGAbW5EviewkowFNsq-HH51yjwWA9uLC5g=.6c02442c-2e34-41e8-a808-10ab3c52eefc@github.com> On Tue, 1 Apr 2025 07:31:12 GMT, Roland Westrelin wrote: >> The `arraycopy` writes to a non escaping array so its `ArrayCopy` node >> is marked as having a narrow memory effect. One of the loads from the >> destination after the copy is transformed into a load from the source >> array (the rationale being that if there's no load from the >> destination of the copy, the `arraycopy` is not needed). The load from >> the source has the input memory state of the `ArrayCopy` as memory >> input. That load is then sunk out of the loop and its control is >> updated to be after the `ArrayCopy`. That's legal because the >> `ArrayCopy` only has a narrow memory effect and can't modify the >> source. The `ArrayCopy` can't be eliminated and is expanded. In the >> process, a `MemBar` that has a wide memory effect is added. The load >> from the source has control after the membar but memory state before >> and because the membar has a wide memory effect, the load is anti >> dependent on the membar: the graph is broken (the load can't be pinned >> after the membar and anti dependent on it). >> >> In short, the problem is that the graph is transformed under the >> assumption that the `ArrayCopy` has a narrow effect but the >> `ArrayCopy` is expanded to a subgraph that has a wide memory >> effect. The fix I propose is to not insert a membar with a wide memory >> effect. 
We still need a membar when the destination is non escaping >> because the expanded `ArrayCopy`, if it writes to a tightly allocated >> array, writes to raw memory and not to the destination memory slice. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 16 additional commits since the last revision: > > - review > - Merge branch 'master' into JDK-8341976 > - review > - review > - Merge branch 'master' into JDK-8341976 > - -XX:+TraceLoopOpts fix > - review > - more > - Merge branch 'master' into JDK-8341976 > - more > - ... and 6 more: https://git.openjdk.org/jdk/compare/28e6ceb4...9b21648d Looks good to me. Thanks @rwestrel. ------------- Marked as reviewed by dfenacci (Committer). PR Review: https://git.openjdk.org/jdk/pull/23465#pullrequestreview-2732658616 From dlunden at openjdk.org Tue Apr 1 14:19:10 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Tue, 1 Apr 2025 14:19:10 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v12] In-Reply-To: References: Message-ID: > If a method has a large number of parameters, we currently bail out from C2 compilation. > > ### Changeset > > Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes. > > Changes: > - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this.
To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed. > - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable. > - Remove all `can_represent` checks and bailouts. > - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`. > - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`. > > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450) > - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. > - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement. > - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, not worth it). > > ![c2-regression](https:/... 
Daniel Lundén has updated the pull request incrementally with two additional commits since the last revision: - Formatting updates - Add register mask fuzzer test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20404/files - new: https://git.openjdk.org/jdk/pull/20404/files/fbfddb29..5be718e8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20404&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20404&range=10-11 Stats: 324 lines in 2 files changed: 324 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20404.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20404/head:pull/20404 PR: https://git.openjdk.org/jdk/pull/20404 From dlunden at openjdk.org Tue Apr 1 14:19:11 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Tue, 1 Apr 2025 14:19:11 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v11] In-Reply-To: <0Yf6qZwnLz7oAtSFscDwHifQAmaPuHzeSrpkqMVchDU=.c7a5e8af-9390-414b-850c-609110668eac@github.com> References: <0Yf6qZwnLz7oAtSFscDwHifQAmaPuHzeSrpkqMVchDU=.c7a5e8af-9390-414b-850c-609110668eac@github.com> Message-ID: On Mon, 24 Mar 2025 15:33:34 GMT, Daniel Lundén wrote: >> If a method has a large number of parameters, we currently bail out from C2 compilation. >> >> ### Changeset >> >> Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes. >> >> Changes: >> - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this.
To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed. >> - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable. >> - Remove all `can_represent` checks and bailouts. >> - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`. >> - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450) >> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. >> - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement. >> - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, no... 
> > Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: > > Extend example with offset register mask > As we discussed offline, the test coverage of register mask operations with extended dynamic parts, non-zero offsets, etc. is fairly low (basically limited to the new JTReg tests included in this changeset). To increase coverage, I have extended `test_regmask.cpp` with tests that perform random operations on a register mask and on a reference bit set and check that the result is equivalent on both data structures. Here is the extension: [4ee703f](https://github.com/openjdk/jdk/commit/4ee703f1ab73f8f43d4603d7fa88dcc8f4950ec0). I ran the random tests a few times on different platforms and could not find any failure, which gives good confidence in the correctness of the register mask operation changes. I also tested the effectiveness of the tests themselves by injecting a few failures in the register mask implementation and confirming their detection. Feel free to include the test extensions in this changeset (you might want to go through the code and clean it up a bit before, though, things like e.g. naming consistency). I've now reviewed the register mask fuzzer tests and found no errors. Looks good! I applied some code formatting, though. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20404#issuecomment-2769520020 From dlunden at openjdk.org Tue Apr 1 14:36:38 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Tue, 1 Apr 2025 14:36:38 GMT Subject: RFR: 8282053: IGV: refine schedule approximation In-Reply-To: References: Message-ID: <0dg9XeqluKkZEUgPNJEzwuCUHiG36RaZvr9GggckWQ4=.1efe129b-865f-41c0-92ac-27b91f055f5a@github.com> On Tue, 1 Apr 2025 07:23:04 GMT, Daniel Skantz wrote: > This patch refines the schedule approximation in IGV by 1) placing parm.
and projection nodes in the same block as their predecessors, and 2) disallows erroneously considering machine nodes such as prefetchAlloc and rep_stos as CFG nodes. > > The reader may refer to the corresponding JBS issue where graphs sampled before and after the change are attached. > > Testing: T1-T3 with no failures. Opened graphs before and after the change and saw no obvious problems. Opened a large number of graphs in CFG view and observed no unexpected IGV warnings, errors or assert failures. Good CFG scheduling approximation improvement! Just one style suggestion. src/utils/IdealGraphVisualizer/ServerCompiler/src/main/java/com/sun/hotspot/igv/servercompiler/ServerCompilerScheduler.java line 800: > 798: n.isCFG = true; > 799: } else if (n.inputNode.getProperties().get("type").equals("bottom") > 800: && n.preds.size() > 0 && Suggestion: } else if (n.inputNode.getProperties().get("type").equals("bottom") && n.preds.size() > 0 && For consistent placement of `&&` (already a problem before this changeset, but might as well fix now) ------------- Marked as reviewed by dlunden (Committer). PR Review: https://git.openjdk.org/jdk/pull/24350#pullrequestreview-2732918158 PR Review Comment: https://git.openjdk.org/jdk/pull/24350#discussion_r2022983828 From dskantz at openjdk.org Tue Apr 1 14:42:47 2025 From: dskantz at openjdk.org (Daniel Skantz) Date: Tue, 1 Apr 2025 14:42:47 GMT Subject: RFR: 8282053: IGV: refine schedule approximation [v2] In-Reply-To: References: Message-ID: <0SXZ0k-28IdpWjuhtK4dSJ9ybHE58Oq56zT_sdqeQpc=.0cf9319f-f64d-4e8f-82e9-f654464bc775@github.com> > This patch refines the schedule approximation in IGV by 1) placing parm. and projection nodes in the same block as their predecessors, and 2) disallows erroneously considering machine nodes such as prefetchAlloc and rep_stos as CFG nodes. > > The reader may refer to the corresponding JBS issue where graphs sampled before and after the change are attached. > > Testing: T1-T3 with no failures. 
Opened graphs before and after the change and saw no obvious problems. Opened a large number of graphs in CFG view and observed no unexpected IGV warnings, errors or assert failures. Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision: Update src/utils/IdealGraphVisualizer/ServerCompiler/src/main/java/com/sun/hotspot/igv/servercompiler/ServerCompilerScheduler.java Co-authored-by: Daniel Lundén ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24350/files - new: https://git.openjdk.org/jdk/pull/24350/files/52667ad5..57ad6dc8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24350&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24350&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24350.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24350/head:pull/24350 PR: https://git.openjdk.org/jdk/pull/24350 From dlunden at openjdk.org Tue Apr 1 14:46:25 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Tue, 1 Apr 2025 14:46:25 GMT Subject: RFR: 8282053: IGV: refine schedule approximation [v2] In-Reply-To: <0SXZ0k-28IdpWjuhtK4dSJ9ybHE58Oq56zT_sdqeQpc=.0cf9319f-f64d-4e8f-82e9-f654464bc775@github.com> References: <0SXZ0k-28IdpWjuhtK4dSJ9ybHE58Oq56zT_sdqeQpc=.0cf9319f-f64d-4e8f-82e9-f654464bc775@github.com> Message-ID: On Tue, 1 Apr 2025 14:42:47 GMT, Daniel Skantz wrote: >> This patch refines the schedule approximation in IGV by 1) placing parm. and projection nodes in the same block as their predecessors, and 2) disallows erroneously considering machine nodes such as prefetchAlloc and rep_stos as CFG nodes. >> >> The reader may refer to the corresponding JBS issue where graphs sampled before and after the change are attached. >> >> Testing: T1-T3 with no failures. Opened graphs before and after the change and saw no obvious problems.
Opened a large number of graphs in CFG view and observed no unexpected IGV warnings, errors or assert failures. > > Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision: > > Update src/utils/IdealGraphVisualizer/ServerCompiler/src/main/java/com/sun/hotspot/igv/servercompiler/ServerCompilerScheduler.java > > Co-authored-by: Daniel Lundén Marked as reviewed by dlunden (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24350#pullrequestreview-2732961006 From chagedorn at openjdk.org Tue Apr 1 15:41:48 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 1 Apr 2025 15:41:48 GMT Subject: RFR: 8334046: Set different values for CompLevel_any and CompLevel_all [v2] In-Reply-To: References: Message-ID: <8atBFgfznyYBW1gmJE9Brk9yoiWYXL1ts6Wr5t_KqZA=.d25be79a-3730-449c-9552-7d42ffb68d50@github.com> On Mon, 31 Mar 2025 03:43:03 GMT, Cesar Soares Lucas wrote: >> Please review this trivial patch to set different values for CompLevel_any and CompLevel_all. >> Setting different values for these fields makes the implementation of [this other issue](https://bugs.openjdk.org/browse/JDK-8313713) much cleaner/easier. >> Tested on OSX/Linux Aarch64/x86_64 with JTREG. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Fix WhiteBox constants. Just to let you know, Vladimir is out this week. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24298#issuecomment-2769791125 From chagedorn at openjdk.org Tue Apr 1 15:44:36 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 1 Apr 2025 15:44:36 GMT Subject: RFR: 8350852: Implement JMH benchmark for sparse CodeCache [v3] In-Reply-To: References: Message-ID: On Fri, 28 Mar 2025 20:20:41 GMT, Evgeny Astigeevich wrote: >> This benchmark is used to check performance impact of the code cache being sparse.
>> >> We use the C2 compiler to compile the same Java method multiple times to produce as much code as needed. The Java method is not trivial. It adds two 40-digit positive integers. These compiled methods represent the active methods in the code cache. We split active methods into groups. We put a group into a fixed size code region. We make a code region aligned by its size. CodeCache becomes sparse when code regions are not fully filled. We measure the time taken to call all active methods. >> >> Results: code region size 2M (2097152) bytes >> - Intel Xeon Platinum 8259CL
>>
>> |activeMethodCount |groupCount |Methods/Group |Score |Error |Units |Diff |
>> |--- |--- |--- |--- |--- |--- |--- |
>> |128 |1 |128 |19.577 |0.619 |us/op | |
>> |128 |32 |4 |22.968 |0.314 |us/op |17.30% |
>> |128 |48 |3 |22.245 |0.388 |us/op |13.60% |
>> |128 |64 |2 |23.874 |0.84 |us/op |21.90% |
>> |128 |80 |2 |23.786 |0.231 |us/op |21.50% |
>> |128 |96 |1 |26.224 |1.16 |us/op |34% |
>> |128 |112 |1 |27.028 |0.461 |us/op |38.10% |
>> |256 |1 |256 |47.43 |1.146 |us/op | |
>> |256 |32 |8 |63.962 |1.671 |us/op |34.90% |
>> |256 |48 |5 |63.396 |0.247 |us/op |33.70% |
>> |256 |64 |4 |66.604 |2.286 |us/op |40.40% |
>> |256 |80 |3 |59.746 |1.273 |us/op |26% |
>> |256 |96 |3 |63.836 |1.034 |us/op |34.60% |
>> |256 |112 |2 |63.538 |1.814 |us/op |34% |
>> |512 |1 |512 |172.731 |4.409 |us/op | |
>> |512 |32 |16 |206.772 |6.229 |us/op |19.70% |
>> |512 |48 |11 |215.275 |2.228 |us/op |24.60% |
>> |512 |64 |8 |212.962 |2.028 |us/op |23.30% |
>> |512 |80 |6 |201.335 |12.519 |us/op |16.60% |
>> |512 |96 |5 |198.133 |6.502 |us/op |14.70% |
>> |512 |112 |5 |193.739 |3.812 |us/op |12.20% |
>> |768 |1 |768 |325.154 |5.048 |us/op | |
>> |768 |32 |24 |346.298 |20.196 |us/op |6.50% |
>> |768 |48 |16 |350.746 |2.931 |us/op |7.90% |
>> |768 |64 |12 |339.445 |7.927 |us/op |4.40% |
>> |768 |80 |10 |347.408 |7.355 |us/op |6.80% |
>> |768 |96 |8 |340.983 |3.578 |us/op |4.90% |
>> |768 |112 |7 |353.949 |2.98 |us/op |8.90% |
>> |1024 |1 |1024 |368.352 |5.961 |us/op | |
>> |1024 |32 |32 |463.822 |6.274 |us/op |25.90% |
>> |1024 |48 |21 |457.674 |15.144 |us/op |24.20% |
>> |1024 |64 |16 |477.694 |0.986 |us/op |29.70% |
>> |1024 |80 |13 |484.901 |32.601 |us/op |31.60% |
>> |1024 |96 |11 |480.8 |27.088 |us/op |30.50% |
>> |1024 |112 |9 |474.416 |10.053 |us/op |28.80% |
>>
>> - AArch64 Neoverse N1
>>
>> |activeMethodCount |groupCount |Methods/Group |Score |Error |Units |Diff |...
> > Evgeny Astigeevich has updated the pull request incrementally with two additional commits since the last revision: > > - Document assumptions about code placement in CodeCache > - Address bulasevich comment: too many parameters values Just to let you know, Vladimir is out this week. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23831#issuecomment-2769803651 From rcastanedalo at openjdk.org Tue Apr 1 15:59:20 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 1 Apr 2025 15:59:20 GMT Subject: RFR: 8282053: IGV: refine schedule approximation [v2] In-Reply-To: <0SXZ0k-28IdpWjuhtK4dSJ9ybHE58Oq56zT_sdqeQpc=.0cf9319f-f64d-4e8f-82e9-f654464bc775@github.com> References: <0SXZ0k-28IdpWjuhtK4dSJ9ybHE58Oq56zT_sdqeQpc=.0cf9319f-f64d-4e8f-82e9-f654464bc775@github.com> Message-ID: On Tue, 1 Apr 2025 14:42:47 GMT, Daniel Skantz wrote: >> This patch refines the schedule approximation in IGV by 1) placing parm. and projection nodes in the same block as their predecessors, and 2) disallows erroneously considering machine nodes such as prefetchAlloc and rep_stos as CFG nodes. >> >> The reader may refer to the corresponding JBS issue where graphs sampled before and after the change are attached. >> >> Testing: T1-T3 with no failures. Opened graphs before and after the change and saw no obvious problems. Opened a large number of graphs in CFG view and observed no unexpected IGV warnings, errors or assert failures.
> > Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision: > > Update src/utils/IdealGraphVisualizer/ServerCompiler/src/main/java/com/sun/hotspot/igv/servercompiler/ServerCompilerScheduler.java > > Co-authored-by: Daniel Lundén Marked as reviewed by rcastanedalo (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24350#pullrequestreview-2733217668 From epeter at openjdk.org Tue Apr 1 16:06:22 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 1 Apr 2025 16:06:22 GMT Subject: RFR: 8252473: [TESTBUG] compiler tests fail with minimal VM: Unrecognized VM option In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 17:40:35 GMT, Zdenek Zambersky wrote: >> This adds `@requires vm.compiler2.enabled` to tests, which fail with `Unrecognized VM option` on client VM. > > Attached file which shows unrecognized VM options for individual tests. > [unrecognized-options.txt](https://github.com/user-attachments/files/19472912/unrecognized-options.txt) @zzambers Generally we want to get away from `@requires vm.compiler2.enabled`, because it means tests are only run on C2 and not other compilers. For example if C2 is disabled and we only have C1. Or only interpreter. Or Graal ... Why not just add the compile flag `-XX:+IgnoreUnrecognizedVMOptions`? That could be a good alternative for most cases, I think. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24262#issuecomment-2769867784 From zzambers at openjdk.org Tue Apr 1 16:16:26 2025 From: zzambers at openjdk.org (Zdenek Zambersky) Date: Tue, 1 Apr 2025 16:16:26 GMT Subject: RFR: 8252473: [TESTBUG] compiler tests fail with minimal VM: Unrecognized VM option In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 17:49:33 GMT, Aleksey Shipilev wrote: >> This adds `@requires vm.compiler2.enabled` to tests, which fail with `Unrecognized VM option` on client VM.
> > test/hotspot/jtreg/compiler/arraycopy/TestCloneWithStressReflectiveCode.java line 28: > >> 26: * @bug 8284951 >> 27: * @summary Test clone intrinsic with StressReflectiveCode. >> 28: * @requires vm.compiler2.enabled & vm.debug > > Drive-by comment: multiple `@requires` get AND-ed automatically, so you can just drop a new line with `@requires vm.compiler2.enabled`, and it will still work. I used `@requires` on separate line in cases, where resulting line would be too long (or too messy), but I can use separate line everywhere. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24262#discussion_r2023170812 From jbhateja at openjdk.org Tue Apr 1 16:17:22 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 1 Apr 2025 16:17:22 GMT Subject: RFR: 8352585: Add special case handling for Float16.max/min x86 backend [v7] In-Reply-To: References: Message-ID: > This bugfix patch adds the special handling as per x86 AVX512-FP16 ISA specification[1][2] to compute max/min operations with +/-0.0 or NaN operands. > > Special handling leverage the instruction semantic, central idea is to shuffle the operands such that smaller input gets assigned to second operand for min operation or a larger input gets assigned to second operand for max operation, in addition result equals NaN if an unordered comparison detects first input as a NaN value else we return the result of min/max operation. > > Kindly review and share your feedback. 
> > Best Regards, > Jatin > > [1] https://www.felixcloutier.com/x86/vminsh > [2] https://www.felixcloutier.com/x86/vmaxsh Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolution ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24169/files - new: https://git.openjdk.org/jdk/pull/24169/files/e2faec77..1713057d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24169&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24169&range=05-06 Stats: 3 lines in 1 file changed: 0 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24169.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24169/head:pull/24169 PR: https://git.openjdk.org/jdk/pull/24169 From jbhateja at openjdk.org Tue Apr 1 16:17:22 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 1 Apr 2025 16:17:22 GMT Subject: RFR: 8352585: Add special case handling for Float16.max/min x86 backend [v6] In-Reply-To: References: <4pVsbXILQQgsiSnldLRVf1fziUMF6PrqkEnr81RoFMg=.a79353fd-5dc2-4c64-8958-01cbc0557618@github.com> Message-ID: On Fri, 28 Mar 2025 22:14:23 GMT, Sandhya Viswanathan wrote: >> Basically assert if one is NaN and other is not. > > On further thought what you have also works. Though we could simplify the assertionCheck method to just one statement: > public static boolean assertionCheck(Float16 actual, Float16 expected) { > return !actual.equals(expected); > } > This is because, the equals method takes care of NaNs. The [equals](https://docs.oracle.com/en/java/javase/24/docs/api/java.base/java/lang/Double.html#equals(java.lang.Object)) uses [representation equivalence](https://docs.oracle.com/en/java/javase/24/docs/api/java.base/java/lang/Double.html#repEquivalence), defining NaN arguments to be equal to each other. 
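To make the representation-equivalence point above concrete, here is a minimal sketch using `java.lang.Float` rather than `Float16`. It assumes, as the linked `Double` documentation suggests, that `Float16.equals` follows the same convention; the class name `NaNEqualsSketch` and the helper `sameValue` are hypothetical and not part of the PR.

```java
public class NaNEqualsSketch {
    // Hypothetical helper: compares two boxed floats via equals(), which uses
    // representation equivalence (all NaNs are equal; +0.0 and -0.0 differ).
    static boolean sameValue(Float actual, Float expected) {
        return actual.equals(expected);
    }

    public static void main(String[] args) {
        float nan = Float.NaN;
        // The primitive comparison treats NaN as unequal to itself.
        System.out.println("nan == nan:            " + (nan == nan));
        // equals() treats NaN as equal to NaN.
        System.out.println("equals(NaN, NaN):      " + sameValue(Float.NaN, Float.NaN));
        // The primitive comparison treats the two zeros as equal.
        System.out.println("0.0f == -0.0f:         " + (0.0f == -0.0f));
        // equals() distinguishes +0.0 from -0.0, which matters for min/max tests.
        System.out.println("equals(0.0f, -0.0f):   " + sameValue(0.0f, -0.0f));
    }
}
```

This is why a single `equals` call can be enough for a checker that needs to tell `+0.0` from `-0.0` while still accepting NaN results.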
DONE ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24169#discussion_r2023172099 From zzambers at openjdk.org Tue Apr 1 16:20:53 2025 From: zzambers at openjdk.org (Zdenek Zambersky) Date: Tue, 1 Apr 2025 16:20:53 GMT Subject: RFR: 8252473: [TESTBUG] compiler tests fail with minimal VM: Unrecognized VM option In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 18:49:51 GMT, Vladimir Kozlov wrote: > Can we run some of them with Graal? When no C2 specific flags are used. Unfortunately I don't have experience with Graal. So I don't know how that would work. Does Graal implement some C2-only flags? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24262#issuecomment-2769902335 From zzambers at openjdk.org Tue Apr 1 16:28:23 2025 From: zzambers at openjdk.org (Zdenek Zambersky) Date: Tue, 1 Apr 2025 16:28:23 GMT Subject: RFR: 8252473: [TESTBUG] compiler tests fail with minimal VM: Unrecognized VM option In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 16:03:40 GMT, Emanuel Peter wrote: > Why not just add the compile flag `-XX:+IgnoreUnrecognizedVMOptions`? That could be a good alternative for most cases, I think. I saw that approach sometimes used as well. (My little probably unfounded concern would be that typos in args could then be silently ignored.) I can change my PR to use `-XX:+IgnoreUnrecognizedVMOptions` instead, if that approach is preferable. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24262#issuecomment-2769917073 From rcastanedalo at openjdk.org Tue Apr 1 16:34:24 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 1 Apr 2025 16:34:24 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v12] In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 14:19:10 GMT, Daniel Lundén wrote: >> If a method has a large number of parameters, we currently bail out from C2 compilation.
>> >> ### Changeset >> >> Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes. >> >> Changes: >> - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed. >> - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable. >> - Remove all `can_represent` checks and bailouts. >> - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`. >> - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450) >> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. >> - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement. >> - Specific benchmarking of C2 compilation time. 
The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, no... > > Daniel Lundén has updated the pull request incrementally with two additional commits since the last revision: > > - Formatting updates > - Add register mask fuzzer test I have gone through the entire changeset now and could not find any obvious functional issue, good job Daniel! src/hotspot/share/opto/chaitin.cpp line 1425: > 1423: // a physical register is found > 1424: if (OptoReg::is_reg(assigned)) { > 1425: assert(!lrg.mask().is_offset(), "sanity"); Suggestion: assert(!lrg.mask().is_offset(), "offset register masks can only contain stack slots"); src/hotspot/share/opto/chaitin.cpp line 1533: > 1531: // hesitation). > 1532: if (OptoReg::is_valid(reg2) && > 1533: OptoReg::is_reg(reg2 - lrg.mask().offset_bits())) { I agree that this was probably an oversight in the original code. For simplicity I suggest replacing the check with just `OptoReg::is_reg(reg2)` as you suggest, explicitly limiting the scope of the alternation heuristic to physical registers. I compared the overall effectiveness of post-allocation copy removal (as summarized by `-XX:+PrintOptoStatistics`) between this changeset and your proposed simplification and I cannot see any significant difference. I really wonder if the entire alternation heuristic really has any positive measurable effect, but that investigation belongs to another RFE. src/hotspot/share/opto/chaitin.cpp line 1591: > 1589: // will be a no-op.
(Later on, if lrg runs out of possible colors in > 1590: // its chunk, a new chunk of color may be tried, in which case > 1591: // examination of neighbors is started again, at retry_next_chunk.) Doesn't the second part of the comment (`(Later on...)`) still apply after the changes? src/hotspot/share/opto/chaitin.cpp line 1655: > 1653: // Bump register mask up to next stack chunk > 1654: bool success = lrg->rollover(); > 1655: if (!success) { Was this scenario (running out of stack slots representable in `OptoRegPairs`) possible before, or was it prevented by some check removed in the changeset? Did you come across it in some compilation or is it more of a "theoretical" guard? src/hotspot/share/opto/chaitin.cpp line 1658: > 1656: // We should never get here in practice. Bail out in product, > 1657: // assert in debug. > 1658: assert(false, "should not happen"); Suggestion: assert(false, "the next available stack slots should be within the OptoRegPair range"); src/hotspot/share/opto/chaitin.cpp line 1660: > 1658: assert(false, "should not happen"); > 1659: C->record_method_not_compilable( > 1660: "chunk-rollover outside of OptoReg range"); Suggestion: "chunk-rollover outside of OptoRegPair range"); src/hotspot/share/opto/regmask.hpp line 282: > 280: _grow(src._rm_size, false); > 281: memcpy(_RM_UP_EXT, src._RM_UP_EXT, > 282: sizeof(uintptr_t) * (src._rm_size - _RM_SIZE)); This code is not very well covered by current tests, please consider adding some tests to `test_regmask.cpp` to exercise it. src/hotspot/share/opto/regmask.hpp line 293: > 291: _hwm = _rm_max(); > 292: } > 293: _set_range(src._rm_size, value, _rm_size - src._rm_size); This code is not very well covered by current tests, please consider adding some tests to `test_regmask.cpp` to exercise it. 
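As a rough illustration of the differential-testing idea discussed in this thread (random operations applied both to the mask under test and to a reference bit set, with the results compared), here is a minimal Java sketch. The real tests are C++ gtests in `test_regmask.cpp`; the `GrowableMask` class below is a hypothetical stand-in with a small fixed base that grows on demand, not a port of `RegMask`, and all names are assumptions.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.Random;

public class MaskFuzzSketch {
    /** Hypothetical growable bit mask: a small base array extended on demand. */
    static final class GrowableMask {
        private long[] words = new long[2]; // "base" part: 128 bits

        private void ensure(int bit) {
            int needed = (bit >>> 6) + 1;
            if (needed > words.length) {
                // Grow geometrically so repeated inserts stay cheap.
                words = Arrays.copyOf(words, Math.max(needed, words.length * 2));
            }
        }

        void insert(int bit) {
            ensure(bit);
            words[bit >>> 6] |= 1L << (bit & 63);
        }

        void remove(int bit) {
            if ((bit >>> 6) < words.length) {
                words[bit >>> 6] &= ~(1L << (bit & 63));
            }
        }

        boolean member(int bit) {
            return (bit >>> 6) < words.length
                && (words[bit >>> 6] & (1L << (bit & 63))) != 0;
        }
    }

    /** Applies random insert/remove operations and compares against BitSet after each step. */
    static boolean fuzz(long seed, int steps, int maxBit) {
        Random rnd = new Random(seed);
        GrowableMask mask = new GrowableMask();
        BitSet reference = new BitSet();
        for (int i = 0; i < steps; i++) {
            int bit = rnd.nextInt(maxBit);
            if (rnd.nextBoolean()) {
                mask.insert(bit);
                reference.set(bit);
            } else {
                mask.remove(bit);
                reference.clear(bit);
            }
            for (int b = 0; b < maxBit; b++) {
                if (mask.member(b) != reference.get(b)) {
                    return false; // divergence between mask and reference
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("mask matches reference: " + fuzz(42L, 1000, 512));
    }
}
```

Choosing `maxBit` well above the base size (here 512 versus 128 bits) is what forces the growth path to be exercised; a copy or fill operation could be added to the random operation mix in the same way.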
test/jdk/java/lang/invoke/BigArityTest.java line 32: > 30: * (1) have a large number of parameters, and > 31: * (2) use JSR292 methods internally (which increases the > 32: * MaxNodeLimit with a factor of 3) Just checking: these methods that cause C2 to consume an excessive amount of memory were not C2-compilable before the changeset, right? ------------- Changes requested by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20404#pullrequestreview-2733231312 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2023172642 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2023154419 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2023156355 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2023177582 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2023175078 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2023174027 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2023183358 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2023184495 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2023195229 From rcastanedalo at openjdk.org Tue Apr 1 16:38:27 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 1 Apr 2025 16:38:27 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v12] In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 16:28:35 GMT, Roberto Castañeda Lozano wrote: >> Daniel Lundén has updated the pull request incrementally with two additional commits since the last revision: >> >> - Formatting updates >> - Add register mask fuzzer test > > test/jdk/java/lang/invoke/BigArityTest.java line 32: > >> 30: * (1) have a large number of parameters, and >> 31: * (2) use JSR292 methods internally (which increases the >> 32: * MaxNodeLimit with a factor of 3) > > Just checking: these methods that cause C2 to
consume an excessive amount of memory were not C2-compilable before the changeset, right? Same question for the other `java/lang/invoke` test changes. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2023204041 From epeter at openjdk.org Tue Apr 1 16:39:30 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 1 Apr 2025 16:39:30 GMT Subject: RFR: 8252473: [TESTBUG] compiler tests fail with minimal VM: Unrecognized VM option In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 18:49:51 GMT, Vladimir Kozlov wrote: >> This adds `@requires vm.compiler2.enabled` to tests, which fail with `Unrecognized VM option` on client VM. > > Can we run some of them with Graal? When no C2 specific flags are used. @vnkozlov do you agree that we should use `-XX:+IgnoreUnrecognizedVMOptions`? @zzambers Graal does not implement all flags, and so you would get the same issue with `Unrecognized VM option`. But it could still be valuable to run the tests with Graal, even if the flags are not doing anything. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24262#issuecomment-2769944061 From epeter at openjdk.org Tue Apr 1 16:39:30 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 1 Apr 2025 16:39:30 GMT Subject: RFR: 8252473: [TESTBUG] compiler tests fail with minimal VM: Unrecognized VM option In-Reply-To: References: Message-ID: <8AtiGaQ_cEwB_7Vi4fDwYUEvLMnjVy6BGwz-4vaqGq4=.096cd043-8296-40b5-bbb1-14ae9b51b12c@github.com> On Tue, 1 Apr 2025 16:23:49 GMT, Zdenek Zambersky wrote: > My little probably unfounded concern would be that typos in args could then be silently ignored. That's not completely unfounded, but I think using `-XX:+IgnoreUnrecognizedVMOptions` is still preferable to `@requires vm.compiler2.enabled`.
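For readers unfamiliar with the two approaches being weighed here, the following is a minimal sketch of a jtreg test that tolerates a compiler-specific flag instead of restricting itself with `@requires vm.compiler2.enabled`. The test name, summary, and workload are hypothetical; `-XX:LoopUnrollLimit` is used only as an example of a flag assumed to be unavailable on a client-only VM. Note that the HotSpot spelling that enables the behaviour is `-XX:+IgnoreUnrecognizedVMOptions`.

```java
/*
 * @test
 * @summary Hypothetical example: run with a compiler-specific flag, but do not
 *          fail VM startup if the current VM build does not know that flag.
 * @run main/othervm -XX:+IgnoreUnrecognizedVMOptions -XX:LoopUnrollLimit=60
 *                   FlagTolerantSketch
 */
public class FlagTolerantSketch {
    // Trivial workload standing in for the code the real test would exercise.
    static int sumTo(int n) {
        int sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        int result = sumTo(100);
        if (result != 5050) {
            throw new RuntimeException("unexpected result: " + result);
        }
        System.out.println("PASSED");
    }
}
```

The trade-off raised in the thread still applies to a sketch like this: with the option enabled, a misspelled flag name is also accepted silently, so the test would run without the intended setting.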
------------- PR Comment: https://git.openjdk.org/jdk/pull/24262#issuecomment-2769946172 From sviswanathan at openjdk.org Tue Apr 1 17:38:30 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 1 Apr 2025 17:38:30 GMT Subject: RFR: 8352585: Add special case handling for Float16.max/min x86 backend [v7] In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 16:17:22 GMT, Jatin Bhateja wrote: >> This bugfix patch adds the special handling as per x86 AVX512-FP16 ISA specification[1][2] to compute max/min operations with +/-0.0 or NaN operands. >> >> Special handling leverage the instruction semantic, central idea is to shuffle the operands such that smaller input gets assigned to second operand for min operation or a larger input gets assigned to second operand for max operation, in addition result equals NaN if an unordered comparison detects first input as a NaN value else we return the result of min/max operation. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin >> >> [1] https://www.felixcloutier.com/x86/vminsh >> [2] https://www.felixcloutier.com/x86/vmaxsh > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolution Thanks for making this change. PR looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24169#pullrequestreview-2733543988 From sviswanathan at openjdk.org Tue Apr 1 17:38:30 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 1 Apr 2025 17:38:30 GMT Subject: RFR: 8352585: Add special case handling for Float16.max/min x86 backend [v5] In-Reply-To: References: Message-ID: On Thu, 27 Mar 2025 13:14:39 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments resolution > > I have not looked at the x64 instructions, but only the tests again. > > I have noticed that you only cover specific values. You could improve tests with this: > - Add non-canonical NaN values. > - Just iterate over all possible Float16 input pairs. It's only `2^32`, that should be feasible! Then compare compiled vs interpreted results. > > It seems that bugs like these happen because somehow we do not systematically cover all inputs. Maybe we should do the same for all Float16 operations? @eme64 We are looking forward to your approval for this PR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24169#issuecomment-2770211207 From vlivanov at openjdk.org Tue Apr 1 18:53:13 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 1 Apr 2025 18:53:13 GMT Subject: RFR: 8353188: C1: Clean up x86 backend after 32-bit x86 removal [v2] In-Reply-To: References: <-iwh_5JGpt-TAVpfZQjwbnIG_c8hvirNKCcmiZoLNls=.3b34bf15-51fc-42bf-a294-1c23ca99754c@github.com> Message-ID: <7cW4vEajJs-DiP7wkmG1j9zmOdw5fHR5FVq6W17lJas=.6c7cfbac-c478-4c18-9b87-5a0a50658363@github.com> On Tue, 1 Apr 2025 07:58:27 GMT, Aleksey Shipilev wrote: >> Piece-wise cleanup of C1_LIRAssembler_x86, C1_MacroAssembler and related classes. C1 implements the bulk of arch-specific backend there. Major parts of this backend are already removed by #24274, this cleans up another large bulk, and hopefully most of it. 
>> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux x86_64 server fastdebug, `all` + `-XX:TieredStopAtLevel=1` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Review comments Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24301#pullrequestreview-2733756865 From vlivanov at openjdk.org Tue Apr 1 19:06:34 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 1 Apr 2025 19:06:34 GMT Subject: RFR: 8347645: C2: XOR bounded value handling blocks constant folding [v48] In-Reply-To: <1JYbwRdMBDikLGt3iXx87YRTWrF6NwzbFDH916UuoSA=.1fb10eab-4963-4d4c-a8ae-97ec3cecdfe2@github.com> References: <1JYbwRdMBDikLGt3iXx87YRTWrF6NwzbFDH916UuoSA=.1fb10eab-4963-4d4c-a8ae-97ec3cecdfe2@github.com> Message-ID: On Tue, 1 Apr 2025 02:44:03 GMT, Johannes Graham wrote: >> An interaction between xor bounds optimization and constant folding resulted in xor over constants not being optimized. This has a noticeable effect on `Long.expand` with a constant mask, on architectures that don't have instructions equivalent to `PDEP` to be used in an intrinsic. >> >> This change moves logic from the `Xor(L|I)Node::Value` methods into the `add_ring` methods, and gives priority to constant-folding. A static method was separated out to facilitate direct unit-testing. It also (subjectively) simplified the calculation of the upper bound and added an explanation of the reasoning behind it. >> >> In addition to testing for constant folding over xor, IR tests were added to `XorINodeIdealizationTests` and `XorLNodeIdealizationTests` to cover these related items: >> - Bounds optimization of xor >> - A check for `x ^ x = 0` >> - Explicit testing of xor over booleans. >> >> Also `test_xor_node.cpp` was added to more extensively test the correctness of the bounds optimization. 
It exhaustively tests ranges of 4-bit numbers as well as at the high and low end of the affected types. > > Johannes Graham has updated the pull request incrementally with one additional commit since the last revision: > > remove unused methods Overall, looks good. Some minor comments follow. src/hotspot/share/opto/addnode.cpp line 1012: > 1010: > 1011: if (r0->is_con() && r1->is_con()) { > 1012: // Constant fold: (c1 ^ c2) -> c3 A bit confusing. The comment mentions `c1` and `c2` while the code operates on `t0`/`r0` and `t1`/`r1`. src/hotspot/share/opto/addnode.cpp line 1019: > 1017: > 1018: if (r0->_lo >= 0 && r1->_lo >= 0) { > 1019: // Combine [0, lo_1] ^ [0, hi_1] -> [0, max] What does this comment refer to? It mentions `lo_1` and `hi_1` while `r0->_hi` and `r1->_hi` are passed into `xor_upper_bound_for_ranges`. Also, I'd avoid naming it `max`: it sort of hints at `max_jint`, but in reality it represents the upper bound of the operation. Why not `upper`/`upper_bound` instead? ------------- PR Review: https://git.openjdk.org/jdk/pull/23089#pullrequestreview-2733792136 PR Review Comment: https://git.openjdk.org/jdk/pull/23089#discussion_r2023525192 PR Review Comment: https://git.openjdk.org/jdk/pull/23089#discussion_r2023523965 From duke at openjdk.org Tue Apr 1 23:22:09 2025 From: duke at openjdk.org (Johannes Graham) Date: Tue, 1 Apr 2025 23:22:09 GMT Subject: RFR: 8347645: C2: XOR bounded value handling blocks constant folding [v49] In-Reply-To: References: Message-ID: > An interaction between xor bounds optimization and constant folding resulted in xor over constants not being optimized. This has a noticeable effect on `Long.expand` with a constant mask, on architectures that don't have instructions equivalent to `PDEP` to be used in an intrinsic. > > This change moves logic from the `Xor(L|I)Node::Value` methods into the `add_ring` methods, and gives priority to constant-folding. A static method was separated out to facilitate direct unit-testing. 
It also (subjectively) simplified the calculation of the upper bound and added an explanation of the reasoning behind it. > > In addition to testing for constant folding over xor, IR tests were added to `XorINodeIdealizationTests` and `XorLNodeIdealizationTests` to cover these related items: > - Bounds optimization of xor > - A check for `x ^ x = 0` > - Explicit testing of xor over booleans. > > Also `test_xor_node.cpp` was added to more extensively test the correctness of the bounds optimization. It exhaustively tests ranges of 4-bit numbers as well as at the high and low end of the affected types. Johannes Graham has updated the pull request incrementally with one additional commit since the last revision: update comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23089/files - new: https://git.openjdk.org/jdk/pull/23089/files/50d35dcd..dda134fb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23089&range=48 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23089&range=47-48 Stats: 8 lines in 1 file changed: 0 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/23089.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23089/head:pull/23089 PR: https://git.openjdk.org/jdk/pull/23089 From vlivanov at openjdk.org Wed Apr 2 03:01:46 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 2 Apr 2025 03:01:46 GMT Subject: RFR: 8347645: C2: XOR bounded value handling blocks constant folding [v49] In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 23:22:09 GMT, Johannes Graham wrote: >> An interaction between xor bounds optimization and constant folding resulted in xor over constants not being optimized. This has a noticeable effect on `Long.expand` with a constant mask, on architectures that don't have instructions equivalent to `PDEP` to be used in an intrinsic. >> >> This change moves logic from the `Xor(L|I)Node::Value` methods into the `add_ring` methods, and gives priority to constant-folding. 
A static method was separated out to facilitate direct unit-testing. It also (subjectively) simplified the calculation of the upper bound and added an explanation of the reasoning behind it. >> >> In addition to testing for constant folding over xor, IR tests were added to `XorINodeIdealizationTests` and `XorLNodeIdealizationTests` to cover these related items: >> - Bounds optimization of xor >> - A check for `x ^ x = 0` >> - Explicit testing of xor over booleans. >> >> Also `test_xor_node.cpp` was added to more extensively test the correctness of the bounds optimization. It exhaustively tests ranges of 4-bit numbers as well as at the high and low end of the affected types. > > Johannes Graham has updated the pull request incrementally with one additional commit since the last revision: > > update comments Looks good! ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23089#pullrequestreview-2734546645 From epeter at openjdk.org Wed Apr 2 06:19:39 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 2 Apr 2025 06:19:39 GMT Subject: RFR: 8347645: C2: XOR bounded value handling blocks constant folding [v46] In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 02:49:45 GMT, Johannes Graham wrote: >> @iwanowww >> >>> `_non_neg` part is confusing; I'd stress instead that it works on ranges. >> >> I find it easier to think of it as calculating the upperbound of the xor of 2 non-negative integers whose upperbounds are given in the parameters. > > Renamed to `xor_upper_bound_for_ranges` before I saw your comment, @merykitty. I'd be ok with another name though. With the last changes, the method is no longer a member of the class, so it's no longer going to get as many eyes on it without context, so maybe it matters less now. @j3graham I gave it a quick look, and it looks even better now. Let me run testing again before you integrate! Please ping me in 24h for the results! 
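For readers following the naming discussion, the bound in question can be sketched in Java. This is only an illustration under the assumption that both inputs are non-negative `int` values with known upper bounds; the class and method names (`XorBoundSketch`, `xorUpperBound`) are hypothetical, and the code is not the HotSpot implementation from the PR.

```java
// Illustrative sketch only: hypothetical names, not the code from the PR.
final class XorBoundSketch {
    // Returns a value u such that (a ^ b) <= u whenever
    // 0 <= a <= hi0 and 0 <= b <= hi1.
    static int xorUpperBound(int hi0, int hi1) {
        if (hi0 < 0 || hi1 < 0) {
            throw new IllegalArgumentException("bounds must be non-negative");
        }
        int m = hi0 | hi1;
        if (m == 0) {
            return 0; // both ranges are exactly {0}
        }
        // Neither a nor b can have a bit set above the highest set bit of
        // (hi0 | hi1), so a ^ b cannot either. The bound is therefore the
        // all-ones mask up to and including that bit.
        return (Integer.highestOneBit(m) << 1) - 1;
    }

    public static void main(String[] args) {
        // Exhaustive check over all 4-bit ranges, in the spirit of the
        // unit test described in the PR summary.
        for (int hi0 = 0; hi0 < 16; hi0++) {
            for (int hi1 = 0; hi1 < 16; hi1++) {
                int bound = xorUpperBound(hi0, hi1);
                for (int a = 0; a <= hi0; a++) {
                    for (int b = 0; b <= hi1; b++) {
                        if ((a ^ b) > bound) {
                            throw new AssertionError(a + " ^ " + b + " > " + bound);
                        }
                    }
                }
            }
        }
        System.out.println("bound holds for all 4-bit ranges");
    }
}
```

The bound is safe but not always tight: for upper bounds 8 and 1 it returns 15, while the largest reachable xor is 9.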
------------- PR Comment: https://git.openjdk.org/jdk/pull/23089#issuecomment-2771441200 From duke at openjdk.org Wed Apr 2 06:32:10 2025 From: duke at openjdk.org (Manuel Hässig) Date: Wed, 2 Apr 2025 06:32:10 GMT Subject: RFR: 8347449: C2: UseLoopPredicate off should also turn UseProfiledLoopPredicate off [v6] In-Reply-To: References: Message-ID: <57-zPqw_-3qY6G5TZUYXG4MFzx_jmhHRDN78DR-dy0o=.c105c4e4-9ffa-4dd4-9390-70f27e48f217@github.com> On Fri, 28 Mar 2025 09:09:59 GMT, Manuel Hässig wrote: >> # Issue Summary >> >> When running with `-XX:-UseLoopPredicate` C2 still inserts profiled loop parse predicates, despite those being a form of loop parse predicate. Further, the loop predicate code is not always consistent when to insert/expect profiled parse predicates. >> >> # Change Summary >> >> Following the rationale, that profiled predicates are a subset of loop predicates, this PR disables profiled predicates whenever loop predicates are disabled. They are disabled on the level of arguments. Further, before any checks for whether profiled predicates are enabled, this PR inserts a check that loop predicates are enabled such that the code is consistent in its intention. >> >> Concretely, this PR >> - adds parse predicate nodes to the IR testing framework, >> - turns off `UseProfiledLoopPredicate` if `UseLoopPredicate` is turned off, >> - predicates all checks for `UseProfiledLoopPredicate` on `UseLoopPredicate` first for consistency, >> - adds a regression test. >> >> >> # Testing >> >> The changes passed the following testing: >> - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14078750038) >> - tier1 through tier3 and Oracle internal testing > > Manuel Hässig has updated the pull request incrementally with two additional commits since the last revision: > > - idealKit::loop: always call add_parse_predicates > > It was constrained on UseParsePredicate, but this is incorrect, since > all parse predicates are added in that function. 
> - Improve description of UseLoopPredicate argument Thank y'all for the thorough review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24248#issuecomment-2771461584 From duke at openjdk.org Wed Apr 2 06:32:11 2025 From: duke at openjdk.org (duke) Date: Wed, 2 Apr 2025 06:32:11 GMT Subject: RFR: 8347449: C2: UseLoopPredicate off should also turn UseProfiledLoopPredicate off [v6] In-Reply-To: References: Message-ID: On Fri, 28 Mar 2025 09:09:59 GMT, Manuel Hässig wrote: >> # Issue Summary >> >> When running with `-XX:-UseLoopPredicate` C2 still inserts profiled loop parse predicates, despite those being a form of loop parse predicate. Further, the loop predicate code is not always consistent when to insert/expect profiled parse predicates. >> >> # Change Summary >> >> Following the rationale, that profiled predicates are a subset of loop predicates, this PR disables profiled predicates whenever loop predicates are disabled. They are disabled on the level of arguments. Further, before any checks for whether profiled predicates are enabled, this PR inserts a check that loop predicates are enabled such that the code is consistent in its intention. >> >> Concretely, this PR >> - adds parse predicate nodes to the IR testing framework, >> - turns off `UseProfiledLoopPredicate` if `UseLoopPredicate` is turned off, >> - predicates all checks for `UseProfiledLoopPredicate` on `UseLoopPredicate` first for consistency, >> - adds a regression test. >> >> >> # Testing >> >> The changes passed the following testing: >> - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14078750038) >> - tier1 through tier3 and Oracle internal testing > > Manuel Hässig has updated the pull request incrementally with two additional commits since the last revision: > > - idealKit::loop: always call add_parse_predicates > > It was constrained on UseParsePredicate, but this is incorrect, since > all parse predicates are added in that function. 
> - Improve description of UseLoopPredicate argument @mhaessig Your change (at version 1561a0eea3b2049e4e9e6468d0237f60e97cd2e8) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24248#issuecomment-2771462472 From duke at openjdk.org Wed Apr 2 06:33:14 2025 From: duke at openjdk.org (Manuel Hässig) Date: Wed, 2 Apr 2025 06:33:14 GMT Subject: RFR: 8352893: C2: OrL/INode::add_ring optimize (x | -1) to -1 [v3] In-Reply-To: References: Message-ID: <2rYcxIlI5lZujCDgdo1RStzxjeJGym2ftPpb2eoxW38=.1006c857-1293-4e15-8fca-2d7ce163f420@github.com> On Mon, 31 Mar 2025 14:31:43 GMT, Manuel Hässig wrote: >> # Issue Summary >> >> The `add_ring()` implementations of `OrINode` and `OrLNode` are missing the optimization that an or with a value where all bits are ones (since we have signed integers in this case `~0 == -1`) will always yield all ones. >> >> # Changes >> >> This PR makes the following straightforward changes: >> - `Or(I|L)Node::add_ring()` returns `-1` if one of the two inputs is `-1`. >> - Add `Or(I|L)` nodes to the IR framework. >> - Add a regression IR test for the implemented optimization. >> >> # Testing >> >> - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14110978686) >> - Ran tier1 through tier3 and Oracle internal testing > > Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: > > Remove loop in test and instead use random values Thank you for reviewing! 
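The identity behind this change can be checked directly in Java. The following is a minimal sketch with hypothetical names (`OrAllOnesSketch`, `orWithAllOnes`); it only demonstrates the arithmetic fact that `x | -1` is `-1` for both `int` and `long`, and says nothing about how C2 implements the folding.

```java
// Illustrative sketch only: hypothetical names, unrelated to the C2 sources.
final class OrAllOnesSketch {
    // -1 has every bit set (~0 == -1), so OR-ing any value with it
    // sets every bit of the result as well.
    static int orWithAllOnes(int x) {
        return x | -1;
    }

    static long orWithAllOnes(long x) {
        return x | -1L;
    }

    public static void main(String[] args) {
        int[] ints = {0, 1, -1, 42, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int x : ints) {
            if (orWithAllOnes(x) != -1) {
                throw new AssertionError("int case failed for " + x);
            }
        }
        long[] longs = {0L, 1L, -1L, 42L, Long.MIN_VALUE, Long.MAX_VALUE};
        for (long x : longs) {
            if (orWithAllOnes(x) != -1L) {
                throw new AssertionError("long case failed for " + x);
            }
        }
        System.out.println("x | -1 == -1 for all sampled values");
    }
}
```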
------------- PR Comment: https://git.openjdk.org/jdk/pull/24289#issuecomment-2771459481 From duke at openjdk.org Wed Apr 2 06:33:15 2025 From: duke at openjdk.org (Manuel Hässig) Date: Wed, 2 Apr 2025 06:33:15 GMT Subject: Integrated: 8352893: C2: OrL/INode::add_ring optimize (x | -1) to -1 In-Reply-To: References: Message-ID: On Fri, 28 Mar 2025 10:21:57 GMT, Manuel Hässig wrote: > # Issue Summary > > The `add_ring()` implementations of `OrINode` and `OrLNode` are missing the optimization that an or with a value where all bits are ones (since we have signed integers in this case `~0 == -1`) will always yield all ones. > > # Changes > > This PR makes the following straightforward changes: > - `Or(I|L)Node::add_ring()` returns `-1` if one of the two inputs is `-1`. > - Add `Or(I|L)` nodes to the IR framework. > - Add a regression IR test for the implemented optimization. > > # Testing > > - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14110978686) > - Ran tier1 through tier3 and Oracle internal testing This pull request has now been integrated. Changeset: f301663b Author: Manuel Hässig URL: https://git.openjdk.org/jdk/commit/f301663b346bf2388ecfa429be1cf64c6e93ee8e Stats: 109 lines in 3 files changed: 109 ins; 0 del; 0 mod 8352893: C2: OrL/INode::add_ring optimize (x | -1) to -1 Reviewed-by: epeter, thartmann, kvn ------------- PR: https://git.openjdk.org/jdk/pull/24289 From epeter at openjdk.org Wed Apr 2 06:34:32 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 2 Apr 2025 06:34:32 GMT Subject: RFR: 8352585: Add special case handling for Float16.max/min x86 backend [v5] In-Reply-To: References: Message-ID: On Fri, 28 Mar 2025 04:50:17 GMT, Jatin Bhateja wrote: >> I have not looked at the x64 instructions, but only the tests again. >> >> I have noticed that you only cover specific values. You could improve tests with this: >> - Add non-canonical NaN values. >> - Just iterate over all possible Float16 input pairs. 
It's only `2^32`, that should be feasible! Then compare compiled vs interpreted results. >> >> It seems that bugs like these happen because somehow we do not systematically cover all inputs. Maybe we should do the same for all Float16 operations? > > Hi @eme64 , > This specific issue is around special Float16 values i.e. +/- 0.0 and NaN. > I have added a Generator for Float16 as part of https://github.com/openjdk/jdk/pull/22755 > > Best Regards, > Jatin @jatin-bhateja It looks reasonable to me now. Let me run some testing, ping me in 24h for the results! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24169#issuecomment-2771465382 From chagedorn at openjdk.org Wed Apr 2 06:50:59 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 2 Apr 2025 06:50:59 GMT Subject: RFR: 8353058: [PPC64] Some IR framework tests are failing after JDK-8352595 Message-ID: `TestPhaseIRMatching` was recently updated with [JDK-8352595](https://bugs.openjdk.org/browse/JDK-8352595) which changed some matching on opto assembly from `IRNode.ALLOC` (now matching on ideal phases) to `IRNode.FIELD_ACCESS` (still matching on opto assembly). However, the updated code matches differently on PPC for some method invocation on a parameter which let the test fail on PPC: public Object defaultOnOptoAssembly(Helper h) { return h.getString(); // emits one "Field: " string on most platforms but none on PPC } When I've revisited the test to analyze the failure, it was not evidently clear what I had in mind back there with `defaultOnX()`. My guess is that I've tried to have one method failing on ideal phases, one on mach phases and one on both while all the methods use `IRNode` entries that have default compile phases on ideal and mach phases. But that is not the case today. I've therefore rewritten the tests to adhere to my guess. I also removed the ambiguity among platforms to have the same number of field accesses on them. 
How to read the `@ExpectedFailure` annotation: @IR(failOn = {IRNode.STORE, IRNode.FIELD_ACCESS, IRNode.COUNTED_LOOP, IRNode.STORE_I}, counts = {IRNode.STORE, "20", IRNode.FIELD_ACCESS, "1", IRNode.COUNTED_LOOP, "2", IRNode.OOPMAP_WITH, "asdf", "2"}) // Expect rule with id 5 (the one directly above) to fail: // - We fail when matching PRINT_IDEAL with the: // - failOn attribute: The failing constraints are constraint 1 and 4 (while 2 and 3 pass) // - counts attribute: The failing constraints are constraint 2 and 4 (while 1 and 3 pass). @ExpectedFailure(ruleId = 5, phase = CompilePhase.PRINT_IDEAL, failOn = {1, 4}, counts = {1, 3}) Thanks to @TheRealMDoerr for testing the patch on PPC! Thanks, Christian ------------- Commit messages: - 8353058: [PPC64] Some IR framework tests are failing after JDK-8314999 Changes: https://git.openjdk.org/jdk/pull/24373/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24373&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8353058 Stats: 54 lines in 1 file changed: 17 ins; 8 del; 29 mod Patch: https://git.openjdk.org/jdk/pull/24373.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24373/head:pull/24373 PR: https://git.openjdk.org/jdk/pull/24373 From chagedorn at openjdk.org Wed Apr 2 06:50:59 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 2 Apr 2025 06:50:59 GMT Subject: RFR: 8353058: [PPC64] Some IR framework tests are failing after JDK-8352595 In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 06:45:58 GMT, Christian Hagedorn wrote: > `TestPhaseIRMatching` was recently updated with [JDK-8352595](https://bugs.openjdk.org/browse/JDK-8352595) which changed some matching on opto assembly from `IRNode.ALLOC` (now matching on ideal phases) to `IRNode.FIELD_ACCESS` (still matching on opto assembly). 
However, the updated code matches differently on PPC for some method invocation on a parameter which let the test fail on PPC: > > public Object defaultOnOptoAssembly(Helper h) { > return h.getString(); // emits one "Field: " string on most platforms but none on PPC > } > > > When I've revisited the test to analyze the failure, it was not evidently clear what I had in mind back there with `defaultOnX()`. My guess is that I've tried to have one method failing on ideal phases, one on mach phases and one on both while all the methods use `IRNode` entries that have default compile phases on ideal and mach phases. But that is not the case today. I've therefore rewritten the tests to adhere to my guess. I also removed the ambiguity among platforms to have the same number of field accesses on them. > > How to read the `@ExpectedFailure` annotation: > > @IR(failOn = {IRNode.STORE, IRNode.FIELD_ACCESS, IRNode.COUNTED_LOOP, IRNode.STORE_I}, > counts = {IRNode.STORE, "20", IRNode.FIELD_ACCESS, "1", IRNode.COUNTED_LOOP, "2", IRNode.OOPMAP_WITH, "asdf", "2"}) > // Expect rule with id 5 (the one directly above) to fail: > // - We fail when matching PRINT_IDEAL with the: > // - failOn attribute: The failing constraints are constraint 1 and 4 (while 2 and 3 pass) > // - counts attribute: The failing constraints are constraint 2 and 4 (while 1 and 3 pass). > @ExpectedFailure(ruleId = 5, phase = CompilePhase.PRINT_IDEAL, failOn = {1, 4}, counts = {1, 3}) > > > Thanks to @TheRealMDoerr for testing the patch on PPC! > > Thanks, > Christian test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestPhaseIRMatching.java line 178: > 176: public void defaultOnOptoAssembly() { > 177: i = 34; > 178: l = 34; Always using this body which reliably emits two "Field: " strings in the opto assembly on all platforms. Thus removed the `Helper` class again. 
test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestPhaseIRMatching.java line 228: > 226: } > 227: defaultOnOptoAssembly(new Helper("a", 1)); > 228: defaultOnBoth(new Helper("a", 1)); No longer needed because we do not need to pass anything into the methods. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24373#discussion_r2024171272 PR Review Comment: https://git.openjdk.org/jdk/pull/24373#discussion_r2024171836 From duke at openjdk.org Wed Apr 2 06:51:28 2025 From: duke at openjdk.org (Manuel Hässig) Date: Wed, 2 Apr 2025 06:51:28 GMT Subject: Integrated: 8347449: C2: UseLoopPredicate off should also turn UseProfiledLoopPredicate off In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 09:27:59 GMT, Manuel Hässig wrote: > # Issue Summary > > When running with `-XX:-UseLoopPredicate` C2 still inserts profiled loop parse predicates, despite those being a form of loop parse predicate. Further, the loop predicate code is not always consistent when to insert/expect profiled parse predicates. > > # Change Summary > > Following the rationale, that profiled predicates are a subset of loop predicates, this PR disables profiled predicates whenever loop predicates are disabled. They are disabled on the level of arguments. Further, before any checks for whether profiled predicates are enabled, this PR inserts a check that loop predicates are enabled such that the code is consistent in its intention. > > Concretely, this PR > - adds parse predicate nodes to the IR testing framework, > - turns off `UseProfiledLoopPredicate` if `UseLoopPredicate` is turned off, > - predicates all checks for `UseProfiledLoopPredicate` on `UseLoopPredicate` first for consistency, > - adds a regression test. > > > # Testing > > The changes passed the following testing: > - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14078750038) > - tier1 through tier3 and Oracle internal testing This pull request has now been integrated. 
Changeset: d358f5f4 Author: Manuel Hässig Committer: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/d358f5f4a44aacf2d79ccdb3e362ce8ed571f6da Stats: 150 lines in 7 files changed: 128 ins; 2 del; 20 mod 8347449: C2: UseLoopPredicate off should also turn UseProfiledLoopPredicate off Reviewed-by: chagedorn, epeter ------------- PR: https://git.openjdk.org/jdk/pull/24248 From epeter at openjdk.org Wed Apr 2 07:10:23 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 2 Apr 2025 07:10:23 GMT Subject: RFR: 8352316: More MergeStoreBench [v7] In-Reply-To: References: <5fLeODHTQw8vbuvTl6G0YPNszI5_tH1b3L_tWJtCTh8=.ca1b21f2-2890-4daa-8ce2-8112a3f7146b@github.com> Message-ID: On Sat, 29 Mar 2025 07:27:24 GMT, Shaojin Wen wrote: >> Added performance tests related to String.getBytes/String.getChars/StringBuilder.append/System.arraycopy in constant scenarios to verify whether MergeStore works > > Shaojin Wen has updated the pull request incrementally with one additional commit since the last revision: > > add StringBuilderUnsafePut @wenshao @iwanowww I have a few concerns about this PR. Your current PR description says this: > Added performance tests related to String.getBytes/String.getChars/StringBuilder.append/System.arraycopy in constant scenarios to verify whether MergeStore works First: a benchmark is not the best way `to verify whether MergeStore works`. An IR test would be more helpful, as it could check reliably what IR is generated, and hence if MergeStores actually optimized anything. Second: A JMH benchmark could also be helpful, but only if you run it with and without MergeStores enabled. Otherwise how would you know if it was MergeStores or another optimization that is relevant here? Third: `getBytes` / `arraycopy` is **NOT** a MergeStores pattern. These are **COPY** patterns. So they probably should go to a separate benchmark file. I don't want the MergeStores benchmark polluted with unrelated cases. 
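To make the store-versus-copy distinction concrete, here is a minimal Java sketch. The class and method names are hypothetical and not taken from `MergeStoreBench.java`, and the comments describe only the shape of each pattern, not what C2 actually emits for it.

```java
// Illustrative sketch only: hypothetical names, not from MergeStoreBench.java.
final class StoreVsCopySketch {
    // Store pattern: four adjacent byte stores of constants into one array.
    // Nothing is loaded from memory; the values are known at compile time.
    static void storeConstants(byte[] dst, int off) {
        dst[off]     = (byte) 'n';
        dst[off + 1] = (byte) 'u';
        dst[off + 2] = (byte) 'l';
        dst[off + 3] = (byte) 'l';
    }

    // Copy pattern: every byte written to dst is first loaded from src,
    // so each store is paired with a load.
    static void copyBytes(byte[] src, byte[] dst, int off) {
        System.arraycopy(src, 0, dst, off, src.length);
    }

    public static void main(String[] args) {
        byte[] viaStores = new byte[8];
        byte[] viaCopy = new byte[8];
        storeConstants(viaStores, 2);
        copyBytes(new byte[] {'n', 'u', 'l', 'l'}, viaCopy, 2);
        // Both arrays end up with the same contents; only the shape of the
        // code that produced them differs.
        System.out.println(java.util.Arrays.equals(viaStores, viaCopy));
    }
}
```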
I could be wrong here, and just not see how these cases are MergeStore cases, but you need to show the details here. I put some time into understanding your PR and asking you a list of questions. You did not really respond to them, and that is frustrating to me and makes me feel like my time is not valued: https://github.com/openjdk/jdk/pull/24108#issuecomment-2762946069 You say this: > By default, in OpenJDK, COMPACT_STRINGS = true, and the String coder without UTF16 characters is LATIN1, which is implemented using System.arraycopy. However, since String is immutable and System.arraycopy is directly performed on byte[], C2 should have more opportunities for optimization. Maybe the `System.arraycopy` can be optimized. But I don't think it is the MergeStores optimization that would do that. This is really a **Copy** pattern and not a `MergeStores` pattern. Please read the PRs on MergeStores to see what patterns are covered. And like I asked previously: > Can you investigate what code it generates, and what kinds of optimizations are missing to make it close in performance to the Unsafe benchmark? > I don't have time to do all the deep investigations myself. But feel free to ask me if you have more questions. To me, benchmarks are only helpful and worth integrating if there is some clear and documented purpose. It would be really nice if you could invest some time into that :) test/micro/org/openjdk/bench/vm/compiler/MergeStoreBench.java line 693: > 691: } > 692: BH.consume(off); > 693: } This is a copy pattern, not MergeStores. test/micro/org/openjdk/bench/vm/compiler/MergeStoreBench.java line 735: > 733: } > 734: BH.consume(off); > 735: } @wenshao This is a copy pattern. Not a MergeStore pattern. So I can tell you already now that it will not be optimized by MergeStores ;) test/micro/org/openjdk/bench/vm/compiler/MergeStoreBench.java line 799: > 797: } > 798: BH.consume(off); > 799: } @wenshao Why would MergeStores work here? This is a copy pattern. 
That is not at all covered by MergeStores. test/micro/org/openjdk/bench/vm/compiler/MergeStoreBench.java line 856: > 854: } > 855: BH.consume(sb.length()); > 856: } Why would you expect MergeStores to work here? ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24108#pullrequestreview-2734816014 PR Review Comment: https://git.openjdk.org/jdk/pull/24108#discussion_r2024171061 PR Review Comment: https://git.openjdk.org/jdk/pull/24108#discussion_r2024170015 PR Review Comment: https://git.openjdk.org/jdk/pull/24108#discussion_r2024169285 PR Review Comment: https://git.openjdk.org/jdk/pull/24108#discussion_r2024172517 From mchevalier at openjdk.org Wed Apr 2 07:13:19 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 2 Apr 2025 07:13:19 GMT Subject: RFR: 8353058: [PPC64] Some IR framework tests are failing after JDK-8352595 In-Reply-To: References: Message-ID: <-RonuxVG3qrg8pJV2J6lrXnAlV4oBHJC5wzdEFCKhzc=.753fea93-d133-4135-827a-bcd6ae4e32d0@github.com> On Wed, 2 Apr 2025 06:45:58 GMT, Christian Hagedorn wrote: > `TestPhaseIRMatching` was recently updated with [JDK-8352595](https://bugs.openjdk.org/browse/JDK-8352595) which changed some matching on opto assembly from `IRNode.ALLOC` (now matching on ideal phases) to `IRNode.FIELD_ACCESS` (still matching on opto assembly). However, the updated code matches differently on PPC for some method invocation on a parameter which let the test fail on PPC: > > public Object defaultOnOptoAssembly(Helper h) { > return h.getString(); // emits one "Field: " string on most platforms but none on PPC > } > > > When I've revisited the test to analyze the failure, it was not evidently clear what I had in mind back there with `defaultOnX()`. My guess is that I've tried to have one method failing on ideal phases, one on mach phases and one on both while all the methods use `IRNode` entries that have default compile phases on ideal and mach phases. But that is not the case today. 
I've therefore rewritten the tests to adhere to my guess. I also removed the ambiguity among platforms to have the same number of field accesses on them. > > How to read the `@ExpectedFailure` annotation: > > @IR(failOn = {IRNode.STORE, IRNode.FIELD_ACCESS, IRNode.COUNTED_LOOP, IRNode.STORE_I}, > counts = {IRNode.STORE, "20", IRNode.FIELD_ACCESS, "1", IRNode.COUNTED_LOOP, "2", IRNode.OOPMAP_WITH, "asdf", "2"}) > // Expect rule with id 5 (the one directly above) to fail: > // - We fail when matching PRINT_IDEAL with the: > // - failOn attribute: The failing constraints are constraint 1 and 4 (while 2 and 3 pass) > // - counts attribute: The failing constraints are constraint 2 and 4 (while 1 and 3 pass). > @ExpectedFailure(ruleId = 5, phase = CompilePhase.PRINT_IDEAL, failOn = {1, 4}, counts = {1, 3}) > > > Thanks to @TheRealMDoerr for testing the patch on PPC! > > Thanks, > Christian Looks good to me. I've also used `FIELD_ACCESS` in TestCompilePhaseCollector.java, but I think it's harmless there since we are not matching, but just using it for its default phase. But I still mention, just in case... ------------- PR Comment: https://git.openjdk.org/jdk/pull/24373#issuecomment-2771540722 From epeter at openjdk.org Wed Apr 2 07:15:27 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 2 Apr 2025 07:15:27 GMT Subject: RFR: 8346964: C2: Improve integer multiplication with constant in MulINode::Ideal() [v3] In-Reply-To: References: <4UC1x1GPJCcIwPXKJZfiUGxQnuRaDQjOcN53wYmUzF4=.fafd71c1-2f48-4ae4-8e7e-8844c578429a@github.com> <6PtcpyIAXa2wbi0CI5-DVvI1r2RRDvKtIWko7nvBDFo=.49b4d6f7-0dda-42e7-9f51-bfa3c06ef6f5@github.com> Message-ID: <8P3c-UQwGnV7gzMapQf_YAQHQLaIKTvYGFY3O5Of2UU=.87fa4250-e2f5-4efd-b6ab-fd2298a8bea7@github.com> On Thu, 9 Jan 2025 06:21:14 GMT, erifan wrote: >> @erifan I did some more thinking when falling asleep / waking up. This is a really interesting problem here. 
>> >> For `MulINode::Ideal` with patterns `var * con`, we really have these options in assembly: >> - `mul` general case. >> - `shift` and `add` when profitable. >> - `lea` could this be an improvement over `shift` and `add`? >> >> The issue is that different platforms have different characteristics here for these instructions - we would have to see how they differ. As far as I remember `mul` is not always available on all `ALU`s, but `add` and `shift` should be available. This impacts their throughput (more ports / ALU means more throughput generally). But the instructions also have different latency. Further, I could imagine that at some point more instructions may not just affect the throughput, but also the code-size: that in turn would increase IR and may at some point affect the instruction cache. >> >> Additionally: if your workload has other `mul`, `shift` and `add` mixed in, then some ports may already be saturated, and that could tilt the balance as to which option you are supposed to take. >> >> And then the characteristics of scalar ops may not be identical to vector ops. >> >> It would be interesting to have a really solid benchmark, where you explore the impact of these different effects. >> And it would be interesting to extract a table of latency + throughput characteristics for all relevant scalar + vector ops, for a number of different CPUs. Just so we get an overview of how easy this is to tune. >> >> Maybe perfect tuning is not possible. Maybe we are willing to take a `5%` regression in some cases to boost other cases by `30%`. But that is a **big maybe**: we really do not like getting regressions in existing code, it tends to upset people more if they get regressions compared to how much they enjoy speedups - so work like this can be delicate. >> >> Anyway, I don't right now have much time to investigate and work on this myself. So you'd have to do the work, benchmark, explanation etc. 
**But I think the `30%` speedup indicates that this work could really have potential!** >> >> As to what to do in sequence, here a suggestion: >> 1. First work on Vector API cases of vector multiplication - this should have no impact on other things. >> 2. Delay the `MulINode::Ideal` optimizations until after loop-opts: scalar code would still be handled in the old way, but auto-vectorized code would then be turned into `MulV`. And then go into the mul -> sh... > > Hi @eme64 thanks for your review. > > 1. First work on Vector API cases of vector multiplication - this should have no impact on other things. > 2. Delay the MulINode::Ideal optimizations until after loop-opts: scalar code would still be handled in the old way, but auto-vectorized code would then be turned into MulV. And then go into the mul -> shift optimization for vectors under point 1. > 3. Tackle MulINode::Ideal for scalar cases after loop-opts, and see what you can do there. > > I agree with you. I am actually working on `1`. The slightly troublesome thing is that `1` and `3` are both related to the architecture, so it might take a little more time. > >> lea could this be an improvement over shift and add? > > AARCH64 doesn't actually have a `lea` instruction. On x64 there are already some rules that turn `shift add` into `lea`. > > The issue is that different platforms have different characteristics here for these instructions - we would have to see how they differ. As far as I remember mul is not always available on all ALUs, but add and shift should be available. This impacts their throughput (more ports / ALU means more throughput generally). But the instructions also have different latency. Further, I could imagine that at some point more instructions may not just affect the throughput, but also the code-size: that in turn would increase IR and may at some point affect the instruction cache. 
> > Additionally: if your workload has other mul, shift and add mixed in, then some ports may already be saturated, and that could tilt the balance as to which option you are supposed to take. > > And then the characteristics of scalar ops may not be identical to vector ops. > > > Yes, this is very tricky: the actual performance depends on many aspects, such as pipeline, latency, throughput, ROB, and even memory performance. We can only do optimization based on certain references and generalities, such as latency and throughput of different instructions. But when it comes to generalities, it is actually difficult to say which scenario is more general. > >> It would be interesting to have a really solid benchmark, where you explore the impact of these different effects. > And it would be interesting to extract a table of latency + throughput characteristics for all relevant scalar + vector ops, for a number of different CPUs. Just so we get an overview of how easy this is to tune. > > I don't know of such a benchmark suite yet. For AARCH64, I usually refer to [the Arm Optimization Guide](https:... @erifan you opened this again. Does that mean we should review again? I see that you did not make any changes since our last conversation. If it is not ready for review, could you please convert it to Draft, so it is clear that you are not asking for reviews currently?
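For readers following the `var * con` discussion quoted above, here is a minimal Java sketch of the `shift` and `add` identity being weighed against a plain `mul`. It only illustrates the arithmetic for the constant 10; the class and method names are hypothetical and are not HotSpot code, and it says nothing about which form is faster on a given CPU.

```java
public class MulByConstantSketch {
    // Plain multiplication by a constant, the "mul" general case.
    static int mulBy10(int x) {
        return x * 10;
    }

    // Hypothetical illustration, not taken from MulINode::Ideal:
    // 10 = 8 + 2, so x * 10 == (x << 3) + (x << 1).
    // Java int arithmetic wraps modulo 2^32, so the identity also holds on overflow.
    static int mulBy10ShiftAdd(int x) {
        return (x << 3) + (x << 1);
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 7, 12345, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int x : samples) {
            if (mulBy10(x) != mulBy10ShiftAdd(x)) {
                throw new AssertionError("mismatch for x = " + x);
            }
        }
        System.out.println("shift+add matches mul for all samples");
    }
}
```

Whether the two-shift form is actually profitable depends on the latency and throughput points raised in the thread, which is why a benchmark per platform would still be needed.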
------------- PR Comment: https://git.openjdk.org/jdk/pull/22922#issuecomment-2771545449 From mchevalier at openjdk.org Wed Apr 2 07:19:25 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 2 Apr 2025 07:19:25 GMT Subject: RFR: 8353058: [PPC64] Some IR framework tests are failing after JDK-8352595 In-Reply-To: References: Message-ID: <2P9iZTfGS3zMibNJEqMfO_yf-Pir-hYdZFjUA3C5DSg=.fc405c98-769a-47a0-89ad-5ac2cf742fdf@github.com> On Wed, 2 Apr 2025 06:45:58 GMT, Christian Hagedorn wrote: > `TestPhaseIRMatching` was recently updated with [JDK-8352595](https://bugs.openjdk.org/browse/JDK-8352595) which changed some matching on opto assembly from `IRNode.ALLOC` (now matching on ideal phases) to `IRNode.FIELD_ACCESS` (still matching on opto assembly). However, the updated code matches differently on PPC for some method invocation on a parameter which let the test fail on PPC: > > public Object defaultOnOptoAssembly(Helper h) { > return h.getString(); // emits one "Field: " string on most platforms but none on PPC > } > > > When I've revisited the test to analyze the failure, it was not evidently clear what I had in mind back there with `defaultOnX()`. My guess is that I've tried to have one method failing on ideal phases, one on mach phases and one on both while all the methods use `IRNode` entries that have default compile phases on ideal and mach phases. But that is not the case today. I've therefore rewritten the tests to adhere to my guess. I also removed the ambiguity among platforms to have the same number of field accesses on them. 
> > How to read the `@ExpectedFailure` annotation: > > @IR(failOn = {IRNode.STORE, IRNode.FIELD_ACCESS, IRNode.COUNTED_LOOP, IRNode.STORE_I}, > counts = {IRNode.STORE, "20", IRNode.FIELD_ACCESS, "1", IRNode.COUNTED_LOOP, "2", IRNode.OOPMAP_WITH, "asdf", "2"}) > // Expect rule with id 5 (the one directly above) to fail: > // - We fail when matching PRINT_IDEAL with the: > // - failOn attribute: The failing constraints are constraint 1 and 4 (while 2 and 3 pass) > // - counts attribute: The failing constraints are constraint 2 and 4 (while 1 and 3 pass). > @ExpectedFailure(ruleId = 5, phase = CompilePhase.PRINT_IDEAL, failOn = {1, 4}, counts = {1, 3}) > > > Thanks to @TheRealMDoerr for testing the patch on PPC! > > Thanks, > Christian Marked as reviewed by mchevalier (Author). ------------- PR Review: https://git.openjdk.org/jdk/pull/24373#pullrequestreview-2734883035 From chagedorn at openjdk.org Wed Apr 2 07:19:26 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 2 Apr 2025 07:19:26 GMT Subject: RFR: 8353058: [PPC64] Some IR framework tests are failing after JDK-8352595 In-Reply-To: <-RonuxVG3qrg8pJV2J6lrXnAlV4oBHJC5wzdEFCKhzc=.753fea93-d133-4135-827a-bcd6ae4e32d0@github.com> References: <-RonuxVG3qrg8pJV2J6lrXnAlV4oBHJC5wzdEFCKhzc=.753fea93-d133-4135-827a-bcd6ae4e32d0@github.com> Message-ID: On Wed, 2 Apr 2025 07:10:38 GMT, Marc Chevalier wrote: > Looks good to me. I've also used `FIELD_ACCESS` in TestCompilePhaseCollector.java, but I think it's harmless there since we are not matching, but just using it for its default phase. But I still mention, just in case... Thanks for your review Marc! Yes, there we do not perform the actual IR matching, so it's not a problem for platform specific differences. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24373#issuecomment-2771546971 From dfenacci at openjdk.org Wed Apr 2 07:19:26 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 2 Apr 2025 07:19:26 GMT Subject: RFR: 8282053: IGV: refine schedule approximation [v2] In-Reply-To: <0SXZ0k-28IdpWjuhtK4dSJ9ybHE58Oq56zT_sdqeQpc=.0cf9319f-f64d-4e8f-82e9-f654464bc775@github.com> References: <0SXZ0k-28IdpWjuhtK4dSJ9ybHE58Oq56zT_sdqeQpc=.0cf9319f-f64d-4e8f-82e9-f654464bc775@github.com> Message-ID: On Tue, 1 Apr 2025 14:42:47 GMT, Daniel Skantz wrote: >> This patch refines the schedule approximation in IGV by 1) placing parm. and projection nodes in the same block as their predecessors, and 2) no longer erroneously treating machine nodes such as prefetchAlloc and rep_stos as CFG nodes. >> >> The reader may refer to the corresponding JBS issue where graphs sampled before and after the change are attached. >> >> Testing: T1-T3 with no failures. Opened graphs before and after the change and saw no obvious problems. Opened a large number of graphs in CFG view and observed no unexpected IGV warnings, errors or assert failures. > > Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision: > > Update src/utils/IdealGraphVisualizer/ServerCompiler/src/main/java/com/sun/hotspot/igv/servercompiler/ServerCompilerScheduler.java > > Co-authored-by: Daniel Lundén Nice! Thanks @danielogh! ------------- Marked as reviewed by dfenacci (Committer).
PR Review: https://git.openjdk.org/jdk/pull/24350#pullrequestreview-2734882114 From epeter at openjdk.org Wed Apr 2 07:25:25 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 2 Apr 2025 07:25:25 GMT Subject: RFR: 8347499: C2: Make `PhaseIdealLoop` eliminate more redundant safepoints in loops [v2] In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 08:59:07 GMT, Qizheng Xing wrote: >> Qizheng Xing has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Merge branch 'master' into enhance-loop-safepoint-elim >> - Add IR test and microbench. >> - Make `PhaseIdealLoop` eliminate more redundant safepoints in loops. > > The second question: > >> If we now removed safepoints in places where we would actually have needed them: how would we find out? I suppose we would get longer time to safepoint - higher latency in some cases. How would we catch this with our tests? > > I tried running tier1 tests with `JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:+SafepointTimeout -XX:+AbortVMOnSafepointTimeout -XX:SafepointTimeoutDelay=1000`, and there were no failures. > > Running with `-XX:SafepointTimeoutDelay=500` caused 1 random JDK test case to fail. But then I tried to build a JDK without this patch, and it still had the random failure with this option. @MaxXSoft > Running with -XX:SafepointTimeoutDelay=500 caused 1 random JDK test case to fail. But then I tried to build a JDK without this patch, and it still had the random failure with this option. Wow, that sounds like we do not safepoint for half a second in that case. That could be a bug. Could you please tell me what test it is, and how you ran it? We may want to file a bug and investigate it. @MaxXSoft Would you mind improving the documentation comments, so that they are easier to understand? 
Maybe you can even add more comments around your code change, to "prove" why it is ok to do what we would do with your change? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23057#issuecomment-2771559333 PR Comment: https://git.openjdk.org/jdk/pull/23057#issuecomment-2771565388 From epeter at openjdk.org Wed Apr 2 07:25:25 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 2 Apr 2025 07:25:25 GMT Subject: RFR: 8347499: C2: Make `PhaseIdealLoop` eliminate more redundant safepoints in loops [v2] In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 09:51:46 GMT, Qizheng Xing wrote: > On the one hand, this situation won't occur in the current Compile::Optimize process. The Optimize method will always complete all inlining before performing loop optimization And what about late inlining? Does that not happen after loop opts? Maybe we insert new SafePoints when inlining, I simply don't know enough about that. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23057#issuecomment-2771561565 From epeter at openjdk.org Wed Apr 2 07:26:30 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 2 Apr 2025 07:26:30 GMT Subject: RFR: 8341976: C2: use_mem_state != load->find_exact_control(load->in(0)) assert failure [v7] In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 07:31:12 GMT, Roland Westrelin wrote: >> The `arraycopy` writes to a non escaping array so its `ArrayCopy` node >> is marked as having a narrow memory effect. One of the loads from the >> destination after the copy is transformed into a load from the source >> array (the rationale being that if there's no load from the >> destination of the copy, the `arraycopy` is not needed). The load from >> the source has the input memory state of the `ArrayCopy` as memory >> input. That load is then sunk out of the loop and its control is >> updated to be after the `ArrayCopy`. That's legal because the >> `ArrayCopy` only has a narrow memory effect and can't modify the >> source. 
The `ArrayCopy` can't be eliminated and is expanded. In the >> process, a `MemBar` that has a wide memory effect is added. The load >> from the source has control after the membar but memory state before >> and because the membar has a wide memory effect, the load is anti >> dependent on the membar: the graph is broken (the load can't be pinned >> after the membar and anti dependent on it). >> >> In short, the problem is that the graph is transformed under the >> assumption that the `ArrayCopy` has a narrow effect but the >> `ArrayCopy` is expanded to a subgraph that has a wide memory >> effect. The fix I propose is to not insert a membar with a wide memory >> effect. We still need a membar when the destination is non escaping >> because the expanded `ArrayCopy`, if it writes to a tightly allocated >> array, writes to raw memory and not to the destination memory slice. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 16 additional commits since the last revision: > > - review > - Merge branch 'master' into JDK-8341976 > - review > - review > - Merge branch 'master' into JDK-8341976 > - -XX:+TraceLoopOpts fix > - review > - more > - Merge branch 'master' into JDK-8341976 > - more > - ... and 6 more: https://git.openjdk.org/jdk/compare/5362121c...9b21648d test/hotspot/jtreg/compiler/arraycopy/TestSunkLoadAntiDependency.java line 28: > 26: * @bug 8341976 > 27: * @summary C2: use_mem_state != load->find_exact_control(load->in(0)) assert failure > 28: * @run main/othervm -XX:-BackgroundCompilation TestSunkLoadAntiDependency Would it make sense to have a run without any flags?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23465#discussion_r2024220984 From mchevalier at openjdk.org Wed Apr 2 07:32:03 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 2 Apr 2025 07:32:03 GMT Subject: RFR: 8353341: Fuzzer tests crashing: assert(projs->fallthrough_proj != nullptr) failed: must be found Message-ID: If the Mod[DF]Node has no control projection when it's being removed (because its result is unused), `extract_projections` will fail an assert. So, let's skip the removal. But that should happen only when the nodes are already unreachable (control input being transitively top). At the end of the day, the node should be dropped because of that, so there is no rush, and we can let dead node deletion do the job. On the reduced reproducer, the crash is not common (even with `-XX:RepeatCompilation=300`, it might need more than one run to reproduce). So I've tried my fix on several thousand repeated compilations (in batches of 300) without a crash, and without having the modulo node alive at the end. Thanks, Marc ------------- Commit messages: - Don't remove Mod[DF]Node that don't have control output Changes: https://git.openjdk.org/jdk/pull/24375/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24375&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8353341 Stats: 99 lines in 2 files changed: 97 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24375.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24375/head:pull/24375 PR: https://git.openjdk.org/jdk/pull/24375 From epeter at openjdk.org Wed Apr 2 07:32:08 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 2 Apr 2025 07:32:08 GMT Subject: RFR: 8353359: C2: Or(I|L)Node::Ideal is missing AddNode::Ideal call In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 06:20:48 GMT, Hannes Greule wrote: > Hi, > > this simple change adds a missing AddNode::Ideal call to Or(I|L)Node::Ideal.
See the added tests for examples of optimizations that don't apply without this change. > > Please let me know what you think. @SirYwell Wow, good find! Oh dear, things like this are so easy to get wrong. Thanks for writing the IR test, that really seems to be the only way to ensure we don't get these kinds of regressions. I wonder how many more of these kinds of issues we have... It would be optimal if we had IR tests for every optimization, but that would be a lot of work! I'm running some testing, please ping me in 24h for the results! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24348#issuecomment-2771580262 From thartmann at openjdk.org Wed Apr 2 07:32:06 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 2 Apr 2025 07:32:06 GMT Subject: RFR: 8353341: Fuzzer tests crashing: assert(projs->fallthrough_proj != nullptr) failed: must be found In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 07:19:35 GMT, Marc Chevalier wrote: > If the Mod[DF]Node has no control projection when it's being removed (because its result is unused), `extract_projections` will fail an assert. So, let's skip the removal. > > But that should happen only when the nodes are already unreachable (control input being transitively top). At the end of the day, the node should be dropped because of that, so there is no rush, and we can let dead node deletion do the job. > > On the reduced reproducer, the crash is not common (even with `-XX:RepeatCompilation=300`, it might need more than one run to reproduce). So I've tried my fix on several thousand repeated compilations (in batches of 300) without a crash, and without having the modulo node alive at the end. > > Thanks, > Marc Looks good to me!
src/hotspot/share/opto/divnode.cpp line 1521: > 1519: > 1520: bool result_is_unused = proj_out_or_null(TypeFunc::Parms) == nullptr; > 1521: bool has_control_output = proj_out_or_null(TypeFunc::Control) != nullptr; Nit: Maybe replace this with `is_dead = proj_out_or_null(TypeFunc::Control) == nullptr;` and check for `!is_dead` below? test/hotspot/jtreg/compiler/c2/irTests/FPModWithoutControlProj.java line 70: > 68: } > 69: } > 70: Suggestion: test/hotspot/jtreg/compiler/c2/irTests/FPModWithoutControlProj.java line 93: > 91: } > 92: } > 93: Suggestion: ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24375#pullrequestreview-2734903737 PR Review Comment: https://git.openjdk.org/jdk/pull/24375#discussion_r2024225945 PR Review Comment: https://git.openjdk.org/jdk/pull/24375#discussion_r2024228770 PR Review Comment: https://git.openjdk.org/jdk/pull/24375#discussion_r2024224053 From epeter at openjdk.org Wed Apr 2 07:37:11 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 2 Apr 2025 07:37:11 GMT Subject: RFR: 8348887: Create IR framework test for JDK-8347997 In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 13:41:17 GMT, Marc Chevalier wrote: > As the ticket says: >> Create IR framework test which checks that allocations are eliminated in the regression test included in [JDK-8347997](https://bugs.openjdk.org/browse/JDK-8347997) fix. > > So here it is! We can see that in case of inlining, indeed, no allocation happens. The second part is some sanity check to emphasize the difference: of course, there is an allocation without inlining. The benefit of this second part is arguable. From my point of view, it's mostly to point out the difference to a future reader. But yes, there is nothing very surprising. 
> > Thanks, > Marc @marc-chevalier It probably makes most sense if the authors and reviewers of [JDK-8347997](https://bugs.openjdk.org/browse/JDK-8347997) review this patch (@vnkozlov @chhagedorn @TobiHartmann). But please ping me if you don't get reviews in a week or so, then I can have a look too ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24328#issuecomment-2771592656 From jbhateja at openjdk.org Wed Apr 2 07:39:02 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 2 Apr 2025 07:39:02 GMT Subject: RFR: 8350896: Integer/Long.compress gets wrong type from CompressBitsNode::Value [v2] In-Reply-To: References: Message-ID: > Hi All, > > This bugfix patch fixes incorrect value computation for the Integer/Long.compress APIs. > > Problems occur with a constant input and variable mask where the input's value is equal to the lower bound of the mask value. In this case, an erroneous value range estimation results in a constant value. The existing value routine first attempts to constant fold the compression operation if both input and compression mask are constant values; otherwise, it attempts to constrain the value range of the result based on the upper and lower bounds of the mask type. > > New IR test covers the issue reported in the bug report along with a case for value range based logic pruning. > > Kindly review and share your feedback.
> > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolutions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23947/files - new: https://git.openjdk.org/jdk/pull/23947/files/ee67ee22..ae48895b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23947&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23947&range=00-01 Stats: 189 lines in 2 files changed: 160 ins; 4 del; 25 mod Patch: https://git.openjdk.org/jdk/pull/23947.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23947/head:pull/23947 PR: https://git.openjdk.org/jdk/pull/23947 From jbhateja at openjdk.org Wed Apr 2 07:39:03 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 2 Apr 2025 07:39:03 GMT Subject: RFR: 8350896: Integer/Long.compress gets wrong type from CompressBitsNode::Value [v2] In-Reply-To: References: Message-ID: On Wed, 12 Mar 2025 08:43:23 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments resolutions > > @jatin-bhateja Thanks for looking into this! I left a first set of comments :) > > Primarily, it is about these issues: > - We need good comments, preferably even proofs. Because we got things wrong the last time, and there were no comments/proofs. It's difficult to get this sort of arithmetic transformation right, and it is hard to review. Proofs help to think through all the steps carefully. > - Test coverage: I would like to see some more randomized cases of input ranges. Hi @eme64 , I have addressed your comments, let me know if you need further clarifications. > src/hotspot/share/opto/intrinsicnode.cpp line 278: > >> 276: } else { >> 277: // Case 3) Mask value range only includes +ve values, this can again be >> 278: // used to ascertain known Zero bits of resultant value. 
> > I would put this case as the first, swapping it with Case 1). > And I would say something more explicit like this: > `Case 3) The mask value range is non-negative. Hence, the mask has at least one zero bit.` Case ordering is in accordance with the mask value range. case 1) mask value spans across -ve and +ve value ranges. case 2) mask value strictly lies within -ve value range. case 3) mask value strictly lies within +ve value range. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23947#issuecomment-2771593965 PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r2024244475 From jbhateja at openjdk.org Wed Apr 2 07:39:04 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 2 Apr 2025 07:39:04 GMT Subject: RFR: 8350896: Integer/Long.compress gets wrong type from CompressBitsNode::Value [v2] In-Reply-To: <5fEVX0zAsdNd9v3Rk6Gr4lIXTc96g2LndUhX4Qb-bgc=.e4553c72-8da4-41c7-b71f-628bbeea14be@github.com> References: <5fEVX0zAsdNd9v3Rk6Gr4lIXTc96g2LndUhX4Qb-bgc=.e4553c72-8da4-41c7-b71f-628bbeea14be@github.com> Message-ID: <_vrEsrg7VNWQDlSYv5PO7CsGH2tNfrwyMShkxtpdqhQ=.434c6c72-e84f-40e3-8791-42e26652ee64@github.com> On Wed, 12 Mar 2025 08:08:19 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/intrinsicnode.cpp line 283: >> >>> 281: clz = bt == T_INT ? clz - 32 : clz; >>> 282: mask_max_bw = max_bw - clz; >>> 283: } >> >> Can you please put the comments for cases 1-3 either consistently before the condition, or after the condition with inlining? I would vote for inside each condition with indentation, so just like case 3), except 2 spaces indented ;) > > Why not start with the "nice" case 3) first, where we know that the range is positive, and so even after compression we cannot get negative values? > > What does this mean `only includes +ve values`? Case ordering is in accordance with the mask value range. case 1) mask value spans across -ve and +ve value ranges. case 2) mask value strictly lies within -ve value range.
case 3) mask value strictly lies within +ve value range. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r2024244581 From dskantz at openjdk.org Wed Apr 2 07:39:29 2025 From: dskantz at openjdk.org (Daniel Skantz) Date: Wed, 2 Apr 2025 07:39:29 GMT Subject: RFR: 8282053: IGV: refine schedule approximation [v2] In-Reply-To: <0SXZ0k-28IdpWjuhtK4dSJ9ybHE58Oq56zT_sdqeQpc=.0cf9319f-f64d-4e8f-82e9-f654464bc775@github.com> References: <0SXZ0k-28IdpWjuhtK4dSJ9ybHE58Oq56zT_sdqeQpc=.0cf9319f-f64d-4e8f-82e9-f654464bc775@github.com> Message-ID: On Tue, 1 Apr 2025 14:42:47 GMT, Daniel Skantz wrote: >> This patch refines the schedule approximation in IGV by 1) placing parm. and projection nodes in the same block as their predecessors, and 2) no longer erroneously treating machine nodes such as prefetchAlloc and rep_stos as CFG nodes. >> >> The reader may refer to the corresponding JBS issue where graphs sampled before and after the change are attached. >> >> Testing: T1-T3 with no failures. Opened graphs before and after the change and saw no obvious problems. Opened a large number of graphs in CFG view and observed no unexpected IGV warnings, errors or assert failures. > > Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision: > > Update src/utils/IdealGraphVisualizer/ServerCompiler/src/main/java/com/sun/hotspot/igv/servercompiler/ServerCompilerScheduler.java > > Co-authored-by: Daniel Lundén Thanks for the reviews!
------------- PR Comment: https://git.openjdk.org/jdk/pull/24350#issuecomment-2771595747 From duke at openjdk.org Wed Apr 2 07:39:30 2025 From: duke at openjdk.org (duke) Date: Wed, 2 Apr 2025 07:39:30 GMT Subject: RFR: 8282053: IGV: refine schedule approximation [v2] In-Reply-To: <0SXZ0k-28IdpWjuhtK4dSJ9ybHE58Oq56zT_sdqeQpc=.0cf9319f-f64d-4e8f-82e9-f654464bc775@github.com> References: <0SXZ0k-28IdpWjuhtK4dSJ9ybHE58Oq56zT_sdqeQpc=.0cf9319f-f64d-4e8f-82e9-f654464bc775@github.com> Message-ID: <8BJI6ui7ndUA4OTPv3xMzTpg5G2bzn2l9vhUlenT7IE=.f70ad9c1-52d7-4309-b7e8-3fd97e58cc76@github.com> On Tue, 1 Apr 2025 14:42:47 GMT, Daniel Skantz wrote: >> This patch refines the schedule approximation in IGV by 1) placing parm. and projection nodes in the same block as their predecessors, and 2) no longer erroneously treating machine nodes such as prefetchAlloc and rep_stos as CFG nodes. >> >> The reader may refer to the corresponding JBS issue where graphs sampled before and after the change are attached. >> >> Testing: T1-T3 with no failures. Opened graphs before and after the change and saw no obvious problems. Opened a large number of graphs in CFG view and observed no unexpected IGV warnings, errors or assert failures. > > Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision: > > Update src/utils/IdealGraphVisualizer/ServerCompiler/src/main/java/com/sun/hotspot/igv/servercompiler/ServerCompilerScheduler.java > > Co-authored-by: Daniel Lundén @danielogh Your change (at version 57ad6dc825404d2628aa376f0fa8d78090313d33) is now ready to be sponsored by a Committer.
------------- PR Comment: https://git.openjdk.org/jdk/pull/24350#issuecomment-2771597533 From epeter at openjdk.org Wed Apr 2 07:41:09 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 2 Apr 2025 07:41:09 GMT Subject: RFR: 8352418: Add verification code to check that the associated loop nodes of useless Template Assertion Predicates are dead In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 12:28:59 GMT, Christian Hagedorn wrote: > As already suggested in https://github.com/openjdk/jdk/pull/23823, I want to do the following additional verification: > > After `eliminate_useless_predicates()` all now useless `OpaqueTemplateAssertionPredicate` nodes should not have any references to `CountedLoop` nodes that are still in the graph (otherwise, they would have been marked useful). This verification did not work reliably without the full Assertion Predicates fix [JDK-8350577](https://bugs.openjdk.org/browse/JDK-8350577). Since JDK-8350577 is now integrated, I propose to add this additional verification code. > > Thanks, > Christian Looks reasonable, nice to see more verification code :) src/hotspot/share/opto/predicates.cpp line 1250: > 1248: // graph (otherwise, they would have been marked useful instead). This is verified in this method. > 1249: void EliminateUselessPredicates::verify_loop_nodes_of_useless_templates_assertion_predicates_are_dead() const { > 1250: Unique_Node_List loop_nodes_of_useless_template_assertion_predicates = Should we add `ResourceMark` here, or is there one close by that suffices? ------------- Marked as reviewed by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24326#pullrequestreview-2734938554 PR Review Comment: https://git.openjdk.org/jdk/pull/24326#discussion_r2024248198 From thartmann at openjdk.org Wed Apr 2 07:45:10 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 2 Apr 2025 07:45:10 GMT Subject: RFR: 8353058: [PPC64] Some IR framework tests are failing after JDK-8352595 In-Reply-To: References: Message-ID: <5EDodzal0YHCnEW3k6lszJPxcNGwHtDw4qHGHhQSk_k=.7e66e90a-dd36-4bd3-bc69-26c9e828e377@github.com> On Wed, 2 Apr 2025 06:45:58 GMT, Christian Hagedorn wrote: > `TestPhaseIRMatching` was recently updated with [JDK-8352595](https://bugs.openjdk.org/browse/JDK-8352595) which changed some matching on opto assembly from `IRNode.ALLOC` (now matching on ideal phases) to `IRNode.FIELD_ACCESS` (still matching on opto assembly). However, the updated code matches differently on PPC for some method invocation on a parameter which let the test fail on PPC: > > public Object defaultOnOptoAssembly(Helper h) { > return h.getString(); // emits one "Field: " string on most platforms but none on PPC > } > > > When I've revisited the test to analyze the failure, it was not evidently clear what I had in mind back there with `defaultOnX()`. My guess is that I've tried to have one method failing on ideal phases, one on mach phases and one on both while all the methods use `IRNode` entries that have default compile phases on ideal and mach phases. But that is not the case today. I've therefore rewritten the tests to adhere to my guess. I also removed the ambiguity among platforms to have the same number of field accesses on them. 
> > How to read the `@ExpectedFailure` annotation: > > @IR(failOn = {IRNode.STORE, IRNode.FIELD_ACCESS, IRNode.COUNTED_LOOP, IRNode.STORE_I}, > counts = {IRNode.STORE, "20", IRNode.FIELD_ACCESS, "1", IRNode.COUNTED_LOOP, "2", IRNode.OOPMAP_WITH, "asdf", "2"}) > // Expect rule with id 5 (the one directly above) to fail: > // - We fail when matching PRINT_IDEAL with the: > // - failOn attribute: The failing constraints are constraint 1 and 4 (while 2 and 3 pass) > // - counts attribute: The failing constraints are constraint 2 and 4 (while 1 and 3 pass). > @ExpectedFailure(ruleId = 5, phase = CompilePhase.PRINT_IDEAL, failOn = {1, 4}, counts = {1, 3}) > > > Thanks to @TheRealMDoerr for testing the patch on PPC! > > Thanks, > Christian Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24373#pullrequestreview-2734953812 From mchevalier at openjdk.org Wed Apr 2 07:45:41 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 2 Apr 2025 07:45:41 GMT Subject: RFR: 8353341: C2: removal of a Mod[DF]Node crashes when the node is already dead [v2] In-Reply-To: References: Message-ID: > If the Mod[DF]Node has no control projection when it's being removed (because its result is unused), `extract_projections` will fail an assert. So, let's skip the removal. > > But that should happen only when the nodes are already unreachable (control input being transitively top). At the end of the day, the node should be dropped. because of that, so there is no rush, and let dead node deletion do the job. > > On the reduced reproducer, the crash is not common (even with `-XX:RepeatCompilation=300`, it might need more than a run to reproduce). So I've tried my fix on multiple thousands repeat compilations (by 300 packs) without a crash, and without having the modulo node alive at the end. 
> > Thanks, > Marc Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: Address review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24375/files - new: https://git.openjdk.org/jdk/pull/24375/files/2a347bc0..f1f0b93b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24375&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24375&range=00-01 Stats: 6 lines in 2 files changed: 0 ins; 2 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/24375.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24375/head:pull/24375 PR: https://git.openjdk.org/jdk/pull/24375 From thartmann at openjdk.org Wed Apr 2 07:45:41 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 2 Apr 2025 07:45:41 GMT Subject: RFR: 8353341: C2: removal of a Mod[DF]Node crashes when the node is already dead [v2] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 07:42:37 GMT, Marc Chevalier wrote: >> If the Mod[DF]Node has no control projection when it's being removed (because its result is unused), `extract_projections` will fail an assert. So, let's skip the removal. >> >> But that should happen only when the nodes are already unreachable (control input being transitively top). At the end of the day, the node should be dropped. because of that, so there is no rush, and let dead node deletion do the job. >> >> On the reduced reproducer, the crash is not common (even with `-XX:RepeatCompilation=300`, it might need more than a run to reproduce). So I've tried my fix on multiple thousands repeat compilations (by 300 packs) without a crash, and without having the modulo node alive at the end. >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments Marked as reviewed by thartmann (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/24375#pullrequestreview-2734948297 From mchevalier at openjdk.org Wed Apr 2 07:45:42 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 2 Apr 2025 07:45:42 GMT Subject: RFR: 8353341: C2: removal of a Mod[DF]Node crashes when the node is already dead [v2] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 07:27:04 GMT, Tobias Hartmann wrote: >> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: >> >> Address review comments > > src/hotspot/share/opto/divnode.cpp line 1521: > >> 1519: >> 1520: bool result_is_unused = proj_out_or_null(TypeFunc::Parms) == nullptr; >> 1521: bool has_control_output = proj_out_or_null(TypeFunc::Control) != nullptr; > > Nit: Maybe replace this with `is_dead = proj_out_or_null(TypeFunc::Control) == nullptr;` and check for `!is_dead` below? Fine with me! At the very least, your name is more semantic, and less "here is a name that repeats what the code says". ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24375#discussion_r2024251731 From chagedorn at openjdk.org Wed Apr 2 07:45:41 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 2 Apr 2025 07:45:41 GMT Subject: RFR: 8353341: C2: removal of a Mod[DF]Node crashes when the node is already dead [v2] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 07:42:37 GMT, Marc Chevalier wrote: >> If the Mod[DF]Node has no control projection when it's being removed (because its result is unused), `extract_projections` will fail an assert. So, let's skip the removal. >> >> But that should happen only when the nodes are already unreachable (control input being transitively top). At the end of the day, the node should be dropped. because of that, so there is no rush, and let dead node deletion do the job. 
>> >> On the reduced reproducer, the crash is not common (even with `-XX:RepeatCompilation=300`, it might need more than a run to reproduce). So I've tried my fix on multiple thousands repeat compilations (by 300 packs) without a crash, and without having the modulo node alive at the end. >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments Looks good! src/hotspot/share/opto/divnode.cpp line 1522: > 1520: bool result_is_unused = proj_out_or_null(TypeFunc::Parms) == nullptr; > 1521: bool is_dead = proj_out_or_null(TypeFunc::Control) == nullptr; > 1522: if (result_is_unused && !is_dead) { Might be easier to read when it's flipped to avoid negation with `!` but I leave it up to you to decide which one you prefer :-) bool not_dead = proj_out_or_null(TypeFunc::Control) != nullptr; if (result_is_unused && not_dead) { test/hotspot/jtreg/compiler/c2/irTests/FPModWithoutControlProj.java line 65: > 63: } > 64: iArr[1] += 5; > 65: Suggestion: test/hotspot/jtreg/compiler/c2/irTests/FPModWithoutControlProj.java line 87: > 85: } > 86: iArr[1] += 5; > 87: Suggestion: ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24375#pullrequestreview-2734945309 PR Review Comment: https://git.openjdk.org/jdk/pull/24375#discussion_r2024255180 PR Review Comment: https://git.openjdk.org/jdk/pull/24375#discussion_r2024252459 PR Review Comment: https://git.openjdk.org/jdk/pull/24375#discussion_r2024252598 From dskantz at openjdk.org Wed Apr 2 07:48:25 2025 From: dskantz at openjdk.org (Daniel Skantz) Date: Wed, 2 Apr 2025 07:48:25 GMT Subject: Integrated: 8282053: IGV: refine schedule approximation In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 07:23:04 GMT, Daniel Skantz wrote: > This patch refines the schedule approximation in IGV by 1) placing parm. 
and projection nodes in the same block as their predecessors, and 2) disallows erroneously considering machine nodes such as prefetchAlloc and rep_stos as CFG nodes. > > The reader may refer to the corresponding JBS issue where graphs sampled before and after the change are attached. > > Testing: T1-T3 with no failures. Opened graphs before and after the change and saw no obvious problems. Opened a large number of graphs in CFG view and observed no unexpected IGV warnings, errors or assert failures. This pull request has now been integrated. Changeset: 8fb67ac5 Author: Daniel Skantz Committer: Roberto Castañeda Lozano URL: https://git.openjdk.org/jdk/commit/8fb67ac55bb61c029a3ae360ee849fd1edd2ac79 Stats: 23 lines in 1 file changed: 20 ins; 0 del; 3 mod 8282053: IGV: refine schedule approximation Reviewed-by: rcastanedalo, dlunden, dfenacci ------------- PR: https://git.openjdk.org/jdk/pull/24350 From thartmann at openjdk.org Wed Apr 2 07:50:29 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 2 Apr 2025 07:50:29 GMT Subject: RFR: 8348887: Create IR framework test for JDK-8347997 In-Reply-To: References: Message-ID: <_AWQLLGypcMLFX52xPmTeow5fWrbqLyqGT4WfqFZl2w=.ed830608-8f66-4198-bc1d-aaa00a71766f@github.com> On Tue, 1 Apr 2025 13:01:05 GMT, Marc Chevalier wrote: >> test/hotspot/jtreg/compiler/c2/irTests/TestContinuationPinningAndEA.java line 118: >> >>> 116: >>> 117: @DontInline >>> 118: public CrashesNoInline() throws Throwable { >> >> It's probably my own ignorance, but just in case others are in the same boat, why does this crash? Could you add a brief javadoc for future readers? Same with other Crashes cases. > > It's rather bad (uninspired) naming. I based this test on the test introduced by [JDK-8347997](https://bugs.openjdk.org/browse/JDK-8347997), which (I suspect) is based on the reproducer mentioned in JBS. There are 2 cases: one made EA crash, the other made it fail (not detecting the non-escaping object, as far as I understand).
From Vladimir's comment on PR 23284, it used to crash because of a corrupted memory graph. Honestly, I'm not quite clear on that. There is already a test (from said ticket and PR) making sure it doesn't crash. The point of the test I'm adding is to check that the allocation is gone (thanks to EA). Maybe the best option is to rename the cases "Crashes" and "FailEA": the names made sense in the context of the original bug, but they are not very useful for the future. But I'm not sure what would be fitting. Right, I would suggest renaming these methods. The purpose of this test is not to reproduce the crashes that happened before [JDK-8347997](https://bugs.openjdk.org/browse/JDK-8347997) (which has its own regression test), but to verify that EA is able to remove allocations around the pin/unpin intrinsic now that the crashes are fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24328#discussion_r2024265119 From chagedorn at openjdk.org Wed Apr 2 07:53:53 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 2 Apr 2025 07:53:53 GMT Subject: RFR: 8352418: Add verification code to check that the associated loop nodes of useless Template Assertion Predicates are dead [v2] In-Reply-To: References: Message-ID: > As already suggested in https://github.com/openjdk/jdk/pull/23823, I want to do the following additional verification: > > After `eliminate_useless_predicates()` all now useless `OpaqueTemplateAssertionPredicate` nodes should not have any references to `CountedLoop` nodes that are still in the graph (otherwise, they would have been marked useful). This verification did not work reliably without the full Assertion Predicates fix [JDK-8350577](https://bugs.openjdk.org/browse/JDK-8350577). Since JDK-8350577 is now integrated, I propose to add this additional verification code.
> > Thanks, > Christian Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: Add ResourceMark ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24326/files - new: https://git.openjdk.org/jdk/pull/24326/files/38e8e865..14a90a8b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24326&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24326&range=00-01 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24326.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24326/head:pull/24326 PR: https://git.openjdk.org/jdk/pull/24326 From chagedorn at openjdk.org Wed Apr 2 07:53:54 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 2 Apr 2025 07:53:54 GMT Subject: RFR: 8352418: Add verification code to check that the associated loop nodes of useless Template Assertion Predicates are dead [v2] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 07:36:22 GMT, Emanuel Peter wrote: >> Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: >> >> Add ResourceMark > > src/hotspot/share/opto/predicates.cpp line 1250: > >> 1248: // graph (otherwise, they would have been marked useful instead). This is verified in this method. >> 1249: void EliminateUselessPredicates::verify_loop_nodes_of_useless_templates_assertion_predicates_are_dead() const { >> 1250: Unique_Node_List loop_nodes_of_useless_template_assertion_predicates = > > Should we add `ResourceMark` here, or is there one close by that suffices? Good idea! I think the closest one will only be in `PhaseIdealLoop::optimize()` once we are done with one round of loop opts. So, it would make sense to add one here. Pushed an update. Will run some more testing to check that we don't hit any surprises.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24326#discussion_r2024263260 From epeter at openjdk.org Wed Apr 2 07:53:53 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 2 Apr 2025 07:53:53 GMT Subject: RFR: 8352418: Add verification code to check that the associated loop nodes of useless Template Assertion Predicates are dead [v2] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 07:49:34 GMT, Christian Hagedorn wrote: >> As already suggested in https://github.com/openjdk/jdk/pull/23823, I want to do the following additional verification: >> >> After `eliminate_useless_predicates()` all now useless `OpaqueTemplateAssertionPredicate` nodes should not have any references to `CountedLoop` nodes that are still in the graph (otherwise, they would have been marked useful). This verification did not work reliably without the full Assertion Predicates fix [JDK-8350577](https://bugs.openjdk.org/browse/JDK-8350577). Since JDK-8350577 is now integrated, I propose to add this additional verification code. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Add ResourceMark Marked as reviewed by epeter (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/24326#pullrequestreview-2734966884 From mchevalier at openjdk.org Wed Apr 2 07:57:22 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 2 Apr 2025 07:57:22 GMT Subject: RFR: 8353341: C2: removal of a Mod[DF]Node crashes when the node is already dead [v2] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 07:41:01 GMT, Christian Hagedorn wrote: >> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: >> >> Address review comments > > src/hotspot/share/opto/divnode.cpp line 1522: > >> 1520: bool result_is_unused = proj_out_or_null(TypeFunc::Parms) == nullptr; >> 1521: bool is_dead = proj_out_or_null(TypeFunc::Control) == nullptr; >> 1522: if (result_is_unused && !is_dead) { > > Might be easier to read when it's flipped to avoid negation with `!` but I leave it up to you to decide which one you prefer :-) > > bool not_dead = proj_out_or_null(TypeFunc::Control) != nullptr; > if (result_is_unused && not_dead) { I'd agree if I wrote `!not_dead`. But between writing `! a_positive_property` and `a_negative_property`, I'm much less decided. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24375#discussion_r2024275097 From thartmann at openjdk.org Wed Apr 2 07:57:52 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 2 Apr 2025 07:57:52 GMT Subject: RFR: 8348887: Create IR framework test for JDK-8347997 In-Reply-To: <-oDjoDP_yRuceA3tSsjHt7T8NaU7yZHbDexm8UviZPg=.a2b72ab5-f851-47c5-9003-64b6bba2092e@github.com> References: <-oDjoDP_yRuceA3tSsjHt7T8NaU7yZHbDexm8UviZPg=.a2b72ab5-f851-47c5-9003-64b6bba2092e@github.com> Message-ID: On Wed, 2 Apr 2025 07:50:35 GMT, Tobias Hartmann wrote: >> As the ticket says: >>> Create IR framework test which checks that allocations are eliminated in the regression test included in [JDK-8347997](https://bugs.openjdk.org/browse/JDK-8347997) fix. >> >> So here it is! 
We can see that in case of inlining, indeed, no allocation happens. The second part is some sanity check to emphasize the difference: of course, there is an allocation without inlining. The benefit of this second part is arguable. From my point of view, it's mostly to point out the difference to a future reader. But yes, there is nothing very surprising. >> >> Thanks, >> Marc > > test/hotspot/jtreg/compiler/c2/irTests/TestContinuationPinningAndEA.java line 51: > >> 49: try { >> 50: test_FailsEA(); >> 51: } catch (Throwable _) { > > These should normally not throw, right? I would just propagate the exception upwards. Otherwise we risk hiding real issues. And isn't `Exception` or a subclass sufficient? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24328#discussion_r2024276355 From thartmann at openjdk.org Wed Apr 2 07:57:52 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 2 Apr 2025 07:57:52 GMT Subject: RFR: 8348887: Create IR framework test for JDK-8347997 In-Reply-To: References: Message-ID: <-oDjoDP_yRuceA3tSsjHt7T8NaU7yZHbDexm8UviZPg=.a2b72ab5-f851-47c5-9003-64b6bba2092e@github.com> On Mon, 31 Mar 2025 13:41:17 GMT, Marc Chevalier wrote: > As the ticket says: >> Create IR framework test which checks that allocations are eliminated in the regression test included in [JDK-8347997](https://bugs.openjdk.org/browse/JDK-8347997) fix. > > So here it is! We can see that in case of inlining, indeed, no allocation happens. The second part is some sanity check to emphasize the difference: of course, there is an allocation without inlining. The benefit of this second part is arguable. From my point of view, it's mostly to point out the difference to a future reader. But yes, there is nothing very surprising. > > Thanks, > Marc test/hotspot/jtreg/compiler/c2/irTests/TestContinuationPinningAndEA.java line 51: > 49: try { > 50: test_FailsEA(); > 51: } catch (Throwable _) { These should normally not throw, right? 
I would just propagate the exception upwards. Otherwise we risk hiding real issues. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24328#discussion_r2024270186 From duke at openjdk.org Wed Apr 2 08:03:06 2025 From: duke at openjdk.org (kuaiwei) Date: Wed, 2 Apr 2025 08:03:06 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v10] In-Reply-To: References: Message-ID:
> In this patch, I extend the merge stores optimization to merge adjacent loads. Tier1 tests pass on my machine.
>
> The benchmark result of MergeLoadBench.java
> AMD EPYC 9T24 96-Core Processor:
>
> |name | -MergeLoads | +MergeLoads |delta|
> |---|---|---|---|
> |MergeLoadBench.getCharB |4352.150 |4407.435 | 55.29 |
> |MergeLoadBench.getCharBU |4075.320 |4084.663 | 9.34 |
> |MergeLoadBench.getCharBV |3221.302 |3221.528 | 0.23 |
> |MergeLoadBench.getCharC |2235.433 |2238.796 | 3.36 |
> |MergeLoadBench.getCharL |4363.244 |4372.281 | 9.04 |
> |MergeLoadBench.getCharLU |4072.550 |4075.744 | 3.19 |
> |MergeLoadBench.getCharLV |2227.825 |2231.612 | 3.79 |
> |MergeLoadBench.getIntB |11199.935 |6869.030 | -4330.90 |
> |MergeLoadBench.getIntBU |6853.862 |2763.923 | -4089.94 |
> |MergeLoadBench.getIntBV |306.953 |309.911 | 2.96 |
> |MergeLoadBench.getIntL |10426.843 |6523.716 | -3903.13 |
> |MergeLoadBench.getIntLU |6740.847 |2602.701 | -4138.15 |
> |MergeLoadBench.getIntLV |2233.151 |2231.745 | -1.41 |
> |MergeLoadBench.getIntRB |11335.756 |8980.619 | -2355.14 |
> |MergeLoadBench.getIntRBU |7439.873 |3190.208 | -4249.66 |
> |MergeLoadBench.getIntRL |16323.040 |7786.842 | -8536.20 |
> |MergeLoadBench.getIntRLU |7457.745 |3364.140 | -4093.61 |
> |MergeLoadBench.getIntRU |2512.621 |2511.668 | -0.95 |
> |MergeLoadBench.getIntU |2501.064 |2500.629 | -0.43 |
> |MergeLoadBench.getLongB |21175.442 |21103.660 | -71.78 |
> |MergeLoadBench.getLongBU |14042.046 |2512.784 | -11529.26 |
> |MergeLoadBench.getLongBV |606.448 |606.171 | -0.28 |
> |MergeLoadBench.getLongL |23142.178 |23217.785 | 75.61 |
> |MergeLoadBench.getLongLU |14112.972 |2237.659 | -11875.31 |
> |MergeLoadBench.getLongLV |2230.416 |2231.224 | 0.81 |
> |MergeLoadBench.getLongRB |21152.558 |21140.583 | -11.98 |
> |MergeLoadBench.getLongRBU |14031.178 |2520.317 | -11510.86 |
> |MergeLoadBench.getLongRL |23248.506 |23136.410 | -112.10 |
> |MergeLoadBench.getLongRLU |14125.032 |2240.481 | -11884.55 |
> |MergeLoadBench.getLongRU |3071.881 |3066.606 | -5.27 |
> |Merg...

kuaiwei has updated the pull request incrementally with one additional commit since the last revision: Move code to addnode.cpp and add more tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24023/files - new: https://git.openjdk.org/jdk/pull/24023/files/e37c4bf3..ee511bf1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24023&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24023&range=08-09 Stats: 1350 lines in 3 files changed: 770 ins; 578 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24023.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24023/head:pull/24023 PR: https://git.openjdk.org/jdk/pull/24023 From mchevalier at openjdk.org Wed Apr 2 08:08:07 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 2 Apr 2025 08:08:07 GMT Subject: RFR: 8353341: C2: removal of a Mod[DF]Node crashes when the node is already dead [v3] In-Reply-To: References: Message-ID: > If the Mod[DF]Node has no control projection when it's being removed (because its result is unused), `extract_projections` will fail an assert. So, let's skip the removal. > > But that should happen only when the nodes are already unreachable (control input being transitively top). At the end of the day, the node should be dropped. because of that, so there is no rush, and let dead node deletion do the job. > > On the reduced reproducer, the crash is not common (even with `-XX:RepeatCompilation=300`, it might need more than a run to reproduce).
So I've tried my fix on multiple thousands repeat compilations (by 300 packs) without a crash, and without having the modulo node alive at the end. > > For instance, that's what happen on the reproducer. Quickly, some big sub-graph is dead, but nodes stay a while in the graph: > > Then: > > And eventually, everything is removed, so the control projection is removed, and `extract_projections` doesn't like it. > > > Thanks, > Marc Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: Address review comments, part 2 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24375/files - new: https://git.openjdk.org/jdk/pull/24375/files/f1f0b93b..48bd2037 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24375&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24375&range=01-02 Stats: 6 lines in 2 files changed: 0 ins; 2 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/24375.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24375/head:pull/24375 PR: https://git.openjdk.org/jdk/pull/24375 From mchevalier at openjdk.org Wed Apr 2 08:08:08 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 2 Apr 2025 08:08:08 GMT Subject: RFR: 8353341: C2: removal of a Mod[DF]Node crashes when the node is already dead [v2] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 07:53:15 GMT, Marc Chevalier wrote: >> src/hotspot/share/opto/divnode.cpp line 1522: >> >>> 1520: bool result_is_unused = proj_out_or_null(TypeFunc::Parms) == nullptr; >>> 1521: bool is_dead = proj_out_or_null(TypeFunc::Control) == nullptr; >>> 1522: if (result_is_unused && !is_dead) { >> >> Might be easier to read when it's flipped to avoid negation with `!` but I leave it up to you to decide which one you prefer :-) >> >> bool not_dead = proj_out_or_null(TypeFunc::Control) != nullptr; >> if (result_is_unused && not_dead) { > > I'd agree if I wrote `!not_dead`. But between writing `! 
a_positive_property` and `a_negative_property`, I'm much less decided. I ended up flipping as suggested because I've seen fonts/colors making the `!` not clearly a non-letter (github, for instance isn't very good at that imo), and sometimes, people might be surprised. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24375#discussion_r2024293297 From mchevalier at openjdk.org Wed Apr 2 08:09:48 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 2 Apr 2025 08:09:48 GMT Subject: RFR: 8353341: C2: removal of a Mod[DF]Node crashes when the node is already dead [v2] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 07:45:41 GMT, Marc Chevalier wrote: >> If the Mod[DF]Node has no control projection when it's being removed (because its result is unused), `extract_projections` will fail an assert. So, let's skip the removal. >> >> But that should happen only when the nodes are already unreachable (control input being transitively top). At the end of the day, the node should be dropped. because of that, so there is no rush, and let dead node deletion do the job. >> >> On the reduced reproducer, the crash is not common (even with `-XX:RepeatCompilation=300`, it might need more than a run to reproduce). So I've tried my fix on multiple thousands repeat compilations (by 300 packs) without a crash, and without having the modulo node alive at the end. >> >> For instance, that's what happen on the reproducer. Quickly, some big sub-graph is dead, but nodes stay a while in the graph: >> >> Then: >> >> And eventually, everything is removed, so the control projection is removed, and `extract_projections` doesn't like it. >> >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments I've fixed everything, ready for next round. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24375#issuecomment-2771728974 From duke at openjdk.org Wed Apr 2 08:23:50 2025 From: duke at openjdk.org (kuaiwei) Date: Wed, 2 Apr 2025 08:23:50 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v11] In-Reply-To: References: Message-ID:
> In this patch, I extend the merge stores optimization to merge adjacent loads. Tier1 tests pass on my machine.
>
> The benchmark result of MergeLoadBench.java
> AMD EPYC 9T24 96-Core Processor:
>
> |name | -MergeLoads | +MergeLoads |delta|
> |---|---|---|---|
> |MergeLoadBench.getCharB |4352.150 |4407.435 | 55.29 |
> |MergeLoadBench.getCharBU |4075.320 |4084.663 | 9.34 |
> |MergeLoadBench.getCharBV |3221.302 |3221.528 | 0.23 |
> |MergeLoadBench.getCharC |2235.433 |2238.796 | 3.36 |
> |MergeLoadBench.getCharL |4363.244 |4372.281 | 9.04 |
> |MergeLoadBench.getCharLU |4072.550 |4075.744 | 3.19 |
> |MergeLoadBench.getCharLV |2227.825 |2231.612 | 3.79 |
> |MergeLoadBench.getIntB |11199.935 |6869.030 | -4330.90 |
> |MergeLoadBench.getIntBU |6853.862 |2763.923 | -4089.94 |
> |MergeLoadBench.getIntBV |306.953 |309.911 | 2.96 |
> |MergeLoadBench.getIntL |10426.843 |6523.716 | -3903.13 |
> |MergeLoadBench.getIntLU |6740.847 |2602.701 | -4138.15 |
> |MergeLoadBench.getIntLV |2233.151 |2231.745 | -1.41 |
> |MergeLoadBench.getIntRB |11335.756 |8980.619 | -2355.14 |
> |MergeLoadBench.getIntRBU |7439.873 |3190.208 | -4249.66 |
> |MergeLoadBench.getIntRL |16323.040 |7786.842 | -8536.20 |
> |MergeLoadBench.getIntRLU |7457.745 |3364.140 | -4093.61 |
> |MergeLoadBench.getIntRU |2512.621 |2511.668 | -0.95 |
> |MergeLoadBench.getIntU |2501.064 |2500.629 | -0.43 |
> |MergeLoadBench.getLongB |21175.442 |21103.660 | -71.78 |
> |MergeLoadBench.getLongBU |14042.046 |2512.784 | -11529.26 |
> |MergeLoadBench.getLongBV |606.448 |606.171 | -0.28 |
> |MergeLoadBench.getLongL |23142.178 |23217.785 | 75.61 |
> |MergeLoadBench.getLongLU |14112.972 |2237.659 | -11875.31 |
> |MergeLoadBench.getLongLV |2230.416 |2231.224 | 0.81 |
> |MergeLoadBench.getLongRB |21152.558 |21140.583 | -11.98 |
> |MergeLoadBench.getLongRBU |14031.178 |2520.317 | -11510.86 |
> |MergeLoadBench.getLongRL |23248.506 |23136.410 | -112.10 |
> |MergeLoadBench.getLongRLU |14125.032 |2240.481 | -11884.55 |
> |MergeLoadBench.getLongRU |3071.881 |3066.606 | -5.27 |
> |Merg...

kuaiwei has updated the pull request incrementally with one additional commit since the last revision: Remove unused code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24023/files - new: https://git.openjdk.org/jdk/pull/24023/files/ee511bf1..279c354a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24023&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24023&range=09-10 Stats: 17 lines in 3 files changed: 0 ins; 16 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24023.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24023/head:pull/24023 PR: https://git.openjdk.org/jdk/pull/24023 From mchevalier at openjdk.org Wed Apr 2 08:31:19 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 2 Apr 2025 08:31:19 GMT Subject: RFR: 8353345: C2 asserts because maskShiftAmount modifies node without deleting the hash Message-ID: First delete the hash, then `set_req`. This way, we avoid changing the node (a non-`this` node) without deleting the hash. This wrong ordering was not introduced by [JDK-8347459](https://bugs.openjdk.org/browse/JDK-8347459), but before that change only `this` went through this function, so it was OK. Since then, the function is also used with other nodes, hence the need to remove the hash. Also, don't do any of that outside IGVN, which requires registering nested shifts for IGVN during parsing so that they are not missed later.
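The ordering described above (drop the node from the hash table first, then change its input) can be illustrated outside HotSpot with a plain Java `HashSet`. This is only an analogy and not C2 code: the `Node` class, its `input` field, and both helper methods are hypothetical names made up for the sketch. It assumes that the value-numbering table, like `HashSet`, finds a node through a hash derived from its inputs, so changing an input while the node is still in the table leaves a stale entry behind.

```java
import java.util.HashSet;
import java.util.Set;

// Analogy only: a hash set keyed by a mutable "node" whose hash depends on its input.
final class HashBeforeMutate {

    // Hypothetical stand-in for a compiler node; equality and hash depend on `input`.
    static final class Node {
        int input;

        Node(int input) {
            this.input = input;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Node && ((Node) o).input == input;
        }

        @Override
        public int hashCode() {
            return Integer.hashCode(input);
        }
    }

    // Wrong order: the input changes while the node is still registered under
    // its old hash. The table can no longer find the node afterwards.
    static boolean mutateInPlace(Set<Node> table, Node n, int newInput) {
        n.input = newInput;
        return table.contains(n);
    }

    // Right order: remove the node (under its old hash), change the input,
    // then register it again under the new hash.
    static boolean removeMutateReinsert(Set<Node> table, Node n, int newInput) {
        table.remove(n);
        n.input = newInput;
        table.add(n);
        return table.contains(n);
    }

    public static void main(String[] args) {
        Set<Node> stale = new HashSet<>();
        Node a = new Node(1);
        stale.add(a);
        System.out.println(mutateInPlace(stale, a, 1000)); // prints false

        Set<Node> fresh = new HashSet<>();
        Node b = new Node(1);
        fresh.add(b);
        System.out.println(removeMutateReinsert(fresh, b, 1000)); // prints true
    }
}
```

In the first case the set still holds the object, but a lookup with the new hash misses it, which is the kind of inconsistency the "delete the hash, then `set_req`" ordering is meant to avoid.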
Thanks, Marc ------------- Commit messages: - No collapse double shift left in IGVN + remove from hashtable before set_req Changes: https://git.openjdk.org/jdk/pull/24355/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24355&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8353345 Stats: 62 lines in 2 files changed: 60 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24355.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24355/head:pull/24355 PR: https://git.openjdk.org/jdk/pull/24355 From shade at openjdk.org Wed Apr 2 08:56:20 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 2 Apr 2025 08:56:20 GMT Subject: RFR: 8353188: C1: Clean up x86 backend after 32-bit x86 removal [v3] In-Reply-To: <-iwh_5JGpt-TAVpfZQjwbnIG_c8hvirNKCcmiZoLNls=.3b34bf15-51fc-42bf-a294-1c23ca99754c@github.com> References: <-iwh_5JGpt-TAVpfZQjwbnIG_c8hvirNKCcmiZoLNls=.3b34bf15-51fc-42bf-a294-1c23ca99754c@github.com> Message-ID: > Piece-wise cleanup of C1_LIRAssembler_x86, C1_MacroAssembler and related classes. C1 implements the bulk of arch-specific backend there. Major parts of this backend are already removed by #24274, this cleans up another large bulk, and hopefully most of it. 
> > Additional testing: > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux x86_64 server fastdebug, `all` + `-XX:TieredStopAtLevel=1` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Minor whitespace reverts ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24301/files - new: https://git.openjdk.org/jdk/pull/24301/files/527854ec..77262978 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24301&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24301&range=01-02 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24301.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24301/head:pull/24301 PR: https://git.openjdk.org/jdk/pull/24301 From shade at openjdk.org Wed Apr 2 08:56:21 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 2 Apr 2025 08:56:21 GMT Subject: RFR: 8353188: C1: Clean up x86 backend after 32-bit x86 removal [v2] In-Reply-To: References: <-iwh_5JGpt-TAVpfZQjwbnIG_c8hvirNKCcmiZoLNls=.3b34bf15-51fc-42bf-a294-1c23ca99754c@github.com> Message-ID: On Tue, 1 Apr 2025 07:58:27 GMT, Aleksey Shipilev wrote: >> Piece-wise cleanup of C1_LIRAssembler_x86, C1_MacroAssembler and related classes. C1 implements the bulk of arch-specific backend there. Major parts of this backend are already removed by #24274, this cleans up another large bulk, and hopefully most of it. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux x86_64 server fastdebug, `all` + `-XX:TieredStopAtLevel=1` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Review comments Thanks! Testing looks green. I need another Reviewer before I integrate. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24301#issuecomment-2771872615 From shade at openjdk.org Wed Apr 2 08:57:59 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 2 Apr 2025 08:57:59 GMT Subject: RFR: 8334046: Set different values for CompLevel_any and CompLevel_all [v2] In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 03:43:03 GMT, Cesar Soares Lucas wrote: >> Please review this trivial patch to set different values for CompLevel_any and CompLevel_all. >> Setting different values for these fields make the implementation of [this other issue](https://bugs.openjdk.org/browse/JDK-8313713) much cleaner/easier. >> Tested on OSX/Linux Aarch64/x86_64 with JTREG. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Fix WhiteBox constants. Looks fine to me, but again, @veresov or someone else from Compiler team needs to take a look. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24298#pullrequestreview-2735412352 From mdoerr at openjdk.org Wed Apr 2 10:01:19 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 2 Apr 2025 10:01:19 GMT Subject: RFR: 8353058: [PPC64] Some IR framework tests are failing after JDK-8352595 In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 06:45:58 GMT, Christian Hagedorn wrote: > `TestPhaseIRMatching` was recently updated with [JDK-8352595](https://bugs.openjdk.org/browse/JDK-8352595) which changed some matching on opto assembly from `IRNode.ALLOC` (now matching on ideal phases) to `IRNode.FIELD_ACCESS` (still matching on opto assembly). 
However, the updated code matches differently on PPC for some method invocation on a parameter which let the test fail on PPC: > > public Object defaultOnOptoAssembly(Helper h) { > return h.getString(); // emits one "Field: " string on most platforms but none on PPC > } > > > When I've revisited the test to analyze the failure, it was not evidently clear what I had in mind back there with `defaultOnX()`. My guess is that I've tried to have one method failing on ideal phases, one on mach phases and one on both while all the methods use `IRNode` entries that have default compile phases on ideal and mach phases. But that is not the case today. I've therefore rewritten the tests to adhere to my guess. I also removed the ambiguity among platforms to have the same number of field accesses on them. > > How to read the `@ExpectedFailure` annotation: > > @IR(failOn = {IRNode.STORE, IRNode.FIELD_ACCESS, IRNode.COUNTED_LOOP, IRNode.STORE_I}, > counts = {IRNode.STORE, "20", IRNode.FIELD_ACCESS, "1", IRNode.COUNTED_LOOP, "2", IRNode.OOPMAP_WITH, "asdf", "2"}) > // Expect rule with id 5 (the one directly above) to fail: > // - We fail when matching PRINT_IDEAL with the: > // - failOn attribute: The failing constraints are constraint 1 and 4 (while 2 and 3 pass) > // - counts attribute: The failing constraints are constraint 2 and 4 (while 1 and 3 pass). > @ExpectedFailure(ruleId = 5, phase = CompilePhase.PRINT_IDEAL, failOn = {1, 4}, counts = {1, 3}) > > > Thanks to @TheRealMDoerr for testing the patch on PPC! > > Thanks, > Christian Thanks for the fix! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24373#issuecomment-2772021612 From epeter at openjdk.org Wed Apr 2 10:01:43 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 2 Apr 2025 10:01:43 GMT Subject: RFR: 8350896: Integer/Long.compress gets wrong type from CompressBitsNode::Value [v2] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 07:39:02 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This bugfix patch fixes incorrect value computation for the Integer/Long.compress APIs. >> >> Problems occur with a constant input and variable mask where the input's value is equal to the lower bound of the mask value. In this case, an erroneous value range estimation results in a constant value. The existing value routine first attempts to constant fold the compression operation if both input and compression mask are constant values; otherwise, it attempts to constrain the value range of the result based on the upper and lower bounds of the mask type. >> >> New IR test covers the issue reported in the bug report along with a case for value range based logic pruning. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolutions @jatin-bhateja Thanks for the updates! I have a few more requests :) src/hotspot/share/opto/intrinsicnode.cpp line 266: > 264: if ( opc == Op_CompressBits) { > 265: // Pattern: Integer/Long.compress(src_type, mask_type) > 266: int max_mask_bit_width; Suggestion: int result_bit_width; Is this bit width not about the result? It is really not about the mask. Example: `mask_type->hi_as_long() < -1L` Here, the mask has the uppermost bit set, and so the bit width of it is the maximum 32 / 64 bits. But we can still deduce that the result has at least one leading zero bit, and so the bit width of the result is at most 31 or 63.
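As a standalone illustration of the bit-width argument (a minimal sketch, not part of the patch; the class and method names are made up here, and `Integer.compress` needs Java 19 or later): `compress` packs the bits selected by the mask into the low-order bits of the result, so the result can occupy at most as many bits as the mask has one bits. Any mask other than `-1` therefore leaves at least one leading zero in the result, even when the mask's own top bit is set.

```java
public class CompressBitWidthDemo {
    // Upper bound on the number of low-order bits that
    // Integer.compress(x, mask) can occupy: one per set mask bit.
    static int resultBitWidth(int mask) {
        return Integer.bitCount(mask);
    }

    public static void main(String[] args) {
        // Negative mask (top bit set) that still has zero bits.
        int mask = 0x8000_0001;
        System.out.println(resultBitWidth(mask));        // 2
        System.out.println(Integer.compress(-1, mask));  // 3

        // Every bit set except bit 0: 31 mask bits, so the result
        // fits in 31 bits and cannot be negative.
        int allButOne = -2;
        System.out.println(Integer.compress(-1, allButOne) == Integer.MAX_VALUE); // true
    }
}
```

The same reasoning carries over to `Long.compress` with 63 instead of 31.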
src/hotspot/share/opto/intrinsicnode.cpp line 274: > 272: } else if (mask_type->hi_as_long() < -1L) { > 273: // Case 2) Mask value range is less than -1, this indicates presence of at least > 274: // one zero bit in the mask value, there by constraining the result of compression Suggestion: // one zero bit in the mask value, thereby constraining the result of compression src/hotspot/share/opto/intrinsicnode.cpp line 292: > 290: // compression result will never be a -ve value and we can safely set the > 291: // lower bound of the result value range to zero. > 292: lo = max_mask_bit_width == mask_bit_width ? lo : 0L; Can you please add an assert that we are not making `lo` worse than what we already have? Someone may insert optimizations above that set `lo > 0`, and then you may lower it again here. Suggestion: assert(lo < 0, "we should not lower the value of lo"); lo = max_mask_bit_width == mask_bit_width ? lo : 0L; src/hotspot/share/opto/intrinsicnode.cpp line 298: > 296: // in case input equals above estimated lower bound. > 297: hi = src_type->hi_as_long() == lo ? hi : src_type->hi_as_long(); > 298: hi = max_mask_bit_width < mask_bit_width ? (1L << max_mask_bit_width) - 1 : hi; I still don't understand your comment here. For example, I don't see a `max_int` in the code... And I also don't see anything that deals with constants in the code explicitly. And similarly as above, how do we ensure that `hi` is not raised accidentally? src/hotspot/share/opto/intrinsicnode.cpp line 391: > 389: return TypeInteger::zero(bt); > 390: } > 391: Is this change related to the PR title? And do you have any tests for it? test/hotspot/jtreg/compiler/c2/TestBitCompressValueTransform.java line 2: > 1: /* > 2: * Copyright (c) 2025, Oracle and/or its affiliates. All rights reserved. Ah, I just noticed the test directory. I think we can put it in a more specific location. 
test/hotspot/jtreg/compiler/c2/TestBitCompressValueTransform.java line 29: > 27: * @library /test/lib / > 28: * @summary C2: wrong result: Integer/Long.compress gets wrong type from CompressBitsNode::Value. > 29: * @run main/othervm -Xbatch -XX:-TieredCompilation -XX:CompileThresholdScaling=0.3 compiler.c2.TestBitCompressValueTransform Do you really need the flags here? The IR framework already makes sure that compilation happens, and then we execute the test again. So `Xbatch` may not be necessary to reproduce the bug. And same for `TieredCompilation`. Maybe we actually don't need any flags, but please check! ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23947#pullrequestreview-2734951364 PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r2024480144 PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r2024462231 PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r2024485135 PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r2024491349 PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r2024260171 PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r2024257750 PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r2024256227 From epeter at openjdk.org Wed Apr 2 10:01:48 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 2 Apr 2025 10:01:48 GMT Subject: RFR: 8350896: Integer/Long.compress gets wrong type from CompressBitsNode::Value [v2] In-Reply-To: References: Message-ID: On Wed, 12 Mar 2025 08:20:36 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments resolutions > > src/hotspot/share/opto/intrinsicnode.cpp line 280: > >> 278: // used to ascertain known Zero bits of resultant value. 
>> 279: assert(mask_type->lo_as_long() >= 0, ""); >> 280: jlong clz = count_leading_zeros(mask_type->hi_as_long()); > > Suggestion: > > jlong clz = count_leading_zeros(mask_type->hi_as_long()); > // The mask has at least clz leading zeros, and hence also the compression > // result must have at least clz leading zeros. I think a comment like this is still missing. You should somehow say that the leading zeros in the mask translate to leading zeros in the result. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r2024472048 From mchevalier at openjdk.org Wed Apr 2 10:03:30 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 2 Apr 2025 10:03:30 GMT Subject: RFR: 8348887: Create IR framework test for JDK-8347997 [v2] In-Reply-To: References: Message-ID: > As the ticket says: >> Create IR framework test which checks that allocations are eliminated in the regression test included in [JDK-8347997](https://bugs.openjdk.org/browse/JDK-8347997) fix. > > So here it is! We can see that in case of inlining, indeed, no allocation happens. The second part is some sanity check to emphasize the difference: of course, there is an allocation without inlining. The benefit of this second part is arguable. From my point of view, it's mostly to point out the difference to a future reader. But yes, there is nothing very surprising. 
> > Thanks, > Marc Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: Remove catch, rename, remove static ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24328/files - new: https://git.openjdk.org/jdk/pull/24328/files/f53138f7..efa712be Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24328&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24328&range=00-01 Stats: 35 lines in 1 file changed: 0 ins; 12 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/24328.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24328/head:pull/24328 PR: https://git.openjdk.org/jdk/pull/24328 From mchevalier at openjdk.org Wed Apr 2 10:03:39 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 2 Apr 2025 10:03:39 GMT Subject: RFR: 8348887: Create IR framework test for JDK-8347997 In-Reply-To: References: Message-ID: <-IKJ5CKTvSR2Y3YcTBHNtNXQkQcvfpeZkyPb0AhCS_g=.bbcfc206-de65-4e83-9bf4-3c11582af9dc@github.com> On Mon, 31 Mar 2025 13:41:17 GMT, Marc Chevalier wrote: > As the ticket says: >> Create IR framework test which checks that allocations are eliminated in the regression test included in [JDK-8347997](https://bugs.openjdk.org/browse/JDK-8347997) fix. > > So here it is! We can see that in case of inlining, indeed, no allocation happens. The second part is some sanity check to emphasize the difference: of course, there is an allocation without inlining. The benefit of this second part is arguable. From my point of view, it's mostly to point out the difference to a future reader. But yes, there is nothing very surprising. > > Thanks, > Marc Made quite some changes, but in particular: got rid of catches, and renamed cases. New names are not that much more inspired, but at least, not confusing. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24328#issuecomment-2772064669 From epeter at openjdk.org Wed Apr 2 10:15:59 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 2 Apr 2025 10:15:59 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v9] In-Reply-To: References: Message-ID: On Mon, 17 Mar 2025 21:01:23 GMT, Kangcheng Xu wrote: >> [JDK-8347555](https://bugs.openjdk.org/browse/JDK-8347555) is a redo of [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495), which was [first merged](https://git.openjdk.org/jdk/pull/20754) and then backed out due to a regression. This patch redoes the feature and fixes the bit shift overflow problem. For more information please refer to the previous PR. >> >> When constantizing multiplications (possibly in the form of `lshifts`), the multiplier is upgraded to long and then later narrowed to int if needed. However, when a `lshift` operand is exactly `32`, overflowing an int, using long has an unexpected result (i.e., `(1 << 32) = 1` and `(int) (1L << 32) = 0`). >> >> The following was implemented to address this issue. >> >> if (UseNewCode2) { >> *multiplier = bt == T_INT >> ? (jlong) (1 << con->get_int()) // loss of precision is expected for int as it overflows >> : ((jlong) 1) << con->get_int(); >> } else { >> *multiplier = ((jlong) 1 << con->get_int()); >> } >> >> >> Two new bitshift overflow tests were added. > > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > fix ((x< `a*(b + c)` are already handled by `AddNode::IdealIL`. It would just be a shame to have all the complexity of matching specific cases, but not take the chance to make it a bit more general.
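The shift overflow mentioned in the quoted PR description can be reproduced at the Java level. The following is only a sketch with made-up names, not code from the patch: for an `int` operand, Java uses just the low five bits of the shift distance, so `x << 32` is `x`, and the matching multiplier is `1`. Computing the multiplier in `long` and narrowing it afterwards yields `0` instead.

```java
public class ShiftOverflowDemo {
    // Multiplier equivalent to "x << shift" for an int x, computed with
    // int arithmetic so the shift distance is masked to its low five bits.
    static int intShiftMultiplier(int shift) {
        return 1 << shift;
    }

    // Multiplier computed in long and narrowed afterwards; for shift == 32
    // the single set bit is cut off by the narrowing cast.
    static int narrowedLongMultiplier(int shift) {
        return (int) (1L << shift);
    }

    public static void main(String[] args) {
        int x = 7;
        System.out.println(x << 32);                        // 7
        System.out.println(x * intShiftMultiplier(32));     // 7
        System.out.println(x * narrowedLongMultiplier(32)); // 0
    }
}
```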
------------- PR Comment: https://git.openjdk.org/jdk/pull/23506#issuecomment-2772097646 From chagedorn at openjdk.org Wed Apr 2 10:24:00 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 2 Apr 2025 10:24:00 GMT Subject: RFR: 8353341: C2: removal of a Mod[DF]Node crashes when the node is already dead [v3] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 08:08:07 GMT, Marc Chevalier wrote: >> If the Mod[DF]Node has no control projection when it's being removed (because its result is unused), `extract_projections` will fail an assert. So, let's skip the removal. >> >> But that should happen only when the nodes are already unreachable (control input being transitively top). At the end of the day, the node should be dropped because of that, so there is no rush, and we can let dead node deletion do the job. >> >> On the reduced reproducer, the crash is not common (even with `-XX:RepeatCompilation=300`, it might need more than one run to reproduce). So I've tried my fix on several thousand repeated compilations (in batches of 300) without a crash, and without having the modulo node alive at the end. >> >> For instance, that's what happens on the reproducer. Quickly, some big sub-graph is dead, but nodes stay a while in the graph: >> >> Then: >> >> And eventually, everything is removed, so the control projection is removed, and `extract_projections` doesn't like it. >> >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments, part 2 Thanks for the updates, looks good! ------------- Marked as reviewed by chagedorn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/24375#pullrequestreview-2735777321 From chagedorn at openjdk.org Wed Apr 2 10:24:58 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 2 Apr 2025 10:24:58 GMT Subject: RFR: 8353058: [PPC64] Some IR framework tests are failing after JDK-8352595 In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 06:45:58 GMT, Christian Hagedorn wrote: > `TestPhaseIRMatching` was recently updated with [JDK-8352595](https://bugs.openjdk.org/browse/JDK-8352595) which changed some matching on opto assembly from `IRNode.ALLOC` (now matching on ideal phases) to `IRNode.FIELD_ACCESS` (still matching on opto assembly). However, the updated code matches differently on PPC for some method invocation on a parameter which let the test fail on PPC: > > public Object defaultOnOptoAssembly(Helper h) { > return h.getString(); // emits one "Field: " string on most platforms but none on PPC > } > > > When I've revisited the test to analyze the failure, it was not evidently clear what I had in mind back there with `defaultOnX()`. My guess is that I've tried to have one method failing on ideal phases, one on mach phases and one on both while all the methods use `IRNode` entries that have default compile phases on ideal and mach phases. But that is not the case today. I've therefore rewritten the tests to adhere to my guess. I also removed the ambiguity among platforms to have the same number of field accesses on them. 
> > How to read the `@ExpectedFailure` annotation: > > @IR(failOn = {IRNode.STORE, IRNode.FIELD_ACCESS, IRNode.COUNTED_LOOP, IRNode.STORE_I}, > counts = {IRNode.STORE, "20", IRNode.FIELD_ACCESS, "1", IRNode.COUNTED_LOOP, "2", IRNode.OOPMAP_WITH, "asdf", "2"}) > // Expect rule with id 5 (the one directly above) to fail: > // - We fail when matching PRINT_IDEAL with the: > // - failOn attribute: The failing constraints are constraint 1 and 4 (while 2 and 3 pass) > // - counts attribute: The failing constraints are constraint 2 and 4 (while 1 and 3 pass). > @ExpectedFailure(ruleId = 5, phase = CompilePhase.PRINT_IDEAL, failOn = {1, 4}, counts = {1, 3}) > > > Thanks to @TheRealMDoerr for testing the patch on PPC! > > Thanks, > Christian Thanks Tobias and Martin for your reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24373#issuecomment-2772117482 From epeter at openjdk.org Wed Apr 2 10:29:06 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 2 Apr 2025 10:29:06 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v9] In-Reply-To: References: Message-ID: On Mon, 17 Mar 2025 21:01:23 GMT, Kangcheng Xu wrote: >> [JDK-8347555](https://bugs.openjdk.org/browse/JDK-8347555) is a redo of [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) was [first merged](https://git.openjdk.org/jdk/pull/20754) then backed out due to a regression. This patch redos the feature and fixes the bit shift overflow problem. For more information please refer to the previous PR. >> >> When constanlizing multiplications (possibly in forms on `lshifts`), the multiplier is upgraded to long and then later narrowed to int if needed. However, when a `lshift` operand is exactly `32`, overflowing an int, using long has an unexpected result. (i.e., `(1 << 32) = 1` and `(int) (1L << 32) = 0`) >> >> The following was implemented to address this issue. >> >> if (UseNewCode2) { >> *multiplier = bt == T_INT >> ? 
(jlong) (1 << con->get_int()) // loss of precision is expected for int as it overflows >> : ((jlong) 1) << con->get_int(); >> } else { >> *multiplier = ((jlong) 1 << con->get_int()); >> } >> >> >> Two new bitshift overflow tests were added. > > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > fix ((x< 413: if (find_power_of_two_addition_pattern(this, bt).valid) { > 414: return nullptr; > 415: } Hmm. So somewhere we would have generated that pattern, probably in MulNode. Can you add a verification there, to check that we are only generating patterns that `find_power_of_two_addition_pattern` recognizes? That would make sure that we keep the code here and there in sync. src/hotspot/share/opto/addnode.cpp line 428: > 426: ((mul = find_simple_multiplication_pattern(in1, bt)).valid && mul.variable == in2) || > 427: ((mul = find_power_of_two_addition_pattern(in1, bt)).valid && mul.variable == in2) > 428: ) { I find this quite difficult to read. And it looks repetitive too. Maybe you can refactor it? Also, it would be nice to have comments with the patterns here, to see which one covers what case, so that we have a nice overview. src/hotspot/share/opto/addnode.cpp line 431: > 429: Node* con = (bt == T_INT) > 430: ? (Node*) phase->intcon((jint) (mul.multiplier + 1)) // intentional type narrowing to allow overflow at max_jint > 431: : (Node*) phase->longcon((mul.multiplier + 1)); I think just to be safe, you should use `java_add` to have correct overflow semantics. You are using `jlong` for `multiplier`, which is a signed integer type, and in C++ overflow is undefined behavior as far as I know, let's avoid that ;) Actually, do you have a test where the multiplier overflows here? 
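To make the overflow question concrete, here is a small Java-level sketch (hypothetical names, not taken from the patch). Java `int` arithmetic wraps modulo 2^32, so `(a * CON) + a` and `(CON + 1) * a` agree even when `CON + 1` wraps. In C++, signed overflow is undefined behavior, which is why the review asks for an explicitly wrapping addition on the compiler side.

```java
public class WrappingMultiplierDemo {
    // (a * con) + a, evaluated directly with wrapping int arithmetic.
    static int original(int a, int con) {
        return a * con + a;
    }

    // The rewritten form (con + 1) * a; con + 1 wraps to
    // Integer.MIN_VALUE when con is Integer.MAX_VALUE.
    static int rewritten(int a, int con) {
        return (con + 1) * a;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 3, 12345, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int a : samples) {
            for (int con : samples) {
                if (original(a, con) != rewritten(a, con)) {
                    throw new AssertionError("mismatch for a=" + a + ", con=" + con);
                }
            }
        }
        System.out.println(rewritten(3, Integer.MAX_VALUE)); // -2147483648
    }
}
```

A regression test for the real transformation would presumably use a constant of `Integer.MAX_VALUE` (or `Long.MAX_VALUE`) in the same way.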
src/hotspot/share/opto/addnode.cpp line 509: > 507: if (rhs.valid && rhs.variable == n->in(1)) { > 508: return Multiplication{true, rhs.variable, rhs.multiplier + 1}; > 509: } Hmm, it seems these are patterns that you did not promise you would cover in the description above. It makes it a little difficult to keep the overview... ------------- PR Review: https://git.openjdk.org/jdk/pull/23506#pullrequestreview-2735724858 PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r2024508253 PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r2024512437 PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r2024530927 PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r2024545411 From epeter at openjdk.org Wed Apr 2 10:29:08 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 2 Apr 2025 10:29:08 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v7] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 10:05:45 GMT, Emanuel Peter wrote: >> `AddNode::IdealIL` handles to more general associative patterns like `(a*b) + (a*c)` into `a*(b + c)` > > Ah interesting. It could be worth adding a comment for that here then! That was in fact a large part of my initial hesitation with this PR. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r2024520170 From epeter at openjdk.org Wed Apr 2 10:29:07 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 2 Apr 2025 10:29:07 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v7] In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 17:16:14 GMT, Kangcheng Xu wrote: >> src/hotspot/share/opto/addnode.cpp line 407: >> >>> 405: } >>> 406: >>> 407: // Try to convert a serial of additions into a single multiplication. Also convert `(a * CON) + a` to `(CON + 1) * a` as >> >> What about `(a * CON1) + (a * CON2)`? Like `11 * a + 5 * a`. 
Do we also optimize that? > > `AddNode::IdealIL` handles to more general associative patterns like `(a*b) + (a*c)` into `a*(b + c)` Ah interesting. It could be worth adding a comment for that here then! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r2024516603 From chagedorn at openjdk.org Wed Apr 2 10:33:50 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 2 Apr 2025 10:33:50 GMT Subject: RFR: 8352418: Add verification code to check that the associated loop nodes of useless Template Assertion Predicates are dead [v2] In-Reply-To: References: Message-ID: <63a9PwFdCvNjpNhf32MIoGRYl96-WggnmMck-wk13vs=.bf2b75f5-926b-48f7-8107-275019e9b0e0@github.com> On Wed, 2 Apr 2025 07:53:53 GMT, Christian Hagedorn wrote: >> As already suggested in https://github.com/openjdk/jdk/pull/23823, I want to do the following additional verification: >> >> After `eliminate_useless_predicates()` all now useless `OpaqueTemplateAssertionPredicate` nodes should not have any references to `CountedLoop` nodes that are still in the graph (otherwise, they would have been marked useful). This verification did not work reliably without the full Assertion Predicates fix [JDK-8350577](https://bugs.openjdk.org/browse/JDK-8350577). Since JDK-8350577 is now integrated, I propose to add this additional verification code. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Add ResourceMark Thanks Emanuel for your review! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24326#issuecomment-2772139205 From epeter at openjdk.org Wed Apr 2 10:34:52 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 2 Apr 2025 10:34:52 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v7] In-Reply-To: References: Message-ID: <28nDVS1w4dstG6N-J-GAeIOVX6imXMijoxsIvXej5gU=.9451c7ef-e718-4418-8c32-37c242e906bc@github.com> On Tue, 25 Mar 2025 16:18:20 GMT, Kangcheng Xu wrote: >> This looks really interesting! >> >> I see that you are doing some special pattern matching. I wonder if it might be worth generalizing the algorithm, to search through an arbitrary "tree" of additions, collect all "leaves" of (`variable * multiplier`), sort by `variable`, and compute new additions for each `variable`. What do you think? > > @eme64 Could you please take a look at this if you have some time? Thanks! @tabjy One more comment: I have had bad experiences before with pattern matching that only covered a part of the cases, and where methods did sometimes do more than what they promised in their name or documentation. These things tend to get extended later, and the overview gets worse and worse until nobody has the overview and bugs creep in that are hard to discover in a review. Can you do some experimenting and see if you can come up with a cleaner design? Maybe write down at the beginning in `convert_serial_additions` what is the general form of the patterns you cover? 
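The generalization suggested in the quoted comment (walk an arbitrary tree of additions, collect the `variable * multiplier` leaves, group them by variable) could look roughly like the sketch below. This is only an illustration under strong assumptions: `Expr`, `Var`, `Mul`, and `Add` are a made-up mini expression tree, not C2's node classes, and none of the graph or type concerns of the real compiler are modeled.

```java
import java.util.Map;
import java.util.TreeMap;

public class CollectTermsDemo {
    // Hypothetical mini expression tree, unrelated to C2's Node hierarchy.
    interface Expr {}
    record Var(String name) implements Expr {}
    record Mul(Var variable, long multiplier) implements Expr {}
    record Add(Expr left, Expr right) implements Expr {}

    // Walk a tree of additions and sum the multipliers per variable.
    static Map<String, Long> collect(Expr root) {
        Map<String, Long> terms = new TreeMap<>(); // sorted by variable name
        collectInto(root, terms);
        return terms;
    }

    private static void collectInto(Expr e, Map<String, Long> terms) {
        if (e instanceof Add add) {
            collectInto(add.left(), terms);
            collectInto(add.right(), terms);
        } else if (e instanceof Mul mul) {
            terms.merge(mul.variable().name(), mul.multiplier(), Long::sum);
        } else if (e instanceof Var v) {
            terms.merge(v.name(), 1L, Long::sum); // a bare variable counts once
        } else {
            throw new IllegalArgumentException("unsupported node: " + e);
        }
    }

    public static void main(String[] args) {
        Var a = new Var("a");
        Var b = new Var("b");
        // (11 * a + b) + (5 * a + a)
        Expr tree = new Add(new Add(new Mul(a, 11), b), new Add(new Mul(a, 5), a));
        System.out.println(collect(tree)); // {a=17, b=1}
    }
}
```

Each entry of the resulting map would then become a single multiplication, and the entries would be added back together.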
------------- PR Comment: https://git.openjdk.org/jdk/pull/23506#issuecomment-2772140252 From thartmann at openjdk.org Wed Apr 2 10:46:05 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 2 Apr 2025 10:46:05 GMT Subject: RFR: 8348887: Create IR framework test for JDK-8347997 [v2] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 10:03:30 GMT, Marc Chevalier wrote: >> As the ticket says: >>> Create IR framework test which checks that allocations are eliminated in the regression test included in [JDK-8347997](https://bugs.openjdk.org/browse/JDK-8347997) fix. >> >> So here it is! We can see that in case of inlining, indeed, no allocation happens. The second part is some sanity check to emphasize the difference: of course, there is an allocation without inlining. The benefit of this second part is arguable. From my point of view, it's mostly to point out the difference to a future reader. But yes, there is nothing very surprising. >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Remove catch, rename, remove static Thanks for making these changes, looks good to me. test/hotspot/jtreg/compiler/c2/irTests/TestContinuationPinningAndEA.java line 41: > 39: public class TestContinuationPinningAndEA { > 40: public static void main(String[] args) { > 41: TestFramework.runWithFlags("--add-modules", "java.base", "--add-exports", "java.base/jdk.internal.vm=ALL-UNNAMED"); Why is this only needed now? ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24328#pullrequestreview-2735827767 PR Review Comment: https://git.openjdk.org/jdk/pull/24328#discussion_r2024569397 From thartmann at openjdk.org Wed Apr 2 10:46:49 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 2 Apr 2025 10:46:49 GMT Subject: RFR: 8353341: C2: removal of a Mod[DF]Node crashes when the node is already dead [v3] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 08:08:07 GMT, Marc Chevalier wrote: >> If the Mod[DF]Node has no control projection when it's being removed (because its result is unused), `extract_projections` will fail an assert. So, let's skip the removal. >> >> But that should happen only when the nodes are already unreachable (control input being transitively top). At the end of the day, the node should be dropped because of that, so there is no rush, and we can let dead node deletion do the job. >> >> On the reduced reproducer, the crash is not common (even with `-XX:RepeatCompilation=300`, it might need more than one run to reproduce). So I've tried my fix on several thousand repeated compilations (in batches of 300) without a crash, and without having the modulo node alive at the end. >> >> For instance, that's what happens on the reproducer. Quickly, some big sub-graph is dead, but nodes stay a while in the graph: >> >> Then: >> >> And eventually, everything is removed, so the control projection is removed, and `extract_projections` doesn't like it. >> >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments, part 2 Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/24375#pullrequestreview-2735832842 From mchevalier at openjdk.org Wed Apr 2 11:28:50 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 2 Apr 2025 11:28:50 GMT Subject: RFR: 8348887: Create IR framework test for JDK-8347997 [v2] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 10:42:19 GMT, Tobias Hartmann wrote: >> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove catch, rename, remove static > > test/hotspot/jtreg/compiler/c2/irTests/TestContinuationPinningAndEA.java line 41: > >> 39: public class TestContinuationPinningAndEA { >> 40: public static void main(String[] args) { >> 41: TestFramework.runWithFlags("--add-modules", "java.base", "--add-exports", "java.base/jdk.internal.vm=ALL-UNNAMED"); > > Why is this only needed now? I think it was already broken before. I had all the `catch` blocks because of the `throws` in the base example, defensively, by default (just assuming that these `throws Throwable` were there for a good reason). They hid the fact that loading `Continuation` failed (it's not exported). Nevertheless, I got enough for the IR check to work. I might be wrong about the reason, but my understanding is that, since it's intrinsified, compilation can still produce IR and enough printing to make the check work. And I think it was really working, not just ignored: I got a test failure on the non-inlined cases before I added the `DontInline` annotation, if I remember correctly. But at runtime, access to the `Continuation` class is checked (when loading it?), and then it throws to tell me I'm not allowed to access it.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24328#discussion_r2024629833 From epeter at openjdk.org Wed Apr 2 11:36:59 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 2 Apr 2025 11:36:59 GMT Subject: RFR: 8349138: Optimize Math.copySign API for Intel e-core targets [v3] In-Reply-To: References: Message-ID: On Thu, 6 Feb 2025 17:49:54 GMT, Jatin Bhateja wrote: >> Math.copySign is only intrinsified on x86 targets supporting the AVX512 feature. >> Intel E-core Xeons support only the AVX2 feature set and still compile Java implementation which is composed of logical operations. >> >> Since there is a 3-cycle penalty for copying incoming float/double values to GPRs before being operated upon by logical operation there is an opportunity to optimize this using an efficient instruction sequence. >> >> Patch uses ANDPS and ANDPD logical instruction to generate efficient instruction sequences to absorb domain copy over penalty. Also, performs minor tuning for existing AVX512 instruction sequence based on VPTERNLOG instruction. 
>>
>> Following are the performance numbers of the following existing microbenchmark
>> https://github.com/openjdk/jdk/blob/master/test/micro/org/openjdk/bench/vm/compiler/Signum.java
>>
>> Patch passes following validation test
>> [test/jdk/java/lang/Math/IeeeRecommendedTests.java](https://github.com/openjdk/jdk/blob/master/test/jdk/java/lang/Math/IeeeRecommendedTests.java)
>>
>>
>> Granite Rapids-AP (P-core Xeon)
>> Baseline AVX512:
>> Benchmark                      Mode  Cnt     Score  Error   Units
>> Signum._5_copySignFloatTest   thrpt    2  1296.141         ops/ns
>> Signum._7_copySignDoubleTest  thrpt    2   838.954         ops/ns
>>
>> Withopt :
>> Benchmark                      Mode  Cnt     Score  Error   Units
>> Signum._5_copySignFloatTest   thrpt    2   940.240         ops/ns
>> Signum._7_copySignDoubleTest  thrpt    2   967.370         ops/ns
>>
>> Baseline AVX2:
>> Benchmark                      Mode  Cnt     Score  Error   Units
>> Signum._5_copySignFloatTest   thrpt    2    63.673         ops/ns
>> Signum._7_copySignDoubleTest  thrpt    2    26.898         ops/ns
>>
>> Withopt :
>> Benchmark                      Mode  Cnt     Score  Error   Units
>> Signum._5_copySignFloatTest   thrpt    2   785.801         ops/ns
>> Signum._7_copySignDoubleTest  thrpt    2   558.710         ops/ns
>>
>> Sierra Forest (E-core Xeon)
>> Baseline:
>> Benchmark                                       (seed)   Mode  Cnt   Score  Error   Units
>> o.o.b.vm.compiler.Signum._5_copySignFloatTest      N/A  thrpt    2  40.528         ops/ns
>> o.o.b.vm.compiler.Signum._7_copySignDoubleTest     N/A  thrpt    2  25.101         ops/ns
>>
>> Withopt:
>> Benchmark                                       (seed)   Mode  Cnt   Score  Error   Units
>> o.o.b.vm.compiler.Signum._5_copySignFloatTest      N/A  thrpt    2  676....
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
> Adding vector support along with some refactoring.
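For readers following the thread: the quoted description says the non-intrinsified Java implementation of `Math.copySign` is "composed of logical operations". A minimal sketch of that idea for `float`, assuming IEEE 754 binary32 bit layout, is shown below. The class name `CopySignSketch`, the method `copySignBits` and the two mask constants are hypothetical names chosen for this illustration; this is not the JDK source and not the instruction sequence generated by the patch.

```java
public class CopySignSketch {
    // Assumed masks for IEEE 754 binary32: bit 31 is the sign, bits 0..30 hold
    // exponent and significand. Both names are hypothetical.
    private static final int SIGN_MASK = 0x8000_0000;
    private static final int MAGNITUDE_MASK = 0x7FFF_FFFF;

    // Takes the sign bit of 'sign' and everything else from 'magnitude',
    // using only AND and OR on the raw bit patterns.
    static float copySignBits(float magnitude, float sign) {
        int signBit = Float.floatToRawIntBits(sign) & SIGN_MASK;
        int magnitudeBits = Float.floatToRawIntBits(magnitude) & MAGNITUDE_MASK;
        return Float.intBitsToFloat(signBit | magnitudeBits);
    }

    public static void main(String[] args) {
        float[] magnitudes = {0.0f, 1.5f, -2.25f, Float.MAX_VALUE, Float.POSITIVE_INFINITY};
        float[] signs = {1.0f, -1.0f, 0.0f, -0.0f};
        for (float m : magnitudes) {
            for (float s : signs) {
                System.out.println("copySign(" + m + ", " + s + ") = " + copySignBits(m, s)
                        + " (Math.copySign: " + Math.copySign(m, s) + ")");
            }
        }
    }
}
```

The sketch moves the `float` values into integer form before applying the logical operations, which is the kind of domain crossing the description refers to when it mentions a penalty for copying values to GPRs.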
The non-x64-specific code looks reasonable, though I have 2 comments ;) test/hotspot/jtreg/compiler/intrinsics/math/TestCopySignIntrinsic.java line 79: > 77: IntStream.range(0, SIZE - 8).forEach(i -> { dmagnitude[i] = rd.nextFloat(-Float.MAX_VALUE, Float.MAX_VALUE); }); > 78: IntStream.range(0, SIZE).forEach(i -> { fsign[i] = rd.nextFloat(-Float.MAX_VALUE, Float.MAX_VALUE); }); > 79: IntStream.range(0, SIZE).forEach(i -> { dsign[i] = rd.nextFloat(-Float.MAX_VALUE, Float.MAX_VALUE); }); Why not use Generators.java? That would also give you NaN, infinity, etc ;) test/hotspot/jtreg/compiler/intrinsics/math/TestCopySignIntrinsic.java line 122: > 120: } > 121: } > 122: } Verify.checkEQ should do this for you.... though maybe you'd have to wait for https://github.com/openjdk/jdk/pull/24224 not to get into trouble with different NaN encodings. ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23386#pullrequestreview-2735939246 PR Review Comment: https://git.openjdk.org/jdk/pull/23386#discussion_r2024635995 PR Review Comment: https://git.openjdk.org/jdk/pull/23386#discussion_r2024638031 From chagedorn at openjdk.org Wed Apr 2 11:57:01 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 2 Apr 2025 11:57:01 GMT Subject: RFR: 8353345: C2 asserts because maskShiftAmount modifies node without deleting the hash In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 11:51:13 GMT, Marc Chevalier wrote: > First delete the hash, then `set_req`. This way, we avoid changing the node (a non-`this` node) without deleting the hash. This wrong ordering is not new from [JDK-8347459](https://bugs.openjdk.org/browse/JDK-8347459), but before that, only `this` was going through this function, so it was ok. But since then, it has been used with other nodes, hence the need to remove the hash. > > Also, do not do any of that outside IGVN; this requires registering nested shifts for IGVN during parsing so that they are not missed later.
> > Thanks, > Marc src/hotspot/share/opto/mulnode.cpp line 981: > 979: } > 980: return maskedShift; > 981: } IIUC, we normalize a shift whose shift amount is too large by masking the amount, and normally that is done directly when transforming the inner shift. However, we could also have cases where we already process an outer shift where the inner shift is not yet replaced with the masked shift amount. So, we do the inner shift transformation as part of the outer shift processing. It seems that for the double shift optimization, we only care about the actual masked value. I reckon that the inner shift is always on the worklist somewhere and will eventually be transformed later - there is no need to do it eagerly now (if it's not on the worklist, we should update the notification code for IGVN). This suggests that we could do the following instead:
- Rename `maskShiftAmount` into `mask_and_replace_shift_amount()`.
- Introduce `mask_shift_amount()` that only calculates the masked shift amount without updating it in the graph.
- Update `mask_and_replace_shift_amount()`: First call `mask_shift_amount()` to get the masked amount. If it's different, do the graph surgery. Return the masked shift amount.
- Use `mask_and_replace_shift_amount()` everywhere where we can safely do the update, i.e. where we call the `maskShiftAmount()` with `this`.
- Use `mask_shift_amount()` where we used to call `maskShiftAmount()` with `non-this`, i.e. surgery is not implicitly safe.

Then you also do not need the `record_for_igvn()` code below.
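As background for the masking discussed above, here is a small Java-level sketch of the semantics involved; it is not the HotSpot C++ code. In Java, an `int` shift only uses the low 5 bits of the shift amount, so `x << 33` behaves like `x << 1`. The class `ShiftMaskSketch` and both method names are hypothetical and only mirror the "compute the masked amount without touching anything else" half of the suggestion.

```java
public class ShiftMaskSketch {
    // Pure computation of the effective shift amount: for a value of
    // 'bitsPerValue' bits (32 for int, 64 for long), only the low bits of
    // the shift count are used. Hypothetical helper, named after the thread.
    static int maskShiftAmount(int shift, int bitsPerValue) {
        return shift & (bitsPerValue - 1);
    }

    // Folds (x << a) << b into a single shift. Each amount is masked first;
    // if the masked amounts add up to 32 or more, every bit is shifted out.
    static int doubleShiftLeft(int x, int a, int b) {
        int maskedA = maskShiftAmount(a, Integer.SIZE);
        int maskedB = maskShiftAmount(b, Integer.SIZE);
        int total = maskedA + maskedB;
        return total >= Integer.SIZE ? 0 : x << total;
    }

    public static void main(String[] args) {
        int x = 0x1234_5678;
        int[][] amounts = {{1, 2}, {33, 2}, {16, 16}, {31, 31}, {-1, 1}};
        for (int[] ab : amounts) {
            int expected = (x << ab[0]) << ab[1];
            int folded = doubleShiftLeft(x, ab[0], ab[1]);
            System.out.println("a=" + ab[0] + " b=" + ab[1]
                    + " nested=" + expected + " folded=" + folded);
        }
    }
}
```

Note that the fold needs only the masked values, which matches the observation in the review that the double shift optimization only cares about the actual masked amount.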
test/hotspot/jtreg/compiler/c2/gvn/DoubleLShiftCrashDuringIGVN.java line 35: > 33: > 34: public class DoubleLShiftCrashDuringIGVN { > 35: public static long shift=0; Suggestion: public static long shift = 0; ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24355#discussion_r2024663634 PR Review Comment: https://git.openjdk.org/jdk/pull/24355#discussion_r2024578423 From epeter at openjdk.org Wed Apr 2 12:17:31 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 2 Apr 2025 12:17:31 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v7] In-Reply-To: References: Message-ID: > We should extend the functionality of Verify.checkEQ: > - Allow different NaN encodings to be seen as equal (by default). > - Compare VectorAPI vectors. > - Compare Exceptions, and their messages. > - Compare arbitrary Objects via Reflection. > > Note: this is a prerequisite for the Template Library [JDK-8352861](https://bugs.openjdk.org/browse/JDK-8352861) / https://github.com/openjdk/jdk/pull/23418. 
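To illustrate the first bullet of the description (treating different NaN encodings as equal) and the raw-bits alternative mentioned later in the thread, here is a minimal sketch using only `java.lang.Float`. It is an assumption-laden illustration, not the `Verify.checkEQ` implementation: the class `NaNCompareSketch` and its two methods are hypothetical names.

```java
public class NaNCompareSketch {
    // Strict comparison: the two floats must have identical bit patterns,
    // so NaNs with different payloads are reported as different.
    static boolean sameRawBits(float a, float b) {
        return Float.floatToRawIntBits(a) == Float.floatToRawIntBits(b);
    }

    // Lenient comparison: Float.floatToIntBits collapses every NaN to the
    // canonical pattern 0x7fc00000, so any two NaNs compare as equal, while
    // 0.0f and -0.0f still differ.
    static boolean sameUpToNaNEncoding(float a, float b) {
        return Float.floatToIntBits(a) == Float.floatToIntBits(b);
    }

    public static void main(String[] args) {
        float nan1 = Float.intBitsToFloat(0x7fc00000);
        float nan2 = Float.intBitsToFloat(0x7fc00001);
        System.out.println("raw bits equal:        " + sameRawBits(nan1, nan2));
        System.out.println("equal up to NaN encoding: " + sameUpToNaNEncoding(nan1, nan2));
        System.out.println("0.0f vs -0.0f:         " + sameUpToNaNEncoding(0.0f, -0.0f));
    }
}
```

Plain `==` is not usable for this purpose because `NaN == NaN` is always false, which is why both variants go through the integer bit patterns.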
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: refactor with checkEQWithRawBits ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24224/files - new: https://git.openjdk.org/jdk/pull/24224/files/4ca42699..f2b3c371 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24224&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24224&range=05-06 Stats: 165 lines in 2 files changed: 10 ins; 53 del; 102 mod Patch: https://git.openjdk.org/jdk/pull/24224.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24224/head:pull/24224 PR: https://git.openjdk.org/jdk/pull/24224 From epeter at openjdk.org Wed Apr 2 12:17:31 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 2 Apr 2025 12:17:31 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v6] In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 09:11:29 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: >> >> - upate copyright >> - Apply suggestions from code review >> >> Co-authored-by: Christian Hagedorn > > I'll have a closer look at the code later again :-) @chhagedorn Ok, I refactored it. I'm now always comparing arbitrary classes. And `checkEQWithRawBits` does the comparison with raw bits, no `Options` required any more. Added a comment about reflection making things slow. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24224#issuecomment-2772372587 From chagedorn at openjdk.org Wed Apr 2 12:18:56 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 2 Apr 2025 12:18:56 GMT Subject: RFR: 8348887: Create IR framework test for JDK-8347997 [v2] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 10:03:30 GMT, Marc Chevalier wrote: >> As the ticket says: >>> Create IR framework test which checks that allocations are eliminated in the regression test included in [JDK-8347997](https://bugs.openjdk.org/browse/JDK-8347997) fix. >> >> So here it is! We can see that in case of inlining, indeed, no allocation happens. The second part is some sanity check to emphasize the difference: of course, there is an allocation without inlining. The benefit of this second part is arguable. From my point of view, it's mostly to point out the difference to a future reader. But yes, there is nothing very surprising. >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Remove catch, rename, remove static Looks good to me, too. ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24328#pullrequestreview-2736051512 From chagedorn at openjdk.org Wed Apr 2 12:22:32 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 2 Apr 2025 12:22:32 GMT Subject: RFR: 8352418: Add verification code to check that the associated loop nodes of useless Template Assertion Predicates are dead [v2] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 07:53:53 GMT, Christian Hagedorn wrote: >> As already suggested in https://github.com/openjdk/jdk/pull/23823, I want to do the following additional verification: >> >> After `eliminate_useless_predicates()` all now useless `OpaqueTemplateAssertionPredicate` nodes should not have any references to `CountedLoop` nodes that are still in the graph (otherwise, they would have been marked useful). This verification did not work reliably without the full Assertion Predicates fix [JDK-8350577](https://bugs.openjdk.org/browse/JDK-8350577). Since JDK-8350577 is now integrated, I propose to add this additional verification code. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Add ResourceMark Testing looked good! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24326#issuecomment-2772382684 From chagedorn at openjdk.org Wed Apr 2 12:22:33 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 2 Apr 2025 12:22:33 GMT Subject: Integrated: 8352418: Add verification code to check that the associated loop nodes of useless Template Assertion Predicates are dead In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 12:28:59 GMT, Christian Hagedorn wrote: > As already suggested in https://github.com/openjdk/jdk/pull/23823, I want to do the following additional verification: > > After `eliminate_useless_predicates()` all now useless `OpaqueTemplateAssertionPredicate` nodes should not have any references to `CountedLoop` nodes that are still in the graph (otherwise, they would have been marked useful). This verification did not work reliably without the full Assertion Predicates fix [JDK-8350577](https://bugs.openjdk.org/browse/JDK-8350577). Since JDK-8350577 is now integrated, I propose to add this additional verification code. > > Thanks, > Christian This pull request has now been integrated. 
Changeset: c9baa8a7 Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/c9baa8a7aea0be7221f0af834fe73f035436bd8d Stats: 43 lines in 2 files changed: 43 ins; 0 del; 0 mod 8352418: Add verification code to check that the associated loop nodes of useless Template Assertion Predicates are dead Reviewed-by: epeter, roland ------------- PR: https://git.openjdk.org/jdk/pull/24326 From mchevalier at openjdk.org Wed Apr 2 13:22:16 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 2 Apr 2025 13:22:16 GMT Subject: RFR: 8348887: Create IR framework test for JDK-8347997 [v2] In-Reply-To: References: Message-ID: <4NpAefp1YW42cMcYCYTC2tf2P729CAab54NaSTjBl3Q=.6d74392d-7637-465b-a429-c91c8d010958@github.com> On Wed, 2 Apr 2025 10:03:30 GMT, Marc Chevalier wrote: >> As the ticket says: >>> Create IR framework test which checks that allocations are eliminated in the regression test included in [JDK-8347997](https://bugs.openjdk.org/browse/JDK-8347997) fix. >> >> So here it is! We can see that in case of inlining, indeed, no allocation happens. The second part is some sanity check to emphasize the difference: of course, there is an allocation without inlining. The benefit of this second part is arguable. From my point of view, it's mostly to point out the difference to a future reader. But yes, there is nothing very surprising. >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Remove catch, rename, remove static Thanks @chhagedorn, @TobiHartmann and @galderz for the reviews! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24328#issuecomment-2772540842 From duke at openjdk.org Wed Apr 2 13:22:17 2025 From: duke at openjdk.org (duke) Date: Wed, 2 Apr 2025 13:22:17 GMT Subject: RFR: 8348887: Create IR framework test for JDK-8347997 [v2] In-Reply-To: References: Message-ID: <1z9DPapYokbDFikUWCbU6TYgL13IPHnxzUM0Qgxmz-A=.0ff1b927-a299-4dd1-96ec-64cdeeda6443@github.com> On Wed, 2 Apr 2025 10:03:30 GMT, Marc Chevalier wrote: >> As the ticket says: >>> Create IR framework test which checks that allocations are eliminated in the regression test included in [JDK-8347997](https://bugs.openjdk.org/browse/JDK-8347997) fix. >> >> So here it is! We can see that in case of inlining, indeed, no allocation happens. The second part is some sanity check to emphasize the difference: of course, there is an allocation without inlining. The benefit of this second part is arguable. From my point of view, it's mostly to point out the difference to a future reader. But yes, there is nothing very surprising. >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Remove catch, rename, remove static @marc-chevalier Your change (at version efa712be6f504305ec562c83d2bf048100394fad) is now ready to be sponsored by a Committer. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24328#issuecomment-2772543590 From thartmann at openjdk.org Wed Apr 2 13:30:59 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 2 Apr 2025 13:30:59 GMT Subject: RFR: 8348887: Create IR framework test for JDK-8347997 [v2] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 11:25:53 GMT, Marc Chevalier wrote: >> test/hotspot/jtreg/compiler/c2/irTests/TestContinuationPinningAndEA.java line 41: >> >>> 39: public class TestContinuationPinningAndEA { >>> 40: public static void main(String[] args) { >>> 41: TestFramework.runWithFlags("--add-modules", "java.base", "--add-exports", "java.base/jdk.internal.vm=ALL-UNNAMED"); >> >> Why is this only needed now? > > I think it was already not fine. I had all the `catch` blocks because of the `throws` in the base example, defensively, by default (just assuming that these `throws Throwable` were there for a good reason). They hid the fact that the loading of `Continuation` failed (it's not exported). Nevertheless, I got enough for the IR check to work. I might be wrong about the reason, but my understanding is that since it's intrinsified, compilation can manage to produce IR and enough printing to make the check work. And I think it was really working, not just ignored: I got a test failure on the non-inlined cases before I added the `DontInline` annotation, if I remember correctly. But at runtime, access to the `Continuation` class is checked (when loading it?), and then it throws to tell me I'm not allowed to access it. Okay, thanks for the clarification!
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24328#discussion_r2024830323 From mchevalier at openjdk.org Wed Apr 2 13:31:01 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 2 Apr 2025 13:31:01 GMT Subject: Integrated: 8348887: Create IR framework test for JDK-8347997 In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 13:41:17 GMT, Marc Chevalier wrote: > As the ticket says: >> Create IR framework test which checks that allocations are eliminated in the regression test included in [JDK-8347997](https://bugs.openjdk.org/browse/JDK-8347997) fix. > > So here it is! We can see that in case of inlining, indeed, no allocation happens. The second part is some sanity check to emphasize the difference: of course, there is an allocation without inlining. The benefit of this second part is arguable. From my point of view, it's mostly to point out the difference to a future reader. But yes, there is nothing very surprising. > > Thanks, > Marc This pull request has now been integrated. Changeset: 8608b163 Author: Marc Chevalier Committer: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/8608b16341ba2807c6a32f7539d10d7458c40b05 Stats: 124 lines in 1 file changed: 124 ins; 0 del; 0 mod 8348887: Create IR framework test for JDK-8347997 Reviewed-by: thartmann, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/24328 From jbhateja at openjdk.org Wed Apr 2 13:51:07 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 2 Apr 2025 13:51:07 GMT Subject: RFR: 8348638: Performance regression in Math.tanh [v2] In-Reply-To: References: Message-ID: On Fri, 28 Mar 2025 00:18:41 GMT, Mohamed Issa wrote: >> The changes described below are meant to resolve the performance regression introduced by the **x86_64 tanh** double precision floating point scalar intrinsic in #20657. >> >> 1. Check and handle high magnitude input values before those in other ranges. 
If found, **+/- 1** is returned almost immediately without having to go through too many computations or branches.
>> 2. Reduce the lower bound of the input range that triggers a quick **+/- 1** return from **|x| >= 32** to **|x| >= 22**. This new endpoint is the exact value required for correctness that's used by the original OpenJDK implementation.
>>
>> The results of all tests posted below were captured with an [Intel® Xeon 6761P](https://www.intel.com/content/www/us/en/products/sku/241842/intel-xeon-6761p-processor-336m-cache-2-50-ghz/specifications.html) using [OpenJDK v25-b15](https://github.com/openjdk/jdk/releases/tag/jdk-25%2B15) as the baseline version.
>>
>> For performance data collected with the regression micro-benchmark referenced in the bug report, see the table below. Each result is the mean of 3 individual runs. In the high value scenarios (100, 1000, 10000, 100000), the changes significantly improve execution times to the point where they are almost at parity with the baseline. Also, there is almost no impact to the low value (1, 2) scenarios.
>>
>> | Input range (+/-) | Baseline (ms) | No fix (ms) | With fix (ms) | No fix vs baseline (%) | Fix vs baseline (%) |
>> | :------------------: | :-------------: | :-----------: | :-------------: | :----------------------: | :-------------------: |
>> | 1 | 1846 | 1925 | 1972 | +4.28 | +6.83 |
>> | 2 | 2099 | 1991 | 2016 | -5.15 | -3.95 |
>> | 100 | 803 | 1007 | 742 | +25.40 | -7.60 |
>> | 1000 | 497 | 635 | 514 | +27.77 | +3.42 |
>> | 10000 | 474 | 572 | 477 | +20.68 | +0.63 |
>> | 100000 | 473 | 567 | 474 | +19.87 | +0.21 |
>>
>> For perfo...
> > Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision: > > Change tanh intrinsic endpoint comparison value to match reference OpenJDK implementation Please add a micro benchmark for different value ranges ------------- PR Comment: https://git.openjdk.org/jdk/pull/23889#issuecomment-2772623273 From jkarthikeyan at openjdk.org Wed Apr 2 14:04:30 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Wed, 2 Apr 2025 14:04:30 GMT Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v7] In-Reply-To: References: Message-ID: <604ss1R67reWyL2d_GggUXb9m0xYiR-zefrBwan9Zjs=.46d46302-8c03-4228-b8bc-c428cb22c7e8@github.com>
> Hi all,
> This patch adds initial support for the autovectorizer to generate conversions between subword types. Currently, when superword sees two packs that have different basic types, it discards them and bails out of vectorization. This patch changes the behavior to ask the backend if a cast between the conflicting types is supported, and keeps the pack if it is. Later, when the `VTransform` graph is built, a synthetic cast is emitted when packs requiring casts are detected. Currently, only narrowing casts are supported as I wanted to re-use existing `VectorCastX2Y` logic for the initial version, but adding more conversions is simple and can be done with a subsequent RFE. I have attached a JMH benchmark and got these results on my Zen 3 machine:
>
>
>                                                Baseline                   Patch
> Benchmark                  (SIZE)  Mode  Cnt    Score    Error  Units    Score   Error  Units  Improvement
> VectorSubword.intToByte      1024  avgt   12  200.049 ± 19.787  ns/op   56.228 ± 3.535  ns/op  (3.56x)
> VectorSubword.intToShort     1024  avgt   12  179.826 ±  1.539  ns/op   43.332 ± 1.166  ns/op  (4.15x)
> VectorSubword.shortToByte    1024  avgt   12  245.580 ±  6.150  ns/op   29.757 ± 1.055  ns/op  (8.25x)
>
>
> I've also added some IR tests and they pass on my linux x64 machine. Thoughts and reviews would be appreciated!
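For context, the kind of loop the description is about looks roughly like the sketch below: the load and the store use different basic types, so the packs on either side of the cast have different element sizes. The method names are borrowed from the benchmark labels in the table above, but the bodies are assumptions, since the actual `VectorSubword` benchmark source is not shown in this thread. Whether a particular JVM build vectorizes these loops is not something this sketch demonstrates; it only shows the scalar semantics that any vectorized version has to preserve.

```java
public class SubwordCastSketch {
    // Narrowing int -> byte: keeps the low 8 bits of each element.
    static void intToByte(int[] src, byte[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = (byte) src[i];
        }
    }

    // Narrowing int -> short: keeps the low 16 bits of each element.
    static void intToShort(int[] src, short[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = (short) src[i];
        }
    }

    // Narrowing short -> byte: keeps the low 8 bits of each element.
    static void shortToByte(short[] src, byte[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = (byte) src[i];
        }
    }

    public static void main(String[] args) {
        int[] ints = {0, 127, 128, 255, 256, -1};
        byte[] bytes = new byte[ints.length];
        intToByte(ints, bytes);
        System.out.println(java.util.Arrays.toString(bytes));
    }
}
```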
Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: Implement patch with VectorCastNode::implemented ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23413/files - new: https://git.openjdk.org/jdk/pull/23413/files/b02408f7..482ddbc4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23413&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23413&range=05-06 Stats: 48 lines in 9 files changed: 2 ins; 41 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/23413.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23413/head:pull/23413 PR: https://git.openjdk.org/jdk/pull/23413 From jkarthikeyan at openjdk.org Wed Apr 2 14:07:20 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Wed, 2 Apr 2025 14:07:20 GMT Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v8] In-Reply-To: References: Message-ID: <_gqaAP3z7OIe4Bhfjz_UuojuAdSmst13fEEVvU9H_cg=.b6b86bf4-0124-4aa7-bf02-33ad2a98a0e1@github.com>
> Hi all,
> This patch adds initial support for the autovectorizer to generate conversions between subword types. Currently, when superword sees two packs that have different basic types, it discards them and bails out of vectorization. This patch changes the behavior to ask the backend if a cast between the conflicting types is supported, and keeps the pack if it is. Later, when the `VTransform` graph is built, a synthetic cast is emitted when packs requiring casts are detected. Currently, only narrowing casts are supported as I wanted to re-use existing `VectorCastX2Y` logic for the initial version, but adding more conversions is simple and can be done with a subsequent RFE. I have attached a JMH benchmark and got these results on my Zen 3 machine:
>
>
>                                                Baseline                   Patch
> Benchmark                  (SIZE)  Mode  Cnt    Score    Error  Units    Score   Error  Units  Improvement
> VectorSubword.intToByte      1024  avgt   12  200.049 ± 19.787  ns/op   56.228 ± 3.535  ns/op  (3.56x)
> VectorSubword.intToShort     1024  avgt   12  179.826 ±  1.539  ns/op   43.332 ± 1.166  ns/op  (4.15x)
> VectorSubword.shortToByte    1024  avgt   12  245.580 ±  6.150  ns/op   29.757 ± 1.055  ns/op  (8.25x)
>
>
> I've also added some IR tests and they pass on my linux x64 machine. Thoughts and reviews would be appreciated!
Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits:
- Merge
- Implement patch with VectorCastNode::implemented
- Merge branch 'master' into vectorize-subword
- Address comments from review, refactor test
- Add new conversions to benchmark
- Fix some tests that now vectorize
- Implement widening and address comments from review
- Subword vectorization
------------- Changes: https://git.openjdk.org/jdk/pull/23413/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23413&range=07 Stats: 305 lines in 14 files changed: 261 ins; 7 del; 37 mod Patch: https://git.openjdk.org/jdk/pull/23413.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23413/head:pull/23413 PR: https://git.openjdk.org/jdk/pull/23413 From jkarthikeyan at openjdk.org Wed Apr 2 14:12:43 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Wed, 2 Apr 2025 14:12:43 GMT Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v9] In-Reply-To: References: Message-ID: <_2r1kKb42b0BDzIXjG9ZrpdK3yC7LqPq7G1K1mDsPHg=.dcdbef7e-3c6f-413d-bfcb-6949b9a45555@github.com> > Hi all, > This patch adds initial support for the autovectorizer to generate conversions between subword types. Currently, when superword sees two packs that have different basic types, it discards them and bails out of vectorization. This patch changes the behavior to ask the backend if a cast between the conflicting types is supported, and keeps the pack if it is. Later, when the `VTransform` graph is built, a synthetic cast is emitted when packs requiring casts are detected.
Currently, only narrowing casts are supported as I wanted to re-use existing `VectorCastX2Y` logic for the initial version, but adding more conversions is simple and can be done with a subsequent RFE. I have attached a JMH benchmark and got these results on my Zen 3 machine:
>
>
>                                                Baseline                   Patch
> Benchmark                  (SIZE)  Mode  Cnt    Score    Error  Units    Score   Error  Units  Improvement
> VectorSubword.intToByte      1024  avgt   12  200.049 ± 19.787  ns/op   56.228 ± 3.535  ns/op  (3.56x)
> VectorSubword.intToShort     1024  avgt   12  179.826 ±  1.539  ns/op   43.332 ± 1.166  ns/op  (4.15x)
> VectorSubword.shortToByte    1024  avgt   12  245.580 ±  6.150  ns/op   29.757 ± 1.055  ns/op  (8.25x)
>
>
> I've also added some IR tests and they pass on my linux x64 machine. Thoughts and reviews would be appreciated!
Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: Fix copyright ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23413/files - new: https://git.openjdk.org/jdk/pull/23413/files/996eaed0..fc7be77c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23413&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23413&range=07-08 Stats: 5 lines in 5 files changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/23413.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23413/head:pull/23413 PR: https://git.openjdk.org/jdk/pull/23413 From mchevalier at openjdk.org Wed Apr 2 14:15:00 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 2 Apr 2025 14:15:00 GMT Subject: RFR: 8348853: Fold layout helper check for objects implementing non-array interfaces [v2] In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 11:46:57 GMT, Roland Westrelin wrote: >> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: >> >> not reinventing the wheel > > src/hotspot/share/opto/memnode.cpp line 2214: > >> 2212: if (tkls->offset() ==
in_bytes(Klass::layout_helper_offset()) && >> 2213: tkls->isa_instklassptr() && // not directly typed as an array >> 2214: !tkls->is_instklassptr()->might_be_an_array() // not the supertype of all T[] (java.lang.Object) or has an interface that is not Serializable or Cloneable > > Could we do the same by using `TypeKlassPtr::maybe_java_subtype_of(TypeAryKlassPtr::BOTTOM)` and define a `TypeAryKlassPtr::BOTTOM` to be a static field for the `array_interfaces`? > > AFAICT, `TypeKlassPtr::maybe_java_subtype_of()` already covers that case so it would avoid some logic duplication. Also in the test above, maybe you could simplify the test a little bit by removing `tkls->isa_instklassptr()`? I think it should be `TypeAryKlassPtr::BOTTOM->maybe_java_subtype_of(tkls)` rather than `tkls->maybe_java_subtype_of(TypeAryKlassPtr::BOTTOM)`. My reasoning: if `TypeAryKlassPtr::BOTTOM` is `java.lang.Object + Cloneable + Serializable`, any array is a subtype of that. But so is any class implementing these interfaces, as well as any `Object` implementing more interfaces. But for these last two cases, we know they cannot be arrays, which is what we want to know: are we sure it's not an array, or could it be an array? But if we check whether `tkls` is a supertype of `java.lang.Object + Cloneable + Serializable`, then it has to be an `Object` (the most general class) and it implements a subset of `Cloneable` and `Serializable`. In this case, it can be an array. If `tkls` is not a super-type of `java.lang.Object + Cloneable + Serializable`, there are 2 cases: - either it is an array type directly (so, I think, in one way or another, we need to check for `is_instklassptr`), and so a fortiori it can be an array type. - it's an instance type and then cannot be an array since there is nothing between array types and `java.lang.Object + Cloneable + Serializable`. I.e. there is no type `T` that is not an array type, that is a super-type of at least one array type and that is not a super-type of `java.lang.Object + Cloneable + Serializable` (that is, that is not `java.lang.Object` or that implements at least one other interface). In other words, our question is `\exists T: T is an array type /\ T <= tkls` (where `A <= B` means `A is a subtype of B`) which is equivalent to `tkls >= (java.lang.Object + Cloneable + Serializable) \/ (tkls <= (java.lang.Object + Cloneable + Serializable) /\ tkls is an array type)`. We can spare the call to `is_instklassptr` by using a virtual method instead or probably other mechanisms, that's an implementation detail. But I think we need to distinguish cases: both `int[]` and `MyClass + Cloneable + Serializable + MyInterface` are sub-types of `java.lang.Object + Cloneable + Serializable` but for one, we can conclude it's definitely an array, and for the other, it's definitely not. Without distinguishing cases, the only sound approximation would be to say that everything can be an array (both sub and super types of `java.lang.Object + Cloneable + Serializable`). Does that make sense? Did I get something wrong? Is the `BOTTOM` not what you had in mind? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24245#discussion_r2024918440 From jkarthikeyan at openjdk.org Wed Apr 2 14:15:19 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Wed, 2 Apr 2025 14:15:19 GMT Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v10] In-Reply-To: References: Message-ID: <2T_qgLVG05hbfRLOkrEGthWnoxXpvUGf0T8haKyKiCE=.fa4c75c5-764c-4829-9fcd-bfe12fa4d994@github.com> > Hi all, > This patch adds initial support for the autovectorizer to generate conversions between subword types. Currently, when superword sees two packs that have different basic types, it discards them and bails out of vectorization.
> This patch changes the behavior to ask the backend if a cast between the conflicting types is supported, and keeps the pack if it is. Later, when the `VTransform` graph is built, a synthetic cast is emitted when packs requiring casts are detected. Currently, only narrowing casts are supported as I wanted to re-use existing `VectorCastX2Y` logic for the initial version, but adding more conversions is simple and can be done with a subsequent RFE. I have attached a JMH benchmark and got these results on my Zen 3 machine:
>
>                                                 Baseline                    Patch
> Benchmark                  (SIZE)  Mode  Cnt    Score    Error  Units    Score   Error  Units  Improvement
> VectorSubword.intToByte      1024  avgt   12  200.049 ± 19.787  ns/op   56.228 ± 3.535  ns/op  (3.56x)
> VectorSubword.intToShort     1024  avgt   12  179.826 ±  1.539  ns/op   43.332 ± 1.166  ns/op  (4.15x)
> VectorSubword.shortToByte    1024  avgt   12  245.580 ±  6.150  ns/op   29.757 ± 1.055  ns/op  (8.25x)
>
> I've also added some IR tests and they pass on my linux x64 machine. Thoughts and reviews would be appreciated!
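
As an illustration of the pattern this description is about, here is a minimal Java sketch of a narrowing subword conversion loop. It is an assumption modelled on the benchmark name `VectorSubword.intToByte`, not the benchmark's actual source; `SubwordCastSketch` and `intToByte` are hypothetical names.

```java
import java.util.Arrays;

public class SubwordCastSketch {
    // Hypothetical example loop: each int element is truncated to a byte.
    // The load pack has basic type int and the store pack has basic type byte,
    // which is the type mismatch the description refers to.
    static void intToByte(int[] src, byte[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = (byte) src[i];
        }
    }

    public static void main(String[] args) {
        int[] src = {1, 127, 128, 255, 256, -1};
        byte[] dst = new byte[src.length];
        intToByte(src, dst);
        System.out.println(Arrays.toString(dst)); // prints [1, 127, -128, -1, 0, -1]
    }
}
```

Whether such a loop is actually vectorized depends on the platform and on whether the backend reports the cast as supported.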
Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: Fix copyright after merge ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23413/files - new: https://git.openjdk.org/jdk/pull/23413/files/fc7be77c..36f598a6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23413&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23413&range=08-09 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23413.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23413/head:pull/23413 PR: https://git.openjdk.org/jdk/pull/23413 From jkarthikeyan at openjdk.org Wed Apr 2 14:23:07 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Wed, 2 Apr 2025 14:23:07 GMT Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v6] In-Reply-To: References: Message-ID: On Fri, 21 Feb 2025 06:41:15 GMT, Emanuel Peter wrote: >> Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: >> >> - Merge branch 'master' into vectorize-subword >> - Address comments from review, refactor test >> - Add new conversions to benchmark >> - Fix some tests that now vectorize >> - Implement widening and address comments from review >> - Subword vectorization > > src/hotspot/share/opto/superwordVTransformBuilder.cpp line 194: > >> 192: >> 193: // If the use and def types are different, emit a cast node >> 194: if (use_bt != def_bt && !p0->is_Convert() && Matcher::is_vector_cast_supported(def_bt, use_bt)) { > > The usual way we check if a vector instruction is implemented is to use `VectorNode::implemented`. Ah, actually there is a `VectorCastNode::implemented`. Why are you not using that one? This is a good point! 
I've updated the patch to use `VectorCastNode::implemented` instead. I think I didn't see that function originally. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23413#discussion_r2024935268 From jkarthikeyan at openjdk.org Wed Apr 2 14:28:14 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Wed, 2 Apr 2025 14:28:14 GMT Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v3] In-Reply-To: References: Message-ID: On Fri, 21 Mar 2025 09:43:14 GMT, Emanuel Peter wrote: >> @eme64 I think it should be good for another look over! I've addressed your review comments in the last commit. >> >> About the potential for performance degradation, I think it would be unlikely since the code generated by the cast is quite small (as it only needs to truncate or sign-extend) and the patch increases the amount of possible code that can auto-vectorize. The one case that I can think of is that it might cause code that would be otherwise unprofitable to become vectorizable, but that would be because we don't have a cost model yet. > > @jaskarth Let me know if there is anything we can help you with here :) @eme64 Apologies for the delay! I've updated the patch to use `VectorCastNode::implemented` as suggested instead of manually implementing the logic, which simplifies the patch and provides implementations on other platforms, which I left out initially as I wasn't familiar with them. Let me know what you think! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23413#issuecomment-2772738167 From jkarthikeyan at openjdk.org Wed Apr 2 14:33:00 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Wed, 2 Apr 2025 14:33:00 GMT Subject: RFR: 8349563: Improve AbsNode::Value() for integer types In-Reply-To: References: Message-ID: On Thu, 27 Feb 2025 02:38:09 GMT, Dean Long wrote: >> Hi all, >> This is a small patch that improves the implementation of Value() for `AbsINode` and `AbsLNode` by returning the absolute value of the input range. Most of the logic is trivial except for the special case where `_lo == jint_min/jlong_min` which must return the entire type range when encountered, for which I've added a small proof in the comments. I've also added some unit tests and updated the file to limit IR check platforms with more granularity. >> >> Thoughts and reviews would be appreciated! > > src/hotspot/share/opto/subnode.cpp line 1938: > >> 1936: >> 1937: NativeType lo_abs = uabs(t->_lo); >> 1938: NativeType hi_abs = uabs(t->_hi); > > Converting unsigned to signed is C++ Undefined Behavior, is it not? This is a great point, I believe you're correct that it's UB. We currently do the same logic in the old code as well: https://github.com/openjdk/jdk/blob/a0677d94d8c83a75cee054700e098faa97edca3c/src/hotspot/share/opto/subnode.cpp#L1945-L1947 However, I'm unsure what the best way to solve this would be. Do you happen to have any ideas? Thanks a lot! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23685#discussion_r2024955772 From jkarthikeyan at openjdk.org Wed Apr 2 14:48:20 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Wed, 2 Apr 2025 14:48:20 GMT Subject: RFR: 8349563: Improve AbsNode::Value() for integer types [v2] In-Reply-To: References: Message-ID: > Hi all, > This is a small patch that improves the implementation of Value() for `AbsINode` and `AbsLNode` by returning the absolute value of the input range. 
Most of the logic is trivial except for the special case where `_lo == jint_min/jlong_min` which must return the entire type range when encountered, for which I've added a small proof in the comments. I've also added some unit tests and updated the file to limit IR check platforms with more granularity. > > Thoughts and reviews would be appreciated! Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: - Merge - Improve AbsNode::Value ------------- Changes: https://git.openjdk.org/jdk/pull/23685/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23685&range=01 Stats: 145 lines in 2 files changed: 136 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/23685.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23685/head:pull/23685 PR: https://git.openjdk.org/jdk/pull/23685 From jkarthikeyan at openjdk.org Wed Apr 2 14:48:20 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Wed, 2 Apr 2025 14:48:20 GMT Subject: RFR: 8349563: Improve AbsNode::Value() for integer types [v2] In-Reply-To: References: Message-ID: On Sun, 23 Feb 2025 09:27:22 GMT, Tobias Hotz wrote: >> Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: >> >> - Merge >> - Improve AbsNode::Value > > test/hotspot/jtreg/compiler/c2/irTests/TestIRAbs.java line 295: > >> 293: public boolean testIntRange3(int in) { >> 294: // [-9, -2] => [2, 9] >> 295: return Math.abs(-((in & 7) + 2)) < 2; > > Not sure if this is in scope for this PR, but `abs(x)` should be idealized into `0 - x` if x <= 0. This seems to be missing at the moment. This is a good observation! I'll do this in a followup patch, to keep this one focused on just the Value() function. 
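
The range computation discussed in this thread can be illustrated with a small Java sketch. It only shows the general idea (the absolute value of an interval `[lo, hi]`, with a fallback to the full range when `lo` is `Integer.MIN_VALUE`) and is not the code from `subnode.cpp`; `AbsRangeSketch` and `absRange` are hypothetical names.

```java
public class AbsRangeSketch {
    // Returns {lo, hi} bounds of Math.abs(x) for x in [lo, hi], assuming lo <= hi.
    static int[] absRange(int lo, int hi) {
        if (lo == Integer.MIN_VALUE) {
            // Math.abs(Integer.MIN_VALUE) is still Integer.MIN_VALUE, so the result
            // may be negative; fall back to the entire int range.
            return new int[] {Integer.MIN_VALUE, Integer.MAX_VALUE};
        }
        if (lo >= 0) {
            return new int[] {lo, hi};       // already non-negative
        }
        if (hi <= 0) {
            return new int[] {-hi, -lo};     // entirely non-positive: mirror the range
        }
        return new int[] {0, Math.max(-lo, hi)}; // range straddles zero
    }

    public static void main(String[] args) {
        int[] r = absRange(-9, -2);
        System.out.println(r[0] + ".." + r[1]); // prints 2..9
    }
}
```

The `[-9, -2] => [2, 9]` case matches the comment in the quoted `testIntRange3` test.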
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23685#discussion_r2024979524

From mchevalier at openjdk.org Wed Apr 2 14:49:17 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Wed, 2 Apr 2025 14:49:17 GMT
Subject: RFR: 8346989: C2: deoptimization and re-compilation cycle with Math.*Exact in case of frequent overflow [v4]
In-Reply-To: References: <8ACplaVM_gN9cbIcQYGJmR4GNINm70PAJQ8uAgucK4Y=.14fdc7e2-e0af-4f0d-acb6-bcfe99ee8f36@github.com> Message-ID: <1RzVI3uVrE2YscRJPUC3KeGoF5pshACXrfZX9fooPAk=.cbcc9de2-c5d4-4f8a-82f3-444f7ee7ae0a@github.com>

On Mon, 31 Mar 2025 08:33:42 GMT, Marc Chevalier wrote:

>> `Math.*Exact` intrinsics can cause many deopt when used repeatedly with problematic arguments.
>> This fix proposes not to rely on intrinsics after `too_many_traps()` has been reached.
>>
>> Benchmark show that this issue affects every Math.*Exact functions. And this fix improve them all.
>>
>> tl;dr:
>> - C1: no problem, no change
>> - C2:
>>   - with intrinsics:
>>     - with overflow: clear improvement. Was way worse than C1, now is similar (~4s => ~600ms)
>>     - without overflow: no problem, no change
>>   - without intrinsics: no problem, no change
>>
>> Before the fix:
>>
>> Benchmark                               (SIZE)  Mode  Cnt    Score     Error  Units
>> MathExact.C1_1.loopAddIInBounds        1000000  avgt    3    1.272 ±   0.048  ms/op
>> MathExact.C1_1.loopAddIOverflow        1000000  avgt    3  641.917 ±  58.238  ms/op
>> MathExact.C1_1.loopAddLInBounds        1000000  avgt    3    1.402 ±   0.842  ms/op
>> MathExact.C1_1.loopAddLOverflow        1000000  avgt    3  671.013 ± 229.425  ms/op
>> MathExact.C1_1.loopDecrementIInBounds  1000000  avgt    3    3.722 ±  22.244  ms/op
>> MathExact.C1_1.loopDecrementIOverflow  1000000  avgt    3  653.341 ± 279.003  ms/op
>> MathExact.C1_1.loopDecrementLInBounds  1000000  avgt    3    2.525 ±   0.810  ms/op
>> MathExact.C1_1.loopDecrementLOverflow  1000000  avgt    3  656.750 ± 141.792  ms/op
>> MathExact.C1_1.loopIncrementIInBounds  1000000  avgt    3    4.621 ±  12.822  ms/op
>> MathExact.C1_1.loopIncrementIOverflow  1000000  avgt    3  651.608 ± 274.396  ms/op
>> MathExact.C1_1.loopIncrementLInBounds  1000000  avgt    3    2.576 ±   3.316  ms/op
>> MathExact.C1_1.loopIncrementLOverflow  1000000  avgt    3  662.216 ±  71.879  ms/op
>> MathExact.C1_1.loopMultiplyIInBounds   1000000  avgt    3    1.402 ±   0.587  ms/op
>> MathExact.C1_1.loopMultiplyIOverflow   1000000  avgt    3  615.836 ± 252.137  ms/op
>> MathExact.C1_1.loopMultiplyLInBounds   1000000  avgt    3    2.906 ±   5.718  ms/op
>> MathExact.C1_1.loopMultiplyLOverflow   1000000  avgt    3  655.576 ± 147.432  ms/op
>> MathExact.C1_1.loopNegateIInBounds     1000000  avgt    3    2.023 ±   0.027  ms/op
>> MathExact.C1_1.loopNegateIOverflow     1000000  avgt    3  639.136 ±  30.841  ms/op
>> MathExact.C1_1.loop...
>
> Marc Chevalier has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits:
>
> - Merge branch 'master' into fix/Deoptimization-and-re-compilation-cycle-with-C2-compiled-code
> - guess_exception_from_deopt_reason out of builtin_throw
> - Use builtin_throw
> - Merge branch 'master' into fix/Deoptimization-and-re-compilation-cycle-with-C2-compiled-code
> - More exhaustive bench
> - Limit inlining of math Exact operations in case of too many deopts

I've applied the suggested refactoring. It looks fine to me, tests seem happy, and the microbenchmark shows a similar profile.
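
For context, the kind of workload the `*Overflow` benchmark names suggest can be sketched in Java as follows. This is an assumed shape, not the benchmark's source: `ExactOverflowSketch` and `countOverflows` are hypothetical names, and the loop simply counts how often `Math.addExact` throws.

```java
public class ExactOverflowSketch {
    // Adds `addend` to every value with Math.addExact and counts the overflows,
    // each of which surfaces as an ArithmeticException.
    static int countOverflows(int[] values, int addend) {
        int overflows = 0;
        for (int v : values) {
            try {
                Math.addExact(v, addend);
            } catch (ArithmeticException e) {
                overflows++;
            }
        }
        return overflows;
    }

    public static void main(String[] args) {
        int[] values = {Integer.MAX_VALUE, 0, Integer.MAX_VALUE - 1};
        System.out.println(countOverflows(values, 1)); // prints 1
    }
}
```

When most iterations take the exception path, this is the situation the PR describes as "frequent overflow".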
-------------

PR Comment: https://git.openjdk.org/jdk/pull/23916#issuecomment-2772794493

From mchevalier at openjdk.org Wed Apr 2 14:49:15 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Wed, 2 Apr 2025 14:49:15 GMT
Subject: RFR: 8346989: C2: deoptimization and re-compilation cycle with Math.*Exact in case of frequent overflow [v5]
In-Reply-To: <8ACplaVM_gN9cbIcQYGJmR4GNINm70PAJQ8uAgucK4Y=.14fdc7e2-e0af-4f0d-acb6-bcfe99ee8f36@github.com> References: <8ACplaVM_gN9cbIcQYGJmR4GNINm70PAJQ8uAgucK4Y=.14fdc7e2-e0af-4f0d-acb6-bcfe99ee8f36@github.com> Message-ID:

> `Math.*Exact` intrinsics can cause many deopt when used repeatedly with problematic arguments.
> This fix proposes not to rely on intrinsics after `too_many_traps()` has been reached.
>
> Benchmark show that this issue affects every Math.*Exact functions. And this fix improve them all.
>
> tl;dr:
> - C1: no problem, no change
> - C2:
>   - with intrinsics:
>     - with overflow: clear improvement. Was way worse than C1, now is similar (~4s => ~600ms)
>     - without overflow: no problem, no change
>   - without intrinsics: no problem, no change
>
> Before the fix:
>
> Benchmark                               (SIZE)  Mode  Cnt    Score     Error  Units
> MathExact.C1_1.loopAddIInBounds        1000000  avgt    3    1.272 ±   0.048  ms/op
> MathExact.C1_1.loopAddIOverflow        1000000  avgt    3  641.917 ±  58.238  ms/op
> MathExact.C1_1.loopAddLInBounds        1000000  avgt    3    1.402 ±   0.842  ms/op
> MathExact.C1_1.loopAddLOverflow        1000000  avgt    3  671.013 ± 229.425  ms/op
> MathExact.C1_1.loopDecrementIInBounds  1000000  avgt    3    3.722 ±  22.244  ms/op
> MathExact.C1_1.loopDecrementIOverflow  1000000  avgt    3  653.341 ± 279.003  ms/op
> MathExact.C1_1.loopDecrementLInBounds  1000000  avgt    3    2.525 ±   0.810  ms/op
> MathExact.C1_1.loopDecrementLOverflow  1000000  avgt    3  656.750 ± 141.792  ms/op
> MathExact.C1_1.loopIncrementIInBounds  1000000  avgt    3    4.621 ±  12.822  ms/op
> MathExact.C1_1.loopIncrementIOverflow  1000000  avgt    3  651.608 ± 274.396  ms/op
> MathExact.C1_1.loopIncrementLInBounds  1000000  avgt    3    2.576 ±   3.316  ms/op
> MathExact.C1_1.loopIncrementLOverflow  1000000  avgt    3  662.216 ±  71.879  ms/op
> MathExact.C1_1.loopMultiplyIInBounds   1000000  avgt    3    1.402 ±   0.587  ms/op
> MathExact.C1_1.loopMultiplyIOverflow   1000000  avgt    3  615.836 ± 252.137  ms/op
> MathExact.C1_1.loopMultiplyLInBounds   1000000  avgt    3    2.906 ±   5.718  ms/op
> MathExact.C1_1.loopMultiplyLOverflow   1000000  avgt    3  655.576 ± 147.432  ms/op
> MathExact.C1_1.loopNegateIInBounds     1000000  avgt    3    2.023 ±   0.027  ms/op
> MathExact.C1_1.loopNegateIOverflow     1000000  avgt    3  639.136 ±  30.841  ms/op
> MathExact.C1_1.loopNegateLInBounds     1000000  avgt    3    2.422 ± 3.59...

Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision:

  Apply @iwanowww's refactoring

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/23916/files
  - new: https://git.openjdk.org/jdk/pull/23916/files/80a67a55..34b3b75c

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=23916&range=04
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23916&range=03-04

  Stats: 152 lines in 4 files changed: 71 ins; 57 del; 24 mod
  Patch: https://git.openjdk.org/jdk/pull/23916.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23916/head:pull/23916

PR: https://git.openjdk.org/jdk/pull/23916

From chagedorn at openjdk.org Wed Apr 2 15:02:13 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Wed, 2 Apr 2025 15:02:13 GMT
Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v7]
In-Reply-To: References: Message-ID:

On Wed, 2 Apr 2025 12:17:31 GMT, Emanuel Peter wrote:

>> We should extend the functionality of Verify.checkEQ:
>> - Allow different NaN encodings to be seen as equal (by default).
>> - Compare VectorAPI vectors.
>> - Compare Exceptions, and their messages.
>> - Compare arbitrary Objects via Reflection.
>> >> Note: this is a prerequisite for the Template Library [JDK-8352861](https://bugs.openjdk.org/browse/JDK-8352861) / https://github.com/openjdk/jdk/pull/23418. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > refactor with checkEQWithRawBits Thanks for the update! It's much easier to use and understand now I think. I did a complete pass and left a lot of comments but mostly minor things. Overall, I think this looks great! :-) test/hotspot/jtreg/compiler/lib/verify/Verify.java line 32: > 30: import java.lang.reflect.InvocationTargetException; > 31: import java.util.HashMap; > 32: import java.util.ArrayList; Seems unused and can be removed. Suggestion: test/hotspot/jtreg/compiler/lib/verify/Verify.java line 60: > 58: private final boolean isFloatCheckWithRawBits; > 59: private final HashMap a2b = new HashMap<>(); > 60: private final HashMap b2a = new HashMap<>(); Can you add a comment here what `a2b` and `b2a` means? See also some other comment further down about `a2b/b2a`, maybe you can share some docs or cross reference. test/hotspot/jtreg/compiler/lib/verify/Verify.java line 67: > 65: > 66: /** > 67: * Verify the content of two Objects, possibly recursively. Maybe add: Suggestion: * Verify the contents of two Objects on a raw bit level, possibly recursively. test/hotspot/jtreg/compiler/lib/verify/Verify.java line 68: > 66: /** > 67: * Verify the content of two Objects, possibly recursively. > 68: * Different NaN encodins are considered non-qual, since we compare Suggestion: * Different NaN encodings are considered non-equal, since we compare test/hotspot/jtreg/compiler/lib/verify/Verify.java line 81: > 79: > 80: /** > 81: * Verify the content of two Objects, possibly recursively. Suggestion: * Verify the contents of two Objects, possibly recursively. 
test/hotspot/jtreg/compiler/lib/verify/Verify.java line 109: > 107: Class ca = a.getClass(); > 108: Class cb = b.getClass(); > 109: if (ca != cb) { Only seen this in my IDE: `ca` and `cb` should be `Class` instead of the raw `Class` since `getClass()` returns a `Class` (cannot make a suggestion since it's hidden here). test/hotspot/jtreg/compiler/lib/verify/Verify.java line 124: > 122: switch (a) { > 123: case Object[] x -> checkEQimpl(x, (Object[])b, field, aParent, bParent); > 124: case Byte x -> checkEQimpl(x, ((Byte)b).byteValue(), field, aParent, bParent); Can't you just pass `(Byte) b` to rely on auto unboxing instead? test/hotspot/jtreg/compiler/lib/verify/Verify.java line 143: > 141: case Exception x -> checkEQimpl(x, (Exception) b, field, aParent, bParent); > 142: default -> { > 143: if (ca.getName().startsWith("jdk.incubator.vector") && ca.getName().contains("Vector")) { Might be worth to extract this case to own methods and structure it like this to reduce the size of the method: if (vectorClass()) { checkEQForVectorAPIClass(); } else { checkEQdispatch(); } test/hotspot/jtreg/compiler/lib/verify/Verify.java line 160: > 158: } catch (InvocationTargetException e) { > 159: throw new RuntimeException("Could not invoke toArray on " + ca.getName(), e); > 160: } You can merge them: Suggestion: } catch (NoSuchMethodException | IllegalAccessException | InvocationTargetException e) { throw new RuntimeException("Could not invoke toArray on " + ca.getName(), e); } test/hotspot/jtreg/compiler/lib/verify/Verify.java line 187: > 185: private void checkEQimpl(char a, char b, String field, Object aParent, Object bParent) { > 186: if (a != b) { > 187: System.err.println("ERROR: Verify.checkEQ failed: value mismatch: " + (int)a + " vs " + (int)b); Why do you need an upcast here? Same for `short`. 
test/hotspot/jtreg/compiler/lib/verify/Verify.java line 233:

> 231: * of an add or mul (NaN1 * NaN2 does not have same bits as NaN2 * NaN1, because the multiplication
> 232: * of two NaN values should always return the first of the two).
> 233: * Hence, by default, we pick the non-raw coparison: we verify that we have the same bit

Suggestion:

    * Hence, by default, we pick the non-raw comparison: we verify that we have the same bit

test/hotspot/jtreg/compiler/lib/verify/Verify.java line 236:

> 234: * pattern in all cases, except for NaN we project to the canonical NaN, using Float.floatToIntBits.
> 235: */
> 236: private boolean isFloatEQ(float a, float b) {

Shouldn't this be named `isFloatNotEQ` since you return true when they are different? Same for `isDoubleEQ` below. Alternatively: Return true when they are equal (i.e. flip condition).

test/hotspot/jtreg/compiler/lib/verify/Verify.java line 242:

> 240:
> 241: /**
> 242: * See comments for "isFloatEQ".

We don't have Javadocs for the private methods but it could still help when navigating in the IDE to jump directly to the method when clicking on it (I suggested the same below for other places):

Suggestion:

    * See comments for {@link #isFloatEQ}.

test/hotspot/jtreg/compiler/lib/verify/Verify.java line 250:

> 248:
> 249: /**
> 250: * Check that two floats are equal according to "isFloatEQ".

Suggestion:

    * Check that two floats are equal according to {@link #isFloatEQ}.

test/hotspot/jtreg/compiler/lib/verify/Verify.java line 254:

> 252: private void checkEQimpl(float a, float b, String field, Object aParent, Object bParent) {
> 253: if (isFloatEQ(a, b)) {
> 254: System.err.println("ERROR: Verify.checkEQ failed: value mismatch. check raw: " + isFloatCheckWithRawBits);

Just noticed this now (there are other places as well): Since we now have `Verify.checkEQ()` and `Verify.checkEQWithRawBits()`, it would improve the readability if we reported which method was used.
It could be done with something like this (pseudo code):

    System.err.println("ERROR: Verify.checkEQ" + withRawBitsString() + " failed: value mismatch.");

    String withRawBitsString() {
        return isFloatCheckWithRawBits ? "WithRawBits" : "";
    }

test/hotspot/jtreg/compiler/lib/verify/Verify.java line 256:

> 254: System.err.println("ERROR: Verify.checkEQ failed: value mismatch. check raw: " + isFloatCheckWithRawBits);
> 255: System.err.println(" Values: " + a + " vs " + b);
> 256: System.err.println(" Raw: " + Float.floatToRawIntBits(a) + " vs " + Float.floatToRawIntBits(b));

Do we always want to dump the raw bits even when `isFloatCheckWithRawBits` is false? I guess it does not hurt.

test/hotspot/jtreg/compiler/lib/verify/Verify.java line 263:

> 261:
> 262: /**
> 263: * Check that two doubles are equal according to "isDoubleEQ".

Suggestion:

    * Check that two doubles are equal according to {@link #isDoubleEQ}.

test/hotspot/jtreg/compiler/lib/verify/Verify.java line 287:

> 285:
> 286: /**
> 287: * Verify that the content of two MemorySegments is identical. Note: we do not check the

Suggestion:

    * Verify that the contents of two MemorySegments are identical. Note: we do not check the

test/hotspot/jtreg/compiler/lib/verify/Verify.java line 316:

> 314: * Verify that the content of two MemorySegments is identical. Note: we do not check the
> 315: * backing type, only the size and content.
> 316: */

Probably a copy-paste error. Should be updated for exceptions.

test/hotspot/jtreg/compiler/lib/verify/Verify.java line 333:

> 331:
> 332: /**
> 333: * Verify that the content of two byte arrays is identical.

Suggestion:

    * Verify that the contents of two byte arrays are identical.

test/hotspot/jtreg/compiler/lib/verify/Verify.java line 340:

> 338:
> 339: /**
> 340: * Verify that the content of two char arrays is identical.

Suggestion:

    * Verify that the contents of two char arrays are identical.
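
The two float comparison modes discussed above for `isFloatEQ` can be contrasted with a short Java sketch. It is not the code from `Verify.java`; `NanCompareSketch`, `sameRawBits` and `sameCanonicalBits` are hypothetical names, chosen so that each method returns true when the values are considered equal.

```java
public class NanCompareSketch {
    // Raw comparison: different NaN bit patterns are treated as different values.
    static boolean sameRawBits(float a, float b) {
        return Float.floatToRawIntBits(a) == Float.floatToRawIntBits(b);
    }

    // Canonicalizing comparison: Float.floatToIntBits collapses every NaN
    // to the canonical pattern 0x7fc00000 before comparing.
    static boolean sameCanonicalBits(float a, float b) {
        return Float.floatToIntBits(a) == Float.floatToIntBits(b);
    }

    public static void main(String[] args) {
        float nan1 = Float.NaN;
        // A NaN with a different payload; some platforms may not preserve it exactly.
        float nan2 = Float.intBitsToFloat(0x7fc00001);
        System.out.println(sameRawBits(nan1, nan2));
        System.out.println(sameCanonicalBits(nan1, nan2));  // prints true
        System.out.println(sameCanonicalBits(0.0f, -0.0f)); // prints false
    }
}
```

Note that both modes still distinguish `0.0f` from `-0.0f`, since only NaN values are canonicalized.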
test/hotspot/jtreg/compiler/lib/verify/Verify.java line 347: > 345: > 346: /** > 347: * Verify that the content of two short arrays is identical. Suggestion: * Verify that the contents of two short arrays are identical. test/hotspot/jtreg/compiler/lib/verify/Verify.java line 354: > 352: > 353: /** > 354: * Verify that the content of two int arrays is identical. Suggestion: * Verify that the contents of two int arrays are identical. test/hotspot/jtreg/compiler/lib/verify/Verify.java line 361: > 359: > 360: /** > 361: * Verify that the content of two long arrays is identical. Suggestion: * Verify that the contents of two long arrays are identical. test/hotspot/jtreg/compiler/lib/verify/Verify.java line 368: > 366: > 367: /** > 368: * Check that two float arrays are equal according to "isFloatEQ". Suggestion: * Check that two float arrays are equal according to {@link #isFloatEQ}. test/hotspot/jtreg/compiler/lib/verify/Verify.java line 387: > 385: > 386: /** > 387: * Check that two double arrays are equal according to "isDoubleEQ". Suggestion: * Check that two double arrays are equal according to {@link #isDoubleEQ}. test/hotspot/jtreg/compiler/lib/verify/Verify.java line 406: > 404: > 405: /** > 406: * Verify that the content of two boolean arrays is identical. Suggestion: * Verify that the contents of two boolean arrays are identical. 
test/hotspot/jtreg/compiler/lib/verify/Verify.java line 425:

> 423:
> 424: /**
> 425: * Verify that the content of two Object arrays is identical, recursively:

Suggestion:

    * Verify that the contents of two Object arrays are identical, recursively:

test/hotspot/jtreg/compiler/lib/verify/Verify.java line 443:

> 441:
> 442: private void checkEQArbitraryClasses(Object a, Object b) {
> 443: Class c = a.getClass();

Suggestion:

    Class c = a.getClass();

test/hotspot/jtreg/compiler/lib/verify/Verify.java line 447:

> 445: for (Field field : c.getDeclaredFields()) {
> 446: Object va = null;
> 447: Object vb = null;

`null` can be omitted:

Suggestion:

    Object va;
    Object vb;

test/hotspot/jtreg/compiler/lib/verify/Verify.java line 463:

> 461: private void print(Object a, Object b, String field, Object aParent, Object bParent) {
> 462: System.err.println(" aParent: " + aParent);
> 463: System.err.println(" bParent: " + bParent);

Should we print `null` parents or just skip them?

test/hotspot/jtreg/compiler/lib/verify/Verify.java line 481:

> 479: case Long x -> { return false; }
> 480: case Float x -> { return false; }
> 481: case Double x -> { return false; }

I think the convention is to use `_` when they are ignored. You can then also merge them:

Suggestion:

    case Boolean _, Byte _, Short _, Character _, Integer _, Long _, Float _, Double _ -> { return false; }

test/hotspot/jtreg/compiler/lib/verify/Verify.java line 488:

> 486: Object aPrevious = b2a.get(b);
> 487: if (aPrevious == null && bPrevious == null) {
> 488: // Record for next time.

Can you explain, maybe as a comment at `checkAlreadyVisited()`, why we want to have these caches?

test/hotspot/jtreg/compiler/lib/verify/Verify.java line 520:

> 518: long start = Long.max(offset - range, 0);
> 519: long end = Long.min(offset + range, a.byteSize());
> 520: for (long i = start; i < end; i++) {

Nit below: You can replace `System.err.println("")` with `System.err.println()`.
test/hotspot/jtreg/testlibrary_tests/verify/examples/TestWithVectorAPI.java line 2: > 1: /* > 2: * Copyright (c) 2024, Oracle and/or its affiliates. All rights reserved. Suggestion: * Copyright (c) 2025, Oracle and/or its affiliates. All rights reserved. ------------- PR Review: https://git.openjdk.org/jdk/pull/24224#pullrequestreview-2736390646 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2024896216 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2024897698 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2024901854 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2024898499 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2024906090 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2024917798 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2024911572 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2024921119 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2024921953 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2024928403 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2024933737 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2024970368 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2024938566 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2024939173 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2024948324 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2024953601 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2024939637 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2024954430 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2024960438 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2024961195 PR Review Comment: 
https://git.openjdk.org/jdk/pull/24224#discussion_r2024961541 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2024961800 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2024962110 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2024962373 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2024964374 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2024964654 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2024971548 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2024984751 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2024985775 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2024988001 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2024991410 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2024995992 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2024999554 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2025002053 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2025002585 From hgreule at openjdk.org Wed Apr 2 16:24:23 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Wed, 2 Apr 2025 16:24:23 GMT Subject: RFR: 8353551: C2: Constant folding for ReverseBytes nodes Message-ID: This change implements constant folding for ReverseBytes nodes. Currently, `byteswap` is included transitively by `reverse_bits.hpp`. I'm not sure if this is fine or if I need to add an explicit include there. I appreciate any reviews and comments. 
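
To make the folding target concrete, here is a small Java sketch of byte reversal on constant inputs. It assumes, without confirmation from the thread, that the ReverseBytes nodes correspond to the four `reverseBytes` library methods shown; `ReverseBytesSketch` and its method names are hypothetical.

```java
public class ReverseBytesSketch {
    // Each call has a compile-time constant argument, so its result is itself
    // a constant that a compiler could in principle compute ahead of time.
    static int reverseInt() {
        return Integer.reverseBytes(0x01020304);
    }

    static long reverseLong() {
        return Long.reverseBytes(0x0102030405060708L);
    }

    static short reverseShort() {
        return Short.reverseBytes((short) 0x0102);
    }

    static char reverseChar() {
        return Character.reverseBytes((char) 0x0102);
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(reverseInt())); // prints 4030201
        System.out.println(Long.toHexString(reverseLong()));   // prints 807060504030201
        System.out.println(reverseShort());                    // prints 513
        System.out.println((int) reverseChar());               // prints 513
    }
}
```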
------------- Commit messages: - Implement constant folding for ReverseBytes*Nodes Changes: https://git.openjdk.org/jdk/pull/24382/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24382&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8353551 Stats: 204 lines in 3 files changed: 204 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24382.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24382/head:pull/24382 PR: https://git.openjdk.org/jdk/pull/24382 From duke at openjdk.org Wed Apr 2 16:55:56 2025 From: duke at openjdk.org (Mohamed Issa) Date: Wed, 2 Apr 2025 16:55:56 GMT Subject: RFR: 8348638: Performance regression in Math.tanh [v2] In-Reply-To: References: Message-ID: <4KYVemCsJx4WaROYdA770DaFipRFOWUmlR-iGMkHkVk=.1d8ddb9b-ec6d-4fb5-828b-bc96c07ac756@github.com> On Wed, 2 Apr 2025 13:47:45 GMT, Jatin Bhateja wrote: > Please add a micro benchmark for different value ranges @jatin-bhateja Should I add different value ranges to the existing tanh micro-benchmark or create a brand new micro-benchmark? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23889#issuecomment-2773179057 From vlivanov at openjdk.org Wed Apr 2 17:13:58 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 2 Apr 2025 17:13:58 GMT Subject: RFR: 8346989: C2: deoptimization and re-compilation cycle with Math.*Exact in case of frequent overflow [v5] In-Reply-To: References: <8ACplaVM_gN9cbIcQYGJmR4GNINm70PAJQ8uAgucK4Y=.14fdc7e2-e0af-4f0d-acb6-bcfe99ee8f36@github.com> Message-ID: On Wed, 2 Apr 2025 14:49:15 GMT, Marc Chevalier wrote: >> `Math.*Exact` intrinsics can cause many deopt when used repeatedly with problematic arguments. >> This fix proposes not to rely on intrinsics after `too_many_traps()` has been reached. >> >> Benchmark show that this issue affects every Math.*Exact functions. And this fix improve them all. >> >> tl;dr: >> - C1: no problem, no change >> - C2: >> - with intrinsics: >> - with overflow: clear improvement. 
Was way worse than C1, now is similar (~4s => ~600ms)
>>     - without overflow: no problem, no change
>>   - without intrinsics: no problem, no change
>>
>> Before the fix:
>>
>> Benchmark                               (SIZE)  Mode  Cnt    Score     Error  Units
>> MathExact.C1_1.loopAddIInBounds        1000000  avgt    3    1.272 ±   0.048  ms/op
>> MathExact.C1_1.loopAddIOverflow        1000000  avgt    3  641.917 ±  58.238  ms/op
>> MathExact.C1_1.loopAddLInBounds        1000000  avgt    3    1.402 ±   0.842  ms/op
>> MathExact.C1_1.loopAddLOverflow        1000000  avgt    3  671.013 ± 229.425  ms/op
>> MathExact.C1_1.loopDecrementIInBounds  1000000  avgt    3    3.722 ±  22.244  ms/op
>> MathExact.C1_1.loopDecrementIOverflow  1000000  avgt    3  653.341 ± 279.003  ms/op
>> MathExact.C1_1.loopDecrementLInBounds  1000000  avgt    3    2.525 ±   0.810  ms/op
>> MathExact.C1_1.loopDecrementLOverflow  1000000  avgt    3  656.750 ± 141.792  ms/op
>> MathExact.C1_1.loopIncrementIInBounds  1000000  avgt    3    4.621 ±  12.822  ms/op
>> MathExact.C1_1.loopIncrementIOverflow  1000000  avgt    3  651.608 ± 274.396  ms/op
>> MathExact.C1_1.loopIncrementLInBounds  1000000  avgt    3    2.576 ±   3.316  ms/op
>> MathExact.C1_1.loopIncrementLOverflow  1000000  avgt    3  662.216 ±  71.879  ms/op
>> MathExact.C1_1.loopMultiplyIInBounds   1000000  avgt    3    1.402 ±   0.587  ms/op
>> MathExact.C1_1.loopMultiplyIOverflow   1000000  avgt    3  615.836 ± 252.137  ms/op
>> MathExact.C1_1.loopMultiplyLInBounds   1000000  avgt    3    2.906 ±   5.718  ms/op
>> MathExact.C1_1.loopMultiplyLOverflow   1000000  avgt    3  655.576 ± 147.432  ms/op
>> MathExact.C1_1.loopNegateIInBounds     1000000  avgt    3    2.023 ±   0.027  ms/op
>> MathExact.C1_1.loopNegateIOverflow     1000000  avgt    3  639.136 ±  30.841  ms/op
>> MathExact.C1_1.loop...
>
> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision:
>
> Apply @iwanowww's refactoring

Looks good.
src/hotspot/share/opto/library_call.cpp line 2009: > 2007: if (builtin_throw_too_many_traps(Deoptimization::Reason_intrinsic, > 2008: env()->ArithmeticException_instance())) { > 2009: // It has been already too many times, but we cannot use builtin_throw care (e.g. we care about backtraces), Remove "care" in "builtin_throw care"? ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23916#pullrequestreview-2737016248 PR Review Comment: https://git.openjdk.org/jdk/pull/23916#discussion_r2025260344 From mchevalier at openjdk.org Wed Apr 2 17:20:35 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 2 Apr 2025 17:20:35 GMT Subject: RFR: 8353345: C2 asserts because maskShiftAmount modifies node without deleting the hash [v2] In-Reply-To: References: Message-ID: > First delete the hash, then `set_req`. This way, we avoid changing the node (a non-`this` node) without deleting the hash. This wrong ordering is not new from [JDK-8347459](https://bugs.openjdk.org/browse/JDK-8347459), but before that, only `this` was going through this function, so it was ok. But since, it is used with other nodes, hence the need to remove the hash. > > Also, not do any of that outside IGVN, but requires to register nested shifts for IGVN in parsing not to miss them later. 
> > Thanks, > Marc Marc Chevalier has updated the pull request incrementally with two additional commits since the last revision: - Fix spacing - Do not eagerly replace shift amounts in nested lshift ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24355/files - new: https://git.openjdk.org/jdk/pull/24355/files/6dcc6c15..d84b3d6d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24355&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24355&range=00-01 Stats: 50 lines in 2 files changed: 21 ins; 1 del; 28 mod Patch: https://git.openjdk.org/jdk/pull/24355.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24355/head:pull/24355 PR: https://git.openjdk.org/jdk/pull/24355 From mchevalier at openjdk.org Wed Apr 2 17:20:35 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 2 Apr 2025 17:20:35 GMT Subject: RFR: 8353345: C2 asserts because maskShiftAmount modifies node without deleting the hash In-Reply-To: References: Message-ID: <_xMbLzflQQjvjOgEmTpvMb-e3YUAVBbWoztGp802zV8=.ac4dd9a2-c333-401d-88db-db8412753325@github.com> On Tue, 1 Apr 2025 11:51:13 GMT, Marc Chevalier wrote: > First delete the hash, then `set_req`. This way, we avoid changing the node (a non-`this` node) without deleting the hash. This wrong ordering is not new from [JDK-8347459](https://bugs.openjdk.org/browse/JDK-8347459), but before that, only `this` was going through this function, so it was ok. But since, it is used with other nodes, hence the need to remove the hash. > > Also, not do any of that outside IGVN, but requires to register nested shifts for IGVN in parsing not to miss them later. > > Thanks, > Marc That makes sense. I've done as described. Tests seem happy. 
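
A small illustration of why the ordering matters. This is only a sketch: `ToyNode` and the `HashSet` are hypothetical stand-ins, not C2's `Node` class or its GVN hash table. If an object's hash depends on a field and that field is changed while the object is still registered in a hash table, later lookups miss it. Removing it first, then mutating and re-inserting, keeps the table consistent:

```java
import java.util.HashSet;
import java.util.Set;

final class HashBeforeMutate {
    // Hypothetical stand-in for a node whose hash depends on a mutable input.
    static final class ToyNode {
        final String op;
        int input; // mutable, loosely analogous to an edge changed via set_req

        ToyNode(String op, int input) {
            this.op = op;
            this.input = input;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof ToyNode
                && ((ToyNode) o).op.equals(op)
                && ((ToyNode) o).input == input;
        }

        @Override
        public int hashCode() {
            return 31 * op.hashCode() + input;
        }
    }

    // Wrong order: the node is mutated while still stored under its old hash.
    static boolean mutateInPlace() {
        Set<ToyNode> table = new HashSet<>();
        ToyNode n = new ToyNode("LShift", 1);
        table.add(n);
        n.input = 2;              // hash changes, table entry is now stale
        return table.contains(n); // lookup uses the new hash and misses
    }

    // Right order: remove from the table, mutate, then re-insert.
    static boolean removeThenMutate() {
        Set<ToyNode> table = new HashSet<>();
        ToyNode n = new ToyNode("LShift", 1);
        table.add(n);
        table.remove(n);
        n.input = 2;
        table.add(n);
        return table.contains(n);
    }

    public static void main(String[] args) {
        System.out.println("mutate in place, still found:  " + mutateInPlace());
        System.out.println("remove then mutate, still found: " + removeThenMutate());
    }
}
```

The analogy is loose: in the PR the relevant table belongs to the compiler and the re-insertion is handled by IGVN, but the failure mode (a stale entry keyed on an outdated hash) is the same kind of problem.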
-------------

PR Comment: https://git.openjdk.org/jdk/pull/24355#issuecomment-2773229850

From mchevalier at openjdk.org Wed Apr 2 17:23:03 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Wed, 2 Apr 2025 17:23:03 GMT
Subject: RFR: 8346989: C2: deoptimization and re-compilation cycle with Math.*Exact in case of frequent overflow [v6]
In-Reply-To: <8ACplaVM_gN9cbIcQYGJmR4GNINm70PAJQ8uAgucK4Y=.14fdc7e2-e0af-4f0d-acb6-bcfe99ee8f36@github.com>
References: <8ACplaVM_gN9cbIcQYGJmR4GNINm70PAJQ8uAgucK4Y=.14fdc7e2-e0af-4f0d-acb6-bcfe99ee8f36@github.com>
Message-ID:

> `Math.*Exact` intrinsics can cause many deopt when used repeatedly with problematic arguments.
> This fix proposes not to rely on intrinsics after `too_many_traps()` has been reached.
>
> Benchmark show that this issue affects every Math.*Exact functions. And this fix improve them all.
>
> tl;dr:
> - C1: no problem, no change
> - C2:
>   - with intrinsics:
>     - with overflow: clear improvement. Was way worse than C1, now is similar (~4s => ~600ms)
>     - without overflow: no problem, no change
>   - without intrinsics: no problem, no change
>
> Before the fix:
>
> Benchmark                               (SIZE)  Mode  Cnt    Score     Error  Units
> MathExact.C1_1.loopAddIInBounds        1000000  avgt    3    1.272 ±   0.048  ms/op
> MathExact.C1_1.loopAddIOverflow        1000000  avgt    3  641.917 ±  58.238  ms/op
> MathExact.C1_1.loopAddLInBounds        1000000  avgt    3    1.402 ±   0.842  ms/op
> MathExact.C1_1.loopAddLOverflow        1000000  avgt    3  671.013 ± 229.425  ms/op
> MathExact.C1_1.loopDecrementIInBounds  1000000  avgt    3    3.722 ±  22.244  ms/op
> MathExact.C1_1.loopDecrementIOverflow  1000000  avgt    3  653.341 ± 279.003  ms/op
> MathExact.C1_1.loopDecrementLInBounds  1000000  avgt    3    2.525 ±   0.810  ms/op
> MathExact.C1_1.loopDecrementLOverflow  1000000  avgt    3  656.750 ± 141.792  ms/op
> MathExact.C1_1.loopIncrementIInBounds  1000000  avgt    3    4.621 ±  12.822  ms/op
> MathExact.C1_1.loopIncrementIOverflow  1000000  avgt    3  651.608 ± 274.396  ms/op
> MathExact.C1_1.loopIncrementLInBounds  1000000  avgt    3    2.576 ±
3.316 ms/op
> MathExact.C1_1.loopIncrementLOverflow  1000000  avgt    3  662.216 ±  71.879  ms/op
> MathExact.C1_1.loopMultiplyIInBounds   1000000  avgt    3    1.402 ±   0.587  ms/op
> MathExact.C1_1.loopMultiplyIOverflow   1000000  avgt    3  615.836 ± 252.137  ms/op
> MathExact.C1_1.loopMultiplyLInBounds   1000000  avgt    3    2.906 ±   5.718  ms/op
> MathExact.C1_1.loopMultiplyLOverflow   1000000  avgt    3  655.576 ± 147.432  ms/op
> MathExact.C1_1.loopNegateIInBounds     1000000  avgt    3    2.023 ±   0.027  ms/op
> MathExact.C1_1.loopNegateIOverflow     1000000  avgt    3  639.136 ±  30.841  ms/op
> MathExact.C1_1.loopNegateLInBounds     1000000  avgt    3    2.422 ± 3.59...

Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision:

  fix typo in comment

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/23916/files
  - new: https://git.openjdk.org/jdk/pull/23916/files/34b3b75c..238b129d

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=23916&range=05
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23916&range=04-05

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/23916.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23916/head:pull/23916

PR: https://git.openjdk.org/jdk/pull/23916

From mchevalier at openjdk.org Wed Apr 2 17:23:03 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Wed, 2 Apr 2025 17:23:03 GMT
Subject: RFR: 8346989: C2: deoptimization and re-compilation cycle with Math.*Exact in case of frequent overflow [v5]
In-Reply-To:
References: <8ACplaVM_gN9cbIcQYGJmR4GNINm70PAJQ8uAgucK4Y=.14fdc7e2-e0af-4f0d-acb6-bcfe99ee8f36@github.com>
Message-ID:

On Wed, 2 Apr 2025 17:11:00 GMT, Vladimir Ivanov wrote:

>> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Apply @iwanowww's refactoring
>
> src/hotspot/share/opto/library_call.cpp line 2009:
>
>> 2007: if (builtin_throw_too_many_traps(Deoptimization::Reason_intrinsic,
>> 2008:
env()->ArithmeticException_instance())) {
>> 2009: // It has been already too many times, but we cannot use builtin_throw care (e.g. we care about backtraces),
>
> Remove "care" in "builtin_throw care"?

Thanks! Done.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23916#discussion_r2025271377

From jbhateja at openjdk.org Wed Apr 2 17:35:55 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Wed, 2 Apr 2025 17:35:55 GMT
Subject: RFR: 8348638: Performance regression in Math.tanh [v2]
In-Reply-To:
References:
Message-ID:

On Fri, 28 Mar 2025 00:18:41 GMT, Mohamed Issa wrote:

>> The changes described below are meant to resolve the performance regression introduced by the **x86_64 tanh** double precision floating point scalar intrinsic in #20657.
>>
>> 1. Check and handle high magnitude input values before those in other ranges. If found, **+/- 1** is returned almost immediately without having to go through too many computations or branches.
>> 2. Reduce the lower bound of the input range that triggers a quick **+/- 1** return from **|x| >= 32** to **|x| >= 22**. This new endpoint is the exact value required for correctness that's used by the original OpenJDK implementation.
>>
>> The results of all tests posted below were captured with an [Intel® Xeon 6761P](https://www.intel.com/content/www/us/en/products/sku/241842/intel-xeon-6761p-processor-336m-cache-2-50-ghz/specifications.html) using [OpenJDK v25-b15](https://github.com/openjdk/jdk/releases/tag/jdk-25%2B15) as the baseline version.
>>
>> For performance data collected with the regression micro-benchmark referenced in the bug report, see the table below. Each result is the mean of 3 individual runs. In the high value scenarios (100, 1000, 10000, 100000), the changes significantly improve execution times to the point where they are almost at parity with the baseline. Also, there is almost no impact to the low value (1, 2) scenarios.
>>
>> | Input range (+/-) | Baseline (ms) | No fix (ms) | With fix (ms) | No fix vs baseline (%) | Fix vs baseline (%) |
>> | :------------------: | :-------------: | :-----------: | :-------------: | :----------------------: | :-------------------: |
>> | 1 | 1846 | 1925 | 1972 | +4.28 | +6.83 |
>> | 2 | 2099 | 1991 | 2016 | -5.15 | -3.95 |
>> | 100 | 803 | 1007 | 742 | +25.40 | -7.60 |
>> | 1000 | 497 | 635 | 514 | +27.77 | +3.42 |
>> | 10000 | 474 | 572 | 477 | +20.68 | +0.63 |
>> | 100000 | 473 | 567 | 474 | +19.87 | +0.21 |
>>
>> For perfo...
>
> Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision:
>
>   Change tanh intrinsic endpoint comparison value to match reference OpenJDK implementation

src/hotspot/cpu/x86/stubGenerator_x86_64_tanh.cpp line 331:

> 329: __ andl(rdx, rcx);
> 330: __ andl(rcx, 32767);
> 331: __ cmpl(rcx, 16438);

Did you try using "UCOMISD" to directly compare with constant 22.0

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23889#discussion_r2024867196

From vlivanov at openjdk.org Wed Apr 2 18:10:50 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Wed, 2 Apr 2025 18:10:50 GMT
Subject: RFR: 8346989: C2: deoptimization and re-compilation cycle with Math.*Exact in case of frequent overflow [v6]
In-Reply-To:
References: <8ACplaVM_gN9cbIcQYGJmR4GNINm70PAJQ8uAgucK4Y=.14fdc7e2-e0af-4f0d-acb6-bcfe99ee8f36@github.com>
Message-ID:

On Wed, 2 Apr 2025 17:23:03 GMT, Marc Chevalier wrote:

>> `Math.*Exact` intrinsics can cause many deopt when used repeatedly with problematic arguments.
>> This fix proposes not to rely on intrinsics after `too_many_traps()` has been reached.
>>
>> Benchmark show that this issue affects every Math.*Exact functions. And this fix improve them all.
>>
>> tl;dr:
>> - C1: no problem, no change
>> - C2:
>>   - with intrinsics:
>>     - with overflow: clear improvement.
Was way worse than C1, now is similar (~4s => ~600ms)
>>     - without overflow: no problem, no change
>>   - without intrinsics: no problem, no change
>>
>> Before the fix:
>>
>> Benchmark                               (SIZE)  Mode  Cnt    Score     Error  Units
>> MathExact.C1_1.loopAddIInBounds        1000000  avgt    3    1.272 ±   0.048  ms/op
>> MathExact.C1_1.loopAddIOverflow        1000000  avgt    3  641.917 ±  58.238  ms/op
>> MathExact.C1_1.loopAddLInBounds        1000000  avgt    3    1.402 ±   0.842  ms/op
>> MathExact.C1_1.loopAddLOverflow        1000000  avgt    3  671.013 ± 229.425  ms/op
>> MathExact.C1_1.loopDecrementIInBounds  1000000  avgt    3    3.722 ±  22.244  ms/op
>> MathExact.C1_1.loopDecrementIOverflow  1000000  avgt    3  653.341 ± 279.003  ms/op
>> MathExact.C1_1.loopDecrementLInBounds  1000000  avgt    3    2.525 ±   0.810  ms/op
>> MathExact.C1_1.loopDecrementLOverflow  1000000  avgt    3  656.750 ± 141.792  ms/op
>> MathExact.C1_1.loopIncrementIInBounds  1000000  avgt    3    4.621 ±  12.822  ms/op
>> MathExact.C1_1.loopIncrementIOverflow  1000000  avgt    3  651.608 ± 274.396  ms/op
>> MathExact.C1_1.loopIncrementLInBounds  1000000  avgt    3    2.576 ±   3.316  ms/op
>> MathExact.C1_1.loopIncrementLOverflow  1000000  avgt    3  662.216 ±  71.879  ms/op
>> MathExact.C1_1.loopMultiplyIInBounds   1000000  avgt    3    1.402 ±   0.587  ms/op
>> MathExact.C1_1.loopMultiplyIOverflow   1000000  avgt    3  615.836 ± 252.137  ms/op
>> MathExact.C1_1.loopMultiplyLInBounds   1000000  avgt    3    2.906 ±   5.718  ms/op
>> MathExact.C1_1.loopMultiplyLOverflow   1000000  avgt    3  655.576 ± 147.432  ms/op
>> MathExact.C1_1.loopNegateIInBounds     1000000  avgt    3    2.023 ±   0.027  ms/op
>> MathExact.C1_1.loopNegateIOverflow     1000000  avgt    3  639.136 ±  30.841  ms/op
>> MathExact.C1_1.loop...
>
> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision:
>
>   fix typo in comment

Marked as reviewed by vlivanov (Reviewer).
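
For readers unfamiliar with the scenario in the quoted description, here is a minimal sketch of a "frequent overflow" workload. The benchmark source is not shown in this thread, so the class and method names below are hypothetical and only loosely modelled on benchmark names such as `loopAddIOverflow`. Each overflowing `Math.addExact` call throws `ArithmeticException`, which is the path that, per the description, used to trigger repeated deoptimization in C2:

```java
final class ExactOverflowLoop {
    // Counts how many additions overflow. Every overflow surfaces as an
    // ArithmeticException thrown by Math.addExact.
    static int countOverflows(int[] values, int addend) {
        int overflows = 0;
        for (int v : values) {
            try {
                Math.addExact(v, addend);
            } catch (ArithmeticException e) {
                overflows++;
            }
        }
        return overflows;
    }

    public static void main(String[] args) {
        // Hypothetical input: every element overflows when 1 is added.
        int[] values = new int[1_000];
        java.util.Arrays.fill(values, Integer.MAX_VALUE);
        System.out.println("overflows: " + countOverflows(values, 1));
    }
}
```

This only shows the shape of the workload. It makes no claim about timings, which depend on the JIT compiler and the flags in use.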
-------------

PR Review: https://git.openjdk.org/jdk/pull/23916#pullrequestreview-2737157109

From jbhateja at openjdk.org Wed Apr 2 18:31:50 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Wed, 2 Apr 2025 18:31:50 GMT
Subject: RFR: 8348638: Performance regression in Math.tanh [v2]
In-Reply-To:
References:
Message-ID:

On Wed, 2 Apr 2025 13:46:47 GMT, Jatin Bhateja wrote:

>> Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Change tanh intrinsic endpoint comparison value to match reference OpenJDK implementation
>
> src/hotspot/cpu/x86/stubGenerator_x86_64_tanh.cpp line 331:
>
>> 329: __ andl(rdx, rcx);
>> 330: __ andl(rcx, 32767);
>> 331: __ cmpl(rcx, 16438);
>
> Did you try using "UCOMISD" to directly compare with constant 22.0

[perf_tanh_delimit.txt](https://github.com/user-attachments/files/19573617/perf_tanh_delimit.txt)

Proposed sequence in micro2 shows better path length, please give this a try.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23889#discussion_r2025378258

From dlong at openjdk.org Wed Apr 2 19:57:03 2025
From: dlong at openjdk.org (Dean Long)
Date: Wed, 2 Apr 2025 19:57:03 GMT
Subject: RFR: 8353041: NeverBranchNode causes incorrect block frequency calculation
Message-ID:

This fixes a quality of implementation issue for infinite loops using a NeverBranch node. We need Block::succ_prob() to return 1.0f for the 100% successful back-edge so that block frequencies are computed correctly. I also fixed Block_Stack::most_frequent_successor() to choose the correct successor. I verified that this corrects the huge frequency ratio that was detected and clamped by JDK-8346888. Currently this bug is labeled noreg-hard with no new regression test, as it's not obvious how to write such a test.
-------------

Commit messages:
 - choose correct NeverBranch successor

Changes: https://git.openjdk.org/jdk/pull/24390/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24390&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8353041
  Stats: 19 lines in 2 files changed: 17 ins; 1 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/24390.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24390/head:pull/24390

PR: https://git.openjdk.org/jdk/pull/24390

From duke at openjdk.org Wed Apr 2 23:13:49 2025
From: duke at openjdk.org (Mohamed Issa)
Date: Wed, 2 Apr 2025 23:13:49 GMT
Subject: RFR: 8348638: Performance regression in Math.tanh [v2]
In-Reply-To:
References:
Message-ID: <7CcUcVFlw6z6thTGD2vcAhx9n4yySRRWD_4IhCqbByg=.9f705aca-c7bb-4f00-a52d-ff897d931596@github.com>

On Wed, 2 Apr 2025 18:29:34 GMT, Jatin Bhateja wrote:

>> src/hotspot/cpu/x86/stubGenerator_x86_64_tanh.cpp line 331:
>>
>>> 329: __ andl(rdx, rcx);
>>> 330: __ andl(rcx, 32767);
>>> 331: __ cmpl(rcx, 16438);
>>
>> Did you try using "UCOMISD" to directly compare with constant 22.0
>
> [perf_tanh_delimit.txt](https://github.com/user-attachments/files/19573617/perf_tanh_delimit.txt)
>
> Proposed sequence in micro2 shows better path length, please give this a try.

So, I didn't try using "UCOMISD" to directly compare because I thought it wouldn't provide a benefit over the existing approach. Thanks for providing these micros though as I think they prove my suspicions. To explain, I'll start with results from the code you provided on the [Intel® Xeon 6761P](https://www.intel.com/content/www/us/en/products/sku/241842/intel-xeon-6761p-processor-336m-cache-2-50-ghz/specifications.html) machine.

**./perf_tanh_delimit 21.0** -> _micro1=184 ms, micro2=186 ms_
**./perf_tanh_delimit 22.0** -> _micro1=188 ms, micro2=186 ms_
**./perf_tanh_delimit 23.0** -> _micro1=187 ms, micro2=183 ms_

Please note that results with inputs strictly less than 22.0 aren't really meaningful because they would go into the heavy compute path of the actual implementations. Still, I included one for reference. Here we can see "UCOMISD" shows some improvement over the existing approach. Of course, this uplift will vary on different platforms. Unfortunately, the sequences provided only cover the positive inputs. To get a better picture, we need one that covers both positive and negative inputs. I created one with corresponding results linked below.

[perf_tanh_delimit2.txt](https://github.com/user-attachments/files/19576620/perf_tanh_delimit2.txt)

**./perf_tanh_delimit2 -23.0** -> _micro1=179 ms, micro2=184 ms_
**./perf_tanh_delimit2 -22.0** -> _micro1=176 ms, micro2=178 ms_
**./perf_tanh_delimit2 -21.0** -> _micro1=185 ms, micro2=181 ms_
**./perf_tanh_delimit2 21.0** -> _micro1=190 ms, micro2=179 ms_
**./perf_tanh_delimit2 22.0** -> _micro1=189 ms, micro2=185 ms_
**./perf_tanh_delimit2 23.0** -> _micro1=187 ms, micro2=185 ms_

Again, the _|x| < 22.0_ inputs aren't relevant because they don't trigger any significant computations. With that in mind, the positive inputs show improvements while the negative inputs don't. The situation would be reversed if I checked for negative input values first. We need two uses of "UCOMISD" to cover positive and negative inputs. Whereas "PEXTRW" is only required once to cover the sign and magnitude. Also, other parts of the intrinsic implementation rely on it, so I don't think we should make those blocks worse by using "UCOMISD" without getting a clear boost from it.
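
As a side note on the constants in the quoted `andl`/`cmpl` lines: 16438 is 0x4036, which is the top 16 bits of the IEEE-754 encoding of 22.0, and 32767 (0x7FFF) clears the sign bit. The Java sketch below shows why one masked integer compare on those 16 bits can stand in for `|x| >= 22.0`. It is only an illustration: the class and method names are hypothetical, and it assumes (as the discussion suggests, but the quoted lines do not show) that the register holds the upper 16 bits of the input extracted via "PEXTRW".

```java
final class TanhSaturationCheck {
    // Top 16 bits of the encoding of 22.0 (0x4036_0000_0000_0000).
    static final int TOP16_OF_22 = 0x4036; // decimal 16438

    // Single masked compare: drop the sign bit, then compare the remaining
    // exponent and leading mantissa bits against those of 22.0.
    // Caveat: NaN also passes this test, so a real implementation must
    // handle NaN separately.
    static boolean saturatesByBits(double x) {
        int top16 = (int) (Double.doubleToRawLongBits(x) >>> 48);
        return (top16 & 0x7FFF) >= TOP16_OF_22;
    }

    // Reference formulation: two floating-point compares, one per sign.
    static boolean saturatesByCompare(double x) {
        return x >= 22.0 || x <= -22.0;
    }

    public static void main(String[] args) {
        double[] samples = { -23.0, -22.0, -21.0, 0.0, 21.0, 22.0, 23.0 };
        for (double x : samples) {
            System.out.println(x + ": bits=" + saturatesByBits(x)
                + " compare=" + saturatesByCompare(x));
        }
    }
}
```

The two predicates agree for every non-NaN input, because for a fixed sign the bit pattern of a double grows monotonically with its magnitude.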
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23889#discussion_r2025719541

From jbhateja at openjdk.org Thu Apr 3 01:28:57 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Thu, 3 Apr 2025 01:28:57 GMT
Subject: RFR: 8348638: Performance regression in Math.tanh [v2]
In-Reply-To: <7CcUcVFlw6z6thTGD2vcAhx9n4yySRRWD_4IhCqbByg=.9f705aca-c7bb-4f00-a52d-ff897d931596@github.com>
References: <7CcUcVFlw6z6thTGD2vcAhx9n4yySRRWD_4IhCqbByg=.9f705aca-c7bb-4f00-a52d-ff897d931596@github.com>
Message-ID:

On Wed, 2 Apr 2025 23:11:18 GMT, Mohamed Issa wrote:

>> [perf_tanh_delimit.txt](https://github.com/user-attachments/files/19573617/perf_tanh_delimit.txt)
>>
>> Proposed sequence in micro2 shows better path length, please give this a try.
>
> So, I didn't try using "UCOMISD" to directly compare because I thought it wouldn't provide a benefit over the existing approach. Thanks for providing these micros though as I think they prove my suspicions. To explain, I'll start with results from the code you provided on the [Intel® Xeon 6761P](https://www.intel.com/content/www/us/en/products/sku/241842/intel-xeon-6761p-processor-336m-cache-2-50-ghz/specifications.html) machine.
>
> **./perf_tanh_delimit 21.0** -> _micro1=184 ms, micro2=186 ms_
> **./perf_tanh_delimit 22.0** -> _micro1=188 ms, micro2=186 ms_
> **./perf_tanh_delimit 23.0** -> _micro1=187 ms, micro2=183 ms_
>
> Please note that results with inputs strictly less than 22.0 aren't really meaningful because they would go into the heavy compute path of the actual implementations. Still, I included one for reference. Here we can see "UCOMISD" shows some improvement over the existing approach. Of course, this uplift will vary on different platforms. Unfortunately, the sequences provided only cover the positive inputs. To get a better picture, we need one that covers both positive and negative inputs. I created one with corresponding results linked below.
>
> [perf_tanh_delimit2.txt](https://github.com/user-attachments/files/19576620/perf_tanh_delimit2.txt)
>
> **./perf_tanh_delimit2 -23.0** -> _micro1=179 ms, micro2=184 ms_
> **./perf_tanh_delimit2 -22.0** -> _micro1=176 ms, micro2=178 ms_
> **./perf_tanh_delimit2 -21.0** -> _micro1=185 ms, micro2=181 ms_
> **./perf_tanh_delimit2 21.0** -> _micro1=190 ms, micro2=179 ms_
> **./perf_tanh_delimit2 22.0** -> _micro1=189 ms, micro2=185 ms_
> **./perf_tanh_delimit2 23.0** -> _micro1=187 ms, micro2=185 ms_
>
> Again, the _|x| < 22.0_ inputs aren't relevant because they don't trigger any significant computations. With that in mind, the positive inputs show improvements while the negative inputs don't. The situation would be reversed if I checked for negative input values first. We need two uses of "UCOMISD" to cover positive and negative inputs. Whereas "PEXTRW" is only required once to cover the sign and magnitude. Also, other parts of the intrinsic implementation rely on it, so I don't think we should make those blocks worse by using "UCOMISD" without getting a clear boost from it.

Thanks for the explanation.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23889#discussion_r2025855450

From jbhateja at openjdk.org Thu Apr 3 01:41:48 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Thu, 3 Apr 2025 01:41:48 GMT
Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v2]
In-Reply-To:
References:
Message-ID:

> - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel® products and will be the vector ISA of choice moving into the future.
> - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. Feature set supported by an AVX10 ISA version will be supported by all the versions above it.
> - The initial, fully-featured version of Intel® AVX10 will be enumerated as Version 2 (denoted as Intel® AVX10.2).
This will include the new ISA extension over the existing AVX512 instructions.
> - An early version of Intel® AVX10 (Version 1, or Intel® AVX10.1) that only enumerates the Intel® AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling.
>
> This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers.
>
> The patch has been regressed through tier1 and jvmci tests
>
> Please review and share your feedback.
>
> Best Regards,
> Jatin
>
> [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html

Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains three additional commits since the last revision: - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8352675 - Windows build fix - 8352675: Support Intel AVX10 converged vector ISA feature detection ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24329/files - new: https://git.openjdk.org/jdk/pull/24329/files/4c0123e7..ff03a06e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24329&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24329&range=00-01 Stats: 15628 lines in 499 files changed: 9285 ins; 5111 del; 1232 mod Patch: https://git.openjdk.org/jdk/pull/24329.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24329/head:pull/24329 PR: https://git.openjdk.org/jdk/pull/24329 From duke at openjdk.org Thu Apr 3 02:32:55 2025 From: duke at openjdk.org (kuaiwei) Date: Thu, 3 Apr 2025 02:32:55 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v4] In-Reply-To: <_IhK2U23lIUOtBKOt-WMxQ3L7b2t26RzclJRdqbIgms=.3ef9a630-f99c-4de7-994a-bcabf912230b@github.com> References: <96Ny_BPjRCbNlD14DNDUOuQ0IX-F8hx21gxQKVfim9M=.d502019a-27ed-4a35-81ef-bc2aec5e7557@github.com> <_IhK2U23lIUOtBKOt-WMxQ3L7b2t26RzclJRdqbIgms=.3ef9a630-f99c-4de7-994a-bcabf912230b@github.com> Message-ID: On Mon, 24 Mar 2025 11:41:46 GMT, Emanuel Peter wrote: >>> @kuaiwei I have not yet had the time to read through the PR, but I would like to talk about `LoadNode::Ideal`. The idea with `Ideal` in general, is that you replace one node with another. After `Ideal` returns, all usages of the old node now take the new node instead. >>> >>> You copied the structure from my MergeStores implementation in `StoreNode::Idea`. There it made sense to replace `StoreB` nodes that have a memory output with `LoadI` nodes, which also have memory output. >>> >>> But it does not make sense to replace a `LoadB` that has a byte/int output with a `LoadL` that has a long output for example. 
>>> >>> I think your implementation should go into `OrINode`, and match the expression up from there. Because we want to replace the old `OrI` with the new `LoadL`. >>> >>> Another question: Do you have some tests where some of the nodes in the `load/shift/or` expression have other uses? Imagine this: >>> >>> ``` >>> l0 = a[0]; >>> l1 = a[1]; >>> l2 = a[2]; >>> l3 = a[3]; >>> l = ; >>> now use l1 for something else as well >>> ``` >>> >>> What happens now? Do you check that we only use the old `LoadB` in the expression we are replacing? >> >> Hi @eme64 , I understand your concern. In this patch , I check the usage of all `loadB` nodes and only allow they have only single usage into `OrNode`, I also check the `OrNode` as well. So I think it will not cause the trouble. >> >> >> l0 = a[0]; >> l1 = a[1]; >> l2 = a[2]; >> l3 = a[3]; >> l = ; >> now use l1 for something else as well >> >> For this case, because l1 has other usage, all these loads will not be merged. >> >> In my previous patch, I tried to extract value from merged `LoadNode` if origin `loadB` has other usage, such as used by uncommon trap. You can find them in https://github.com/openjdk/jdk/pull/24023/commits/b621db1cf0c17885516254a2af4b5df43e06c098 and search MergePrimitiveLoads::extract_value_for_uncommon_trap . But in my test with jtreg tier1, it never hit a case which replaced `LoadB` used by uncommon trap, I think range check smearing remove all the uncommon trap usages. So I revert it to make code simple. In my opinion, the extract_value function can be used as a general solution for other usages. But we may need a cost model to evaluate cost of new instructions which used for extracting and benefit of merged load. To simplify, I choose to check usage strictly. > > @kuaiwei Thanks for your response! > > What about these two things I brought up? > >> Do you have some tests where some of the nodes in the load/shift/or expression have other uses? 
>
> It would be good to have these tests, even if we think your code is correct. It is good to verify it with tests. And someone in the future might break it.
>
>> I think your implementation should go into OrINode, and match the expression up from there. Because we want to replace the old OrI with the new LoadL.
>
> This is really the pattern we use in `Ideal`. We replace the node at the bottom of an expression with a new node (or new expression).

Hi @eme64 , I moved the MergeLoads optimization to `addnode.cpp`. Now I use `_combine` for operators which can merge the adjacent loads. In this patch only `OrNode` is supported as a `combine` operator, but I think it can be extended to other operators like `AddNode` and `XorNode`. They will be supported in a subsequent patch.
May I ask you to review it again? Thanks.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24023#issuecomment-2774212502

From jbhateja at openjdk.org Thu Apr 3 02:58:17 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Thu, 3 Apr 2025 02:58:17 GMT
Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v3]
In-Reply-To:
References:
Message-ID:

> - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel® products and will be the vector ISA of choice moving into the future.
> - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. Feature set supported by an AVX10 ISA version will be supported by all the versions above it.
> - The initial, fully-featured version of Intel® AVX10 will be enumerated as Version 2 (denoted as Intel® AVX10.2). This will include the new ISA extension over the existing AVX512 instructions.
> - An early version of Intel® AVX10 (Version 1, or Intel® AVX10.1) that only enumerates the Intel® AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling.
> > This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers. > > The patch has been regressed through tier1 and jvmci tests > > Please review and share your feedback. > > Best Regards, > Jatin > > [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: 8352675: Support Intel AVX10 converged vector ISA feature detection ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24329/files - new: https://git.openjdk.org/jdk/pull/24329/files/ff03a06e..b95ac21c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24329&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24329&range=01-02 Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24329.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24329/head:pull/24329 PR: https://git.openjdk.org/jdk/pull/24329 From chagedorn at openjdk.org Thu Apr 3 05:09:56 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 3 Apr 2025 05:09:56 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v9] In-Reply-To: References: Message-ID: <2YsQyyHJuiGrgnTRKYADjQhRa5qIaDvCCjLd4kjfdeI=.0134b90b-fa18-4c5b-afb3-f7f4e10d6411@github.com> On Wed, 2 Apr 2025 10:25:32 GMT, Emanuel Peter wrote: >> Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: >> >> fix ((x< > 
src/hotspot/share/opto/addnode.cpp line 509: > >> 507: if (rhs.valid && rhs.variable == n->in(1)) { >> 508: return Multiplication{true, rhs.variable, rhs.multiplier + 1}; >> 509: } > > Hmm, it seems these are patterns that you did not promise you would cover in the description above. > It makes it a little difficult to keep the overview... Just a drive-by comment what might help: Name the cases you cover in the description with `(1)`, (2)` etc. and add the numbers as comments in the code where you cover the patterns. This would support the mapping from description to implementation. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r2026196086 From chagedorn at openjdk.org Thu Apr 3 05:21:59 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 3 Apr 2025 05:21:59 GMT Subject: RFR: 8353345: C2 asserts because maskShiftAmount modifies node without deleting the hash [v2] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 17:20:35 GMT, Marc Chevalier wrote: >> First delete the hash, then `set_req`. This way, we avoid changing the node (a non-`this` node) without deleting the hash. This wrong ordering is not new from [JDK-8347459](https://bugs.openjdk.org/browse/JDK-8347459), but before that, only `this` was going through this function, so it was ok. But since, it is used with other nodes, hence the need to remove the hash. >> >> Also, not do any of that outside IGVN, but requires to register nested shifts for IGVN in parsing not to miss them later. >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with two additional commits since the last revision: > > - Fix spacing > - Do not eagerly replace shift amounts in nested lshift Nice, thanks for the update! Looks much better now. Some more comments. 
src/hotspot/share/opto/mulnode.cpp line 953: > 951: } > 952: > 953: //============================================================================= While at it, you can also remove this line which we no longer use today Suggestion: src/hotspot/share/opto/mulnode.cpp line 966: > 964: > 965: // Returns whether the shift amount is constant. If so, sets real_shift and masked_shift. > 966: static bool mask_shift_amount(PhaseGVN* phase, const Node* shiftNode, uint nBits, int& real_shift, int& masked_shift) { While at it, we should probably use underscores instead of camelCase for `shiftNode`. Same below. src/hotspot/share/opto/mulnode.cpp line 995: > 993: if (igvn != nullptr) { > 994: igvn->rehash_node_delayed(shiftNode); > 995: } Do we still need this now? If we always call it with `shiftNode == this` then we already get the rehashing "for free" due to modifying `this` as part of `Ideal()`. src/hotspot/share/opto/mulnode.cpp line 1007: > 1005: // outer_shift = (_ << rhs0) > 1006: // We are looking for the pattern: > 1007: // outer_shift = ((X << rhs1) << rhs0) Just an idea: To better keep track of what is the outer and inner rhs, we could use `rhs_inner` and `rhs_outer`. src/hotspot/share/opto/mulnode.cpp line 1010: > 1008: // where rhs0 and rhs1 are constant > 1009: // we denote inner_shift the nested expression (X << rhs1) > 1010: // con0 = rhs1 % nbits and con0 = rhs1 % nbits Probably copy-paste error, did you want to define `con1` here as well? 
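The nested-shift pattern `((X << rhs1) << rhs0)` from the comment above can be sketched at the Java level. This is only an illustration and not the C2 code: it assumes the transformation combines the two masked constants into a single shift, and the names `maskedShift`, `foldNestedShift`, `rhsInner` and `rhsOuter` are hypothetical. It relies on the Java rule that an `int` shift uses only the low five bits of the shift amount.

```java
// Illustration only: models the arithmetic behind folding two constant
// left shifts into one. All names here are hypothetical, not HotSpot code.
final class NestedShiftSketch {
    // Java uses only the low 5 bits of an int shift amount,
    // i.e. the amount modulo 32.
    static int maskedShift(int shiftAmount) {
        return shiftAmount & (Integer.SIZE - 1);
    }

    // Computes ((x << rhsInner) << rhsOuter) with a single shift.
    static int foldNestedShift(int x, int rhsInner, int rhsOuter) {
        int conInner = maskedShift(rhsInner);
        int conOuter = maskedShift(rhsOuter);
        if (conInner + conOuter >= Integer.SIZE) {
            // Every bit of x is shifted out. The sum must not be masked
            // again, otherwise 40 would wrongly become a shift by 8.
            return 0;
        }
        return x << (conInner + conOuter);
    }

    public static void main(String[] args) {
        int x = 0x12345678;
        System.out.println(((x << 3) << 4) == foldNestedShift(x, 3, 4));
        System.out.println(((x << 20) << 20) == foldNestedShift(x, 20, 20));
        System.out.println(((x << 35) << 4) == foldNestedShift(x, 35, 4));
    }
}
```

Using `rhsInner` and `rhsOuter` rather than `rhs1` and `rhs0` follows the naming idea raised in the review, since it keeps the roles of the two constants visible.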
------------- PR Review: https://git.openjdk.org/jdk/pull/24355#pullrequestreview-2738524807 PR Review Comment: https://git.openjdk.org/jdk/pull/24355#discussion_r2026199599 PR Review Comment: https://git.openjdk.org/jdk/pull/24355#discussion_r2026200378 PR Review Comment: https://git.openjdk.org/jdk/pull/24355#discussion_r2026202624 PR Review Comment: https://git.openjdk.org/jdk/pull/24355#discussion_r2026204009 PR Review Comment: https://git.openjdk.org/jdk/pull/24355#discussion_r2026207250 From chagedorn at openjdk.org Thu Apr 3 05:29:12 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 3 Apr 2025 05:29:12 GMT Subject: Integrated: 8353058: [PPC64] Some IR framework tests are failing after JDK-8352595 In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 06:45:58 GMT, Christian Hagedorn wrote: > `TestPhaseIRMatching` was recently updated with [JDK-8352595](https://bugs.openjdk.org/browse/JDK-8352595) which changed some matching on opto assembly from `IRNode.ALLOC` (now matching on ideal phases) to `IRNode.FIELD_ACCESS` (still matching on opto assembly). However, the updated code matches differently on PPC for some method invocation on a parameter, which made the test fail on PPC: > > public Object defaultOnOptoAssembly(Helper h) { > return h.getString(); // emits one "Field: " string on most platforms but none on PPC > } > > > When I revisited the test to analyze the failure, it was not clear what I had in mind back then with `defaultOnX()`. My guess is that I tried to have one method failing on ideal phases, one on mach phases and one on both while all the methods use `IRNode` entries that have default compile phases on ideal and mach phases. But that is not the case today. I've therefore rewritten the tests to adhere to my guess. I also removed the ambiguity among platforms to have the same number of field accesses on them. 
> > How to read the `@ExpectedFailure` annotation: > > @IR(failOn = {IRNode.STORE, IRNode.FIELD_ACCESS, IRNode.COUNTED_LOOP, IRNode.STORE_I}, > counts = {IRNode.STORE, "20", IRNode.FIELD_ACCESS, "1", IRNode.COUNTED_LOOP, "2", IRNode.OOPMAP_WITH, "asdf", "2"}) > // Expect rule with id 5 (the one directly above) to fail: > // - We fail when matching PRINT_IDEAL with the: > // - failOn attribute: The failing constraints are constraint 1 and 4 (while 2 and 3 pass) > // - counts attribute: The failing constraints are constraint 2 and 4 (while 1 and 3 pass). > @ExpectedFailure(ruleId = 5, phase = CompilePhase.PRINT_IDEAL, failOn = {1, 4}, counts = {1, 3}) > > > Thanks to @TheRealMDoerr for testing the patch on PPC! > > Thanks, > Christian This pull request has now been integrated. Changeset: 8d3d1d41 Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/8d3d1d41377cf2162aad374dce4bf7e1bcb8297c Stats: 54 lines in 1 file changed: 17 ins; 8 del; 29 mod 8353058: [PPC64] Some IR framework tests are failing after JDK-8352595 Reviewed-by: mchevalier, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/24373 From epeter at openjdk.org Thu Apr 3 05:46:30 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 3 Apr 2025 05:46:30 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v8] In-Reply-To: References: Message-ID: > We should extend the functionality of Verify.checkEQ: > - Allow different NaN encodings to be seen as equal (by default). > - Compare VectorAPI vectors. > - Compare Exceptions, and their messages. > - Compare arbitrary Objects via Reflection. > > Note: this is a prerequisite for the Template Library [JDK-8352861](https://bugs.openjdk.org/browse/JDK-8352861) / https://github.com/openjdk/jdk/pull/23418. 
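The first bullet of the description above (different NaN encodings seen as equal by default) comes down to the difference between `Float.floatToIntBits` and `Float.floatToRawIntBits`. The following is a minimal sketch of that difference, not the code in `Verify.java`; the class and method names are hypothetical, and the two NaN bit patterns are arbitrary examples.

```java
// Illustration only: two ways to compare floats by their bit patterns.
// Class and method names are hypothetical, not taken from Verify.java.
final class NaNBitsSketch {
    // Raw comparison: the bit patterns are compared as they are, so two
    // NaNs with different payloads count as different.
    static boolean sameRawBits(float a, float b) {
        return Float.floatToRawIntBits(a) == Float.floatToRawIntBits(b);
    }

    // Canonical comparison: floatToIntBits collapses every NaN to the
    // canonical NaN 0x7fc00000 and leaves all other values untouched.
    static boolean sameCanonicalBits(float a, float b) {
        return Float.floatToIntBits(a) == Float.floatToIntBits(b);
    }

    public static void main(String[] args) {
        float nan1 = Float.intBitsToFloat(0x7fc00000); // canonical NaN
        float nan2 = Float.intBitsToFloat(0x7fc00001); // NaN with a payload
        System.out.println("canonical: " + sameCanonicalBits(nan1, nan2));
        System.out.println("raw:       " + sameRawBits(nan1, nan2));
        // Both comparisons still tell +0.0f and -0.0f apart.
        System.out.println("zeros:     " + sameCanonicalBits(0.0f, -0.0f));
    }
}
```

Comparing bits rather than using `==` also means that a NaN can compare equal to a NaN at all, which `==` never allows.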
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Apply suggestions from code review Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24224/files - new: https://git.openjdk.org/jdk/pull/24224/files/f2b3c371..8c3e9b91 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24224&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24224&range=06-07 Stats: 35 lines in 2 files changed: 0 ins; 5 del; 30 mod Patch: https://git.openjdk.org/jdk/pull/24224.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24224/head:pull/24224 PR: https://git.openjdk.org/jdk/pull/24224 From epeter at openjdk.org Thu Apr 3 05:53:50 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 3 Apr 2025 05:53:50 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v7] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 14:01:52 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> refactor with checkEQWithRawBits > > test/hotspot/jtreg/compiler/lib/verify/Verify.java line 60: > >> 58: private final boolean isFloatCheckWithRawBits; >> 59: private final HashMap a2b = new HashMap<>(); >> 60: private final HashMap b2a = new HashMap<>(); > > Can you add a comment here what `a2b` and `b2a` means? See also some other comment further down about `a2b/b2a`, maybe you can share some docs or cross reference. I added some documentation :) > test/hotspot/jtreg/compiler/lib/verify/Verify.java line 488: > >> 486: Object aPrevious = b2a.get(b); >> 487: if (aPrevious == null && bPrevious == null) { >> 488: // Record for next time. > > Can you explain, maybe as comment at `checkAlreadyVisited()`, why we want to have these caches? 
Added documentation :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2026251090 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2026252113 From epeter at openjdk.org Thu Apr 3 05:57:51 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 3 Apr 2025 05:57:51 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v7] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 14:11:13 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> refactor with checkEQWithRawBits > > test/hotspot/jtreg/compiler/lib/verify/Verify.java line 109: > >> 107: Class ca = a.getClass(); >> 108: Class cb = b.getClass(); >> 109: if (ca != cb) { > > Only seen this in my IDE: `ca` and `cb` should be `Class` instead of the raw `Class` since `getClass()` returns a `Class` (cannot make a suggestion since it's hidden here). fixed > test/hotspot/jtreg/compiler/lib/verify/Verify.java line 124: > >> 122: switch (a) { >> 123: case Object[] x -> checkEQimpl(x, (Object[])b, field, aParent, bParent); >> 124: case Byte x -> checkEQimpl(x, ((Byte)b).byteValue(), field, aParent, bParent); > > Can't you just pass `(Byte) b` to rely on auto unboxing instead? 
You are right, simplified it :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2026257889 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2026254841 From epeter at openjdk.org Thu Apr 3 06:05:52 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 3 Apr 2025 06:05:52 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v7] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 14:12:59 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> refactor with checkEQWithRawBits > > test/hotspot/jtreg/compiler/lib/verify/Verify.java line 143: > >> 141: case Exception x -> checkEQimpl(x, (Exception) b, field, aParent, bParent); >> 142: default -> { >> 143: if (ca.getName().startsWith("jdk.incubator.vector") && ca.getName().contains("Vector")) { > > Might be worth to extract this case to own methods and structure it like this to reduce the size of the method: > > if (vectorClass()) { > checkEQForVectorAPIClass(); > } else { > checkEQdispatch(); > } Refactored it :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2026266298 From epeter at openjdk.org Thu Apr 3 06:14:01 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 3 Apr 2025 06:14:01 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v7] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 14:16:58 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> refactor with checkEQWithRawBits > > test/hotspot/jtreg/compiler/lib/verify/Verify.java line 187: > >> 185: private void checkEQimpl(char a, char b, String field, Object aParent, Object bParent) { >> 186: if (a != b) { >> 187: System.err.println("ERROR: Verify.checkEQ failed: value 
mismatch: " + (int)a + " vs " + (int)b); > > Why do you need an upcast here? Same for `short`. Look at this ;) jshell> char a = 66; a ==> 'B' jshell> System.out.println("a: " + a); a: B jshell> System.out.println("a: " + (int)a); a: 66 But I can remove the casts for `short`. > test/hotspot/jtreg/compiler/lib/verify/Verify.java line 254: > >> 252: private void checkEQimpl(float a, float b, String field, Object aParent, Object bParent) { >> 253: if (isFloatEQ(a, b)) { >> 254: System.err.println("ERROR: Verify.checkEQ failed: value mismatch. check raw: " + isFloatCheckWithRawBits); > > Just noticed this now (there are other places as well): Since we now have `Verify.checkEQ()` and `Verify.checkEQWithRawBits()`, it would improve the readability if we reported which method was used. It could be done with something like that (pseudo code): > > System.err.println("ERROR: Verify.checkEQ" + withRawBitsString() + " failed: value mismatch. > > String withRawBitsString() { > return isFloatCheckWithRawBits ? "WithRawBits" : ""; > } Boah. That is really going to bloat the code, don't you think? The exception that is thrown will already give you the complete stack trace, including which methods were called. Is that not good enough? > test/hotspot/jtreg/compiler/lib/verify/Verify.java line 256: > >> 254: System.err.println("ERROR: Verify.checkEQ failed: value mismatch. check raw: " + isFloatCheckWithRawBits); >> 255: System.err.println(" Values: " + a + " vs " + b); >> 256: System.err.println(" Raw: " + Float.floatToRawIntBits(a) + " vs " + Float.floatToRawIntBits(b)); > > Do we always want to dump the raw bits even when `isFloatCheckWithRawBits` is false? I guess it does not hurt. Yes, I want that. It can help if there are different `NaN` encodings. Or if we somehow reinterpreted integer values as floats. 
It's been useful for me in the past :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2026269456 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2026272974 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2026274418 From epeter at openjdk.org Thu Apr 3 06:14:02 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 3 Apr 2025 06:14:02 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v7] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 06:06:50 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/lib/verify/Verify.java line 187: >> >>> 185: private void checkEQimpl(char a, char b, String field, Object aParent, Object bParent) { >>> 186: if (a != b) { >>> 187: System.err.println("ERROR: Verify.checkEQ failed: value mismatch: " + (int)a + " vs " + (int)b); >> >> Why do you need an upcast here? Same for `short`. > > Look at this ;) > > jshell> char a = 66; > a ==> 'B' > > jshell> System.out.println("a: " + a); > a: B > > jshell> System.out.println("a: " + (int)a); > a: 66 > > > But I can remove the casts for `short`. Added a comment as well. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2026271338 From epeter at openjdk.org Thu Apr 3 06:22:24 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 3 Apr 2025 06:22:24 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v9] In-Reply-To: References: Message-ID: <779UjTYbMPKwYJmlILeIwI7WTewAG8XRu4dwzm2UR2E=.c364b368-c22b-4414-b700-9be7c24d0e9f@github.com> > We should extend the functionality of Verify.checkEQ: > - Allow different NaN encodings to be seen as equal (by default). > - Compare VectorAPI vectors. > - Compare Exceptions, and their messages. > - Compare arbitrary Objects via Reflection. 
> > Note: this is a prerequisite for the Template Library [JDK-8352861](https://bugs.openjdk.org/browse/JDK-8352861) / https://github.com/openjdk/jdk/pull/23418. Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Updates for Christian ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24224/files - new: https://git.openjdk.org/jdk/pull/24224/files/8c3e9b91..a07f201e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24224&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24224&range=07-08 Stats: 96 lines in 1 file changed: 51 ins; 15 del; 30 mod Patch: https://git.openjdk.org/jdk/pull/24224.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24224/head:pull/24224 PR: https://git.openjdk.org/jdk/pull/24224 From epeter at openjdk.org Thu Apr 3 06:22:25 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 3 Apr 2025 06:22:25 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v7] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 14:57:12 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> refactor with checkEQWithRawBits > > Thanks for the update! It's much easier to use and understand now I think. > > I did a complete pass and left a lot of comments but mostly minor things. Overall, I think this looks great! :-) @chhagedorn Thanks for the thorough review :) I think I addressed all your comments ? > test/hotspot/jtreg/compiler/lib/verify/Verify.java line 236: > >> 234: * pattern in all cases, except for NaN we project to the canonical NaN, using Float.floatToIntBits. >> 235: */ >> 236: private boolean isFloatEQ(float a, float b) { > > Shouldn't this be named `isFloatNotEQ` since you return true when they are different? Same for `isDoubleEQ` below. Alternatively: Return true when they are equal (i.e. flip condition). Good catch! 
Flipped the condition :) > test/hotspot/jtreg/compiler/lib/verify/Verify.java line 316: > >> 314: * Verify that the content of two MemorySegments is identical. Note: we do not check the >> 315: * backing type, only the size and content. >> 316: */ > > Probably a copy-paste error. Should be updated for exceptions. Good catch, updated it! > test/hotspot/jtreg/compiler/lib/verify/Verify.java line 463: > >> 461: private void print(Object a, Object b, String field, Object aParent, Object bParent) { >> 462: System.err.println(" aParent: " + aParent); >> 463: System.err.println(" bParent: " + bParent); > > Should we print `null` parents or just skip them? I think it does not hurt to print `null` here. It makes the code a little simpler. > test/hotspot/jtreg/compiler/lib/verify/Verify.java line 520: > >> 518: long start = Long.max(offset - range, 0); >> 519: long end = Long.min(offset + range, a.byteSize()); >> 520: for (long i = start; i < end; i++) { > > Nit below: You can replace `System.err.println("")` with `System.err.println()`. done! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24224#issuecomment-2774605365 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2026279491 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2026276939 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2026280168 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2026281282 From epeter at openjdk.org Thu Apr 3 06:27:33 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 3 Apr 2025 06:27:33 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v10] In-Reply-To: References: Message-ID: > We should extend the functionality of Verify.checkEQ: > - Allow different NaN encodings to be seen as equal (by default). > - Compare VectorAPI vectors. > - Compare Exceptions, and their messages. > - Compare arbitrary Objects via Reflection. 
> > Note: this is a prerequisite for the Template Library [JDK-8352861](https://bugs.openjdk.org/browse/JDK-8352861) / https://github.com/openjdk/jdk/pull/23418. Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: fix whitespace issues ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24224/files - new: https://git.openjdk.org/jdk/pull/24224/files/a07f201e..752679ae Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24224&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24224&range=08-09 Stats: 7 lines in 1 file changed: 0 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/24224.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24224/head:pull/24224 PR: https://git.openjdk.org/jdk/pull/24224 From hgreule at openjdk.org Thu Apr 3 07:32:50 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Thu, 3 Apr 2025 07:32:50 GMT Subject: RFR: 8353359: C2: Or(I|L)Node::Ideal is missing AddNode::Ideal call In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 07:29:10 GMT, Emanuel Peter wrote: > @SirYwell Wow, good find! Oh dear, things like this are so easy to get wrong. Thanks for writing the IR test, that seems really to be the only way to ensure we don't get these kinds of regressions. I wonder how many more of these kinds of issues we have... Optimal would be if we had IR tests for every optimization, but that would be a lot of work! > > I'm running some testing, please ping me in 24h for the results! Thanks @eme64, did the test go through? I'm wondering now if there should rather be something like an `AddNodeIdealizationTests.java` that contains the optimizations of AddNode::ideal for all(*) its subtypes rather than more specific test classes testing a mix of optimizations from different Ideal methods (e.g., `AddINodeIdealizationTests.java` has a test for `(x + 1) + 2 => x + 3`). 
I'm not sure if your current work on the template library could somehow cover that (replacing operators, replacing IR check rules). (*) Not all, as there are some subtypes for which the optimizations don't apply. I also noticed that I didn't add the bug id to the test headers here, I'll add them before merging. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24348#issuecomment-2774734906 From epeter at openjdk.org Thu Apr 3 07:40:48 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 3 Apr 2025 07:40:48 GMT Subject: RFR: 8353359: C2: Or(I|L)Node::Ideal is missing AddNode::Ideal call In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 07:29:50 GMT, Hannes Greule wrote: >> @SirYwell Wow, good find! Oh dear, things like this are so easy to get wrong. Thanks for writing the IR test, that seems really to be the only way to ensure we don't get these kinds of regressions. I wonder how many more of these kinds of issues we have... Optimal would be if we had IR tests for every optimization, but that would be a lot of work! >> >> I'm running some testing, please ping me in 24h for the results! > >> @SirYwell Wow, good find! Oh dear, things like this are so easy to get wrong. Thanks for writing the IR test, that seems really to be the only way to ensure we don't get these kinds of regressions. I wonder how many more of these kinds of issues we have... Optimal would be if we had IR tests for every optimization, but that would be a lot of work! >> >> I'm running some testing, please ping me in 24h for the results! > > Thanks @eme64, did the test go through? > > I'm wondering now if there should rather be something like an `AddNodeIdealizationTests.java` that contains the optimizations of AddNode::ideal for all(*) its subtypes rather than more specific test classes testing a mix of optimizations from different Ideal methods (e.g., `AddINodeIdealizationTests.java` has a test for `(x + 1) + 2 => x + 3`). 
I'm not sure if your current work on the template library could somehow cover that (replacing operators, replacing IR check rules). > (*) Not all, as there are some subtypes for which the optimizations don't apply. > > I also noticed that I didn't add the bug id to the test headers here, I'll add them before merging. @SirYwell Something like an `AddINodeIdealizationTests.java` sounds like a good idea. We could systematically cover `Value`, `Ideal` and `Identity`, for every single node. A good structure would really help. But collecting / writing all those IR tests is a lot of work... But we could at least start setting it up, and extend it over time. But that is work for some separate RFEs; I'll discuss it with my co-workers. Not sure if Templates really help. Because the tedious work is capturing all the patterns, and writing IR rules. That's hard to automate I think. But maybe you have some good ideas here :) Well, I suppose some patterns go over multiple types, so there we could do something. And maybe we can still cut a lot of boiler-plate code with Templates, and get a better overview that way... worth thinking about a little more! Tests have passed. I'll wait with approval until you make the updates :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24348#issuecomment-2774753603 From hgreule at openjdk.org Thu Apr 3 07:52:21 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Thu, 3 Apr 2025 07:52:21 GMT Subject: RFR: 8353359: C2: Or(I|L)Node::Ideal is missing AddNode::Ideal call [v2] In-Reply-To: References: Message-ID: <4qrmlvl4NrkGCFYYwkvzbmjQwAlIZhpyFHiol3m3NpY=.5dc8822d-3185-4569-8352-965894ba0149@github.com> > Hi, > > this simple change adds a missing AddNode::Ideal call to Or(I|L)Node::Ideal. See the added tests for examples of optimizations that don't apply without this change. > > Please let me know what you think. 
Hannes Greule has updated the pull request incrementally with two additional commits since the last revision: - update license year - add bug id ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24348/files - new: https://git.openjdk.org/jdk/pull/24348/files/f7fb76da..2584807a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24348&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24348&range=00-01 Stats: 4 lines in 2 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/24348.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24348/head:pull/24348 PR: https://git.openjdk.org/jdk/pull/24348 From epeter at openjdk.org Thu Apr 3 07:52:21 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 3 Apr 2025 07:52:21 GMT Subject: RFR: 8353359: C2: Or(I|L)Node::Ideal is missing AddNode::Ideal call [v2] In-Reply-To: <4qrmlvl4NrkGCFYYwkvzbmjQwAlIZhpyFHiol3m3NpY=.5dc8822d-3185-4569-8352-965894ba0149@github.com> References: <4qrmlvl4NrkGCFYYwkvzbmjQwAlIZhpyFHiol3m3NpY=.5dc8822d-3185-4569-8352-965894ba0149@github.com> Message-ID: On Thu, 3 Apr 2025 07:49:46 GMT, Hannes Greule wrote: >> Hi, >> >> this simple change adds a missing AddNode::Ideal call to Or(I|L)Node::Ideal. See the added tests for examples of optimizations that don't apply without this change. >> >> Please let me know what you think. > > Hannes Greule has updated the pull request incrementally with two additional commits since the last revision: > > - update license year > - add bug id Thanks for the updates and the fix :) ------------- Marked as reviewed by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24348#pullrequestreview-2738839095 From hgreule at openjdk.org Thu Apr 3 07:56:56 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Thu, 3 Apr 2025 07:56:56 GMT Subject: RFR: 8353359: C2: Or(I|L)Node::Ideal is missing AddNode::Ideal call [v2] In-Reply-To: <4qrmlvl4NrkGCFYYwkvzbmjQwAlIZhpyFHiol3m3NpY=.5dc8822d-3185-4569-8352-965894ba0149@github.com> References: <4qrmlvl4NrkGCFYYwkvzbmjQwAlIZhpyFHiol3m3NpY=.5dc8822d-3185-4569-8352-965894ba0149@github.com> Message-ID: On Thu, 3 Apr 2025 07:52:21 GMT, Hannes Greule wrote: >> Hi, >> >> this simple change adds a missing AddNode::Ideal call to Or(I|L)Node::Ideal. See the added tests for examples of optimizations that don't apply without this change. >> >> Please let me know what you think. > > Hannes Greule has updated the pull request incrementally with two additional commits since the last revision: > > - update license year > - add bug id Thanks, I'll wait for a second review I guess? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24348#issuecomment-2774786364 From epeter at openjdk.org Thu Apr 3 07:56:57 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 3 Apr 2025 07:56:57 GMT Subject: RFR: 8353359: C2: Or(I|L)Node::Ideal is missing AddNode::Ideal call [v2] In-Reply-To: References: <4qrmlvl4NrkGCFYYwkvzbmjQwAlIZhpyFHiol3m3NpY=.5dc8822d-3185-4569-8352-965894ba0149@github.com> Message-ID: On Thu, 3 Apr 2025 07:53:06 GMT, Hannes Greule wrote: >> Hannes Greule has updated the pull request incrementally with two additional commits since the last revision: >> >> - update license year >> - add bug id > > Thanks, I'll wait for a second review I guess? 
@SirYwell Ah, yes, for compiler changes we require 2 reviews :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24348#issuecomment-2774788937 From chagedorn at openjdk.org Thu Apr 3 07:56:59 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 3 Apr 2025 07:56:59 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v7] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 06:08:41 GMT, Emanuel Peter wrote: >> Look at this ;) >> >> jshell> char a = 66; >> a ==> 'B' >> >> jshell> System.out.println("a: " + a); >> a: B >> >> jshell> System.out.println("a: " + (int)a); >> a: 66 >> >> >> But I can remove the casts for `short`. > > Added a comment as well. Right, that makes sense for the `char` case. But good that we could remove it for the `short` case. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2026372872 From chagedorn at openjdk.org Thu Apr 3 07:56:59 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 3 Apr 2025 07:56:59 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v10] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 06:27:33 GMT, Emanuel Peter wrote: >> We should extend the functionality of Verify.checkEQ: >> - Allow different NaN encodings to be seen as equal (by default). >> - Compare VectorAPI vectors. >> - Compare Exceptions, and their messages. >> - Compare arbitrary Objects via Reflection. >> >> Note: this is a prerequisite for the Template Library [JDK-8352861](https://bugs.openjdk.org/browse/JDK-8352861) / https://github.com/openjdk/jdk/pull/23418. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > fix whitespace issues Thanks for addressing all my comments and doing the updates! I have some more final comments but then I think it's good to go from my side! 
test/hotspot/jtreg/compiler/lib/verify/Verify.java line 62: > 60: * When comparing arbitrary classes recursively, we need to remember which > 61: * pairs of objects {@code (a, b)} we have already visited. The maps > 62: * {@link a2b} and {@link b2a} track these edges. Caching which pairs I think it's fine to use `code` here since the Javadocs links to itself otherwise. Suggestion: * {@code a2b} and {@code b2a} track these edges. Caching which pairs test/hotspot/jtreg/compiler/lib/verify/Verify.java line 77: > 75: * Verify the contents of two Objects on a raw bit level, possibly recursively. > 76: * Different NaN encodings are considered non-equal, since we compare > 77: * floating number by their raw bits. Suggestion: * floating numbers by their raw bits. test/hotspot/jtreg/compiler/lib/verify/Verify.java line 90: > 88: /** > 89: * Verify the contents of two Objects, possibly recursively. > 90: * Different NaN encodins are considered equal. Suggestion: * Different NaN encodings are considered equal. test/hotspot/jtreg/compiler/lib/verify/Verify.java line 307: > 305: * Verify that two Exceptions have the same message. Messages are not always carried, > 306: * they are often dropped to performance, and that is ok. But if both Exceptions have > 307: * the message, we should compare them. Suggestion: * they are often dropped for performance reasons, and that is okay. But if both Exceptions * have the message, we should compare them. test/hotspot/jtreg/compiler/lib/verify/Verify.java line 438: > 436: * to add "--add-modules=jdk.incubator.vector" to the command-line of every test that uses the Verify > 437: * class. So we hack this via reflection. > 438: */ I think this background is only needed at `checkEQForVectorAPIClass()` (where you already have that comment). 
Here you can just describe what the code actually does or just drop the comment entirely since the method name is self-explanatory :-) test/hotspot/jtreg/compiler/lib/verify/Verify.java line 495: > 493: * When comparing arbitrary classes recursively, we need to remember which > 494: * pairs of objects {@code (a, b)} we have already visited. The maps > 495: * {@link a2b} and {@link b2a} track these edges. Caching which pairs Suggestion: * {@link #a2b} and {@link #b2a} track these edges. Caching which pairs ------------- PR Review: https://git.openjdk.org/jdk/pull/24224#pullrequestreview-2738785943 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2026406773 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2026370608 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2026371049 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2026396345 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2026401231 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2026407956 From chagedorn at openjdk.org Thu Apr 3 07:57:01 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 3 Apr 2025 07:57:01 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v7] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 06:10:14 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/lib/verify/Verify.java line 254: >> >>> 252: private void checkEQimpl(float a, float b, String field, Object aParent, Object bParent) { >>> 253: if (isFloatEQ(a, b)) { >>> 254: System.err.println("ERROR: Verify.checkEQ failed: value mismatch. check raw: " + isFloatCheckWithRawBits); >> >> Just noticed this now (there are other places as well): Since we now have `Verify.checkEQ()` and `Verify.checkEQWithRawBits()`, it would improve the readability if we reported which method was used. 
It could be done with something like that (pseudo code): >> >> System.err.println("ERROR: Verify.checkEQ" + withRawBitsString() + " failed: value mismatch."); >> >> String withRawBitsString() { >> return isFloatCheckWithRawBits ? "WithRawBits" : ""; >> } > > Boah. That is really going to bloat the code, don't you think? > The exception that is thrown will already give you the complete stack trace, including which methods were called. Is that not good enough? Hm, it could indeed be a little bit more complicated when you are deep down in a recursion. My thought was that it could be misleading when a test is using a mix of `checkEQ()` and `checkEQWithRawBits()` and you only read that `checkEQ` failed. You could start looking at the wrong check even though the stack trace would have guided you to the correct place. Maybe we can just update "Verify.checkEQ" into something more generic like "Equality matching failed" and we're good. What do you think? >> test/hotspot/jtreg/compiler/lib/verify/Verify.java line 256: >> >>> 254: System.err.println("ERROR: Verify.checkEQ failed: value mismatch. check raw: " + isFloatCheckWithRawBits); >>> 255: System.err.println(" Values: " + a + " vs " + b); >>> 256: System.err.println(" Raw: " + Float.floatToRawIntBits(a) + " vs " + Float.floatToRawIntBits(b)); >> >> Do we always want to dump the raw bits even when `isFloatCheckWithRawBits` is false? I guess it does not hurt. > > Yes, I want that. It can help if there are different `NaN` encodings. Or if we somehow reinterpreted integer values as floats. It's been useful for me in the past :) Sounds good, let's leave it in then! >> test/hotspot/jtreg/compiler/lib/verify/Verify.java line 463: >> >>> 461: private void print(Object a, Object b, String field, Object aParent, Object bParent) { >>> 462: System.err.println(" aParent: " + aParent); >>> 463: System.err.println(" bParent: " + bParent); >> >> Should we print `null` parents or just skip them?
> > I think it does not hurt to print `null` here. It makes the code a little simpler. Okay, maybe we can print `` in case of a null for more clarity? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2026382031 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2026386633 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2026402954 From epeter at openjdk.org Thu Apr 3 08:09:57 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 3 Apr 2025 08:09:57 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v11] In-Reply-To: References: Message-ID: > We should extend the functionality of Verify.checkEQ: > - Allow different NaN encodings to be seen as equal (by default). > - Compare VectorAPI vectors. > - Compare Exceptions, and their messages. > - Compare arbitrary Objects via Reflection. > > Note: this is a prerequisite for the Template Library [JDK-8352861](https://bugs.openjdk.org/browse/JDK-8352861) / https://github.com/openjdk/jdk/pull/23418. 
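The difference between the default NaN handling and the raw-bits mode described above can be sketched with plain `float` values. This is only an illustration under assumptions: the class `NanCompareSketch` and the method `floatEq` are hypothetical names and are not taken from `Verify.java`, and treating `+0.0f` and `-0.0f` as different in both modes is an assumption of this sketch.

```java
// Hypothetical sketch, not the Verify.java implementation.
final class NanCompareSketch {
    // withRawBits == true : compare the exact bit patterns, so differently
    //                       encoded NaNs are reported as different.
    // withRawBits == false: Float.floatToIntBits collapses every NaN to one
    //                       canonical encoding, so all NaNs compare equal.
    static boolean floatEq(float a, float b, boolean withRawBits) {
        if (withRawBits) {
            return Float.floatToRawIntBits(a) == Float.floatToRawIntBits(b);
        }
        return Float.floatToIntBits(a) == Float.floatToIntBits(b);
    }

    public static void main(String[] args) {
        float nanA = Float.intBitsToFloat(0x7fc00000);
        float nanB = Float.intBitsToFloat(0x7fc00001); // same NaN, different payload
        System.out.println("default mode:  " + floatEq(nanA, nanB, false));
        System.out.println("raw-bits mode: " + floatEq(nanA, nanB, true));
    }
}
```

The raw-bits result for the two NaNs depends on the platform preserving the NaN payload through `Float.intBitsToFloat`, which the Javadoc does not guarantee in all cases.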
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Apply suggestions from code review Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24224/files - new: https://git.openjdk.org/jdk/pull/24224/files/752679ae..ccb8c4b7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24224&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24224&range=09-10 Stats: 6 lines in 1 file changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/24224.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24224/head:pull/24224 PR: https://git.openjdk.org/jdk/pull/24224 From mchevalier at openjdk.org Thu Apr 3 08:14:00 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 3 Apr 2025 08:14:00 GMT Subject: RFR: 8353341: C2: removal of a Mod[DF]Node crashes when the node is already dead [v3] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 08:08:07 GMT, Marc Chevalier wrote: >> If the Mod[DF]Node has no control projection when it's being removed (because its result is unused), `extract_projections` will fail an assert. So, let's skip the removal. >> >> But that should happen only when the nodes are already unreachable (control input being transitively top). At the end of the day, the node should be dropped. because of that, so there is no rush, and let dead node deletion do the job. >> >> On the reduced reproducer, the crash is not common (even with `-XX:RepeatCompilation=300`, it might need more than a run to reproduce). So I've tried my fix on multiple thousands repeat compilations (by 300 packs) without a crash, and without having the modulo node alive at the end. >> >> For instance, that's what happen on the reproducer. Quickly, some big sub-graph is dead, but nodes stay a while in the graph: >> >> Then: >> >> And eventually, everything is removed, so the control projection is removed, and `extract_projections` doesn't like it. 
>> >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments, part 2 Thanks @chhagedorn and @TobiHartmann! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24375#issuecomment-2774834063 From duke at openjdk.org Thu Apr 3 08:14:01 2025 From: duke at openjdk.org (duke) Date: Thu, 3 Apr 2025 08:14:01 GMT Subject: RFR: 8353341: C2: removal of a Mod[DF]Node crashes when the node is already dead [v3] In-Reply-To: References: Message-ID: <39fyYyWXS86ospwWICUF9t7L8fQ1XI6eRbH-6bg68Es=.a5409f85-85eb-400a-8023-58195a648c0a@github.com> On Wed, 2 Apr 2025 08:08:07 GMT, Marc Chevalier wrote: >> If the Mod[DF]Node has no control projection when it's being removed (because its result is unused), `extract_projections` will fail an assert. So, let's skip the removal. >> >> But that should happen only when the nodes are already unreachable (control input being transitively top). At the end of the day, the node should be dropped. because of that, so there is no rush, and let dead node deletion do the job. >> >> On the reduced reproducer, the crash is not common (even with `-XX:RepeatCompilation=300`, it might need more than a run to reproduce). So I've tried my fix on multiple thousands repeat compilations (by 300 packs) without a crash, and without having the modulo node alive at the end. >> >> For instance, that's what happen on the reproducer. Quickly, some big sub-graph is dead, but nodes stay a while in the graph: >> >> Then: >> >> And eventually, everything is removed, so the control projection is removed, and `extract_projections` doesn't like it. >> >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments, part 2 @marc-chevalier Your change (at version 48bd2037a9241f4c2956b19e91585553249e2625) is now ready to be sponsored by a Committer. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24375#issuecomment-2774836855 From epeter at openjdk.org Thu Apr 3 08:15:36 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 3 Apr 2025 08:15:36 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v12] In-Reply-To: References: Message-ID: > We should extend the functionality of Verify.checkEQ: > - Allow different NaN encodings to be seen as equal (by default). > - Compare VectorAPI vectors. > - Compare Exceptions, and their messages. > - Compare arbitrary Objects via Reflection. > > Note: this is a prerequisite for the Template Library [JDK-8352861](https://bugs.openjdk.org/browse/JDK-8352861) / https://github.com/openjdk/jdk/pull/23418. Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: For Christian ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24224/files - new: https://git.openjdk.org/jdk/pull/24224/files/ccb8c4b7..b8fad69c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24224&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24224&range=10-11 Stats: 27 lines in 1 file changed: 0 ins; 5 del; 22 mod Patch: https://git.openjdk.org/jdk/pull/24224.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24224/head:pull/24224 PR: https://git.openjdk.org/jdk/pull/24224 From epeter at openjdk.org Thu Apr 3 08:15:36 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 3 Apr 2025 08:15:36 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v10] In-Reply-To: References: Message-ID: <_8a_-zsFB8h10ir2y4whdQHDkzHn91yflaplv2o9bZ8=.0259d2a1-2aec-49c0-9d51-6774f9ed41f5@github.com> On Thu, 3 Apr 2025 07:54:24 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> fix whitespace issues > > Thanks for addressing all my comments and 
doing the updates! I have some more final comments but then I think it's good to go from my side! @chhagedorn Thanks for having another look! I applied all your suggestions :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24224#issuecomment-2774838773 From chagedorn at openjdk.org Thu Apr 3 08:25:09 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 3 Apr 2025 08:25:09 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v12] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 08:15:36 GMT, Emanuel Peter wrote: >> We should extend the functionality of Verify.checkEQ: >> - Allow different NaN encodings to be seen as equal (by default). >> - Compare VectorAPI vectors. >> - Compare Exceptions, and their messages. >> - Compare arbitrary Objects via Reflection. >> >> Note: this is a prerequisite for the Template Library [JDK-8352861](https://bugs.openjdk.org/browse/JDK-8352861) / https://github.com/openjdk/jdk/pull/23418. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > For Christian That looks good to me, thanks for bearing with me! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24224#pullrequestreview-2738937793 From mchevalier at openjdk.org Thu Apr 3 08:41:12 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 3 Apr 2025 08:41:12 GMT Subject: Integrated: 8353341: C2: removal of a Mod[DF]Node crashes when the node is already dead In-Reply-To: References: Message-ID: <8GAvhDoJ3ji1WXZCij79wdxfbY3fr7qtbU8yt5swpPg=.0987d06f-b27d-4c90-996b-15f969727577@github.com> On Wed, 2 Apr 2025 07:19:35 GMT, Marc Chevalier wrote: > If the Mod[DF]Node has no control projection when it's being removed (because its result is unused), `extract_projections` will fail an assert. So, let's skip the removal. 
> > But that should happen only when the nodes are already unreachable (control input being transitively top). At the end of the day, the node should be dropped. because of that, so there is no rush, and let dead node deletion do the job. > > On the reduced reproducer, the crash is not common (even with `-XX:RepeatCompilation=300`, it might need more than a run to reproduce). So I've tried my fix on multiple thousands repeat compilations (by 300 packs) without a crash, and without having the modulo node alive at the end. > > For instance, that's what happen on the reproducer. Quickly, some big sub-graph is dead, but nodes stay a while in the graph: > > Then: > > And eventually, everything is removed, so the control projection is removed, and `extract_projections` doesn't like it. > > > Thanks, > Marc This pull request has now been integrated. Changeset: 00a038e9 Author: Marc Chevalier Committer: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/00a038e9c559401b7934f30b4719010bb1024291 Stats: 95 lines in 2 files changed: 93 ins; 0 del; 2 mod 8353341: C2: removal of a Mod[DF]Node crashes when the node is already dead Reviewed-by: thartmann, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/24375 From jbhateja at openjdk.org Thu Apr 3 08:47:01 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 3 Apr 2025 08:47:01 GMT Subject: RFR: 8352585: Add special case handling for Float16.max/min x86 backend [v5] In-Reply-To: References: Message-ID: <26OsQVzaWO4t7wkKEnbhxfXerfizKktb-EX3ncNzBKE=.81aabcbe-58ea-4d4e-9c0f-84db4759b676@github.com> On Wed, 2 Apr 2025 06:31:20 GMT, Emanuel Peter wrote: >> Hi @eme64 , >> This specific issues is around special Float16 values i.e +/- 0.0 and NaN. >> I have added a Generator for Float16 as part of https://github.com/openjdk/jdk/pull/22755 >> >> Best Regards, >> Jatin > > @jatin-bhateja It looks reasonable to me now. Let me run some testing, ping me in 24h for the results! 
ping @eme64, kindly approve if your tests are all green. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24169#issuecomment-2774916278 From chagedorn at openjdk.org Thu Apr 3 08:58:49 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 3 Apr 2025 08:58:49 GMT Subject: RFR: 8353359: C2: Or(I|L)Node::Ideal is missing AddNode::Ideal call [v2] In-Reply-To: <4qrmlvl4NrkGCFYYwkvzbmjQwAlIZhpyFHiol3m3NpY=.5dc8822d-3185-4569-8352-965894ba0149@github.com> References: <4qrmlvl4NrkGCFYYwkvzbmjQwAlIZhpyFHiol3m3NpY=.5dc8822d-3185-4569-8352-965894ba0149@github.com> Message-ID: On Thu, 3 Apr 2025 07:52:21 GMT, Hannes Greule wrote: >> Hi, >> >> this simple change adds a missing AddNode::Ideal call to Or(I|L)Node::Ideal. See the added tests for examples of optimizations that don't apply without this change. >> >> Please let me know what you think. > > Hannes Greule has updated the pull request incrementally with two additional commits since the last revision: > > - update license year > - add bug id Indeed, good catch! Very hard to find these bugs but I'm afraid there is, unfortunately, not much we can do about it instead of having more IR tests to catch these cases. > I'm wondering now if there should rather be something like an `AddNodeIdealizationTests.java` that contains the optimizations of AddNode::ideal for all(*) its subtypes rather than more specific test classes testing a mix of optimizations from different Ideal methods (e.g., `AddINodeIdealizationTests.java` has a test for `(x + 1) + 2 => x + 3`). I would recommend to split it up more to easier find the tests again. I would probably first search for a `Or*Tests.java` instead of looking into `Add*Tests.java` when checking tests for `OrINode`. We can still group together multiple nodes if they only differ in the basic types like `AddI` and `AddL`. But this can still be discussed when such tests are added, which I totally agree with you we should have. 
Adding tests can be done incrementally and even in small or "not yet completely covering a node with many transformation" batches. > We could systematically cover Value, Ideal and Identity, for every single node That would be great! > Because the tedious work is capturing all the patterns, and writing IR rules. What might help here is when the documentation for the `Ideal()`, `Identity()`, and `Value()` methods would enumerate the different optimizations which we want to check and add the numbers to the method body where it's implemented. That would not only help to map documentation to code and spot potentially missing/wrong promises but also helps with writing clearly map-able tests: We can simply write `testAddINodeCase3b()`, for example instead of `testAddISomeHardToMapOptimizationName()`. The only downside of that: when we change the enumerations, the tests are no longer in sync. But I guess that's an okay price to pay. > But that is work for some separate RFE's Definitely! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24348#pullrequestreview-2739029429 From epeter at openjdk.org Thu Apr 3 09:05:56 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 3 Apr 2025 09:05:56 GMT Subject: RFR: 8352585: Add special case handling for Float16.max/min x86 backend [v7] In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 16:17:22 GMT, Jatin Bhateja wrote: >> This bugfix patch adds the special handling as per x86 AVX512-FP16 ISA specification[1][2] to compute max/min operations with +/-0.0 or NaN operands. >> >> Special handling leverage the instruction semantic, central idea is to shuffle the operands such that smaller input gets assigned to second operand for min operation or a larger input gets assigned to second operand for max operation, in addition result equals NaN if an unordered comparison detects first input as a NaN value else we return the result of min/max operation. 
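The operand shuffle plus NaN fix-up described in the paragraph above can be modelled in plain Java. This is a sketch under several assumptions: it uses `float` instead of Float16, `hwMax` is a hypothetical model of the scalar max rule (the second operand is returned when either input is NaN or both inputs are zeros), and selecting the operands by the sign bit of `b` is just one way to realise the shuffle; the actual patch may pick operands differently. All names are illustrative.

```java
// Hypothetical model, not the HotSpot macro-assembler code.
final class Float16MaxSketch {
    // Models the hardware rule: if either input is NaN, or both are zeros of
    // any sign, the second operand is returned; otherwise the larger value.
    static float hwMax(float src1, float src2) {
        if (Float.isNaN(src1) || Float.isNaN(src2)) return src2;
        if (src1 == 0.0f && src2 == 0.0f) return src2;
        return src1 > src2 ? src1 : src2;
    }

    // Java-style max built on top of hwMax.
    static float max(float a, float b) {
        // Shuffle: when b has its sign bit set, a becomes the second operand,
        // so +0.0 wins over -0.0 in the "both zeros" case.
        boolean bSignSet = Float.floatToRawIntBits(b) < 0;
        float first = bSignSet ? b : a;
        float second = bSignSet ? a : b;
        float result = hwMax(first, second);
        // Fix-up: a NaN in the first operand is not propagated by hwMax,
        // so an unordered check on it forces a NaN result.
        return Float.isNaN(first) ? Float.NaN : result;
    }

    public static void main(String[] args) {
        System.out.println(max(-0.0f, 0.0f));
        System.out.println(max(Float.NaN, 1.0f));
        System.out.println(max(1.0f, Float.NaN));
    }
}
```

The min case would mirror this with the smaller input routed to the second operand.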
>> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin >> >> [1] https://www.felixcloutier.com/x86/vminsh >> [2] https://www.felixcloutier.com/x86/vmaxsh > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolution Testing is green :) src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 7108: > 7106: // dst = max(xtmp1, xtmp2) > 7107: vmaxsh(dst, xtmp1, xtmp2); > 7108: // isNaN = is_unordered_quite(xtmp1) Suggestion: // isNaN = is_unordered_quiet(xtmp1) Does the Q stand for quiet or quite? src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 7129: > 7127: // dst = min(xtmp1, xtmp2) > 7128: vminsh(dst, xtmp1, xtmp2); > 7129: // isNaN = is_unordered_quite(xtmp1) Suggestion: // isNaN = is_unordered_quiet(xtmp1) ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24169#pullrequestreview-2739047288 PR Review Comment: https://git.openjdk.org/jdk/pull/24169#discussion_r2026532906 PR Review Comment: https://git.openjdk.org/jdk/pull/24169#discussion_r2026533225 From jbhateja at openjdk.org Thu Apr 3 09:25:37 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 3 Apr 2025 09:25:37 GMT Subject: RFR: 8352585: Add special case handling for Float16.max/min x86 backend [v8] In-Reply-To: References: Message-ID: > This bugfix patch adds the special handling as per x86 AVX512-FP16 ISA specification[1][2] to compute max/min operations with +/-0.0 or NaN operands. > > Special handling leverage the instruction semantic, central idea is to shuffle the operands such that smaller input gets assigned to second operand for min operation or a larger input gets assigned to second operand for max operation, in addition result equals NaN if an unordered comparison detects first input as a NaN value else we return the result of min/max operation. > > Kindly review and share your feedback. 
> > Best Regards, > Jatin > > [1] https://www.felixcloutier.com/x86/vminsh > [2] https://www.felixcloutier.com/x86/vmaxsh Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Type fixes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24169/files - new: https://git.openjdk.org/jdk/pull/24169/files/1713057d..0ff84455 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24169&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24169&range=06-07 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24169.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24169/head:pull/24169 PR: https://git.openjdk.org/jdk/pull/24169 From epeter at openjdk.org Thu Apr 3 09:25:37 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 3 Apr 2025 09:25:37 GMT Subject: RFR: 8352585: Add special case handling for Float16.max/min x86 backend [v8] In-Reply-To: References: Message-ID: <789JJASgNvA9D9Fbdgp9p2DO6KOSoU0Uro08KX_QuLk=.61045de2-98f8-47cd-9421-9f161feb30bd@github.com> On Thu, 3 Apr 2025 09:22:23 GMT, Jatin Bhateja wrote: >> This bugfix patch adds the special handling as per x86 AVX512-FP16 ISA specification[1][2] to compute max/min operations with +/-0.0 or NaN operands. >> >> Special handling leverage the instruction semantic, central idea is to shuffle the operands such that smaller input gets assigned to second operand for min operation or a larger input gets assigned to second operand for max operation, in addition result equals NaN if an unordered comparison detects first input as a NaN value else we return the result of min/max operation. >> >> Kindly review and share your feedback. 
>> >> Best Regards, >> Jatin >> >> [1] https://www.felixcloutier.com/x86/vminsh >> [2] https://www.felixcloutier.com/x86/vmaxsh > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Type fixes Marked as reviewed by epeter (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24169#pullrequestreview-2739096032 From jbhateja at openjdk.org Thu Apr 3 09:25:37 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 3 Apr 2025 09:25:37 GMT Subject: RFR: 8352585: Add special case handling for Float16.max/min x86 backend [v5] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 06:31:20 GMT, Emanuel Peter wrote: >> Hi @eme64 , >> This specific issues is around special Float16 values i.e +/- 0.0 and NaN. >> I have added a Generator for Float16 as part of https://github.com/openjdk/jdk/pull/22755 >> >> Best Regards, >> Jatin > > @jatin-bhateja It looks reasonable to me now. Let me run some testing, ping me in 24h for the results! Thanks, @eme64 and @sviswa7, for the reviews and approvals. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24169#issuecomment-2775029236 From jbhateja at openjdk.org Thu Apr 3 09:25:38 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 3 Apr 2025 09:25:38 GMT Subject: RFR: 8352585: Add special case handling for Float16.max/min x86 backend [v7] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 09:01:57 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments resolution > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 7129: > >> 7127: // dst = min(xtmp1, xtmp2) >> 7128: vminsh(dst, xtmp1, xtmp2); >> 7129: // isNaN = is_unordered_quite(xtmp1) > > Suggestion: > > // isNaN = is_unordered_quiet(xtmp1) Typo fixed. 
Thanks Needs re-approval :-) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24169#discussion_r2026558934 From jbhateja at openjdk.org Thu Apr 3 09:25:38 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 3 Apr 2025 09:25:38 GMT Subject: Integrated: 8352585: Add special case handling for Float16.max/min x86 backend In-Reply-To: References: Message-ID: On Fri, 21 Mar 2025 20:20:24 GMT, Jatin Bhateja wrote: > This bugfix patch adds the special handling as per x86 AVX512-FP16 ISA specification[1][2] to compute max/min operations with +/-0.0 or NaN operands. > > Special handling leverage the instruction semantic, central idea is to shuffle the operands such that smaller input gets assigned to second operand for min operation or a larger input gets assigned to second operand for max operation, in addition result equals NaN if an unordered comparison detects first input as a NaN value else we return the result of min/max operation. > > Kindly review and share your feedback. > > Best Regards, > Jatin > > [1] https://www.felixcloutier.com/x86/vminsh > [2] https://www.felixcloutier.com/x86/vmaxsh This pull request has now been integrated. Changeset: f7a94fee Author: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/f7a94feedd63775a09d0bcb9ef3313972e2a5d69 Stats: 260 lines in 6 files changed: 254 ins; 6 del; 0 mod 8352585: Add special case handling for Float16.max/min x86 backend Reviewed-by: epeter, sviswanathan ------------- PR: https://git.openjdk.org/jdk/pull/24169 From mchevalier at openjdk.org Thu Apr 3 09:34:42 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 3 Apr 2025 09:34:42 GMT Subject: RFR: 8353345: C2 asserts because maskShiftAmount modifies node without deleting the hash [v3] In-Reply-To: References: Message-ID: > First delete the hash, then `set_req`. This way, we avoid changing the node (a non-`this` node) without deleting the hash. 
This wrong ordering is not new from [JDK-8347459](https://bugs.openjdk.org/browse/JDK-8347459), but before that, only `this` was going through this function, so it was ok. But since, it is used with other nodes, hence the need to remove the hash. > > Also, not do any of that outside IGVN, but requires to register nested shifts for IGVN in parsing not to miss them later. > > Thanks, > Marc Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: address review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24355/files - new: https://git.openjdk.org/jdk/pull/24355/files/d84b3d6d..7c9ec24a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24355&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24355&range=01-02 Stats: 25 lines in 1 file changed: 0 ins; 2 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/24355.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24355/head:pull/24355 PR: https://git.openjdk.org/jdk/pull/24355 From mchevalier at openjdk.org Thu Apr 3 09:34:45 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 3 Apr 2025 09:34:45 GMT Subject: RFR: 8353345: C2 asserts because maskShiftAmount modifies node without deleting the hash [v2] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 05:09:29 GMT, Christian Hagedorn wrote: >> Marc Chevalier has updated the pull request incrementally with two additional commits since the last revision: >> >> - Fix spacing >> - Do not eagerly replace shift amounts in nested lshift > > src/hotspot/share/opto/mulnode.cpp line 953: > >> 951: } >> 952: >> 953: //============================================================================= > > While at it, you can also remove this line which we no longer use today > Suggestion: Done. > src/hotspot/share/opto/mulnode.cpp line 995: > >> 993: if (igvn != nullptr) { >> 994: igvn->rehash_node_delayed(shiftNode); >> 995: } > > Do we still need this now? 
If we always call it with `shiftNode == this` then we already get the rehashing "for free" due to modifying `this` as part of `Ideal()`. As discussed, do not remove hash (useless now), but enqueue in worklist. > src/hotspot/share/opto/mulnode.cpp line 1007: > >> 1005: // outer_shift = (_ << rhs0) >> 1006: // We are looking for the pattern: >> 1007: // outer_shift = ((X << rhs1) << rhs0) > > Just an idea: To better keep track of what is the outer and inner rhs, we could use `rhs_inner` and `rhs_outer`. good idea! > src/hotspot/share/opto/mulnode.cpp line 1010: > >> 1008: // where rhs0 and rhs1 are constant >> 1009: // we denote inner_shift the nested expression (X << rhs1) >> 1010: // con0 = rhs1 % nbits and con0 = rhs1 % nbits > > Probably copy-paste error, did you want to define `con1` here as well? indeed. redone with new notations. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24355#discussion_r2026584728 PR Review Comment: https://git.openjdk.org/jdk/pull/24355#discussion_r2026585502 PR Review Comment: https://git.openjdk.org/jdk/pull/24355#discussion_r2026585778 PR Review Comment: https://git.openjdk.org/jdk/pull/24355#discussion_r2026586168 From epeter at openjdk.org Thu Apr 3 09:39:52 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 3 Apr 2025 09:39:52 GMT Subject: RFR: 8349563: Improve AbsNode::Value() for integer types [v2] In-Reply-To: References: Message-ID: On Thu, 27 Feb 2025 02:38:09 GMT, Dean Long wrote: >> Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: >> >> - Merge >> - Improve AbsNode::Value > > src/hotspot/share/opto/subnode.cpp line 1938: > >> 1936: >> 1937: NativeType lo_abs = uabs(t->_lo); >> 1938: NativeType hi_abs = uabs(t->_hi); > > Converting unsigned to signed is C++ Undefined Behavior, is it not? @dean-long ? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23685#discussion_r2026597497 From duke at openjdk.org Thu Apr 3 09:44:07 2025 From: duke at openjdk.org (David Linus Briemann) Date: Thu, 3 Apr 2025 09:44:07 GMT Subject: RFR: 8352972: PPC64: Intrinsify Unsafe::setMemory [v3] In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 16:36:10 GMT, Martin Doerr wrote: >> Similar to the x86 implementation. The non-product feature for counting things like `SharedRuntime::_unsafe_set_memory_ctr` is currently not supported on PPC64. I've left it commented out. >> >> Before this patch (measured on Power10): >>
>> Benchmark                       (aligned)  (size)  Mode  Cnt   Score   Error  Units
>> MemorySegmentZeroUnsafe.panama       true       1  avgt   30  15.048 ± 0.095  ns/op
>> MemorySegmentZeroUnsafe.panama       true       2  avgt   30  15.054 ± 0.089  ns/op
>> MemorySegmentZeroUnsafe.panama       true       3  avgt   30  15.161 ± 0.089  ns/op
>> MemorySegmentZeroUnsafe.panama       true       4  avgt   30  15.147 ± 0.082  ns/op
>> MemorySegmentZeroUnsafe.panama       true       5  avgt   30  15.198 ± 0.089  ns/op
>> MemorySegmentZeroUnsafe.panama       true       6  avgt   30  15.128 ± 0.099  ns/op
>> MemorySegmentZeroUnsafe.panama       true       7  avgt   30  19.234 ± 0.148  ns/op
>> MemorySegmentZeroUnsafe.panama       true       8  avgt   30  15.060 ± 0.090  ns/op
>> MemorySegmentZeroUnsafe.panama       true      15  avgt   30  19.229 ± 0.171  ns/op
>> MemorySegmentZeroUnsafe.panama       true      16  avgt   30  15.030 ± 0.082  ns/op
>> MemorySegmentZeroUnsafe.panama       true      63  avgt   30  85.290 ± 0.431  ns/op
>> MemorySegmentZeroUnsafe.panama       true      64  avgt   30  84.273 ± 0.843  ns/op
>> MemorySegmentZeroUnsafe.panama       true     255  avgt   30  89.551 ± 0.706  ns/op
>> MemorySegmentZeroUnsafe.panama       true     256  avgt   30  87.736 ± 0.679  ns/op
>> MemorySegmentZeroUnsafe.panama      false       1  avgt   30  15.044 ± 0.073  ns/op
>> MemorySegmentZeroUnsafe.panama      false       2  avgt   30  14.980 ± 0.058  ns/op
>> MemorySegmentZeroUnsafe.panama      false       3  avgt   30  15.138 ± 0.126  ns/op
>> MemorySegmentZeroUnsafe.panama      false       4  avgt   30  15.025 ± 0.049  ns/op
>> MemorySegmentZeroUnsafe.panama      false       5  avgt   30  15.192 ± 0.118  ns/op
>> MemorySegmentZeroUnsafe.panama      false       6  avgt   30  15.464 ± 0.667  ns/op
>> MemorySegmentZeroUnsafe.panama      false       7  avgt   30  19.179 ± 0.143  ns/op
>> MemorySegmentZeroUnsafe.panama      false       8  avgt   30  15.278 ± 0.130  ns/op
>> MemorySegmentZeroUnsafe.panama      false      15  avgt   30  19.428 ± 0.146  ns/op
>> MemorySegmentZeroUnsafe.panama      false      16  avgt   30  18.011 ± 1.233  ns/op
>> MemorySegmentZeroUnsafe.panama      false      63  avgt   30  87.090 ± 0.989  ns/op
>> MemorySegmentZeroUnsaf... > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Simplify usage of UnsafeMemoryAccessMark. LGTM ------------- Marked as reviewed by dbriemann at github.com (no known OpenJDK username). PR Review: https://git.openjdk.org/jdk/pull/24254#pullrequestreview-2739162483 From mchevalier at openjdk.org Thu Apr 3 10:28:48 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 3 Apr 2025 10:28:48 GMT Subject: RFR: 8353345: C2 asserts because maskShiftAmount modifies node without deleting the hash [v3] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 09:34:42 GMT, Marc Chevalier wrote: >> First delete the hash, then `set_req`. This way, we avoid changing the node (a non-`this` node) without deleting the hash. This wrong ordering is not new from [JDK-8347459](https://bugs.openjdk.org/browse/JDK-8347459), but before that, only `this` was going through this function, so it was ok. But since, it is used with other nodes, hence the need to remove the hash. >> >> Also, not do any of that outside IGVN, but requires to register nested shifts for IGVN in parsing not to miss them later. >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > address review comments Requested changes done! Ready for more review.
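The "first delete the hash, then `set_req`" ordering from the quoted description has a close analogy in any hash-based container: changing a field that feeds the hash while the object is still stored leaves a stale entry behind. The following is only an analogy in plain Java, not C2 code; `RehashSketch`, `Node`, and both helper methods are hypothetical names.

```java
import java.util.HashSet;
import java.util.Set;

// Analogy only: a tiny "node" whose hash depends on its single input.
final class RehashSketch {
    static final class Node {
        int input;
        Node(int input) { this.input = input; }
        @Override public boolean equals(Object o) {
            return (o instanceof Node) && ((Node) o).input == input;
        }
        @Override public int hashCode() { return Integer.hashCode(input); }
    }

    // Wrong order: the input changes while the node is still in the table,
    // so the table keeps it filed under the old hash.
    static boolean mutateInPlace(Set<Node> table, Node n, int newInput) {
        n.input = newInput;
        return table.contains(n);
    }

    // Right order: remove the entry, change the input, then insert again.
    static boolean mutateWithRehash(Set<Node> table, Node n, int newInput) {
        table.remove(n);
        n.input = newInput;
        table.add(n);
        return table.contains(n);
    }

    public static void main(String[] args) {
        Node a = new Node(1);
        Set<Node> t1 = new HashSet<>();
        t1.add(a);
        System.out.println("found after in-place change: " + mutateInPlace(t1, a, 2));

        Node b = new Node(1);
        Set<Node> t2 = new HashSet<>();
        t2.add(b);
        System.out.println("found after remove/change/add: " + mutateWithRehash(t2, b, 2));
    }
}
```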
------------- PR Comment: https://git.openjdk.org/jdk/pull/24355#issuecomment-2775259754 From chagedorn at openjdk.org Thu Apr 3 10:56:05 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 3 Apr 2025 10:56:05 GMT Subject: RFR: 8353345: C2 asserts because maskShiftAmount modifies node without deleting the hash [v3] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 09:34:42 GMT, Marc Chevalier wrote: >> First delete the hash, then `set_req`. This way, we avoid changing the node (a non-`this` node) without deleting the hash. This wrong ordering is not new from [JDK-8347459](https://bugs.openjdk.org/browse/JDK-8347459), but before that, only `this` was going through this function, so it was ok. But since, it is used with other nodes, hence the need to remove the hash. >> >> Also, not do any of that outside IGVN, but requires to register nested shifts for IGVN in parsing not to miss them later. >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > address review comments That looks good to me, thanks for all the updates! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24355#pullrequestreview-2739400194 From qamai at openjdk.org Thu Apr 3 10:59:56 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 3 Apr 2025 10:59:56 GMT Subject: RFR: 8349563: Improve AbsNode::Value() for integer types [v2] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 09:37:26 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/subnode.cpp line 1938: >> >>> 1936: >>> 1937: NativeType lo_abs = uabs(t->_lo); >>> 1938: NativeType hi_abs = uabs(t->_hi); >> >> Converting unsigned to signed is C++ Undefined Behavior, is it not? > > @dean-long ? No converting unsigned to signed is not UB, the behaviour is the same as in Java. 
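A small Java sketch of the point made in the reply above, not code from the PR. `uabs` here is a hypothetical helper that only borrows its name from the quoted C++ snippet: it returns the magnitude as an unsigned 32-bit value held in a `long`, and the narrowing cast back to `int` wraps modulo 2^32. For reference, to my knowledge the corresponding out-of-range unsigned-to-signed conversion in C++ is implementation-defined before C++20 and modular from C++20 on, so it is not undefined behavior in either case.

```java
public class UnsignedAbsDemo {
    // Hypothetical helper: magnitude of x as an unsigned 32-bit value stored in a long.
    static long uabs(int x) {
        return x < 0 ? -(long) x : (long) x;
    }

    public static void main(String[] args) {
        long magnitude = uabs(Integer.MIN_VALUE); // 2147483648, one past Integer.MAX_VALUE
        int narrowed = (int) magnitude;           // narrowing keeps the low 32 bits

        System.out.println(magnitude);                          // 2147483648
        System.out.println(narrowed);                           // -2147483648
        System.out.println(Integer.toUnsignedString(narrowed)); // 2147483648
    }
}
```

The only input whose magnitude does not fit in a signed `int` is `Integer.MIN_VALUE`, and it maps back to itself after narrowing.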
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23685#discussion_r2026748511 From thartmann at openjdk.org Thu Apr 3 11:12:06 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 3 Apr 2025 11:12:06 GMT Subject: RFR: 8353345: C2 asserts because maskShiftAmount modifies node without deleting the hash [v3] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 09:34:42 GMT, Marc Chevalier wrote: >> First delete the hash, then `set_req`. This way, we avoid changing the node (a non-`this` node) without deleting the hash. This wrong ordering is not new from [JDK-8347459](https://bugs.openjdk.org/browse/JDK-8347459), but before that, only `this` was going through this function, so it was ok. But since, it is used with other nodes, hence the need to remove the hash. >> >> Also, not do any of that outside IGVN, but requires to register nested shifts for IGVN in parsing not to miss them later. >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > address review comments Nice refactoring! Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24355#pullrequestreview-2739459799 From hgreule at openjdk.org Thu Apr 3 11:37:07 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Thu, 3 Apr 2025 11:37:07 GMT Subject: RFR: 8353359: C2: Or(I|L)Node::Ideal is missing AddNode::Ideal call [v2] In-Reply-To: <4qrmlvl4NrkGCFYYwkvzbmjQwAlIZhpyFHiol3m3NpY=.5dc8822d-3185-4569-8352-965894ba0149@github.com> References: <4qrmlvl4NrkGCFYYwkvzbmjQwAlIZhpyFHiol3m3NpY=.5dc8822d-3185-4569-8352-965894ba0149@github.com> Message-ID: On Thu, 3 Apr 2025 07:52:21 GMT, Hannes Greule wrote: >> Hi, >> >> this simple change adds a missing AddNode::Ideal call to Or(I|L)Node::Ideal. See the added tests for examples of optimizations that don't apply without this change. >> >> Please let me know what you think. 
> > Hannes Greule has updated the pull request incrementally with two additional commits since the last revision: > > - update license year > - add bug id Thank you for your reviews and comments :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24348#issuecomment-2775462863 From epeter at openjdk.org Thu Apr 3 11:37:07 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 3 Apr 2025 11:37:07 GMT Subject: RFR: 8353359: C2: Or(I|L)Node::Ideal is missing AddNode::Ideal call [v2] In-Reply-To: References: <4qrmlvl4NrkGCFYYwkvzbmjQwAlIZhpyFHiol3m3NpY=.5dc8822d-3185-4569-8352-965894ba0149@github.com> Message-ID: On Thu, 3 Apr 2025 11:31:28 GMT, Hannes Greule wrote: >> Hannes Greule has updated the pull request incrementally with two additional commits since the last revision: >> >> - update license year >> - add bug id > > Thank you for your reviews and comments :) @SirYwell Thanks again for the work and all the updates :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24348#issuecomment-2775469156 From hgreule at openjdk.org Thu Apr 3 11:37:08 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Thu, 3 Apr 2025 11:37:08 GMT Subject: Integrated: 8353359: C2: Or(I|L)Node::Ideal is missing AddNode::Ideal call In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 06:20:48 GMT, Hannes Greule wrote: > Hi, > > this simple change adds a missing AddNode::Ideal call to Or(I|L)Node::Ideal. See the added tests for examples of optimizations that don't apply without this change. > > Please let me know what you think. This pull request has now been integrated. 
Changeset: 3ceabf0f Author: Hannes Greule Committer: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/3ceabf0f647beb4943c06709aa8797f7511cd48e Stats: 41 lines in 3 files changed: 33 ins; 0 del; 8 mod 8353359: C2: Or(I|L)Node::Ideal is missing AddNode::Ideal call Reviewed-by: epeter, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/24348 From dlunden at openjdk.org Thu Apr 3 11:41:55 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Thu, 3 Apr 2025 11:41:55 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v13] In-Reply-To: References: Message-ID: > If a method has a large number of parameters, we currently bail out from C2 compilation. > > ### Changeset > > Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes. > > Changes: > - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed. > - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable. > - Remove all `can_represent` checks and bailouts. > - Performance tuning. 
A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`. > - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`. > > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450) > - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. > - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement. > - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, not worth it). > > ![c2-regression](https:/... Daniel Lund?n has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 20 commits: - Updates after comments - Tag short-lived register mask arena - Merge remote-tracking branch 'upstream/master' into many-arguments-8325467 - Formatting updates - Add register mask fuzzer test - Extend example with offset register mask - Remove accidental leftover #endif - Update - Fix trailing whitespace - Merge remote-tracking branch 'upstream/master' into many-arguments-8325467 - ... 
and 10 more: https://git.openjdk.org/jdk/compare/a1ab1d8d...76f6b8f8 ------------- Changes: https://git.openjdk.org/jdk/pull/20404/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20404&range=12 Stats: 12649 lines in 31 files changed: 12306 ins; 90 del; 253 mod Patch: https://git.openjdk.org/jdk/pull/20404.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20404/head:pull/20404 PR: https://git.openjdk.org/jdk/pull/20404 From dlunden at openjdk.org Thu Apr 3 11:46:10 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Thu, 3 Apr 2025 11:46:10 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v11] In-Reply-To: <-pdjdg9OQRB7YaXNFiVeVseLEoJDZb2XkMk0ml3pm3w=.2ecb257e-5618-4763-90e5-a2b1d0758e67@github.com> References: <0Yf6qZwnLz7oAtSFscDwHifQAmaPuHzeSrpkqMVchDU=.c7a5e8af-9390-414b-850c-609110668eac@github.com> <-pdjdg9OQRB7YaXNFiVeVseLEoJDZb2XkMk0ml3pm3w=.2ecb257e-5618-4763-90e5-a2b1d0758e67@github.com> Message-ID: On Mon, 31 Mar 2025 13:20:12 GMT, Roberto Castañeda Lozano wrote: >> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: >> >> Extend example with offset register mask > > src/hotspot/share/opto/optoreg.hpp line 237: > >> 235: } >> 236: OptoRegPair(OptoReg::Name f) : OptoRegPair(OptoReg::Bad, f) {} >> 237: OptoRegPair() : OptoRegPair(OptoReg::Bad, OptoReg::Bad) {} > > This is preexisting, but since the changeset touches the code: these two "partial" constructors seem unused, please consider removing them (but double-check in that case that they are unused for all platforms). Thanks, removed (and double-checked usage) > src/hotspot/share/opto/regmask.hpp line 545: > >> 543: >> 544: // Overlap test. Non-zero if any registers in common, including all-stack. >> 545: bool overlap(const RegMask &rm) const { > > Please review the frequency of the different tests in this function.
I ran an instrumented version and found the test in Case 4 to succeed (return true) more often that Case 2 and Case 3. Thanks, I made a note to run some benchmarks for this and gather statistics. It is critical that we run case 1 first (results in a significant performance gain), but perhaps we can gain a little by ordering the rare cases as well. > src/hotspot/share/utilities/globalDefinitions.hpp line 1363: > >> 1361: // synchronized statements in Java. >> 1362: const int BoxLockNode_slot_limit = 200; >> 1363: > > This definition seems too C2-specific to be put in this shared file, could it be moved e.g. to `optoreg.hpp`? Thanks, I was unsure where to put this definition. It doesn't really relate to `OptoReg` and is rather a limitation for `RegMask`s, so I now simply put it as a constant in `regmask.hpp`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2026826903 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2026826440 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2026823410 From thartmann at openjdk.org Thu Apr 3 12:01:59 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 3 Apr 2025 12:01:59 GMT Subject: RFR: 8346989: C2: deoptimization and re-compilation cycle with Math.*Exact in case of frequent overflow [v6] In-Reply-To: References: <8ACplaVM_gN9cbIcQYGJmR4GNINm70PAJQ8uAgucK4Y=.14fdc7e2-e0af-4f0d-acb6-bcfe99ee8f36@github.com> Message-ID: On Wed, 2 Apr 2025 17:23:03 GMT, Marc Chevalier wrote: >> `Math.*Exact` intrinsics can cause many deopt when used repeatedly with problematic arguments. >> This fix proposes not to rely on intrinsics after `too_many_traps()` has been reached. >> >> Benchmark show that this issue affects every Math.*Exact functions. And this fix improve them all. >> >> tl;dr: >> - C1: no problem, no change >> - C2: >> - with intrinsics: >> - with overflow: clear improvement. 
Was way worse than C1, now is similar (~4s => ~600ms) >> - without overflow: no problem, no change >> - without intrinsics: no problem, no change >> >> Before the fix: >> >> Benchmark (SIZE) Mode Cnt Score Error Units >> MathExact.C1_1.loopAddIInBounds 1000000 avgt 3 1.272 ? 0.048 ms/op >> MathExact.C1_1.loopAddIOverflow 1000000 avgt 3 641.917 ? 58.238 ms/op >> MathExact.C1_1.loopAddLInBounds 1000000 avgt 3 1.402 ? 0.842 ms/op >> MathExact.C1_1.loopAddLOverflow 1000000 avgt 3 671.013 ? 229.425 ms/op >> MathExact.C1_1.loopDecrementIInBounds 1000000 avgt 3 3.722 ? 22.244 ms/op >> MathExact.C1_1.loopDecrementIOverflow 1000000 avgt 3 653.341 ? 279.003 ms/op >> MathExact.C1_1.loopDecrementLInBounds 1000000 avgt 3 2.525 ? 0.810 ms/op >> MathExact.C1_1.loopDecrementLOverflow 1000000 avgt 3 656.750 ? 141.792 ms/op >> MathExact.C1_1.loopIncrementIInBounds 1000000 avgt 3 4.621 ? 12.822 ms/op >> MathExact.C1_1.loopIncrementIOverflow 1000000 avgt 3 651.608 ? 274.396 ms/op >> MathExact.C1_1.loopIncrementLInBounds 1000000 avgt 3 2.576 ? 3.316 ms/op >> MathExact.C1_1.loopIncrementLOverflow 1000000 avgt 3 662.216 ? 71.879 ms/op >> MathExact.C1_1.loopMultiplyIInBounds 1000000 avgt 3 1.402 ? 0.587 ms/op >> MathExact.C1_1.loopMultiplyIOverflow 1000000 avgt 3 615.836 ? 252.137 ms/op >> MathExact.C1_1.loopMultiplyLInBounds 1000000 avgt 3 2.906 ? 5.718 ms/op >> MathExact.C1_1.loopMultiplyLOverflow 1000000 avgt 3 655.576 ? 147.432 ms/op >> MathExact.C1_1.loopNegateIInBounds 1000000 avgt 3 2.023 ? 0.027 ms/op >> MathExact.C1_1.loopNegateIOverflow 1000000 avgt 3 639.136 ? 30.841 ms/op >> MathExact.C1_1.loop... > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > fix typo in comment Took me a while to parse the code but the refactoring definitely improves the situation :slightly_smiling_face: Looks good! ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23916#pullrequestreview-2739594615 From duke at openjdk.org Thu Apr 3 12:10:11 2025 From: duke at openjdk.org (duke) Date: Thu, 3 Apr 2025 12:10:11 GMT Subject: RFR: 8353345: C2 asserts because maskShiftAmount modifies node without deleting the hash [v3] In-Reply-To: References: Message-ID: <-sKm43ONlJn2YNNCUxXjG3p8xL10UGc7m7CF07P-uhA=.cc4d78be-01e7-42d7-b4d7-ad751312745e@github.com> On Thu, 3 Apr 2025 09:34:42 GMT, Marc Chevalier wrote: >> First delete the hash, then `set_req`. This way, we avoid changing the node (a non-`this` node) without deleting the hash. This wrong ordering is not new from [JDK-8347459](https://bugs.openjdk.org/browse/JDK-8347459), but before that, only `this` was going through this function, so it was ok. But since, it is used with other nodes, hence the need to remove the hash. >> >> Also, not do any of that outside IGVN, but requires to register nested shifts for IGVN in parsing not to miss them later. >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > address review comments @marc-chevalier Your change (at version 7c9ec24aa81df185e4b5b672d4a92e3a3f2b985f) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24355#issuecomment-2775579848 From mchevalier at openjdk.org Thu Apr 3 12:10:09 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 3 Apr 2025 12:10:09 GMT Subject: RFR: 8353345: C2 asserts because maskShiftAmount modifies node without deleting the hash [v3] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 09:34:42 GMT, Marc Chevalier wrote: >> First delete the hash, then `set_req`. This way, we avoid changing the node (a non-`this` node) without deleting the hash. This wrong ordering is not new from [JDK-8347459](https://bugs.openjdk.org/browse/JDK-8347459), but before that, only `this` was going through this function, so it was ok. 
But since, it is used with other nodes, hence the need to remove the hash. >> >> Also, not do any of that outside IGVN, but requires to register nested shifts for IGVN in parsing not to miss them later. >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > address review comments Thanks @TobiHartmann and @chhagedorn for reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24355#issuecomment-2775578176 From swen at openjdk.org Thu Apr 3 12:19:52 2025 From: swen at openjdk.org (Shaojin Wen) Date: Thu, 3 Apr 2025 12:19:52 GMT Subject: RFR: 8352316: More MergeStoreBench [v7] In-Reply-To: <-xsgQ8uhc8vksHhI4Elu3SwNqy8GEQdzCdB3SAsPQa0=.9ef939ee-6359-40cb-8663-dabaad6611b6@github.com> References: <5fLeODHTQw8vbuvTl6G0YPNszI5_tH1b3L_tWJtCTh8=.ca1b21f2-2890-4daa-8ce2-8112a3f7146b@github.com> <-xsgQ8uhc8vksHhI4Elu3SwNqy8GEQdzCdB3SAsPQa0=.9ef939ee-6359-40cb-8663-dabaad6611b6@github.com> Message-ID: <5Pr4WqnsBZrOfnqUWe-FSZ5UBkkGx5ghH113Jw1eO1Y=.31275627-2eb7-408f-a73f-ff974d993ea4@github.com> On Sat, 29 Mar 2025 07:43:32 GMT, Shaojin Wen wrote: >> Shaojin Wen has updated the pull request incrementally with one additional commit since the last revision: >> >> add StringBuilderUnsafePut > > I added a new scenario `StringBuilderUnsafePut`, using Unsafe to modify StringBuilder directly to implement append constants. > > The performance numbers below show that ArraySetConst/StringBuilderUnsafePut/UnsafePut have better performance. > > These numbers show that Stable Value's arraycopy has great performance optimization potential, which is worth more optimization for C2. > > # 1. Scipt > > git remote add wenshao git at github.com:wenshao/jdk.git > git fetch wenshao > git checkout cd1d8fb3b137a741446c894d1893e7180535ce8f > make test TEST="micro:vm.compiler.MergeStoreBench.str" > > > # 2. aliyun_ecs_c8a_x64 (CPU AMD EPYC? 
Genoa) > > Benchmark Mode Cnt Score Error Units > MergeStoreBench.str4ArraySetConst avgt 5 1338.414 ? 3.209 ns/op > MergeStoreBench.str4Arraycopy avgt 5 7271.203 ? 19.400 ns/op > MergeStoreBench.str4GetBytes avgt 5 6154.684 ? 9.910 ns/op > MergeStoreBench.str4GetChars avgt 5 14078.790 ? 59.175 ns/op > MergeStoreBench.str4StringBuilder avgt 5 15766.528 ? 4634.119 ns/op > MergeStoreBench.str4StringBuilderAppendChar avgt 5 41388.364 ? 9871.409 ns/op > MergeStoreBench.str4StringBuilderUnsafePut avgt 5 1575.792 ? 4.102 ns/op > MergeStoreBench.str4UnsafePut avgt 5 1326.499 ? 2.400 ns/op > MergeStoreBench.str4Utf16ArrayCopy avgt 5 13949.307 ? 1045.255 ns/op > MergeStoreBench.str4Utf16ArraySetConst avgt 5 1511.967 ? 5.250 ns/op > MergeStoreBench.str4Utf16StringBuilder avgt 5 18030.261 ? 1656.463 ns/op > MergeStoreBench.str4Utf16StringBuilderAppendChar avgt 5 35047.855 ? 16674.635 ns/op > MergeStoreBench.str4Utf16StringBuilderUnsafePut avgt 5 2785.792 ? 5.571 ns/op > MergeStoreBench.str4Utf16UnsafePut avgt 5 1613.812 ? 1.249 ns/op > MergeStoreBench.str5ArraySetConst avgt 5 2599.310 ? 8.667 ns/op > MergeStoreBench.str5Arraycopy avgt 5 9487.926 ? 29.234 ns/op > MergeStoreBench.str5GetBytes avgt 5 5972.453 ? 16.035 ns/op > MergeStoreBench.str5GetChars avgt 5 13516.943 ? 10.978 ns/op > MergeStoreBench.str5StringBuilder avgt 5 16539.070 ? 3097.339 ns/op > MergeStoreBench.str5StringBuilderAppendChar avgt 5 50506.770 ? 11536.41... > @wenshao @iwanowww I have a few concerns about this PR. > > Your current PR description says this: > > > Added performance tests related to String.getBytes/String.getChars/StringBuilder.append/System.arraycopy in constant scenarios to verify whether MergeStore works > > First: a benchmark is not the best way `to verify whether MergeStore works`. An IR test would be more helpful, as it could check reliably what IR is generated, and hence if MergeStores actually optimized anything. 
> > Second: A JMH benchmark could also be helpful, but only if you run it with and without MergeStores enabled. Otherwise how would you know if it was MergeStores or another optimization that is relevant here? > > Third: `getBytes` / `arraycopy` is **NOT** a MergeStores pattern. These are **COPY** patterns. So they probably should go to a separate benchmark file. I don't want the MergeStores benchmark polluted with unrelated cases. I could be wrong here, and just not see how these cases are MergeStore cases, but you need to show the details here. > > I put some time in understanding your PR and asking you a list of questions. You did not really respond to them, and that is frustrating to me and makes me feel like my time is not valued: [#24108 (comment)](https://github.com/openjdk/jdk/pull/24108#issuecomment-2762946069) > > You say this: > > > By default, in OpenJDK, COMPACT_STRINGS = true, and the String coder without UTF16 characters is LATIN1, which is implemented using System.arraycopy. However, since String is immutable and System.arraycopy is directly performed on byte[], C2 should have more opportunities for optimization. > > Maybe the `System.arraycopy` can be optimized. But I don't think it is the MergeStores optimization that would do that. This is really a **Copy** pattern and not a `MergeStores` pattern. Please read the PRs on MergeStores to see what patterns are covered. > > And like I asked in previously: > > > Can you investigate what code it generates, and what kinds of optimizations are missing to make it close in performance to the Unsafe benchmark? > > I don't have time to do all the deep investigations myself. But feel free to ask me if you have more questions. > > To me, benchmarks are only helpful and worth integrating if there is some clear and documented purpose. It would be really nice if you could invest some time into that :) The C2 MergeStore you made is very good. 
I think you did a great job, so I submitted this PR, hoping that C2 can do more. But I am a Java programmer, not good at C++ and assembly. I don't know how to investigate the details. Can you give me some suggestions? I don't know the details of the optimizer yet, and I can't provide IR tests. This benchmark and the performance numbers of the results prove that there is a lot of room for performance improvement in the copy of constant String and byte[]. As you said, this does not look like MergeStore, but should be a constant copy optimization. I can separate this into a separate Benchmark. Can you give me some suggestions on the name of the Benchmark? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24108#issuecomment-2775585230 From swen at openjdk.org Thu Apr 3 12:19:57 2025 From: swen at openjdk.org (Shaojin Wen) Date: Thu, 3 Apr 2025 12:19:57 GMT Subject: RFR: 8352316: More MergeStoreBench [v7] In-Reply-To: References: <5fLeODHTQw8vbuvTl6G0YPNszI5_tH1b3L_tWJtCTh8=.ca1b21f2-2890-4daa-8ce2-8112a3f7146b@github.com> Message-ID: <6EU6RBqEH5NAwR5RQFE4ynVoZ4cartnNM37ZJ98fq8k=.54e1cd98-ccd1-4189-acb8-74a76c713cce@github.com> On Wed, 2 Apr 2025 06:46:41 GMT, Emanuel Peter wrote: >> Shaojin Wen has updated the pull request incrementally with one additional commit since the last revision: >> >> add StringBuilderUnsafePut > > test/micro/org/openjdk/bench/vm/compiler/MergeStoreBench.java line 693: > >> 691: } >> 692: BH.consume(off); >> 693: } > > This is a copy pattern, not MergeStores. As above, STR_4 is a string constant of length 4. Can it be optimized to write a long? > test/micro/org/openjdk/bench/vm/compiler/MergeStoreBench.java line 735: > >> 733: } >> 734: BH.consume(off); >> 735: } > > @wenshao This is a copy pattern. Not a MergeStore pattern. So I can tell you already now that it will not be optimized by MergeStores ;) If STR_4_BYTES_UTF16 is a StableValue, is it possible to optimize to writing a long?
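To make the distinction drawn in this thread concrete, here is a minimal Java sketch of the two code shapes being discussed. It is an illustration only: the class, the method names and the `STR_4_BYTES` constant are hypothetical and are not taken from `MergeStoreBench.java`, and the sketch makes no claim about what C2 actually emits for either method.

```java
import java.util.Arrays;

public class StorePatterns {
    // Hypothetical constant source array, standing in for the bytes of a constant string.
    static final byte[] STR_4_BYTES = {'n', 'u', 'l', 'l'};

    // Store pattern: adjacent array elements written with compile-time constants.
    // This is the shape the thread describes as a MergeStores candidate.
    static void storeConstants(byte[] dst, int off) {
        dst[off]     = 'n';
        dst[off + 1] = 'u';
        dst[off + 2] = 'l';
        dst[off + 3] = 'l';
    }

    // Copy pattern: the values are loaded from another array and then stored.
    // The reviewer's point is that this is a copy, not a sequence of constant stores.
    static void copyConstant(byte[] dst, int off) {
        System.arraycopy(STR_4_BYTES, 0, dst, off, STR_4_BYTES.length);
    }

    public static void main(String[] args) {
        byte[] a = new byte[4];
        byte[] b = new byte[4];
        storeConstants(a, 0);
        copyConstant(b, 0);
        System.out.println(Arrays.equals(a, b)); // true: same result, different code shape
    }
}
```

Both methods leave the same bytes in the destination, which is why a benchmark comparing them measures how well the copy is optimized rather than whether stores were merged.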
> test/micro/org/openjdk/bench/vm/compiler/MergeStoreBench.java line 799: > >> 797: } >> 798: BH.consume(off); >> 799: } > > @wenshao Why would MergeStores work here? This is is a copy pattern. That is not at all covered by MergeStores. This is a constant of length 5. Can it be optimized to write a combination of int + byte? > test/micro/org/openjdk/bench/vm/compiler/MergeStoreBench.java line 856: > >> 854: } >> 855: BH.consume(sb.length()); >> 856: } > > Why would you expect MergeStores to work here? STR_5 is a string constant with a length of 5. Is it possible to optimize it into an implementation similar to str5StringBuilderUnsafePut? The performance can be greatly improved. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24108#discussion_r2026870739 PR Review Comment: https://git.openjdk.org/jdk/pull/24108#discussion_r2026869185 PR Review Comment: https://git.openjdk.org/jdk/pull/24108#discussion_r2026866952 PR Review Comment: https://git.openjdk.org/jdk/pull/24108#discussion_r2026875472 From mchevalier at openjdk.org Thu Apr 3 12:26:01 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 3 Apr 2025 12:26:01 GMT Subject: Integrated: 8353345: C2 asserts because maskShiftAmount modifies node without deleting the hash In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 11:51:13 GMT, Marc Chevalier wrote: > First delete the hash, then `set_req`. This way, we avoid changing the node (a non-`this` node) without deleting the hash. This wrong ordering is not new from [JDK-8347459](https://bugs.openjdk.org/browse/JDK-8347459), but before that, only `this` was going through this function, so it was ok. But since, it is used with other nodes, hence the need to remove the hash. > > Also, not do any of that outside IGVN, but requires to register nested shifts for IGVN in parsing not to miss them later. > > Thanks, > Marc This pull request has now been integrated. 
Changeset: 296d9d6f Author: Marc Chevalier Committer: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/296d9d6f7a734cc2bab21c58f21a941150b4cf2a Stats: 113 lines in 2 files changed: 79 ins; 3 del; 31 mod 8353345: C2 asserts because maskShiftAmount modifies node without deleting the hash Reviewed-by: chagedorn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/24355 From dlunden at openjdk.org Thu Apr 3 12:45:00 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Thu, 3 Apr 2025 12:45:00 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v11] In-Reply-To: <-pdjdg9OQRB7YaXNFiVeVseLEoJDZb2XkMk0ml3pm3w=.2ecb257e-5618-4763-90e5-a2b1d0758e67@github.com> References: <0Yf6qZwnLz7oAtSFscDwHifQAmaPuHzeSrpkqMVchDU=.c7a5e8af-9390-414b-850c-609110668eac@github.com> <-pdjdg9OQRB7YaXNFiVeVseLEoJDZb2XkMk0ml3pm3w=.2ecb257e-5618-4763-90e5-a2b1d0758e67@github.com> Message-ID: On Mon, 31 Mar 2025 13:24:09 GMT, Roberto Castañeda Lozano wrote: >> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: >> >> Extend example with offset register mask > > src/hotspot/share/opto/postaloc.cpp line 686: > >> 684: assert(!(!value[ureg_lo] && lrgs(useidx).mask().is_offset() && >> 685: !lrgs(useidx).mask().Member(ureg_lo)), >> 686: "invalid assumption"); > > Could you use more descriptive names and assertion messages in this new assertion and the one below? Ideally, without having to refer to old versions. What is the invariant that we want to check? How does it relate to the surrounding code? As we've previously discussed offline, I also had my doubts when introducing these asserts. I've now had a second look (with reasonably fresh eyes), and believe I now better understand the underlying assumptions.
The two problematic pieces of code in `postaloc.cpp` from before this changeset that we need to translate as part of the changeset are

    if (!value[ureg_lo] &&
        (!RegMask::can_represent(ureg_lo) ||
         lrgs(useidx).mask().Member(ureg_lo))) { // Nearly always adjacent

and

    if( RegMask::can_represent(nreg_lo) &&     // Either a spill slot, or
        !lrgs(lidx).mask().Member(nreg_lo) ) { // Nearly always adjacent

Specifically, the `RegMask::can_represent` calls check if their argument registers can fit in the statically determined size of register masks (which no longer makes sense in this changeset). The reason for the `can_represent` calls is that the subsequent `Member` calls assert internally that their arguments can fit within the static size of register masks. That is, `can_represent` worked as a guard to ensure the precondition for the call to `Member` holds. In this changeset, the `Member` function is generalized to allow arbitrary arguments (and the internal assert is removed). Therefore, we can remove the `can_represent` guards.

Now to the assertions that I added (which I've now improved). From the if conditions, we can infer there is an implicit invariant that a register for which `can_represent` returns false is necessarily "adjacent". Specifically, `can_represent` returning false implies that the register is a spill slot (implied by a comment in the source code). However, registers for which `can_represent` returns true may **also** be spill slots, so using `can_represent` as a proxy check for spill slots feels clumsy. I believe that the real invariant here is that only actual registers (and not stack locations, including spill slots) can be non-adjacent. This is what I now verify with my updated asserts.

For the record, I have not been able to find any cases with non-adjacency in any tests on current Oracle-supported platforms. From another comment in the source code, it looks like non-adjacent pairs are quite specific to SPARC.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2026915863 From roland at openjdk.org Thu Apr 3 12:45:19 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 3 Apr 2025 12:45:19 GMT Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v8] In-Reply-To: References: Message-ID: <76hE1JDzr58tKpurf2reT_tDoLdXHiTujP-SeD9HjrA=.550f3465-c929-4aba-bbff-9d74b6793f0e@github.com> > This is primarily motivated by 8275202 (C2: optimize out more > redundant conditions). In the following code snippet: > > > int[] array = new int[arraySize]; > if (j <= arraySize) { > if (i >= 0) { > if (i < j) { > int v = array[i]; > > > (`arraySize` is a constant) > > at the range check, `j` is known to be in `[min, arraySize]` as a > consequence, `i` is known to be `[0, arraySize-1]`. The range check > can be eliminated. > > Now, if later, `i` constant folds to some value that's positive but > out of range for the array: > > - if that happens when the new pass runs, then it can prove that: > > if (i < j) { > > is never taken. > > - if that happens during IGVN or CCP however, that condition is not > constant folded. And because the range check was removed, there's no > guard protecting the range check `CastII`. It becomes `top` and, as > a result, the graph can become broken. > > What I propose here is that when the `CastII` becomes dead, any CFG > paths that use the `CastII` node is made unreachable. So in pseudo code: > > > int[] array = new int[arraySize]; > if (j <= arraySize) { > if (i >= 0) { > if (i < j) { > halt(); > > > Finding the CFG paths is implemented in the patch by following the > uses of the node until a CFG node or a `Phi` is encountered. > > The patch applies this to all `Type` nodes as with 8275202, I also ran > in some rare corner cases with other types of nodes. The exception is > `Phi` nodes which may not be as easy to handle (and for which I had no > issue with 8275202). 
>
> Finally, the patch includes a test case that's unrelated to the
> discussion of 8275202 above. In that test case, a `CastII` becomes top
> but the test that guards it doesn't constant fold. The root cause is a
> transformation of:
>
>
> (CastII (AddI
>
>
> into
>
>
> (AddI (CastII ) (CastII)
>
>
> which causes the resulting node to have a wider type. The `CastII`
> captures a type before the transformation above happens. Once it has
> happened, the guard for the `CastII` can't be constant folded when an
> out of bound value occurs.
>
> This is likely fixable some other way (even though it doesn't seem
> straightforward). Given the long history of similar issues (and the
> test case that shows that they are more hiding), I think it would
> make sense to try some other way of approaching them.

Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision:

 - Update src/hotspot/share/opto/node.cpp
   Co-authored-by: Christian Hagedorn
 - Update test/hotspot/jtreg/compiler/c2/TestGuardOfCastIIDoesntFold.java
   Co-authored-by: Christian Hagedorn

-------------

Changes:
 - all: https://git.openjdk.org/jdk/pull/23468/files
 - new: https://git.openjdk.org/jdk/pull/23468/files/1ec2177a..2abd3054

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=23468&range=07
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23468&range=06-07

Stats: 3 lines in 2 files changed: 0 ins; 2 del; 1 mod
Patch: https://git.openjdk.org/jdk/pull/23468.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/23468/head:pull/23468

PR: https://git.openjdk.org/jdk/pull/23468

From roland at openjdk.org Thu Apr 3 12:45:19 2025
From: roland at openjdk.org (Roland Westrelin)
Date: Thu, 3 Apr 2025 12:45:19 GMT
Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v7]
In-Reply-To:
References: <09Q1vDaTXq3VlLU4xxQl_E7wDM2FT7tqR_Bc8ky8RNc=.4e11f2f8-75c3-49a1-b0b3-20eac17c4b39@github.com>
Message-ID:

On Tue, 1 Apr
2025 08:42:12 GMT, Christian Hagedorn wrote:

>> The callers have the `ResourceMark`. This is because it's code I extracted from 8275202: I think it used to not be safe to call `PhaseIdealLoop::register_new_node` from within the `ResourceMark` but I see there were changes in that area (data structures used by `PhaseIdealLoop` no longer allocated in the resource area). So it looks like it could be changed now.
>
> I assume that JDK-8275202 also calls this method with a non-null `PhaseIdealLoop` pointer? Now we only pass in null, so the `loop` parameter could be removed.

Right. Do you think it's better to remove the parameter that's used (for now)?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23468#discussion_r2026910749

From mli at openjdk.org Thu Apr 3 12:50:13 2025
From: mli at openjdk.org (Hamlin Li)
Date: Thu, 3 Apr 2025 12:50:13 GMT
Subject: RFR: 8353600: RISC-V: compiler/vectorization/TestRotateByteAndShortVector.java is failing with Zvbb
Message-ID: <3xU-sLLf0E_4n9BsUXL4COF7mxBjDd8YzgyIvissvQQ=.472cf772-a97a-48c8-b4e6-907fcfdd1ebb@github.com>

Hi,
Can you help to review this patch?

Currently, the following code is treated by HotSpot as a RotateLeftV of byte, but it is not a real rotate: `-shift` is effectively 30 (the distance of an int shift is masked to its low 5 bits), which makes `b >> -shift` zero rather than the value we expected.

    int shift = 2;
    byte b = 83;
    byte res = (byte) (b << shift | b >> -shift); // res = 76
    // but a real left rotate of 83 should be 77

So, the simple fix is to enable RotateLeftV only for int and long, and to disable it for other types.

A more rational fix would be to change C2 so that it does not convert code like `(byte) (b << shift | b >> -shift)` to a RotateLeftV node, but that needs more investigation, and I'm not sure it is feasible, as currently no platform supports RotateLeftV for non-int/long types.

Thanks!
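
The arithmetic in the message above can be checked with a small standalone sketch. It is only an illustration of the Java semantics involved, not code from the patch: the class name `ByteRotateDemo` and the helper `rotateLeft8` are hypothetical, and `rotateLeft8` is just one assumed way to write a true 8-bit rotate.

```java
class ByteRotateDemo {
    // The expression quoted in the message. `b` is promoted to int before the
    // shifts, and an int shift distance is masked to its low 5 bits, so
    // `b >> -shift` is really `b >> 30` when shift == 2.
    static byte shiftOrPattern(byte b, int shift) {
        return (byte) (b << shift | b >> -shift);
    }

    // Hypothetical helper (an assumption for illustration): a genuine
    // left rotate restricted to the 8 bits of a byte.
    static byte rotateLeft8(byte b, int shift) {
        int v = b & 0xFF;          // work on the unsigned 8-bit value
        int s = shift & 7;         // rotate distance modulo 8
        return (byte) ((v << s) | (v >>> ((8 - s) & 7)));
    }

    public static void main(String[] args) {
        byte b = 83;
        int shift = 2;
        System.out.println("effective right-shift distance: " + (-shift & 31));
        System.out.println("shift/or pattern: " + shiftOrPattern(b, shift));
        System.out.println("true 8-bit rotate: " + rotateLeft8(b, shift));
    }
}
```

With `b = 83` and `shift = 2`, the two methods disagree in the lowest bit, which is why treating the byte pattern as a rotate would change the result.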
-------------

Commit messages:
 - merge master
 - initial commit

Changes: https://git.openjdk.org/jdk/pull/24414/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24414&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8353600
Stats: 25 lines in 1 file changed: 4 ins; 21 del; 0 mod
Patch: https://git.openjdk.org/jdk/pull/24414.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/24414/head:pull/24414

PR: https://git.openjdk.org/jdk/pull/24414

From dlunden at openjdk.org Thu Apr 3 12:54:08 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Thu, 3 Apr 2025 12:54:08 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v12]
In-Reply-To:
References:
Message-ID: <02SMTBF5t1QrQ7zGvm6zSyN4JUDIPDYwztAIzdynOqg=.ad1cacd5-451f-43d9-992c-831172214d6a@github.com>

On Tue, 1 Apr 2025 16:00:46 GMT, Roberto Castañeda Lozano wrote:

>> Daniel Lundén has updated the pull request incrementally with two additional commits since the last revision:
>>
>> - Formatting updates
>> - Add register mask fuzzer test
>
> src/hotspot/share/opto/chaitin.cpp line 1533:
>
>> 1531: // hesitation).
>> 1532: if (OptoReg::is_valid(reg2) &&
>> 1533: OptoReg::is_reg(reg2 - lrg.mask().offset_bits())) {
>
> I agree that this was probably an oversight in the original code. For simplicity I suggest to replace the check with just `OptoReg::is_reg(reg2)` as you suggest, explicitly limiting the scope of the alternation heuristic to physical registers. I compared the overall effectiveness of post-allocation copy removal (as summarized by `-XX:+PrintOptoStatistics`) between this changeset and your proposed simplification and I cannot see any significant difference. I really wonder if the entire alternation heuristic really has any positive measurable effect, but that investigation belongs to another RFE.

Thanks for comparing! Now changed.
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2026931949

From dlunden at openjdk.org Thu Apr 3 12:54:15 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Thu, 3 Apr 2025 12:54:15 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v13]
In-Reply-To:
References:
Message-ID:

On Tue, 1 Apr 2025 16:02:07 GMT, Roberto Castañeda Lozano wrote:

>> Daniel Lundén has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 20 commits:
>>
>> - Updates after comments
>> - Tag short-lived register mask arena
>> - Merge remote-tracking branch 'upstream/master' into many-arguments-8325467
>> - Formatting updates
>> - Add register mask fuzzer test
>> - Extend example with offset register mask
>> - Remove accidental leftover #endif
>> - Update
>> - Fix trailing whitespace
>> - Merge remote-tracking branch 'upstream/master' into many-arguments-8325467
>> - ... and 10 more: https://git.openjdk.org/jdk/compare/a1ab1d8d...76f6b8f8
>
> src/hotspot/share/opto/chaitin.cpp line 1591:
>
>> 1589: // will be a no-op. (Later on, if lrg runs out of possible colors in
>> 1590: // its chunk, a new chunk of color may be tried, in which case
>> 1591: // examination of neighbors is started again, at retry_next_chunk.)
>
> Doesn't the second part of the comment (`(Later on...)`) still apply after the changes?

Thanks, good catch. Now restored.

> src/hotspot/share/opto/matcher.cpp line 148:
>
>> 146: C->record_method_not_compilable("unsupported incoming calling sequence");
>> 147: return OptoReg::Bad;
>> 148: }
>
> Please consider removing the failure polls after calling `warp_incoming_stk_arg`, I believe the removal of this bailout makes them unnecessary.

Thanks, I've removed the polls after `warp_incoming_stk_arg` and also after `warp_outgoing_stk_arg`.
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2026932391
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2026930405

From dlunden at openjdk.org Thu Apr 3 12:54:17 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Thu, 3 Apr 2025 12:54:17 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v11]
In-Reply-To: <-pdjdg9OQRB7YaXNFiVeVseLEoJDZb2XkMk0ml3pm3w=.2ecb257e-5618-4763-90e5-a2b1d0758e67@github.com>
References: <0Yf6qZwnLz7oAtSFscDwHifQAmaPuHzeSrpkqMVchDU=.c7a5e8af-9390-414b-850c-609110668eac@github.com> <-pdjdg9OQRB7YaXNFiVeVseLEoJDZb2XkMk0ml3pm3w=.2ecb257e-5618-4763-90e5-a2b1d0758e67@github.com>
Message-ID:

On Mon, 31 Mar 2025 13:33:52 GMT, Roberto Castañeda Lozano wrote:

>> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Extend example with offset register mask
>
> src/hotspot/share/opto/matcher.cpp line 195:
>
>> 193: if (C->failing()) {
>> 194: return;
>> 195: }
>
> Is this failure poll required after your changes?

Yes, this poll is still required. We may fail in `init_spill_mask -> regmask_for_ideal_register`.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2026927964 From epeter at openjdk.org Thu Apr 3 12:58:58 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 3 Apr 2025 12:58:58 GMT Subject: RFR: 8352316: More MergeStoreBench [v7] In-Reply-To: <5Pr4WqnsBZrOfnqUWe-FSZ5UBkkGx5ghH113Jw1eO1Y=.31275627-2eb7-408f-a73f-ff974d993ea4@github.com> References: <5fLeODHTQw8vbuvTl6G0YPNszI5_tH1b3L_tWJtCTh8=.ca1b21f2-2890-4daa-8ce2-8112a3f7146b@github.com> <-xsgQ8uhc8vksHhI4Elu3SwNqy8GEQdzCdB3SAsPQa0=.9ef939ee-6359-40cb-8663-dabaad6611b6@github.com> <5Pr4WqnsBZrOfnqUWe-FSZ5UBkkGx5ghH113Jw1eO1Y=.31275627-2eb7-408f-a73f-ff974d993ea4@github.com> Message-ID: <9pEJ26ThuVTSmGKiviqYDfJZeMazGf7x_4m6CaUCeQY=.79a0c1a0-beec-4965-94e0-2c60e892fd15@github.com> On Thu, 3 Apr 2025 12:09:27 GMT, Shaojin Wen wrote: >> I added a new scenario `StringBuilderUnsafePut`, using Unsafe to modify StringBuilder directly to implement append constants. >> >> The performance numbers below show that ArraySetConst/StringBuilderUnsafePut/UnsafePut have better performance. >> >> These numbers show that Stable Value's arraycopy has great performance optimization potential, which is worth more optimization for C2. >> >> # 1. Scipt >> >> git remote add wenshao git at github.com:wenshao/jdk.git >> git fetch wenshao >> git checkout cd1d8fb3b137a741446c894d1893e7180535ce8f >> make test TEST="micro:vm.compiler.MergeStoreBench.str" >> >> >> # 2. aliyun_ecs_c8a_x64 (CPU AMD EPYC? Genoa) >> >> Benchmark Mode Cnt Score Error Units >> MergeStoreBench.str4ArraySetConst avgt 5 1338.414 ? 3.209 ns/op >> MergeStoreBench.str4Arraycopy avgt 5 7271.203 ? 19.400 ns/op >> MergeStoreBench.str4GetBytes avgt 5 6154.684 ? 9.910 ns/op >> MergeStoreBench.str4GetChars avgt 5 14078.790 ? 59.175 ns/op >> MergeStoreBench.str4StringBuilder avgt 5 15766.528 ? 4634.119 ns/op >> MergeStoreBench.str4StringBuilderAppendChar avgt 5 41388.364 ? 
9871.409 ns/op >> MergeStoreBench.str4StringBuilderUnsafePut avgt 5 1575.792 ? 4.102 ns/op >> MergeStoreBench.str4UnsafePut avgt 5 1326.499 ? 2.400 ns/op >> MergeStoreBench.str4Utf16ArrayCopy avgt 5 13949.307 ? 1045.255 ns/op >> MergeStoreBench.str4Utf16ArraySetConst avgt 5 1511.967 ? 5.250 ns/op >> MergeStoreBench.str4Utf16StringBuilder avgt 5 18030.261 ? 1656.463 ns/op >> MergeStoreBench.str4Utf16StringBuilderAppendChar avgt 5 35047.855 ? 16674.635 ns/op >> MergeStoreBench.str4Utf16StringBuilderUnsafePut avgt 5 2785.792 ? 5.571 ns/op >> MergeStoreBench.str4Utf16UnsafePut avgt 5 1613.812 ? 1.249 ns/op >> MergeStoreBench.str5ArraySetConst avgt 5 2599.310 ? 8.667 ns/op >> MergeStoreBench.str5Arraycopy avgt 5 9487.926 ? 29.234 ns/op >> MergeStoreBench.str5GetBytes avgt 5 5972.453 ? 16.035 ns/op >> MergeStoreBench.str5GetChars avgt 5 13516.943 ? 10.978 ns/op >> MergeStoreBench.str5StringBuilder avgt 5 16539.070 ? 3097.339 ns/op >> MergeSt... > >> @wenshao @iwanowww I have a few concerns about this PR. >> >> Your current PR description says this: >> >> > Added performance tests related to String.getBytes/String.getChars/StringBuilder.append/System.arraycopy in constant scenarios to verify whether MergeStore works >> >> First: a benchmark is not the best way `to verify whether MergeStore works`. An IR test would be more helpful, as it could check reliably what IR is generated, and hence if MergeStores actually optimized anything. >> >> Second: A JMH benchmark could also be helpful, but only if you run it with and without MergeStores enabled. Otherwise how would you know if it was MergeStores or another optimization that is relevant here? >> >> Third: `getBytes` / `arraycopy` is **NOT** a MergeStores pattern. These are **COPY** patterns. So they probably should go to a separate benchmark file. I don't want the MergeStores benchmark polluted with unrelated cases. 
I could be wrong here, and just not see how these cases are MergeStore cases, but you need to show the details here. >> >> I put some time in understanding your PR and asking you a list of questions. You did not really respond to them, and that is frustrating to me and makes me feel like my time is not valued: [#24108 (comment)](https://github.com/openjdk/jdk/pull/24108#issuecomment-2762946069) >> >> You say this: >> >> > By default, in OpenJDK, COMPACT_STRINGS = true, and the String coder without UTF16 characters is LATIN1, which is implemented using System.arraycopy. However, since String is immutable and System.arraycopy is directly performed on byte[], C2 should have more opportunities for optimization. >> >> Maybe the `System.arraycopy` can be optimized. But I don't think it is the MergeStores optimization that would do that. This is really a **Copy** pattern and not a `MergeStores` pattern. Please read the PRs on MergeStores to see what patterns are covered. >> >> And like I asked in previously: >> >> > Can you investigate what code it generates, and what kinds of optimizations are missing to make it close in performance to the Unsafe benchmark? >> > I don't have time to do all the deep investigations myself. But feel free to ask me if you have more questions. >> >> To me, benchmarks are only helpful and worth integrating if there is some clear and documented purpose. It would be really nice if you could invest some time into that :) > > The C2 MergeStore you made is very good. I think you did a great job, so I submitted this PR, hoping that C2 can do... @wenshao > The C2 MergeStore you made is very good. I think you did a great job, so I submitted this PR, hoping that C2 can do more. Thanks for the compliment :) What I am saying is that this is most likely not the same optimization, and you would have to investigate what other optimizations are relevant here. > But I am a Java programmer, not good at C++ and assembly. 
I don't know how to investigate the details. Can you give me some suggestions? Fair enough :) Maybe it's time to learn more about C++ and assembly then :) If you are interested in learning more about the C2 internals, I recommend you read my blog series: https://eme64.github.io/blog/2024/12/24/Intro-to-C2-Part00.html > I don't know the details of the optimizer yet, and I can't provide IR tests. There are lots of `@IR` tests in the repository, so you can just do what they did ;) > I don't know the details of the optimizer yet, and I can't provide IR tests. This benchmark and the performance numbers of the results prove that there is a lot of room for performance improvement in the copy of constant String and byte[]. > As you said, this does not look like MergeStore, but should be a constant copy optimization. I can separate this into a separate Benchmark. Can you give me some suggestions on the name of the Benchmark? Hmm. Well before we know a good name, we must know what is the relevant optimization for the patterns. I recommend that you find out what the general form of these patterns are, and what optimization steps would have to be taken. Then we can continue. My blog posts will help you get started, so that you can look at the IR and the generated assembly. Feel free to post your findings here, and then maybe I can help you a little on the way. 
I'm sorry, I am really very busy working on other projects and cannot do all that work for you ;)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24108#issuecomment-2775704308

From dlunden at openjdk.org Thu Apr 3 12:59:14 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Thu, 3 Apr 2025 12:59:14 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v12]
In-Reply-To:
References:
Message-ID:

On Tue, 1 Apr 2025 16:16:36 GMT, Roberto Castañeda Lozano wrote:

>> Daniel Lundén has updated the pull request incrementally with two additional commits since the last revision:
>>
>> - Formatting updates
>> - Add register mask fuzzer test
>
> src/hotspot/share/opto/chaitin.cpp line 1655:
>
>> 1653: // Bump register mask up to next stack chunk
>> 1654: bool success = lrg->rollover();
>> 1655: if (!success) {
>
> Was this scenario (running out of stack slots representable in `OptoRegPairs`) possible before, or was it prevented by some check removed in the changeset? Did you come across it in some compilation or is it more of a "theoretical" guard?

Yes, it is a theoretical guard (also see the discussions earlier in this PR) and could also happen before this changeset if we roll over too much in `Select`. I experimented a bit with this earlier on and was not able to construct an example where we end up in this situation.

> src/hotspot/share/opto/regmask.hpp line 282:
>
>> 280: _grow(src._rm_size, false);
>> 281: memcpy(_RM_UP_EXT, src._RM_UP_EXT,
>> 282: sizeof(uintptr_t) * (src._rm_size - _RM_SIZE));
>
> This code is not very well covered by current tests, please consider adding some tests to `test_regmask.cpp` to exercise it.

Now added!

> src/hotspot/share/opto/regmask.hpp line 293:
>
>> 291: _hwm = _rm_max();
>> 292: }
>> 293: _set_range(src._rm_size, value, _rm_size - src._rm_size);
>
> This code is not very well covered by current tests, please consider adding some tests to `test_regmask.cpp` to exercise it.

Now added!
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2026941355 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2026941803 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2026941926 From mchevalier at openjdk.org Thu Apr 3 13:01:15 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 3 Apr 2025 13:01:15 GMT Subject: RFR: 8346989: C2: deoptimization and re-compilation cycle with Math.*Exact in case of frequent overflow [v7] In-Reply-To: <8ACplaVM_gN9cbIcQYGJmR4GNINm70PAJQ8uAgucK4Y=.14fdc7e2-e0af-4f0d-acb6-bcfe99ee8f36@github.com> References: <8ACplaVM_gN9cbIcQYGJmR4GNINm70PAJQ8uAgucK4Y=.14fdc7e2-e0af-4f0d-acb6-bcfe99ee8f36@github.com> Message-ID: > `Math.*Exact` intrinsics can cause many deopt when used repeatedly with problematic arguments. > This fix proposes not to rely on intrinsics after `too_many_traps()` has been reached. > > Benchmark show that this issue affects every Math.*Exact functions. And this fix improve them all. > > tl;dr: > - C1: no problem, no change > - C2: > - with intrinsics: > - with overflow: clear improvement. Was way worse than C1, now is similar (~4s => ~600ms) > - without overflow: no problem, no change > - without intrinsics: no problem, no change > > Before the fix: > > Benchmark (SIZE) Mode Cnt Score Error Units > MathExact.C1_1.loopAddIInBounds 1000000 avgt 3 1.272 ? 0.048 ms/op > MathExact.C1_1.loopAddIOverflow 1000000 avgt 3 641.917 ? 58.238 ms/op > MathExact.C1_1.loopAddLInBounds 1000000 avgt 3 1.402 ? 0.842 ms/op > MathExact.C1_1.loopAddLOverflow 1000000 avgt 3 671.013 ? 229.425 ms/op > MathExact.C1_1.loopDecrementIInBounds 1000000 avgt 3 3.722 ? 22.244 ms/op > MathExact.C1_1.loopDecrementIOverflow 1000000 avgt 3 653.341 ? 279.003 ms/op > MathExact.C1_1.loopDecrementLInBounds 1000000 avgt 3 2.525 ? 0.810 ms/op > MathExact.C1_1.loopDecrementLOverflow 1000000 avgt 3 656.750 ? 
141.792 ms/op > MathExact.C1_1.loopIncrementIInBounds 1000000 avgt 3 4.621 ? 12.822 ms/op > MathExact.C1_1.loopIncrementIOverflow 1000000 avgt 3 651.608 ? 274.396 ms/op > MathExact.C1_1.loopIncrementLInBounds 1000000 avgt 3 2.576 ? 3.316 ms/op > MathExact.C1_1.loopIncrementLOverflow 1000000 avgt 3 662.216 ? 71.879 ms/op > MathExact.C1_1.loopMultiplyIInBounds 1000000 avgt 3 1.402 ? 0.587 ms/op > MathExact.C1_1.loopMultiplyIOverflow 1000000 avgt 3 615.836 ? 252.137 ms/op > MathExact.C1_1.loopMultiplyLInBounds 1000000 avgt 3 2.906 ? 5.718 ms/op > MathExact.C1_1.loopMultiplyLOverflow 1000000 avgt 3 655.576 ? 147.432 ms/op > MathExact.C1_1.loopNegateIInBounds 1000000 avgt 3 2.023 ? 0.027 ms/op > MathExact.C1_1.loopNegateIOverflow 1000000 avgt 3 639.136 ? 30.841 ms/op > MathExact.C1_1.loopNegateLInBounds 1000000 avgt 3 2.422 ? 3.59... Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: Remove useless flags in tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23916/files - new: https://git.openjdk.org/jdk/pull/23916/files/238b129d..e7c8f3e0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23916&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23916&range=05-06 Stats: 9 lines in 1 file changed: 0 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/23916.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23916/head:pull/23916 PR: https://git.openjdk.org/jdk/pull/23916 From mchevalier at openjdk.org Thu Apr 3 13:01:16 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 3 Apr 2025 13:01:16 GMT Subject: RFR: 8346989: C2: deoptimization and re-compilation cycle with Math.*Exact in case of frequent overflow [v6] In-Reply-To: References: <8ACplaVM_gN9cbIcQYGJmR4GNINm70PAJQ8uAgucK4Y=.14fdc7e2-e0af-4f0d-acb6-bcfe99ee8f36@github.com> Message-ID: On Wed, 2 Apr 2025 17:23:03 GMT, Marc Chevalier wrote: >> `Math.*Exact` intrinsics can cause many deopt when used 
repeatedly with problematic arguments. >> This fix proposes not to rely on intrinsics after `too_many_traps()` has been reached. >> >> Benchmark show that this issue affects every Math.*Exact functions. And this fix improve them all. >> >> tl;dr: >> - C1: no problem, no change >> - C2: >> - with intrinsics: >> - with overflow: clear improvement. Was way worse than C1, now is similar (~4s => ~600ms) >> - without overflow: no problem, no change >> - without intrinsics: no problem, no change >> >> Before the fix: >> >> Benchmark (SIZE) Mode Cnt Score Error Units >> MathExact.C1_1.loopAddIInBounds 1000000 avgt 3 1.272 ? 0.048 ms/op >> MathExact.C1_1.loopAddIOverflow 1000000 avgt 3 641.917 ? 58.238 ms/op >> MathExact.C1_1.loopAddLInBounds 1000000 avgt 3 1.402 ? 0.842 ms/op >> MathExact.C1_1.loopAddLOverflow 1000000 avgt 3 671.013 ? 229.425 ms/op >> MathExact.C1_1.loopDecrementIInBounds 1000000 avgt 3 3.722 ? 22.244 ms/op >> MathExact.C1_1.loopDecrementIOverflow 1000000 avgt 3 653.341 ? 279.003 ms/op >> MathExact.C1_1.loopDecrementLInBounds 1000000 avgt 3 2.525 ? 0.810 ms/op >> MathExact.C1_1.loopDecrementLOverflow 1000000 avgt 3 656.750 ? 141.792 ms/op >> MathExact.C1_1.loopIncrementIInBounds 1000000 avgt 3 4.621 ? 12.822 ms/op >> MathExact.C1_1.loopIncrementIOverflow 1000000 avgt 3 651.608 ? 274.396 ms/op >> MathExact.C1_1.loopIncrementLInBounds 1000000 avgt 3 2.576 ? 3.316 ms/op >> MathExact.C1_1.loopIncrementLOverflow 1000000 avgt 3 662.216 ? 71.879 ms/op >> MathExact.C1_1.loopMultiplyIInBounds 1000000 avgt 3 1.402 ? 0.587 ms/op >> MathExact.C1_1.loopMultiplyIOverflow 1000000 avgt 3 615.836 ? 252.137 ms/op >> MathExact.C1_1.loopMultiplyLInBounds 1000000 avgt 3 2.906 ? 5.718 ms/op >> MathExact.C1_1.loopMultiplyLOverflow 1000000 avgt 3 655.576 ? 147.432 ms/op >> MathExact.C1_1.loopNegateIInBounds 1000000 avgt 3 2.023 ? 0.027 ms/op >> MathExact.C1_1.loopNegateIOverflow 1000000 avgt 3 639.136 ? 30.841 ms/op >> MathExact.C1_1.loop... 
> > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > fix typo in comment I've made the test flags tighter as discussed offline. I'll need a fresh approval. And for completeness, there are the bench result on this last state. We can see that things behave as we expect: builtin_throw is taken and making the situation a lot better. When intrinsics or builtin_throw are disabled, we see C1-like perfs. Benchmark (SIZE) Mode Cnt Score Error Units MathExact.C1_1.loopAddIInBounds 1000000 avgt 3 1.616 ? 7.813 ms/op MathExact.C1_1.loopAddIOverflow 1000000 avgt 3 654.971 ? 573.250 ms/op MathExact.C1_1.loopAddLInBounds 1000000 avgt 3 1.398 ? 0.274 ms/op MathExact.C1_1.loopAddLOverflow 1000000 avgt 3 629.620 ? 41.181 ms/op MathExact.C1_1.loopDecrementIInBounds 1000000 avgt 3 2.048 ? 0.340 ms/op MathExact.C1_1.loopDecrementIOverflow 1000000 avgt 3 681.702 ? 63.721 ms/op MathExact.C1_1.loopDecrementLInBounds 1000000 avgt 3 3.057 ? 13.688 ms/op MathExact.C1_1.loopDecrementLOverflow 1000000 avgt 3 660.457 ? 295.393 ms/op MathExact.C1_1.loopIncrementIInBounds 1000000 avgt 3 2.531 ? 13.692 ms/op MathExact.C1_1.loopIncrementIOverflow 1000000 avgt 3 647.970 ? 65.451 ms/op MathExact.C1_1.loopIncrementLInBounds 1000000 avgt 3 5.350 ? 25.080 ms/op MathExact.C1_1.loopIncrementLOverflow 1000000 avgt 3 681.097 ? 72.604 ms/op MathExact.C1_1.loopMultiplyIInBounds 1000000 avgt 3 1.552 ? 3.145 ms/op MathExact.C1_1.loopMultiplyIOverflow 1000000 avgt 3 648.402 ? 62.995 ms/op MathExact.C1_1.loopMultiplyLInBounds 1000000 avgt 3 2.501 ? 0.720 ms/op MathExact.C1_1.loopMultiplyLOverflow 1000000 avgt 3 701.498 ? 47.948 ms/op MathExact.C1_1.loopNegateIInBounds 1000000 avgt 3 2.074 ? 0.949 ms/op MathExact.C1_1.loopNegateIOverflow 1000000 avgt 3 665.143 ? 537.941 ms/op MathExact.C1_1.loopNegateLInBounds 1000000 avgt 3 5.487 ? 7.165 ms/op MathExact.C1_1.loopNegateLOverflow 1000000 avgt 3 687.085 ? 
20.738 ms/op MathExact.C1_1.loopSubtractIInBounds 1000000 avgt 3 1.329 ? 0.769 ms/op MathExact.C1_1.loopSubtractIOverflow 1000000 avgt 3 683.922 ? 70.434 ms/op MathExact.C1_1.loopSubtractLInBounds 1000000 avgt 3 1.384 ? 0.386 ms/op MathExact.C1_1.loopSubtractLOverflow 1000000 avgt 3 664.380 ? 480.847 ms/op MathExact.C1_2.loopAddIInBounds 1000000 avgt 3 1.862 ? 0.815 ms/op MathExact.C1_2.loopAddIOverflow 1000000 avgt 3 660.421 ? 506.723 ms/op MathExact.C1_2.loopAddLInBounds 1000000 avgt 3 1.829 ? 0.221 ms/op MathExact.C1_2.loopAddLOverflow 1000000 avgt 3 681.209 ? 78.976 ms/op MathExact.C1_2.loopDecrementIInBounds 1000000 avgt 3 3.533 ? 11.302 ms/op MathExact.C1_2.loopDecrementIOverflow 1000000 avgt 3 682.639 ? 225.392 ms/op MathExact.C1_2.loopDecrementLInBounds 1000000 avgt 3 3.402 ? 1.031 ms/op MathExact.C1_2.loopDecrementLOverflow 1000000 avgt 3 697.283 ? 306.867 ms/op MathExact.C1_2.loopIncrementIInBounds 1000000 avgt 3 3.326 ? 5.072 ms/op MathExact.C1_2.loopIncrementIOverflow 1000000 avgt 3 658.514 ? 636.731 ms/op MathExact.C1_2.loopIncrementLInBounds 1000000 avgt 3 3.718 ? 0.422 ms/op MathExact.C1_2.loopIncrementLOverflow 1000000 avgt 3 693.863 ? 49.201 ms/op MathExact.C1_2.loopMultiplyIInBounds 1000000 avgt 3 1.924 ? 2.800 ms/op MathExact.C1_2.loopMultiplyIOverflow 1000000 avgt 3 609.308 ? 94.814 ms/op MathExact.C1_2.loopMultiplyLInBounds 1000000 avgt 3 3.459 ? 0.625 ms/op MathExact.C1_2.loopMultiplyLOverflow 1000000 avgt 3 713.503 ? 556.995 ms/op MathExact.C1_2.loopNegateIInBounds 1000000 avgt 3 3.195 ? 0.726 ms/op MathExact.C1_2.loopNegateIOverflow 1000000 avgt 3 684.176 ? 27.164 ms/op MathExact.C1_2.loopNegateLInBounds 1000000 avgt 3 3.483 ? 0.947 ms/op MathExact.C1_2.loopNegateLOverflow 1000000 avgt 3 656.284 ? 582.286 ms/op MathExact.C1_2.loopSubtractIInBounds 1000000 avgt 3 1.728 ? 0.315 ms/op MathExact.C1_2.loopSubtractIOverflow 1000000 avgt 3 688.029 ? 25.201 ms/op MathExact.C1_2.loopSubtractLInBounds 1000000 avgt 3 1.941 ? 
0.169 ms/op MathExact.C1_2.loopSubtractLOverflow 1000000 avgt 3 694.341 ? 339.431 ms/op MathExact.C1_3.loopAddIInBounds 1000000 avgt 3 3.122 ? 0.910 ms/op MathExact.C1_3.loopAddIOverflow 1000000 avgt 3 688.731 ? 308.210 ms/op MathExact.C1_3.loopAddLInBounds 1000000 avgt 3 5.492 ? 36.236 ms/op MathExact.C1_3.loopAddLOverflow 1000000 avgt 3 697.053 ? 229.958 ms/op MathExact.C1_3.loopDecrementIInBounds 1000000 avgt 3 9.155 ? 72.182 ms/op MathExact.C1_3.loopDecrementIOverflow 1000000 avgt 3 708.458 ? 788.701 ms/op MathExact.C1_3.loopDecrementLInBounds 1000000 avgt 3 6.402 ? 3.658 ms/op MathExact.C1_3.loopDecrementLOverflow 1000000 avgt 3 705.992 ? 213.542 ms/op MathExact.C1_3.loopIncrementIInBounds 1000000 avgt 3 7.699 ? 61.434 ms/op MathExact.C1_3.loopIncrementIOverflow 1000000 avgt 3 697.353 ? 105.457 ms/op MathExact.C1_3.loopIncrementLInBounds 1000000 avgt 3 6.380 ? 0.839 ms/op MathExact.C1_3.loopIncrementLOverflow 1000000 avgt 3 669.240 ? 522.870 ms/op MathExact.C1_3.loopMultiplyIInBounds 1000000 avgt 3 3.225 ? 0.140 ms/op MathExact.C1_3.loopMultiplyIOverflow 1000000 avgt 3 624.811 ? 457.059 ms/op MathExact.C1_3.loopMultiplyLInBounds 1000000 avgt 3 6.110 ? 1.265 ms/op MathExact.C1_3.loopMultiplyLOverflow 1000000 avgt 3 718.460 ? 68.166 ms/op MathExact.C1_3.loopNegateIInBounds 1000000 avgt 3 6.085 ? 1.430 ms/op MathExact.C1_3.loopNegateIOverflow 1000000 avgt 3 675.036 ? 341.177 ms/op MathExact.C1_3.loopNegateLInBounds 1000000 avgt 3 9.410 ? 93.522 ms/op MathExact.C1_3.loopNegateLOverflow 1000000 avgt 3 652.042 ? 166.119 ms/op MathExact.C1_3.loopSubtractIInBounds 1000000 avgt 3 3.432 ? 11.899 ms/op MathExact.C1_3.loopSubtractIOverflow 1000000 avgt 3 654.208 ? 120.258 ms/op MathExact.C1_3.loopSubtractLInBounds 1000000 avgt 3 5.166 ? 38.529 ms/op MathExact.C1_3.loopSubtractLOverflow 1000000 avgt 3 691.094 ? 80.676 ms/op MathExact.C2.loopAddIInBounds 1000000 avgt 3 2.276 ? 1.750 ms/op MathExact.C2.loopAddIOverflow 1000000 avgt 3 1.173 ? 
1.392 ms/op MathExact.C2.loopAddLInBounds 1000000 avgt 3 0.985 ? 0.167 ms/op MathExact.C2.loopAddLOverflow 1000000 avgt 3 1.990 ? 5.310 ms/op MathExact.C2.loopDecrementIInBounds 1000000 avgt 3 2.072 ? 0.173 ms/op MathExact.C2.loopDecrementIOverflow 1000000 avgt 3 1.911 ? 0.288 ms/op MathExact.C2.loopDecrementLInBounds 1000000 avgt 3 1.845 ? 0.424 ms/op MathExact.C2.loopDecrementLOverflow 1000000 avgt 3 2.757 ? 27.268 ms/op MathExact.C2.loopIncrementIInBounds 1000000 avgt 3 2.136 ? 0.517 ms/op MathExact.C2.loopIncrementIOverflow 1000000 avgt 3 2.199 ? 4.024 ms/op MathExact.C2.loopIncrementLInBounds 1000000 avgt 3 1.957 ? 0.365 ms/op MathExact.C2.loopIncrementLOverflow 1000000 avgt 3 2.053 ? 0.779 ms/op MathExact.C2.loopMultiplyIInBounds 1000000 avgt 3 1.174 ? 0.941 ms/op MathExact.C2.loopMultiplyIOverflow 1000000 avgt 3 1.971 ? 10.040 ms/op MathExact.C2.loopMultiplyLInBounds 1000000 avgt 3 0.997 ? 0.318 ms/op MathExact.C2.loopMultiplyLOverflow 1000000 avgt 3 2.847 ? 4.548 ms/op MathExact.C2.loopNegateIInBounds 1000000 avgt 3 4.783 ? 2.454 ms/op MathExact.C2.loopNegateIOverflow 1000000 avgt 3 1.915 ? 0.009 ms/op MathExact.C2.loopNegateLInBounds 1000000 avgt 3 2.824 ? 28.297 ms/op MathExact.C2.loopNegateLOverflow 1000000 avgt 3 4.766 ? 32.627 ms/op MathExact.C2.loopSubtractIInBounds 1000000 avgt 3 0.990 ? 0.264 ms/op MathExact.C2.loopSubtractIOverflow 1000000 avgt 3 1.181 ? 2.120 ms/op MathExact.C2.loopSubtractLInBounds 1000000 avgt 3 2.363 ? 1.575 ms/op MathExact.C2.loopSubtractLOverflow 1000000 avgt 3 2.429 ? 7.120 ms/op MathExact.C2_no_builtin_throw.loopAddIInBounds 1000000 avgt 3 1.040 ? 0.181 ms/op MathExact.C2_no_builtin_throw.loopAddIOverflow 1000000 avgt 3 580.950 ? 112.050 ms/op MathExact.C2_no_builtin_throw.loopAddLInBounds 1000000 avgt 3 1.223 ? 5.700 ms/op MathExact.C2_no_builtin_throw.loopAddLOverflow 1000000 avgt 3 585.712 ? 61.699 ms/op MathExact.C2_no_builtin_throw.loopDecrementIInBounds 1000000 avgt 3 2.114 ? 
0.663 ms/op MathExact.C2_no_builtin_throw.loopDecrementIOverflow 1000000 avgt 3 604.866 ? 578.502 ms/op MathExact.C2_no_builtin_throw.loopDecrementLInBounds 1000000 avgt 3 2.167 ? 9.268 ms/op MathExact.C2_no_builtin_throw.loopDecrementLOverflow 1000000 avgt 3 621.175 ? 225.858 ms/op MathExact.C2_no_builtin_throw.loopIncrementIInBounds 1000000 avgt 3 1.950 ? 0.326 ms/op MathExact.C2_no_builtin_throw.loopIncrementIOverflow 1000000 avgt 3 633.735 ? 830.255 ms/op MathExact.C2_no_builtin_throw.loopIncrementLInBounds 1000000 avgt 3 2.397 ? 11.911 ms/op MathExact.C2_no_builtin_throw.loopIncrementLOverflow 1000000 avgt 3 627.599 ? 141.709 ms/op MathExact.C2_no_builtin_throw.loopMultiplyIInBounds 1000000 avgt 3 1.167 ? 1.187 ms/op MathExact.C2_no_builtin_throw.loopMultiplyIOverflow 1000000 avgt 3 623.224 ? 298.374 ms/op MathExact.C2_no_builtin_throw.loopMultiplyLInBounds 1000000 avgt 3 0.944 ? 0.743 ms/op MathExact.C2_no_builtin_throw.loopMultiplyLOverflow 1000000 avgt 3 658.380 ? 137.021 ms/op MathExact.C2_no_builtin_throw.loopNegateIInBounds 1000000 avgt 3 2.119 ? 0.642 ms/op MathExact.C2_no_builtin_throw.loopNegateIOverflow 1000000 avgt 3 643.102 ? 452.213 ms/op MathExact.C2_no_builtin_throw.loopNegateLInBounds 1000000 avgt 3 2.036 ? 0.862 ms/op MathExact.C2_no_builtin_throw.loopNegateLOverflow 1000000 avgt 3 586.103 ? 26.173 ms/op MathExact.C2_no_builtin_throw.loopSubtractIInBounds 1000000 avgt 3 2.552 ? 3.677 ms/op MathExact.C2_no_builtin_throw.loopSubtractIOverflow 1000000 avgt 3 635.294 ? 217.034 ms/op MathExact.C2_no_builtin_throw.loopSubtractLInBounds 1000000 avgt 3 1.093 ? 1.685 ms/op MathExact.C2_no_builtin_throw.loopSubtractLOverflow 1000000 avgt 3 661.541 ? 1358.199 ms/op MathExact.C2_no_intrinsics.loopAddIInBounds 1000000 avgt 3 2.185 ? 15.103 ms/op MathExact.C2_no_intrinsics.loopAddIOverflow 1000000 avgt 3 831.812 ? 1260.546 ms/op MathExact.C2_no_intrinsics.loopAddLInBounds 1000000 avgt 3 2.145 ? 
0.088 ms/op
MathExact.C2_no_intrinsics.loopAddLOverflow 1000000 avgt 3 709.930 ± 658.722 ms/op
MathExact.C2_no_intrinsics.loopDecrementIInBounds 1000000 avgt 3 2.288 ± 0.950 ms/op
MathExact.C2_no_intrinsics.loopDecrementIOverflow 1000000 avgt 3 646.879 ± 186.231 ms/op
MathExact.C2_no_intrinsics.loopDecrementLInBounds 1000000 avgt 3 1.894 ± 0.421 ms/op
MathExact.C2_no_intrinsics.loopDecrementLOverflow 1000000 avgt 3 641.577 ± 323.040 ms/op
MathExact.C2_no_intrinsics.loopIncrementIInBounds 1000000 avgt 3 2.027 ± 0.249 ms/op
MathExact.C2_no_intrinsics.loopIncrementIOverflow 1000000 avgt 3 657.092 ± 229.818 ms/op
MathExact.C2_no_intrinsics.loopIncrementLInBounds 1000000 avgt 3 3.220 ± 16.992 ms/op
MathExact.C2_no_intrinsics.loopIncrementLOverflow 1000000 avgt 3 603.468 ± 73.240 ms/op
MathExact.C2_no_intrinsics.loopMultiplyIInBounds 1000000 avgt 3 1.295 ± 0.413 ms/op
MathExact.C2_no_intrinsics.loopMultiplyIOverflow 1000000 avgt 3 593.005 ± 576.291 ms/op
MathExact.C2_no_intrinsics.loopMultiplyLInBounds 1000000 avgt 3 1.093 ± 0.916 ms/op
MathExact.C2_no_intrinsics.loopMultiplyLOverflow 1000000 avgt 3 618.956 ± 554.204 ms/op
MathExact.C2_no_intrinsics.loopNegateIInBounds 1000000 avgt 3 2.035 ± 0.047 ms/op
MathExact.C2_no_intrinsics.loopNegateIOverflow 1000000 avgt 3 650.591 ± 1248.923 ms/op
MathExact.C2_no_intrinsics.loopNegateLInBounds 1000000 avgt 3 3.505 ± 20.475 ms/op
MathExact.C2_no_intrinsics.loopNegateLOverflow 1000000 avgt 3 660.686 ± 201.612 ms/op
MathExact.C2_no_intrinsics.loopSubtractIInBounds 1000000 avgt 3 1.109 ± 0.726 ms/op
MathExact.C2_no_intrinsics.loopSubtractIOverflow 1000000 avgt 3 670.468 ± 475.269 ms/op
MathExact.C2_no_intrinsics.loopSubtractLInBounds 1000000 avgt 3 1.208 ± 0.806 ms/op
MathExact.C2_no_intrinsics.loopSubtractLOverflow 1000000 avgt 3 597.522 ±
32.465 ms/op

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23916#issuecomment-2775707480

From roland at openjdk.org Thu Apr 3 13:06:49 2025
From: roland at openjdk.org (Roland Westrelin)
Date: Thu, 3 Apr 2025 13:06:49 GMT
Subject: RFR: 8348853: Fold layout helper check for objects implementing non-array interfaces [v2]
In-Reply-To:
References:
Message-ID: <5c7yEX837btOgbGnTKNn8a7hlPljZRwh0TpgZI6Ogb0=.1c7f3aed-8e8c-4efe-beed-68ea192bcb99@github.com>

On Wed, 2 Apr 2025 14:11:34 GMT, Marc Chevalier wrote:

>> src/hotspot/share/opto/memnode.cpp line 2214:
>>
>>> 2212: if (tkls->offset() == in_bytes(Klass::layout_helper_offset()) &&
>>> 2213: tkls->isa_instklassptr() && // not directly typed as an array
>>> 2214: !tkls->is_instklassptr()->might_be_an_array() // not the supertype of all T[] (java.lang.Object) or has an interface that is not Serializable or Cloneable
>>
>> Could we do the same by using `TypeKlassPtr::maybe_java_subtype_of(TypeAryKlassPtr::BOTTOM)` and define a `TypeAryKlassPtr::BOTTOM` to be a static field for the `array_interfaces`?
>>
>> AFAICT, `TypeKlassPtr::maybe_java_subtype_of()` already covers that case so it would avoid some logic duplication. Also in the test above, maybe you could simplify the test a little bit by removing `tkls->isa_instklassptr()`?
>
> I think it should be
>
>     TypeAryKlassPtr::BOTTOM->maybe_java_subtype_of(tkls)
>
> rather than
>
>     tkls->maybe_java_subtype_of(TypeAryKlassPtr::BOTTOM)
>
> My reasoning: if `TypeAryKlassPtr::BOTTOM` is `java.lang.Object + Cloneable + Serializable` any array is a subtype of that. But so is any class implementing these interfaces. As well as any `Object` implementing more interfaces. But for these two last cases, we know they cannot be arrays, which is what we want to know: are we sure it's not an array, or could it be an array?
>
> But if we check if `tkls` is a supertype of `java.lang.Object + Cloneable + Serializable`, then it has to be an `Object` (the most general class) and it implements a subset of `Cloneable` and `Serializable`. In this case, it can be an array. If `tkls` is not a super-type of `java.lang.Object + Cloneable + Serializable`, there are 2 cases:
> - either it is an array type directly (so, I think, one way or another, we need to check for `is_instklassptr`), and so a fortiori it can be an array type.
> - it's an instance type and then cannot be an array since there is nothing between array types and `java.lang.Object + Cloneable + Serializable`. I.e. there is no type `T` that is not an array type, that is a super-type of at least one array type and that is not a super-type of `java.lang.Object + Cloneable + Serializable` (that is, one that is not `java.lang.Object` or that implements at least one other interface).
>
> In other words, our question is
>
>     \exists T: T is an array type /\ T <= tkls
>
> (where `A <= B` means `A is a subtype of B`) which is equivalent to
>
>     tkls >= (java.lang.Object + Cloneable + Serializable)
>     \/ (tkls <= (java.lang.Object + Cloneable + Serializable) /\ tkls is an array type)
>
> We can spare the call to `is_instklassptr` by using a virtual method instead or probably other mechanisms, that's an implementation detail. But I think we need to distinguish cases: both `int[]` and `MyClass + Cloneable + Serializable + MyInterface` are sub-types of `java.lang.Object + Cloneable + Serializable` but for one, we can conclude it's definitely an array, and the other, it's definitely not. Without distinguishing cases, the only sound approximation would be to say that everything can be an array (both sub and super types of `java.lang.Object + Cloneable + Serializable`).
>
> Does that make sense? Did I get something wrong? Is the `BOTTOM` not what you had in mind?

Yes, what I suggested doesn't work indeed.
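As a Java-level illustration of the case distinction discussed above (a rough sketch only, not code from the PR): reflection can show that array types implement exactly `Cloneable` and `Serializable`, so a static type can only hold an array if it is itself an array type or is one of `Object`, `Cloneable`, `Serializable`. The names `ArraySupertypeDemo`, `MyClass` and `mightBeAnArray` are hypothetical, and `java.lang.Class` cannot express C2's interface sets such as `Object + Cloneable + Serializable + MyInterface`, so this is only an approximation of the lattice reasoning.

```java
import java.io.Serializable;
import java.util.Arrays;

public class ArraySupertypeDemo {
    // Hypothetical instance class: implements both array interfaces plus one more.
    static class MyClass implements Cloneable, Serializable, Runnable {
        @Override
        public void run() { }
    }

    // Java-level analogue of "could a value of this static type be an array?":
    // either the type is an array type itself, or it is a supertype of all
    // reference arrays (Object, Cloneable or Serializable).
    static boolean mightBeAnArray(Class<?> type) {
        return type.isArray() || type.isAssignableFrom(Object[].class);
    }

    public static void main(String[] args) {
        // Array classes report exactly Cloneable and Serializable, in that order.
        System.out.println(Arrays.toString(int[].class.getInterfaces()));

        System.out.println("Object:    " + mightBeAnArray(Object.class));
        System.out.println("Cloneable: " + mightBeAnArray(Cloneable.class));
        System.out.println("Runnable:  " + mightBeAnArray(Runnable.class));
        System.out.println("int[]:     " + mightBeAnArray(int[].class));
        System.out.println("MyClass:   " + mightBeAnArray(MyClass.class));
    }
}
```

In this sketch `int[]` and `MyClass` are both subtypes of `Object`, `Cloneable` and `Serializable`, yet only the first can be an array, which is why a single subtype query is not enough to answer the question.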
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24245#discussion_r2026954565

From thartmann at openjdk.org Thu Apr 3 13:13:51 2025
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Thu, 3 Apr 2025 13:13:51 GMT
Subject: RFR: 8346989: C2: deoptimization and re-compilation cycle with Math.*Exact in case of frequent overflow [v7]
In-Reply-To:
References: <8ACplaVM_gN9cbIcQYGJmR4GNINm70PAJQ8uAgucK4Y=.14fdc7e2-e0af-4f0d-acb6-bcfe99ee8f36@github.com>
Message-ID: <5fCOI-cNWoRD89POiHnHraJaiy_73Hlt1xZCNGLcHrY=.aebffb74-c8e5-475e-a853-3576673d6161@github.com>

On Thu, 3 Apr 2025 13:01:15 GMT, Marc Chevalier wrote:

>> `Math.*Exact` intrinsics can cause many deopts when used repeatedly with problematic arguments.
>> This fix proposes not to rely on intrinsics after `too_many_traps()` has been reached.
>>
>> Benchmarks show that this issue affects every Math.*Exact function, and this fix improves them all.
>>
>> tl;dr:
>> - C1: no problem, no change
>> - C2:
>>   - with intrinsics:
>>     - with overflow: clear improvement. Was way worse than C1, now is similar (~4s => ~600ms)
>>     - without overflow: no problem, no change
>>   - without intrinsics: no problem, no change
>>
>> Before the fix:
>>
>> Benchmark (SIZE) Mode Cnt Score Error Units
>> MathExact.C1_1.loopAddIInBounds 1000000 avgt 3 1.272 ± 0.048 ms/op
>> MathExact.C1_1.loopAddIOverflow 1000000 avgt 3 641.917 ± 58.238 ms/op
>> MathExact.C1_1.loopAddLInBounds 1000000 avgt 3 1.402 ± 0.842 ms/op
>> MathExact.C1_1.loopAddLOverflow 1000000 avgt 3 671.013 ± 229.425 ms/op
>> MathExact.C1_1.loopDecrementIInBounds 1000000 avgt 3 3.722 ± 22.244 ms/op
>> MathExact.C1_1.loopDecrementIOverflow 1000000 avgt 3 653.341 ± 279.003 ms/op
>> MathExact.C1_1.loopDecrementLInBounds 1000000 avgt 3 2.525 ± 0.810 ms/op
>> MathExact.C1_1.loopDecrementLOverflow 1000000 avgt 3 656.750 ± 141.792 ms/op
>> MathExact.C1_1.loopIncrementIInBounds 1000000 avgt 3 4.621 ±
12.822 ms/op
>> MathExact.C1_1.loopIncrementIOverflow 1000000 avgt 3 651.608 ± 274.396 ms/op
>> MathExact.C1_1.loopIncrementLInBounds 1000000 avgt 3 2.576 ± 3.316 ms/op
>> MathExact.C1_1.loopIncrementLOverflow 1000000 avgt 3 662.216 ± 71.879 ms/op
>> MathExact.C1_1.loopMultiplyIInBounds 1000000 avgt 3 1.402 ± 0.587 ms/op
>> MathExact.C1_1.loopMultiplyIOverflow 1000000 avgt 3 615.836 ± 252.137 ms/op
>> MathExact.C1_1.loopMultiplyLInBounds 1000000 avgt 3 2.906 ± 5.718 ms/op
>> MathExact.C1_1.loopMultiplyLOverflow 1000000 avgt 3 655.576 ± 147.432 ms/op
>> MathExact.C1_1.loopNegateIInBounds 1000000 avgt 3 2.023 ± 0.027 ms/op
>> MathExact.C1_1.loopNegateIOverflow 1000000 avgt 3 639.136 ± 30.841 ms/op
>> MathExact.C1_1.loop...
>
> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision:
>
> Remove useless flags in tests

Marked as reviewed by thartmann (Reviewer).

Great, thank you!

-------------

PR Review: https://git.openjdk.org/jdk/pull/23916#pullrequestreview-2739795916
PR Comment: https://git.openjdk.org/jdk/pull/23916#issuecomment-2775743849

From mli at openjdk.org Thu Apr 3 13:41:20 2025
From: mli at openjdk.org (Hamlin Li)
Date: Thu, 3 Apr 2025 13:41:20 GMT
Subject: RFR: 8353600: RISC-V: compiler/vectorization/TestRotateByteAndShortVector.java is failing with Zvbb [v2]
In-Reply-To: <3xU-sLLf0E_4n9BsUXL4COF7mxBjDd8YzgyIvissvQQ=.472cf772-a97a-48c8-b4e6-907fcfdd1ebb@github.com>
References: <3xU-sLLf0E_4n9BsUXL4COF7mxBjDd8YzgyIvissvQQ=.472cf772-a97a-48c8-b4e6-907fcfdd1ebb@github.com>
Message-ID:

> Hi,
> Can you help to review this patch?
>
> Currently, the following code is considered a RotateLeftV of byte by hotspot, but it's not a real rotate, as the `-shift` will be 30, which makes `b >> -shift` zero, rather than the value we expected.
>
> int shift = 2;
> byte b = 83;
> byte res = (byte) (b << shift | b >> -shift); // res = 76
> // but a real left rotate of 83 should be 77 ??
> ```
>
> So, the simple fix is to enable RotateLeftV only for int and long, and disable it for other types.
>
> A more rational fix would be to change C2 to not convert code like `(byte) (b << shift | b >> -shift)` to a RotateLeftV node, but it needs more investigation, and I'm not sure if it's feasible to do so, as currently no platform supports RotateLeftV for non-int/long types.
>
> Thanks!

Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:

  fix tests

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/24414/files
  - new: https://git.openjdk.org/jdk/pull/24414/files/ca44d6b8..fa5e7375

Webrevs:
  - full: https://webrevs.openjdk.org/?repo=jdk&pr=24414&range=01
  - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24414&range=00-01

Stats: 41 lines in 1 file changed: 17 ins; 20 del; 4 mod
Patch: https://git.openjdk.org/jdk/pull/24414.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/24414/head:pull/24414

PR: https://git.openjdk.org/jdk/pull/24414

From mli at openjdk.org Thu Apr 3 13:49:05 2025
From: mli at openjdk.org (Hamlin Li)
Date: Thu, 3 Apr 2025 13:49:05 GMT
Subject: RFR: 8353600: RISC-V: compiler/vectorization/TestRotateByteAndShortVector.java is failing with Zvbb [v3]
In-Reply-To: <3xU-sLLf0E_4n9BsUXL4COF7mxBjDd8YzgyIvissvQQ=.472cf772-a97a-48c8-b4e6-907fcfdd1ebb@github.com>
References: <3xU-sLLf0E_4n9BsUXL4COF7mxBjDd8YzgyIvissvQQ=.472cf772-a97a-48c8-b4e6-907fcfdd1ebb@github.com>
Message-ID:

> Hi,
> Can you help to review this patch?
>
> Currently, the following code is considered a RotateLeftV of byte by hotspot, but it's not a real rotate, as the `-shift` will be 30, which makes `b >> -shift` zero, rather than the value we expected.
>
> int shift = 2;
> byte b = 83;
> byte res = (byte) (b << shift | b >> -shift); // res = 76
> // but a real left rotate of 83 should be 77 ??
> ```
>
> So, the simple fix is to enable RotateLeftV only for int and long, and disable it for other types.
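The discrepancy in the quoted description can be reproduced with plain Java shift semantics. The following is a minimal sketch, not code from the patch; the class and method names (`ByteRotateDemo`, `shiftOrPattern`, `rotateLeft8`) are made up for illustration. It relies only on the language rule that a `byte` operand is promoted to `int` and that an `int` shift distance uses its low 5 bits, so `-2` acts as `30`.

```java
public class ByteRotateDemo {
    // The source pattern from the description: b is promoted to int, and the
    // int shift distance -shift is reduced to its low 5 bits (-2 acts as 30).
    static byte shiftOrPattern(byte b, int shift) {
        return (byte) (b << shift | b >> -shift);
    }

    // A true 8-bit left rotate, written out by hand for comparison.
    static byte rotateLeft8(byte b, int shift) {
        int u = b & 0xFF;   // view the byte as an unsigned 8-bit value
        int s = shift & 7;  // rotation distance modulo 8
        return (byte) ((u << s) | (u >>> (8 - s)));
    }

    public static void main(String[] args) {
        byte b = 83;
        int shift = 2;
        System.out.println(shiftOrPattern(b, shift)); // 76
        System.out.println(rotateLeft8(b, shift));    // 77
    }
}
```

The two results differ because the pattern loses the two high bits of the byte that a real 8-bit rotate would carry around to the low end.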
>
> A more rational fix would be to change C2 to not convert code like `(byte) (b << shift | b >> -shift)` to a RotateLeftV node, but it needs more investigation, and I'm not sure if it's feasible to do so, as currently no platform supports RotateLeftV for non-int/long types.
>
> The vector instruction behaviour is different from the Java language spec, so it seems there is no way to do it for now.
>
> Thanks!

Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:

  comment

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/24414/files
  - new: https://git.openjdk.org/jdk/pull/24414/files/fa5e7375..592d4270

Webrevs:
  - full: https://webrevs.openjdk.org/?repo=jdk&pr=24414&range=02
  - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24414&range=01-02

Stats: 5 lines in 1 file changed: 4 ins; 0 del; 1 mod
Patch: https://git.openjdk.org/jdk/pull/24414.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/24414/head:pull/24414

PR: https://git.openjdk.org/jdk/pull/24414

From dfenacci at openjdk.org Thu Apr 3 14:10:34 2025
From: dfenacci at openjdk.org (Damon Fenacci)
Date: Thu, 3 Apr 2025 14:10:34 GMT
Subject: RFR: 8352963: [REDO] Missing late inline cleanup causes compiler/vectorapi/VectorLogicalOpIdentityTest.java IR failure
Message-ID:

This PR is a REDO of [JDK-8302459](https://bugs.openjdk.org/browse/JDK-8302459) ([PR](https://github.com/openjdk/jdk/pull/21682), [backout](https://bugs.openjdk.org/browse/JDK-8352965) triggered by a failing internal test).
There was an issue with `CallGenerator::for_method_handle_call` that could delay late inlining by creating a "generic" `LateInlineCallGenerator` instead of a more specific `LateInlineMHCallGenerator`: https://github.com/openjdk/jdk/blob/74df384a9870431efb184158bba032c79c35356e/src/hotspot/share/opto/callGenerator.cpp#L991 While running IGVN this could be misinterpreted as non-MH late-inline https://github.com/openjdk/jdk/blob/c282fb9add32f1fac8174ca84b1b68a869d2578d/src/hotspot/share/opto/callnode.cpp#L1088-L1091 eventually triggering `assert(!cg->method()->is_method_handle_intrinsic(), "required");` The fix involves creating a `LateInlineMHCallGenerator` instead. Here is what changed from the backed out PR: https://github.com/openjdk/jdk/blob/c282fb9add32f1fac8174ca84b1b68a869d2578d/src/hotspot/share/opto/callGenerator.cpp#L991-L995 ### Testing Tier 1-4 (windows-x64, linux-x64/aarch64, and macosx-x64/aarch64; release and debug mode) ------------- Commit messages: - JDK-8352963: generate specific MH late if needed when delaying inlining - 8302459: Missing late inline cleanup causes compiler/vectorapi/VectorLogicalOpIdentityTest.java IR failure Changes: https://git.openjdk.org/jdk/pull/24402/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24402&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352963 Stats: 104 lines in 7 files changed: 49 ins; 3 del; 52 mod Patch: https://git.openjdk.org/jdk/pull/24402.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24402/head:pull/24402 PR: https://git.openjdk.org/jdk/pull/24402 From roland at openjdk.org Thu Apr 3 14:12:52 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 3 Apr 2025 14:12:52 GMT Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v7] In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 08:47:37 GMT, Christian Hagedorn wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last 
revision:
>>
>> review
>
> src/hotspot/share/opto/phaseX.cpp line 1836:
>
>> 1834: _type_nodes.push(n);
>> 1835: }
>> 1836: const Type* new_type = n->Value(this);
>
> Could we also only add `n` to `_type_nodes` if `new_type` is top? Then we could also rename `_type_nodes` to `_maybe_top_type_nodes` or something like that.

if `new_type` is top? As nodes' types are widened by CCP, a node `n` will initially be `top`, then one input changes and becomes not `top` but if the node has another input (say control), that other input will still be `top` so the type will be `top` again. Only once both inputs are not `top` is the type not `top`. So isn't there a good chance that most type nodes will initially be `top` and be enqueued anyway so filtering nodes when they are popped is still required and we don't gain much by doing what you suggest?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23468#discussion_r2027086883

From roland at openjdk.org Thu Apr 3 14:30:21 2025
From: roland at openjdk.org (Roland Westrelin)
Date: Thu, 3 Apr 2025 14:30:21 GMT
Subject: RFR: 8341976: C2: use_mem_state != load->find_exact_control(load->in(0)) assert failure [v8]
In-Reply-To:
References:
Message-ID:

> The `arraycopy` writes to a non escaping array so its `ArrayCopy` node
> is marked as having a narrow memory effect. One of the loads from the
> destination after the copy is transformed into a load from the source
> array (the rationale being that if there's no load from the
> destination of the copy, the `arraycopy` is not needed). The load from
> the source has the input memory state of the `ArrayCopy` as memory
> input. That load is then sunk out of the loop and its control is
> updated to be after the `ArrayCopy`. That's legal because the
> `ArrayCopy` only has a narrow memory effect and can't modify the
> source. The `ArrayCopy` can't be eliminated and is expanded. In the
> process, a `MemBar` that has a wide memory effect is added.
The load
> from the source has control after the membar but memory state before
> and because the membar has a wide memory effect, the load is anti
> dependent on the membar: the graph is broken (the load can't be pinned
> after the membar and anti dependent on it).
>
> In short, the problem is that the graph is transformed under the
> assumption that the `ArrayCopy` has a narrow effect but the
> `ArrayCopy` is expanded to a subgraph that has a wide memory
> effect. The fix I propose is to not insert a membar with a wide memory
> effect. We still need a membar when the destination is non escaping
> because the expanded `ArrayCopy`, if it writes to a tightly allocated
> array, writes to raw memory and not to the destination memory slice.

Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision:

  review

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/23465/files
  - new: https://git.openjdk.org/jdk/pull/23465/files/9b21648d..a76839de

Webrevs:
  - full: https://webrevs.openjdk.org/?repo=jdk&pr=23465&range=07
  - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23465&range=06-07

Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod
Patch: https://git.openjdk.org/jdk/pull/23465.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/23465/head:pull/23465

PR: https://git.openjdk.org/jdk/pull/23465

From roland at openjdk.org Thu Apr 3 14:30:23 2025
From: roland at openjdk.org (Roland Westrelin)
Date: Thu, 3 Apr 2025 14:30:23 GMT
Subject: RFR: 8341976: C2: use_mem_state != load->find_exact_control(load->in(0)) assert failure [v7]
In-Reply-To:
References:
Message-ID:

On Wed, 2 Apr 2025 07:23:49 GMT, Emanuel Peter wrote:

>> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 16 additional commits since the last revision: >> >> - review >> - Merge branch 'master' into JDK-8341976 >> - review >> - review >> - Merge branch 'master' into JDK-8341976 >> - -XX:+TraceLoopOpts fix >> - review >> - more >> - Merge branch 'master' into JDK-8341976 >> - more >> - ... and 6 more: https://git.openjdk.org/jdk/compare/90c6006f...9b21648d > > test/hotspot/jtreg/compiler/arraycopy/TestSunkLoadAntiDependency.java line 28: > >> 26: * @bug 8341976 >> 27: * @summary C2: use_mem_state != load->find_exact_control(load->in(0)) assert failure >> 28: * @run main/othervm -XX:-BackgroundCompilation TestSunkLoadAntiDependency > > Would it make sense to have a run without any flags? @eme64 I made that change in new commit ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23465#discussion_r2027128101 From mli at openjdk.org Thu Apr 3 17:02:22 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 3 Apr 2025 17:02:22 GMT Subject: RFR: 8353665: RISC-V: IR verification fails in TestSubNodeFloatDoubleNegation.java Message-ID: Hi, Can you help to review this patch? The newly added TestSubNodeFloatDoubleNegation.java (in https://github.com/openjdk/jdk/pull/24150) is to check `0 - (0 - x)` is not folded to `x` for float and double. I have manually checked the IR and generated assembly code, it's not folded on riscv either, just there is an extra SubF in some code path. So, the fix for this test on riscv should be simply make the check as `>= 2` rather than `2`. Tested on both x86 and riscv64. 
Thanks ------------- Commit messages: - initial commit Changes: https://git.openjdk.org/jdk/pull/24421/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24421&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8353665 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24421.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24421/head:pull/24421 PR: https://git.openjdk.org/jdk/pull/24421 From duke at openjdk.org Thu Apr 3 17:04:02 2025 From: duke at openjdk.org (Johannes Graham) Date: Thu, 3 Apr 2025 17:04:02 GMT Subject: RFR: 8347645: C2: XOR bounded value handling blocks constant folding [v46] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 06:16:26 GMT, Emanuel Peter wrote: >> Renamed to `xor_upper_bound_for_ranges` before I saw your comment, @merykitty. I'd be ok with another name though. With the last changes, the method is no longer a member of the class, so it's no longer going to get as many eyes on it without context, so maybe it matters less now. > > @j3graham I gave it a quick look, and it looks even better now. Let me run testing again before you integrate! > > Please ping me in 24h for the results! Hi @eme64, any news on test results? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23089#issuecomment-2776423931 From kvn at openjdk.org Thu Apr 3 17:09:01 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 3 Apr 2025 17:09:01 GMT Subject: RFR: 8353192: C2: Clean up x86 backend after 32-bit x86 removal In-Reply-To: References: Message-ID: <8UOZXyMUssuNga9jUwBf6F1Nmhi6a3ZIJGpXzS3KL3U=.50774170-672e-49c9-8527-077914838e94@github.com> On Fri, 28 Mar 2025 17:09:19 GMT, Aleksey Shipilev wrote: > Piece-wise cleanup of C2 x86 backend. C2_MacroAssembler, x86.ad and related files are the target for this cleanup. 
> > Additional testing: > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux x86_64 server fastdebug, `all` + `-XX:-TieredCompilation` src/hotspot/cpu/x86/x86.ad line 2680: > 2678: break; > 2679: case Op_VecX: > 2680: #ifndef _LP64 Here and in following code you left code for 32-bit instead of 64-bits. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24300#discussion_r2027426985 From kvn at openjdk.org Thu Apr 3 17:18:59 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 3 Apr 2025 17:18:59 GMT Subject: RFR: 8353188: C1: Clean up x86 backend after 32-bit x86 removal [v3] In-Reply-To: References: <-iwh_5JGpt-TAVpfZQjwbnIG_c8hvirNKCcmiZoLNls=.3b34bf15-51fc-42bf-a294-1c23ca99754c@github.com> Message-ID: On Wed, 2 Apr 2025 08:56:20 GMT, Aleksey Shipilev wrote: >> Piece-wise cleanup of C1_LIRAssembler_x86, C1_MacroAssembler and related classes. C1 implements the bulk of arch-specific backend there. Major parts of this backend are already removed by #24274, this cleans up another large bulk, and hopefully most of it. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux x86_64 server fastdebug, `all` + `-XX:TieredStopAtLevel=1` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Minor whitespace reverts Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24301#pullrequestreview-2740633488 From kvn at openjdk.org Thu Apr 3 17:27:58 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 3 Apr 2025 17:27:58 GMT Subject: RFR: 8353192: C2: Clean up x86 backend after 32-bit x86 removal In-Reply-To: References: Message-ID: On Fri, 28 Mar 2025 17:09:19 GMT, Aleksey Shipilev wrote: > Piece-wise cleanup of C2 x86 backend. C2_MacroAssembler, x86.ad and related files are the target for this cleanup. 
>
> Additional testing:
> - [x] Linux x86_64 server fastdebug, `all`
> - [x] Linux x86_64 server fastdebug, `all` + `-XX:-TieredCompilation`

I looked at the `adlc` code to make sure nothing is left there and found a check for `IA32`. This looks like another variable we set for 32-bit x86: [platform.m4#L556](https://github.com/openjdk/jdk/blob/master/make/autoconf/platform.m4#L556)

I am surprised to see `X32` too, which we check in `os_linux.cpp`.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24300#issuecomment-2776475220

From jbhateja at openjdk.org Thu Apr 3 18:33:36 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Thu, 3 Apr 2025 18:33:36 GMT
Subject: RFR: 8346236: Auto vectorization support for various Float16 operations [v10]
In-Reply-To:
References:
Message-ID: <_ZuUmN2CJEVZwNDql7bfQJ8gsXRsIsgOOg6AWNdWzVE=.c267d502-c53b-4ea6-afb4-0415d67ef5ac@github.com>

> This is a follow-up PR for https://github.com/openjdk/jdk/pull/22754
>
> The patch adds support to vectorize various float16 scalar operations (add/subtract/divide/multiply/sqrt/fma).
>
> Summary of changes included with the patch:
> 1. C2 compiler New Vector IR creation.
> 2. Auto-vectorization support.
> 3. x86 backend implementation.
> 4. New IR verification test for each newly supported vector operation.
> > Following are the performance numbers of Float16OperationsBenchmark > > System : Intel(R) Xeon(R) Processor code-named Granite rapids > Frequency fixed at 2.5 GHz > > > Baseline > Benchmark (vectorDim) Mode Cnt Score Error Units > Float16OperationsBenchmark.absBenchmark 1024 thrpt 2 4191.787 ops/ms > Float16OperationsBenchmark.addBenchmark 1024 thrpt 2 1211.978 ops/ms > Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 1024 thrpt 2 493.026 ops/ms > Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 1024 thrpt 2 612.430 ops/ms > Float16OperationsBenchmark.cosineSimilaritySingleRoundingFP16 1024 thrpt 2 616.012 ops/ms > Float16OperationsBenchmark.divBenchmark 1024 thrpt 2 604.882 ops/ms > Float16OperationsBenchmark.dotProductFP16 1024 thrpt 2 410.798 ops/ms > Float16OperationsBenchmark.euclideanDistanceDequantizedFP16 1024 thrpt 2 602.863 ops/ms > Float16OperationsBenchmark.euclideanDistanceFP16 1024 thrpt 2 640.348 ops/ms > Float16OperationsBenchmark.fmaBenchmark 1024 thrpt 2 809.175 ops/ms > Float16OperationsBenchmark.getExponentBenchmark 1024 thrpt 2 2682.764 ops/ms > Float16OperationsBenchmark.isFiniteBenchmark 1024 thrpt 2 3373.901 ops/ms > Float16OperationsBenchmark.isFiniteCMovBenchmark 1024 thrpt 2 1881.652 ops/ms > Float16OperationsBenchmark.isFiniteStoreBenchmark 1024 thrpt 2 2273.745 ops/ms > Float16OperationsBenchmark.isInfiniteBenchmark 1024 thrpt 2 2147.913 ops/ms > Float16OperationsBenchmark.isInfiniteCMovBenchmark 1024 thrpt 2 1962.579 ops/ms... Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 13 commits: - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8346236 - Review comment resolutions - Some re-factoring - Adding tests for new float16 Generator - Removing Generator dependency on incubation module - Review comments resolution. 
- Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8346236
- Updating benchmark
- Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8346236
- Updating copyright
- ... and 3 more: https://git.openjdk.org/jdk/compare/d894b781...6d05863d

-------------

Changes: https://git.openjdk.org/jdk/pull/22755/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22755&range=09
Stats: 1165 lines in 23 files changed: 1077 ins; 12 del; 76 mod
Patch: https://git.openjdk.org/jdk/pull/22755.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/22755/head:pull/22755

PR: https://git.openjdk.org/jdk/pull/22755

From dlong at openjdk.org Thu Apr 3 19:15:06 2025
From: dlong at openjdk.org (Dean Long)
Date: Thu, 3 Apr 2025 19:15:06 GMT
Subject: RFR: 8349563: Improve AbsNode::Value() for integer types [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 3 Apr 2025 10:57:06 GMT, Quan Anh Mai wrote:

>> @dean-long ?
>
> No, converting unsigned to signed is not UB, the behaviour is the same as in Java.

I believe it's actually implementation-defined, not UB, until C++20, according to discussion in this other PR: https://github.com/openjdk/jdk/pull/24184#discussion_r2011464234

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23685#discussion_r2027594672

From duke at openjdk.org Thu Apr 3 20:11:08 2025
From: duke at openjdk.org (Saranya Natarajan)
Date: Thu, 3 Apr 2025 20:11:08 GMT
Subject: RFR: 8351660: C2: SIGFPE in unsigned_mod_value
Message-ID:

Description :: The test program performs a `Long.remainderUnsigned`, which triggers the call to the function `unsigned_mod_value`. At the end of `unsigned_mod_value`, `return TypeClass::make(static_cast(dividend % divisor))` is computed, which leads to a SIGFPE as the divisor in the test program is zero. The same behaviour was observed when the `Long.remainderUnsigned` was replaced with `Integer.remainderUnsigned` in the test program.
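For context, a small Java sketch of the library-level behaviour that compiled code has to preserve here: an unsigned remainder by zero raises `ArithmeticException` rather than producing a value, so folding the operation at compile time cannot simply evaluate `dividend % divisor` when the divisor is zero. The class and helper names below are illustrative only and do not come from the PR or its test.

```java
public class UnsignedRemainderDemo {
    // Returns true if Long.remainderUnsigned throws ArithmeticException.
    static boolean longRemainderThrows(long dividend, long divisor) {
        try {
            Long.remainderUnsigned(dividend, divisor);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    // Same check for the int variant.
    static boolean intRemainderThrows(int dividend, int divisor) {
        try {
            Integer.remainderUnsigned(dividend, divisor);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // -1L is 2^64 - 1 when viewed as unsigned; its remainder modulo 10 is 5.
        System.out.println(Long.remainderUnsigned(-1L, 10L));
        // A zero divisor must surface as an exception, not as a crash.
        System.out.println(longRemainderThrows(42L, 0L));
        System.out.println(intRemainderThrows(42, 0));
    }
}
```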
Solution :: The fix for [JDK-8345766](https://bugs.openjdk.org/browse/JDK-8345766) emitted specific ModF/ModD nodes, which are optimized and converted to runtime calls after optimizations. This was done during parsing prior to [JDK-8345766](https://bugs.openjdk.org/browse/JDK-8345766). In the scenario where there was an unsigned modulo operation, there was no check for modulo by zero. The proposed fix checks whether there is a modulo by zero and throws an exception at runtime.

-------------

Commit messages:
  - JDK-8351660: C2: SIGFPE in unsigned_mod_value

Changes: https://git.openjdk.org/jdk/pull/24410/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24410&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8351660
Stats: 5 lines in 1 file changed: 5 ins; 0 del; 0 mod
Patch: https://git.openjdk.org/jdk/pull/24410.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/24410/head:pull/24410

PR: https://git.openjdk.org/jdk/pull/24410

From dlong at openjdk.org Thu Apr 3 21:05:52 2025
From: dlong at openjdk.org (Dean Long)
Date: Thu, 3 Apr 2025 21:05:52 GMT
Subject: RFR: 8349563: Improve AbsNode::Value() for integer types [v2]
In-Reply-To: <4BGh_S5KBKdofXCOmj6e7HYCR4GUSi9-ShxqW-h4oNQ=.2cb7d760-4f42-4e95-b993-39f99931c1d9@github.com>
References: <4BGh_S5KBKdofXCOmj6e7HYCR4GUSi9-ShxqW-h4oNQ=.2cb7d760-4f42-4e95-b993-39f99931c1d9@github.com>
Message-ID:

On Wed, 19 Feb 2025 16:13:06 GMT, Chen Liang wrote:

>> Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits:
>>
>> - Merge
>> - Improve AbsNode::Value
>
> src/hotspot/share/opto/subnode.cpp line 1941:
>
>> 1939:
>> 1940: if (lo_abs < 0) {
>> 1941: assert(lo_abs == std::numeric_limits::min(), "uabs(t->_lo) must be min value if negative!");
>
> I think asserting `t->_lo` to be min is more straightforward, and also indicates `(t->_lo) + 1`, which yields max, is in the type. We can simplify the comment below too.
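The corner case behind the quoted review comment can be seen at the Java level. This is a small sketch with illustrative names (`AbsMinValueDemo`, `absIsNegative`), not code from the PR: the absolute value of the minimum `int` wraps back to the minimum, while the absolute value of `MIN_VALUE + 1` is exactly the maximum.

```java
public class AbsMinValueDemo {
    // True when Math.abs cannot represent the magnitude and wraps to a negative value.
    static boolean absIsNegative(int x) {
        return Math.abs(x) < 0;
    }

    public static void main(String[] args) {
        // Only Integer.MIN_VALUE has no positive counterpart in two's complement.
        System.out.println(absIsNegative(Integer.MIN_VALUE));
        System.out.println(absIsNegative(Integer.MIN_VALUE + 1));
        // |MIN_VALUE + 1| is the largest representable int.
        System.out.println(Math.abs(Integer.MIN_VALUE + 1) == Integer.MAX_VALUE);
    }
}
```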
If we check for the problematic t->_lo == min first, then we no longer need to use uabs(), right? Also, could we use IntegerType::MIN for the check, rather than std::numeric_limits? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23685#discussion_r2027739138 From sparasa at openjdk.org Fri Apr 4 01:21:34 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Fri, 4 Apr 2025 01:21:34 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same Message-ID: The current scheme for Intel APX NDD code generation favors the emission of NDD instruction on APX-enabled targets, even if destination and source registers are the same. To prevent this, this PR extends the assembler layer to demote EEVEX to REX encoding if dst matches with source operands. ------------- Commit messages: - 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same Changes: https://git.openjdk.org/jdk/pull/24431/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8351994 Stats: 3561 lines in 4 files changed: 1376 ins; 298 del; 1887 mod Patch: https://git.openjdk.org/jdk/pull/24431.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24431/head:pull/24431 PR: https://git.openjdk.org/jdk/pull/24431 From jbhateja at openjdk.org Fri Apr 4 02:10:35 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 4 Apr 2025 02:10:35 GMT Subject: RFR: 8346236: Auto vectorization support for various Float16 operations [v11] In-Reply-To: References: Message-ID: <0oYqgnHHKaYHu_AH2bVR2ZbC45JgK-evjGeFwuN0MSg=.94374b61-e094-499f-95af-a1bfbb70db4d@github.com> > This is a follow-up PR for https://github.com/openjdk/jdk/pull/22754 > > The patch adds support to vectorize various float16 scalar operations (add/subtract/divide/multiply/sqrt/fma). > > Summary of changes included with the patch: > 1. C2 compiler New Vector IR creation. > 2. Auto-vectorization support. 
> 3. x86 backend implementation. > 4. New IR verification test for each newly supported vector operation. > > Following are the performance numbers of Float16OperationsBenchmark > > System : Intel(R) Xeon(R) Processor code-named Granite rapids > Frequency fixed at 2.5 GHz > > > Baseline > Benchmark (vectorDim) Mode Cnt Score Error Units > Float16OperationsBenchmark.absBenchmark 1024 thrpt 2 4191.787 ops/ms > Float16OperationsBenchmark.addBenchmark 1024 thrpt 2 1211.978 ops/ms > Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 1024 thrpt 2 493.026 ops/ms > Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 1024 thrpt 2 612.430 ops/ms > Float16OperationsBenchmark.cosineSimilaritySingleRoundingFP16 1024 thrpt 2 616.012 ops/ms > Float16OperationsBenchmark.divBenchmark 1024 thrpt 2 604.882 ops/ms > Float16OperationsBenchmark.dotProductFP16 1024 thrpt 2 410.798 ops/ms > Float16OperationsBenchmark.euclideanDistanceDequantizedFP16 1024 thrpt 2 602.863 ops/ms > Float16OperationsBenchmark.euclideanDistanceFP16 1024 thrpt 2 640.348 ops/ms > Float16OperationsBenchmark.fmaBenchmark 1024 thrpt 2 809.175 ops/ms > Float16OperationsBenchmark.getExponentBenchmark 1024 thrpt 2 2682.764 ops/ms > Float16OperationsBenchmark.isFiniteBenchmark 1024 thrpt 2 3373.901 ops/ms > Float16OperationsBenchmark.isFiniteCMovBenchmark 1024 thrpt 2 1881.652 ops/ms > Float16OperationsBenchmark.isFiniteStoreBenchmark 1024 thrpt 2 2273.745 ops/ms > Float16OperationsBenchmark.isInfiniteBenchmark 1024 thrpt 2 2147.913 ops/ms > Float16OperationsBenchmark.isInfiniteCMovBenchmark 1024 thrpt 2 1962.579 ops/ms... 
Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Adding missing feature check ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22755/files - new: https://git.openjdk.org/jdk/pull/22755/files/6d05863d..2c09e816 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22755&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22755&range=09-10 Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22755.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22755/head:pull/22755 PR: https://git.openjdk.org/jdk/pull/22755 From fyang at openjdk.org Fri Apr 4 02:46:59 2025 From: fyang at openjdk.org (Fei Yang) Date: Fri, 4 Apr 2025 02:46:59 GMT Subject: RFR: 8353695: RISC-V: compiler/cpuflags/TestAESIntrinsicsOnUnsupportedConfig.java is failing with Zvkn Message-ID: Hi, please review this small change fixing two jtreg tests. This issue manifests after https://github.com/openjdk/jdk/pull/24344, which auto-detects and enables the Zvkn extension. The two tests only require the "aes" feature string (vm.cpu.features ~= ".*aes.*"), but the feature string is "zvkn" on the linux-riscv64 platform. This change adapts the "@requires" of both tests to account for the Zvkn feature of this platform. Both tests work as expected with qemu-system equipped with the Zvkn extension.
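To illustrate the mismatch described above: the `~=` operator in a jtreg `@requires` expression is a regular-expression match, so a pattern of `.*aes.*` cannot match a feature string that only advertises `zvkn`. The sketch below mimics that check in plain Java. The class name, helper name, and the two feature strings are made up for illustration; they are not the real `vm.cpu.features` values, and the combined pattern is only one possible way to accept both spellings, not necessarily what the PR does.

```java
import java.util.regex.Pattern;

public class FeatureMatchSketch {
    // Returns true if the whole feature string matches the regular expression,
    // which is how a pattern such as ".*aes.*" is meant to be read here.
    static boolean matches(String features, String regex) {
        return Pattern.matches(regex, features);
    }

    public static void main(String[] args) {
        // Hypothetical feature strings, for illustration only.
        String x86Like = "sse4.2, avx2, aes, clmul";
        String riscvLike = "rv64 i m a f d c v zvkn";

        System.out.println(matches(x86Like, ".*aes.*"));
        System.out.println(matches(riscvLike, ".*aes.*"));
        // One possible pattern that accepts either spelling.
        System.out.println(matches(riscvLike, ".*(aes|zvkn).*"));
    }
}
```

With these sample strings, only the first and third checks succeed, which mirrors why a test keyed on "aes" alone is not selected correctly on a Zvkn-capable RISC-V machine.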
------------- Commit messages: - 8353695: RISC-V: compiler/cpuflags/TestAESIntrinsicsOnUnsupportedConfig.java is failing with Zvkn Changes: https://git.openjdk.org/jdk/pull/24433/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24433&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8353695 Stats: 5 lines in 3 files changed: 1 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/24433.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24433/head:pull/24433 PR: https://git.openjdk.org/jdk/pull/24433 From jbhateja at openjdk.org Fri Apr 4 03:07:52 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 4 Apr 2025 03:07:52 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same In-Reply-To: References: Message-ID: On Fri, 4 Apr 2025 01:15:36 GMT, Srinivas Vamsi Parasa wrote: > The current scheme for Intel APX NDD code generation favors the emission of NDD instruction on APX-enabled targets, even if destination and source registers are the same. To prevent this, this PR extends the assembler layer to demote EEVEX to REX encoding if dst matches with source operands. src/hotspot/cpu/x86/assembler_x86.cpp line 13825: > 13823: return (!no_flags && dst_enc == nds_enc); > 13824: } > 13825: @vamsi-parasa , We are missing a case where dst_enc can be equal to src_enc; in that case, we can still demote EVEX to REX/REX2 encoding, along with a change in primary opcode if needed. 
This will apply to all the commutative operations (ADD/ AND / OR / XOR) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2028015376 From epeter at openjdk.org Fri Apr 4 05:35:54 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 4 Apr 2025 05:35:54 GMT Subject: RFR: 8347645: C2: XOR bounded value handling blocks constant folding [v49] In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 23:22:09 GMT, Johannes Graham wrote: >> An interaction between xor bounds optimization and constant folding resulted in xor over constants not being optimized. This has a noticeable effect on `Long.expand` with a constant mask, on architectures that don't have instructions equivalent to `PDEP` to be used in an intrinsic. >> >> This change moves logic from the `Xor(L|I)Node::Value` methods into the `add_ring` methods, and gives priority to constant-folding. A static method was separated out to facilitate direct unit-testing. It also (subjectively) simplified the calculation of the upper bound and added an explanation of the reasoning behind it. >> >> In addition to testing for constant folding over xor, IR tests were added to `XorINodeIdealizationTests` and `XorLNodeIdealizationTests` to cover these related items: >> - Bounds optimization of xor >> - A check for `x ^ x = 0` >> - Explicit testing of xor over booleans. >> >> Also `test_xor_node.cpp` was added to more extensively test the correctness of the bounds optimization. It exhaustively tests ranges of 4-bit numbers as well as at the high and low end of the affected types. > > Johannes Graham has updated the pull request incrementally with one additional commit since the last revision: > > update comments Testing looks good :) ------------- Marked as reviewed by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23089#pullrequestreview-2741819043 From epeter at openjdk.org Fri Apr 4 06:06:56 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 4 Apr 2025 06:06:56 GMT Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v10] In-Reply-To: <2T_qgLVG05hbfRLOkrEGthWnoxXpvUGf0T8haKyKiCE=.fa4c75c5-764c-4829-9fcd-bfe12fa4d994@github.com> References: <2T_qgLVG05hbfRLOkrEGthWnoxXpvUGf0T8haKyKiCE=.fa4c75c5-764c-4829-9fcd-bfe12fa4d994@github.com> Message-ID: On Wed, 2 Apr 2025 14:15:19 GMT, Jasmine Karthikeyan wrote: >> Hi all, >> This patch adds initial support for the autovectorizer to generate conversions between subword types. Currently, when superword sees two packs that have different basic types, it discards them and bails out of vectorization. This patch changes the behavior to ask the backend if a cast between the conflicting types is supported, and keeps the pack if it is. Later, when the `VTransform` graph is built, a synthetic cast is emitted when packs requiring casts are detected. Currently, only narrowing casts are supported as I wanted to re-use existing `VectorCastX2Y` logic for the initial version, but adding more conversions is simple and can be done with a subsequent RFE. I have attached a JMH benchmark and got these results on my Zen 3 machine: >> >> >> Baseline Patch >> Benchmark (SIZE) Mode Cnt Score Error Units Score Error Units Improvement >> VectorSubword.intToByte 1024 avgt 12 200.049 ? 19.787 ns/op 56.228 ? 3.535 ns/op (3.56x) >> VectorSubword.intToShort 1024 avgt 12 179.826 ? 1.539 ns/op 43.332 ? 1.166 ns/op (4.15x) >> VectorSubword.shortToByte 1024 avgt 12 245.580 ? 6.150 ns/op 29.757 ? 1.055 ns/op (8.25x) >> >> >> I've also added some IR tests and they pass on my linux x64 machine. Thoughts and reviews would be appreciated! 
> > Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: > > Fix copyright after merge This looks really good, thanks @jaskarth for the work you put in! I have a few more comments below. src/hotspot/share/opto/superword.cpp line 2329: > 2327: // Check if the output type of def is compatible with the input type of use, i.e. if the > 2328: // types have the same size. > 2329: bool SuperWord::is_velt_basic_type_compatible_use_def(Node* use, Node* def, const uint def_size) const { Suggestion: bool SuperWord::is_velt_basic_type_compatible_use_def(Node* use, Node* def, const uint pack_size) const { I think that would be more descriptive here. It would indicate we are not interested in the size of an element (i.e. bytes per element), but the size of the pack. src/hotspot/share/opto/superword.cpp line 2361: > 2359: > 2360: // Input sizes differ, but platform supports a cast to change the def shape to the use shape > 2361: Suggestion: // Subword cast: Element sizes differ, but the platform supports a cast to change the def shape to the use shape. 
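For readers less familiar with the loop shape under discussion, here is a minimal sketch of a narrowing int-to-byte loop of the kind the PR description says the autovectorizer can now handle. The class and method names are hypothetical (this is not the JMH benchmark or the IR test from the PR), and whether C2 actually vectorizes it depends on the platform and the JDK build.

```java
public class NarrowingCastSketch {
    // Copies each int into a byte array, truncating to the low 8 bits.
    // The def (int load) and the use (byte store) have different element sizes,
    // which is the situation where a vector cast would be needed.
    static void intToByte(int[] src, byte[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = (byte) src[i];
        }
    }

    public static void main(String[] args) {
        int[] src = {1, 257, -1, 128};
        byte[] dst = new byte[src.length];
        intToByte(src, dst);
        System.out.println(java.util.Arrays.toString(dst));
    }
}
```

The scalar semantics are what matter for correctness: any vectorized form has to produce the same truncated values as this loop.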
src/hotspot/share/opto/superwordVTransformBuilder.cpp line 195: > 193: // If the use and def types are different, emit a cast node > 194: if (use_bt != def_bt && !p0->is_Convert() > 195: && (is_subword_type(def_bt) || is_subword_type(use_bt)) && VectorCastNode::implemented(-1, pack->size(), def_bt, use_bt)) { Suggestion: if (use_bt != def_bt && !p0->is_Convert() && (is_subword_type(def_bt) || is_subword_type(use_bt)) && VectorCastNode::implemented(-1, pack->size(), def_bt, use_bt)) { Optional nit :) test/hotspot/jtreg/compiler/loopopts/superword/TestCompatibleUseDefTypeSize.java line 113: > 111: tests.put("testShortToInt", () -> { return testShortToInt(aS.clone(), bI.clone()); }); > 112: tests.put("testByteToInt", () -> { return testByteToInt(aB.clone(), bI.clone()); }); > 113: tests.put("testByteToShort", () -> { return testByteToShort(aB.clone(), bS.clone()); }); What about a `testLongToShort` etc? It could be good to just have casts from/to all types, just to be sure ;) test/micro/org/openjdk/bench/vm/compiler/VectorSubword.java line 44: > 42: private byte[] bytes; > 43: private short[] shorts; > 44: private int[] ints; It would be nice if you covered also `char` and `long`, for completeness :) ------------- Changes requested by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23413#pullrequestreview-2739175207 PR Review Comment: https://git.openjdk.org/jdk/pull/23413#discussion_r2026615573 PR Review Comment: https://git.openjdk.org/jdk/pull/23413#discussion_r2028138291 PR Review Comment: https://git.openjdk.org/jdk/pull/23413#discussion_r2028139632 PR Review Comment: https://git.openjdk.org/jdk/pull/23413#discussion_r2028142164 PR Review Comment: https://git.openjdk.org/jdk/pull/23413#discussion_r2028145292 From epeter at openjdk.org Fri Apr 4 06:18:51 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 4 Apr 2025 06:18:51 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v5] In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 09:29:47 GMT, Galder Zamarreño wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8352869-Verify-NaN-Vector-Objects >> - Verify.Options refactor for Galder >> - Update test/hotspot/jtreg/compiler/lib/verify/Verify.java >> >> Co-authored-by: Galder Zamarreño >> - Merge branch 'master' into JDK-8352869-Verify-NaN-Vector-Objects >> - clean up test >> - JDK-8352869 > > Changes requested by galder (Author). @galderz do you intend to review / approve this, or should I ask someone else?
------------- PR Comment: https://git.openjdk.org/jdk/pull/24224#issuecomment-2777652144 From mchevalier at openjdk.org Fri Apr 4 06:54:53 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Fri, 4 Apr 2025 06:54:53 GMT Subject: RFR: 8346989: C2: deoptimization and re-execution cycle with Math.*Exact in case of frequent overflow [v7] In-Reply-To: References: <8ACplaVM_gN9cbIcQYGJmR4GNINm70PAJQ8uAgucK4Y=.14fdc7e2-e0af-4f0d-acb6-bcfe99ee8f36@github.com> Message-ID: On Thu, 3 Apr 2025 13:01:15 GMT, Marc Chevalier wrote: >> `Math.*Exact` intrinsics can cause many deopt when used repeatedly with problematic arguments. >> This fix proposes not to rely on intrinsics after `too_many_traps()` has been reached. >> >> Benchmark show that this issue affects every Math.*Exact functions. And this fix improve them all. >> >> tl;dr: >> - C1: no problem, no change >> - C2: >> - with intrinsics: >> - with overflow: clear improvement. Was way worse than C1, now is similar (~4s => ~600ms) >> - without overflow: no problem, no change >> - without intrinsics: no problem, no change >> >> Before the fix: >> >> Benchmark (SIZE) Mode Cnt Score Error Units >> MathExact.C1_1.loopAddIInBounds 1000000 avgt 3 1.272 ? 0.048 ms/op >> MathExact.C1_1.loopAddIOverflow 1000000 avgt 3 641.917 ? 58.238 ms/op >> MathExact.C1_1.loopAddLInBounds 1000000 avgt 3 1.402 ? 0.842 ms/op >> MathExact.C1_1.loopAddLOverflow 1000000 avgt 3 671.013 ? 229.425 ms/op >> MathExact.C1_1.loopDecrementIInBounds 1000000 avgt 3 3.722 ? 22.244 ms/op >> MathExact.C1_1.loopDecrementIOverflow 1000000 avgt 3 653.341 ? 279.003 ms/op >> MathExact.C1_1.loopDecrementLInBounds 1000000 avgt 3 2.525 ? 0.810 ms/op >> MathExact.C1_1.loopDecrementLOverflow 1000000 avgt 3 656.750 ? 141.792 ms/op >> MathExact.C1_1.loopIncrementIInBounds 1000000 avgt 3 4.621 ? 12.822 ms/op >> MathExact.C1_1.loopIncrementIOverflow 1000000 avgt 3 651.608 ? 274.396 ms/op >> MathExact.C1_1.loopIncrementLInBounds 1000000 avgt 3 2.576 ? 
3.316 ms/op >> MathExact.C1_1.loopIncrementLOverflow 1000000 avgt 3 662.216 ? 71.879 ms/op >> MathExact.C1_1.loopMultiplyIInBounds 1000000 avgt 3 1.402 ? 0.587 ms/op >> MathExact.C1_1.loopMultiplyIOverflow 1000000 avgt 3 615.836 ? 252.137 ms/op >> MathExact.C1_1.loopMultiplyLInBounds 1000000 avgt 3 2.906 ? 5.718 ms/op >> MathExact.C1_1.loopMultiplyLOverflow 1000000 avgt 3 655.576 ? 147.432 ms/op >> MathExact.C1_1.loopNegateIInBounds 1000000 avgt 3 2.023 ? 0.027 ms/op >> MathExact.C1_1.loopNegateIOverflow 1000000 avgt 3 639.136 ? 30.841 ms/op >> MathExact.C1_1.loop... > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Remove useless flags in tests Thanks @iwanowww and @TobiHartmann! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23916#issuecomment-2777705340 From duke at openjdk.org Fri Apr 4 06:54:53 2025 From: duke at openjdk.org (duke) Date: Fri, 4 Apr 2025 06:54:53 GMT Subject: RFR: 8346989: C2: deoptimization and re-execution cycle with Math.*Exact in case of frequent overflow [v7] In-Reply-To: References: <8ACplaVM_gN9cbIcQYGJmR4GNINm70PAJQ8uAgucK4Y=.14fdc7e2-e0af-4f0d-acb6-bcfe99ee8f36@github.com> Message-ID: On Thu, 3 Apr 2025 13:01:15 GMT, Marc Chevalier wrote: >> `Math.*Exact` intrinsics can cause many deopt when used repeatedly with problematic arguments. >> This fix proposes not to rely on intrinsics after `too_many_traps()` has been reached. >> >> Benchmark show that this issue affects every Math.*Exact functions. And this fix improve them all. >> >> tl;dr: >> - C1: no problem, no change >> - C2: >> - with intrinsics: >> - with overflow: clear improvement. Was way worse than C1, now is similar (~4s => ~600ms) >> - without overflow: no problem, no change >> - without intrinsics: no problem, no change >> >> Before the fix: >> >> Benchmark (SIZE) Mode Cnt Score Error Units >> MathExact.C1_1.loopAddIInBounds 1000000 avgt 3 1.272 ? 
0.048 ms/op >> MathExact.C1_1.loopAddIOverflow 1000000 avgt 3 641.917 ? 58.238 ms/op >> MathExact.C1_1.loopAddLInBounds 1000000 avgt 3 1.402 ? 0.842 ms/op >> MathExact.C1_1.loopAddLOverflow 1000000 avgt 3 671.013 ? 229.425 ms/op >> MathExact.C1_1.loopDecrementIInBounds 1000000 avgt 3 3.722 ? 22.244 ms/op >> MathExact.C1_1.loopDecrementIOverflow 1000000 avgt 3 653.341 ? 279.003 ms/op >> MathExact.C1_1.loopDecrementLInBounds 1000000 avgt 3 2.525 ? 0.810 ms/op >> MathExact.C1_1.loopDecrementLOverflow 1000000 avgt 3 656.750 ? 141.792 ms/op >> MathExact.C1_1.loopIncrementIInBounds 1000000 avgt 3 4.621 ? 12.822 ms/op >> MathExact.C1_1.loopIncrementIOverflow 1000000 avgt 3 651.608 ? 274.396 ms/op >> MathExact.C1_1.loopIncrementLInBounds 1000000 avgt 3 2.576 ? 3.316 ms/op >> MathExact.C1_1.loopIncrementLOverflow 1000000 avgt 3 662.216 ? 71.879 ms/op >> MathExact.C1_1.loopMultiplyIInBounds 1000000 avgt 3 1.402 ? 0.587 ms/op >> MathExact.C1_1.loopMultiplyIOverflow 1000000 avgt 3 615.836 ? 252.137 ms/op >> MathExact.C1_1.loopMultiplyLInBounds 1000000 avgt 3 2.906 ? 5.718 ms/op >> MathExact.C1_1.loopMultiplyLOverflow 1000000 avgt 3 655.576 ? 147.432 ms/op >> MathExact.C1_1.loopNegateIInBounds 1000000 avgt 3 2.023 ? 0.027 ms/op >> MathExact.C1_1.loopNegateIOverflow 1000000 avgt 3 639.136 ? 30.841 ms/op >> MathExact.C1_1.loop... > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Remove useless flags in tests @marc-chevalier Your change (at version e7c8f3e06f46e85cb3c2dc974db84b10a57bd086) is now ready to be sponsored by a Committer. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23916#issuecomment-2777706873 From duke at openjdk.org Fri Apr 4 06:55:00 2025 From: duke at openjdk.org (duke) Date: Fri, 4 Apr 2025 06:55:00 GMT Subject: Withdrawn: 8348556: Inlining fails earlier for MemorySegment::reinterpret In-Reply-To: <3LuKBc-mbghi2A2-OnXrJD5zwvOm8URerns7Ud0Zz4c=.583514d1-d0fc-4005-b810-f4db92fcb60c@github.com> References: <3LuKBc-mbghi2A2-OnXrJD5zwvOm8URerns7Ud0Zz4c=.583514d1-d0fc-4005-b810-f4db92fcb60c@github.com> Message-ID: On Wed, 5 Feb 2025 10:17:09 GMT, Per Minborg wrote: > This PR proposes to add some `@ForceInline` annotations in the `Module` class in order to assist inlining of FFM var/method handles. > > There are also some changes in other classes (notably `j.l.Object`) which, if implemented, can take us four additional levels of inlining. However, there is a tradeoff with adding `@ForceInline` and just trying to get as deep as possible for a specific use case is probably not the best idea. > > So, we should discuss which of the proposed changes (if any), we'd like to integrate. > > Tested and passed tier1-3 This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/23460 From thartmann at openjdk.org Fri Apr 4 07:20:48 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 4 Apr 2025 07:20:48 GMT Subject: RFR: 8353665: RISC-V: IR verification fails in TestSubNodeFloatDoubleNegation.java In-Reply-To: References: Message-ID: <8rHea6mVCPjHzORjnOx2pJG7l8TjH6yHDyF9eXMM0t0=.324f1eec-21f5-4adb-8dc0-ed1ac1348117@github.com> On Thu, 3 Apr 2025 16:57:19 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > The newly added TestSubNodeFloatDoubleNegation.java (in https://github.com/openjdk/jdk/pull/24150) is to check `0 - (0 - x)` is not folded to `x` for float and double. 
> I have manually checked the IR and generated assembly code, it's not folded on riscv either, just there is an extra SubF in some code path. > So, the fix for this test on riscv should be simply make the check as `>= 2` rather than `2`. > > Tested on both x86 and riscv64. > > Thanks Where does the extra SubF come from? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24421#issuecomment-2777755042 From roland at openjdk.org Fri Apr 4 07:28:52 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 4 Apr 2025 07:28:52 GMT Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v3] In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 12:00:02 GMT, Christian Hagedorn wrote: >> There's a good chance that it can never be null. I think it's been considered good practice over the year to be particularly defensive about this (there must be other Ideal transformations where inputs can be cleared as the graph is transformed) and I tend to add checks for null inputs systematically. > >> I think it's been considered good practice over the year to be particularly defensive about this > > Makes sense from a stability point of view. I'm wondering though if it's not a bug when the cast input is null at this point. Aren't there only few CFG nodes, like regions, where we set some inputs to null already? There is other code, for example in `ConvI2L::Ideal()`, that later accesses `in(1)` without null check: > > https://github.com/openjdk/jdk/blob/1ec2177a6b25573732b902f76bb81dd1cdaf7edf/src/hotspot/share/opto/convertnode.cpp#L728 > > To be consistent, we would also need to add a check for the other accesses in the method or turn the null check into a bailout for the entire `Ideal()` method. If we agree that null is unexpected (or assume it should be), we might also want to add asserts accordingly. > > My concern is that most IGVN methods assume non-control inputs cannot be null where we normally expect a sane input. 
This is probably true but hard to prove. To be overally consistent, we should also consider adding bailout and assertion code there. While it's the safest solution, this could introduce a lot of new code, especially for multi input nodes, which also makes it harder to read. What are your thought about that? > > Anyway, we don't need to make a decision as part of this PR on how we should generally handle inputs in IGVN method. It's fine if we only concentrate on the touched/new code here. I agree that we would need to be consistent and that it makes little sense to add null checks in code that have been around forever and has never caused issues. Maybe we can somehow have igvn itself assert that every node it processes has a set of expected inputs non null? I suppose, every node type would need to define which of its inputs can be null, then. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23468#discussion_r2028262990 From duke at openjdk.org Fri Apr 4 08:25:56 2025 From: duke at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Fri, 4 Apr 2025 08:25:56 GMT Subject: RFR: 8353665: RISC-V: IR verification fails in TestSubNodeFloatDoubleNegation.java In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 16:57:19 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > The newly added TestSubNodeFloatDoubleNegation.java (in https://github.com/openjdk/jdk/pull/24150) is to check `0 - (0 - x)` is not folded to `x` for float and double. > I have manually checked the IR and generated assembly code, it's not folded on riscv either, just there is an extra SubF in some code path. > So, the fix for this test on riscv should be simply make the check as `>= 2` rather than `2`. > > Tested on both x86 and riscv64. > > Thanks test/hotspot/jtreg/compiler/floatingpoint/TestSubNodeFloatDoubleNegation.java line 59: > 57: // performed as float operations. 
> 58: @IR(counts = { IRNode.SUB, "2" }, applyIfPlatform = {"riscv64", "false"}) > 59: @IR(counts = { IRNode.SUB, ">= 2" }, applyIfPlatform = {"riscv64", "true"}) Would it perhaps make sense to fix the number of `SubNode`s or does the `Float16` code add a bunch of them on RISC-V? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24421#discussion_r2028346592 From rcastanedalo at openjdk.org Fri Apr 4 08:43:57 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 4 Apr 2025 08:43:57 GMT Subject: RFR: 8353669: IGV: dump OOP maps for MachSafePoint nodes Message-ID: This changeset dumps the OOP map of each MachSafePoint when available, i.e. at the `Final Code` phase. This should make it easier to learn about and diagnose OOP map building issues: ![final-code](https://github.com/user-attachments/assets/a477c9d1-0fe4-42ef-a367-336c54bec6a5) #### Testing - tier1 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64; release and debug mode). - Tested IGV manually on a few selected graphs. Tested automatically that dumping thousands of graphs does not trigger any assertion failure (by running `java -Xcomp -XX:PrintIdealGraphLevel=1`).
------------- Commit messages: - Dump oopmaps for MachSafePoint nodes when available Changes: https://git.openjdk.org/jdk/pull/24422/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24422&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8353669 Stats: 9 lines in 1 file changed: 9 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24422.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24422/head:pull/24422 PR: https://git.openjdk.org/jdk/pull/24422 From mli at openjdk.org Fri Apr 4 09:27:53 2025 From: mli at openjdk.org (Hamlin Li) Date: Fri, 4 Apr 2025 09:27:53 GMT Subject: RFR: 8353665: RISC-V: IR verification fails in TestSubNodeFloatDoubleNegation.java In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 16:57:19 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > The newly added TestSubNodeFloatDoubleNegation.java (in https://github.com/openjdk/jdk/pull/24150) is to check `0 - (0 - x)` is not folded to `x` for float and double. > I have manually checked the IR and generated assembly code, it's not folded on riscv either, just there is an extra SubF in some code path. > So, the fix for this test on riscv should be simply make the check as `>= 2` rather than `2`. > > Tested on both x86 and riscv64. > > Thanks 026 lh R28, [R11, #12] # short, #@loadS ! 
Field: jdk/incubator/vector/Float16.value (constant) 02a NullCheck R11 02a B2: # out( B5 B3 ) <- in( B1 ) Freq: 0.999999 02a + -- // R23=Thread::current(), empty, #@tlsLoadP 02a ld R10, [R23, #464] # ptr, #@loadP 02e + fmv.h.x F2, zr # float, #@loadConH0 032 + fmv.h.x F0, R28 036 + ld R7, [R23, #480] # ptr, #@loadP 03a + addi R28, R10, #16 # ptr, #@addP_reg_imm 03e + binop_hf F3, F2, F0 042 + bgeu R28, R7, B5 #@cmpP_branch P=0.000100 C=-1.000000 046 B3: # out( B4 ) <- in( B2 ) Freq: 0.999899 046 + mv R7, #1 # long, #@loadConL 048 + sd R28, [R23, #464] # ptr, #@storeP 04c + mv R29, narrowklass: precise jdk/incubator/vector/Float16: 0x00007fa4dc396ba8 (java/io/Serializable,java/lang/Comparable):Constant:exact * # compressed klass ptr, #@loadConNKlass 058 + sd R7, [R10] # long, #@storeL 05c + sw R29, [R10, #8] # compressed klass ptr, #@storeNKlass 060 + prefetch_w [R28, #192] # Prefetch for write 064 + sw zr, [R10, #12] # int, #@storeimmI0 068 B4: # out( N1 ) <- in( B6 B3 ) Freq: 0.999999 068 + fmv.h.x F0, zr # float, #@loadConH0 06c + binop_hf F0, F0, F3 070 + fmv.x.h R28, F0 074 + sh R28, [R10, #12] # short, #@storeC 078 078 + MEMBAR-store-store #@membar_storestore 078 + # checkcastPP of R10, #@checkCastPP 078 # pop frame 48 add sp, sp, #48 ld ra, [sp,#-16] ld fp, [sp,#-8] # test polling word ld t0, [xthread,#40] bgtu sp, t0, #slow_path 08a + ret // return register, #@Ret 08c B5: # out( B8 B6 ) <- in( B2 ) Freq: 0.000100016 08c + fmv.w.x F0, zr # float, #@loadConF0 090 + convHF2SAndHF2F F2, F3 094 + fsub.s F0, F0, F2 #@subF_reg_reg 098 spill F3 -> [sp, #0] # spill size = 32 09c + spill F0 -> [sp, #4] # spill size = 32 0a0 + mv R11, precise jdk/incubator/vector/Float16: 0x00007fa4dc396ba8 (java/io/Serializable,java/lang/Comparable):Constant:exact * # ptr, #@loadConP 0b8 CALL,static 0x00007fa4cb85ba80 #@CallStaticJavaDirect wrapper for: C2 Runtime new_instance # jdk.incubator.vector.Float16::valueOf @ bci:0 (line 329) L[0]=sp + #4 # 
jdk.incubator.vector.Float16::subtract @ bci:9 (line 1137) L[0]=_ L[1]=_ # compiler.floatingpoint.TestSubNodeFloatDoubleNegation::testHalfFloat @ bci:12 (line 62) L[0]=_ # OopMap {off=196/0xc4} 0d0 B6: # out( B4 ) <- in( B5 ) Freq: 0.000100014 # Block is sole successor of call 0d0 + spill [sp, #0] -> F3 # spill size = 32 0d4 + j B4 #@branch Thanks for having a look, interesting! Please check the B5 block (which I think is the path taken when the TLAB runs out): there is an *extra* `fsub.s` putting a value into F0, but that F0 value is not really used, since the B4 block just loads zero into F0. This `fsub.s` should be useless, although it should do no harm in terms of correctness. I checked on x86 and found that I don't have the CPU feature needed to generate real float16 instructions, so it only has 2 SubF rather than SubHF, which I think is expected for Float16. I'm not sure whether this useless `fsub.s` is only an issue on riscv or maybe also on x86 if `supports_avx512_fp16` returns true. It would be great if someone could help verify it. I'm also not sure whether this useless *extra* instruction (here `fsub.s`) could be generated in other situations. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24421#issuecomment-2778059148 From rcastanedalo at openjdk.org Fri Apr 4 09:48:08 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 4 Apr 2025 09:48:08 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v11] In-Reply-To: References: <0Yf6qZwnLz7oAtSFscDwHifQAmaPuHzeSrpkqMVchDU=.c7a5e8af-9390-414b-850c-609110668eac@github.com> <-pdjdg9OQRB7YaXNFiVeVseLEoJDZb2XkMk0ml3pm3w=.2ecb257e-5618-4763-90e5-a2b1d0758e67@github.com> Message-ID: On Thu, 3 Apr 2025 12:48:31 GMT, Daniel Lundén wrote: >> src/hotspot/share/opto/matcher.cpp line 195: >> >>> 193: if (C->failing()) { >>> 194: return; >>> 195: } >> >> Is this failure poll required after your changes? > > Yes, this poll is still required.
We may fail in `init_spill_mask -> regmask_for_ideal_register`. Good catch, thanks for checking. >> src/hotspot/share/opto/postaloc.cpp line 686: >> >>> 684: assert(!(!value[ureg_lo] && lrgs(useidx).mask().is_offset() && >>> 685: !lrgs(useidx).mask().Member(ureg_lo)), >>> 686: "invalid assumption"); >> >> Could you use more descriptive names and assertion messages in this new assertion and the one below? Ideally, without having to refer to old versions. What is the invariant that we want to check? How does it relate to the surrounding code? > > As we've previously discussed offline, I also had my doubts when introducing these asserts. I've now had a second look (with reasonably fresh eyes), and believe I now better understand the underlying assumptions. > > The two problematic pieces of code in `postaloc.cpp` from before this changeset that we need to translate as part of the changeset are > > if (!value[ureg_lo] && > (!RegMask::can_represent(ureg_lo) || > lrgs(useidx).mask().Member(ureg_lo))) { // Nearly always adjacent > > and > > if( RegMask::can_represent(nreg_lo) && // Either a spill slot, or > !lrgs(lidx).mask().Member(nreg_lo) ) { // Nearly always adjacent > > Specifically, the `RegMask::can_represent` calls check if their argument registers can fit in the statically determined size of register masks (which no longer makes sense in this changeset). > > The reason for the `can_represent` calls is that the subsequent `Member` calls assert internally that their arguments can fit within the static size of register masks. That is, `can_represent` worked as a guard to ensure the precondition for the call to `Member` holds. In this changeset, the `Member` function is generalized to allow arbitrary arguments (and the internal assert is removed). Therefore, we can remove the `can_represent` guards. > > Now to the assertions that I added (which I've now improved).
From the if conditions, we can infer there is an implicit invariant that a register for which `can_represent` returns false is necessarily "adjacent". Specifically, `can_represent` returning false implies that the register is a spill slot (implied by a comment in the source code). However, registers for which `can_represent` returns true may **also** be spill slots, so using `can_represent` as a proxy check for spill slots feels clumsy. I believe that the real invariant here is that only actual registers (and not stack locations, including spill slots) can be non-adjacent. This is what I now verify with my updated asserts. > > For the record, I have not been able to find any cases with non-adjacency in any tests on current Oracle-supported platforms. From another comment in the source code, it looks like non-adjacent pairs are quite specific to SPARC. Good analysis, thanks for investigating, Daniel! Maybe worth creating an RFE to investigate whether we can assume (and statically verify) non-adjacent register pairs moving forward, and clean up this and possibly other C2 back-end code accordingly. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2028478202 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2028473435 From duke at openjdk.org Fri Apr 4 09:54:03 2025 From: duke at openjdk.org (Manuel Hässig) Date: Fri, 4 Apr 2025 09:54:03 GMT Subject: RFR: 8353665: RISC-V: IR verification fails in TestSubNodeFloatDoubleNegation.java In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 16:57:19 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > The newly added TestSubNodeFloatDoubleNegation.java (in https://github.com/openjdk/jdk/pull/24150) is to check `0 - (0 - x)` is not folded to `x` for float and double. > I have manually checked the IR and generated assembly code, it's not folded on riscv either, just there is an extra SubF in some code path.
> So, the fix for this test on riscv should be simply make the check as `>= 2` rather than `2`. > > Tested on both x86 and riscv64. > > Thanks To me `B5` kind of looks like a backup codepath (see branch at 042). But I cannot see what `R7` is there. One option I see would be to match two `SUB_HF` nodes for RISC-V, since it seems to always generate two of those. The only reason I match on `SUB` in the half float case is that I also do not have `supports_avx512_fp16`. I think I will file an RFE for that separately. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24421#issuecomment-2778113274 From mli at openjdk.org Fri Apr 4 09:54:03 2025 From: mli at openjdk.org (Hamlin Li) Date: Fri, 4 Apr 2025 09:54:03 GMT Subject: RFR: 8353665: RISC-V: IR verification fails in TestSubNodeFloatDoubleNegation.java In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 16:57:19 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > The newly added TestSubNodeFloatDoubleNegation.java (in https://github.com/openjdk/jdk/pull/24150) is to check `0 - (0 - x)` is not folded to `x` for float and double. > I have manually checked the IR and generated assembly code, it's not folded on riscv either, just there is an extra SubF in some code path. > So, the fix for this test on riscv should be simply make the check as `>= 2` rather than `2`. > > Tested on both x86 and riscv64. > > Thanks Ah, I just checked on riscv: if I disable float16 (zfh), it does not generate the extra `SubF` in the slow path, i.e. there are just 2 SubF. So, I guess on x86, it could have the same issue, and the test could fail too if `supports_avx512_fp16` returns true.
------------- PR Comment: https://git.openjdk.org/jdk/pull/24421#issuecomment-2778117432 From mli at openjdk.org Fri Apr 4 09:54:03 2025 From: mli at openjdk.org (Hamlin Li) Date: Fri, 4 Apr 2025 09:54:03 GMT Subject: RFR: 8353665: RISC-V: IR verification fails in TestSubNodeFloatDoubleNegation.java In-Reply-To: References: Message-ID: On Fri, 4 Apr 2025 09:46:02 GMT, Manuel Hässig wrote: > To me B5 kind of looks like a backup codepath (see branch at 042). But I cannot see what R7 is there. R7 should be tlab_end of the thread. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24421#issuecomment-2778139803 From mli at openjdk.org Fri Apr 4 09:57:08 2025 From: mli at openjdk.org (Hamlin Li) Date: Fri, 4 Apr 2025 09:57:08 GMT Subject: RFR: 8353665: RISC-V: IR verification fails in TestSubNodeFloatDoubleNegation.java In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 16:57:19 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > The newly added TestSubNodeFloatDoubleNegation.java (in https://github.com/openjdk/jdk/pull/24150) is to check `0 - (0 - x)` is not folded to `x` for float and double. > I have manually checked the IR and generated assembly code, it's not folded on riscv either, just there is an extra SubF in some code path. > So, the fix for this test on riscv should be simply make the check as `>= 2` rather than `2`. > > Tested on both x86 and riscv64. > > Thanks @jatin-bhateja Could you please help to run the test with `supports_avx512_fp16` if you're available? Thanks!
------------- PR Comment: https://git.openjdk.org/jdk/pull/24421#issuecomment-2778148709 From dlunden at openjdk.org Fri Apr 4 11:50:03 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Fri, 4 Apr 2025 11:50:03 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v12] In-Reply-To: References: Message-ID: <5XrG6zHnhCQoPomYLQgAAQEoSWNbS0n9dihXE0i_x7g=.cd48e569-704c-4398-9c62-e6cd33c5e417@github.com> On Tue, 1 Apr 2025 16:35:08 GMT, Roberto Casta?eda Lozano wrote: >> test/jdk/java/lang/invoke/BigArityTest.java line 32: >> >>> 30: * (1) have a large number of parameters, and >>> 31: * (2) use JSR292 methods internally (which increases the >>> 32: * MaxNodeLimit with a factor of 3) >> >> Just checking: these methods that cause C2 to consume an excessive amount of memory were not C2-compilable before the changeset, right? > > Same question for the other `java/lang/invoke` test changes. Yes, correct. No longer bailing out on too many arguments results in a lot more compilations (with `-Xcomp`) compared to before in these specific tests, which is why I've had to limit the tests with `MaxNodeLimit`s. That said, I did look into these tests a bit more now after your comment, and there are some peculiar (but artificial) compilations that we no longer bail out on and that we may want to investigate in a future RFE. These compilations each take around 40 seconds (in a release build), are very close to the `MaxNodeLimit` (80 000 nodes), and spend 99% of the time in the register allocator (in the first round of conservative coalescing, specifically). I analyzed these register allocator runs and it looks like we run into the quadratic time complexity of graph-coloring register allocation, because we have a very large number of nodes to begin with and then the interference graph is additionally very dense (contains a very large number of interferences/edges). 
We already have bailouts related to node count in the register allocator, but no bailouts for the interference graph size. Perhaps we should consider adding this as part of a separate RFE. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2028661113 From dlunden at openjdk.org Fri Apr 4 11:53:13 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Fri, 4 Apr 2025 11:53:13 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v11] In-Reply-To: References: <0Yf6qZwnLz7oAtSFscDwHifQAmaPuHzeSrpkqMVchDU=.c7a5e8af-9390-414b-850c-609110668eac@github.com> <-pdjdg9OQRB7YaXNFiVeVseLEoJDZb2XkMk0ml3pm3w=.2ecb257e-5618-4763-90e5-a2b1d0758e67@github.com> Message-ID: On Fri, 4 Apr 2025 09:42:43 GMT, Roberto Casta?eda Lozano wrote: >> As we've previously discussed offline, I also had my doubts when introducing these asserts. I've now had a second look (with reasonably fresh eyes), and believe I now better understand the underlying assumptions. >> >> The two problematic pieces of code in `postaloc.cpp` from before this changeset that we need to translate as part of the changeset are >> >> if (!value[ureg_lo] && >> (!RegMask::can_represent(ureg_lo) || >> lrgs(useidx).mask().Member(ureg_lo))) { // Nearly always adjacent >> >> and >> >> if( RegMask::can_represent(nreg_lo) && // Either a spill slot, or >> !lrgs(lidx).mask().Member(nreg_lo) ) { // Nearly always adjacent >> >> Specifically, the `RegMask::can_represent` calls check if their argument registers can fit in the statically determined size of register masks (which no longer makes sense in this changeset). >> >> The reason for the `can_represent` calls is that the subsequent `Member` calls assert internally that their arguments can fit within the static size of register masks. That is, `can_represent` worked as a guard to ensure the precondition for the call to `Member` holds. 
In this changeset, the `Member` function is generalized to allow arbitrary arguments (and the internal assert is removed). Therefore, we can remove the `can_represent` guards. >> >> Now to the assertions that I added (which I've now improved). From the if conditions, we can infer there is an implicit invariant that a register for which `can_represent` returns false is necessarily "adjacent". Specifically, `can_represent` returning false implies that the register is a spill slot (implied by a comment in the source code). However, registers for which `can_represent` returns true may **also** be spill slots, so using `can_represent` as a proxy check for spill slots feels clumsy. I believe that the real invariant here is that only actual registers (and not stack locations, including spill slots) can be non-adjacent. This is what I now verify with my updated asserts. >> >> For the record, I have not been able to find any cases with non-adjacency in any tests on current Oracle-supported platforms. From another comment in the source code, it looks like non-adjacent pairs are quite specific to SPARC. > > Good analysis, thanks for investigating Daniel! Maybe worth creating an RFE to investigate whether we can assume (and statically verify) non-adjacent register pairs moving forward, and cleanup this and possibly other C2 back-end code accordingly. I guess you mean assume adjacent register pairs? Sounds good, I'll make a note to create an RFE.
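As an aside for readers of the archive, the guard-versus-generalized-membership point discussed in this thread can be illustrated with a small sketch. This is not HotSpot code: `GrowableMask`, `canRepresent`, `member`, `insert`, and the 128-bit base size are all hypothetical names and values chosen for illustration, and the sketch is written in Java rather than C++. It only shows the general idea that once a membership query tolerates any index, a separate "does it fit" guard before the query is no longer needed.

```java
import java.util.Arrays;

// Hypothetical stand-in for a register mask; this is NOT HotSpot's RegMask.
final class GrowableMask {
    // Assumed size of the statically allocated part: 2 words = 128 bits.
    private static final int BASE_WORDS = 2;
    private long[] words = new long[BASE_WORDS];

    // Old-style guard: does the index fit in the static part of the mask?
    static boolean canRepresent(int reg) {
        return reg >= 0 && reg < BASE_WORDS * Long.SIZE;
    }

    // Generalized membership: any index is a valid query. Indices outside
    // the currently allocated words are simply reported as "not a member".
    boolean member(int reg) {
        if (reg < 0) {
            return false;
        }
        int word = reg >>> 6;
        return word < words.length && (words[word] & (1L << (reg & 63))) != 0;
    }

    // Grows the backing array on demand, so large stack-slot indices work.
    void insert(int reg) {
        if (reg < 0) {
            throw new IllegalArgumentException("negative index: " + reg);
        }
        int word = reg >>> 6;
        if (word >= words.length) {
            words = Arrays.copyOf(words, word + 1);
        }
        words[word] |= 1L << (reg & 63);
    }
}
```

With such a `member`, a caller can write `mask.member(reg)` directly for an index like 500, where an older design would have needed `canRepresent(reg) && mask.member(reg)` to avoid tripping an internal assert.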
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2028665257 From rcastanedalo at openjdk.org Fri Apr 4 11:59:56 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 4 Apr 2025 11:59:56 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v12] In-Reply-To: <5XrG6zHnhCQoPomYLQgAAQEoSWNbS0n9dihXE0i_x7g=.cd48e569-704c-4398-9c62-e6cd33c5e417@github.com> References: <5XrG6zHnhCQoPomYLQgAAQEoSWNbS0n9dihXE0i_x7g=.cd48e569-704c-4398-9c62-e6cd33c5e417@github.com> Message-ID: On Fri, 4 Apr 2025 11:46:56 GMT, Daniel Lund?n wrote: > Perhaps we should consider adding this as part of a separate RFE. This sounds like a good idea, I agree to postpone it to a separate RFE. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2028671476 From rcastanedalo at openjdk.org Fri Apr 4 11:59:55 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 4 Apr 2025 11:59:55 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v11] In-Reply-To: References: <0Yf6qZwnLz7oAtSFscDwHifQAmaPuHzeSrpkqMVchDU=.c7a5e8af-9390-414b-850c-609110668eac@github.com> <-pdjdg9OQRB7YaXNFiVeVseLEoJDZb2XkMk0ml3pm3w=.2ecb257e-5618-4763-90e5-a2b1d0758e67@github.com> Message-ID: On Fri, 4 Apr 2025 11:50:20 GMT, Daniel Lund?n wrote: >> Good analysis, thanks for investigating Daniel! Maybe worth creating an RFE to investigate whether we can assume (and statically verify) non-adjacent register pairs moving forward, and cleanup this and possibly other C2 back-end code accordingly. > > I guess you mean assume adjacent register pairs? Sounds good, I'll make a note to create an RFE. Right, I meant adjacent pairs, thanks for the clarification. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2028673416 From dlunden at openjdk.org Fri Apr 4 12:03:28 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Fri, 4 Apr 2025 12:03:28 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v14] In-Reply-To: References: Message-ID: > If a method has a large number of parameters, we currently bail out from C2 compilation. > > ### Changeset > > Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes. > > Changes: > - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed. > - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable. > - Remove all `can_represent` checks and bailouts. > - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`. > - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`. 
> > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450) > - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. > - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement. > - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, not worth it). > > ![c2-regression](https:/... Daniel Lund?n has updated the pull request incrementally with two additional commits since the last revision: - Update test comment to also mention timeouts - Fix suboptimal max limit in _grow ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20404/files - new: https://git.openjdk.org/jdk/pull/20404/files/76f6b8f8..c41a76b9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20404&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20404&range=12-13 Stats: 5 lines in 2 files changed: 0 ins; 1 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/20404.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20404/head:pull/20404 PR: https://git.openjdk.org/jdk/pull/20404 From duke at openjdk.org Fri Apr 4 12:16:06 2025 From: duke at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Fri, 4 Apr 2025 12:16:06 GMT Subject: RFR: 8353665: RISC-V: IR verification fails in TestSubNodeFloatDoubleNegation.java In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 16:57:19 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? 
> The newly added TestSubNodeFloatDoubleNegation.java (in https://github.com/openjdk/jdk/pull/24150) is to check `0 - (0 - x)` is not folded to `x` for float and double. > I have manually checked the IR and generated assembly code, it's not folded on riscv either, just there is an extra SubF in some code path. > So, the fix for this test on riscv should be simply make the check as `>= 2` rather than `2`. > > Tested on both x86 and riscv64. > > Thanks I ran the test with software emulation of `avx512-fp16` and the test failed the same way as on RISC-V: $ sde64 -gnr -- jtreg [...] test/hotspot/jtreg/compiler/floatingpoint/TestSubNodeFloatDoubleNegation.java [...] Failed IR Rules (1) of Methods (1) ---------------------------------- 1) Method "public static jdk.incubator.vector.Float16 compiler.floatingpoint.TestSubNodeFloatDoubleNegation.testHalfFloat(jdk.incubator.vector.Float16)" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#SUB#_", "2"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\d+(\s){2}(Sub(I|L|F|D|HF).*)+(\s){2}===.*)" - Failed comparison: [found] 3 = 2 [given] - Matched nodes (3): * 326 SubHF === _ 560 325 [[ 327 479 ]] !orig=[478] !jvms: Float16::valueOf @ bci:5 (line 329) Float16::subtract @ bci:9 (line 1137) TestSubNodeFloatDoubleNegation::testHalfFloat @ bci:9 (line 63) * 443 SubF === _ 561 442 [[ 607 ]] !jvms: Float16::subtract @ bci:8 (line 1137) TestSubNodeFloatDoubleNegation::testHalfFloat @ bci:12 (line 61) * 479 SubHF === _ 560 326 [[ 480 ]] !jvms: Float16::valueOf @ bci:5 (line 329) Float16::subtract @ bci:9 (line 1137) TestSubNodeFloatDoubleNegation::testHalfFloat @ bci:12 (line 61) The ideal graph shows the same "alternative codepath" that your opto 
assembly shows. I guess we need to generally predicate the test on native float16 support. But I can do that in a separate issue, where I also investigate ARM. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24421#issuecomment-2778516111 From rcastanedalo at openjdk.org Fri Apr 4 12:24:01 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 4 Apr 2025 12:24:01 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v14] In-Reply-To: References: Message-ID: On Fri, 4 Apr 2025 12:03:28 GMT, Daniel Lund?n wrote: >> If a method has a large number of parameters, we currently bail out from C2 compilation. >> >> ### Changeset >> >> Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes. >> >> Changes: >> - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed. >> - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable. >> - Remove all `can_represent` checks and bailouts. >> - Performance tuning. 
A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`. >> - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450) >> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. >> - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement. >> - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, no... > > Daniel Lund?n has updated the pull request incrementally with two additional commits since the last revision: > > - Update test comment to also mention timeouts > - Fix suboptimal max limit in _grow Looks good, thanks for addressing my comments Daniel! ------------- Marked as reviewed by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20404#pullrequestreview-2742799411 From qamai at openjdk.org Fri Apr 4 12:47:06 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 4 Apr 2025 12:47:06 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v14] In-Reply-To: References: Message-ID: On Fri, 4 Apr 2025 12:03:28 GMT, Daniel Lund?n wrote: >> If a method has a large number of parameters, we currently bail out from C2 compilation. 
>> >> ### Changeset >> >> Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes. >> >> Changes: >> - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed. >> - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable. >> - Remove all `can_represent` checks and bailouts. >> - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`. >> - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450) >> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. >> - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement. >> - Specific benchmarking of C2 compilation time. 
The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, no... > > Daniel Lund?n has updated the pull request incrementally with two additional commits since the last revision: > > - Update test comment to also mention timeouts > - Fix suboptimal max limit in _grow `TestNestedSynchronize.java` is a massive file. I wonder if you can try to generate it using `MethodHandle` or classfile API instead? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20404#issuecomment-2778636310 From duke at openjdk.org Fri Apr 4 12:52:14 2025 From: duke at openjdk.org (duke) Date: Fri, 4 Apr 2025 12:52:14 GMT Subject: RFR: 8347645: C2: XOR bounded value handling blocks constant folding [v49] In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 23:22:09 GMT, Johannes Graham wrote: >> An interaction between xor bounds optimization and constant folding resulted in xor over constants not being optimized. This has a noticeable effect on `Long.expand` with a constant mask, on architectures that don't have instructions equivalent to `PDEP` to be used in an intrinsic. >> >> This change moves logic from the `Xor(L|I)Node::Value` methods into the `add_ring` methods, and gives priority to constant-folding. A static method was separated out to facilitate direct unit-testing. It also (subjectively) simplified the calculation of the upper bound and added an explanation of the reasoning behind it. 
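As an aside on the upper-bound reasoning mentioned in the quoted description: a minimal Java sketch of one sound bound for the xor of two non-negative values is shown below. It is not claimed to be the exact computation in the PR; `XorBound`, `xorUpperBound`, and `holdsForAllUpTo` are hypothetical names. The idea is that `x ^ y` cannot set any bit above the highest bit set in `hi0 | hi1`, so the all-ones value up to that bit bounds the result. The bound is sound but not always the tightest possible.

```java
// Hypothetical illustration; not the utility added by the PR.
final class XorBound {
    // Upper bound of x ^ y for 0 <= x <= hi0 and 0 <= y <= hi1.
    static int xorUpperBound(int hi0, int hi1) {
        if (hi0 < 0 || hi1 < 0) {
            throw new IllegalArgumentException("non-negative bounds only");
        }
        int highest = Integer.highestOneBit(hi0 | hi1);
        // All bits up to and including the highest set bit. For bit 30 the
        // shift wraps to Integer.MIN_VALUE and the subtraction yields MAX_VALUE.
        return highest == 0 ? 0 : (highest << 1) - 1;
    }

    // Exhaustively checks the bound for all ranges with limits in [0, limit].
    static boolean holdsForAllUpTo(int limit) {
        for (int hi0 = 0; hi0 <= limit; hi0++) {
            for (int hi1 = 0; hi1 <= limit; hi1++) {
                int bound = xorUpperBound(hi0, hi1);
                for (int x = 0; x <= hi0; x++) {
                    for (int y = 0; y <= hi1; y++) {
                        if ((x ^ y) > bound) {
                            return false;
                        }
                    }
                }
            }
        }
        return true;
    }
}
```

Calling `XorBound.holdsForAllUpTo(15)` checks every pair of 4-bit ranges, in the same spirit as the exhaustive 4-bit testing the description mentions.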
>> >> In addition to testing for constant folding over xor, IR tests were added to `XorINodeIdealizationTests` and `XorLNodeIdealizationTests` to cover these related items: >> - Bounds optimization of xor >> - A check for `x ^ x = 0` >> - Explicit testing of xor over booleans. >> >> Also `test_xor_node.cpp` was added to more extensively test the correctness of the bounds optimization. It exhaustively tests ranges of 4-bit numbers as well as at the high and low end of the affected types. > > Johannes Graham has updated the pull request incrementally with one additional commit since the last revision: > > update comments @j3graham Your change (at version dda134fbdb1c3b9647c53ef36e5b4a952f9b9576) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23089#issuecomment-2778646344 From dlunden at openjdk.org Fri Apr 4 12:53:11 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Fri, 4 Apr 2025 12:53:11 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v14] In-Reply-To: References: Message-ID: On Fri, 4 Apr 2025 12:44:39 GMT, Quan Anh Mai wrote: > `TestNestedSynchronize.java` is a massive file. I wonder if you can try to generate it using `MethodHandle` or classfile API instead? Indeed, the plan is to migrate it to the [template-based testing framework](https://github.com/openjdk/jdk/pull/24217) when that is ready. I'll have a look at using `MethodHandle`s, it would be nice to not even pollute the git history with the current version. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20404#issuecomment-2778647499 From mli at openjdk.org Fri Apr 4 13:07:57 2025 From: mli at openjdk.org (Hamlin Li) Date: Fri, 4 Apr 2025 13:07:57 GMT Subject: RFR: 8353665: RISC-V: IR verification fails in TestSubNodeFloatDoubleNegation.java In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 16:57:19 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? 
> The newly added TestSubNodeFloatDoubleNegation.java (in https://github.com/openjdk/jdk/pull/24150) is to check `0 - (0 - x)` is not folded to `x` for float and double. > I have manually checked the IR and generated assembly code, it's not folded on riscv either, just there is an extra SubF in some code path. > So, the fix for this test on riscv should be simply make the check as `>= 2` rather than `2`. > > Tested on both x86 and riscv64. > > Thanks Seems to me the `SubF` is not necessary to be generated (maybe should and could be removed? ), are you going to investigate that or just modify the test? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24421#issuecomment-2778682730 From duke at openjdk.org Fri Apr 4 13:21:57 2025 From: duke at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Fri, 4 Apr 2025 13:21:57 GMT Subject: RFR: 8353665: RISC-V: IR verification fails in TestSubNodeFloatDoubleNegation.java In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 16:57:19 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > The newly added TestSubNodeFloatDoubleNegation.java (in https://github.com/openjdk/jdk/pull/24150) is to check `0 - (0 - x)` is not folded to `x` for float and double. > I have manually checked the IR and generated assembly code, it's not folded on riscv either, just there is an extra SubF in some code path. > So, the fix for this test on riscv should be simply make the check as `>= 2` rather than `2`. > > Tested on both x86 and riscv64. > > Thanks Filed [JDK-8353730](https://bugs.openjdk.org/browse/JDK-8353730). I will first fix the test and then investigate the additional `SubF`. 
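For context on why `0 - (0 - x)` must not be folded to `x` for floating-point types, here is a minimal Java sketch, independent of the test itself (`NegationFolding`, `doubleNegate`, and `sameBits` are hypothetical names). Under IEEE 754 arithmetic, `0 - (-0.0)` is `+0.0`, and `0 - (+0.0)` is again `+0.0`, so the expression maps `-0.0` to `+0.0`, which has a different bit pattern than the input.

```java
// Hypothetical illustration; not part of TestSubNodeFloatDoubleNegation.java.
final class NegationFolding {
    // The expression the test guards: must be evaluated as two subtractions.
    static float doubleNegate(float x) {
        return 0f - (0f - x);
    }

    // Bitwise comparison, which distinguishes -0.0f from +0.0f.
    static boolean sameBits(float a, float b) {
        return Float.floatToRawIntBits(a) == Float.floatToRawIntBits(b);
    }

    public static void main(String[] args) {
        float negZero = -0.0f;
        // For most inputs the expression is the identity ...
        System.out.println(sameBits(doubleNegate(1.5f), 1.5f));
        // ... but for negative zero the sign bit is lost.
        System.out.println(sameBits(doubleNegate(negZero), negZero));
    }
}
```

A compiler that rewrote `doubleNegate` to return `x` directly would change the result for `-0.0f`, which is the behaviour the IR test is meant to rule out.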
------------- PR Comment: https://git.openjdk.org/jdk/pull/24421#issuecomment-2778715245 From mli at openjdk.org Fri Apr 4 13:25:13 2025 From: mli at openjdk.org (Hamlin Li) Date: Fri, 4 Apr 2025 13:25:13 GMT Subject: RFR: 8353665: RISC-V: IR verification fails in TestSubNodeFloatDoubleNegation.java [v2] In-Reply-To: References: Message-ID: > Hi, > Can you help to review this patch? > The newly added TestSubNodeFloatDoubleNegation.java (in https://github.com/openjdk/jdk/pull/24150) is to check `0 - (0 - x)` is not folded to `x` for float and double. > I have manually checked the IR and generated assembly code, it's not folded on riscv either, just there is an extra SubF in some code path. > So, the fix for this test on riscv should be simply make the check as `>= 2` rather than `2`. > > Tested on both x86 and riscv64. > > Thanks Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: refine ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24421/files - new: https://git.openjdk.org/jdk/pull/24421/files/cd9df312..abb3d548 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24421&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24421&range=00-01 Stats: 10 lines in 2 files changed: 6 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24421.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24421/head:pull/24421 PR: https://git.openjdk.org/jdk/pull/24421 From duke at openjdk.org Fri Apr 4 13:27:09 2025 From: duke at openjdk.org (Johannes Graham) Date: Fri, 4 Apr 2025 13:27:09 GMT Subject: Integrated: 8347645: C2: XOR bounded value handling blocks constant folding In-Reply-To: References: Message-ID: On Mon, 13 Jan 2025 22:16:20 GMT, Johannes Graham wrote: > An interaction between xor bounds optimization and constant folding resulted in xor over constants not being optimized. 
This has a noticeable effect on `Long.expand` with a constant mask, on architectures that don't have instructions equivalent to `PDEP` to be used in an intrinsic. > > This change moves logic from the `Xor(L|I)Node::Value` methods into the `add_ring` methods, and gives priority to constant-folding. A static method was separated out to facilitate direct unit-testing. It also (subjectively) simplified the calculation of the upper bound and added an explanation of the reasoning behind it. > > In addition to testing for constant folding over xor, IR tests were added to `XorINodeIdealizationTests` and `XorLNodeIdealizationTests` to cover these related items: > - Bounds optimization of xor > - A check for `x ^ x = 0` > - Explicit testing of xor over booleans. > > Also `test_xor_node.cpp` was added to more extensively test the correctness of the bounds optimization. It exhaustively tests ranges of 4-bit numbers as well as at the high and low end of the affected types. This pull request has now been integrated. Changeset: 37f8e419 Author: Johannes Graham URL: https://git.openjdk.org/jdk/commit/37f8e419f9661ba30b3c34bd9fecef71ab1eddb1 Stats: 620 lines in 5 files changed: 568 ins; 27 del; 25 mod 8347645: C2: XOR bounded value handling blocks constant folding Reviewed-by: epeter, vlivanov, qamai, jkarthikeyan ------------- PR: https://git.openjdk.org/jdk/pull/23089 From mli at openjdk.org Fri Apr 4 13:31:59 2025 From: mli at openjdk.org (Hamlin Li) Date: Fri, 4 Apr 2025 13:31:59 GMT Subject: RFR: 8353665: RISC-V: IR verification fails in TestSubNodeFloatDoubleNegation.java [v2] In-Reply-To: References: Message-ID: <4yKbB3oktdryefE4g492Sg3RaqfONagQlPTiNZsVsPc=.a6013f02-5e69-4f4e-8b3d-9f45fe421168@github.com> On Fri, 4 Apr 2025 13:25:13 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> The newly added TestSubNodeFloatDoubleNegation.java (in https://github.com/openjdk/jdk/pull/24150) is to check `0 - (0 - x)` is not folded to `x` for float and double. 
>> I have manually checked the IR and generated assembly code, it's not folded on riscv either, just there is an extra SubF in some code path. >> So, the fix for this test on riscv should be simply make the check as `>= 2` rather than `2`. >> >> Tested on both x86 and riscv64. >> >> Thanks > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > refine Great! Let's fix this test first, I'll just fix riscv part, as I don't have the env to verify other platforms. Also file a bug to track this `SubF` (potential) "issue", https://bugs.openjdk.org/browse/JDK-8353732, feel free to take it when you want to start the work. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24421#issuecomment-2778738383 From duke at openjdk.org Fri Apr 4 13:47:55 2025 From: duke at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Fri, 4 Apr 2025 13:47:55 GMT Subject: RFR: 8353665: RISC-V: IR verification fails in TestSubNodeFloatDoubleNegation.java [v2] In-Reply-To: References: Message-ID: <5TGT7p9dT-jE73UeK_mzm_y8ehRlqMDq8FSeYY4ULBI=.54d3b223-a871-4e55-acc0-7159902ddc4a@github.com> On Fri, 4 Apr 2025 13:25:13 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> The newly added TestSubNodeFloatDoubleNegation.java (in https://github.com/openjdk/jdk/pull/24150) is to check `0 - (0 - x)` is not folded to `x` for float and double. >> I have manually checked the IR and generated assembly code, it's not folded on riscv either, just there is an extra SubF in some code path. >> So, the fix for this test on riscv should be simply make the check as `>= 2` rather than `2`. >> >> Tested on both x86 and riscv64. >> >> Thanks > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > refine Looks good to me, with or without my suggestion. Thank you for catching and fixing this! 
test/hotspot/jtreg/compiler/floatingpoint/TestSubNodeFloatDoubleNegation.java line 60: > 58: @IR(counts = { IRNode.SUB, "2" }, applyIfPlatform = {"riscv64", "false"}) > 59: @IR(counts = { IRNode.SUB, "2" }, applyIfCPUFeature = {"zfh", "false"}) > 60: @IR(counts = { IRNode.SUB, ">= 2" }, applyIfCPUFeature = {"zfh", "true"}) Just a small nit: I find the following expresses the intention of the test more precisely Suggestion: @IR(counts = { IRNode.SUB_HF, "2" }, applyIfCPUFeature = {"zfh", "true"}) ------------- Marked as reviewed by mhaessig at github.com (no known OpenJDK username). PR Review: https://git.openjdk.org/jdk/pull/24421#pullrequestreview-2743050562 PR Review Comment: https://git.openjdk.org/jdk/pull/24421#discussion_r2028842855 From dlunden at openjdk.org Fri Apr 4 14:11:56 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Fri, 4 Apr 2025 14:11:56 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v15] In-Reply-To: References: Message-ID: > If a method has a large number of parameters, we currently bail out from C2 compilation. > > ### Changeset > > Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes. > > Changes: > - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this.
To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed. > - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable. > - Remove all `can_represent` checks and bailouts. > - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`. > - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`. > > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450) > - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. > - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement. > - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, not worth it). > > ![c2-regression](https:/... 
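The design described in the changeset above (a fixed base part that stays cheap in the common case, a dynamically allocated extension only when needed, and an early exit in the overlap check) can be pictured with the following sketch. This is an illustration in Java under stated assumptions, not HotSpot's `RegMask`: the class name `GrowableMaskSketch`, the constant `BASE_WORDS`, and all method names are hypothetical, and bit indices are assumed to be non-negative.

```java
import java.util.Arrays;

public class GrowableMaskSketch {
    // Hypothetical size of the fixed part: 2 words = 128 bits.
    public static final int BASE_WORDS = 2;

    private final long[] base = new long[BASE_WORDS]; // always present
    private long[] ext = null;                        // allocated only on demand

    // Set a bit; grows the extension only if the bit lies beyond the base part.
    public void set(int bit) {
        int word = bit >>> 6;
        if (word < BASE_WORDS) {
            base[word] |= 1L << (bit & 63);
            return;
        }
        int extWord = word - BASE_WORDS;
        if (ext == null) {
            ext = new long[extWord + 1];
        } else if (extWord >= ext.length) {
            ext = Arrays.copyOf(ext, extWord + 1);
        }
        ext[extWord] |= 1L << (bit & 63);
    }

    public boolean isExtended() {
        return ext != null;
    }

    // Overlap check: look at the base words first and return as soon as a
    // common bit is found; the extension is consulted only afterwards.
    public boolean overlaps(GrowableMaskSketch other) {
        for (int i = 0; i < BASE_WORDS; i++) {
            if ((base[i] & other.base[i]) != 0) {
                return true; // early exit, the common case
            }
        }
        if (ext == null || other.ext == null) {
            return false;
        }
        int n = Math.min(ext.length, other.ext.length);
        for (int i = 0; i < n; i++) {
            if ((ext[i] & other.ext[i]) != 0) {
                return true;
            }
        }
        return false;
    }
}
```

In this sketch a mask that only ever uses low bit indices never allocates the extension, which mirrors the stated goal of keeping the normal case quick.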
Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: Revise overlap comments for frequency of cases ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20404/files - new: https://git.openjdk.org/jdk/pull/20404/files/c41a76b9..74357621 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20404&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20404&range=13-14 Stats: 8 lines in 1 file changed: 4 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/20404.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20404/head:pull/20404 PR: https://git.openjdk.org/jdk/pull/20404 From duke at openjdk.org Fri Apr 4 15:21:13 2025 From: duke at openjdk.org (Johannes Graham) Date: Fri, 4 Apr 2025 15:21:13 GMT Subject: RFR: 8347645: C2: XOR bounded value handling blocks constant folding [v49] In-Reply-To: References: Message-ID: <8YKb4oCIwuS2i3w6x_1II7HD80XMIfOum2uTLosE0QY=.3cfcdfb9-a693-46c1-88b5-34e96941e6c5@github.com> On Tue, 1 Apr 2025 23:22:09 GMT, Johannes Graham wrote: >> An interaction between xor bounds optimization and constant folding resulted in xor over constants not being optimized. This has a noticeable effect on `Long.expand` with a constant mask, on architectures that don't have instructions equivalent to `PDEP` to be used in an intrinsic. >> >> This change moves logic from the `Xor(L|I)Node::Value` methods into the `add_ring` methods, and gives priority to constant-folding. A static method was separated out to facilitate direct unit-testing. It also (subjectively) simplified the calculation of the upper bound and added an explanation of the reasoning behind it. >> >> In addition to testing for constant folding over xor, IR tests were added to `XorINodeIdealizationTests` and `XorLNodeIdealizationTests` to cover these related items: >> - Bounds optimization of xor >> - A check for `x ^ x = 0` >> - Explicit testing of xor over booleans.
>> >> Also `test_xor_node.cpp` was added to more extensively test the correctness of the bounds optimization. It exhaustively tests ranges of 4-bit numbers as well as at the high and low end of the affected types. > > Johannes Graham has updated the pull request incrementally with one additional commit since the last revision: > > update comments Thank you all for your help. I'm looking forward to https://github.com/openjdk/jdk/pull/17508 making these kinds of optimizations more systematic. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23089#issuecomment-2779038647 From kvn at openjdk.org Fri Apr 4 17:03:48 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 4 Apr 2025 17:03:48 GMT Subject: RFR: 8350852: Implement JMH benchmark for sparse CodeCache [v3] In-Reply-To: References: Message-ID: On Fri, 28 Mar 2025 20:20:41 GMT, Evgeny Astigeevich wrote: >> This benchmark is used to check performance impact of the code cache being sparse. >> >> We use C2 compiler to compile the same Java method multiple times to produce as many code as needed. The Java method is not trivial. It adds two 40 digit positive integers. These compiled methods represent the active methods in the code cache. We split active methods into groups. We put a group into a fixed size code region. We make a code region aligned by its size. CodeCache becomes sparse when code regions are not fully filled. We measure the time taken to call all active methods.
>>
>> Results: code region size 2M (2097152) bytes
>> - Intel Xeon Platinum 8259CL
>>
>> |activeMethodCount |groupCount |Methods/Group |Score |Error |Units |Diff |
>> |--- |--- |--- |--- |--- |--- |--- |
>> |128 |1 |128 |19.577 |0.619 |us/op | |
>> |128 |32 |4 |22.968 |0.314 |us/op |17.30% |
>> |128 |48 |3 |22.245 |0.388 |us/op |13.60% |
>> |128 |64 |2 |23.874 |0.84 |us/op |21.90% |
>> |128 |80 |2 |23.786 |0.231 |us/op |21.50% |
>> |128 |96 |1 |26.224 |1.16 |us/op |34% |
>> |128 |112 |1 |27.028 |0.461 |us/op |38.10% |
>> |256 |1 |256 |47.43 |1.146 |us/op | |
>> |256 |32 |8 |63.962 |1.671 |us/op |34.90% |
>> |256 |48 |5 |63.396 |0.247 |us/op |33.70% |
>> |256 |64 |4 |66.604 |2.286 |us/op |40.40% |
>> |256 |80 |3 |59.746 |1.273 |us/op |26% |
>> |256 |96 |3 |63.836 |1.034 |us/op |34.60% |
>> |256 |112 |2 |63.538 |1.814 |us/op |34% |
>> |512 |1 |512 |172.731 |4.409 |us/op | |
>> |512 |32 |16 |206.772 |6.229 |us/op |19.70% |
>> |512 |48 |11 |215.275 |2.228 |us/op |24.60% |
>> |512 |64 |8 |212.962 |2.028 |us/op |23.30% |
>> |512 |80 |6 |201.335 |12.519 |us/op |16.60% |
>> |512 |96 |5 |198.133 |6.502 |us/op |14.70% |
>> |512 |112 |5 |193.739 |3.812 |us/op |12.20% |
>> |768 |1 |768 |325.154 |5.048 |us/op | |
>> |768 |32 |24 |346.298 |20.196 |us/op |6.50% |
>> |768 |48 |16 |350.746 |2.931 |us/op |7.90% |
>> |768 |64 |12 |339.445 |7.927 |us/op |4.40% |
>> |768 |80 |10 |347.408 |7.355 |us/op |6.80% |
>> |768 |96 |8 |340.983 |3.578 |us/op |4.90% |
>> |768 |112 |7 |353.949 |2.98 |us/op |8.90% |
>> |1024 |1 |1024 |368.352 |5.961 |us/op | |
>> |1024 |32 |32 |463.822 |6.274 |us/op |25.90% |
>> |1024 |48 |21 |457.674 |15.144 |us/op |24.20% |
>> |1024 |64 |16 |477.694 |0.986 |us/op |29.70% |
>> |1024 |80 |13 |484.901 |32.601 |us/op |31.60% |
>> |1024 |96 |11 |480.8 |27.088 |us/op |30.50% |
>> |1024 |112 |9 |474.416 |10.053 |us/op |28.80% |
>>
>> - AArch64 Neoverse N1
>>
>> |activeMethodCount |groupCount |Methods/Group |Score |Error |Units |Diff |...
> > Evgeny Astigeevich has updated the pull request incrementally with two additional commits since the last revision: > > - Document assumptions about code placement in CodeCache > - Address bulasevich comment: too many parameters values RE-approved ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23831#pullrequestreview-2743647280 From duke at openjdk.org Fri Apr 4 17:24:54 2025 From: duke at openjdk.org (Johannes Graham) Date: Fri, 4 Apr 2025 17:24:54 GMT Subject: RFR: 8353551: C2: Constant folding for ReverseBytes nodes In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 16:17:57 GMT, Hannes Greule wrote: > This change implements constant folding for ReverseBytes nodes. > > Currently, `byteswap` is included transitively by `reverse_bits.hpp`. I'm not sure if this is fine or if I need to add an explicit include there. > > I appreciate any reviews and comments. src/hotspot/share/opto/subnode.cpp line 2025: > 2023: > 2024: template > 2025: const Type* reverse_bytes(const Node* node, PhaseGVN* phase) { Could this be templated with TypeLong/TypeInt instead of BasicType? There is a `try_cast` that @merykitty added in https://github.com/openjdk/jdk/pull/17508 that might help. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24382#discussion_r2029183060 From vlivanov at openjdk.org Fri Apr 4 19:31:48 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 4 Apr 2025 19:31:48 GMT Subject: RFR: 8352963: [REDO] Missing late inline cleanup causes compiler/vectorapi/VectorLogicalOpIdentityTest.java IR failure In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 06:46:23 GMT, Damon Fenacci wrote: > This PR is a REDO of [JDK-8302459](https://bugs.openjdk.org/browse/JDK-8302459) ([PR](https://github.com/openjdk/jdk/pull/21682), [backout](https://bugs.openjdk.org/browse/JDK-8352965) triggered by a failing internal test). 
> > There was an issue with `CallGenerator::for_method_handle_call` that could delay late inlining by creating a "generic" `LateInlineCallGenerator` instead of a more specific `LateInlineMHCallGenerator`: > https://github.com/openjdk/jdk/blob/74df384a9870431efb184158bba032c79c35356e/src/hotspot/share/opto/callGenerator.cpp#L991 > While running IGVN this could be misinterpreted as non-MH late-inline > https://github.com/openjdk/jdk/blob/c282fb9add32f1fac8174ca84b1b68a869d2578d/src/hotspot/share/opto/callnode.cpp#L1088-L1091 > eventually triggering `assert(!cg->method()->is_method_handle_intrinsic(), "required");` > > The fix involves creating a `LateInlineMHCallGenerator` instead. Here is what changed from the backed out PR: > https://github.com/openjdk/jdk/blob/c282fb9add32f1fac8174ca84b1b68a869d2578d/src/hotspot/share/opto/callGenerator.cpp#L991-L995 > > ### Testing > > Tier 1-4 (windows-x64, linux-x64/aarch64, and macosx-x64/aarch64; release and debug mode) src/hotspot/share/opto/callGenerator.cpp line 994: > 992: CallGenerator::for_mh_late_inline(caller, callee, input_not_const); > 993: } else { > 994: CallGenerator::for_late_inline(callee, cg); Would `-XX:-IncrementalInlineMH` hit the very same assertion? Should we simply require `IncrementalInlineMH` to delay inlining instead? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24402#discussion_r2029330281 From vlivanov at openjdk.org Sat Apr 5 02:42:38 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Sat, 5 Apr 2025 02:42:38 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API Message-ID: Migrate Vector API math library (SVML and SLEEF) linkage from native code (in JVM) to Java FFM API. Since FFM API doesn't support vector calling conventions yet, migration affects only symbol lookup for now. But it still enables significant simplifications on JVM side. 
The patch consists of the following parts: * on-demand symbol lookup in Java code replaces eager lookup from native code during JVM startup; * 2 new VM intrinsics for vector calls (support unary and binary shapes) (code separated from unary/binary vector operations); * new internal interface to query supported CPU ISA extensions (`jdk.incubator.vector.CPUFeatures`) used for CPU dispatching. `java.lang.foreign` API is used to perform symbol lookup in vector math library, then the address is cached and fed into corresponding JVM intrinsic, so C2 can turn it into a direct vector call in generated code. Once `java.lang.foreign` supports vectors & vector calling conventions, VM intrinsics can go away. Performance is on par with original implementation (tested with microbenchmarks on linux-x64 and macosx-aarch64). Testing: hs-tier1 - hs-tier6, microbenchmarks (on linux-x64 and macosx-aarch64) Thanks! ------------- Commit messages: - Misc fixes and cleanups - CPU features support - Cleanup - TODO list - SVML fixes - Update templates - fixes - SLEEF improvements - cleanup - VectorMathLib: Migrate to lambdas - ... and 3 more: https://git.openjdk.org/jdk/compare/9fcb06f9...fc27aee5 Changes: https://git.openjdk.org/jdk/pull/24462/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24462&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8353786 Stats: 1274 lines in 43 files changed: 825 ins; 393 del; 56 mod Patch: https://git.openjdk.org/jdk/pull/24462.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24462/head:pull/24462 PR: https://git.openjdk.org/jdk/pull/24462 From liach at openjdk.org Sat Apr 5 02:42:38 2025 From: liach at openjdk.org (Chen Liang) Date: Sat, 5 Apr 2025 02:42:38 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API In-Reply-To: References: Message-ID: On Fri, 4 Apr 2025 22:52:24 GMT, Vladimir Ivanov wrote: > Migrate Vector API math library (SVML and SLEEF) linkage from native code (in JVM) to Java FFM API. 
> > Since FFM API doesn't support vector calling conventions yet, migration affects only symbol lookup for now. But it still enables significant simplifications on JVM side. > > The patch consists of the following parts: > * on-demand symbol lookup in Java code replaces eager lookup from native code during JVM startup; > * 2 new VM intrinsics for vector calls (support unary and binary shapes) (code separated from unary/binary vector operations); > * new internal interface to query supported CPU ISA extensions (`jdk.incubator.vector.CPUFeatures`) used for CPU dispatching. > > `java.lang.foreign` API is used to perform symbol lookup in vector math library, then the address is cached and fed into corresponding JVM intrinsic, so C2 can turn it into a direct vector call in generated code. > > Once `java.lang.foreign` supports vectors & vector calling conventions, VM intrinsics can go away. > > Performance is on par with original implementation (tested with microbenchmarks on linux-x64 and macosx-aarch64). > > Testing: hs-tier1 - hs-tier6, microbenchmarks (on linux-x64 and macosx-aarch64) > > Thanks! Moving vector API library selection to Java code looks like a right step to me. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24462#issuecomment-2779896358 From duke at openjdk.org Sat Apr 5 04:24:54 2025 From: duke at openjdk.org (duke) Date: Sat, 5 Apr 2025 04:24:54 GMT Subject: Withdrawn: 8346836: C2: Introduce a way to verify the correctness of ConstraintCastNodes at runtime In-Reply-To: References: Message-ID: On Wed, 25 Dec 2024 14:54:02 GMT, Quan Anh Mai wrote: > Hi, > > This patch adds a develop flag `VerifyConstraintCasts`, which will verify the correctness of `CastIINode`s and `CastLLNode`s at runtime and crash the VM if the dynamic value lies outside the type value range. > > Please take a look, thanks a lot. This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.org/jdk/pull/22880 From duke at openjdk.org Sat Apr 5 04:44:56 2025 From: duke at openjdk.org (duke) Date: Sat, 5 Apr 2025 04:44:56 GMT Subject: Withdrawn: 8341293: Split field loads through Nested Phis In-Reply-To: References: Message-ID: On Mon, 30 Sep 2024 16:23:05 GMT, Dhamoder Nalla wrote:
> As an extension of the work done as part of https://github.com/openjdk/jdk/pull/12897, split the field loads (AddP -> Load*) with nested phi parent nodes to enable more scalar replacements, thereby reducing memory allocation.
>
> Here are the sequence of Ideal graph transformations for Nested phi:
>
> ![image](https://github.com/user-attachments/assets/c18e5ca0-c554-475c-814a-7cb288d96569)
>
> ![image](https://github.com/user-attachments/assets/b279b5f2-9ec6-4d9b-a627-506451f1cf81)
>
> ![image](https://github.com/user-attachments/assets/f506b918-2dd0-4dbe-a440-ff253afa3961)
>
> JMH results:
> with disabled RAM
>
> Benchmark Mode Cnt Score Error Units
> NestedPhiAndRematerialize.NopRAM.testBailOut_runner avgt 15 13.969 ± 0.248 ms/op
> NestedPhiAndRematerialize.NopRAM.testFieldEscapeWithMerge_runner avgt 15 80.300 ± 4.306 ms/op
> NestedPhiAndRematerialize.NopRAM.testMerge_TryCatchFinally_runner avgt 15 72.182 ± 1.781 ms/op
> NestedPhiAndRematerialize.NopRAM.testMultiParentPhi_runner avgt 15 2.983 ± 0.001 ms/op
> NestedPhiAndRematerialize.NopRAM.testNestedPhiPolymorphic_runner avgt 15 18.342 ± 0.731 ms/op
> NestedPhiAndRematerialize.NopRAM.testNestedPhiProcessOrder_runner avgt 15 14.315 ± 0.443 ms/op
> NestedPhiAndRematerialize.NopRAM.testNestedPhiWithLambda_runner avgt 15 18.511 ± 1.212 ms/op
> NestedPhiAndRematerialize.NopRAM.testNestedPhiWithTrap_runner avgt 15 66.277 ± 1.478 ms/op
> NestedPhiAndRematerialize.NopRAM.testNestedPhi_FieldLoad_runner avgt 15 17.968 ± 0.306 ms/op
> NestedPhiAndRematerialize.NopRAM.testNestedPhi_TryCatch_runner avgt 15 14.186 ± 0.247 ms/op
> NestedPhiAndRematerialize.NopRAM.testRematerialize_MultiObj_runner avgt 15 88.435 ± 4.869 ms/op
> NestedPhiAndRematerialize.NopRAM.testRematerialize_SingleObj_runner avgt 15 29560.130 ± 48.797 ms/op
> NestedPhiAndRematerialize.NopRAM.testRematerialize_TryCatch_runner avgt 15 49.150 ± 2.307 ms/op
> NestedPhiAndRematerialize.NopRAM.testThreeLevelNestedPhi_runner avgt 15 18.236 ± 0.308 ms/op
>
> with enabled RAM
> Benchmark Mode Cnt Score Error Units
> NestedPhiAndRematerialize.YesRAM.testBailOut_runner avgt 15 3.257 ± 0.423 ms/op
> NestedPhiAndRematerialize.YesRAM.testFieldEscapeWithMerge_runner avgt 15 79.916 ± 3.477 ms/op
> NestedPhiAndRematerialize.YesRAM.testMerge_TryCatchFinally_runner avgt 15 72.053 ± 1.916 ms/op
> NestedPhiAndRematerialize.YesRAM.testMultiParentPhi_runner avgt 15 2.984 ± 0.001 ms/op
> NestedPhiAndRematerialize.YesRAM.testNestedPhiPolymorphic_runner avgt 15 18.309 ± 0.706 ms/op
> NestedPhiAndRematerialize.YesRAM.testNestedPhiProces...
This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/21270 From qamai at openjdk.org Sat Apr 5 06:37:46 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 5 Apr 2025 06:37:46 GMT Subject: RFR: 8346836: C2: Introduce a way to verify the correctness of ConstraintCastNodes at runtime [v6] In-Reply-To: References: Message-ID: > Hi, > > This patch adds a develop flag `VerifyConstraintCasts`, which will verify the correctness of `CastIINode`s and `CastLLNode`s at runtime and crash the VM if the dynamic value lies outside the type value range. > > Please take a look, thanks a lot. Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 10 additional commits since the last revision: - Merge branch 'master' into verifycast - draft - Merge branch 'master' into verifycast - Merge branch 'master' into verifycast - better comments - move test to a new file, add block_comment - add tests - make VerifyConstraintCast uint, better debug info - Merge branch 'master' into verifycast - Introduce VerifyConstraintCasts ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22880/files - new: https://git.openjdk.org/jdk/pull/22880/files/da854c1f..dbb69375 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22880&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22880&range=04-05 Stats: 206556 lines in 4871 files changed: 96291 ins; 84056 del; 26209 mod Patch: https://git.openjdk.org/jdk/pull/22880.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22880/head:pull/22880 PR: https://git.openjdk.org/jdk/pull/22880 From qamai at openjdk.org Sat Apr 5 06:41:58 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 5 Apr 2025 06:41:58 GMT Subject: RFR: 8346836: C2: Introduce a way to verify the correctness of ConstraintCastNodes at runtime [v5] In-Reply-To: References: Message-ID: <6rg1PaQa4v0oA3MlJcbr6269PIRamYNjEQ37_q0bpzg=.927af44b-a269-4887-8835-d7405fcf2b0c@github.com> On Fri, 7 Feb 2025 17:02:09 GMT, Vladimir Ivanov wrote: >> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains seven additional commits since the last revision: >> >> - Merge branch 'master' into verifycast >> - better comments >> - move test to a new file, add block_comment >> - add tests >> - make VerifyConstraintCast uint, better debug info >> - Merge branch 'master' into verifycast >> - Introduce VerifyConstraintCasts > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 840: > >> 838: >> 839: #ifdef ASSERT >> 840: void C2_MacroAssembler::checked_cast_int(const TypeInt* type, Register dst) { > > Naming is a bit confusing here. It is a register which holds the value being range checked, not a register where new value is put. I have renamed it to `val` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22880#discussion_r2029784445 From qamai at openjdk.org Sat Apr 5 06:46:52 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 5 Apr 2025 06:46:52 GMT Subject: RFR: 8346836: C2: Introduce a way to verify the correctness of ConstraintCastNodes at runtime [v5] In-Reply-To: References: Message-ID: On Fri, 7 Feb 2025 17:11:22 GMT, Vladimir Ivanov wrote: >> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: >> >> - Merge branch 'master' into verifycast >> - better comments >> - move test to a new file, add block_comment >> - add tests >> - make VerifyConstraintCast uint, better debug info >> - Merge branch 'master' into verifycast >> - Introduce VerifyConstraintCasts > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 852: > >> 850: movl(rcx, type->_lo); >> 851: movl(rdx, type->_hi); >> 852: hlt(); // hlt so we have the stack trace > > That's interesting. Sounds like a problem in `NativeStackPrinter::print_stack()`. 
> > Speaking of debugging output, a call into a local helper function (encapsulating pretty printing logic) followed by a hlt call will do the job. But, considering the usages are in-line and quite common I suggest to make it conditional (guarded by a flag). It is possible to recover all 3 values from generated code if needed and turn on error reporting (specify the diagnostic flag) when reproducing failures. I have made a helper function that will print the error message together with the parameters, the stack printing is still problematic, though. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22880#discussion_r2029786565 From qamai at openjdk.org Sat Apr 5 06:51:34 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 5 Apr 2025 06:51:34 GMT Subject: RFR: 8346836: C2: Introduce a way to verify the correctness of ConstraintCastNodes at runtime [v7] In-Reply-To: References: Message-ID: > Hi, > > This patch adds a develop flag `VerifyConstraintCasts`, which will verify the correctness of `CastIINode`s and `CastLLNode`s at runtime and crash the VM if the dynamic value lies outside the type value range. > > Please take a look, thanks a lot. 
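As a rough picture of the check that the PR description above refers to, here is a minimal Java sketch of "verify that the dynamic value lies inside the type's value range". It is an assumption-laden illustration, not the patch: the class `CheckedCastSketch` and the method `checkedCastInt` are hypothetical, and where the real code is described as emitting machine code that crashes the VM, this sketch simply throws an exception that reports the value and both bounds.

```java
public class CheckedCastSketch {
    // Hypothetical analogue of a range-checked cast: val must lie in [lo, hi].
    public static int checkedCastInt(int val, int lo, int hi) {
        if (val < lo || val > hi) {
            // Report all three values, in the spirit of the review request
            // for better debug output on failure.
            throw new IllegalStateException(
                "value " + val + " outside type range [" + lo + ", " + hi + "]");
        }
        return val;
    }

    public static void main(String[] args) {
        System.out.println(checkedCastInt(5, 0, 10)); // prints 5
        try {
            checkedCastInt(11, 0, 10);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The check itself is cheap; the review discussion is mostly about how failures are reported and whether the flag is available outside debug builds.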
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: make the flag diagnostic ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22880/files - new: https://git.openjdk.org/jdk/pull/22880/files/dbb69375..bc8b6af3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22880&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22880&range=05-06 Stats: 8 lines in 4 files changed: 0 ins; 4 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/22880.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22880/head:pull/22880 PR: https://git.openjdk.org/jdk/pull/22880 From qamai at openjdk.org Sat Apr 5 06:51:38 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 5 Apr 2025 06:51:38 GMT Subject: RFR: 8346836: C2: Introduce a way to verify the correctness of ConstraintCastNodes at runtime [v5] In-Reply-To: References: Message-ID: <_aCZg6bwh_bfMA0l8-Wdewgnbaz7EI6dk1m1b_MUMeE=.a922648d-c776-439f-8b46-671c83f2103a@github.com> On Fri, 7 Feb 2025 17:10:46 GMT, Vladimir Ivanov wrote: >> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: >> >> - Merge branch 'master' into verifycast >> - better comments >> - move test to a new file, add block_comment >> - add tests >> - make VerifyConstraintCast uint, better debug info >> - Merge branch 'master' into verifycast >> - Introduce VerifyConstraintCasts > > src/hotspot/share/opto/c2_globals.hpp line 666: > >> 664: "perform extra checks on the results of alias analysis") \ >> 665: \ >> 666: develop(uint, VerifyConstraintCasts, 0, \ > > Any downsides in making the flag diagnostic? It'll make it available in product builds. That's a great idea, done it. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22880#discussion_r2029787586 From hgreule at openjdk.org Sat Apr 5 08:55:48 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Sat, 5 Apr 2025 08:55:48 GMT Subject: RFR: 8353551: C2: Constant folding for ReverseBytes nodes In-Reply-To: References: Message-ID: <0Vve2dLoTSqwMKxx7LqaStM6f6_c6NadH-xc3OCSMMI=.c8dafa83-a661-4502-b170-fe7bdee19f2c@github.com> On Fri, 4 Apr 2025 17:22:34 GMT, Johannes Graham wrote: >> This change implements constant folding for ReverseBytes nodes. >> >> Currently, `byteswap` is included transitively by `reverse_bits.hpp`. I'm not sure if this is fine or if I need to add an explicit include there. >> >> I appreciate any reviews and comments. > > src/hotspot/share/opto/subnode.cpp line 2025: > >> 2023: >> 2024: template >> 2025: const Type* reverse_bytes(const Node* node, PhaseGVN* phase) { > > Could this be templated with TypeLong/TypeInt instead of BasicType? There is a `try_cast` that @merykitty added in https://github.com/openjdk/jdk/pull/17508 that might help. It wouldn't change much, but yes. Generally this is an example where #17508 shines, as the code could be generalized to just reverse the bytes of the KnownBits structure. Whether this PR waits until then or we refactor the code afterwards isn't that important to me. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24382#discussion_r2029815835 From duke at openjdk.org Sat Apr 5 14:29:28 2025 From: duke at openjdk.org (Zihao Lin) Date: Sat, 5 Apr 2025 14:29:28 GMT Subject: RFR: 8344116: C2: remove slice parameter from LoadNode::make [v6] In-Reply-To: References: Message-ID: > This patch remove slice parameter from LoadNode::make > > Mention in https://github.com/openjdk/jdk/pull/21834#pullrequestreview-2429164805 > > Hi team, I am new, I'd appreciate any guidance. Thank a lot! Zihao Lin has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Merge branch 'openjdk:master' into 8344116 - Fix build - Fix test failed - 8344116: C2: remove slice parameter from LoadNode::make ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24258/files - new: https://git.openjdk.org/jdk/pull/24258/files/a1924c35..3efb1c17 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24258&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24258&range=04-05 Stats: 28443 lines in 792 files changed: 18710 ins; 7734 del; 1999 mod Patch: https://git.openjdk.org/jdk/pull/24258.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24258/head:pull/24258 PR: https://git.openjdk.org/jdk/pull/24258 From aturbanov at openjdk.org Sat Apr 5 18:05:55 2025 From: aturbanov at openjdk.org (Andrey Turbanov) Date: Sat, 5 Apr 2025 18:05:55 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v12] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 08:15:36 GMT, Emanuel Peter wrote: >> We should extend the functionality of Verify.checkEQ: >> - Allow different NaN encodings to be seen as equal (by default). >> - Compare VectorAPI vectors. >> - Compare Exceptions, and their messages. >> - Compare arbitrary Objects via Reflection. >> >> Note: this is a prerequisite for the Template Library [JDK-8352861](https://bugs.openjdk.org/browse/JDK-8352861) / https://github.com/openjdk/jdk/pull/23418. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > For Christian test/hotspot/jtreg/compiler/lib/verify/Verify.java line 499: > 497: // Hence, we cannot use the mapping below. We test these boxed primitive types by value anyway, > 498: // and they are no recursive structures, so there is no point in optimizing here anyway. 
> 499: switch(a) { Suggestion: switch (a) { test/hotspot/jtreg/testlibrary_tests/verify/tests/TestVerify.java line 528: > 526: } > 527: > 528: public static class H1 { Suggestion: public static class H1 { ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2029936364 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2029936226 From cushon at openjdk.org Sat Apr 5 18:51:35 2025 From: cushon at openjdk.org (Liam Miller-Cushon) Date: Sat, 5 Apr 2025 18:51:35 GMT Subject: RFR: 8350563: C2 compilation fails because PhaseCCP does not reach a fixpoint [v8] In-Reply-To: References: Message-ID: > Hello, please consider this fix for [JDK-8350563](https://bugs.openjdk.org/browse/JDK-8350563) contributed by my colleague Matthias Ernst. > > https://github.com/openjdk/jdk/pull/22856 introduced a new `Value()` optimization for the pattern `AndIL(Con, Mask)`. > This optimization can look through CastNodes, and therefore requires additional logic in CCP to push these > transitive uses to the worklist. > > The optimization is closely related to analogous optimizations for SHIFT nodes, and we also extend the existing logic for > CCP worklist handling: the current logic is "if the shift input to a SHIFT node changes, push indirect AND node uses to the CCP worklist". > We extend it by adding "if the (new) type of a node is an IntegerType that `is_con, ...` to the predicate. Liam Miller-Cushon has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 15 additional commits since the last revision: - Add -XX:+UnlockDiagnosticVMOptions - Check type before uncasting A child phi node may transition from con to non-con, making the AND node transition back from "0" to its current type. If that current type is still TOP we're in violation of monotonicity. 
Therefore, don't apply optimization if AND is not integer yet. - Merge commit '9bb804b14e1' into JDK-8350563 - Explicitly check for OP_Con instead of TypeInteger::is_con. 322 Phi === 303 119 255 [[ 399 388 351 751 366 377 ]] #int:-256..127 !jvms: Integer::parseInt @ bci:151 (line 625) While this Phi dumps as "#int:-256..127", `phase->type(expr)` returns a type that is_con -256. - Update test/hotspot/jtreg/compiler/ccp/TestAndConZeroCCP.java Co-authored-by: Christian Hagedorn - Merge remote-tracking branch 'origin/master' into JDK-8350563 - Reformat test and update package to ccp - Review comments - Update test/hotspot/jtreg/compiler/c2/TestAndConZeroCCP.java Co-authored-by: Emanuel Peter - copyright - ... and 5 more: https://git.openjdk.org/jdk/compare/e3f535dd...23119e18 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23871/files - new: https://git.openjdk.org/jdk/pull/23871/files/b064c47b..23119e18 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23871&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23871&range=06-07 Stats: 100210 lines in 2709 files changed: 41543 ins; 47635 del; 11032 mod Patch: https://git.openjdk.org/jdk/pull/23871.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23871/head:pull/23871 PR: https://git.openjdk.org/jdk/pull/23871 From cushon at openjdk.org Sat Apr 5 18:51:35 2025 From: cushon at openjdk.org (Liam Miller-Cushon) Date: Sat, 5 Apr 2025 18:51:35 GMT Subject: RFR: 8350563: C2 compilation fails because PhaseCCP does not reach a fixpoint [v7] In-Reply-To: References: Message-ID: <6R0v3e8YigF1hhHnYiTjTWjNRN0ZlZ9LuXnqQ1jGnb4=.76d5b308-edb4-4190-aab5-4aee4a81f8ee@github.com> On Mon, 31 Mar 2025 16:08:57 GMT, Liam Miller-Cushon wrote: >> Hello, please consider this fix for [JDK-8350563](https://bugs.openjdk.org/browse/JDK-8350563) contributed by my colleague Matthias Ernst. >> >> https://github.com/openjdk/jdk/pull/22856 introduced a new `Value()` optimization for the pattern `AndIL(Con, Mask)`. 
>> This optimization can look through CastNodes, and therefore requires additional logic in CCP to push these >> transitive uses to the worklist. >> >> The optimization is closely related to analogous optimizations for SHIFT nodes, and we also extend the existing logic for >> CCP worklist handling: the current logic is "if the shift input to a SHIFT node changes, push indirect AND node uses to the CCP worklist". >> We extend it by adding "if the (new) type of a node is an IntegerType that `is_con, ...` to the predicate. > > Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision: > > Explicitly check for OP_Con instead of TypeInteger::is_con. > > 322 Phi === 303 119 255 [[ 399 388 351 751 366 377 ]] #int:-256..127 !jvms: Integer::parseInt @ bci:151 (line 625) > > While this Phi dumps as "#int:-256..127", `phase->type(expr)` returns a type that is_con -256. From Matthias --- Thank you Christian for the excellent explanation. I understand how the phi going from con to non-con makes the AND node go from 0 back to top. I've pushed a commit with the fix and the reproducer; verified the latter only passes w/ the fix. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23871#issuecomment-2781039859 From cslucas at openjdk.org Sun Apr 6 04:09:49 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Sun, 6 Apr 2025 04:09:49 GMT Subject: RFR: 8352681: C2 compilation hits asserts "must set the initial type just once" Message-ID: The reason for the error reported is that when RAM tries to reduce a field load through a Phi it ends up calling `step_through_mergemem` with `_delay_transform` set to true and, since `step_through_mergemem` assumes that `_delay_transform` is `false`, it calls `igvn->transform` passing a MergeMem that has been added to the graph long ago.
I didn't opt to make `_delay_transform` false during the RAM transformations because that seemed to be a risky move, nevertheless I'll create an RFE and keep investigating that option. Also, while working on a test case I found that C2 doesn't remove (at least not before EA steps) the `if (param != param) {...}` and I'm going to investigate that as a separate RFE. I tested this with JTREG Tier 1-3 on Linux x86_64. ------------- Commit messages: - Patch step_through_mergemem to not re-register mergemems. Changes: https://git.openjdk.org/jdk/pull/24471/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24471&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352681 Stats: 72 lines in 2 files changed: 68 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/24471.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24471/head:pull/24471 PR: https://git.openjdk.org/jdk/pull/24471 From duke at openjdk.org Sun Apr 6 06:09:05 2025 From: duke at openjdk.org (Zihao Lin) Date: Sun, 6 Apr 2025 06:09:05 GMT Subject: RFR: 8344116: C2: remove slice parameter from LoadNode::make [v6] In-Reply-To: References: Message-ID: On Sat, 5 Apr 2025 14:29:28 GMT, Zihao Lin wrote: >> This patch remove slice parameter from LoadNode::make >> >> Mention in https://github.com/openjdk/jdk/pull/21834#pullrequestreview-2429164805 >> >> Hi team, I am new, I'd appreciate any guidance. Thank a lot! > > Zihao Lin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'openjdk:master' into 8344116 > - Fix build > - Fix test failed > - 8344116: C2: remove slice parameter from LoadNode::make Hi @TobiHartmann , Could you please take a look? Thank you. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24258#issuecomment-2781240184 From dhanalla at openjdk.org Sun Apr 6 07:50:48 2025 From: dhanalla at openjdk.org (Dhamoder Nalla) Date: Sun, 6 Apr 2025 07:50:48 GMT Subject: RFR: 8341293: Split field loads through Nested Phis [v8] In-Reply-To: References: Message-ID: > As an extension of the work done as part of https://github.com/openjdk/jdk/pull/12897, split the field loads (AddP -> Load*) with nested phi parent nodes to enable more scalar replacements, thereby reducing memory allocation. > > > Here is the sequence of Ideal graph transformations for Nested phi: > > > ![image](https://github.com/user-attachments/assets/c18e5ca0-c554-475c-814a-7cb288d96569) > > ![image](https://github.com/user-attachments/assets/b279b5f2-9ec6-4d9b-a627-506451f1cf81) > > ![image](https://github.com/user-attachments/assets/f506b918-2dd0-4dbe-a440-ff253afa3961) > > JMH results: > with disabled RAM > > Benchmark Mode Cnt Score Error Units > NestedPhiAndRematerialize.NopRAM.testBailOut_runner avgt 15 13.969 ± 0.248 ms/op > NestedPhiAndRematerialize.NopRAM.testFieldEscapeWithMerge_runner avgt 15 80.300 ± 4.306 ms/op > NestedPhiAndRematerialize.NopRAM.testMerge_TryCatchFinally_runner avgt 15 72.182 ± 1.781 ms/op > NestedPhiAndRematerialize.NopRAM.testMultiParentPhi_runner avgt 15 2.983 ± 0.001 ms/op > NestedPhiAndRematerialize.NopRAM.testNestedPhiPolymorphic_runner avgt 15 18.342 ± 0.731 ms/op > NestedPhiAndRematerialize.NopRAM.testNestedPhiProcessOrder_runner avgt 15 14.315 ± 0.443 ms/op > NestedPhiAndRematerialize.NopRAM.testNestedPhiWithLambda_runner avgt 15 18.511 ± 1.212 ms/op > NestedPhiAndRematerialize.NopRAM.testNestedPhiWithTrap_runner avgt 15 66.277 ± 1.478 ms/op > NestedPhiAndRematerialize.NopRAM.testNestedPhi_FieldLoad_runner avgt 15 17.968 ± 0.306 ms/op > NestedPhiAndRematerialize.NopRAM.testNestedPhi_TryCatch_runner avgt 15 14.186 ±
0.247 ms/op > NestedPhiAndRematerialize.NopRAM.testRematerialize_MultiObj_runner avgt 15 88.435 ± 4.869 ms/op > NestedPhiAndRematerialize.NopRAM.testRematerialize_SingleObj_runner avgt 15 29560.130 ± 48.797 ms/op > NestedPhiAndRematerialize.NopRAM.testRematerialize_TryCatch_runner avgt 15 49.150 ± 2.307 ms/op > NestedPhiAndRematerialize.NopRAM.testThreeLevelNestedPhi_runner avgt 15 18.236 ± 0.308 ms/op > > with enabled RAM > Benchmark Mode Cnt Score Error Units > NestedPhiAndRematerialize.YesRAM.testBailOut_runner avgt 15 3.257 ± 0.423 ms/op > NestedPhiAndRematerialize.YesRAM.testFieldEscapeWithMerge_runner avgt 15 79.916 ± 3.477 ms/op > NestedPhiAndRematerialize.YesRAM.testMerge_TryCatchFinally_runner avgt 15 72.053 ± 1.916 ms/op > NestedPhiAndRematerialize.YesRAM.testMultiParentPhi_runner avgt 15 2.984 ± 0.001 ms/op > NestedPhiAndRematerialize.YesRAM.testNestedPhiPolymorphic_runner avgt 15 18.309 ± 0.706 ms/op > NestedPhiAndRematerialize.YesRAM.testNestedPhiProces... Dhamoder Nalla has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 10 additional commits since the last revision: - fix an assert - Merge branch 'openjdk:master' into EASR/nestedphi - Modify IR rules - update the copyright years to 2025 - update the copyright years to 2025 - add IR rule applyIf - update bug id in test - CR feedback - Fix trailing whitespaces - Split load fields through nestead phi nodes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21270/files - new: https://git.openjdk.org/jdk/pull/21270/files/9d97d534..3c56f98d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21270&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21270&range=06-07 Stats: 1210907 lines in 16361 files changed: 717984 ins; 389919 del; 103004 mod Patch: https://git.openjdk.org/jdk/pull/21270.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21270/head:pull/21270 PR: https://git.openjdk.org/jdk/pull/21270 From duke at openjdk.org Sun Apr 6 13:42:48 2025 From: duke at openjdk.org (Johannes Graham) Date: Sun, 6 Apr 2025 13:42:48 GMT Subject: RFR: 8353551: C2: Constant folding for ReverseBytes nodes In-Reply-To: <0Vve2dLoTSqwMKxx7LqaStM6f6_c6NadH-xc3OCSMMI=.c8dafa83-a661-4502-b170-fe7bdee19f2c@github.com> References: <0Vve2dLoTSqwMKxx7LqaStM6f6_c6NadH-xc3OCSMMI=.c8dafa83-a661-4502-b170-fe7bdee19f2c@github.com> Message-ID: On Sat, 5 Apr 2025 08:52:49 GMT, Hannes Greule wrote: >> src/hotspot/share/opto/subnode.cpp line 2025: >> >>> 2023: >>> 2024: template >>> 2025: const Type* reverse_bytes(const Node* node, PhaseGVN* phase) { >> >> Could this be templated with TypeLong/TypeInt instead of BasicType? There is a `try_cast` that @merykitty added in https://github.com/openjdk/jdk/pull/17508 that might help. > > It wouldn't change much, but yes. Generally this is an example where #17508 shines, as the code could be generalized to just reverse the bytes of the KnownBits structure. Whether this PR waits until then or we refactor the code afterwards isn't that important to me. 
I didn't intend to advocate for waiting for that PR. `try_cast` is tiny and could be added independently. I have just been looking for things that could generally help with unifying TypeInt/TypeLong implementations. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24382#discussion_r2030155751 From eastigeevich at openjdk.org Sun Apr 6 17:40:56 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Sun, 6 Apr 2025 17:40:56 GMT Subject: RFR: 8350852: Implement JMH benchmark for sparse CodeCache [v3] In-Reply-To: References: Message-ID: On Fri, 4 Apr 2025 17:01:27 GMT, Vladimir Kozlov wrote: > RE-approved Thank you! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23831#issuecomment-2781526530 From eastigeevich at openjdk.org Sun Apr 6 17:40:57 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Sun, 6 Apr 2025 17:40:57 GMT Subject: Integrated: 8350852: Implement JMH benchmark for sparse CodeCache In-Reply-To: References: Message-ID: On Thu, 27 Feb 2025 22:23:23 GMT, Evgeny Astigeevich wrote: > This benchmark is used to check the performance impact of the code cache being sparse. > > We use the C2 compiler to compile the same Java method multiple times to produce as much code as needed. The Java method is not trivial. It adds two 40-digit positive integers. These compiled methods represent the active methods in the code cache. We split active methods into groups. We put a group into a fixed-size code region. We make a code region aligned by its size. CodeCache becomes sparse when code regions are not fully filled. We measure the time taken to call all active methods.
> > Results: code region size 2M (2097152) bytes > - Intel Xeon Platinum 8259CL > > |activeMethodCount |groupCount |Methods/Group |Score |Error |Units |Diff | > |--- |--- |--- |--- |--- |--- |--- | > |128 |1 |128 |19.577 |0.619 |us/op | | > |128 |32 |4 |22.968 |0.314 |us/op |17.30% | > |128 |48 |3 |22.245 |0.388 |us/op |13.60% | > |128 |64 |2 |23.874 |0.84 |us/op |21.90% | > |128 |80 |2 |23.786 |0.231 |us/op |21.50% | > |128 |96 |1 |26.224 |1.16 |us/op |34% | > |128 |112 |1 |27.028 |0.461 |us/op |38.10% | > |256 |1 |256 |47.43 |1.146 |us/op | | > |256 |32 |8 |63.962 |1.671 |us/op |34.90% | > |256 |48 |5 |63.396 |0.247 |us/op |33.70% | > |256 |64 |4 |66.604 |2.286 |us/op |40.40% | > |256 |80 |3 |59.746 |1.273 |us/op |26% | > |256 |96 |3 |63.836 |1.034 |us/op |34.60% | > |256 |112 |2 |63.538 |1.814 |us/op |34% | > |512 |1 |512 |172.731 |4.409 |us/op | | > |512 |32 |16 |206.772 |6.229 |us/op |19.70% | > |512 |48 |11 |215.275 |2.228 |us/op |24.60% | > |512 |64 |8 |212.962 |2.028 |us/op |23.30% | > |512 |80 |6 |201.335 |12.519 |us/op |16.60% | > |512 |96 |5 |198.133 |6.502 |us/op |14.70% | > |512 |112 |5 |193.739 |3.812 |us/op |12.20% | > |768 |1 |768 |325.154 |5.048 |us/op | | > |768 |32 |24 |346.298 |20.196 |us/op |6.50% | > |768 |48 |16 |350.746 |2.931 |us/op |7.90% | > |768 |64 |12 |339.445 |7.927 |us/op |4.40% | > |768 |80 |10 |347.408 |7.355 |us/op |6.80% | > |768 |96 |8 |340.983 |3.578 |us/op |4.90% | > |768 |112 |7 |353.949 |2.98 |us/op |8.90% | > |1024 |1 |1024 |368.352 |5.961 |us/op | | > |1024 |32 |32 |463.822 |6.274 |us/op |25.90% | > |1024 |48 |21 |457.674 |15.144 |us/op |24.20% | > |1024 |64 |16 |477.694 |0.986 |us/op |29.70% | > |1024 |80 |13 |484.901 |32.601 |us/op |31.60% | > |1024 |96 |11 |480.8 |27.088 |us/op |30.50% | > |1024 |112 |9 |474.416 |10.053 |us/op |28.80% | > > - AArch64 Neoverse N1 > > |activeMethodCount |groupCount |Methods/Group |Score |Error |Units |Diff | > |--- |--- |--- |--- |--- |--- |--- | > |128 |1 |128 |25.297 |0.792 |us/op | | > 
|128 |32 |4 |31.451... This pull request has now been integrated. Changeset: 660b17a6 Author: Evgeny Astigeevich URL: https://git.openjdk.org/jdk/commit/660b17a6b9afe26dee2d9647755c75d817888eda Stats: 368 lines in 1 file changed: 368 ins; 0 del; 0 mod 8350852: Implement JMH benchmark for sparse CodeCache Reviewed-by: kvn ------------- PR: https://git.openjdk.org/jdk/pull/23831 From duke at openjdk.org Mon Apr 7 02:24:03 2025 From: duke at openjdk.org (kuaiwei) Date: Mon, 7 Apr 2025 02:24:03 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v12] In-Reply-To: References: Message-ID: > In this patch, I extent the merge stores optimization to merge adjacents loads. Tier1 tests are passed in my machine. > > The benchmark result of MergeLoadBench.java > AMD EPYC 9T24 96-Core Processor: > > |name | -MergeLoads | +MergeLoads |delta| > |---|---|---|---| > |MergeLoadBench.getCharB |4352.150 |4407.435 | 55.29 | > |MergeLoadBench.getCharBU |4075.320 |4084.663 | 9.34 | > |MergeLoadBench.getCharBV |3221.302 |3221.528 | 0.23 | > |MergeLoadBench.getCharC |2235.433 |2238.796 | 3.36 | > |MergeLoadBench.getCharL |4363.244 |4372.281 | 9.04 | > |MergeLoadBench.getCharLU |4072.550 |4075.744 | 3.19 | > |MergeLoadBench.getCharLV |2227.825 |2231.612 | 3.79 | > |MergeLoadBench.getIntB |11199.935 |6869.030 | -4330.90 | > |MergeLoadBench.getIntBU |6853.862 |2763.923 | -4089.94 | > |MergeLoadBench.getIntBV |306.953 |309.911 | 2.96 | > |MergeLoadBench.getIntL |10426.843 |6523.716 | -3903.13 | > |MergeLoadBench.getIntLU |6740.847 |2602.701 | -4138.15 | > |MergeLoadBench.getIntLV |2233.151 |2231.745 | -1.41 | > |MergeLoadBench.getIntRB |11335.756 |8980.619 | -2355.14 | > |MergeLoadBench.getIntRBU |7439.873 |3190.208 | -4249.66 | > |MergeLoadBench.getIntRL |16323.040 |7786.842 | -8536.20 | > |MergeLoadBench.getIntRLU |7457.745 |3364.140 | -4093.61 | > |MergeLoadBench.getIntRU |2512.621 |2511.668 | -0.95 | > |MergeLoadBench.getIntU |2501.064 
|2500.629 | -0.43 | > |MergeLoadBench.getLongB |21175.442 |21103.660 | -71.78 | > |MergeLoadBench.getLongBU |14042.046 |2512.784 | -11529.26 | > |MergeLoadBench.getLongBV |606.448 |606.171 | -0.28 | > |MergeLoadBench.getLongL |23142.178 |23217.785 | 75.61 | > |MergeLoadBench.getLongLU |14112.972 |2237.659 | -11875.31 | > |MergeLoadBench.getLongLV |2230.416 |2231.224 | 0.81 | > |MergeLoadBench.getLongRB |21152.558 |21140.583 | -11.98 | > |MergeLoadBench.getLongRBU |14031.178 |2520.317 | -11510.86 | > |MergeLoadBench.getLongRL |23248.506 |23136.410 | -112.10 | > |MergeLoadBench.getLongRLU |14125.032 |2240.481 | -11884.55 | > |MergeLoadBench.getLongRU |3071.881 |3066.606 | -5.27 | > |Merg... kuaiwei has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits: - Merge remote-tracking branch 'origin/master' into dev/merge_loads - Remove unused code - Move code to addnode.cpp and add more tests - Merge remote-tracking branch 'origin/master' into dev/merge_loads - Fix test - Add more tests - Enable StressIGVN and riscv platform - Change tests as review comments - Fix test failure and change for review comments - Revert extract value and add more tests - ... 
and 4 more: https://git.openjdk.org/jdk/compare/660b17a6...f6518b26 ------------- Changes: https://git.openjdk.org/jdk/pull/24023/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24023&range=11 Stats: 2507 lines in 17 files changed: 2458 ins; 0 del; 49 mod Patch: https://git.openjdk.org/jdk/pull/24023.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24023/head:pull/24023 PR: https://git.openjdk.org/jdk/pull/24023 From fyang at openjdk.org Mon Apr 7 02:36:48 2025 From: fyang at openjdk.org (Fei Yang) Date: Mon, 7 Apr 2025 02:36:48 GMT Subject: RFR: 8353600: RISC-V: compiler/vectorization/TestRotateByteAndShortVector.java is failing with Zvbb [v3] In-Reply-To: References: <3xU-sLLf0E_4n9BsUXL4COF7mxBjDd8YzgyIvissvQQ=.472cf772-a97a-48c8-b4e6-907fcfdd1ebb@github.com> Message-ID: <9tRHq6IVCT4LYnKp_pAkVTBXBPsc1YWL8k1ooMzwPkc=.8fcff0f9-e581-4ee5-a8f6-e547bf11f293@github.com> On Thu, 3 Apr 2025 13:49:05 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> >> Currently, the following code is considered a RotateLeftV of byte by hotspot, but it's not a real rotate, as `-shift` is effectively 30, which makes `b >> -shift` zero, rather than the value we expected. >> >> int shift = 2; >> byte b = 83; >> byte res = (byte) (b << shift | b >> -shift); // res = 76 >> // but a real left rotate of 83 should be 77 >> >> So, the simple fix is to enable RotateLeftV only for int and long, and disable it for other types. >> >> A more rational fix would be to change C2 to not convert code like ` (byte) (b << shift | b >> -shift)` to a RotateLeftV node, but it needs more investigation, and I'm not sure if it's feasible to do so, as currently no platform supports RotateLeftV for non-int/long types. >> >> The vector instruction's behaviour differs from the Java language specification, so there seems to be no way to do it for now. >> >> Thanks! > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > comment LGTM.
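The byte example quoted above can be checked in plain Java. The following is a minimal sketch, not code from the PR: the class name `ByteRotateDemo` and the helper `rotateLeft8` are hypothetical names introduced here for illustration. It relies on the fact that `b` is promoted to `int` before shifting, so the shift distance `-2` is masked to its low five bits and becomes 30.

```java
public class ByteRotateDemo {
    // The expression from the quoted message. b is promoted to int, and the
    // shift distance -shift is masked to five bits (-2 & 31 == 30), so the
    // right-shift term contributes nothing for a small positive byte.
    static byte shiftOrPattern(byte b, int shift) {
        return (byte) (b << shift | b >> -shift);
    }

    // A true 8-bit left rotate, written out for comparison (hypothetical helper).
    static byte rotateLeft8(byte b, int shift) {
        int v = b & 0xFF;   // treat the byte as an unsigned 8-bit value
        int s = shift & 7;  // rotate distance modulo 8
        return (byte) ((v << s) | (v >>> (8 - s)));
    }

    public static void main(String[] args) {
        System.out.println(shiftOrPattern((byte) 83, 2)); // prints 76
        System.out.println(rotateLeft8((byte) 83, 2));    // prints 77
    }
}
```

The two results differ, which matches the point made in the quoted description: the scalar shift-or pattern on a byte is not an 8-bit rotate.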
Thanks for fixing this! ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24414#pullrequestreview-2745410453 From mchevalier at openjdk.org Mon Apr 7 05:24:57 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Mon, 7 Apr 2025 05:24:57 GMT Subject: Integrated: 8346989: C2: deoptimization and re-execution cycle with Math.*Exact in case of frequent overflow In-Reply-To: <8ACplaVM_gN9cbIcQYGJmR4GNINm70PAJQ8uAgucK4Y=.14fdc7e2-e0af-4f0d-acb6-bcfe99ee8f36@github.com> References: <8ACplaVM_gN9cbIcQYGJmR4GNINm70PAJQ8uAgucK4Y=.14fdc7e2-e0af-4f0d-acb6-bcfe99ee8f36@github.com> Message-ID: <0131FJuGwDAfwqB3GKnlj_9xeoinsnsUjNq8LodfkZE=.9501af2c-473f-4387-b319-aac1dff8cd18@github.com> On Wed, 5 Mar 2025 12:56:48 GMT, Marc Chevalier wrote: > `Math.*Exact` intrinsics can cause many deopts when used repeatedly with problematic arguments. > This fix proposes not to rely on intrinsics after `too_many_traps()` has been reached. > > Benchmarks show that this issue affects every Math.*Exact function, and this fix improves them all. > > tl;dr: > - C1: no problem, no change > - C2: > - with intrinsics: > - with overflow: clear improvement. Was way worse than C1, now is similar (~4s => ~600ms) > - without overflow: no problem, no change > - without intrinsics: no problem, no change > > Before the fix: > > Benchmark (SIZE) Mode Cnt Score Error Units > MathExact.C1_1.loopAddIInBounds 1000000 avgt 3 1.272 ± 0.048 ms/op > MathExact.C1_1.loopAddIOverflow 1000000 avgt 3 641.917 ± 58.238 ms/op > MathExact.C1_1.loopAddLInBounds 1000000 avgt 3 1.402 ± 0.842 ms/op > MathExact.C1_1.loopAddLOverflow 1000000 avgt 3 671.013 ± 229.425 ms/op > MathExact.C1_1.loopDecrementIInBounds 1000000 avgt 3 3.722 ± 22.244 ms/op > MathExact.C1_1.loopDecrementIOverflow 1000000 avgt 3 653.341 ± 279.003 ms/op > MathExact.C1_1.loopDecrementLInBounds 1000000 avgt 3 2.525 ± 0.810 ms/op > MathExact.C1_1.loopDecrementLOverflow 1000000 avgt 3 656.750 ±
141.792 ms/op > MathExact.C1_1.loopIncrementIInBounds 1000000 avgt 3 4.621 ± 12.822 ms/op > MathExact.C1_1.loopIncrementIOverflow 1000000 avgt 3 651.608 ± 274.396 ms/op > MathExact.C1_1.loopIncrementLInBounds 1000000 avgt 3 2.576 ± 3.316 ms/op > MathExact.C1_1.loopIncrementLOverflow 1000000 avgt 3 662.216 ± 71.879 ms/op > MathExact.C1_1.loopMultiplyIInBounds 1000000 avgt 3 1.402 ± 0.587 ms/op > MathExact.C1_1.loopMultiplyIOverflow 1000000 avgt 3 615.836 ± 252.137 ms/op > MathExact.C1_1.loopMultiplyLInBounds 1000000 avgt 3 2.906 ± 5.718 ms/op > MathExact.C1_1.loopMultiplyLOverflow 1000000 avgt 3 655.576 ± 147.432 ms/op > MathExact.C1_1.loopNegateIInBounds 1000000 avgt 3 2.023 ± 0.027 ms/op > MathExact.C1_1.loopNegateIOverflow 1000000 avgt 3 639.136 ± 30.841 ms/op > MathExact.C1_1.loopNegateLInBounds 1000000 avgt 3 2.422 ± 3.59... This pull request has now been integrated. Changeset: 97ed5361 Author: Marc Chevalier URL: https://git.openjdk.org/jdk/commit/97ed536125645304aed03a4afbc3ded627de0bb0 Stats: 845 lines in 6 files changed: 769 ins; 59 del; 17 mod 8346989: C2: deoptimization and re-execution cycle with Math.*Exact in case of frequent overflow Reviewed-by: thartmann, vlivanov ------------- PR: https://git.openjdk.org/jdk/pull/23916 From thartmann at openjdk.org Mon Apr 7 06:02:52 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 7 Apr 2025 06:02:52 GMT Subject: RFR: 8348853: Fold layout helper check for objects implementing non-array interfaces [v2] In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 06:49:50 GMT, Marc Chevalier wrote: >> If `TypeInstKlassPtr` represents an array type, it has to be `java.lang.Object`. By contraposition, if it is not `java.lang.Object`, we can conclude it is not an array, and we can skip some array checks, for instance. >> >> In this PR, we improve this deduction with an interface-based reasoning: arrays implement only Cloneable and Serializable, so if a type implements anything else, it cannot be an array.
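The interface-based argument in the quoted description has a direct Java-level analogue, sketched below. This is only an illustration using reflection, not C2's type system and not code from the PR; the class name `ArrayInterfacesDemo` and the method `cannotBeArray` are hypothetical names chosen for this example.

```java
import java.io.Serializable;
import java.util.Arrays;

public class ArrayInterfacesDemo {
    // Returns true when the class directly declares an interface other than
    // Cloneable or Serializable. Array classes report exactly those two, so a
    // true result means the class cannot be an array. A false result is
    // inconclusive: only directly declared interfaces are inspected here.
    static boolean cannotBeArray(Class<?> c) {
        for (Class<?> itf : c.getInterfaces()) {
            if (itf != Cloneable.class && itf != Serializable.class) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Every array class reports Cloneable and Serializable, in that order.
        System.out.println(Arrays.toString(int[].class.getInterfaces()));
        System.out.println(cannotBeArray(String.class));   // String implements Comparable, CharSequence
        System.out.println(cannotBeArray(Object[].class)); // arrays never trip the check
    }
}
```

The check is one-directional by design, mirroring the contraposition in the description: an extra interface rules out arrays, while the absence of one proves nothing.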
>> >> This change partially reverts the changes from [JDK-8348631](https://bugs.openjdk.org/browse/JDK-8348631) (#23331) (in `LibraryCallKit::generate_array_guard_common`) and the test still passes. >> >> The way interfaces are check might be done differently. The current situation is a balance between visibility (not to leak too much things explicitly private), having not overly general methods for one use-case and avoiding too concrete (and brittle) interfaces. >> >> Tested with tier1..3, hs-precheckin-comp and hs-comp-stress >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > not reinventing the wheel This looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24245#pullrequestreview-2745596627 From epeter at openjdk.org Mon Apr 7 06:08:55 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 7 Apr 2025 06:08:55 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v13] In-Reply-To: References: Message-ID: <9lzS-yIyl2z6YFwV9VpAyaYcnLujK1gwamhBVSmngeA=.b209089f-69eb-4ea0-8171-c25fbc1f28fe@github.com> > We should extend the functionality of Verify.checkEQ: > - Allow different NaN encodings to be seen as equal (by default). > - Compare VectorAPI vectors. > - Compare Exceptions, and their messages. > - Compare arbitrary Objects via Reflection. > > Note: this is a prerequisite for the Template Library [JDK-8352861](https://bugs.openjdk.org/browse/JDK-8352861) / https://github.com/openjdk/jdk/pull/23418. 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Apply suggestions from code review Co-authored-by: Andrey Turbanov ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24224/files - new: https://git.openjdk.org/jdk/pull/24224/files/b8fad69c..49f6789c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24224&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24224&range=11-12 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24224.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24224/head:pull/24224 PR: https://git.openjdk.org/jdk/pull/24224 From epeter at openjdk.org Mon Apr 7 06:08:56 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 7 Apr 2025 06:08:56 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v12] In-Reply-To: References: Message-ID: <4G5Po8SEYFxSylfIJtndUpu0LLboJPgGgmE8FL3t1S4=.39c5d519-7a78-43c7-a1fe-8cef72901490@github.com> On Thu, 3 Apr 2025 08:15:36 GMT, Emanuel Peter wrote: >> We should extend the functionality of Verify.checkEQ: >> - Allow different NaN encodings to be seen as equal (by default). >> - Compare VectorAPI vectors. >> - Compare Exceptions, and their messages. >> - Compare arbitrary Objects via Reflection. >> >> Note: this is a prerequisite for the Template Library [JDK-8352861](https://bugs.openjdk.org/browse/JDK-8352861) / https://github.com/openjdk/jdk/pull/23418. 
> > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > For Christian @turbanoff Thanks for the two whitespace fixes :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24224#issuecomment-2782111679 From pminborg at openjdk.org Mon Apr 7 06:47:11 2025 From: pminborg at openjdk.org (Per Minborg) Date: Mon, 7 Apr 2025 06:47:11 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API In-Reply-To: References: Message-ID: <9BCE8xN6SA-cPEc1EtuSsqoYwsHiwp31lJKsraWgYso=.67a97434-ef3c-40ab-b5be-841889fdd97c@github.com> On Fri, 4 Apr 2025 22:52:24 GMT, Vladimir Ivanov wrote: > Migrate Vector API math library (SVML and SLEEF) linkage from native code (in JVM) to Java FFM API. > > Since FFM API doesn't support vector calling conventions yet, migration affects only symbol lookup for now. But it still enables significant simplifications on JVM side. > > The patch consists of the following parts: > * on-demand symbol lookup in Java code replaces eager lookup from native code during JVM startup; > * 2 new VM intrinsics for vector calls (support unary and binary shapes) (code separated from unary/binary vector operations); > * new internal interface to query supported CPU ISA extensions (`jdk.incubator.vector.CPUFeatures`) used for CPU dispatching. > > `java.lang.foreign` API is used to perform symbol lookup in vector math library, then the address is cached and fed into corresponding JVM intrinsic, so C2 can turn it into a direct vector call in generated code. > > Once `java.lang.foreign` supports vectors & vector calling conventions, VM intrinsics can go away. > > Performance is on par with original implementation (tested with microbenchmarks on linux-x64 and macosx-aarch64). > > Testing: hs-tier1 - hs-tier6, microbenchmarks (on linux-x64 and macosx-aarch64) > > Thanks! 
src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMathLibrary.java line 258: > 256: if (LIBRARY.isSupported(op, vspecies)) { > 257: String symbol = LIBRARY.symbolName(op, vspecies); > 258: MemorySegment addr = LOOKUP.find(symbol) It is better to use `LOOKUP.findOrThrow()` because it does not require lambda creation. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2030551872 From chagedorn at openjdk.org Mon Apr 7 06:49:07 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 7 Apr 2025 06:49:07 GMT Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v3] In-Reply-To: References: Message-ID: On Fri, 4 Apr 2025 07:25:59 GMT, Roland Westrelin wrote: >>> I think it's been considered good practice over the year to be particularly defensive about this >> >> Makes sense from a stability point of view. I'm wondering though if it's not a bug when the cast input is null at this point. Aren't there only few CFG nodes, like regions, where we set some inputs to null already? There is other code, for example in `ConvI2L::Ideal()`, that later accesses `in(1)` without null check: >> >> https://github.com/openjdk/jdk/blob/1ec2177a6b25573732b902f76bb81dd1cdaf7edf/src/hotspot/share/opto/convertnode.cpp#L728 >> >> To be consistent, we would also need to add a check for the other accesses in the method or turn the null check into a bailout for the entire `Ideal()` method. If we agree that null is unexpected (or assume it should be), we might also want to add asserts accordingly. >> >> My concern is that most IGVN methods assume non-control inputs cannot be null where we normally expect a sane input. This is probably true but hard to prove. To be overally consistent, we should also consider adding bailout and assertion code there. While it's the safest solution, this could introduce a lot of new code, especially for multi input nodes, which also makes it harder to read. 
What are your thought about that? >> >> Anyway, we don't need to make a decision as part of this PR on how we should generally handle inputs in IGVN method. It's fine if we only concentrate on the touched/new code here. > > I agree that we would need to be consistent and that it makes little sense to add null checks in code that have been around forever and has never caused issues. Maybe we can somehow have igvn itself assert that every node it processes has a set of expected inputs non null? I suppose, every node type would need to define which of its inputs can be null, then. Yes, that would be a nice verification. Should we file an RFE for that? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23468#discussion_r2030552612 From chagedorn at openjdk.org Mon Apr 7 06:49:08 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 7 Apr 2025 06:49:08 GMT Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v7] In-Reply-To: References: <09Q1vDaTXq3VlLU4xxQl_E7wDM2FT7tqR_Bc8ky8RNc=.4e11f2f8-75c3-49a1-b0b3-20eac17c4b39@github.com> Message-ID: On Thu, 3 Apr 2025 12:38:49 GMT, Roland Westrelin wrote: >> I assume that JDK-8275202 also calls this method with a non-null `PhaseIdealLoop` pointer? Now we only pass in null, so the `loop` parameter could be removed. > > Right. Do you think it's better to remove the parameter that's used (for now)? Up to you, I'm fine to leave it in if you estimate that JDK-8275202 is coming in soon anyway. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23468#discussion_r2030553794 From chagedorn at openjdk.org Mon Apr 7 06:54:55 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 7 Apr 2025 06:54:55 GMT Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v7] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 14:09:58 GMT, Roland Westrelin wrote: >> src/hotspot/share/opto/phaseX.cpp line 1836: >> >>> 1834: _type_nodes.push(n); >>> 1835: } >>> 1836: const Type* new_type = n->Value(this); >> >> Could we also only add `n` to `_type_nodes` if `new_type` is top? Then we could also rename `_type_nodes` to `_maybe_top_type_nodes` or something like that. > > if `new_type` is top? > As node's types are widen by CCP, a node `n` will initially be `top`, then one input changes and becomes not `top` but if the node has another input (say control), that other input will still be `top` so the type will be `top` again. Only once both inputs are not `top` is the type not `top`. So isn't there a good chance that most type nodes will initially be `top` and be enqueued anyway so filtering nodes when they are popped is still required and we don't gain much by doing what you suggest? That's true, for most nodes it probably does not matter. But I'm thinking about `Phi` nodes which could already be updated when one path is non-top. So, it might still be worth to do it after `Value()` only? IIUC, it does not matter from a correctness point of view if enqueue before or after `Value()` - we would still filter later for `top` either way. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23468#discussion_r2030562198 From chagedorn at openjdk.org Mon Apr 7 07:07:05 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 7 Apr 2025 07:07:05 GMT Subject: RFR: 8350563: C2 compilation fails because PhaseCCP does not reach a fixpoint [v8] In-Reply-To: References: Message-ID: On Sat, 5 Apr 2025 18:51:35 GMT, Liam Miller-Cushon wrote: >> Hello, please consider this fix for [JDK-8350563](https://bugs.openjdk.org/browse/JDK-8350563) contributed by my colleague Matthias Ernst. >> >> https://github.com/openjdk/jdk/pull/22856 introduced a new `Value()` optimization for the pattern `AndIL(Con, Mask)`. >> This optimization can look through CastNodes, and therefore requires additional logic in CCP to push these >> transitive uses to the worklist. >> >> The optimization is closely related to analogous optimizations for SHIFT nodes, and we also extend the existing logic for >> CCP worklist handling: the current logic is "if the shift input to a SHIFT node changes, push indirect AND node uses to the CCP worklist". >> We extend it by adding "if the (new) type of a node is an IntegerType that `is_con, ...` to the predicate. > > Liam Miller-Cushon has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 15 additional commits since the last revision: > > - Add -XX:+UnlockDiagnosticVMOptions > - Check type before uncasting > > A child phi node may transition from con to non-con, making the AND node transition back from "0" to its current type. If that current type is still TOP we're in violation of monotonicity. Therefore, don't apply optimization if AND is not integer yet. > - Merge commit '9bb804b14e1' into JDK-8350563 > - Explicitly check for OP_Con instead of TypeInteger::is_con. 
> > 322 Phi === 303 119 255 [[ 399 388 351 751 366 377 ]] #int:-256..127 !jvms: Integer::parseInt @ bci:151 (line 625) > > While this Phi dumps as "#int:-256..127", `phase->type(expr)` returns a type that is_con -256. > - Update test/hotspot/jtreg/compiler/ccp/TestAndConZeroCCP.java > > Co-authored-by: Christian Hagedorn > - Merge remote-tracking branch 'origin/master' into JDK-8350563 > - Reformat test and update package to ccp > - Review comments > - Update test/hotspot/jtreg/compiler/c2/TestAndConZeroCCP.java > > Co-authored-by: Emanuel Peter > - copyright > - ... and 5 more: https://git.openjdk.org/jdk/compare/c9e08477...23119e18 Sure, you're welcome! The fix looks good to me now. I will emit some internal testing again when you merged the tests together. Then I think it's good to go :-) test/hotspot/jtreg/compiler/ccp/TestAndConZeroMonotonic.java line 35: > 33: public class TestAndConZeroMonotonic { > 34: > 35: public static void main(String[] args) { Since it's quite an easy test, I suggest to merge the two test files together by calling `Integer::parseInt()` directly instead. You can just add another `@run` statement. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23871#pullrequestreview-2745711197 PR Review Comment: https://git.openjdk.org/jdk/pull/23871#discussion_r2030575931 From chagedorn at openjdk.org Mon Apr 7 07:11:51 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 7 Apr 2025 07:11:51 GMT Subject: RFR: 8352681: C2 compilation hits asserts "must set the initial type just once" In-Reply-To: References: Message-ID: On Sun, 6 Apr 2025 04:04:56 GMT, Cesar Soares Lucas wrote: > I didn't opt to make _delay_transform false during the RAM transformations because that seemed to be a risky move, nevertheless I'll create an RFE and keep investigating that option. 
> > Also, while working on a test case I found that C2 doesn't remove (at least not before EA steps) the if (param != param) {...} and I'm going to investigate that as a separate RFE. Sounds good! The fix looks good to me. src/hotspot/share/opto/memnode.cpp line 293: > 291: toop->offset() == Type::OffsetBot)) { > 292: // IGVN _delay_transform may be set to true and if that is the case and mmem > 293: // is already a registered then the validation inside transform will Suggestion: // is already a registered node then the validation inside transform will ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24471#pullrequestreview-2745721433 PR Review Comment: https://git.openjdk.org/jdk/pull/24471#discussion_r2030581836 From epeter at openjdk.org Mon Apr 7 07:16:56 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 7 Apr 2025 07:16:56 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v15] In-Reply-To: References: Message-ID: <7UllX79lzodFBQgK59E_ywZWH-WVnQegcAsRbYjaYJQ=.a33bb997-1aab-4e01-bea7-982e831aba8a@github.com> On Fri, 4 Apr 2025 14:11:56 GMT, Daniel Lund?n wrote: >> If a method has a large number of parameters, we currently bail out from C2 compilation. >> >> ### Changeset >> >> Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes. >> >> Changes: >> - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. 
To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed. >> - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable. >> - Remove all `can_represent` checks and bailouts. >> - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`. >> - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450) >> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. >> - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement. >> - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, no... 
> > Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: > > Revise overlap comments for frequency of cases A few quick first comments :) src/hotspot/share/adlc/formsopt.cpp line 180: > 178: // in the register mask regardless of how much slack is created by rounding. > 179: // This was found necessary after adding 16 new registers for APX. > 180: return (words_for_regs + 3 + 1 + 1) & ~1; Is the comment above still accurate? Specifically this: // on the stack (stack registers) up to some interesting limit. Methods // that need more parameters will NOT be compiled. On Intel, the limit // is something like 90+ parameters. And if not: might there be other comments around like this? src/hotspot/share/opto/regmask.hpp line 40: > 38: // stack slots used by BoxLockNodes. We reach this limit by, e.g., deeply > 39: // nesting synchronized statements in Java. > 40: const int BoxLockNode_slot_limit = 200; Where does this number come from? I've added arbitrary constants like this, and sometimes it is hard to give a good justification. But at least writing down what your thinking was might help someone else if they come across it later. Do you have a sense how large it should be at least or at most? src/hotspot/share/opto/regmask.hpp line 86: > 84: (((RM_SIZE_MIN << 5) + // Slots for machine registers > 85: (max_method_parameter_length * 2) + // Slots for incoming arguments > 86: (max_method_parameter_length * 2) + // Slots for outgoing arguments Why `*2`? Is that for 64-bit arguments that are split into two 32-bit words? src/hotspot/share/opto/regmask.hpp line 104: > 102: // the machine registers and usually all parameters that need to be passed > 103: // on the stack (stack registers) up to some interesting limit. On Intel, > 104: // the limit is something like 90+ parameters.
Here you fixed the comment, so probably the other one needs to be fixed too ;) ------------- PR Review: https://git.openjdk.org/jdk/pull/20404#pullrequestreview-2745689438 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2030561927 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2030577639 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2030584435 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2030585708 From shade at openjdk.org Mon Apr 7 07:42:06 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 7 Apr 2025 07:42:06 GMT Subject: RFR: 8353188: C1: Clean up x86 backend after 32-bit x86 removal [v3] In-Reply-To: References: <-iwh_5JGpt-TAVpfZQjwbnIG_c8hvirNKCcmiZoLNls=.3b34bf15-51fc-42bf-a294-1c23ca99754c@github.com> Message-ID: On Wed, 2 Apr 2025 08:56:20 GMT, Aleksey Shipilev wrote: >> Piece-wise cleanup of C1_LIRAssembler_x86, C1_MacroAssembler and related classes. C1 implements the bulk of arch-specific backend there. Major parts of this backend are already removed by #24274, this cleans up another large bulk, and hopefully most of it. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux x86_64 server fastdebug, `all` + `-XX:TieredStopAtLevel=1` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Minor whitespace reverts Thanks! I have spot-checked that the merge with master yields no surprises. So I am integrating. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24301#issuecomment-2782309121 From shade at openjdk.org Mon Apr 7 07:42:07 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 7 Apr 2025 07:42:07 GMT Subject: Integrated: 8353188: C1: Clean up x86 backend after 32-bit x86 removal In-Reply-To: <-iwh_5JGpt-TAVpfZQjwbnIG_c8hvirNKCcmiZoLNls=.3b34bf15-51fc-42bf-a294-1c23ca99754c@github.com> References: <-iwh_5JGpt-TAVpfZQjwbnIG_c8hvirNKCcmiZoLNls=.3b34bf15-51fc-42bf-a294-1c23ca99754c@github.com> Message-ID: On Fri, 28 Mar 2025 17:11:14 GMT, Aleksey Shipilev wrote: > Piece-wise cleanup of C1_LIRAssembler_x86, C1_MacroAssembler and related classes. C1 implements the bulk of arch-specific backend there. Major parts of this backend are already removed by #24274, this cleans up another large bulk, and hopefully most of it. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux x86_64 server fastdebug, `all` + `-XX:TieredStopAtLevel=1` This pull request has now been integrated. Changeset: d63b561f Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/d63b561fffd42d76f14771c47951dd1d08efe3a7 Stats: 1081 lines in 11 files changed: 1 ins; 1035 del; 45 mod 8353188: C1: Clean up x86 backend after 32-bit x86 removal Reviewed-by: kvn, vlivanov ------------- PR: https://git.openjdk.org/jdk/pull/24301 From duke at openjdk.org Mon Apr 7 07:54:29 2025 From: duke at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Mon, 7 Apr 2025 07:54:29 GMT Subject: RFR: 8346552: C2: Add IR tests to check that Parse Predicate cloning in Loop Unswitching works as expected Message-ID: # Issue Summary When a loop is unswitched, all parse predicates from the original loop must be cloned to the second loop that is created. Forgetting to clone a parse predicate is a common error during development on loop unswitching code that we could not catch previously. Since we have the IR-framework now, this PR introduces a test to catch this error. 
# Changes The main contribution of this PR is a test to ensure that all predicates have been cloned into an unswitched loop. This also required some relating changes: - add `OPAQUE_TEMPLATE_ASSERTION_PREDICATE_NODE` to the IR-framework, - add some missing parse predicate nodes to the IR-framework, - change the output of the labels of parse predicate nodes in the ideal graph so they can be recognized reliably by the IR-framework (the main problem was that `Loop ` is a prefix of `Loop Limit Check` that is hard to distinguish with spaces instead of underlines), - rework the regex for detecting parse predicates in the IR-framework, - add a test to ensure parse predicates are cloned into unswitched loops. # Testing - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14266369099) - tier1 through tier3 plus Oracle internal testing ------------- Commit messages: - Add IR test for predicate cloning - ir-framework: make the parse predicate node regex more robust - ir-framework: add auto vectorization check node - ir-framework: add opaque template assertion predicate node Changes: https://git.openjdk.org/jdk/pull/24479/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24479&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8346552 Stats: 131 lines in 3 files changed: 126 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/24479.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24479/head:pull/24479 PR: https://git.openjdk.org/jdk/pull/24479 From rcastanedalo at openjdk.org Mon Apr 7 08:10:58 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 7 Apr 2025 08:10:58 GMT Subject: RFR: 8348645: IGV: visualize live ranges [v4] In-Reply-To: References: Message-ID: On Fri, 7 Mar 2025 09:44:12 GMT, Roberto Casta?eda Lozano wrote: >> Just a minor aesthetic thing: I noticed that in phases with no liveness information, the liveness information in each node is replaced by an empty space (instead of nothing): >> 
image >> instead of >> image >> >> >> Otherwise it looks good to me (I probably made more of a functionality rather than a code style/semantics kind of review). Thanks again @robcasloz. > >> Just a minor aesthetic thing: I noticed that in phases with no liveness information, the liveness information in each node is replaced by an empty space (instead of nothing) > > Good catch, thanks Damon! Commit efbde14a should address that, please check that it works as you expect. > @robcasloz This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration! ? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23558#issuecomment-2782386813 From roland at openjdk.org Mon Apr 7 08:27:04 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 7 Apr 2025 08:27:04 GMT Subject: RFR: 8349139: C2: Div looses dependency on condition that guarantees divisor not null in counted loop [v4] In-Reply-To: References: Message-ID: > The test crashes because of a division by zero. The `Div` node for > that one is initially part of a counted loop. The control input of the > node is cleared because the divisor is non-zero. This is because the > divisor depends on the loop phi and the type of the loop phi is > narrowed down when the counted loop is created. pre/main/post loops > are created, unrolling happens, the main loop loses its backedge. The > `Div` node can then float above the zero trip guard for the main > loop. When the zero trip guard is not taken, there's no guarantee the > divisor is non-zero so the `Div` node should be pinned below it. > > I propose we revert the change I made with 8334724 which removed > `PhaseIdealLoop::cast_incr_before_loop()`. The `CastII` that this > method inserted was there to handle exactly this problem.
It was added > initially for a similar issue but with array loads. That problem with > loads is handled some other way now and that's why I thought it was > safe to proceed with the removal. > > The code in this patch is somewhat different from the one we had > before for a couple reasons: > > 1- assert predicate code evolved and so previous logic can't be > resurrected as it was. > > 2- the previous logic has a bug. > > Regarding 1-: during pre/main/post loop creation, we used to add the > `CastII` and then to add assertion predicates (so assertion predicates > depended on the `CastII`). Then when unrolling, when assertion > predicates are updated, we would skip over the `CastII`. What I > propose here is to add the `CastII` after assertion predicates are > added. As a result, they don't depend on the `CastII` and there's no > need for any extra logic when unrolling happens. This, however, > doesn't work when the assertion predicates are added by RCE. In that > case, I had to add logic to skip over the `CastII` (similar to what > existed before I removed it). > > Regarding 2-: previous implementation for > `PhaseIdealLoop::cast_incr_before_loop()` would add the `CastII` at > the first loop `Phi` it encounters that's a use of the loop increment: > it's usually the iv but not always. I tweaked the test case to show, > this bug can actually cause a crash and changed the logic for > `PhaseIdealLoop::cast_incr_before_loop()` accordingly. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains seven commits: - merge - Merge branch 'master' into JDK-8349139 - other test + review comment - Merge branch 'master' into JDK-8349139 - Merge branch 'master' into JDK-8349139 - Merge branch 'master' into JDK-8349139 - fix & test ------------- Changes: https://git.openjdk.org/jdk/pull/23617/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23617&range=03 Stats: 199 lines in 7 files changed: 167 ins; 25 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/23617.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23617/head:pull/23617 PR: https://git.openjdk.org/jdk/pull/23617 From chagedorn at openjdk.org Mon Apr 7 08:30:57 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 7 Apr 2025 08:30:57 GMT Subject: RFR: 8353669: IGV: dump OOP maps for MachSafePoint nodes In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 18:20:15 GMT, Roberto Casta?eda Lozano wrote: > This changeset dumps the OOP map of each MachSafePoint when available, i.e. at the `Final Code` phase. This should make it easier to learn about and diagnose OOP map building issues: > > ![final-code](https://github.com/user-attachments/assets/a477c9d1-0fe4-42ef-a367-336c54bec6a5) > > #### Testing > > - tier1 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64; release and debug mode). > - Tested IGV manually on a few selected graphs. Tested automatically that dumping thousands of graphs does not trigger any assertion failure (by running `java -Xcomp -XX:PrintIdealGraphLevel=1`). Looks good! How can it be visualized inside the node? Have you adapted some filter? AFAIU, you only dump an additional node property which is shown in the IGV node properties view. ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24422#pullrequestreview-2745955268 From dfenacci at openjdk.org Mon Apr 7 08:31:09 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 7 Apr 2025 08:31:09 GMT Subject: RFR: 8349563: Improve AbsNode::Value() for integer types [v2] In-Reply-To: References: Message-ID: <8E2H46eKkBbimhaPlfDf1qyYsB-bLC7Y5JsZmGCx9rU=.7db1d0ec-b452-4643-a9b5-58ae608de595@github.com> On Wed, 2 Apr 2025 14:48:20 GMT, Jasmine Karthikeyan wrote: >> Hi all, >> This is a small patch that improves the implementation of Value() for `AbsINode` and `AbsLNode` by returning the absolute value of the input range. Most of the logic is trivial except for the special case where `_lo == jint_min/jlong_min` which must return the entire type range when encountered, for which I've added a small proof in the comments. I've also added some unit tests and updated the file to limit IR check platforms with more granularity. >> >> Thoughts and reviews would be appreciated! > > Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: > > - Merge > - Improve AbsNode::Value test/hotspot/jtreg/compiler/c2/irTests/TestIRAbs.java line 249: > 247: @DontCompile > 248: public void checkIntRange(int i) { > 249: Asserts.assertEquals(Math.abs((i & 7) - 4) > 4, testIntRange1(i)); Cool improvement @jaskarth! It might not be directly related to your optimization and marginally relevant but I was wondering if it would make sense to widen the choice of constants a bit (maybe adding few more or some randomly generated ones)? 
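For readers following the `AbsNode::Value()` discussion above, here is a minimal Java sketch of the range rule the PR description outlines, together with the kind of randomized check the review comment suggests. It is an illustration only, not the HotSpot implementation: the class name `AbsRangeSketch`, the helper `absRange`, the fixed seed, and the sample counts are all assumptions made for this example.

```java
import java.util.Random;

// Hypothetical illustration; names are not taken from the JDK sources.
class AbsRangeSketch {
    // Returns {lo, hi} bounds for Math.abs(x), given that x lies in [lo, hi].
    static int[] absRange(int lo, int hi) {
        if (lo == Integer.MIN_VALUE) {
            // Math.abs(Integer.MIN_VALUE) is still Integer.MIN_VALUE, so the
            // only safe answer is the whole int range.
            return new int[] {Integer.MIN_VALUE, Integer.MAX_VALUE};
        }
        if (lo >= 0) {
            return new int[] {lo, hi};       // entirely non-negative: unchanged
        }
        if (hi <= 0) {
            return new int[] {-hi, -lo};     // entirely non-positive: mirrored
        }
        return new int[] {0, Math.max(-lo, hi)}; // range straddles zero
    }

    public static void main(String[] args) {
        Random random = new Random(42); // fixed seed, an arbitrary choice
        for (int trial = 0; trial < 1000; trial++) {
            int a = random.nextInt();
            int b = random.nextInt();
            int lo = Math.min(a, b);
            int hi = Math.max(a, b);
            int[] bounds = absRange(lo, hi);
            long span = (long) hi - lo + 1; // number of values in [lo, hi]
            for (int k = 0; k < 100; k++) {
                // Pick a value inside [lo, hi] without int overflow.
                int x = (int) (lo + Math.floorMod(random.nextLong(), span));
                int abs = Math.abs(x);
                if (abs < bounds[0] || abs > bounds[1]) {
                    throw new AssertionError("abs(" + x + ") outside computed bounds");
                }
            }
        }
        System.out.println("all samples within bounds");
    }
}
```

With the constants from the quoted test, `(i & 7) - 4` lies in [-4, 3], so the sketch yields the bounds [0, 4], which is why a comparison such as `> 4` can be folded to `false`.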
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23685#discussion_r2030720268 From shade at openjdk.org Mon Apr 7 08:31:49 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 7 Apr 2025 08:31:49 GMT Subject: RFR: 8353192: C2: Clean up x86 backend after 32-bit x86 removal In-Reply-To: <8UOZXyMUssuNga9jUwBf6F1Nmhi6a3ZIJGpXzS3KL3U=.50774170-672e-49c9-8527-077914838e94@github.com> References: <8UOZXyMUssuNga9jUwBf6F1Nmhi6a3ZIJGpXzS3KL3U=.50774170-672e-49c9-8527-077914838e94@github.com> Message-ID: On Thu, 3 Apr 2025 17:06:16 GMT, Vladimir Kozlov wrote: >> Piece-wise cleanup of C2 x86 backend. C2_MacroAssembler, x86.ad and related files are the target for this cleanup. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux x86_64 server fastdebug, `all` + `-XX:-TieredCompilation` > > src/hotspot/cpu/x86/x86.ad line 2680: > >> 2678: break; >> 2679: case Op_VecX: >> 2680: #ifndef _LP64 > > Here and in following code you left code for 32-bit instead of 64-bits. Oh (facepalms). I guess the lesson here is not touch .ad very late at night :) Reverted to hopefully proper form. I looked through the patch again, and I think these 4 adjacent hunks are the only place I made this mistake. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24300#discussion_r2030720354 From rcastanedalo at openjdk.org Mon Apr 7 08:43:54 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 7 Apr 2025 08:43:54 GMT Subject: RFR: 8353669: IGV: dump OOP maps for MachSafePoint nodes In-Reply-To: References: Message-ID: On Mon, 7 Apr 2025 08:28:30 GMT, Christian Hagedorn wrote: > Looks good! Thanks for reviewing, Christian! > How can it be visualized inside the node? Have you adapted some filter? AFAIU, you only dump an additional node property which is shown in the IGV node properties view. 
For the above example, I just set the `Node Text` setting in IGV's Options window to [idx] [name] [oopmap] ------------- PR Comment: https://git.openjdk.org/jdk/pull/24422#issuecomment-2782477043 From shade at openjdk.org Mon Apr 7 08:47:19 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 7 Apr 2025 08:47:19 GMT Subject: RFR: 8353192: C2: Clean up x86 backend after 32-bit x86 removal [v2] In-Reply-To: References: Message-ID: > Piece-wise cleanup of C2 x86 backend. C2_MacroAssembler, x86.ad and related files are the target for this cleanup. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux x86_64 server fastdebug, `all` + `-XX:-TieredCompilation` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: - Cleanup ADLC as well - Revert some accidental removals - Merge branch 'master' into JDK-8353192-x86-c2-backend - Touchup - Fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24300/files - new: https://git.openjdk.org/jdk/pull/24300/files/6edebc7b..5d0c852e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24300&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24300&range=00-01 Stats: 29291 lines in 725 files changed: 19768 ins; 7520 del; 2003 mod Patch: https://git.openjdk.org/jdk/pull/24300.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24300/head:pull/24300 PR: https://git.openjdk.org/jdk/pull/24300 From amitkumar at openjdk.org Mon Apr 7 08:49:01 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 7 Apr 2025 08:49:01 GMT Subject: RFR: 8353500: [s390x] Intrinsify Unsafe::setMemory Message-ID: Unsafe::setMemory intrinsic implementation for s390x. 
Will update benchmark: ------------- Commit messages: - s390: unsafe::setMemory Port Changes: https://git.openjdk.org/jdk/pull/24480/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24480&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8353500 Stats: 132 lines in 2 files changed: 128 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/24480.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24480/head:pull/24480 PR: https://git.openjdk.org/jdk/pull/24480 From shade at openjdk.org Mon Apr 7 08:50:53 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 7 Apr 2025 08:50:53 GMT Subject: RFR: 8353192: C2: Clean up x86 backend after 32-bit x86 removal In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 17:25:25 GMT, Vladimir Kozlov wrote: > I looked on `adlc` code to make sure nothing left there and found check for `IA32`. Right, ADLC should also be cleaned up a bit in this PR, purged a bit more. > I surprise to see `X32` too which we check in `os_linux.cpp`. Right. We are going to handle all these remaining leftovers with https://bugs.openjdk.org/browse/JDK-8351149, after we yank the major solid parts like this compiler backend support :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24300#issuecomment-2782496973 From dfenacci at openjdk.org Mon Apr 7 09:11:44 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 7 Apr 2025 09:11:44 GMT Subject: RFR: 8352963: [REDO] Missing late inline cleanup causes compiler/vectorapi/VectorLogicalOpIdentityTest.java IR failure [v2] In-Reply-To: References: Message-ID: > This PR is a REDO of [JDK-8302459](https://bugs.openjdk.org/browse/JDK-8302459) ([PR](https://github.com/openjdk/jdk/pull/21682), [backout](https://bugs.openjdk.org/browse/JDK-8352965) triggered by a failing internal test). 
> > There was an issue with `CallGenerator::for_method_handle_call` that could delay late inlining by creating a "generic" `LateInlineCallGenerator` instead of a more specific `LateInlineMHCallGenerator`: > https://github.com/openjdk/jdk/blob/74df384a9870431efb184158bba032c79c35356e/src/hotspot/share/opto/callGenerator.cpp#L991 > While running IGVN this could be misinterpreted as non-MH late-inline > https://github.com/openjdk/jdk/blob/c282fb9add32f1fac8174ca84b1b68a869d2578d/src/hotspot/share/opto/callnode.cpp#L1088-L1091 > eventually triggering `assert(!cg->method()->is_method_handle_intrinsic(), "required");` > > The fix involves creating a `LateInlineMHCallGenerator` instead. Here is what changed from the backed out PR: > https://github.com/openjdk/jdk/blob/c282fb9add32f1fac8174ca84b1b68a869d2578d/src/hotspot/share/opto/callGenerator.cpp#L991-L995 > > ### Testing > > Tier 1-4 (windows-x64, linux-x64/aarch64, and macosx-x64/aarch64; release and debug mode) Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: JDK-8652963: review fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24402/files - new: https://git.openjdk.org/jdk/pull/24402/files/c282fb9a..3fa592b7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24402&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24402&range=00-01 Stats: 6 lines in 1 file changed: 0 ins; 4 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24402.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24402/head:pull/24402 PR: https://git.openjdk.org/jdk/pull/24402 From dfenacci at openjdk.org Mon Apr 7 09:13:53 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 7 Apr 2025 09:13:53 GMT Subject: RFR: 8352963: [REDO] Missing late inline cleanup causes compiler/vectorapi/VectorLogicalOpIdentityTest.java IR failure [v2] In-Reply-To: References: Message-ID: On Fri, 4 Apr 2025 19:29:06 GMT, Vladimir Ivanov wrote: > Would 
-XX:-IncrementalInlineMH hit the very same assertion? Indeed! > Should we simply require IncrementalInlineMH to delay inlining instead? Yes. I removed the non-MH late inline call as well. Thanks @iwanowww! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24402#discussion_r2030810099 From chagedorn at openjdk.org Mon Apr 7 09:30:49 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 7 Apr 2025 09:30:49 GMT Subject: RFR: 8353669: IGV: dump OOP maps for MachSafePoint nodes In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 18:20:15 GMT, Roberto Castañeda Lozano wrote: > This changeset dumps the OOP map of each MachSafePoint when available, i.e. at the `Final Code` phase. This should make it easier to learn about and diagnose OOP map building issues: > > ![final-code](https://github.com/user-attachments/assets/a477c9d1-0fe4-42ef-a367-336c54bec6a5) > > #### Testing > > - tier1 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64; release and debug mode). > - Tested IGV manually on a few selected graphs. Tested automatically that dumping thousands of graphs does not trigger any assertion failure (by running `java -Xcomp -XX:PrintIdealGraphLevel=1`). I see, thanks! So, I guess it's not something we should have as a filter like showing type info? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24422#issuecomment-2782684330 From rcastanedalo at openjdk.org Mon Apr 7 10:07:50 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 7 Apr 2025 10:07:50 GMT Subject: RFR: 8353669: IGV: dump OOP maps for MachSafePoint nodes In-Reply-To: References: Message-ID: On Mon, 7 Apr 2025 09:28:17 GMT, Christian Hagedorn wrote: > I see, thanks! So, I guess it's not something we should have as a filter like showing type info? Right, I would advocate for not creating a filter for it yet, to avoid overloading the filter list.
If we find in the future that the property is generally useful to display in the node labels, we might consider adding a filter then (or possibly extending the existing `Show custom node info` filter). ------------- PR Comment: https://git.openjdk.org/jdk/pull/24422#issuecomment-2782788530 From enikitin at openjdk.org Mon Apr 7 11:29:30 2025 From: enikitin at openjdk.org (Evgeny Nikitin) Date: Mon, 7 Apr 2025 11:29:30 GMT Subject: RFR: 8353841: [jittester] Fix JITTester build after asm removal Message-ID: [JDK-8346981](https://bugs.openjdk.org/browse/JDK-8346981) removed jdk.internal.objectweb.asm packages from java.base, causing JITTester build to fail. This PR fixes the build by building ASM prior to the testlibrary and JITTester builds. Testing: Local runs of targets `COMPILE` and `all`, no errors found. ------------- Commit messages: - 8353841: [jittester] Fix JITTester build after asm removal Changes: https://git.openjdk.org/jdk/pull/24487/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24487&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8353841 Stats: 9 lines in 1 file changed: 6 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/24487.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24487/head:pull/24487 PR: https://git.openjdk.org/jdk/pull/24487 From chagedorn at openjdk.org Mon Apr 7 11:29:50 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 7 Apr 2025 11:29:50 GMT Subject: RFR: 8353669: IGV: dump OOP maps for MachSafePoint nodes In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 18:20:15 GMT, Roberto Castañeda Lozano wrote: > This changeset dumps the OOP map of each MachSafePoint when available, i.e. at the `Final Code` phase.
This should make it easier to learn about and diagnose OOP map building issues: > > ![final-code](https://github.com/user-attachments/assets/a477c9d1-0fe4-42ef-a367-336c54bec6a5) > > #### Testing > > - tier1 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64; release and debug mode). > - Tested IGV manually on a few selected graphs. Tested automatically that dumping thousands of graphs does not trigger any assertion failure (by running `java -Xcomp -XX:PrintIdealGraphLevel=1`). Sounds good! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24422#issuecomment-2782994965 From dlunden at openjdk.org Mon Apr 7 11:42:07 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Mon, 7 Apr 2025 11:42:07 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v11] In-Reply-To: References: <0Yf6qZwnLz7oAtSFscDwHifQAmaPuHzeSrpkqMVchDU=.c7a5e8af-9390-414b-850c-609110668eac@github.com> <-pdjdg9OQRB7YaXNFiVeVseLEoJDZb2XkMk0ml3pm3w=.2ecb257e-5618-4763-90e5-a2b1d0758e67@github.com> Message-ID: On Thu, 3 Apr 2025 11:43:32 GMT, Daniel Lundén wrote: >> src/hotspot/share/opto/regmask.hpp line 545: >> >>> 543: >>> 544: // Overlap test. Non-zero if any registers in common, including all-stack. >>> 545: bool overlap(const RegMask &rm) const { >> >> Please review the frequency of the different tests in this function. I ran an instrumented version and found the test in Case 4 to succeed (return true) more often than Case 2 and Case 3. > > Thanks, I made a note to run some benchmarks for this and gather statistics. It is critical that we run case 1 first (results in a significant performance gain), but perhaps we can gain a little by ordering the rare cases as well. I ran Dacapo and only ever triggered case 1 (never the rare cases). As we discussed offline, you likely triggered the other cases when artificially reducing the static register mask size to stress the changeset?
I don't think the ordering of the rare cases matters much in practice.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2031056335
From amitkumar at openjdk.org Mon Apr 7 11:55:48 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 7 Apr 2025 11:55:48 GMT Subject: RFR: 8353500: [s390x] Intrinsify Unsafe::setMemory In-Reply-To: References: Message-ID:
On Mon, 7 Apr 2025 08:44:07 GMT, Amit Kumar wrote:
> Unsafe::setMemory intrinsic implementation for s390x. > > Will update benchmark: with patch:

with the patch:

Benchmark                       (aligned)  (size)  Mode  Cnt   Score   Error  Units
MemorySegmentZeroUnsafe.panama       true       1  avgt   30   2.351 ± 0.015  ns/op
MemorySegmentZeroUnsafe.panama       true       2  avgt   30   2.655 ± 0.020  ns/op
MemorySegmentZeroUnsafe.panama       true       3  avgt   30   2.614 ± 0.004  ns/op
MemorySegmentZeroUnsafe.panama       true       4  avgt   30   2.783 ± 0.007  ns/op
MemorySegmentZeroUnsafe.panama       true       5  avgt   30   2.760 ± 0.014  ns/op
MemorySegmentZeroUnsafe.panama       true       6  avgt   30   2.891 ± 0.006  ns/op
MemorySegmentZeroUnsafe.panama       true       7  avgt   30   2.697 ± 0.003  ns/op
MemorySegmentZeroUnsafe.panama       true       8  avgt   30   2.769 ± 0.007  ns/op
MemorySegmentZeroUnsafe.panama       true      15  avgt   30   3.689 ± 0.016  ns/op
MemorySegmentZeroUnsafe.panama       true      16  avgt   30   3.127 ± 0.009  ns/op
MemorySegmentZeroUnsafe.panama       true      63  avgt   30  15.900 ± 0.046  ns/op
MemorySegmentZeroUnsafe.panama       true      64  avgt   30   4.140 ± 0.057  ns/op
MemorySegmentZeroUnsafe.panama       true     255  avgt   30  53.748 ± 0.872  ns/op
MemorySegmentZeroUnsafe.panama       true     256  avgt   30   9.245 ± 0.013  ns/op
MemorySegmentZeroUnsafe.panama      false       1  avgt   30   2.346 ± 0.020  ns/op
MemorySegmentZeroUnsafe.panama      false       2  avgt   30   2.647 ± 0.005  ns/op
MemorySegmentZeroUnsafe.panama      false       3  avgt   30   2.617 ± 0.006  ns/op
MemorySegmentZeroUnsafe.panama      false       4  avgt   30   2.786 ± 0.008  ns/op
MemorySegmentZeroUnsafe.panama      false       5  avgt   30   2.755 ± 0.004  ns/op
MemorySegmentZeroUnsafe.panama      false       6  avgt   30   2.892 ± 0.005  ns/op
MemorySegmentZeroUnsafe.panama      false       7  avgt   30   2.699 ± 0.006  ns/op
MemorySegmentZeroUnsafe.panama      false       8  avgt   30   2.765 ± 0.004  ns/op
MemorySegmentZeroUnsafe.panama      false      15  avgt   30   3.691 ± 0.015  ns/op
MemorySegmentZeroUnsafe.panama      false      16  avgt   30   3.175 ± 0.053  ns/op
MemorySegmentZeroUnsafe.panama      false      63  avgt   30  15.892 ± 0.028  ns/op
MemorySegmentZeroUnsafe.panama      false      64  avgt   30  15.122 ± 0.347  ns/op
MemorySegmentZeroUnsafe.panama      false     255  avgt   30  53.588 ± 0.315  ns/op
MemorySegmentZeroUnsafe.panama      false     256  avgt   30  52.775 ± 0.169  ns/op
MemorySegmentZeroUnsafe.unsafe       true       1  avgt   30   2.333 ± 0.216  ns/op
MemorySegmentZeroUnsafe.unsafe       true       2  avgt   30   1.878 ± 0.092  ns/op
MemorySegmentZeroUnsafe.unsafe       true       3  avgt   30   2.301 ± 0.011  ns/op
MemorySegmentZeroUnsafe.unsafe       true       4  avgt   30   2.400 ± 0.201  ns/op
MemorySegmentZeroUnsafe.unsafe       true       5  avgt   30   2.666 ± 0.052  ns/op
MemorySegmentZeroUnsafe.unsafe       true       6  avgt   30   2.209 ± 0.084  ns/op
MemorySegmentZeroUnsafe.unsafe       true       7  avgt   30   3.086 ± 0.009  ns/op
MemorySegmentZeroUnsafe.unsafe       true       8  avgt   30   2.294 ± 0.217  ns/op
MemorySegmentZeroUnsafe.unsafe       true      15  avgt   30   4.631 ± 0.013  ns/op
MemorySegmentZeroUnsafe.unsafe       true      16  avgt   30   2.164 ± 0.124  ns/op
MemorySegmentZeroUnsafe.unsafe       true      63  avgt   30  13.959 ± 0.042  ns/op
MemorySegmentZeroUnsafe.unsafe       true      64  avgt   30   3.078 ± 0.211  ns/op
MemorySegmentZeroUnsafe.unsafe       true     255  avgt   30  51.435 ± 0.712  ns/op
MemorySegmentZeroUnsafe.unsafe       true     256  avgt   30   7.879 ± 0.140  ns/op
MemorySegmentZeroUnsafe.unsafe      false       1  avgt   30   2.486 ± 0.169  ns/op
MemorySegmentZeroUnsafe.unsafe      false       2  avgt   30   2.163 ± 0.065  ns/op
MemorySegmentZeroUnsafe.unsafe      false       3  avgt   30   2.307 ± 0.011  ns/op
MemorySegmentZeroUnsafe.unsafe      false       4  avgt   30   2.489 ± 0.121  ns/op
MemorySegmentZeroUnsafe.unsafe      false       5  avgt   30   2.653 ± 0.025  ns/op
MemorySegmentZeroUnsafe.unsafe      false       6  avgt   30   2.830 ± 0.161  ns/op
MemorySegmentZeroUnsafe.unsafe      false       7  avgt   30   3.086 ± 0.008  ns/op
MemorySegmentZeroUnsafe.unsafe      false       8  avgt   30   3.124 ± 0.189  ns/op
MemorySegmentZeroUnsafe.unsafe      false      15  avgt   30   4.634 ± 0.015  ns/op
MemorySegmentZeroUnsafe.unsafe      false      16  avgt   30   4.552 ± 0.194  ns/op
MemorySegmentZeroUnsafe.unsafe      false      63  avgt   30  13.977 ± 0.031  ns/op
MemorySegmentZeroUnsafe.unsafe      false      64  avgt   30  14.310 ± 0.177  ns/op
MemorySegmentZeroUnsafe.unsafe      false     255  avgt   30  52.244 ± 1.414  ns/op
MemorySegmentZeroUnsafe.unsafe      false     256  avgt   30  53.824 ± 0.580  ns/op
Finished running test 'micro:java.lang.foreign.MemorySegmentZeroUnsafe'

without patch:

Benchmark                       (aligned)  (size)  Mode  Cnt   Score   Error  Units
MemorySegmentZeroUnsafe.panama       true       1  avgt   30   2.368 ± 0.029  ns/op
MemorySegmentZeroUnsafe.panama       true       2  avgt   30   2.647 ± 0.003  ns/op
MemorySegmentZeroUnsafe.panama       true       3  avgt   30   2.615 ± 0.007  ns/op
MemorySegmentZeroUnsafe.panama       true       4  avgt   30   2.782 ± 0.006  ns/op
MemorySegmentZeroUnsafe.panama       true       5  avgt   30   2.760 ± 0.014  ns/op
MemorySegmentZeroUnsafe.panama       true       6  avgt   30   2.889 ± 0.003  ns/op
MemorySegmentZeroUnsafe.panama       true       7  avgt   30   2.702 ± 0.017  ns/op
MemorySegmentZeroUnsafe.panama       true       8  avgt   30   2.766 ± 0.006  ns/op
MemorySegmentZeroUnsafe.panama       true      15  avgt   30   3.748 ± 0.045  ns/op
MemorySegmentZeroUnsafe.panama       true      16  avgt   30   3.122 ± 0.007  ns/op
MemorySegmentZeroUnsafe.panama       true      63  avgt   30  24.901 ± 0.106  ns/op
MemorySegmentZeroUnsafe.panama       true      64  avgt   30  20.841 ± 0.154  ns/op
MemorySegmentZeroUnsafe.panama       true     255  avgt   30  24.498 ± 0.233  ns/op
MemorySegmentZeroUnsafe.panama       true     256  avgt   30  24.290 ± 0.050  ns/op
MemorySegmentZeroUnsafe.panama      false       1  avgt   30   2.345 ± 0.012  ns/op
MemorySegmentZeroUnsafe.panama      false       2  avgt   30   2.648 ± 0.004  ns/op
MemorySegmentZeroUnsafe.panama      false       3  avgt   30   2.619 ± 0.008  ns/op
MemorySegmentZeroUnsafe.panama      false       4  avgt   30   2.784 ± 0.006  ns/op
MemorySegmentZeroUnsafe.panama      false       5  avgt   30   2.756 ± 0.004  ns/op
MemorySegmentZeroUnsafe.panama      false       6  avgt   30   2.892 ± 0.006  ns/op
MemorySegmentZeroUnsafe.panama      false       7  avgt   30   2.702 ± 0.011  ns/op
MemorySegmentZeroUnsafe.panama      false       8  avgt   30   2.765 ± 0.004  ns/op
MemorySegmentZeroUnsafe.panama      false      15  avgt   30   3.702 ± 0.006  ns/op
MemorySegmentZeroUnsafe.panama      false      16  avgt   30   3.121 ± 0.010  ns/op
MemorySegmentZeroUnsafe.panama      false      63  avgt   30  25.130 ± 0.058  ns/op
MemorySegmentZeroUnsafe.panama      false      64  avgt   30  24.891 ± 0.128  ns/op
MemorySegmentZeroUnsafe.panama      false     255  avgt   30  24.385 ± 0.061  ns/op
MemorySegmentZeroUnsafe.panama      false     256  avgt   30  24.444 ± 0.076  ns/op
MemorySegmentZeroUnsafe.unsafe       true       1  avgt   30  19.611 ± 0.495  ns/op
MemorySegmentZeroUnsafe.unsafe       true       2  avgt   30  18.797 ± 0.126  ns/op
MemorySegmentZeroUnsafe.unsafe       true       3  avgt   30  22.808 ± 0.075  ns/op
MemorySegmentZeroUnsafe.unsafe       true       4  avgt   30  18.797 ± 0.047  ns/op
MemorySegmentZeroUnsafe.unsafe       true       5  avgt   30  22.934 ± 0.114  ns/op
MemorySegmentZeroUnsafe.unsafe       true       6  avgt   30  19.580 ± 0.061  ns/op
MemorySegmentZeroUnsafe.unsafe       true       7  avgt   30  22.798 ± 0.063  ns/op
MemorySegmentZeroUnsafe.unsafe       true       8  avgt   30  18.029 ± 0.689  ns/op
MemorySegmentZeroUnsafe.unsafe       true      15  avgt   30  22.736 ± 0.034  ns/op
MemorySegmentZeroUnsafe.unsafe       true      16  avgt   30  17.799 ± 0.276  ns/op
MemorySegmentZeroUnsafe.unsafe       true      63  avgt   30  22.777 ± 0.033  ns/op
MemorySegmentZeroUnsafe.unsafe       true      64  avgt   30  19.271 ± 0.017  ns/op
MemorySegmentZeroUnsafe.unsafe       true     255  avgt   30  22.758 ± 0.068  ns/op
MemorySegmentZeroUnsafe.unsafe       true     256  avgt   30  22.752 ± 0.057  ns/op
MemorySegmentZeroUnsafe.unsafe      false       1  avgt   30  19.115 ± 0.069  ns/op
MemorySegmentZeroUnsafe.unsafe      false       2  avgt   30  22.795 ± 0.067  ns/op
MemorySegmentZeroUnsafe.unsafe      false       3  avgt   30  22.754 ± 0.057  ns/op
MemorySegmentZeroUnsafe.unsafe      false       4  avgt   30  22.797 ± 0.064  ns/op
MemorySegmentZeroUnsafe.unsafe      false       5  avgt   30  22.803 ± 0.078  ns/op
MemorySegmentZeroUnsafe.unsafe      false       6  avgt   30  22.738 ± 0.044  ns/op
MemorySegmentZeroUnsafe.unsafe      false       7  avgt   30  22.815 ± 0.074  ns/op
MemorySegmentZeroUnsafe.unsafe      false       8  avgt   30  22.732 ± 0.026  ns/op
MemorySegmentZeroUnsafe.unsafe      false      15  avgt   30  22.754 ± 0.063  ns/op
MemorySegmentZeroUnsafe.unsafe      false      16  avgt   30  22.743 ± 0.042  ns/op
MemorySegmentZeroUnsafe.unsafe      false      63  avgt   30  23.250 ± 1.193  ns/op
MemorySegmentZeroUnsafe.unsafe      false      64  avgt   30  22.838 ± 0.182  ns/op
MemorySegmentZeroUnsafe.unsafe      false     255  avgt   30  22.748 ± 0.033  ns/op
MemorySegmentZeroUnsafe.unsafe      false     256  avgt   30  22.740 ± 0.039  ns/op
Finished running test 'micro:java.lang.foreign.MemorySegmentZeroUnsafe'
-------------
PR Comment: https://git.openjdk.org/jdk/pull/24480#issuecomment-2783067395
From rcastanedalo at openjdk.org Mon Apr 7 11:59:52 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 7 Apr 2025 11:59:52 GMT Subject: RFR: 8351833: Unexpected increase in live nodes when splitting Phis through MergeMems in PhiNode::Ideal In-Reply-To: References: Message-ID:
On Mon, 31 Mar 2025 12:25:48 GMT, Daniel Lundén wrote:
> After the changes for [JDK-8333393](https://bugs.openjdk.org/browse/JDK-8333393), we apply a Phi idealization, involving splitting Phis through MergeMems, a lot more frequently. This idealization internally applies further idealizations for new Phi nodes generated during the idealization. In certain cases, these internal idealizations result in a large increase of live nodes within a single iteration of the main IGVN loop in `PhaseIterGVN::optimize`. In particular, when we are close to the `MaxNodeLimit` (80 000 by default), it can happen that we go from below `MaxNodeLimit - NodeLimitFudgeFactor * 2` (= 76 000 by default) to more than 80 000 nodes in a single iteration. In such cases, the node count bailout at the top of the `PhaseIterGVN::optimize` loop does not trigger as expected and we instead crash at an assert in node creation as we surpass `MaxNodeLimit` nodes.
> > ### Changeset > > Changes: > - Do not immediately transform new Phi nodes after splitting Phis through MergeMems. The Phi nodes are put on the IGVN worklist and are transformed later on in any case. > - Add an assert in the `PhaseIterGVN::optimize` loop that ensures we never increase the live node count with more than `NodeLimitFudgeFactor * 2` in a single loop iteration. This assert allows us to catch the issue earlier and much more frequently during IGVN. > - Add a new regression test `TestSplitPhiThroughMergeMem.java`. The new assert above triggers the issue in a large number of existing tests already, but I added this new test as well for good measure. > > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/14124983489) > - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. > - Performance testing > - DaCapo 23, Renaissance, SPECjbb 2005, and SPECjvm 2008 on Windows x64, Linux x64, macOS x64, and macOS aarch64. No statistically significant improvements nor regressions. > - Compilation time benchmarking for DaCapo 23. No statistically significant improvements nor regressions. > > ### Additional issue investigation > > For the particular failure reported as part of this issue, the additional Phi idealizations after [JDK-8333393](https://bugs.openjdk.org/browse/JDK-8333393) cause a dramatic local increase in the number of nodes during IGVN compared to before. Therefore, it is justified to further investigate if this increase in live nodes is, in general, an issue in itself. In the below, I consider and refer ... src/hotspot/share/opto/cfgnode.cpp line 2580: > 2578: // IGVN iteration. We have put the Phi nodes on the IGVN worklist, so > 2579: // they are transformed later on in any case. > 2580: hook->destruct(igvn); Do you need this `hook` node to preserve `new_base` now that you are not calling `PhaseGVN::transform` anymore? 
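For readers following the node-budget argument in the quoted description, the arithmetic can be sketched in a few lines of Java. This is a simplified model, not HotSpot code: the class and method names are hypothetical, the value 2 000 for `NodeLimitFudgeFactor` is inferred from the quoted `80 000 - NodeLimitFudgeFactor * 2 = 76 000`, and the exact comparison operators are assumptions.

```java
// Simplified model of the node budget discussed in this thread; not HotSpot code.
public class NodeBudget {
    static final int MAX_NODE_LIMIT = 80_000;         // default quoted in the PR description
    static final int NODE_LIMIT_FUDGE_FACTOR = 2_000; // assumption: inferred from 80 000 - 2 * fudge = 76 000
    static final int MARGIN = 2 * NODE_LIMIT_FUDGE_FACTOR;

    // Models the bailout check at the top of each IGVN loop iteration.
    static boolean bailsOut(int liveNodes) {
        return liveNodes + MARGIN > MAX_NODE_LIMIT;
    }

    // Models the proposed assert: one iteration may add at most MARGIN live nodes.
    static boolean withinPerIterationBudget(int liveBefore, int liveAfter) {
        return liveAfter - liveBefore <= MARGIN;
    }

    public static void main(String[] args) {
        int before = 75_999; // just below the bailout threshold, so the loop keeps going
        System.out.println(bailsOut(before));
        // An iteration that jumps past the limit violates the per-iteration budget.
        System.out.println(withinPerIterationBudget(before, 80_001));
        // An iteration that stays within the margin cannot exceed the limit.
        System.out.println(withinPerIterationBudget(before, 79_999));
    }
}
```

Under this model, if the bailout check passes and the per-iteration budget holds, the live node count at the end of the iteration stays at or below `MAX_NODE_LIMIT`, which is the property the new assert is meant to protect.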
src/hotspot/share/opto/phaseX.cpp line 1058: > 1056: // Ensure we did not increase the live node count with more than > 1057: // max_live_nodes_increase_per_iteration during the call to transform_old > 1058: DEBUG_ONLY(int increase = live_nodes_after - live_nodes_before;) For consistency with the surrounding code, maybe you could define these as `NOT_PRODUCT`, and possibly group them under `#ifndef PRODUCT` blocks? test/hotspot/jtreg/compiler/itergvn/TestSplitPhiThroughMergeMem.java line 67: > 65: new String("abcdef" + param2); > 66: new String("ghijklmn" + param1); > 67: new String("ghijklmn" + param1); This test illustrates an interesting behavior: C2 generates around 12 Kb of code for this rather infrequent code path (and the frequency can be further reduced without affecting C2's outcome). This seems to contradict C2's general philosophy of focusing the compilation effort (and code cache usage) on hot code. It would be interesting to investigate whether there is an opportunity to make some heuristic more execution-frequency aware here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24325#discussion_r2031060193 PR Review Comment: https://git.openjdk.org/jdk/pull/24325#discussion_r2031064494 PR Review Comment: https://git.openjdk.org/jdk/pull/24325#discussion_r2031084663 From mli at openjdk.org Mon Apr 7 12:05:11 2025 From: mli at openjdk.org (Hamlin Li) Date: Mon, 7 Apr 2025 12:05:11 GMT Subject: RFR: 8353665: RISC-V: IR verification fails in TestSubNodeFloatDoubleNegation.java [v3] In-Reply-To: References: Message-ID: > Hi, > Can you help to review this patch? > The newly added TestSubNodeFloatDoubleNegation.java (in https://github.com/openjdk/jdk/pull/24150) is to check `0 - (0 - x)` is not folded to `x` for float and double. > I have manually checked the IR and generated assembly code, it's not folded on riscv either, just there is an extra SubF in some code path. 
> So, the fix for this test on riscv should simply be to make the check `>= 2` rather than `2`. > > Tested on both x86 and riscv64. > > Thanks Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: Update test/hotspot/jtreg/compiler/floatingpoint/TestSubNodeFloatDoubleNegation.java Co-authored-by: Manuel Hässig ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24421/files - new: https://git.openjdk.org/jdk/pull/24421/files/abb3d548..9ed51e29 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24421&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24421&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24421.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24421/head:pull/24421 PR: https://git.openjdk.org/jdk/pull/24421 From mli at openjdk.org Mon Apr 7 12:05:12 2025 From: mli at openjdk.org (Hamlin Li) Date: Mon, 7 Apr 2025 12:05:12 GMT Subject: RFR: 8353665: RISC-V: IR verification fails in TestSubNodeFloatDoubleNegation.java [v2] In-Reply-To: <5TGT7p9dT-jE73UeK_mzm_y8ehRlqMDq8FSeYY4ULBI=.54d3b223-a871-4e55-acc0-7159902ddc4a@github.com> References: <5TGT7p9dT-jE73UeK_mzm_y8ehRlqMDq8FSeYY4ULBI=.54d3b223-a871-4e55-acc0-7159902ddc4a@github.com> Message-ID: On Fri, 4 Apr 2025 13:43:43 GMT, Manuel Hässig wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> refine > > test/hotspot/jtreg/compiler/floatingpoint/TestSubNodeFloatDoubleNegation.java line 60: > >> 58: @IR(counts = { IRNode.SUB, "2" }, applyIfPlatform = {"riscv64", "false"}) >> 59: @IR(counts = { IRNode.SUB, "2" }, applyIfCPUFeature = {"zfh", "false"}) >> 60: @IR(counts = { IRNode.SUB, ">= 2" }, applyIfCPUFeature = {"zfh", "true"}) > > Just a small nit: I find the following expresses the intention of the test more precisely > > Suggestion: > > @IR(counts = { IRNode.SUB_HF, "2" }, applyIfCPUFeature = {"zfh", "true"})
Makes sense, fixed! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24421#discussion_r2031094485 From rcastanedalo at openjdk.org Mon Apr 7 12:05:48 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 7 Apr 2025 12:05:48 GMT Subject: RFR: 8351833: Unexpected increase in live nodes when splitting Phis through MergeMems in PhiNode::Ideal In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 12:25:48 GMT, Daniel Lundén wrote: > This motivates that the dramatic live node increase seen in this issue is an edge case and is not a problem in general. Thanks for the elaborate analysis, Daniel! I agree with your conclusion from the results you show, but if possible it would be good, for completeness, to study how many (if any) of those bailouts in Renaissance, SPECjvm, and SPECjbb are due to excessive IGVN node counts, at least for the "baseline" and "target" configurations. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24325#issuecomment-2783101024 From rcastanedalo at openjdk.org Mon Apr 7 12:09:01 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 7 Apr 2025 12:09:01 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v11] In-Reply-To: References: <0Yf6qZwnLz7oAtSFscDwHifQAmaPuHzeSrpkqMVchDU=.c7a5e8af-9390-414b-850c-609110668eac@github.com> <-pdjdg9OQRB7YaXNFiVeVseLEoJDZb2XkMk0ml3pm3w=.2ecb257e-5618-4763-90e5-a2b1d0758e67@github.com> Message-ID: On Mon, 7 Apr 2025 11:39:36 GMT, Daniel Lundén wrote: >> Thanks, I made a note to run some benchmarks for this and gather statistics. It is critical that we run case 1 first (results in a significant performance gain), but perhaps we can gain a little by ordering the rare cases as well. > > I ran Dacapo and only ever triggered case 1 (never the rare cases).
As we discussed offline, you likely triggered the other cases when artificially reducing the static register mask size to stress the changeset? I don't think the ordering of the rare cases matters much in practice. Fair enough, thanks for double-checking! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2031100132 From chagedorn at openjdk.org Mon Apr 7 12:42:57 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 7 Apr 2025 12:42:57 GMT Subject: RFR: 8351833: Unexpected increase in live nodes when splitting Phis through MergeMems in PhiNode::Ideal In-Reply-To: References: Message-ID: <33eWiJcoMJMNOS5Cn6QPgg1gUzUF4UqA_q7QPfOwSt4=.4d7cb54b-752d-4a41-baa2-527850b9c18b@github.com> On Mon, 7 Apr 2025 11:42:05 GMT, Roberto Castañeda Lozano wrote: >> After the changes for [JDK-8333393](https://bugs.openjdk.org/browse/JDK-8333393), we apply a Phi idealization, involving splitting Phis through MergeMems, a lot more frequently. This idealization internally applies further idealizations for new Phi nodes generated during the idealization. In certain cases, these internal idealizations result in a large increase of live nodes within a single iteration of the main IGVN loop in `PhaseIterGVN::optimize`. In particular, when we are close to the `MaxNodeLimit` (80 000 by default), it can happen that we go from below `MaxNodeLimit - NodeLimitFudgeFactor * 2` (= 76 000 by default) to more than 80 000 nodes in a single iteration. In such cases, the node count bailout at the top of the `PhaseIterGVN::optimize` loop does not trigger as expected and we instead crash at an assert in node creation as we surpass `MaxNodeLimit` nodes. >> >> ### Changeset >> >> Changes: >> - Do not immediately transform new Phi nodes after splitting Phis through MergeMems. The Phi nodes are put on the IGVN worklist and are transformed later on in any case.
>> - Add an assert in the `PhaseIterGVN::optimize` loop that ensures we never increase the live node count with more than `NodeLimitFudgeFactor * 2` in a single loop iteration. This assert allows us to catch the issue earlier and much more frequently during IGVN. >> - Add a new regression test `TestSplitPhiThroughMergeMem.java`. The new assert above triggers the issue in a large number of existing tests already, but I added this new test as well for good measure. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/14124983489) >> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. >> - Performance testing >> - DaCapo 23, Renaissance, SPECjbb 2005, and SPECjvm 2008 on Windows x64, Linux x64, macOS x64, and macOS aarch64. No statistically significant improvements nor regressions. >> - Compilation time benchmarking for DaCapo 23. No statistically significant improvements nor regressions. >> >> ### Additional issue investigation >> >> For the particular failure reported as part of this issue, the additional Phi idealizations after [JDK-8333393](https://bugs.openjdk.org/browse/JDK-8333393) cause a dramatic local increase in the number of nodes during IGVN compared to before. Therefore, it is justified to further investigate if this increase in live nodes is, in general, an issue in its... > > src/hotspot/share/opto/cfgnode.cpp line 2580: > >> 2578: // IGVN iteration. We have put the Phi nodes on the IGVN worklist, so >> 2579: // they are transformed later on in any case. >> 2580: hook->destruct(igvn); > > Do you need this `hook` node to preserve `new_base` now that you are not calling `PhaseGVN::transform` anymore? We then probably also do not need the code directly above anymore added by [JDK-8275326](https://bugs.openjdk.org/browse/JDK-8275326) which was only required due to this eager phi transformation. 
Could you find out why this eager transformation was added in the first place? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24325#discussion_r2031153701 From chagedorn at openjdk.org Mon Apr 7 12:42:56 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 7 Apr 2025 12:42:56 GMT Subject: RFR: 8351833: Unexpected increase in live nodes when splitting Phis through MergeMems in PhiNode::Ideal In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 12:25:48 GMT, Daniel Lundén wrote: > After the changes for [JDK-8333393](https://bugs.openjdk.org/browse/JDK-8333393), we apply a Phi idealization, involving splitting Phis through MergeMems, a lot more frequently. This idealization internally applies further idealizations for new Phi nodes generated during the idealization. In certain cases, these internal idealizations result in a large increase of live nodes within a single iteration of the main IGVN loop in `PhaseIterGVN::optimize`. In particular, when we are close to the `MaxNodeLimit` (80 000 by default), it can happen that we go from below `MaxNodeLimit - NodeLimitFudgeFactor * 2` (= 76 000 by default) to more than 80 000 nodes in a single iteration. In such cases, the node count bailout at the top of the `PhaseIterGVN::optimize` loop does not trigger as expected and we instead crash at an assert in node creation as we surpass `MaxNodeLimit` nodes. > > ### Changeset > > Changes: > - Do not immediately transform new Phi nodes after splitting Phis through MergeMems. The Phi nodes are put on the IGVN worklist and are transformed later on in any case. > - Add an assert in the `PhaseIterGVN::optimize` loop that ensures we never increase the live node count with more than `NodeLimitFudgeFactor * 2` in a single loop iteration. This assert allows us to catch the issue earlier and much more frequently during IGVN. > - Add a new regression test `TestSplitPhiThroughMergeMem.java`.
The new assert above triggers the issue in a large number of existing tests already, but I added this new test as well for good measure. > > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/14124983489) > - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. > - Performance testing > - DaCapo 23, Renaissance, SPECjbb 2005, and SPECjvm 2008 on Windows x64, Linux x64, macOS x64, and macOS aarch64. No statistically significant improvements nor regressions. > - Compilation time benchmarking for DaCapo 23. No statistically significant improvements nor regressions. > > ### Additional issue investigation > > For the particular failure reported as part of this issue, the additional Phi idealizations after [JDK-8333393](https://bugs.openjdk.org/browse/JDK-8333393) cause a dramatic local increase in the number of nodes during IGVN compared to before. Therefore, it is justified to further investigate if this increase in live nodes is, in general, an issue in itself. In the below, I consider and refer ... Nice summary! Two comments, otherwise, it looks good to me. test/hotspot/jtreg/TEST.groups line 187: > 185: compiler/interpreter/ \ > 186: compiler/jvmci/ \ > 187: compiler/itergvn/ \ I suggest to use the short name `igvn` On a separate note, I think we should go through all our test folders in `jtreg/compiler` and check if we should add more folders to tier1. For example, `splitif` or `predicates` tests are currently not executed in tier1 but they probably should. 
------------- PR Review: https://git.openjdk.org/jdk/pull/24325#pullrequestreview-2746666793 PR Review Comment: https://git.openjdk.org/jdk/pull/24325#discussion_r2031150249 From avoitylov at openjdk.org Mon Apr 7 12:44:35 2025 From: avoitylov at openjdk.org (Aleksei Voitylov) Date: Mon, 7 Apr 2025 12:44:35 GMT Subject: RFR: 8353237: [AArch64] Incorrect result of VectorizedHashCode intrinsic on some hardware Message-ID: The root of the problem is that the VectorizedHashCode intrinsic introduced by JDK-8341194 is not aware of JDK-8079203. JDK-8079203 generates an additional nop with each madd instruction on Cortex-A53 as a workaround for Cortex-A53 erratum 835769 "AArch64 multiply-accumulate instruction might produce incorrect result". The current VectorizedHashCode intrinsic calculates a byte offset to jump inside the unrolled loop code. It assumes 2 instructions per unrolled iteration (load and madd). JDK-8079203 adds an additional nop for Cortex-A53, which breaks the offset calculation logic. The current offset calculation logic uses a shift instead of a multiplication, so it relies on a power-of-2 number of instructions being present in each unrolled loop iteration. To keep it simple, this fix adds one more nop into each loop iteration on Cortex-A53 in order to have 4 instructions per iteration, which is also a power of 2. To account for that, the shift argument for the offset calculation logic is increased by 1, because each loop iteration has 2 times more instructions on Cortex-A53. This fix is tested on Raspberry Pi 3 (based on Cortex-A53) by running the initially reported application and by running hotspot jtreg tests (not a single test could be run on Cortex-A53 before the fix). The performance gain from the intrinsic is also observed on Cortex-A53 using the ArraysHashCode benchmark.
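The offset arithmetic described above can be illustrated with a small Java sketch. It is only a model of the reasoning, not the intrinsic's actual code: the class, method, and parameter names are hypothetical, and the only hardware fact it relies on is that AArch64 instructions are 4 bytes wide.

```java
// Illustrative model of a shift-based jump offset into an unrolled loop; not the HotSpot intrinsic.
public class UnrolledLoopOffset {
    static final int LOG2_INSTRUCTION_BYTES = 2; // AArch64 instructions are 4 bytes wide

    // Shift-based offset: only exact when instructions per iteration is a power of 2.
    static int byteOffsetByShift(int iterations, int log2InstructionsPerIteration) {
        return iterations << (log2InstructionsPerIteration + LOG2_INSTRUCTION_BYTES);
    }

    // Reference computation by plain multiplication.
    static int byteOffsetByMultiplication(int iterations, int instructionsPerIteration) {
        return iterations * instructionsPerIteration * 4;
    }

    public static void main(String[] args) {
        int iterations = 3; // hypothetical number of unrolled iterations to account for

        // 2 instructions per iteration (load + madd): the shift matches the true offset.
        System.out.println(byteOffsetByShift(iterations, 1) == byteOffsetByMultiplication(iterations, 2));

        // 3 instructions per iteration (load + nop + madd): the unchanged shift is now wrong.
        System.out.println(byteOffsetByShift(iterations, 1) == byteOffsetByMultiplication(iterations, 3));

        // 4 instructions per iteration (one more nop): bumping the shift by 1 is exact again.
        System.out.println(byteOffsetByShift(iterations, 2) == byteOffsetByMultiplication(iterations, 4));
    }
}
```

Padding to 4 instructions keeps the cheap shift at the cost of one extra nop per iteration on the affected cores; switching to a multiplication would be the alternative, which the description above chose not to pursue for simplicity.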
------------- Commit messages: - account for 835769 errata on Cortex A53 in VectorizedHashCode intrinsic Changes: https://git.openjdk.org/jdk/pull/24489/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24489&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8353237 Stats: 14 lines in 2 files changed: 12 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24489.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24489/head:pull/24489 PR: https://git.openjdk.org/jdk/pull/24489 From dfenacci at openjdk.org Mon Apr 7 13:36:08 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 7 Apr 2025 13:36:08 GMT Subject: RFR: 8352681: C2 compilation hits asserts "must set the initial type just once" In-Reply-To: References: Message-ID: <97CGSRb-aRoDvF7iGFH58n9Rh20X9Sj56fEVwcVeSMw=.9adeb5b8-c129-48d2-944d-6306fd73903c@github.com> On Sun, 6 Apr 2025 04:04:56 GMT, Cesar Soares Lucas wrote: > The reason for the error reported is that when RAM tries to reduce a field load through a Phi it ends up calling `step_through_mergemem` with `_delay_transform` set to true and, since `step_through_mergemem` assumes that `_delay_transform` is `false`, it calls `igvn->transform` passing a MergeMem that has been added to the graph long ago. > > I didn't opt to make `_delay_transform` false during the RAM transformations because that seemed to be a risky move, nevertheless I'll create an RFE and keep investigating that option. > > Also, while working on a test case I found that C2 doesn't remove (at least not before EA steps) the `if (param != param) {...}` and I'm going to investigate that as a separate RFE. > > I tested this with JTREG Tier 1-3 on Linux x86_64. Thanks a lot for the fix @JohnTortugo! 
test/hotspot/jtreg/compiler/escapeAnalysis/TestReduceAllocationAndSetTypeTwice.java line 32: > 30: * -Xcomp compiler.escapeAnalysis.TestReduceAllocationAndSetTypeTwice > 31: * @run main compiler.escapeAnalysis.TestReduceAllocationAndSetTypeTwice > 32: */ I quickly tried to trigger the crash with this regression test (with macosx-aarch64 and linux-x64 debug versions) but wasn't able to. On the other hand it crashed with the Test file attached to the JBS issue. Did you try on a specific architecture? Is there possibly something missing? ------------- PR Review: https://git.openjdk.org/jdk/pull/24471#pullrequestreview-2746758933 PR Review Comment: https://git.openjdk.org/jdk/pull/24471#discussion_r2031205253 From lucy at openjdk.org Mon Apr 7 13:53:01 2025 From: lucy at openjdk.org (Lutz Schmidt) Date: Mon, 7 Apr 2025 13:53:01 GMT Subject: RFR: 8353500: [s390x] Intrinsify Unsafe::setMemory In-Reply-To: References: Message-ID: On Mon, 7 Apr 2025 08:44:07 GMT, Amit Kumar wrote: > Unsafe::setMemory intrinsic implementation for s390x. 
> > Stub Code: > > > StubRoutines::unsafe_setmemory [0x000003ffb04b63c0, 0x000003ffb04b64d0] (272 bytes) > -------------------------------------------------------------------------------- > 0x000003ffb04b63c0: ogrk %r1,%r2,%r3 > 0x000003ffb04b63c4: nill %r1,7 > 0x000003ffb04b63c8: je 0x000003ffb04b6410 > 0x000003ffb04b63cc: nill %r1,3 > 0x000003ffb04b63d0: je 0x000003ffb04b6460 > 0x000003ffb04b63d4: nill %r1,1 > 0x000003ffb04b63d8: jlh 0x000003ffb04b64a0 > 0x000003ffb04b63dc: risbg %r4,%r4,48,55,8 > 0x000003ffb04b63e2: risbgz %r1,%r3,32,63,62 > 0x000003ffb04b63e8: je 0x000003ffb04b6402 > 0x000003ffb04b63ec: nopr > 0x000003ffb04b63ee: nopr > 0x000003ffb04b63f0: sth %r4,0(%r2) > 0x000003ffb04b63f4: sth %r4,2(%r2) > 0x000003ffb04b63f8: agfi %r2,4 > 0x000003ffb04b63fe: brct %r1,0x000003ffb04b63f0 > 0x000003ffb04b6402: nilf %r3,2 > 0x000003ffb04b6408: ber %r14 > 0x000003ffb04b640a: sth %r4,0(%r2) > 0x000003ffb04b640e: br %r14 > 0x000003ffb04b6410: risbg %r4,%r4,48,55,8 > 0x000003ffb04b6416: risbg %r4,%r4,32,47,16 > 0x000003ffb04b641c: risbg %r4,%r4,0,31,32 > 0x000003ffb04b6422: risbgz %r1,%r3,32,63,60 > 0x000003ffb04b6428: je 0x000003ffb04b6446 > 0x000003ffb04b642c: nopr > 0x000003ffb04b642e: nopr > 0x000003ffb04b6430: stg %r4,0(%r2) > 0x000003ffb04b6436: stg %r4,8(%r2) > 0x000003ffb04b643c: agfi %r2,16 > 0x000003ffb04b6442: brct %r1,0x000003ffb04b6430 > 0x000003ffb04b6446: nilf %r3,8 > 0x000003ffb04b644c: ber %r14 > 0x000003ffb04b644e: stg %r4,0(%r2) > 0x000003ffb04b6454: br %r14 > 0x000003ffb04b6456: nopr > 0x000003ffb04b6458: nopr > 0x000003ffb04b645a: nopr > 0x000003ffb04b645c: nopr > 0x000003ffb04b645e: nopr > 0x000003ffb04b6460: risbg %r4,%r4,48,55,8 > 0x000003ffb04b6466: risbg %r4,%r4,32,47,16 > 0x000003ffb04b646c: risbgz %r1,%r3,32,63,61 > 0x000003ffb04b6472: je 0x000003ffb04b6492 > 0x000003ffb04b6476: nopr > 0x000003ffb04b6478: nopr > 0x000003ffb04b647a: nopr > 0x000003ffb04b647c: nopr > 0x000003ffb04b647e: nopr > 0x000003ffb04b6480: st %r4,0(%r2) > 
0x000003ffb04b6484: st %r4,4(%r2) > 0x000003ffb04b6488: agfi %r2,8 > 0x000003ffb04b648e: brct %r1,0x000003ffb04b6480 > 0x000003ffb04b6492: nilf %r3,4 > 0x000003ffb04b6498: ber %r14 > 0x000003ffb04b649a: st %r4,0(%r2) > 0x000003ffb04b649e: br %r14 > 0x000003ffb04b64a0: risbgz %r1,%r3,32,63,63 > 0x000003ffb04b64a6: je 0x000003ffb04b64c2 > 0x000003... Changes requested by lucy (Reviewer). src/hotspot/cpu/s390/assembler_s390.inline.hpp line 417: > 415: } > 416: inline void Assembler::z_risbg( Register r1, Register r2, int64_t spos3, int64_t epos4, int64_t nrot5, bool zero_rest) { // Rotate then INS selected bits. -- z196 > 417: const int64_t len = 48; Changes are not necessary if `bool zero_rest` is used to control what happens to untouched destination bits. src/hotspot/cpu/s390/stubGenerator_s390.cpp line 1496: > 1494: __ z_bre(L_Tail); > 1495: > 1496: __ align(16); // loop alignment align(32) would be more helpful: - instruction engine fetches octoword (32 bytes) bundles. - Tight loop is < 32 bytes -> all in one bundle, does not cross cache line boundary. src/hotspot/cpu/s390/stubGenerator_s390.cpp line 1541: > 1539: > 1540: __ z_nill(rScratch1, 7); > 1541: __ z_bre(L_fill8Bytes); // branch if 0 Please use z_braz() to reflect check semantics. src/hotspot/cpu/s390/stubGenerator_s390.cpp line 1545: > 1543: > 1544: __ z_nill(rScratch1, 3); > 1545: __ z_bre(L_fill4Bytes); // branch if 0 See above. src/hotspot/cpu/s390/stubGenerator_s390.cpp line 1548: > 1546: > 1547: __ z_nill(rScratch1, 1); > 1548: __ z_brne(L_fillBytes); // branch if not 0 Please use z_brnaz() to reflect check semantics. src/hotspot/cpu/s390/stubGenerator_s390.cpp line 1557: > 1555: do_setmemory_atomic_loop(2, dest, size, byteVal, _masm); > 1556: > 1557: __ align(16); What is this alignment good for?
------------- PR Review: https://git.openjdk.org/jdk/pull/24480#pullrequestreview-2746874042 PR Review Comment: https://git.openjdk.org/jdk/pull/24480#discussion_r2031282057 PR Review Comment: https://git.openjdk.org/jdk/pull/24480#discussion_r2031271721 PR Review Comment: https://git.openjdk.org/jdk/pull/24480#discussion_r2031276482 PR Review Comment: https://git.openjdk.org/jdk/pull/24480#discussion_r2031277491 PR Review Comment: https://git.openjdk.org/jdk/pull/24480#discussion_r2031278295 PR Review Comment: https://git.openjdk.org/jdk/pull/24480#discussion_r2031273859 From fjiang at openjdk.org Mon Apr 7 14:26:49 2025 From: fjiang at openjdk.org (Feilong Jiang) Date: Mon, 7 Apr 2025 14:26:49 GMT Subject: RFR: 8353695: RISC-V: compiler/cpuflags/TestAESIntrinsicsOnUnsupportedConfig.java is failing with Zvkn In-Reply-To: References: Message-ID: <6kInqCoKqHpXirpoAFhkhUfs3Q7ql0ta0AJkgEiyPHY=.3b66058a-e0d5-4865-a8b9-edeae44b27ac@github.com> On Fri, 4 Apr 2025 02:38:57 GMT, Fei Yang wrote: > Hi, please review this small change fixing two jtreg tests. > This issue manifests after https://github.com/openjdk/jdk/pull/24344 which auto-detects and enables the Zvkn extension. > The two tests only consider the "aes" feature string `!(vm.cpu.features ~= ".*aes.*")`. But the feature string is "zvkn" for the linux-riscv64 platform. > This adapts "@requires" of both tests considering the Zvkn feature of this platform. Both tests work as expected with qemu-system which is equipped with the Zvkn extension. Thanks! ------------- Marked as reviewed by fjiang (Committer). PR Review: https://git.openjdk.org/jdk/pull/24433#pullrequestreview-2747031194 From yzheng at openjdk.org Mon Apr 7 14:27:30 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Mon, 7 Apr 2025 14:27:30 GMT Subject: RFR: 8353735: [JVMCI] Allow specifying storage kind of the callee save register Message-ID: Windows x64 ABI considers the upper portions of YMM0-YMM15 and ZMM0-ZMM15 volatile, that is, destroyed on function calls.
This PR allows `RegisterConfig` implementations to refine the storage kind of a callee save register, such that a JVMCI compiler can exploit this information to avoid backing up the full width of these registers. ------------- Commit messages: - [JVMCI] Allow specifying storage kind of the callee save register Changes: https://git.openjdk.org/jdk/pull/24451/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24451&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8353735 Stats: 8 lines in 1 file changed: 7 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24451.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24451/head:pull/24451 PR: https://git.openjdk.org/jdk/pull/24451 From mli at openjdk.org Mon Apr 7 14:44:49 2025 From: mli at openjdk.org (Hamlin Li) Date: Mon, 7 Apr 2025 14:44:49 GMT Subject: RFR: 8353695: RISC-V: compiler/cpuflags/TestAESIntrinsicsOnUnsupportedConfig.java is failing with Zvkn In-Reply-To: References: Message-ID: On Fri, 4 Apr 2025 02:38:57 GMT, Fei Yang wrote: > Hi, please review this small change fixing two jtreg tests. > This issue manifests after https://github.com/openjdk/jdk/pull/24344 which auto-detects and enables the Zvkn extension. > The two tests only consider the `aes` feature string, like `!(vm.cpu.features ~= ".*aes.*")`. But the feature string is `zvkn` for the linux-riscv64 platform. > This adapts "@requires" of both tests considering the Zvkn feature of this platform. Both tests work as expected with qemu-system equipped with the Zvkn extension. Looks good! ------------- Marked as reviewed by mli (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/24433#pullrequestreview-2747100140 From dnsimon at openjdk.org Mon Apr 7 14:49:37 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 7 Apr 2025 14:49:37 GMT Subject: RFR: 8353735: [JVMCI] Allow specifying storage kind of the callee save register In-Reply-To: References: Message-ID: On Fri, 4 Apr 2025 14:47:39 GMT, Yudi Zheng wrote: > Windows x64 ABI considers the upper portions of YMM0-YMM15 and ZMM0-ZMM15 volatile, that is, destroyed on function calls. This PR allows `RegisterConfig` implementations to refine the storage kind of a callee save register, such that a JVMCI compiler can exploit this information to avoid backing up the full width of these registers. Marked as reviewed by dnsimon (Reviewer). src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/code/RegisterConfig.java line 98: > 96: > 97: /** > 98: * Gets the storage kind for a callee save register. I would add a second sentence describing the Windows ABI example so that it's clear why this API exists. ------------- PR Review: https://git.openjdk.org/jdk/pull/24451#pullrequestreview-2747108590 PR Review Comment: https://git.openjdk.org/jdk/pull/24451#discussion_r2031411734 From cushon at openjdk.org Mon Apr 7 15:08:01 2025 From: cushon at openjdk.org (Liam Miller-Cushon) Date: Mon, 7 Apr 2025 15:08:01 GMT Subject: RFR: 8350563: C2 compilation fails because PhaseCCP does not reach a fixpoint [v8] In-Reply-To: References: Message-ID: On Mon, 7 Apr 2025 07:01:57 GMT, Christian Hagedorn wrote: >> Liam Miller-Cushon has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 15 additional commits since the last revision: >> >> - Add -XX:+UnlockDiagnosticVMOptions >> - Check type before uncasting >> >> A child phi node may transition from con to non-con, making the AND node transition back from "0" to its current type.
If that current type is still TOP we're in violation of monotonicity. Therefore, don't apply optimization if AND is not integer yet. >> - Merge commit '9bb804b14e1' into JDK-8350563 >> - Explicitly check for OP_Con instead of TypeInteger::is_con. >> >> 322 Phi === 303 119 255 [[ 399 388 351 751 366 377 ]] #int:-256..127 !jvms: Integer::parseInt @ bci:151 (line 625) >> >> While this Phi dumps as "#int:-256..127", `phase->type(expr)` returns a type that is_con -256. >> - Update test/hotspot/jtreg/compiler/ccp/TestAndConZeroCCP.java >> >> Co-authored-by: Christian Hagedorn >> - Merge remote-tracking branch 'origin/master' into JDK-8350563 >> - Reformat test and update package to ccp >> - Review comments >> - Update test/hotspot/jtreg/compiler/c2/TestAndConZeroCCP.java >> >> Co-authored-by: Emanuel Peter >> - copyright >> - ... and 5 more: https://git.openjdk.org/jdk/compare/5f51fee4...23119e18 > > test/hotspot/jtreg/compiler/ccp/TestAndConZeroMonotonic.java line 35: > >> 33: public class TestAndConZeroMonotonic { >> 34: >> 35: public static void main(String[] args) { > > Since it's quite an easy test, I suggest to merge the two test files together by calling `Integer::parseInt()` directly instead. You can just add another `@run` statement. Done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23871#discussion_r2031450257 From cushon at openjdk.org Mon Apr 7 15:07:57 2025 From: cushon at openjdk.org (Liam Miller-Cushon) Date: Mon, 7 Apr 2025 15:07:57 GMT Subject: RFR: 8350563: C2 compilation fails because PhaseCCP does not reach a fixpoint [v9] In-Reply-To: References: Message-ID: > Hello, please consider this fix for [JDK-8350563](https://bugs.openjdk.org/browse/JDK-8350563) contributed by my colleague Matthias Ernst. > > https://github.com/openjdk/jdk/pull/22856 introduced a new `Value()` optimization for the pattern `AndIL(Con, Mask)`. 
> This optimization can look through CastNodes, and therefore requires additional logic in CCP to push these > transitive uses to the worklist. > > The optimization is closely related to analogous optimizations for SHIFT nodes, and we also extend the existing logic for > CCP worklist handling: the current logic is "if the shift input to a SHIFT node changes, push indirect AND node uses to the CCP worklist". > We extend it by adding "if the (new) type of a node is an IntegerType that `is_con, ...` to the predicate. Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision: Merge tests together ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23871/files - new: https://git.openjdk.org/jdk/pull/23871/files/23119e18..99134a01 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23871&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23871&range=07-08 Stats: 41 lines in 2 files changed: 2 ins; 38 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23871.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23871/head:pull/23871 PR: https://git.openjdk.org/jdk/pull/23871 From dlunden at openjdk.org Mon Apr 7 15:18:46 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Mon, 7 Apr 2025 15:18:46 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v16] In-Reply-To: References: Message-ID: > If a method has a large number of parameters, we currently bail out from C2 compilation. > > ### Changeset > > Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. 
Register mask operations are present in performance-sensitive parts of C2, which further complicates changes. > > Changes: > - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed. > - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable. > - Remove all `can_represent` checks and bailouts. > - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`. > - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`. > > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450) > - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. > - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement. > - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). 
I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, not worth it). > > ![c2-regression](https:/... Daniel Lundén has updated the pull request incrementally with two additional commits since the last revision: - Update comments - Revise TestNestedSynchronize to make use of CompileFramework ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20404/files - new: https://git.openjdk.org/jdk/pull/20404/files/74357621..b20d14e5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20404&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20404&range=14-15 Stats: 10510 lines in 3 files changed: 5 ins; 10416 del; 89 mod Patch: https://git.openjdk.org/jdk/pull/20404.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20404/head:pull/20404 PR: https://git.openjdk.org/jdk/pull/20404 From dlunden at openjdk.org Mon Apr 7 15:18:50 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Mon, 7 Apr 2025 15:18:50 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v15] In-Reply-To: <7UllX79lzodFBQgK59E_ywZWH-WVnQegcAsRbYjaYJQ=.a33bb997-1aab-4e01-bea7-982e831aba8a@github.com> References: <7UllX79lzodFBQgK59E_ywZWH-WVnQegcAsRbYjaYJQ=.a33bb997-1aab-4e01-bea7-982e831aba8a@github.com> Message-ID: On Mon, 7 Apr 2025 07:03:19 GMT, Emanuel Peter wrote: >> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: >> >> Revise overlap comments for frequency of cases > > src/hotspot/share/opto/regmask.hpp line 40: > >> 38: // stack slots used by BoxLockNodes. We reach this limit by, e.g., deeply >> 39: // nesting synchronized statements in Java. >> 40: const int BoxLockNode_slot_limit = 200; > > Where does this number come from? I've added arbitrary constants like this, and sometimes it is hard to give a good justification.
But at least writing down what was your thinking might help someone else if they come across it later. Do you have a sense how large it should be at least or at most? It is indeed arbitrary (and should be very generous for all practical cases). We need a limit so that we can compute an upper bound for register mask sizes. I've updated the comment now, does it make more sense? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2031473554 From dlunden at openjdk.org Mon Apr 7 15:18:48 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Mon, 7 Apr 2025 15:18:48 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v16] In-Reply-To: <7UllX79lzodFBQgK59E_ywZWH-WVnQegcAsRbYjaYJQ=.a33bb997-1aab-4e01-bea7-982e831aba8a@github.com> References: <7UllX79lzodFBQgK59E_ywZWH-WVnQegcAsRbYjaYJQ=.a33bb997-1aab-4e01-bea7-982e831aba8a@github.com> Message-ID: <9SUBC0Jac2TTl1CPnv4EYO1WEJEqzIB_HNWwnuCnSwk=.add69ea8-625f-435d-9f78-11c238dd6d10@github.com> On Mon, 7 Apr 2025 06:52:01 GMT, Emanuel Peter wrote: >> Daniel Lundén has updated the pull request incrementally with two additional commits since the last revision: >> >> - Update comments >> - Revise TestNestedSynchronize to make use of CompileFramework > > src/hotspot/share/adlc/formsopt.cpp line 180: > >> 178: // in the register mask regardless of how much slack is created by rounding. >> 179: // This was found necessary after adding 16 new registers for APX. >> 180: return (words_for_regs + 3 + 1 + 1) & ~1; > > Is the comment above still accurate? Specifically this: > > // on the stack (stack registers) up to some interesting limit. Methods > // that need more parameters will NOT be compiled. On Intel, the limit > // is something like 90+ parameters. > > And if not: might there be other comments around like this? Thanks, good catch. Now fixed and I also searched for "NOT be compiled" and found no other relevant occurrences.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2031470061 From roland at openjdk.org Mon Apr 7 15:20:16 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 7 Apr 2025 15:20:16 GMT Subject: RFR: 8348853: Fold layout helper check for objects implementing non-array interfaces [v2] In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 06:49:50 GMT, Marc Chevalier wrote: >> If `TypeInstKlassPtr` represents an array type, it has to be `java.lang.Object`. From contraposition, if it is not `java.lang.Object`, we can conclude it is not an array, and we can skip some array checks, for instance. >> >> In this PR, we improve this deduction with an interface base reasoning: arrays implements only Cloneable and Serializable, so if a type implements anything else, it cannot be an array. >> >> This change partially reverts the changes from [JDK-8348631](https://bugs.openjdk.org/browse/JDK-8348631) (#23331) (in `LibraryCallKit::generate_array_guard_common`) and the test still passes. >> >> The way interfaces are check might be done differently. The current situation is a balance between visibility (not to leak too much things explicitly private), having not overly general methods for one use-case and avoiding too concrete (and brittle) interfaces. >> >> Tested with tier1..3, hs-precheckin-comp and hs-comp-stress >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > not reinventing the wheel Looks good to me. ------------- Marked as reviewed by roland (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24245#pullrequestreview-2747218926 From dlunden at openjdk.org Mon Apr 7 15:22:14 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Mon, 7 Apr 2025 15:22:14 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v15] In-Reply-To: <7UllX79lzodFBQgK59E_ywZWH-WVnQegcAsRbYjaYJQ=.a33bb997-1aab-4e01-bea7-982e831aba8a@github.com> References: <7UllX79lzodFBQgK59E_ywZWH-WVnQegcAsRbYjaYJQ=.a33bb997-1aab-4e01-bea7-982e831aba8a@github.com> Message-ID: On Mon, 7 Apr 2025 07:08:25 GMT, Emanuel Peter wrote: >> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: >> >> Revise overlap comments for frequency of cases > > src/hotspot/share/opto/regmask.hpp line 86: > >> 84: (((RM_SIZE_MIN << 5) + // Slots for machine registers >> 85: (max_method_parameter_length * 2) + // Slots for incoming arguments >> 86: (max_method_parameter_length * 2) + // Slots for outgoing arguments > > Why `*2`? Is that for 64 bit arguments that are split into two 32 bit words? According to the JVM spec, a method can have at most 255 (32-bit) `int` parameters, and `long` and `double` contribute two parameter slots. For some reason, C2 64-bit aligns even 32-bit parameters, leading to this `*2`. Could be worth looking into in a future RFE, but I'm not touching that in this changeset. > src/hotspot/share/opto/regmask.hpp line 104: > >> 102: // the machine registers and usually all parameters that need to be passed >> 103: // on the stack (stack registers) up to some interesting limit. On Intel, >> 104: // the limit is something like 90+ parameters. > > Here you fixed the comment, so probably the other one needs to be fixed too ;) Yes, thanks!
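The slot counting behind the 255 limit mentioned in this reply can be sketched as follows. This only illustrates the JVM specification rule that `long` and `double` parameters take two slots while every other parameter takes one; it is not code from C2, and the class and method names are made up for the example. It counts declared parameters only, so the implicit `this` of an instance method would add one more slot.

```java
// Hypothetical illustration only; not code from the JDK.
final class ParameterSlots {
    // Counts the parameter slots of a JVM method descriptor such as "(IJD)V".
    static int count(String descriptor) {
        int slots = 0;
        int i = descriptor.indexOf('(') + 1;
        int end = descriptor.indexOf(')');
        while (i < end) {
            char c = descriptor.charAt(i);
            if (c == '[') {
                // An array reference takes one slot whatever its element type:
                // skip all dimensions, then the element type itself.
                while (descriptor.charAt(i) == '[') {
                    i++;
                }
                if (descriptor.charAt(i) == 'L') {
                    i = descriptor.indexOf(';', i);
                }
                i++;
                slots += 1;
            } else if (c == 'L') {
                // An object reference takes one slot; skip to after the ';'.
                i = descriptor.indexOf(';', i) + 1;
                slots += 1;
            } else {
                // Primitive: long (J) and double (D) take two slots, others one.
                slots += (c == 'J' || c == 'D') ? 2 : 1;
                i++;
            }
        }
        return slots;
    }
}
```

For example, `(IJD)V` occupies five slots: one for the `int` and two each for the `long` and the `double`. The `*2` in the quoted `regmask.hpp` expression is a separate matter, since the reply attributes it to C2 aligning every parameter to 64 bits.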
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2031481585 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2031482227 From chagedorn at openjdk.org Mon Apr 7 15:24:06 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 7 Apr 2025 15:24:06 GMT Subject: RFR: 8350563: C2 compilation fails because PhaseCCP does not reach a fixpoint [v9] In-Reply-To: References: Message-ID: <9zLCW8rlKK9rizFfNZYUhqZuyyJ0RQVXaTgs6s1o8hM=.515c73a8-5ce4-441c-8930-e9d59d3cbcdc@github.com> On Mon, 7 Apr 2025 15:07:57 GMT, Liam Miller-Cushon wrote: >> Hello, please consider this fix for [JDK-8350563](https://bugs.openjdk.org/browse/JDK-8350563) contributed by my colleague Matthias Ernst. >> >> https://github.com/openjdk/jdk/pull/22856 introduced a new `Value()` optimization for the pattern `AndIL(Con, Mask)`. >> This optimization can look through CastNodes, and therefore requires additional logic in CCP to push these >> transitive uses to the worklist. >> >> The optimization is closely related to analogous optimizations for SHIFT nodes, and we also extend the existing logic for >> CCP worklist handling: the current logic is "if the shift input to a SHIFT node changes, push indirect AND node uses to the CCP worklist". >> We extend it by adding "if the (new) type of a node is an IntegerType that `is_con, ...` to the predicate. > > Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision: > > Merge tests together Thanks for the merge! I'll submit some testing and report back once it's finished. ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23871#pullrequestreview-2747235000 From roland at openjdk.org Mon Apr 7 15:25:11 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 7 Apr 2025 15:25:11 GMT Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v7] In-Reply-To: References: Message-ID: On Mon, 7 Apr 2025 06:52:13 GMT, Christian Hagedorn wrote: >> if `new_type` is top? >> As node's types are widen by CCP, a node `n` will initially be `top`, then one input changes and becomes not `top` but if the node has another input (say control), that other input will still be `top` so the type will be `top` again. Only once both inputs are not `top` is the type not `top`. So isn't there a good chance that most type nodes will initially be `top` and be enqueued anyway so filtering nodes when they are popped is still required and we don't gain much by doing what you suggest? > > That's true, for most nodes it probably does not matter. But I'm thinking about `Phi` nodes which could already be updated when one path is non-top. So, it might still be worth to do it after `Value()` only? IIUC, it does not matter from a correctness point of view if enqueue before or after `Value()` - we would still filter later for `top` either way. What's your concern here? Is it that the list of nodes grows too big? Or that it's a waste of time to go over the list when analysis is over only to filter out non top nodes? I suppose we could push nodes when their type is top and pop them when their type becomes not top during the analysis so, once analysis is over, the list would only contain nodes whose type is top. Do you think that would be better? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23468#discussion_r2031488107 From dlunden at openjdk.org Mon Apr 7 15:26:02 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Mon, 7 Apr 2025 15:26:02 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v14] In-Reply-To: References: Message-ID: <2I8Y-90VKS2yfosUHwm8-pHgacfrUX7APSBcGzJgYpY=.910f347e-ad48-48d3-907a-6a79a0d931bd@github.com> On Fri, 4 Apr 2025 12:49:44 GMT, Daniel Lundén wrote: > > `TestNestedSynchronize.java` is a massive file. I wonder if you can try to generate it using `MethodHandle` or classfile API instead? > > Indeed, the plan is to migrate it to the [template-based testing framework](https://github.com/openjdk/jdk/pull/24217) when that is ready. I'll have a look at using `MethodHandle`s, it would be nice to not even pollute the git history with the current version. @merykitty Now updated to use the `CompileFramework` instead. Thanks @eme64 for the pointer! ------------- PR Comment: https://git.openjdk.org/jdk/pull/20404#issuecomment-2783716216 From roland at openjdk.org Mon Apr 7 15:36:27 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 7 Apr 2025 15:36:27 GMT Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v3] In-Reply-To: References: Message-ID: On Mon, 7 Apr 2025 06:44:54 GMT, Christian Hagedorn wrote: >> I agree that we would need to be consistent and that it makes little sense to add null checks in code that has been around forever and has never caused issues. Maybe we can somehow have igvn itself assert that every node it processes has a set of expected inputs non null? I suppose, every node type would need to define which of its inputs can be null, then. > > Yes, that would be a nice verification. Should we file an RFE for that?
I filed https://bugs.openjdk.org/browse/JDK-8353895 for that ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23468#discussion_r2031509907 From cslucas at openjdk.org Mon Apr 7 16:23:53 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Mon, 7 Apr 2025 16:23:53 GMT Subject: RFR: 8352681: C2 compilation hits asserts "must set the initial type just once" In-Reply-To: <97CGSRb-aRoDvF7iGFH58n9Rh20X9Sj56fEVwcVeSMw=.9adeb5b8-c129-48d2-944d-6306fd73903c@github.com> References: <97CGSRb-aRoDvF7iGFH58n9Rh20X9Sj56fEVwcVeSMw=.9adeb5b8-c129-48d2-944d-6306fd73903c@github.com> Message-ID: On Mon, 7 Apr 2025 13:07:18 GMT, Damon Fenacci wrote: >> The reason for the error reported is that when RAM tries to reduce a field load through a Phi it ends up calling `step_through_mergemem` with `_delay_transform` set to true and, since `step_through_mergemem` assumes that `_delay_transform` is `false`, it calls `igvn->transform` passing a MergeMem that has been added to the graph long ago. >> >> I didn't opt to make `_delay_transform` false during the RAM transformations because that seemed to be a risky move, nevertheless I'll create an RFE and keep investigating that option. >> >> Also, while working on a test case I found that C2 doesn't remove (at least not before EA steps) the `if (param != param) {...}` and I'm going to investigate that as a separate RFE. >> >> I tested this with JTREG Tier 1-3 on Linux x86_64. > > test/hotspot/jtreg/compiler/escapeAnalysis/TestReduceAllocationAndSetTypeTwice.java line 32: > >> 30: * -Xcomp compiler.escapeAnalysis.TestReduceAllocationAndSetTypeTwice >> 31: * @run main compiler.escapeAnalysis.TestReduceAllocationAndSetTypeTwice >> 32: */ > > I quickly tried to trigger the crash with this regression test (with macosx-aarch64 and linux-x64 debug versions) but wasn't able to. On the other hand it crashed with the Test file attached to the JBS issue. Did you try on a specific architecture? Is there possibly something missing? 
Good catch. Thanks. It's missing a `-XX:CompileCommand=dontinline,*TestReduceAllocationAndSetTypeTwice*::*` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24471#discussion_r2031594231 From cslucas at openjdk.org Mon Apr 7 16:54:47 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Mon, 7 Apr 2025 16:54:47 GMT Subject: RFR: 8352681: C2 compilation hits asserts "must set the initial type just once" [v2] In-Reply-To: References: Message-ID: > The reason for the error reported is that when RAM tries to reduce a field load through a Phi it ends up calling `step_through_mergemem` with `_delay_transform` set to true and, since `step_through_mergemem` assumes that `_delay_transform` is `false`, it calls `igvn->transform` passing a MergeMem that has been added to the graph long ago. > > I didn't opt to make `_delay_transform` false during the RAM transformations because that seemed to be a risky move, nevertheless I'll create an RFE and keep investigating that option. > > Also, while working on a test case I found that C2 doesn't remove (at least not before EA steps) the `if (param != param) {...}` and I'm going to investigate that as a separate RFE. > > I tested this with JTREG Tier 1-3 on Linux x86_64. 
Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: Fix typo & test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24471/files - new: https://git.openjdk.org/jdk/pull/24471/files/d6a2ad0a..fa6b678e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24471&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24471&range=00-01 Stats: 2 lines in 2 files changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24471.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24471/head:pull/24471 PR: https://git.openjdk.org/jdk/pull/24471 From kvn at openjdk.org Mon Apr 7 17:05:59 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 7 Apr 2025 17:05:59 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API In-Reply-To: References: Message-ID: On Fri, 4 Apr 2025 22:52:24 GMT, Vladimir Ivanov wrote: > Migrate Vector API math library (SVML and SLEEF) linkage from native code (in JVM) to Java FFM API. > > Since FFM API doesn't support vector calling conventions yet, migration affects only symbol lookup for now. But it still enables significant simplifications on JVM side. > > The patch consists of the following parts: > * on-demand symbol lookup in Java code replaces eager lookup from native code during JVM startup; > * 2 new VM intrinsics for vector calls (support unary and binary shapes) (code separated from unary/binary vector operations); > * new internal interface to query supported CPU ISA extensions (`jdk.incubator.vector.CPUFeatures`) used for CPU dispatching. > > `java.lang.foreign` API is used to perform symbol lookup in vector math library, then the address is cached and fed into corresponding JVM intrinsic, so C2 can turn it into a direct vector call in generated code. > > Once `java.lang.foreign` supports vectors & vector calling conventions, VM intrinsics can go away. 
> > Performance is on par with original implementation (tested with microbenchmarks on linux-x64 and macosx-aarch64). > > Testing: hs-tier1 - hs-tier6, microbenchmarks (on linux-x64 and macosx-aarch64) > > Thanks! A few questions: src/jdk.incubator.vector/share/classes/jdk/incubator/vector/CPUFeatures.java line 60: > 58: } > 59: > 60: public static class X64 { Should we create `src/jdk.incubator.vector/cpu/` for CPU-specific information? As a separate refactoring. src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMathLibrary.java line 100: > 98: > 99: /** > 100: * Naming convention in SVML vector math library. Does this library have code for all AVX configurations? ------------- PR Review: https://git.openjdk.org/jdk/pull/24462#pullrequestreview-2747510895 PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2031654383 PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2031657213 From kvn at openjdk.org Mon Apr 7 17:10:00 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 7 Apr 2025 17:10:00 GMT Subject: RFR: 8353192: C2: Clean up x86 backend after 32-bit x86 removal [v2] In-Reply-To: References: Message-ID: On Mon, 7 Apr 2025 08:47:19 GMT, Aleksey Shipilev wrote: >> Piece-wise cleanup of C2 x86 backend. C2_MacroAssembler, x86.ad and related files are the target for this cleanup. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux x86_64 server fastdebug, `all` + `-XX:-TieredCompilation` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Cleanup ADLC as well > - Revert some accidental removals > - Merge branch 'master' into JDK-8353192-x86-c2-backend > - Touchup > - Fix Good. ------------- Marked as reviewed by kvn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/24300#pullrequestreview-2747524683 From liach at openjdk.org Mon Apr 7 17:46:51 2025 From: liach at openjdk.org (Chen Liang) Date: Mon, 7 Apr 2025 17:46:51 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API In-Reply-To: References: Message-ID: On Fri, 4 Apr 2025 22:52:24 GMT, Vladimir Ivanov wrote: > Migrate Vector API math library (SVML and SLEEF) linkage from native code (in JVM) to Java FFM API. > > Since FFM API doesn't support vector calling conventions yet, migration affects only symbol lookup for now. But it still enables significant simplifications on JVM side. > > The patch consists of the following parts: > * on-demand symbol lookup in Java code replaces eager lookup from native code during JVM startup; > * 2 new VM intrinsics for vector calls (support unary and binary shapes) (code separated from unary/binary vector operations); > * new internal interface to query supported CPU ISA extensions (`jdk.incubator.vector.CPUFeatures`) used for CPU dispatching. > > `java.lang.foreign` API is used to perform symbol lookup in vector math library, then the address is cached and fed into corresponding JVM intrinsic, so C2 can turn it into a direct vector call in generated code. > > Once `java.lang.foreign` supports vectors & vector calling conventions, VM intrinsics can go away. > > Performance is on par with original implementation (tested with microbenchmarks on linux-x64 and macosx-aarch64). > > Testing: hs-tier1 - hs-tier6, microbenchmarks (on linux-x64 and macosx-aarch64) > > Thanks! 
src/jdk.incubator.vector/share/classes/jdk/incubator/vector/CPUFeatures.java line 44: > 42: String featuresString = VectorSupport.getCPUFeatures(); > 43: debug(featuresString); > 44: String[] features = featuresString.toLowerCase().split(", "); // ", " is used as a delimiter Please use `toLowerCase(Locale.ROOT)`: if the system locale is Turkish, `I` lowercases to the dotless `ı`, which is a different letter from `i`, and the dotless `ı` will fail in the subsequent `validateFeatures` assertion. Same for `hasFeature`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2031714743 From vlivanov at openjdk.org Mon Apr 7 18:38:37 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 7 Apr 2025 18:38:37 GMT Subject: RFR: 8352963: [REDO] Missing late inline cleanup causes compiler/vectorapi/VectorLogicalOpIdentityTest.java IR failure [v2] In-Reply-To: References: Message-ID: On Mon, 7 Apr 2025 09:11:44 GMT, Damon Fenacci wrote: >> This PR is a REDO of [JDK-8302459](https://bugs.openjdk.org/browse/JDK-8302459) ([PR](https://github.com/openjdk/jdk/pull/21682), [backout](https://bugs.openjdk.org/browse/JDK-8352965) triggered by a failing internal test). >> >> There was an issue with `CallGenerator::for_method_handle_call` that could delay late inlining by creating a "generic" `LateInlineCallGenerator` instead of a more specific `LateInlineMHCallGenerator`: >> https://github.com/openjdk/jdk/blob/74df384a9870431efb184158bba032c79c35356e/src/hotspot/share/opto/callGenerator.cpp#L991 >> While running IGVN this could be misinterpreted as non-MH late-inline >> https://github.com/openjdk/jdk/blob/c282fb9add32f1fac8174ca84b1b68a869d2578d/src/hotspot/share/opto/callnode.cpp#L1088-L1091 >> eventually triggering `assert(!cg->method()->is_method_handle_intrinsic(), "required");` >> >> The fix involves creating a `LateInlineMHCallGenerator` instead.
Here is what changed from the backed out PR: >> https://github.com/openjdk/jdk/blob/c282fb9add32f1fac8174ca84b1b68a869d2578d/src/hotspot/share/opto/callGenerator.cpp#L991-L995 >> >> ### Testing >> >> Tier 1-4 (windows-x64, linux-x64/aarch64, and macosx-x64/aarch64; release and debug mode) > > Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: > > JDK-8652963: review fix Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24402#pullrequestreview-2747707141 From cslucas at openjdk.org Mon Apr 7 19:04:15 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Mon, 7 Apr 2025 19:04:15 GMT Subject: RFR: 8353735: [JVMCI] Allow specifying storage kind of the callee save register In-Reply-To: References: Message-ID: <4o1jkl6NOwwVZ0--oMiBOOLnRzE0OIO6JazFL_gV4UU=.7d25433e-7507-49f0-afab-091bb7f2305d@github.com> On Fri, 4 Apr 2025 14:47:39 GMT, Yudi Zheng wrote: > Windows x64 ABI considers the upper portions of YMM0-YMM15 and ZMM0-ZMM15 volatile, that is, destroyed on function calls. This PR allows `RegisterConfig` implementations to refine the storage kind of callee save register, such that JVMCI compiler can exploit this information to avoid saving full width of these registers. LGTM ------------- Marked as reviewed by cslucas (Author). PR Review: https://git.openjdk.org/jdk/pull/24451#pullrequestreview-2747807421 From mdoerr at openjdk.org Mon Apr 7 20:12:12 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 7 Apr 2025 20:12:12 GMT Subject: RFR: 8353500: [s390x] Intrinsify Unsafe::setMemory In-Reply-To: References: Message-ID: <-cahQzTAoakWd7Nlc7UdIwq0fwsxe2v6DCQw8NO41Rg=.89214c70-c5c6-4eea-9a8e-fba0bb2e9123@github.com> On Mon, 7 Apr 2025 08:44:07 GMT, Amit Kumar wrote: > Unsafe::setMemory intrinsic implementation for s390x. 
> > Stub Code: > > > StubRoutines::unsafe_setmemory [0x000003ffb04b63c0, 0x000003ffb04b64d0] (272 bytes) > -------------------------------------------------------------------------------- > 0x000003ffb04b63c0: ogrk %r1,%r2,%r3 > 0x000003ffb04b63c4: nill %r1,7 > 0x000003ffb04b63c8: je 0x000003ffb04b6410 > 0x000003ffb04b63cc: nill %r1,3 > 0x000003ffb04b63d0: je 0x000003ffb04b6460 > 0x000003ffb04b63d4: nill %r1,1 > 0x000003ffb04b63d8: jlh 0x000003ffb04b64a0 > 0x000003ffb04b63dc: risbg %r4,%r4,48,55,8 > 0x000003ffb04b63e2: risbgz %r1,%r3,32,63,62 > 0x000003ffb04b63e8: je 0x000003ffb04b6402 > 0x000003ffb04b63ec: nopr > 0x000003ffb04b63ee: nopr > 0x000003ffb04b63f0: sth %r4,0(%r2) > 0x000003ffb04b63f4: sth %r4,2(%r2) > 0x000003ffb04b63f8: agfi %r2,4 > 0x000003ffb04b63fe: brct %r1,0x000003ffb04b63f0 > 0x000003ffb04b6402: nilf %r3,2 > 0x000003ffb04b6408: ber %r14 > 0x000003ffb04b640a: sth %r4,0(%r2) > 0x000003ffb04b640e: br %r14 > 0x000003ffb04b6410: risbg %r4,%r4,48,55,8 > 0x000003ffb04b6416: risbg %r4,%r4,32,47,16 > 0x000003ffb04b641c: risbg %r4,%r4,0,31,32 > 0x000003ffb04b6422: risbgz %r1,%r3,32,63,60 > 0x000003ffb04b6428: je 0x000003ffb04b6446 > 0x000003ffb04b642c: nopr > 0x000003ffb04b642e: nopr > 0x000003ffb04b6430: stg %r4,0(%r2) > 0x000003ffb04b6436: stg %r4,8(%r2) > 0x000003ffb04b643c: agfi %r2,16 > 0x000003ffb04b6442: brct %r1,0x000003ffb04b6430 > 0x000003ffb04b6446: nilf %r3,8 > 0x000003ffb04b644c: ber %r14 > 0x000003ffb04b644e: stg %r4,0(%r2) > 0x000003ffb04b6454: br %r14 > 0x000003ffb04b6456: nopr > 0x000003ffb04b6458: nopr > 0x000003ffb04b645a: nopr > 0x000003ffb04b645c: nopr > 0x000003ffb04b645e: nopr > 0x000003ffb04b6460: risbg %r4,%r4,48,55,8 > 0x000003ffb04b6466: risbg %r4,%r4,32,47,16 > 0x000003ffb04b646c: risbgz %r1,%r3,32,63,61 > 0x000003ffb04b6472: je 0x000003ffb04b6492 > 0x000003ffb04b6476: nopr > 0x000003ffb04b6478: nopr > 0x000003ffb04b647a: nopr > 0x000003ffb04b647c: nopr > 0x000003ffb04b647e: nopr > 0x000003ffb04b6480: st %r4,0(%r2) > 
0x000003ffb04b6484: st %r4,4(%r2) > 0x000003ffb04b6488: agfi %r2,8 > 0x000003ffb04b648e: brct %r1,0x000003ffb04b6480 > 0x000003ffb04b6492: nilf %r3,4 > 0x000003ffb04b6498: ber %r14 > 0x000003ffb04b649a: st %r4,0(%r2) > 0x000003ffb04b649e: br %r14 > 0x000003ffb04b64a0: risbgz %r1,%r3,32,63,63 > 0x000003ffb04b64a6: je 0x000003ffb04b64c2 > 0x000003... src/hotspot/cpu/s390/stubGenerator_s390.cpp line 1500: > 1498: __ store_sized_value(byteVal, Address(dest, 0), elem_size); > 1499: __ store_sized_value(byteVal, Address(dest, elem_size), elem_size); > 1500: __ z_agfi(dest, 2 * elem_size); Why not aghi? src/hotspot/cpu/s390/stubGenerator_s390.cpp line 1529: > 1527: > 1528: { > 1529: NearLabel L_fill8Bytes, L_fill4Bytes, L_fillBytes, L_exit; Unused Label: L_exit ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24480#discussion_r2031939138 PR Review Comment: https://git.openjdk.org/jdk/pull/24480#discussion_r2031936343 From mdoerr at openjdk.org Mon Apr 7 20:16:11 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 7 Apr 2025 20:16:11 GMT Subject: RFR: 8353500: [s390x] Intrinsify Unsafe::setMemory In-Reply-To: References: Message-ID: On Mon, 7 Apr 2025 08:44:07 GMT, Amit Kumar wrote: > Unsafe::setMemory intrinsic implementation for s390x. 
> > Stub Code: > > > StubRoutines::unsafe_setmemory [0x000003ffb04b63c0, 0x000003ffb04b64d0] (272 bytes) > -------------------------------------------------------------------------------- > 0x000003ffb04b63c0: ogrk %r1,%r2,%r3 > 0x000003ffb04b63c4: nill %r1,7 > 0x000003ffb04b63c8: je 0x000003ffb04b6410 > 0x000003ffb04b63cc: nill %r1,3 > 0x000003ffb04b63d0: je 0x000003ffb04b6460 > 0x000003ffb04b63d4: nill %r1,1 > 0x000003ffb04b63d8: jlh 0x000003ffb04b64a0 > 0x000003ffb04b63dc: risbg %r4,%r4,48,55,8 > 0x000003ffb04b63e2: risbgz %r1,%r3,32,63,62 > 0x000003ffb04b63e8: je 0x000003ffb04b6402 > 0x000003ffb04b63ec: nopr > 0x000003ffb04b63ee: nopr > 0x000003ffb04b63f0: sth %r4,0(%r2) > 0x000003ffb04b63f4: sth %r4,2(%r2) > 0x000003ffb04b63f8: agfi %r2,4 > 0x000003ffb04b63fe: brct %r1,0x000003ffb04b63f0 > 0x000003ffb04b6402: nilf %r3,2 > 0x000003ffb04b6408: ber %r14 > 0x000003ffb04b640a: sth %r4,0(%r2) > 0x000003ffb04b640e: br %r14 > 0x000003ffb04b6410: risbg %r4,%r4,48,55,8 > 0x000003ffb04b6416: risbg %r4,%r4,32,47,16 > 0x000003ffb04b641c: risbg %r4,%r4,0,31,32 > 0x000003ffb04b6422: risbgz %r1,%r3,32,63,60 > 0x000003ffb04b6428: je 0x000003ffb04b6446 > 0x000003ffb04b642c: nopr > 0x000003ffb04b642e: nopr > 0x000003ffb04b6430: stg %r4,0(%r2) > 0x000003ffb04b6436: stg %r4,8(%r2) > 0x000003ffb04b643c: agfi %r2,16 > 0x000003ffb04b6442: brct %r1,0x000003ffb04b6430 > 0x000003ffb04b6446: nilf %r3,8 > 0x000003ffb04b644c: ber %r14 > 0x000003ffb04b644e: stg %r4,0(%r2) > 0x000003ffb04b6454: br %r14 > 0x000003ffb04b6456: nopr > 0x000003ffb04b6458: nopr > 0x000003ffb04b645a: nopr > 0x000003ffb04b645c: nopr > 0x000003ffb04b645e: nopr > 0x000003ffb04b6460: risbg %r4,%r4,48,55,8 > 0x000003ffb04b6466: risbg %r4,%r4,32,47,16 > 0x000003ffb04b646c: risbgz %r1,%r3,32,63,61 > 0x000003ffb04b6472: je 0x000003ffb04b6492 > 0x000003ffb04b6476: nopr > 0x000003ffb04b6478: nopr > 0x000003ffb04b647a: nopr > 0x000003ffb04b647c: nopr > 0x000003ffb04b647e: nopr > 0x000003ffb04b6480: st %r4,0(%r2) > 
0x000003ffb04b6484: st %r4,4(%r2) > 0x000003ffb04b6488: agfi %r2,8 > 0x000003ffb04b648e: brct %r1,0x000003ffb04b6480 > 0x000003ffb04b6492: nilf %r3,4 > 0x000003ffb04b6498: ber %r14 > 0x000003ffb04b649a: st %r4,0(%r2) > 0x000003ffb04b649e: br %r14 > 0x000003ffb04b64a0: risbgz %r1,%r3,32,63,63 > 0x000003ffb04b64a6: je 0x000003ffb04b64c2 > 0x000003... Since this is taken from https://github.com/openjdk/jdk/pull/24254: Maybe you can review that one, too? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24480#issuecomment-2784516130 From lucy at openjdk.org Mon Apr 7 20:45:10 2025 From: lucy at openjdk.org (Lutz Schmidt) Date: Mon, 7 Apr 2025 20:45:10 GMT Subject: RFR: 8352972: PPC64: Intrinsify Unsafe::setMemory [v3] In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 16:36:10 GMT, Martin Doerr wrote: >> Similar to the x86 implementation. The non-product feature for counting things like `SharedRuntime::_unsafe_set_memory_ctr` is currently not supported on PPC64. I've left it commented out. >> >> Before this patch (measured on Power10): >> >> Benchmark (aligned) (size) Mode Cnt Score Error Units >> MemorySegmentZeroUnsafe.panama true 1 avgt 30 15.048 ± 0.095 ns/op >> MemorySegmentZeroUnsafe.panama true 2 avgt 30 15.054 ± 0.089 ns/op >> MemorySegmentZeroUnsafe.panama true 3 avgt 30 15.161 ± 0.089 ns/op >> MemorySegmentZeroUnsafe.panama true 4 avgt 30 15.147 ± 0.082 ns/op >> MemorySegmentZeroUnsafe.panama true 5 avgt 30 15.198 ± 0.089 ns/op >> MemorySegmentZeroUnsafe.panama true 6 avgt 30 15.128 ± 0.099 ns/op >> MemorySegmentZeroUnsafe.panama true 7 avgt 30 19.234 ± 0.148 ns/op >> MemorySegmentZeroUnsafe.panama true 8 avgt 30 15.060 ± 0.090 ns/op >> MemorySegmentZeroUnsafe.panama true 15 avgt 30 19.229 ± 0.171 ns/op >> MemorySegmentZeroUnsafe.panama true 16 avgt 30 15.030 ± 0.082 ns/op >> MemorySegmentZeroUnsafe.panama true 63 avgt 30 85.290 ± 0.431 ns/op >> MemorySegmentZeroUnsafe.panama true 64 avgt 30 84.273 ±
0.843 ns/op >> MemorySegmentZeroUnsafe.panama true 255 avgt 30 89.551 ± 0.706 ns/op >> MemorySegmentZeroUnsafe.panama true 256 avgt 30 87.736 ± 0.679 ns/op >> MemorySegmentZeroUnsafe.panama false 1 avgt 30 15.044 ± 0.073 ns/op >> MemorySegmentZeroUnsafe.panama false 2 avgt 30 14.980 ± 0.058 ns/op >> MemorySegmentZeroUnsafe.panama false 3 avgt 30 15.138 ± 0.126 ns/op >> MemorySegmentZeroUnsafe.panama false 4 avgt 30 15.025 ± 0.049 ns/op >> MemorySegmentZeroUnsafe.panama false 5 avgt 30 15.192 ± 0.118 ns/op >> MemorySegmentZeroUnsafe.panama false 6 avgt 30 15.464 ± 0.667 ns/op >> MemorySegmentZeroUnsafe.panama false 7 avgt 30 19.179 ± 0.143 ns/op >> MemorySegmentZeroUnsafe.panama false 8 avgt 30 15.278 ± 0.130 ns/op >> MemorySegmentZeroUnsafe.panama false 15 avgt 30 19.428 ± 0.146 ns/op >> MemorySegmentZeroUnsafe.panama false 16 avgt 30 18.011 ± 1.233 ns/op >> MemorySegmentZeroUnsafe.panama false 63 avgt 30 87.090 ± 0.989 ns/op >> MemorySegmentZeroUnsaf... > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Simplify usage of UnsafeMemoryAccessMark. LGTM ------------- Marked as reviewed by lucy (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24254#pullrequestreview-2748064496 From mdoerr at openjdk.org Mon Apr 7 20:54:17 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 7 Apr 2025 20:54:17 GMT Subject: RFR: 8352972: PPC64: Intrinsify Unsafe::setMemory [v3] In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 16:36:10 GMT, Martin Doerr wrote: >> Similar to the x86 implementation. The non-product feature for counting things like `SharedRuntime::_unsafe_set_memory_ctr` is currently not supported on PPC64. I've left it commented out. >> >> Before this patch (measured on Power10): >> >> Benchmark (aligned) (size) Mode Cnt Score Error Units >> MemorySegmentZeroUnsafe.panama true 1 avgt 30 15.048 ± 0.095 ns/op >> MemorySegmentZeroUnsafe.panama true 2 avgt 30 15.054 ±
0.089 ns/op >> MemorySegmentZeroUnsafe.panama true 3 avgt 30 15.161 ± 0.089 ns/op >> MemorySegmentZeroUnsafe.panama true 4 avgt 30 15.147 ± 0.082 ns/op >> MemorySegmentZeroUnsafe.panama true 5 avgt 30 15.198 ± 0.089 ns/op >> MemorySegmentZeroUnsafe.panama true 6 avgt 30 15.128 ± 0.099 ns/op >> MemorySegmentZeroUnsafe.panama true 7 avgt 30 19.234 ± 0.148 ns/op >> MemorySegmentZeroUnsafe.panama true 8 avgt 30 15.060 ± 0.090 ns/op >> MemorySegmentZeroUnsafe.panama true 15 avgt 30 19.229 ± 0.171 ns/op >> MemorySegmentZeroUnsafe.panama true 16 avgt 30 15.030 ± 0.082 ns/op >> MemorySegmentZeroUnsafe.panama true 63 avgt 30 85.290 ± 0.431 ns/op >> MemorySegmentZeroUnsafe.panama true 64 avgt 30 84.273 ± 0.843 ns/op >> MemorySegmentZeroUnsafe.panama true 255 avgt 30 89.551 ± 0.706 ns/op >> MemorySegmentZeroUnsafe.panama true 256 avgt 30 87.736 ± 0.679 ns/op >> MemorySegmentZeroUnsafe.panama false 1 avgt 30 15.044 ± 0.073 ns/op >> MemorySegmentZeroUnsafe.panama false 2 avgt 30 14.980 ± 0.058 ns/op >> MemorySegmentZeroUnsafe.panama false 3 avgt 30 15.138 ± 0.126 ns/op >> MemorySegmentZeroUnsafe.panama false 4 avgt 30 15.025 ± 0.049 ns/op >> MemorySegmentZeroUnsafe.panama false 5 avgt 30 15.192 ± 0.118 ns/op >> MemorySegmentZeroUnsafe.panama false 6 avgt 30 15.464 ± 0.667 ns/op >> MemorySegmentZeroUnsafe.panama false 7 avgt 30 19.179 ± 0.143 ns/op >> MemorySegmentZeroUnsafe.panama false 8 avgt 30 15.278 ± 0.130 ns/op >> MemorySegmentZeroUnsafe.panama false 15 avgt 30 19.428 ± 0.146 ns/op >> MemorySegmentZeroUnsafe.panama false 16 avgt 30 18.011 ± 1.233 ns/op >> MemorySegmentZeroUnsafe.panama false 63 avgt 30 87.090 ± 0.989 ns/op >> MemorySegmentZeroUnsaf... > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Simplify usage of UnsafeMemoryAccessMark. Thanks for the reviews!
------------- PR Comment: https://git.openjdk.org/jdk/pull/24254#issuecomment-2784594956 From mdoerr at openjdk.org Mon Apr 7 20:54:18 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 7 Apr 2025 20:54:18 GMT Subject: Integrated: 8352972: PPC64: Intrinsify Unsafe::setMemory In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 14:15:51 GMT, Martin Doerr wrote: > Similar to the x86 implementation. The non-product feature for counting things like `SharedRuntime::_unsafe_set_memory_ctr` is currently not supported on PPC64. I've left it commented out. > > Before this patch (measured on Power10): > > Benchmark (aligned) (size) Mode Cnt Score Error Units > MemorySegmentZeroUnsafe.panama true 1 avgt 30 15.048 ± 0.095 ns/op > MemorySegmentZeroUnsafe.panama true 2 avgt 30 15.054 ± 0.089 ns/op > MemorySegmentZeroUnsafe.panama true 3 avgt 30 15.161 ± 0.089 ns/op > MemorySegmentZeroUnsafe.panama true 4 avgt 30 15.147 ± 0.082 ns/op > MemorySegmentZeroUnsafe.panama true 5 avgt 30 15.198 ± 0.089 ns/op > MemorySegmentZeroUnsafe.panama true 6 avgt 30 15.128 ± 0.099 ns/op > MemorySegmentZeroUnsafe.panama true 7 avgt 30 19.234 ± 0.148 ns/op > MemorySegmentZeroUnsafe.panama true 8 avgt 30 15.060 ± 0.090 ns/op > MemorySegmentZeroUnsafe.panama true 15 avgt 30 19.229 ± 0.171 ns/op > MemorySegmentZeroUnsafe.panama true 16 avgt 30 15.030 ± 0.082 ns/op > MemorySegmentZeroUnsafe.panama true 63 avgt 30 85.290 ± 0.431 ns/op > MemorySegmentZeroUnsafe.panama true 64 avgt 30 84.273 ± 0.843 ns/op > MemorySegmentZeroUnsafe.panama true 255 avgt 30 89.551 ± 0.706 ns/op > MemorySegmentZeroUnsafe.panama true 256 avgt 30 87.736 ± 0.679 ns/op > MemorySegmentZeroUnsafe.panama false 1 avgt 30 15.044 ± 0.073 ns/op > MemorySegmentZeroUnsafe.panama false 2 avgt 30 14.980 ± 0.058 ns/op > MemorySegmentZeroUnsafe.panama false 3 avgt 30 15.138 ± 0.126 ns/op > MemorySegmentZeroUnsafe.panama false 4 avgt 30 15.025 ± 0.049 ns/op > MemorySegmentZeroUnsafe.panama false 5 avgt 30 15.192 ±
0.118 ns/op > MemorySegmentZeroUnsafe.panama false 6 avgt 30 15.464 ± 0.667 ns/op > MemorySegmentZeroUnsafe.panama false 7 avgt 30 19.179 ± 0.143 ns/op > MemorySegmentZeroUnsafe.panama false 8 avgt 30 15.278 ± 0.130 ns/op > MemorySegmentZeroUnsafe.panama false 15 avgt 30 19.428 ± 0.146 ns/op > MemorySegmentZeroUnsafe.panama false 16 avgt 30 18.011 ± 1.233 ns/op > MemorySegmentZeroUnsafe.panama false 63 avgt 30 87.090 ± 0.989 ns/op > MemorySegmentZeroUnsafe.panama false 64 avgt 30 86.513 ± 0.623 ns/op > ... This pull request has now been integrated. Changeset: e266eba4 Author: Martin Doerr URL: https://git.openjdk.org/jdk/commit/e266eba40131bb97c392c8c87551d28e74c4764a Stats: 100 lines in 1 file changed: 100 ins; 0 del; 0 mod 8352972: PPC64: Intrinsify Unsafe::setMemory Reviewed-by: lucy ------------- PR: https://git.openjdk.org/jdk/pull/24254 From vlivanov at openjdk.org Mon Apr 7 21:37:33 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 7 Apr 2025 21:37:33 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v2] In-Reply-To: References: Message-ID: > Migrate Vector API math library (SVML and SLEEF) linkage from native code (in JVM) to Java FFM API. > > Since FFM API doesn't support vector calling conventions yet, migration affects only symbol lookup for now. But it still enables significant simplifications on JVM side. > > The patch consists of the following parts: > * on-demand symbol lookup in Java code replaces eager lookup from native code during JVM startup; > * 2 new VM intrinsics for vector calls (support unary and binary shapes) (code separated from unary/binary vector operations); > * new internal interface to query supported CPU ISA extensions (`jdk.incubator.vector.CPUFeatures`) used for CPU dispatching.
> > `java.lang.foreign` API is used to perform symbol lookup in vector math library, then the address is cached and fed into corresponding JVM intrinsic, so C2 can turn it into a direct vector call in generated code. > > Once `java.lang.foreign` supports vectors & vector calling conventions, VM intrinsics can go away. > > Performance is on par with original implementation (tested with microbenchmarks on linux-x64 and macosx-aarch64). > > Testing: hs-tier1 - hs-tier6, microbenchmarks (on linux-x64 and macosx-aarch64) > > Thanks! Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: Reviews and Float64Vector-related fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24462/files - new: https://git.openjdk.org/jdk/pull/24462/files/fc27aee5..368b943e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24462&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24462&range=00-01 Stats: 22 lines in 2 files changed: 5 ins; 6 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/24462.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24462/head:pull/24462 PR: https://git.openjdk.org/jdk/pull/24462 From liach at openjdk.org Mon Apr 7 22:24:29 2025 From: liach at openjdk.org (Chen Liang) Date: Mon, 7 Apr 2025 22:24:29 GMT Subject: RFR: 8353841: [jittester] Fix JITTester build after asm removal In-Reply-To: References: Message-ID: On Mon, 7 Apr 2025 11:24:13 GMT, Evgeny Nikitin wrote: > [JDK-8346981](https://bugs.openjdk.org/browse/JDK-8346981) removed jdk.internal.objectweb.asm packages from java.base, causing JITTester build to fail. > > This PR fixes the build by building ASM prior to the testlibrary and JITTester builds. > Testing: Local runs of targets `COMPILE` and `all`, no errors found. 
test/hotspot/jtreg/testlibrary/jittester/Makefile line 105: > 103: > 104: compile_testlib: INIT > 105: $(JAVAC) -XDignore.symbol.file --add-exports=java.base/jdk.internal.misc=ALL-UNNAMED --add-exports=java.base/org.objectweb.asm=ALL-UNNAMED -Xlint $(TESTLIBRARY_SRC_FILES) -d $(CLASSES_DIR) Suggestion: $(JAVAC) -XDignore.symbol.file --add-exports=java.base/jdk.internal.misc=ALL-UNNAMED -Xlint $(TESTLIBRARY_SRC_FILES) -d $(CLASSES_DIR) As you did so below. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24487#discussion_r2032089849 From vlivanov at openjdk.org Mon Apr 7 23:03:03 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 7 Apr 2025 23:03:03 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v3] In-Reply-To: References: Message-ID: > Migrate Vector API math library (SVML and SLEEF) linkage from native code (in JVM) to Java FFM API. > > Since FFM API doesn't support vector calling conventions yet, migration affects only symbol lookup for now. But it still enables significant simplifications on JVM side. > > The patch consists of the following parts: > * on-demand symbol lookup in Java code replaces eager lookup from native code during JVM startup; > * 2 new VM intrinsics for vector calls (support unary and binary shapes) (code separated from unary/binary vector operations); > * new internal interface to query supported CPU ISA extensions (`jdk.incubator.vector.CPUFeatures`) used for CPU dispatching. > > `java.lang.foreign` API is used to perform symbol lookup in vector math library, then the address is cached and fed into corresponding JVM intrinsic, so C2 can turn it into a direct vector call in generated code. > > Once `java.lang.foreign` supports vectors & vector calling conventions, VM intrinsics can go away. > > Performance is on par with original implementation (tested with microbenchmarks on linux-x64 and macosx-aarch64). 
> > Testing: hs-tier1 - hs-tier6, microbenchmarks (on linux-x64 and macosx-aarch64) > > Thanks! Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: features_string -> cpu_info_string ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24462/files - new: https://git.openjdk.org/jdk/pull/24462/files/368b943e..9a8f6200 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24462&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24462&range=01-02 Stats: 26 lines in 8 files changed: 1 ins; 0 del; 25 mod Patch: https://git.openjdk.org/jdk/pull/24462.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24462/head:pull/24462 PR: https://git.openjdk.org/jdk/pull/24462 From vlivanov at openjdk.org Mon Apr 7 23:25:26 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 7 Apr 2025 23:25:26 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v3] In-Reply-To: References: Message-ID: On Mon, 7 Apr 2025 17:44:33 GMT, Chen Liang wrote: >> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: >> >> features_string -> cpu_info_string > > src/jdk.incubator.vector/share/classes/jdk/incubator/vector/CPUFeatures.java line 44: > >> 42: String featuresString = VectorSupport.getCPUFeatures(); >> 43: debug(featuresString); >> 44: String[] features = featuresString.toLowerCase().split(", "); // ", " is used as a delimiter > > Please use `toLowerCase(Locale.ROOT)`: if the system locale is Turkish, `I` lowercases to the dotless `ı`, which is a different letter from `i`, and the dotless `ı` will fail in the subsequent `validateFeatures` assertion. Same for `hasFeature`. Good point. Fixed.
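For readers unfamiliar with the locale issue discussed here, the following is a minimal, self-contained sketch and not the code from the PR. The class name `LocaleLowerCaseDemo`, the helper `parseFeatures`, and the sample feature string are hypothetical; only the `", "` delimiter and the `toLowerCase(Locale.ROOT)` recommendation come from the review. It shows that lower-casing under a Turkish locale turns `I` into the dotless letter U+0131, while `Locale.ROOT` yields a plain ASCII `i`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class LocaleLowerCaseDemo {
    // Hypothetical helper: lower-case a ", "-delimited feature string with an
    // explicit locale, then split it into individual feature names.
    static List<String> parseFeatures(String featuresString, Locale locale) {
        return Arrays.asList(featuresString.toLowerCase(locale).split(", "));
    }

    public static void main(String[] args) {
        String sample = "AVX, AVX2, AVX512_IFMA"; // made-up sample input
        Locale turkish = Locale.forLanguageTag("tr-TR");

        // Locale.ROOT gives locale-independent lower-casing.
        System.out.println(parseFeatures(sample, Locale.ROOT));
        // Under the Turkish locale the 'I' in the last entry becomes U+0131,
        // so a comparison against an ASCII feature name would no longer match.
        System.out.println(parseFeatures(sample, turkish));
    }
}
```

The no-argument `toLowerCase()` uses the JVM's default locale, which is why the result can differ from machine to machine; passing `Locale.ROOT` removes that dependency.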
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2032135321 From vlivanov at openjdk.org Mon Apr 7 23:25:27 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 7 Apr 2025 23:25:27 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v3] In-Reply-To: References: Message-ID: On Mon, 7 Apr 2025 17:01:19 GMT, Vladimir Kozlov wrote: >> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: >> >> features_string -> cpu_info_string > > src/jdk.incubator.vector/share/classes/jdk/incubator/vector/CPUFeatures.java line 60: > >> 58: } >> 59: >> 60: public static class X64 { > > Should we create `src/jdk.incubator.vector/cpu/` for CPU-specific information? As a separate refactoring. To clarify: are you suggesting moving platform-specific classes into a separate package or into a platform-specific location? It does make sense to separate platform-specific parts into their own classes once the amount of code grows over some limit. For now it doesn't look too attractive since the amount of code is very small. > src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMathLibrary.java line 100: > >> 98: >> 99: /** >> 100: * Naming convention in SVML vector math library. > > Does this library have code for all AVX configurations? Yes, there are 4 configurations (`-XX:UseAVX=[0..3]`) in total covered by the SVML library.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2032132478 PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2032134903 From vlivanov at openjdk.org Mon Apr 7 23:25:27 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 7 Apr 2025 23:25:27 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v3] In-Reply-To: <9BCE8xN6SA-cPEc1EtuSsqoYwsHiwp31lJKsraWgYso=.67a97434-ef3c-40ab-b5be-841889fdd97c@github.com> References: <9BCE8xN6SA-cPEc1EtuSsqoYwsHiwp31lJKsraWgYso=.67a97434-ef3c-40ab-b5be-841889fdd97c@github.com> Message-ID: On Mon, 7 Apr 2025 06:44:16 GMT, Per Minborg wrote: >> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: >> >> features_string -> cpu_info_string > > src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMathLibrary.java line 258: > >> 256: if (LIBRARY.isSupported(op, vspecies)) { >> 257: String symbol = LIBRARY.symbolName(op, vspecies); >> 258: MemorySegment addr = LOOKUP.find(symbol) > > It is better to use `LOOKUP.findOrThrow()` because it does not require lambda creation. Thanks, changed as you suggested. I introduced a try-catch block instead. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2032138430 From vlivanov at openjdk.org Mon Apr 7 23:32:06 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 7 Apr 2025 23:32:06 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v3] In-Reply-To: References: Message-ID: On Mon, 7 Apr 2025 23:03:03 GMT, Vladimir Ivanov wrote: >> Migrate Vector API math library (SVML and SLEEF) linkage from native code (in JVM) to Java FFM API. >> >> Since FFM API doesn't support vector calling conventions yet, migration affects only symbol lookup for now. But it still enables significant simplifications on JVM side. 
>> >> The patch consists of the following parts: >> * on-demand symbol lookup in Java code replaces eager lookup from native code during JVM startup; >> * 2 new VM intrinsics for vector calls (support unary and binary shapes) (code separated from unary/binary vector operations); >> * new internal interface to query supported CPU ISA extensions (`jdk.incubator.vector.CPUFeatures`) used for CPU dispatching. >> >> `java.lang.foreign` API is used to perform symbol lookup in vector math library, then the address is cached and fed into corresponding JVM intrinsic, so C2 can turn it into a direct vector call in generated code. >> >> Once `java.lang.foreign` supports vectors & vector calling conventions, VM intrinsics can go away. >> >> Performance is on par with original implementation (tested with microbenchmarks on linux-x64 and macosx-aarch64). >> >> Testing: hs-tier1 - hs-tier6, microbenchmarks (on linux-x64 and macosx-aarch64) >> >> Thanks! > > Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: > > features_string -> cpu_info_string In addition to addressing review feedback, there are 2 updates: * SVML: I overlooked that 64-bit vectors are covered by original implementation; fixed now; * JVM: `features_string` to `cpu_info_string` renaming uniformly across all platforms ------------- PR Comment: https://git.openjdk.org/jdk/pull/24462#issuecomment-2784850623 From vlivanov at openjdk.org Mon Apr 7 23:32:05 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 7 Apr 2025 23:32:05 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v4] In-Reply-To: References: Message-ID: > Migrate Vector API math library (SVML and SLEEF) linkage from native code (in JVM) to Java FFM API. > > Since FFM API doesn't support vector calling conventions yet, migration affects only symbol lookup for now. But it still enables significant simplifications on JVM side. 
> > The patch consists of the following parts: > * on-demand symbol lookup in Java code replaces eager lookup from native code during JVM startup; > * 2 new VM intrinsics for vector calls (support unary and binary shapes) (code separated from unary/binary vector operations); > * new internal interface to query supported CPU ISA extensions (`jdk.incubator.vector.CPUFeatures`) used for CPU dispatching. > > `java.lang.foreign` API is used to perform symbol lookup in vector math library, then the address is cached and fed into corresponding JVM intrinsic, so C2 can turn it into a direct vector call in generated code. > > Once `java.lang.foreign` supports vectors & vector calling conventions, VM intrinsics can go away. > > Performance is on par with original implementation (tested with microbenchmarks on linux-x64 and macosx-aarch64). > > Testing: hs-tier1 - hs-tier6, microbenchmarks (on linux-x64 and macosx-aarch64) > > Thanks! Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: Fix windows-aarch64 build failure ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24462/files - new: https://git.openjdk.org/jdk/pull/24462/files/9a8f6200..bb1a11db Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24462&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24462&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24462.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24462/head:pull/24462 PR: https://git.openjdk.org/jdk/pull/24462 From vlivanov at openjdk.org Tue Apr 8 00:49:14 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 8 Apr 2025 00:49:14 GMT Subject: RFR: 8346836: C2: Introduce a way to verify the correctness of ConstraintCastNodes at runtime [v7] In-Reply-To: References: Message-ID: On Sat, 5 Apr 2025 06:51:34 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds a develop flag `VerifyConstraintCasts`, 
which will verify the correctness of `CastIINode`s and `CastLLNode`s at runtime and crash the VM if the dynamic value lies outside the type value range. >> >> Please take a look, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > make the flag diagnostic src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 860: > 858: movl(c_rarg1, type->_lo); > 859: movl(c_rarg2, type->_hi); > 860: call(RuntimeAddress(CAST_FROM_FN_PTR(address, abort_checked_cast_int))); Do you need to align stack pointer before the call? It may be the reason why stack traces aren't printed. src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 867: > 865: > 866: static void abort_checked_cast_long(jlong val, jlong lo, jlong hi) { > 867: fatal("Invalid CastLL, val: %lld, lo: %lld, hi: %lld", (long long)val, (long long)lo, (long long)hi); There's `JLONG_FORMAT` to pretty-print jlongs. src/hotspot/cpu/x86/c2_MacroAssembler_x86.hpp line 47: > 45: void fast_unlock_lightweight(Register obj, Register reg_rax, Register t, Register thread); > 46: > 47: void checked_cast_int(const TypeInt* type, Register val); There's already some ambiguity around what "checked cast" really means (think of "CastPP" vs "CheckCastPP" vs "checkcast"). Let's not make it worse. I suggest to just name it as `verify_int[_in]_range`/`verify_long[_in]_range`. Also, unpacking `type` (and pass low/upper bounds as arguments) would make code on receiving end a bit simpler. And some invariants to assert: * no constants (lo == hi); * no empty ranges (lo > hi). src/hotspot/cpu/x86/x86_64.ad line 7646: > 7644: %{ > 7645: predicate(VerifyConstraintCasts > 0 && > 7646: Assembler::is_simm32(static_cast(n)->type()->is_long()->_lo) && I suggest to extract the range check into a helper method on CastLL and call it from `castLL_checked_L32` and `castLL_checked` predicates. Also, `static_cast<...>(n)` can be replaced with `n->as_CastLL()`. 
test/hotspot/jtreg/compiler/c2/TestVerifyConstraintCasts.java line 39: > 37: * @summary Run with -Xcomp to test -XX:+StressGCM -XX:VerifyConstraintCasts=2 in debug builds. > 38: * > 39: * @run main/othervm/timeout=300 -Xbatch -Xcomp -XX:+StressGCM -XX:VerifyConstraintCasts=2 compiler.c2.TestVerifyConstraintCasts You can have multiple `@run` directives under the same `@test`. No need to duplicate it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22880#discussion_r2032175201 PR Review Comment: https://git.openjdk.org/jdk/pull/22880#discussion_r2032182576 PR Review Comment: https://git.openjdk.org/jdk/pull/22880#discussion_r2032208822 PR Review Comment: https://git.openjdk.org/jdk/pull/22880#discussion_r2032178151 PR Review Comment: https://git.openjdk.org/jdk/pull/22880#discussion_r2032180851 From vlivanov at openjdk.org Tue Apr 8 00:49:14 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 8 Apr 2025 00:49:14 GMT Subject: RFR: 8346836: C2: Introduce a way to verify the correctness of ConstraintCastNodes at runtime [v7] In-Reply-To: References: Message-ID: On Tue, 8 Apr 2025 00:22:36 GMT, Vladimir Ivanov wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> make the flag diagnostic > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 867: > >> 865: >> 866: static void abort_checked_cast_long(jlong val, jlong lo, jlong hi) { >> 867: fatal("Invalid CastLL, val: %lld, lo: %lld, hi: %lld", (long long)val, (long long)lo, (long long)hi); > > There's `JLONG_FORMAT` to pretty-print jlongs. FTR it would be nice to include CastII/CastLL node ID to assist diagnosing the bug, but, unfortunately, there's no easy way to capture such information during matching. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22880#discussion_r2032207380 From vlivanov at openjdk.org Tue Apr 8 01:32:25 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 8 Apr 2025 01:32:25 GMT Subject: RFR: 8353551: C2: Constant folding for ReverseBytes nodes In-Reply-To: References: <0Vve2dLoTSqwMKxx7LqaStM6f6_c6NadH-xc3OCSMMI=.c8dafa83-a661-4502-b170-fe7bdee19f2c@github.com> Message-ID: On Sun, 6 Apr 2025 13:40:30 GMT, Johannes Graham wrote: >> It wouldn't change much, but yes. Generally this is an example where #17508 shines, as the code could be generalized to just reverse the bytes of the KnownBits structure. Whether this PR waits until then or we refactor the code afterwards isn't that important to me. > > I didn't intend to advocate for waiting for that PR. `try_cast` is tiny and could be added independently. I have just been looking for things that could generally help with unifying TypeInt/TypeLong implementations. Personally, I'd prefer to see a unified generic version of `ReverseBytesNode::Value()` rather than multiple specializations (even templated ones). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24382#discussion_r2032232433 From vlivanov at openjdk.org Tue Apr 8 01:32:25 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 8 Apr 2025 01:32:25 GMT Subject: RFR: 8353551: C2: Constant folding for ReverseBytes nodes In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 16:17:57 GMT, Hannes Greule wrote: > This change implements constant folding for ReverseBytes nodes. > > Currently, `byteswap` is included transitively by `reverse_bits.hpp`. I'm not sure if this is fine or if I need to add an explicit include there. > > I appreciate any reviews and comments.
test/hotspot/jtreg/compiler/c2/irTests/ReverseBytesConstantsTests.java line 72: > 70: @DontCompile > 71: public void assertResultI() { > 72: Asserts.assertEQ(Integer.reverseBytes(0x04030201), testI1()); Please, add more test cases (specifically, with negative constants). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24382#discussion_r2032235559 From fyang at openjdk.org Tue Apr 8 01:39:57 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 8 Apr 2025 01:39:57 GMT Subject: RFR: 8353695: RISC-V: compiler/cpuflags/TestAESIntrinsicsOnUnsupportedConfig.java is failing with Zvkn In-Reply-To: References: Message-ID: On Fri, 4 Apr 2025 02:38:57 GMT, Fei Yang wrote: > Hi, please review this small change fixing two jtreg tests. > This issue manifests after https://github.com/openjdk/jdk/pull/24344 which auto-detects and enables Zvkn extension. > The two tests only consider `aes` feature string, like `!(vm.cpu.features ~= ".*aes.*")`. But the feature string is `zvkn` for linux-riscv64 platform. > This adapts "@requires" of both tests considering the Zvkn feature of this platform. Both tests work as expected with qemu-system equipped with the Zvkn extension. Thanks for the review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24433#issuecomment-2785010532 From fyang at openjdk.org Tue Apr 8 01:39:58 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 8 Apr 2025 01:39:58 GMT Subject: Integrated: 8353695: RISC-V: compiler/cpuflags/TestAESIntrinsicsOnUnsupportedConfig.java is failing with Zvkn In-Reply-To: References: Message-ID: <-S4SrWCvXTJXNO-iZP3rM_HDB4rvBN5XJt6xR_JUakA=.cc226212-c99e-40ab-b41f-8949f551a4b8@github.com> On Fri, 4 Apr 2025 02:38:57 GMT, Fei Yang wrote: > Hi, please review this small change fixing two jtreg tests. > This issue manifests after https://github.com/openjdk/jdk/pull/24344 which auto-detects and enables Zvkn extension. > The two tests only consider `aes` feature string, like `!(vm.cpu.features ~= ".*aes.*")`.
But the feature string is `zvkn` for linux-riscv64 platform. > This adapts "@requires" of both tests considering the Zvkn feature of this platform. Both tests work as expected with qemu-system equipped with the Zvkn extension. This pull request has now been integrated. Changeset: 80ff7b9c Author: Fei Yang URL: https://git.openjdk.org/jdk/commit/80ff7b9c9406c7845ecb3bc40910e92ccdd23ff2 Stats: 5 lines in 3 files changed: 1 ins; 0 del; 4 mod 8353695: RISC-V: compiler/cpuflags/TestAESIntrinsicsOnUnsupportedConfig.java is failing with Zvkn Reviewed-by: fjiang, mli ------------- PR: https://git.openjdk.org/jdk/pull/24433 From duke at openjdk.org Tue Apr 8 02:12:24 2025 From: duke at openjdk.org (Anjian-Wen) Date: Tue, 8 Apr 2025 02:12:24 GMT Subject: RFR: 8329887: RISC-V: C2: Support Zvbb Vector And-Not instruction [v4] In-Reply-To: <1KHNbMIgOO7jSZ1Fm4HzxadYaNzE4Xbq4nTitlKy3Po=.17d7860b-10de-4f19-87d8-87fc17313ce2@github.com> References: <1KHNbMIgOO7jSZ1Fm4HzxadYaNzE4Xbq4nTitlKy3Po=.17d7860b-10de-4f19-87d8-87fc17313ce2@github.com> Message-ID: > support Zvbb Vector And-Not vandn.vv match rule and add test Anjian-Wen has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains three additional commits since the last revision: - Merge branch 'openjdk:master' into JDK-8329887 - RISC-V: C2: Support Zvbb Vector And-Not instruction fix match rule for format - RISC-V: C2: Support Zvbb Vector And-Not instruction add Vector And-Not match rule and tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24129/files - new: https://git.openjdk.org/jdk/pull/24129/files/a15d58dc..bc233f6c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24129&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24129&range=02-03 Stats: 91224 lines in 2552 files changed: 36811 ins; 45116 del; 9297 mod Patch: https://git.openjdk.org/jdk/pull/24129.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24129/head:pull/24129 PR: https://git.openjdk.org/jdk/pull/24129 From dhanalla at openjdk.org Tue Apr 8 05:49:43 2025 From: dhanalla at openjdk.org (Dhamoder Nalla) Date: Tue, 8 Apr 2025 05:49:43 GMT Subject: RFR: 8315916: assert(C->live_nodes() <= C->max_node_limit()) failed: Live Node limit exceeded [v8] In-Reply-To: References: Message-ID: > In the debug build, the assert is triggered during the parsing (before Code_Gen). In the Release build, however, the compilation bails out at `Compile::check_node_count()` during the code generation phase and completes execution without any issues. > > When I commented out the assert(C->live_nodes() <= C->max_node_limit()), both the debug and release builds exhibited the same behavior: the compilation bails out during code_gen after building the ideal graph with more than 80K nodes. > > The proposed fix will check the live node count and bail out during compilation while building the graph for scalarization of the elements in the array when the live node count crosses the limit of 80K, instead of unnecessarily building the entire graph and bailing out in code_gen. 
Dhamoder Nalla has updated the pull request incrementally with two additional commits since the last revision: - update copyright year to 2025 - Modify jtreg test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20504/files - new: https://git.openjdk.org/jdk/pull/20504/files/27aab6b0..fff17363 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20504&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20504&range=06-07 Stats: 23 lines in 1 file changed: 9 ins; 0 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/20504.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20504/head:pull/20504 PR: https://git.openjdk.org/jdk/pull/20504 From dhanalla at openjdk.org Tue Apr 8 05:49:43 2025 From: dhanalla at openjdk.org (Dhamoder Nalla) Date: Tue, 8 Apr 2025 05:49:43 GMT Subject: RFR: 8315916: assert(C->live_nodes() <= C->max_node_limit()) failed: Live Node limit exceeded [v7] In-Reply-To: References: Message-ID: On Mon, 18 Nov 2024 02:23:46 GMT, Dhamoder Nalla wrote: >> In the debug build, the assert is triggered during the parsing (before Code_Gen). In the Release build, however, the compilation bails out at `Compile::check_node_count()` during the code generation phase and completes execution without any issues. >> >> When I commented out the assert(C->live_nodes() <= C->max_node_limit()), both the debug and release builds exhibited the same behavior: the compilation bails out during code_gen after building the ideal graph with more than 80K nodes. >> >> The proposed fix will check the live node count and bail out during compilation while building the graph for scalarization of the elements in the array when the live node count crosses the limit of 80K, instead of unnecessarily building the entire graph and bailing out in code_gen. 
> > Dhamoder Nalla has updated the pull request incrementally with one additional commit since the last revision: > > CR comments According to the latest comments on bug JDK-8315916, two more people have reported the issue. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20504#issuecomment-2785281392 From dhanalla at openjdk.org Tue Apr 8 05:49:43 2025 From: dhanalla at openjdk.org (Dhamoder Nalla) Date: Tue, 8 Apr 2025 05:49:43 GMT Subject: RFR: 8315916: assert(C->live_nodes() <= C->max_node_limit()) failed: Live Node limit exceeded [v7] In-Reply-To: References: Message-ID: On Wed, 27 Nov 2024 08:38:53 GMT, Christian Hagedorn wrote: >> Dhamoder Nalla has updated the pull request incrementally with one additional commit since the last revision: >> >> CR comments > > test/hotspot/jtreg/compiler/escapeAnalysis/TestScalarizeBailout.java line 42: > >> 40: try { >> 41: // load the class to initialize the static object and trigger the EA >> 42: Class Class37 = Class.forName("compiler.escapeAnalysis.TestScalarizeBailout"); > > I'm still unclear why we need this line. `TestScalarizeBailout` should already be loaded since you are compiling `main()` from it. Is it possible to trigger the assert without `forName()` somehow? > > I tried to run your JTreg test but could not trigger the assert. On what platform/setup could you trigger this? Sorry for the late response on this PR. I have modified the test, as the earlier version was reproducible only with the Java binary, not with jtreg. The current version should trigger an assertion without this fix when using the jtreg tool. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20504#discussion_r2032425701 From dhanalla at openjdk.org Tue Apr 8 05:52:44 2025 From: dhanalla at openjdk.org (Dhamoder Nalla) Date: Tue, 8 Apr 2025 05:52:44 GMT Subject: RFR: 8315916: assert(C->live_nodes() <= C->max_node_limit()) failed: Live Node limit exceeded [v9] In-Reply-To: References: Message-ID: > In the debug build, the assert is triggered during the parsing (before Code_Gen). In the Release build, however, the compilation bails out at `Compile::check_node_count()` during the code generation phase and completes execution without any issues. > > When I commented out the assert(C->live_nodes() <= C->max_node_limit()), both the debug and release builds exhibited the same behavior: the compilation bails out during code_gen after building the ideal graph with more than 80K nodes. > > The proposed fix will check the live node count and bail out during compilation while building the graph for scalarization of the elements in the array when the live node count crosses the limit of 80K, instead of unnecessarily building the entire graph and bailing out in code_gen. 
Dhamoder Nalla has updated the pull request incrementally with one additional commit since the last revision: Modify jtreg test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20504/files - new: https://git.openjdk.org/jdk/pull/20504/files/fff17363..8cb1e939 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20504&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20504&range=07-08 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20504.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20504/head:pull/20504 PR: https://git.openjdk.org/jdk/pull/20504 From enikitin at openjdk.org Tue Apr 8 06:38:26 2025 From: enikitin at openjdk.org (Evgeny Nikitin) Date: Tue, 8 Apr 2025 06:38:26 GMT Subject: RFR: 8353841: [jittester] Fix JITTester build after asm removal [v2] In-Reply-To: References: Message-ID: > [JDK-8346981](https://bugs.openjdk.org/browse/JDK-8346981) removed jdk.internal.objectweb.asm packages from java.base, causing JITTester build to fail. > > This PR fixes the build by building ASM prior to the testlibrary and JITTester builds. > Testing: Local runs of targets `COMPILE` and `all`, no errors found. 
Evgeny Nikitin has updated the pull request incrementally with one additional commit since the last revision: Simplify compile_testlib as well Co-authored-by: Chen Liang ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24487/files - new: https://git.openjdk.org/jdk/pull/24487/files/2d4619af..c831099a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24487&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24487&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24487.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24487/head:pull/24487 PR: https://git.openjdk.org/jdk/pull/24487 From thartmann at openjdk.org Tue Apr 8 06:42:20 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 8 Apr 2025 06:42:20 GMT Subject: RFR: 8353841: [jittester] Fix JITTester build after asm removal [v2] In-Reply-To: References: Message-ID: On Tue, 8 Apr 2025 06:38:26 GMT, Evgeny Nikitin wrote: >> [JDK-8346981](https://bugs.openjdk.org/browse/JDK-8346981) removed jdk.internal.objectweb.asm packages from java.base, causing JITTester build to fail. >> >> This PR fixes the build by building ASM prior to the testlibrary and JITTester builds. >> Testing: Local runs of targets `COMPILE` and `all`, no errors found. > > Evgeny Nikitin has updated the pull request incrementally with one additional commit since the last revision: > > Simplify compile_testlib as well > > Co-authored-by: Chen Liang Looks reasonable to me. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24487#pullrequestreview-2748829422 From thartmann at openjdk.org Tue Apr 8 07:02:15 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 8 Apr 2025 07:02:15 GMT Subject: RFR: 8352963: [REDO] Missing late inline cleanup causes compiler/vectorapi/VectorLogicalOpIdentityTest.java IR failure [v2] In-Reply-To: References: Message-ID: On Mon, 7 Apr 2025 09:11:44 GMT, Damon Fenacci wrote: >> This PR is a REDO of [JDK-8302459](https://bugs.openjdk.org/browse/JDK-8302459) ([PR](https://github.com/openjdk/jdk/pull/21682), [backout](https://bugs.openjdk.org/browse/JDK-8352965) triggered by a failing internal test). >> >> There was an issue with `CallGenerator::for_method_handle_call` that could delay late inlining by creating a "generic" `LateInlineCallGenerator` instead of a more specific `LateInlineMHCallGenerator`: >> https://github.com/openjdk/jdk/blob/74df384a9870431efb184158bba032c79c35356e/src/hotspot/share/opto/callGenerator.cpp#L991 >> While running IGVN this could be misinterpreted as non-MH late-inline >> https://github.com/openjdk/jdk/blob/c282fb9add32f1fac8174ca84b1b68a869d2578d/src/hotspot/share/opto/callnode.cpp#L1088-L1091 >> eventually triggering `assert(!cg->method()->is_method_handle_intrinsic(), "required");` >> >> The fix involves creating a `LateInlineMHCallGenerator` instead. Here is what changed from the backed out PR: >> https://github.com/openjdk/jdk/blob/c282fb9add32f1fac8174ca84b1b68a869d2578d/src/hotspot/share/opto/callGenerator.cpp#L991-L995 >> >> ### Testing >> >> Tier 1-4 (windows-x64, linux-x64/aarch64, and macosx-x64/aarch64; release and debug mode) > > Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: > > JDK-8652963: review fix Looks good to me! ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24402#pullrequestreview-2748875101 From thartmann at openjdk.org Tue Apr 8 08:05:29 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 8 Apr 2025 08:05:29 GMT Subject: RFR: 8344116: C2: remove slice parameter from LoadNode::make [v6] In-Reply-To: References: Message-ID: On Sat, 5 Apr 2025 14:29:28 GMT, Zihao Lin wrote: >> This patch removes the slice parameter from LoadNode::make >> >> Mentioned in https://github.com/openjdk/jdk/pull/21834#pullrequestreview-2429164805 >> >> Hi team, I am new, I'd appreciate any guidance. Thanks a lot! > > Zihao Lin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'openjdk:master' into 8344116 > - Fix build > - Fix test failed > - 8344116: C2: remove slice parameter from LoadNode::make I think @rwestrel should have a look at this, since he suggested the cleanup in https://github.com/openjdk/jdk/pull/21834. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24258#issuecomment-2785582950 From thartmann at openjdk.org Tue Apr 8 08:08:18 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 8 Apr 2025 08:08:18 GMT Subject: RFR: 8353665: RISC-V: IR verification fails in TestSubNodeFloatDoubleNegation.java [v3] In-Reply-To: References: Message-ID: <-U2obgSQq_IP3GCrrNKHR-pelpn2NsnJRf1N98Z4v7g=.e8411e02-1acb-49e6-a8c0-137e49861f2b@github.com> On Mon, 7 Apr 2025 12:05:11 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> The newly added TestSubNodeFloatDoubleNegation.java (in https://github.com/openjdk/jdk/pull/24150) is to check `0 - (0 - x)` is not folded to `x` for float and double. >> I have manually checked the IR and generated assembly code, it's not folded on riscv either, just there is an extra SubF in some code path.
>> So, the fix for this test on riscv should simply be to make the check `>= 2` rather than `2`. >> >> Tested on both x86 and riscv64. >> >> Thanks > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/compiler/floatingpoint/TestSubNodeFloatDoubleNegation.java > > Co-authored-by: Manuel Hässig Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24421#pullrequestreview-2749063244 From thartmann at openjdk.org Tue Apr 8 08:12:17 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 8 Apr 2025 08:12:17 GMT Subject: RFR: 8353669: IGV: dump OOP maps for MachSafePoint nodes In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 18:20:15 GMT, Roberto Castañeda Lozano wrote: > This changeset dumps the OOP map of each MachSafePoint when available, i.e. at the `Final Code` phase. This should make it easier to learn about and diagnose OOP map building issues: > > ![final-code](https://github.com/user-attachments/assets/a477c9d1-0fe4-42ef-a367-336c54bec6a5) > > #### Testing > > - tier1 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64; release and debug mode). > - Tested IGV manually on a few selected graphs. Tested automatically that dumping thousands of graphs does not trigger any assertion failure (by running `java -Xcomp -XX:PrintIdealGraphLevel=1`). Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24422#pullrequestreview-2749074037 From hgreule at openjdk.org Tue Apr 8 08:15:55 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Tue, 8 Apr 2025 08:15:55 GMT Subject: RFR: 8353551: C2: Constant folding for ReverseBytes nodes [v2] In-Reply-To: References: Message-ID: > This change implements constant folding for ReverseBytes nodes. > > Currently, `byteswap` is included transitively by `reverse_bits.hpp`.
I'm not sure if this is fine or if I need to add an explicit include there. > > I appreciate any reviews and comments. Hannes Greule has updated the pull request incrementally with one additional commit since the last revision: add test cases with negative -> non-negative ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24382/files - new: https://git.openjdk.org/jdk/pull/24382/files/51d0a5d1..dc046768 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24382&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24382&range=00-01 Stats: 32 lines in 1 file changed: 28 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/24382.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24382/head:pull/24382 PR: https://git.openjdk.org/jdk/pull/24382 From hgreule at openjdk.org Tue Apr 8 08:15:56 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Tue, 8 Apr 2025 08:15:56 GMT Subject: RFR: 8353551: C2: Constant folding for ReverseBytes nodes [v2] In-Reply-To: References: Message-ID: On Tue, 8 Apr 2025 01:26:26 GMT, Vladimir Ivanov wrote: >> Hannes Greule has updated the pull request incrementally with one additional commit since the last revision: >> >> add test cases with negative -> non-negative > > test/hotspot/jtreg/compiler/c2/irTests/ReverseBytesConstantsTests.java line 72: > >> 70: @DontCompile >> 71: public void assertResultI() { >> 72: Asserts.assertEQ(Integer.reverseBytes(0x04030201), testI1()); > > Please, add more test cases (specifically, with negative constants). 
I added cases with a leading 0x80 byte now ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24382#discussion_r2032648072 From hgreule at openjdk.org Tue Apr 8 08:15:56 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Tue, 8 Apr 2025 08:15:56 GMT Subject: RFR: 8353551: C2: Constant folding for ReverseBytes nodes [v2] In-Reply-To: References: <0Vve2dLoTSqwMKxx7LqaStM6f6_c6NadH-xc3OCSMMI=.c8dafa83-a661-4502-b170-fe7bdee19f2c@github.com> Message-ID: On Sun, 6 Apr 2025 13:40:30 GMT, Johannes Graham wrote: >> It wouldn't change much, but yes. Generally this is an example where #17508 shines, as the code could be generalized to just reverse the bytes of the KnownBits structure. Whether this PR waits until then or we refactor the code afterwards isn't that important to me. > > I didn't intend to advocate for waiting for that PR. `try_cast` is tiny and could be added independently. I have just been looking for things that could generally help with unifying TypeInt/TypeLong implementations. @j3graham I mainly want to avoid conflicts or duplicated solutions. I meant that after #17508 this code can be further generalized anyway, allowing to use the `try_cast` function then. This can obviously happen in a separate PR, if this one is integrated before. @iwanowww I'm not sure if that's possible without more duplication. We need to choose the correct byteswap implementation depending on the node's type. Please let me know if I'm missing something.
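For readers following the thread, the "leading 0x80 byte" cases and the need for a width-specific byteswap can be illustrated with a small, self-contained Java sketch. This is not the test from the PR; the class name `ReverseBytesSignExample` and its helper methods are hypothetical and only use the standard `Integer.reverseBytes` and `Long.reverseBytes` library methods.

```java
// Hypothetical illustration, not code from PR 24382.
public class ReverseBytesSignExample {
    // 32-bit byteswap: a leading 0x80 byte moves to the lowest byte,
    // so a negative input can become a non-negative result.
    static int swapInt(int value) {
        return Integer.reverseBytes(value);
    }

    // 64-bit byteswap: same idea, but over eight bytes, which is why the
    // swap has to be chosen according to the operand width.
    static long swapLong(long value) {
        return Long.reverseBytes(value);
    }

    public static void main(String[] args) {
        int negativeInt = 0x80030201;              // sign bit set
        long negativeLong = 0x8000000000000001L;   // sign bit set

        System.out.println(negativeInt < 0);            // true
        System.out.println(swapInt(negativeInt) >= 0);  // true: 0x01020380
        System.out.println(negativeLong < 0);           // true
        System.out.println(swapLong(negativeLong) >= 0); // true: 0x0100000000000080
    }
}
```

Reusing the 32-bit swap on a 64-bit constant (or the reverse) would give a different bit pattern, which appears to be the duplication concern raised above.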
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24382#discussion_r2032646418 From chagedorn at openjdk.org Tue Apr 8 08:19:27 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 8 Apr 2025 08:19:27 GMT Subject: RFR: 8350563: C2 compilation fails because PhaseCCP does not reach a fixpoint [v9] In-Reply-To: References: Message-ID: On Mon, 7 Apr 2025 15:07:57 GMT, Liam Miller-Cushon wrote: >> Hello, please consider this fix for [JDK-8350563](https://bugs.openjdk.org/browse/JDK-8350563) contributed by my colleague Matthias Ernst. >> >> https://github.com/openjdk/jdk/pull/22856 introduced a new `Value()` optimization for the pattern `AndIL(Con, Mask)`. >> This optimization can look through CastNodes, and therefore requires additional logic in CCP to push these >> transitive uses to the worklist. >> >> The optimization is closely related to analogous optimizations for SHIFT nodes, and we also extend the existing logic for >> CCP worklist handling: the current logic is "if the shift input to a SHIFT node changes, push indirect AND node uses to the CCP worklist". >> We extend it by adding "if the (new) type of a node is an IntegerType that `is_con, ...` to the predicate. > > Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision: > > Merge tests together test/hotspot/jtreg/compiler/ccp/TestAndConZeroCCP.java line 43: > 41: } > 42: Integer.parseInt("1"); > 43: } The test timed out in our CI. I think looping `run()` with `-Xint` is too slow (i.e. `compileonly` with `parseInt()`). I suggest to just exclude `run()` when checking for the `parseInt()` failure. 
Maybe something like that: Suggestion: * @run main/othervm -XX:+UnlockDiagnosticVMOptions -XX:RepeatCompilation=300 -XX:+StressIGVN -XX:+StressCCP -Xcomp * -XX:CompileOnly=java.lang.Integer::parseInt compiler.ccp.TestAndConZeroCCP compileonly * @run main compiler.ccp.TestAndConZeroCCP */ package compiler.ccp; import java.util.Arrays; public class TestAndConZeroCCP { public static void main(String[] args) { Integer.parseInt("1"); if (args.length != 0) { return; } for (int i = 0; i < 10000; ++i) { run(); } } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23871#discussion_r2032651697 From thartmann at openjdk.org Tue Apr 8 08:25:11 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 8 Apr 2025 08:25:11 GMT Subject: RFR: 8353041: NeverBranchNode causes incorrect block frequency calculation In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 19:51:37 GMT, Dean Long wrote: > This fixes a quality of implementation issue for infinite loops using a NeverBranch node. We need Block::succ_prob() to return 1.0f for the 100% successful back-edge so that block frequencies are computed correctly. I also fixed Block_Stack::most_frequent_successor() to choose the correct successor. I verified that this corrects the huge frequency ratio that was detected and clamped by JDK-8346888. > > Currently this bug is labeled noreg-hard with no new regression test, as it's not obvious how to write such as test. Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24390#pullrequestreview-2749111614 From duke at openjdk.org Tue Apr 8 08:27:23 2025 From: duke at openjdk.org (duke) Date: Tue, 8 Apr 2025 08:27:23 GMT Subject: RFR: 8353841: [jittester] Fix JITTester build after asm removal [v2] In-Reply-To: References: Message-ID: <8zQhLXizFuS9ugAUC990dduvKoGBDVgmoygf6SBhOO0=.cc21e2eb-28b0-4697-9b69-994c72b67d03@github.com> On Tue, 8 Apr 2025 06:38:26 GMT, Evgeny Nikitin wrote: >> [JDK-8346981](https://bugs.openjdk.org/browse/JDK-8346981) removed jdk.internal.objectweb.asm packages from java.base, causing JITTester build to fail. >> >> This PR fixes the build by building ASM prior to the testlibrary and JITTester builds. >> Testing: Local runs of targets `COMPILE` and `all`, no errors found. > > Evgeny Nikitin has updated the pull request incrementally with one additional commit since the last revision: > > Simplify compile_testlib as well > > Co-authored-by: Chen Liang @lepestock Your change (at version c831099a3165fbf23869097383bac07aeb4f7a9b) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24487#issuecomment-2785638511 From rcastanedalo at openjdk.org Tue Apr 8 08:30:32 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 8 Apr 2025 08:30:32 GMT Subject: RFR: 8353669: IGV: dump OOP maps for MachSafePoint nodes In-Reply-To: References: Message-ID: On Tue, 8 Apr 2025 08:09:39 GMT, Tobias Hartmann wrote: > Looks good to me too. Thanks for reviewing, Tobias! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24422#issuecomment-2785642877 From rcastanedalo at openjdk.org Tue Apr 8 08:30:33 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 8 Apr 2025 08:30:33 GMT Subject: Integrated: 8353669: IGV: dump OOP maps for MachSafePoint nodes In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 18:20:15 GMT, Roberto Castañeda Lozano wrote: > This changeset dumps the OOP map of each MachSafePoint when available, i.e. at the `Final Code` phase. This should make it easier to learn about and diagnose OOP map building issues: > > ![final-code](https://github.com/user-attachments/assets/a477c9d1-0fe4-42ef-a367-336c54bec6a5) > > #### Testing > > - tier1 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64; release and debug mode). > - Tested IGV manually on a few selected graphs. Tested automatically that dumping thousands of graphs does not trigger any assertion failure (by running `java -Xcomp -XX:PrintIdealGraphLevel=1`). This pull request has now been integrated. Changeset: fda5eecd Author: Roberto Castañeda Lozano URL: https://git.openjdk.org/jdk/commit/fda5eecd6717eb6e1db56be3e41b65deae6e683e Stats: 9 lines in 1 file changed: 9 ins; 0 del; 0 mod 8353669: IGV: dump OOP maps for MachSafePoint nodes Reviewed-by: chagedorn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/24422 From rcastanedalo at openjdk.org Tue Apr 8 08:41:16 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 8 Apr 2025 08:41:16 GMT Subject: RFR: 8353041: NeverBranchNode causes incorrect block frequency calculation In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 19:51:37 GMT, Dean Long wrote: > This fixes a quality of implementation issue for infinite loops using a NeverBranch node. We need Block::succ_prob() to return 1.0f for the 100% successful back-edge so that block frequencies are computed correctly.
I also fixed Block_Stack::most_frequent_successor() to choose the correct successor. I verified that this corrects the huge frequency ratio that was detected and clamped by JDK-8346888. > > Currently this bug is labeled noreg-hard with no new regression test, as it's not obvious how to write such as test. src/hotspot/share/opto/domgraph.cpp line 244: > 242: // edges swapped, rare case > 243: succ_idx = 1; > 244: } else { Could you explain how do we end up at this rare case? In my opinion, it would be preferable, if possible, to ensure edges are never swapped in the first place. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24390#discussion_r2032694961 From mdoerr at openjdk.org Tue Apr 8 08:49:16 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 8 Apr 2025 08:49:16 GMT Subject: RFR: 8353500: [s390x] Intrinsify Unsafe::setMemory In-Reply-To: References: Message-ID: On Mon, 7 Apr 2025 13:42:03 GMT, Lutz Schmidt wrote: >> Unsafe::setMemory intrinsic implementation for s390x. 
>> >> Stub Code: >> >> >> StubRoutines::unsafe_setmemory [0x000003ffb04b63c0, 0x000003ffb04b64d0] (272 bytes) >> -------------------------------------------------------------------------------- >> 0x000003ffb04b63c0: ogrk %r1,%r2,%r3 >> 0x000003ffb04b63c4: nill %r1,7 >> 0x000003ffb04b63c8: je 0x000003ffb04b6410 >> 0x000003ffb04b63cc: nill %r1,3 >> 0x000003ffb04b63d0: je 0x000003ffb04b6460 >> 0x000003ffb04b63d4: nill %r1,1 >> 0x000003ffb04b63d8: jlh 0x000003ffb04b64a0 >> 0x000003ffb04b63dc: risbg %r4,%r4,48,55,8 >> 0x000003ffb04b63e2: risbgz %r1,%r3,32,63,62 >> 0x000003ffb04b63e8: je 0x000003ffb04b6402 >> 0x000003ffb04b63ec: nopr >> 0x000003ffb04b63ee: nopr >> 0x000003ffb04b63f0: sth %r4,0(%r2) >> 0x000003ffb04b63f4: sth %r4,2(%r2) >> 0x000003ffb04b63f8: agfi %r2,4 >> 0x000003ffb04b63fe: brct %r1,0x000003ffb04b63f0 >> 0x000003ffb04b6402: nilf %r3,2 >> 0x000003ffb04b6408: ber %r14 >> 0x000003ffb04b640a: sth %r4,0(%r2) >> 0x000003ffb04b640e: br %r14 >> 0x000003ffb04b6410: risbg %r4,%r4,48,55,8 >> 0x000003ffb04b6416: risbg %r4,%r4,32,47,16 >> 0x000003ffb04b641c: risbg %r4,%r4,0,31,32 >> 0x000003ffb04b6422: risbgz %r1,%r3,32,63,60 >> 0x000003ffb04b6428: je 0x000003ffb04b6446 >> 0x000003ffb04b642c: nopr >> 0x000003ffb04b642e: nopr >> 0x000003ffb04b6430: stg %r4,0(%r2) >> 0x000003ffb04b6436: stg %r4,8(%r2) >> 0x000003ffb04b643c: agfi %r2,16 >> 0x000003ffb04b6442: brct %r1,0x000003ffb04b6430 >> 0x000003ffb04b6446: nilf %r3,8 >> 0x000003ffb04b644c: ber %r14 >> 0x000003ffb04b644e: stg %r4,0(%r2) >> 0x000003ffb04b6454: br %r14 >> 0x000003ffb04b6456: nopr >> 0x000003ffb04b6458: nopr >> 0x000003ffb04b645a: nopr >> 0x000003ffb04b645c: nopr >> 0x000003ffb04b645e: nopr >> 0x000003ffb04b6460: risbg %r4,%r4,48,55,8 >> 0x000003ffb04b6466: risbg %r4,%r4,32,47,16 >> 0x000003ffb04b646c: risbgz %r1,%r3,32,63,61 >> 0x000003ffb04b6472: je 0x000003ffb04b6492 >> 0x000003ffb04b6476: nopr >> 0x000003ffb04b6478: nopr >> 0x000003ffb04b647a: nopr >> 0x000003ffb04b647c: nopr >> 0x000003ffb04b647e: 
nopr >> 0x000003ffb04b6480: st %r4,0(%r2) >> 0x000003ffb04b6484: st %r4,4(%r2) >> 0x000003ffb04b6488: agfi %r2,8 >> 0x000003ffb04b648e: brct %r1,0x000003ffb04b6480 >> 0x000003ffb04b6492: nilf %r3,4 >> 0x000003ffb04b6498: ber %r14 >> 0x000003ffb04b649a: st %r4,0(%r2) >> 0x0000... > > src/hotspot/cpu/s390/stubGenerator_s390.cpp line 1557: > >> 1555: do_setmemory_atomic_loop(2, dest, size, byteVal, _masm); >> 1556: >> 1557: __ align(16); > > What is this alignment good for? Branch target alignment. There is no fallthrough path from before this point. Should it be 32? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24480#discussion_r2032709818 From rcastanedalo at openjdk.org Tue Apr 8 08:57:10 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 8 Apr 2025 08:57:10 GMT Subject: RFR: 8353041: NeverBranchNode causes incorrect block frequency calculation In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 19:51:37 GMT, Dean Long wrote: > This fixes a quality of implementation issue for infinite loops using a NeverBranch node. We need Block::succ_prob() to return 1.0f for the 100% successful back-edge so that block frequencies are computed correctly. I also fixed Block_Stack::most_frequent_successor() to choose the correct successor. I verified that this corrects the huge frequency ratio that was detected and clamped by JDK-8346888. > > Currently this bug is labeled noreg-hard with no new regression test, as it's not obvious how to write such as test. Marked as reviewed by rcastanedalo (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/24390#pullrequestreview-2749204166 From rcastanedalo at openjdk.org Tue Apr 8 08:57:12 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 8 Apr 2025 08:57:12 GMT Subject: RFR: 8353041: NeverBranchNode causes incorrect block frequency calculation In-Reply-To: References: Message-ID: On Tue, 8 Apr 2025 08:39:02 GMT, Roberto Castañeda Lozano wrote: >> This fixes a quality of implementation issue for infinite loops using a NeverBranch node. We need Block::succ_prob() to return 1.0f for the 100% successful back-edge so that block frequencies are computed correctly. I also fixed Block_Stack::most_frequent_successor() to choose the correct successor. I verified that this corrects the huge frequency ratio that was detected and clamped by JDK-8346888. >> >> Currently this bug is labeled noreg-hard with no new regression test, as it's not obvious how to write such as test. > > src/hotspot/share/opto/domgraph.cpp line 244: > >> 242: // edges swapped, rare case >> 243: succ_idx = 1; >> 244: } else { > > Could you explain how do we end up at this rare case? In my opinion, it would be preferable, if possible, to ensure edges are never swapped in the first place. Never mind, I found the answer in https://github.com/openjdk/jdk/pull/11481. Maybe it would be worth extending the comment with a summary of why this can happen. Suggestion: if (succ == b->_succs[1]->head()) { // Edges swapped, rare case. May happen due to an unusual matcher // traversal order for peeled infinite loops.
succ_idx = 1; } else { ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24390#discussion_r2032723880 From dfenacci at openjdk.org Tue Apr 8 09:48:16 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 8 Apr 2025 09:48:16 GMT Subject: RFR: 8352681: C2 compilation hits asserts "must set the initial type just once" [v2] In-Reply-To: References: Message-ID: <2GMcPArRjm_VkOsaxymCXbl_PFSH5L1gqDWE6UyLML4=.eca6056a-2bbc-4ae2-acfd-abb3c121be40@github.com> On Mon, 7 Apr 2025 16:54:47 GMT, Cesar Soares Lucas wrote: >> The reason for the error reported is that when RAM tries to reduce a field load through a Phi it ends up calling `step_through_mergemem` with `_delay_transform` set to true and, since `step_through_mergemem` assumes that `_delay_transform` is `false`, it calls `igvn->transform` passing a MergeMem that has been added to the graph long ago. >> >> I didn't opt to make `_delay_transform` false during the RAM transformations because that seemed to be a risky move, nevertheless I'll create an RFE and keep investigating that option. >> >> Also, while working on a test case I found that C2 doesn't remove (at least not before EA steps) the `if (param != param) {...}` and I'm going to investigate that as a separate RFE. >> >> I tested this with JTREG Tier 1-3 on Linux x86_64. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Fix typo & test Thanks @JohnTortugo. The fix looks good to me. ------------- Marked as reviewed by dfenacci (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/24471#pullrequestreview-2749355973 From rcastanedalo at openjdk.org Tue Apr 8 09:48:20 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 8 Apr 2025 09:48:20 GMT Subject: RFR: 8353041: NeverBranchNode causes incorrect block frequency calculation In-Reply-To: References: Message-ID: <0WR4VFMG1KmBQTShtDjr5w6JJosUBS8BIBabWhdH9Bw=.e1ad86d8-f675-47a9-abae-1d706f16ba3a@github.com> On Wed, 2 Apr 2025 19:51:37 GMT, Dean Long wrote: > Currently this bug is labeled noreg-hard with no new regression test, as it's not obvious how to write such as test. The only idea I can think of would be matching and asserting on the output of `-XX:+PrintCFGBlockFreq`, e.g. for the first test in `compiler/loopopts/TestPhaseCFGNeverBranchToGotoMain.java` I get the line Loop: 1 trip_count: 1000000 freq: 0 before this fix, and the line Loop: 1 trip_count: 1000000 freq: 900000 after the fix. You could assert that you expect a `freq` greater than 0 for every encountered loop, or something similar. But I guess such a test would be quite fragile. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24390#issuecomment-2785860930 From chagedorn at openjdk.org Tue Apr 8 10:06:24 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 8 Apr 2025 10:06:24 GMT Subject: RFR: 8352681: C2 compilation hits asserts "must set the initial type just once" [v2] In-Reply-To: References: Message-ID: On Mon, 7 Apr 2025 16:54:47 GMT, Cesar Soares Lucas wrote: >> The reason for the error reported is that when RAM tries to reduce a field load through a Phi it ends up calling `step_through_mergemem` with `_delay_transform` set to true and, since `step_through_mergemem` assumes that `_delay_transform` is `false`, it calls `igvn->transform` passing a MergeMem that has been added to the graph long ago. 
>> >> I didn't opt to make `_delay_transform` false during the RAM transformations because that seemed to be a risky move, nevertheless I'll create an RFE and keep investigating that option. >> >> Also, while working on a test case I found that C2 doesn't remove (at least not before EA steps) the `if (param != param) {...}` and I'm going to investigate that as a separate RFE. >> >> I tested this with JTREG Tier 1-3 on Linux x86_64. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Fix typo & test Marked as reviewed by chagedorn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24471#pullrequestreview-2749408738 From chagedorn at openjdk.org Tue Apr 8 10:31:22 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 8 Apr 2025 10:31:22 GMT Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v7] In-Reply-To: References: Message-ID: On Mon, 7 Apr 2025 15:22:14 GMT, Roland Westrelin wrote: >> That's true, for most nodes it probably does not matter. But I'm thinking about `Phi` nodes which could already be updated when one path is non-top. So, it might still be worth to do it after `Value()` only? IIUC, it does not matter from a correctness point of view if enqueue before or after `Value()` - we would still filter later for `top` either way. > > What's your concern here? Is it that the list of nodes grows too big? Or that it's a waste of time to go over the list when analysis is over only to filter out non top nodes? > I suppose we could push nodes when their type is top and pop them when their type becomes not top during the analysis so, once analysis is over, the list would only contain nodes whose type is top. Do you think that would be better? I was just thinking that we are only interested in those nodes whose types are potentially top after `analyze()`. 
And thus we only would need to add them to `_type_nodes` if they are actually top after calling `Value()`. We don't need to pop them when they become non-top (probably not that efficient). We can still do the pass over the list as you have it now in `PhaseCCP::transform()` - my hope is that it's a smaller list compared to unconditionally adding `Type` nodes. I'm not expecting a significant impact doing it like that instead but it looked like an easy small improvement to do. Let me know what you think. I'm also fine with going with what you have now. Additional suggestion here: You should also guard the `push` with `KillPathsReachableByDeadTypeNode`. Maybe you can add an additional assert in the `else` path of L2068 in `PhaseCCP::transform()` that `_type_nodes` is empty in that case as a sanity check. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23468#discussion_r2032892505 From pminborg at openjdk.org Tue Apr 8 11:33:54 2025 From: pminborg at openjdk.org (Per Minborg) Date: Tue, 8 Apr 2025 11:33:54 GMT Subject: RFR: 8348556: Inlining fails earlier for MemorySegment::reinterpret [v6] In-Reply-To: <3LuKBc-mbghi2A2-OnXrJD5zwvOm8URerns7Ud0Zz4c=.583514d1-d0fc-4005-b810-f4db92fcb60c@github.com> References: <3LuKBc-mbghi2A2-OnXrJD5zwvOm8URerns7Ud0Zz4c=.583514d1-d0fc-4005-b810-f4db92fcb60c@github.com> Message-ID: <6wb-C25gjMYaQZXn2BFAQ2okygXAiHNxciy-zmDsy9M=.7555f6bb-2112-42d8-87ef-4ab63fab6d62@github.com> > This PR proposes to add some `@ForceInline` annotations in the `Module` class in order to assist inlining of FFM var/method handles. > > There are also some changes in other classes (notably `j.l.Object`) which, if implemented, can take us four additional levels of inlining. However, there is a tradeoff with adding `@ForceInline` and just trying to get as deep as possible for a specific use case is probably not the best idea. > > So, we should discuss which of the proposed changes (if any), we'd like to integrate. 
> > Tested and passed tier1-3 Per Minborg has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains nine additional commits since the last revision: - Revert copyright year - Revert changes to Object - Merge branch 'master' into module-force-inline - Add more @ForceInline and a benchmark - Remove reformatting - Remove file - Revert change - Rename method and variable - Add @ForceInline annotations and restructure some methods ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23460/files - new: https://git.openjdk.org/jdk/pull/23460/files/7a2f7f89..401d9d4f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23460&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23460&range=04-05 Stats: 212749 lines in 4960 files changed: 98853 ins; 86637 del; 27259 mod Patch: https://git.openjdk.org/jdk/pull/23460.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23460/head:pull/23460 PR: https://git.openjdk.org/jdk/pull/23460 From pminborg at openjdk.org Tue Apr 8 11:44:23 2025 From: pminborg at openjdk.org (Per Minborg) Date: Tue, 8 Apr 2025 11:44:23 GMT Subject: RFR: 8348556: Inlining fails earlier for MemorySegment::reinterpret [v6] In-Reply-To: <6wb-C25gjMYaQZXn2BFAQ2okygXAiHNxciy-zmDsy9M=.7555f6bb-2112-42d8-87ef-4ab63fab6d62@github.com> References: <3LuKBc-mbghi2A2-OnXrJD5zwvOm8URerns7Ud0Zz4c=.583514d1-d0fc-4005-b810-f4db92fcb60c@github.com> <6wb-C25gjMYaQZXn2BFAQ2okygXAiHNxciy-zmDsy9M=.7555f6bb-2112-42d8-87ef-4ab63fab6d62@github.com> Message-ID: On Tue, 8 Apr 2025 11:33:54 GMT, Per Minborg wrote: >> This PR proposes to add some `@ForceInline` annotations in the `Module` class in order to assist inlining of FFM var/method handles. >> >> There are also some changes in other classes which, if implemented, can take us three additional levels of inlining. I drew a line there. 
There is a tradeoff with adding `@ForceInline` and just trying to get as deep as possible for a specific use case is probably not the best idea. >> >> I have opted not to inline the `j.l.Object` constructor in anticipation of broad impact. This currently sets the depth limit for this use case. >> >> Tested and passed tier1-3 > > Per Minborg has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains nine additional commits since the last revision: > > - Revert copyright year > - Revert changes to Object > - Merge branch 'master' into module-force-inline > - Add more @ForceInline and a benchmark > - Remove reformatting > - Remove file > - Revert change > - Rename method and variable > - Add @ForceInline annotations and restructure some methods I have reverted the changes to `Object` and so, resetting the number of required reviewers ------------- PR Comment: https://git.openjdk.org/jdk/pull/23460#issuecomment-2786158848 From pminborg at openjdk.org Tue Apr 8 12:16:17 2025 From: pminborg at openjdk.org (Per Minborg) Date: Tue, 8 Apr 2025 12:16:17 GMT Subject: RFR: 8348556: Inlining fails earlier for MemorySegment::reinterpret [v6] In-Reply-To: <6wb-C25gjMYaQZXn2BFAQ2okygXAiHNxciy-zmDsy9M=.7555f6bb-2112-42d8-87ef-4ab63fab6d62@github.com> References: <3LuKBc-mbghi2A2-OnXrJD5zwvOm8URerns7Ud0Zz4c=.583514d1-d0fc-4005-b810-f4db92fcb60c@github.com> <6wb-C25gjMYaQZXn2BFAQ2okygXAiHNxciy-zmDsy9M=.7555f6bb-2112-42d8-87ef-4ab63fab6d62@github.com> Message-ID: <8GSYKwHUhVFL-XjxKB9SzUyiEXVcGWnxoEr6YTnZfPE=.080d6c74-505e-41c0-84ad-2fabe5356365@github.com> On Tue, 8 Apr 2025 11:33:54 GMT, Per Minborg wrote: >> This PR proposes to add some `@ForceInline` annotations in the `Module` class in order to assist inlining of FFM var/method handles. 
>> >> There are also some changes in other classes which, if implemented, can take us three additional levels of inlining. I drew a line there. There is a tradeoff with adding `@ForceInline` and just trying to get as deep as possible for a specific use case is probably not the best idea. >> >> I have opted not to inline the `j.l.Object` constructor in anticipation of broad impact. This currently sets the depth limit for this use case. >> >> Tested and passed tier1-3 > > Per Minborg has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains nine additional commits since the last revision: > > - Revert copyright year > - Revert changes to Object > - Merge branch 'master' into module-force-inline > - Add more @ForceInline and a benchmark > - Remove reformatting > - Remove file > - Revert change > - Rename method and variable > - Add @ForceInline annotations and restructure some methods

Baseline:

Benchmark                            (offsetCount)  (segmentSize)   Mode  Cnt        Score       Error  Units
FFMVarHandleInlineTest.t0_reference           2048           1024  thrpt   25  1552613.262 ± 14295.035  ops/s
FFMVarHandleInlineTest.t1_level8              2048           1024  thrpt   25  1558465.228 ±  8458.874  ops/s
FFMVarHandleInlineTest.t2_level9              2048           1024  thrpt   25  1542009.100 ± 10240.173  ops/s
FFMVarHandleInlineTest.t3_level10             2048           1024  thrpt   25  1553407.503 ± 10834.133  ops/s
FFMVarHandleInlineTest.t4_level11             2048           1024  thrpt   25    87666.558 ±   765.848  ops/s  <-- We hit the inline limit here

Patch without `Object` changes:

Benchmark                         (offsetCount)  (segmentSize)   Mode  Cnt      Score      Error  Units
FFMVarHandleInlineTest.t_level13           2048           1024  thrpt    6  72071.657 ± 1245.304  ops/s
FFMVarHandleInlineTest.t_level14           2048           1024  thrpt    6  69263.088 ± 2196.423  ops/s
FFMVarHandleInlineTest.t_level15           2048           1024  thrpt    6   3446.827 ±  118.659  ops/s

Patch with `Object` changes:

Benchmark                         (offsetCount)  (segmentSize)   Mode  Cnt        Score       Error  Units
FFMVarHandleInlineTest.t_level11           2048           1024  thrpt    6  1545991.924 ± 21206.450  ops/s
FFMVarHandleInlineTest.t_level12           2048           1024  thrpt    6  1542234.193 ± 18002.511  ops/s
FFMVarHandleInlineTest.t_level13           2048           1024  thrpt    6  1542601.822 ± 15041.864  ops/s
FFMVarHandleInlineTest.t_level14           2048           1024  thrpt    6   179053.325 ±  2496.002  ops/s
FFMVarHandleInlineTest.t_level15           2048           1024  thrpt    6     3433.861 ±   165.847  ops/s

------------- PR Comment: https://git.openjdk.org/jdk/pull/23460#issuecomment-2786236800 From pminborg at openjdk.org Tue Apr 8 12:24:41 2025 From: pminborg at openjdk.org (Per Minborg) Date: Tue, 8 Apr 2025 12:24:41 GMT Subject: RFR: 8348556: Inlining fails earlier for MemorySegment::reinterpret [v7] In-Reply-To: <3LuKBc-mbghi2A2-OnXrJD5zwvOm8URerns7Ud0Zz4c=.583514d1-d0fc-4005-b810-f4db92fcb60c@github.com> References: <3LuKBc-mbghi2A2-OnXrJD5zwvOm8URerns7Ud0Zz4c=.583514d1-d0fc-4005-b810-f4db92fcb60c@github.com> Message-ID: > This PR proposes to add some `@ForceInline` annotations in the `Module` class in order to assist inlining of FFM var/method handles. > > There are also some changes in other classes which, if implemented, can take us three additional levels of inlining. I drew a line there. There is a tradeoff with adding `@ForceInline` and just trying to get as deep as possible for a specific use case is probably not the best idea. > > Updating the `j.l.Object` constructor is crucial for the higher depths.
> > Tested and passed tier1-3 Per Minborg has updated the pull request incrementally with one additional commit since the last revision: Reintroduce Object changes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23460/files - new: https://git.openjdk.org/jdk/pull/23460/files/401d9d4f..f715b2b4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23460&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23460&range=05-06 Stats: 5 lines in 2 files changed: 2 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/23460.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23460/head:pull/23460 PR: https://git.openjdk.org/jdk/pull/23460 From roland at openjdk.org Tue Apr 8 12:34:01 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 8 Apr 2025 12:34:01 GMT Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v9] In-Reply-To: References: Message-ID: > This is primarily motivated by 8275202 (C2: optimize out more > redundant conditions). In the following code snippet: > > > int[] array = new int[arraySize]; > if (j <= arraySize) { > if (i >= 0) { > if (i < j) { > int v = array[i]; > > > (`arraySize` is a constant) > > at the range check, `j` is known to be in `[min, arraySize]` as a > consequence, `i` is known to be `[0, arraySize-1]`. The range check > can be eliminated. > > Now, if later, `i` constant folds to some value that's positive but > out of range for the array: > > - if that happens when the new pass runs, then it can prove that: > > if (i < j) { > > is never taken. > > - if that happens during IGVN or CCP however, that condition is not > constant folded. And because the range check was removed, there's no > guard protecting the range check `CastII`. It becomes `top` and, as > a result, the graph can become broken. > > What I propose here is that when the `CastII` becomes dead, any CFG > paths that use the `CastII` node is made unreachable. 
So in pseudo code: > > > int[] array = new int[arraySize]; > if (j <= arraySize) { > if (i >= 0) { > if (i < j) { > halt(); > > > Finding the CFG paths is implemented in the patch by following the > uses of the node until a CFG node or a `Phi` is encountered. > > The patch applies this to all `Type` nodes because, with 8275202, I also ran > into some rare corner cases with other types of nodes. The exception is > `Phi` nodes which may not be as easy to handle (and for which I had no > issue with 8275202). > > Finally, the patch includes a test case that's unrelated to the > discussion of 8275202 above. In that test case, a `CastII` becomes top > but the test that guards it doesn't constant fold. The root cause is a > transformation of: > > > (CastII (AddI > > > into > > > (AddI (CastII ) (CastII) > > > which causes the resulting node to have a wider type. The `CastII` > captures a type before the transformation above happens. Once it has > happened, the guard for the `CastII` can't be constant folded when an > out-of-bounds value occurs. > > This is likely fixable some other way (even though it doesn't seem > straightforward). Given the long history of similar issues (and the > test case that shows that more of them are hiding), I think it would > make sense to try some other way of approaching them. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 17 additional commits since the last revision: - review - Merge branch 'master' into JDK-8349479 - Update src/hotspot/share/opto/node.cpp Co-authored-by: Christian Hagedorn - Update test/hotspot/jtreg/compiler/c2/TestGuardOfCastIIDoesntFold.java Co-authored-by: Christian Hagedorn - review - review - review - Merge branch 'master' into JDK-8349479 - Update src/hotspot/share/opto/convertnode.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/convertnode.cpp Co-authored-by: Christian Hagedorn - ... and 7 more: https://git.openjdk.org/jdk/compare/5e6b0683...d9f53010 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23468/files - new: https://git.openjdk.org/jdk/pull/23468/files/2abd3054..d9f53010 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23468&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23468&range=07-08 Stats: 21109 lines in 640 files changed: 14939 ins; 4574 del; 1596 mod Patch: https://git.openjdk.org/jdk/pull/23468.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23468/head:pull/23468 PR: https://git.openjdk.org/jdk/pull/23468 From roland at openjdk.org Tue Apr 8 12:41:18 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 8 Apr 2025 12:41:18 GMT Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v7] In-Reply-To: References: Message-ID: On Tue, 8 Apr 2025 10:29:01 GMT, Christian Hagedorn wrote: >> What's your concern here? Is it that the list of nodes grows too big? Or that it's a waste of time to go over the list when analysis is over only to filter out non top nodes? >> I suppose we could push nodes when their type is top and pop them when their type becomes not top during the analysis so, once analysis is over, the list would only contain nodes whose type is top. Do you think that would be better? 
> > I was just thinking that we are only interested in those nodes whose types are potentially top after `analyze()`. And thus we only would need to add them to `_type_nodes` if they are actually top after calling `Value()`. We don't need to pop them when they become non-top (probably not that efficient). We can still do the pass over the list as you have it now in `PhaseCCP::transform()` - my hope is that it's a smaller list compared to unconditionally adding `Type` nodes. > > I'm not expecting a significant impact doing it like that instead but it looked like an easy small improvement to do. Let me know what you think. I'm also fine with going with what you have now. > > Additional suggestion here: > You should also guard the `push` with `KillPathsReachableByDeadTypeNode`. Maybe you can add an additional assert in the `else` path of L2068 in `PhaseCCP::transform()` that `_type_nodes` is empty in that case as a sanity check. Fair enough. I made the changes you suggested. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23468#discussion_r2033102509 From luhenry at openjdk.org Tue Apr 8 12:46:27 2025 From: luhenry at openjdk.org (Ludovic Henry) Date: Tue, 8 Apr 2025 12:46:27 GMT Subject: RFR: 8353665: RISC-V: IR verification fails in TestSubNodeFloatDoubleNegation.java [v3] In-Reply-To: References: Message-ID: On Mon, 7 Apr 2025 12:05:11 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> The newly added TestSubNodeFloatDoubleNegation.java (in https://github.com/openjdk/jdk/pull/24150) is to check `0 - (0 - x)` is not folded to `x` for float and double. >> I have manually checked the IR and generated assembly code, it's not folded on riscv either, just there is an extra SubF in some code path. >> So, the fix for this test on riscv should be simply make the check as `>= 2` rather than `2`. >> >> Tested on both x86 and riscv64. 
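For readers following the thread, here is a minimal sketch of why `0 - (0 - x)` must not be folded to `x` for float and double: the two differ when `x` is negative zero. This is not code from the PR or from TestSubNodeFloatDoubleNegation.java; the class and method names are hypothetical and chosen only for illustration.

```java
public class NegZeroSketch {
    // Hypothetical helper: the expression shape discussed in the thread, for float.
    static float doubleNegate(float x) {
        return 0f - (0f - x);
    }

    // Same shape for double.
    static double doubleNegate(double x) {
        return 0d - (0d - x);
    }

    public static void main(String[] args) {
        float x = -0.0f;
        float r = doubleNegate(x);
        // Compare raw bit patterns, because -0.0f == 0.0f is true under ==.
        System.out.println(Integer.toHexString(Float.floatToRawIntBits(x)));
        System.out.println(Integer.toHexString(Float.floatToRawIntBits(r)));
        // For ordinary values the expression does return x unchanged.
        System.out.println(doubleNegate(1.5f) == 1.5f);
    }
}
```

Under the default rounding mode, `0f - (-0.0f)` is `+0.0f` and `0f - 0.0f` is again `+0.0f`, so the sign bit of a negative-zero input is lost. That is the case a folding to `x` would get wrong.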
>> >> Thanks > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/compiler/floatingpoint/TestSubNodeFloatDoubleNegation.java > > Co-authored-by: Manuel Hässig Marked as reviewed by luhenry (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24421#pullrequestreview-2749843280 From luhenry at openjdk.org Tue Apr 8 12:55:15 2025 From: luhenry at openjdk.org (Ludovic Henry) Date: Tue, 8 Apr 2025 12:55:15 GMT Subject: RFR: 8353600: RISC-V: compiler/vectorization/TestRotateByteAndShortVector.java is failing with Zvbb [v3] In-Reply-To: References: <3xU-sLLf0E_4n9BsUXL4COF7mxBjDd8YzgyIvissvQQ=.472cf772-a97a-48c8-b4e6-907fcfdd1ebb@github.com> Message-ID: On Thu, 3 Apr 2025 13:49:05 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> >> Currently, the following code is considered a RotateLeftV of byte by hotspot, but it's not a real rotate, as the `-shift` will be 30, which makes `b >> -shift` zero, rather than the value we expected. >> >> int shift = 2; >> byte b = 83; >> byte res = (byte) (b << shift | b >> -shift); // res = 76 >> // but a real left rotate of 83 should be 77 >> >> So, the simple fix is to enable RotateLeftV only for int and long, disable it for other types. >> >> A more rational fix would be to change C2 to not convert code like `(byte) (b << shift | b >> -shift)` to a RotateLeftV node, but it needs more investigation, and I'm not sure if it's feasible to do so, as currently no platform supports RotateLeftV for non-int/long types. >> >> The vector instruction behaviour is different from the Java language spec, so it seems there is no way to do it for now. >> >> Thanks! > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > comment Marked as reviewed by luhenry (Committer).
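The arithmetic in the quoted description can be reproduced with a small standalone sketch. It is not code from the PR; the class and method names are hypothetical. It contrasts the shift-or pattern on a `byte` (whose operands are promoted to `int`, so the negative shift distance is masked to 30) with a true 8-bit rotate.

```java
public class ByteRotateSketch {
    // The pattern from the PR description. b is promoted to int, and the
    // shift distance -shift is masked to its low 5 bits (30 when shift == 2),
    // so the right-shift term contributes nothing for small positive b.
    static byte shiftOrPattern(byte b, int shift) {
        return (byte) (b << shift | b >> -shift);
    }

    // A true rotate-left within 8 bits, written out explicitly.
    static byte rotateLeft8(byte b, int shift) {
        int v = b & 0xFF;        // treat the byte as unsigned 8-bit
        int s = shift & 7;       // rotate distance modulo 8
        return (byte) ((v << s) | (v >>> ((8 - s) & 7)));
    }

    public static void main(String[] args) {
        byte b = 83;
        System.out.println(shiftOrPattern(b, 2)); // 76, as stated in the description
        System.out.println(rotateLeft8(b, 2));    // 77, the real 8-bit rotate
    }
}
```

The two results differ because Java defines shifts only on `int` and `long`, so the pattern is a rotate only at those widths. This matches the description's choice to keep RotateLeftV for int and long only.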
------------- PR Review: https://git.openjdk.org/jdk/pull/24414#pullrequestreview-2749870486 From mli at openjdk.org Tue Apr 8 13:00:17 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 8 Apr 2025 13:00:17 GMT Subject: RFR: 8353600: RISC-V: compiler/vectorization/TestRotateByteAndShortVector.java is failing with Zvbb [v3] In-Reply-To: <9tRHq6IVCT4LYnKp_pAkVTBXBPsc1YWL8k1ooMzwPkc=.8fcff0f9-e581-4ee5-a8f6-e547bf11f293@github.com> References: <3xU-sLLf0E_4n9BsUXL4COF7mxBjDd8YzgyIvissvQQ=.472cf772-a97a-48c8-b4e6-907fcfdd1ebb@github.com> <9tRHq6IVCT4LYnKp_pAkVTBXBPsc1YWL8k1ooMzwPkc=.8fcff0f9-e581-4ee5-a8f6-e547bf11f293@github.com> Message-ID: On Mon, 7 Apr 2025 02:34:40 GMT, Fei Yang wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> comment > > LGTM. Thanks for fixing this! Thanks for your reviews @RealFYang @luhenry ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24414#issuecomment-2786353114 From roland at openjdk.org Tue Apr 8 13:00:32 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 8 Apr 2025 13:00:32 GMT Subject: RFR: 8341976: C2: use_mem_state != load->find_exact_control(load->in(0)) assert failure [v9] In-Reply-To: References: Message-ID: > The `arraycopy` writes to a non escaping array so its `ArrayCopy` node > is marked as having a narrow memory effect. One of the loads from the > destination after the copy is transformed into a load from the source > array (the rationale being that if there's no load from the > destination of the copy, the `arraycopy` is not needed). The load from > the source has the input memory state of the `ArrayCopy` as memory > input. That load is then sunk out of the loop and its control is > updated to be after the `ArrayCopy`. That's legal because the > `ArrayCopy` only has a narrow memory effect and can't modify the > source. The `ArrayCopy` can't be eliminated and is expanded. 
In the > process, a `MemBar` that has a wide memory effect is added. The load > from the source has control after the membar but memory state before > and because the membar has a wide memory effect, the load is anti > dependent on the membar: the graph is broken (the load can't be pinned > after the membar and anti dependent on it). > > In short, the problem is that the graph is transformed under the > assumption that the `ArrayCopy` has a narrow effect but the > `ArrayCopy` is expanded to a subgraph that has a wide memory > effect. The fix I propose is to not insert a membar with a wide memory > effect. We still need a membar when the destination is non escaping > because the expanded `ArrayCopy`, if it writes to a tightly allocated > array, writes to raw memory and not to the destination memory slice. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 18 additional commits since the last revision: - Merge branch 'master' into JDK-8341976 - review - review - Merge branch 'master' into JDK-8341976 - review - review - Merge branch 'master' into JDK-8341976 - -XX:+TraceLoopOpts fix - review - more - ...
and 8 more: https://git.openjdk.org/jdk/compare/a13d9b85...347bb291 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23465/files - new: https://git.openjdk.org/jdk/pull/23465/files/a76839de..347bb291 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23465&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23465&range=07-08 Stats: 21094 lines in 638 files changed: 14932 ins; 4569 del; 1593 mod Patch: https://git.openjdk.org/jdk/pull/23465.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23465/head:pull/23465 PR: https://git.openjdk.org/jdk/pull/23465 From roland at openjdk.org Tue Apr 8 13:00:32 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 8 Apr 2025 13:00:32 GMT Subject: RFR: 8341976: C2: use_mem_state != load->find_exact_control(load->in(0)) assert failure [v3] In-Reply-To: <5WJPojFKKlkAcp93avTRnRQiby4ug48YNOMI34kb00M=.908ad771-1d6e-4011-a709-48f4c26391aa@github.com> References: <5WJPojFKKlkAcp93avTRnRQiby4ug48YNOMI34kb00M=.908ad771-1d6e-4011-a709-48f4c26391aa@github.com> Message-ID: On Mon, 24 Mar 2025 12:03:08 GMT, Christian Hagedorn wrote: >> That looks reasonable. I've launched some testing and results look good so far (there is quite some load at the moment - will take a bit longer to complete than usual). > >> That looks reasonable. I've launched some testing and results look good so far (there is quite some load at the moment - will take a bit longer to complete than usual). > > Testing looked good (did not cover the `TraceLoopOpts` update). This is ready to be integrated AFAICT but the latest version (with the change to the test command and the merge) needs a re-review. @chhagedorn @dafedafe can one of you take care of it?
------------- PR Comment: https://git.openjdk.org/jdk/pull/23465#issuecomment-2786349295 From chagedorn at openjdk.org Tue Apr 8 13:01:29 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 8 Apr 2025 13:01:29 GMT Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v9] In-Reply-To: References: Message-ID: On Tue, 8 Apr 2025 12:34:01 GMT, Roland Westrelin wrote: >> This is primarily motivated by 8275202 (C2: optimize out more >> redundant conditions). In the following code snippet: >> >> >> int[] array = new int[arraySize]; >> if (j <= arraySize) { >> if (i >= 0) { >> if (i < j) { >> int v = array[i]; >> >> >> (`arraySize` is a constant) >> >> at the range check, `j` is known to be in `[min, arraySize]` as a >> consequence, `i` is known to be `[0, arraySize-1]`. The range check >> can be eliminated. >> >> Now, if later, `i` constant folds to some value that's positive but >> out of range for the array: >> >> - if that happens when the new pass runs, then it can prove that: >> >> if (i < j) { >> >> is never taken. >> >> - if that happens during IGVN or CCP however, that condition is not >> constant folded. And because the range check was removed, there's no >> guard protecting the range check `CastII`. It becomes `top` and, as >> a result, the graph can become broken. >> >> What I propose here is that when the `CastII` becomes dead, any CFG >> paths that use the `CastII` node is made unreachable. So in pseudo code: >> >> >> int[] array = new int[arraySize]; >> if (j <= arraySize) { >> if (i >= 0) { >> if (i < j) { >> halt(); >> >> >> Finding the CFG paths is implemented in the patch by following the >> uses of the node until a CFG node or a `Phi` is encountered. >> >> The patch applies this to all `Type` nodes as with 8275202, I also ran >> in some rare corner cases with other types of nodes. The exception is >> `Phi` nodes which may not be as easy to handle (and for which I had no >> issue with 8275202). 
>> >> Finally, the patch includes a test case that's unrelated to the >> discussion of 8275202 above. In that test case, a `CastII` becomes top >> but the test that guards it doesn't constant fold. The root cause is a >> transformation of: >> >> >> (CastII (AddI >> >> >> into >> >> >> (AddI (CastII ) (CastII) >> >> >> which causes the resulting node to have a wider type. The `CastII` >> captures a type before the transformation above happens. Once it has >> happened, the guard for the `CastII` can't be constant folded when an >> out-of-bounds value occurs. >> >> This is likely fixable some other way (even though it doesn't seem >> straightforward). Given the long history of similar issues (and the >> test case that shows that there are more hiding), I think it would >> make sense to try some other way of approaching them. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 17 additional commits since the last revision: > > - review > - Merge branch 'master' into JDK-8349479 > - Update src/hotspot/share/opto/node.cpp > > Co-authored-by: Christian Hagedorn > - Update test/hotspot/jtreg/compiler/c2/TestGuardOfCastIIDoesntFold.java > > Co-authored-by: Christian Hagedorn > - review > - review > - review > - Merge branch 'master' into JDK-8349479 > - Update src/hotspot/share/opto/convertnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/convertnode.cpp > > Co-authored-by: Christian Hagedorn > - ... and 7 more: https://git.openjdk.org/jdk/compare/2a6f3390...d9f53010 That looks good to me, thanks for all the updates! I'll give this another spin in our testing. ------------- Marked as reviewed by chagedorn (Reviewer).
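The range argument in the quoted description of 8349479 (the guards `j <= arraySize`, `i >= 0` and `i < j` together imply that `i` is in `[0, arraySize-1]`) can be checked by brute force at the Java level. This is only a sketch under stated assumptions: `GuardRangeDemo`, `guardsImplyInBounds` and `guardedLoad` are hypothetical names, and the code does not exercise C2 or the `CastII` handling itself; it only illustrates why the range check is redundant under those guards.

```java
public class GuardRangeDemo {
    // Exhaustively checks, for a small window of i and j values, that whenever
    // all three guards from the quoted snippet hold, i is a valid index.
    static boolean guardsImplyInBounds(int arraySize) {
        for (int j = -arraySize - 2; j <= arraySize + 2; j++) {
            for (int i = -arraySize - 2; i <= arraySize + 2; i++) {
                if (j <= arraySize && i >= 0 && i < j) {
                    if (i < 0 || i >= arraySize) {
                        return false; // a counterexample would land here
                    }
                }
            }
        }
        return true;
    }

    // Same guard shape as the quoted snippet; returns -1 when a guard fails.
    static int guardedLoad(int[] array, int i, int j) {
        if (j <= array.length) {
            if (i >= 0) {
                if (i < j) {
                    return array[i]; // never throws: i < j <= array.length
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(guardsImplyInBounds(8));
        System.out.println(guardedLoad(new int[] {10, 20, 30}, 1, 3));
    }
}
```

If `i` later turns out to be a constant outside that range, the innermost branch is simply never taken at runtime, which corresponds to the path the patch proposes to make unreachable.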
PR Review: https://git.openjdk.org/jdk/pull/23468#pullrequestreview-2749891059 From mli at openjdk.org Tue Apr 8 13:02:25 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 8 Apr 2025 13:02:25 GMT Subject: RFR: 8353665: RISC-V: IR verification fails in TestSubNodeFloatDoubleNegation.java In-Reply-To: References: Message-ID: <21-6LawlqLpcyRjrWyFe6eIg_mxgYAqYG6s_hWqI7Og=.e3336911-6ba0-4d7c-8431-0bbd1473912c@github.com> On Fri, 4 Apr 2025 13:19:01 GMT, Manuel H?ssig wrote: >> Hi, >> Can you help to review this patch? >> The newly added TestSubNodeFloatDoubleNegation.java (in https://github.com/openjdk/jdk/pull/24150) is to check `0 - (0 - x)` is not folded to `x` for float and double. >> I have manually checked the IR and generated assembly code, it's not folded on riscv either, just there is an extra SubF in some code path. >> So, the fix for this test on riscv should be simply make the check as `>= 2` rather than `2`. >> >> Tested on both x86 and riscv64. >> >> Thanks > > Filed [JDK-8353730](https://bugs.openjdk.org/browse/JDK-8353730). I will first fix the test and then investigate the additional `SubF`. Thanks for your reviews @mhaessig @TobiHartmann @luhenry ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24421#issuecomment-2786355103 From mli at openjdk.org Tue Apr 8 13:02:26 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 8 Apr 2025 13:02:26 GMT Subject: Integrated: 8353665: RISC-V: IR verification fails in TestSubNodeFloatDoubleNegation.java In-Reply-To: References: Message-ID: <3A5058O1FDCJ4W-LlXOut7jI-PUJsYeAeziVFZFKs5E=.e6d896d1-9481-4735-98f6-1c6351157893@github.com> On Thu, 3 Apr 2025 16:57:19 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > The newly added TestSubNodeFloatDoubleNegation.java (in https://github.com/openjdk/jdk/pull/24150) is to check `0 - (0 - x)` is not folded to `x` for float and double. 
> I have manually checked the IR and generated assembly code, it's not folded on riscv either, just there is an extra SubF in some code path. > So, the fix for this test on riscv should be simply make the check as `>= 2` rather than `2`. > > Tested on both x86 and riscv64. > > Thanks This pull request has now been integrated. Changeset: 21db0fdb Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/21db0fdbfb019b9a7c6613e190ad457278f29582 Stats: 11 lines in 2 files changed: 7 ins; 2 del; 2 mod 8353665: RISC-V: IR verification fails in TestSubNodeFloatDoubleNegation.java Reviewed-by: thartmann, luhenry ------------- PR: https://git.openjdk.org/jdk/pull/24421 From mli at openjdk.org Tue Apr 8 13:03:19 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 8 Apr 2025 13:03:19 GMT Subject: Integrated: 8353600: RISC-V: compiler/vectorization/TestRotateByteAndShortVector.java is failing with Zvbb In-Reply-To: <3xU-sLLf0E_4n9BsUXL4COF7mxBjDd8YzgyIvissvQQ=.472cf772-a97a-48c8-b4e6-907fcfdd1ebb@github.com> References: <3xU-sLLf0E_4n9BsUXL4COF7mxBjDd8YzgyIvissvQQ=.472cf772-a97a-48c8-b4e6-907fcfdd1ebb@github.com> Message-ID: On Thu, 3 Apr 2025 12:27:33 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > > Currently, the following code is considered a RotateLeftV of byte by hotspot, but it's not a real rotate, as the `-shift` will be 30, which makes `b >> -shift` zero, rather than the value we expected. > > int shift = 2; > byte b = 83; > byte res = (byte) (b << shift | b >> -shift); // res = 76 > // but a real left rotate of 83 should be 77 ?? > > So, the simple fix is to enable RotateLeftV only for int and long, and disable it for other types. > > A more rational fix would be to change C2 to not convert code like `(byte) (b << shift | b >> -shift)` to a RotateLeftV node, but it needs more investigation, and I'm not sure if it's feasible to do so, as currently no platform supports RotateLeftV for non-int/long types.
> > The vector instruction behaviour is different from java language spec, so seems there is no way to do it for now. > > Thanks! This pull request has now been integrated. Changeset: cc5e9388 Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/cc5e9388d8c55178fd32eabce0f24d5ab8e76fdd Stats: 71 lines in 2 files changed: 25 ins; 41 del; 5 mod 8353600: RISC-V: compiler/vectorization/TestRotateByteAndShortVector.java is failing with Zvbb Reviewed-by: fyang, luhenry ------------- PR: https://git.openjdk.org/jdk/pull/24414 From dfenacci at openjdk.org Tue Apr 8 13:08:36 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 8 Apr 2025 13:08:36 GMT Subject: RFR: 8352963: [REDO] Missing late inline cleanup causes compiler/vectorapi/VectorLogicalOpIdentityTest.java IR failure [v2] In-Reply-To: References: Message-ID: On Mon, 7 Apr 2025 18:23:04 GMT, Vladimir Ivanov wrote: >> Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: >> >> JDK-8652963: review fix > > Looks good. Thank you for your reviews @iwanowww and @TobiHartmann. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24402#issuecomment-2786371677 From dfenacci at openjdk.org Tue Apr 8 13:08:36 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 8 Apr 2025 13:08:36 GMT Subject: Integrated: 8352963: [REDO] Missing late inline cleanup causes compiler/vectorapi/VectorLogicalOpIdentityTest.java IR failure In-Reply-To: References: Message-ID: <6FtPsKEmxuXru_K-pF5A8Iey-JM1PDG_goLfrDjEi9M=.24b46fae-8216-4e50-b889-4e101b04aff0@github.com> On Thu, 3 Apr 2025 06:46:23 GMT, Damon Fenacci wrote: > This PR is a REDO of [JDK-8302459](https://bugs.openjdk.org/browse/JDK-8302459) ([PR](https://github.com/openjdk/jdk/pull/21682), [backout](https://bugs.openjdk.org/browse/JDK-8352965) triggered by a failing internal test). 
> > There was an issue with `CallGenerator::for_method_handle_call` that could delay late inlining by creating a "generic" `LateInlineCallGenerator` instead of a more specific `LateInlineMHCallGenerator`: > https://github.com/openjdk/jdk/blob/74df384a9870431efb184158bba032c79c35356e/src/hotspot/share/opto/callGenerator.cpp#L991 > While running IGVN this could be misinterpreted as non-MH late-inline > https://github.com/openjdk/jdk/blob/c282fb9add32f1fac8174ca84b1b68a869d2578d/src/hotspot/share/opto/callnode.cpp#L1088-L1091 > eventually triggering `assert(!cg->method()->is_method_handle_intrinsic(), "required");` > > The fix involves creating a `LateInlineMHCallGenerator` instead. Here is what changed from the backed out PR: > https://github.com/openjdk/jdk/blob/c282fb9add32f1fac8174ca84b1b68a869d2578d/src/hotspot/share/opto/callGenerator.cpp#L991-L995 > > ### Testing > > Tier 1-4 (windows-x64, linux-x64/aarch64, and macosx-x64/aarch64; release and debug mode) This pull request has now been integrated. Changeset: d9f2e692 Author: Damon Fenacci URL: https://git.openjdk.org/jdk/commit/d9f2e6921558b4919889d81871b699971fb4f3ba Stats: 101 lines in 7 files changed: 45 ins; 3 del; 53 mod 8352963: [REDO] Missing late inline cleanup causes compiler/vectorapi/VectorLogicalOpIdentityTest.java IR failure Reviewed-by: vlivanov, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/24402 From roland at openjdk.org Tue Apr 8 13:14:25 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 8 Apr 2025 13:14:25 GMT Subject: RFR: 8344116: C2: remove slice parameter from LoadNode::make [v6] In-Reply-To: References: Message-ID: On Sat, 5 Apr 2025 14:29:28 GMT, Zihao Lin wrote: >> This patch remove slice parameter from LoadNode::make >> >> Mention in https://github.com/openjdk/jdk/pull/21834#pullrequestreview-2429164805 >> >> Hi team, I am new, I'd appreciate any guidance. Thank a lot! 
> > Zihao Lin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'openjdk:master' into 8344116 > - Fix build > - Fix test failed > - 8344116: C2: remove slice parameter from LoadNode::make src/hotspot/share/gc/shared/c2/barrierSetC2.cpp line 223: > 221: MergeMemNode* mm = opt_access.mem(); > 222: PhaseGVN& gvn = opt_access.gvn(); > 223: Node* mem = mm->memory_at(gvn.C->get_alias_index(access.addr().type())); Can we get rid of all uses of `access.addr().type()`? src/hotspot/share/gc/shared/c2/cardTableBarrierSetC2.cpp line 105: > 103: // stores. In theory we could relax the load from ctrl() to > 104: // no_ctrl, but that doesn't buy much latitude. > 105: Node* card_val = __ load( __ ctrl(), card_adr, TypeInt::BYTE, T_BYTE); We could assert that `C->get_alias_index(kit->type(card_adr)) == Compile::AliasIdxRaw`, that is, that the computed slice is the same as the hardcoded slice. Similar asserts could be added for every location where a slice/address type is removed in this patch. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24258#discussion_r2033149694 PR Review Comment: https://git.openjdk.org/jdk/pull/24258#discussion_r2033162534 From duke at openjdk.org Tue Apr 8 13:15:50 2025 From: duke at openjdk.org (Saranya Natarajan) Date: Tue, 8 Apr 2025 13:15:50 GMT Subject: RFR: 8351660: C2: SIGFPE in unsigned_mod_value [v2] In-Reply-To: References: Message-ID: > Description :: The test program performs a `Long.remainderUnsigned` which triggers the call to the function `unsigned_mod_value`. At the end of `unsigned_mod_value` the expression, `return TypeClass::make(static_cast(dividend % divisor))`, is computed which leads to a SIGFPE as the divisor in the test program is zero.
The same behaviour was observed when the ` Long.remainderUnsigned` was replaced with `Integer.remainderUnsigned` in the test program. > > Solution :: The fix for [JDK-8345766](https://bugs.openjdk.org/browse/JDK-8345766) emitted specific ModF/ModD nodes, which is optimized and converted to runtime calls after optimizations. This was done during parsing prior to [JDK-8345766](https://bugs.openjdk.org/browse/JDK-8345766). In the scenario where there was unsigned modulo operation as in this test, there was no check for modulo by zero that could trigger an exception during runtime. The below fix proposes a check for modulo by zero and throws exception at runtime. > > A Jtreg test has been added as part of this fix. This test case is based on the original test that resulted in the bug. @eme64 is the contributor of the original test. Thank you @eme64. Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: added jtreg test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24410/files - new: https://git.openjdk.org/jdk/pull/24410/files/fb407aa2..3bce2b2e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24410&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24410&range=00-01 Stats: 58 lines in 1 file changed: 58 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24410.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24410/head:pull/24410 PR: https://git.openjdk.org/jdk/pull/24410 From dfenacci at openjdk.org Tue Apr 8 13:15:50 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 8 Apr 2025 13:15:50 GMT Subject: RFR: 8351660: C2: SIGFPE in unsigned_mod_value In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 10:29:08 GMT, Saranya Natarajan wrote: > Description :: The test program performs a`Long.remainderUnsigned` which triggers the call to the function `unsigned_mod_value`. 
At the end of `unsigned_mod_value` the expression,` return TypeClass::make(static_cast(dividend % divisor))`, is computed which leads to a SIGFPE as the divisor in the test program is zero. The same behaviour was observed when the ` Long.remainderUnsigned` was replaced with `Integer.remainderUnsigned` in the test program. > > Solution :: The fix for [JDK-8345766](https://bugs.openjdk.org/browse/JDK-8345766) emitted specific ModF/ModD nodes, which is optimized and converted to runtime calls after optimizations. This was done during parsing prior to [JDK-8345766](https://bugs.openjdk.org/browse/JDK-8345766). In the scenario where there was unsigned modulo operation as in this test, there was no check for modulo by zero that could trigger an exception during runtime. The below fix proposes a check for modulo by zero and throws exception at runtime. > > A Jtreg test has been added as part of this fix. This test case is based on the original test that resulted in the bug. @eme64 is the contributor of the original test. Thank you @eme64. Thanks for the fix @sarannat! Do you think we could add the failing test as regression test (maybe in the already available Mod tests, without creating a new test file)? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24410#issuecomment-2778046700 From thartmann at openjdk.org Tue Apr 8 13:26:24 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 8 Apr 2025 13:26:24 GMT Subject: RFR: 8348645: IGV: visualize live ranges [v5] In-Reply-To: References: Message-ID: On Fri, 7 Mar 2025 09:48:35 GMT, Roberto Casta?eda Lozano wrote: >> This changeset extends IGV with live range visualization. It introduces live ranges as first-class IGV entities and displays them along with the control-flow graph in the CFG view. Visualizing liveness information should hopefully make C2's register allocator easier to understand, diagnose, debug, and enhance. 
>> >> Live ranges are visible in C2 phases where liveness information is available, that is, phases `Initial liveness` to `Fix up spills` at IGV print level 4 or greater. For example, running a debug build of the JVM as follows: >> >> >> java -Xbatch -XX:CompileCommand=IGVPrintLevel,java.util.HashMap::newNode,4 >> >> >> produces the following visualization for the `Initial spilling` phase: >> >> ![initial-spilling](https://github.com/user-attachments/assets/1ecf74f5-92a8-4866-b1ec-2323bb0c428e) >> >> Live ranges are first-class IGV entities, meaning that the user can: >> >> - search, select, and extract them; >> >> ![search-extract](https://github.com/user-attachments/assets/8e0dfa59-457f-49cb-b2b5-1d202301c79d) >> >> - examine their properties in the `Properties` window or via tooltips; >> >> ![properties](https://github.com/user-attachments/assets/68d2d23b-b986-4d2e-835c-b661bce0de23) >> >> - navigate to related IGV entities via a pop-up menu; and >> >> ![popup](https://github.com/user-attachments/assets/21de2fef-d36a-42d5-b828-2696d87a18ea) >> >> - program filters that act om them according to their properties. >> >> ![filters](https://github.com/user-attachments/assets/e993b067-d0b8-452c-a885-c4e601e31e1c) >> >> Live ranges are connected to nodes by a use-def relation: a node can define zero or one live ranges, and use multiple live ranges; a live range can be defined and used by multiple nodes. Consequently, a live range in IGV is visible if and only if all its related nodes are visible (fully or semi-transparently). Generally, the start and end of a live range are vertically aligned with the nodes that first define and last use the live range. To reflect accurately the semantics of Phi nodes w.r.t. liveness, the visualization treats live ranges related by Phi nodes specially: live ranges used by a Phi node end at the bottom of the corresponding predecessor basic blocks, whereas live ranges defined by a Phi node start at the top of the node's basic block. 
The following screenshot shows an example of a Phi node (`48 Phi`) joining live ranges `L8` and `L13` into `L15`: >> >> ![phi](https://github.com/user-attachments/assets/0ef8aa1d-523d-4391-982e-6b74c2016a3c... > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Show liveness info extra line only when liveness information is available This is awesome, thanks a lot for working on this Roberto! I tested it extensively and despite some layout weirdness that we discussed off-thread and that does not seem to be related to your change, it works well. I had a quick look at the hotspot changes and they look good to me as well. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23558#pullrequestreview-2749973167 From rcastanedalo at openjdk.org Tue Apr 8 13:34:18 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 8 Apr 2025 13:34:18 GMT Subject: RFR: 8348645: IGV: visualize live ranges [v5] In-Reply-To: References: Message-ID: On Tue, 8 Apr 2025 13:23:41 GMT, Tobias Hartmann wrote: > This is awesome, thanks a lot for working on this Roberto! I tested it extensively and despite some layout weirdness that we discussed off-thread and that does not seem to be related to your change, it works well. > > I had a quick look at the hotspot changes and they look good to me as well. Thank you very much for reviewing, Tobias! Will have a look at the weird layout issue and re-test before integrating.
------------- PR Comment: https://git.openjdk.org/jdk/pull/23558#issuecomment-2786451833 From duke at openjdk.org Tue Apr 8 13:35:46 2025 From: duke at openjdk.org (Manuel Hässig) Date: Tue, 8 Apr 2025 13:35:46 GMT Subject: RFR: 8353842: C2: Add graph dumps before and after loop opts phase Message-ID: This PR adds graph dumps before and after loop optimizations, but only if the compiled method actually contains loops. This helps to distinguish loop optimizations in IGV and to match loop related nodes like opaque template assertion predicates. I tested this by compiling a few test methods and looking at the ideal graph. Also, I ran tier1 through tier3 and Oracle internal testing. ------------- Commit messages: - Add before and after loop optimization phases Changes: https://git.openjdk.org/jdk/pull/24509/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24509&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8353842 Stats: 13 lines in 3 files changed: 11 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24509.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24509/head:pull/24509 PR: https://git.openjdk.org/jdk/pull/24509 From chagedorn at openjdk.org Tue Apr 8 13:49:16 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 8 Apr 2025 13:49:16 GMT Subject: RFR: 8353842: C2: Add graph dumps before and after loop opts phase In-Reply-To: References: Message-ID: On Tue, 8 Apr 2025 13:28:59 GMT, Manuel Hässig wrote: > This PR adds graph dumps before and after loop optimizations, but only if the compiled method actually contains loops. This helps to distinguish loop optimizations in IGV and to match loop related nodes like opaque template assertion predicates. > > I tested this by compiling a few test methods and looking at the ideal graph. Also, I ran tier1 through tier3 and Oracle internal testing. Nice additions! These can be quite helpful during debugging as well. A few comments.
src/hotspot/share/opto/compile.cpp line 1862: > 1860: if (has_loops()) { > 1861: print_method(PHASE_AFTER_LOOP_OPTS, 2); > 1862: } Isn't `has_loops()` false when we fully optimized a loop away? Maybe we can cache the earlier decision to print the "before phase" and then print the "after phase" if we printed the "before phase". test/hotspot/jtreg/compiler/lib/ir_framework/CompilePhase.java line 61: > 59: ITER_GVN_BEFORE_EA("Iter GVN before EA"), > 60: ITER_GVN_AFTER_VECTOR("Iter GVN after vector box elimination"), > 61: BEFORE_LOOP_OPTS("Before loop optimizations"), To match capital first letters of (most) other phases: Suggestion: BEFORE_LOOP_OPTS("Before Loop Optimizations"), test/hotspot/jtreg/compiler/lib/ir_framework/CompilePhase.java line 99: > 97: ITER_GVN2("Iter GVN 2"), > 98: PHASEIDEALLOOP_ITERATIONS("PhaseIdealLoop iterations"), > 99: AFTER_LOOP_OPTS("After loop optimizations"), Suggestion: AFTER_LOOP_OPTS("After Loop Optimizations"), ------------- Changes requested by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24509#pullrequestreview-2750030412 PR Review Comment: https://git.openjdk.org/jdk/pull/24509#discussion_r2033220837 PR Review Comment: https://git.openjdk.org/jdk/pull/24509#discussion_r2033226427 PR Review Comment: https://git.openjdk.org/jdk/pull/24509#discussion_r2033226881 From dfenacci at openjdk.org Tue Apr 8 14:00:23 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 8 Apr 2025 14:00:23 GMT Subject: RFR: 8341976: C2: use_mem_state != load->find_exact_control(load->in(0)) assert failure [v9] In-Reply-To: References: Message-ID: On Tue, 8 Apr 2025 13:00:32 GMT, Roland Westrelin wrote: >> The `arraycopy` writes to a non escaping array so its `ArrayCopy` node >> is marked as having a narrow memory effect. 
One of the loads from the >> destination after the copy is transformed into a load from the source >> array (the rationale being that if there's no load from the >> destination of the copy, the `arraycopy` is not needed). The load from >> the source has the input memory state of the `ArrayCopy` as memory >> input. That load is then sunk out of the loop and its control is >> updated to be after the `ArrayCopy`. That's legal because the >> `ArrayCopy` only has a narrow memory effect and can't modify the >> source. The `ArrayCopy` can't be eliminated and is expanded. In the >> process, a `MemBar` that has a wide memory effect is added. The load >> from the source has control after the membar but memory state before >> and because the membar has a wide memory effect, the load is anti >> dependent on the membar: the graph is broken (the load can't be pinned >> after the membar and anti dependent on it). >> >> In short, the problem is that the graph is transformed under the >> assumption that the `ArrayCopy` has a narrow effect but the >> `ArrayCopy` is expanded to a subgraph that has a wide memory >> effect. The fix I propose is to not insert a membar with a wide memory >> effect. We still need a membar when the destination is non escaping >> because the expanded `ArrayCopy`, if it writes to a tighly allocated >> array, writes to raw memory and not to the destination memory slice. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 18 additional commits since the last revision: > > - Merge branch 'master' into JDK-8341976 > - review > - review > - Merge branch 'master' into JDK-8341976 > - review > - review > - Merge branch 'master' into JDK-8341976 > - -XX:+TraceLoopOpts fix > - review > - more > - ... 
and 8 more: https://git.openjdk.org/jdk/compare/5a5494ed...347bb291 Still looks good to me (but you probably need @chhagedorn's review as well). ------------- Marked as reviewed by dfenacci (Committer). PR Review: https://git.openjdk.org/jdk/pull/23465#pullrequestreview-2750099255 From duke at openjdk.org Tue Apr 8 14:05:45 2025 From: duke at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Tue, 8 Apr 2025 14:05:45 GMT Subject: RFR: 8353842: C2: Add graph dumps before and after loop opts phase [v2] In-Reply-To: References: Message-ID: On Tue, 8 Apr 2025 13:40:37 GMT, Christian Hagedorn wrote: >> Manuel H?ssig has updated the pull request incrementally with one additional commit since the last revision: >> >> Also print after if loops get optimized away > > src/hotspot/share/opto/compile.cpp line 1862: > >> 1860: if (has_loops()) { >> 1861: print_method(PHASE_AFTER_LOOP_OPTS, 2); >> 1862: } > > Isn't `has_loops()` false when we fully optimized a loop away? Maybe we can cache the earlier decision to print the "before phase" and then print the "after phase" if we printed the "before phase". Good catch! That is indeed the case. I fixed it in [6fd228c](https://github.com/openjdk/jdk/pull/24509/commits/6fd228c453e6508176a5a8e54f24a006ae90e91d). We can use the loop opt counter to see if a loop optimization was performed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24509#discussion_r2033274748 From duke at openjdk.org Tue Apr 8 14:05:44 2025 From: duke at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Tue, 8 Apr 2025 14:05:44 GMT Subject: RFR: 8353842: C2: Add graph dumps before and after loop opts phase [v2] In-Reply-To: References: Message-ID: > This PR adds graph dumps before and after loop optimizations, but only if the compiled method actually contains loops. This helps to distinguish loop optimizations in IGV and to match loop related nodes like opaque template assertion predicates. 
> > I tested this by compiling a few test methods and looking at the ideal graph. Also, I ran tier1 through tier3 and Oracle internal testing. Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: Also print after if loops get optimized away ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24509/files - new: https://git.openjdk.org/jdk/pull/24509/files/b7829e0c..6fd228c4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24509&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24509&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24509.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24509/head:pull/24509 PR: https://git.openjdk.org/jdk/pull/24509 From duke at openjdk.org Tue Apr 8 14:10:42 2025 From: duke at openjdk.org (Manuel Hässig) Date: Tue, 8 Apr 2025 14:10:42 GMT Subject: RFR: 8353842: C2: Add graph dumps before and after loop opts phase [v3] In-Reply-To: References: Message-ID: > This PR adds graph dumps before and after loop optimizations, but only if the compiled method actually contains loops. This helps to distinguish loop optimizations in IGV and to match loop related nodes like opaque template assertion predicates. > > I tested this by compiling a few test methods and looking at the ideal graph. Also, I ran tier1 through tier3 and Oracle internal testing.
Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: Fix capitalization ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24509/files - new: https://git.openjdk.org/jdk/pull/24509/files/6fd228c4..951c516e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24509&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24509&range=01-02 Stats: 4 lines in 2 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/24509.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24509/head:pull/24509 PR: https://git.openjdk.org/jdk/pull/24509 From duke at openjdk.org Tue Apr 8 14:10:42 2025 From: duke at openjdk.org (Manuel Hässig) Date: Tue, 8 Apr 2025 14:10:42 GMT Subject: RFR: 8353842: C2: Add graph dumps before and after loop opts phase [v3] In-Reply-To: References: Message-ID: <2hNMhBtX0EH_F0lC2k0TrtHqhAUBmfGQzdrnsv9c4jQ=.c0ea34aa-d27a-4b78-b147-5784f41793a8@github.com> On Tue, 8 Apr 2025 13:43:18 GMT, Christian Hagedorn wrote: >> Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix capitalization > > test/hotspot/jtreg/compiler/lib/ir_framework/CompilePhase.java line 61: > >> 59: ITER_GVN_BEFORE_EA("Iter GVN before EA"), >> 60: ITER_GVN_AFTER_VECTOR("Iter GVN after vector box elimination"), >> 61: BEFORE_LOOP_OPTS("Before loop optimizations"), > > To match capital first letters of (most) other phases: > Suggestion: > > BEFORE_LOOP_OPTS("Before Loop Optimizations"), Fixed in [951c516](https://github.com/openjdk/jdk/pull/24509/commits/951c516e60f248627770df058df07aa6a0a08b48) and also fixed in `phasetype.cpp` > test/hotspot/jtreg/compiler/lib/ir_framework/CompilePhase.java line 99: > >> 97: ITER_GVN2("Iter GVN 2"), >> 98: PHASEIDEALLOOP_ITERATIONS("PhaseIdealLoop iterations"), >> 99: AFTER_LOOP_OPTS("After loop optimizations"), > > Suggestion: > > AFTER_LOOP_OPTS("After Loop
Optimizations"), Fixed in [951c516](https://github.com/openjdk/jdk/pull/24509/commits/951c516e60f248627770df058df07aa6a0a08b48) and also fixed in `phasetype.cpp` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24509#discussion_r2033283987 PR Review Comment: https://git.openjdk.org/jdk/pull/24509#discussion_r2033285053 From cushon at openjdk.org Tue Apr 8 15:23:06 2025 From: cushon at openjdk.org (Liam Miller-Cushon) Date: Tue, 8 Apr 2025 15:23:06 GMT Subject: RFR: 8350563: C2 compilation fails because PhaseCCP does not reach a fixpoint [v10] In-Reply-To: References: Message-ID: > Hello, please consider this fix for [JDK-8350563](https://bugs.openjdk.org/browse/JDK-8350563) contributed by my colleague Matthias Ernst. > > https://github.com/openjdk/jdk/pull/22856 introduced a new `Value()` optimization for the pattern `AndIL(Con, Mask)`. > This optimization can look through CastNodes, and therefore requires additional logic in CCP to push these > transitive uses to the worklist. > > The optimization is closely related to analogous optimizations for SHIFT nodes, and we also extend the existing logic for > CCP worklist handling: the current logic is "if the shift input to a SHIFT node changes, push indirect AND node uses to the CCP worklist". > We extend it by adding "if the (new) type of a node is an IntegerType that `is_con, ...` to the predicate. 
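To make the `AndIL(Con, Mask)` pattern above more concrete, here is a minimal Java sketch of the arithmetic fact such a fold presumably relies on, assuming it applies when the constant's trailing zero bits cover the whole value range of the mask. The class `AndConMaskSketch` and the method `andIsAlwaysZero` are hypothetical names chosen for illustration; they are not taken from the HotSpot sources, and the real `Value()` logic may check additional conditions.

```java
public class AndConMaskSketch {
    // Returns true if (con & m) == 0 is guaranteed for every m in [0, maskMax].
    // Hypothetical helper for illustration only; this is not HotSpot code.
    public static boolean andIsAlwaysZero(int con, int maskMax) {
        if (maskMax < 0) {
            return false; // negative upper bounds are not handled by this sketch
        }
        // Number of low bits that any value in [0, maskMax] can occupy.
        int maskBits = Integer.SIZE - Integer.numberOfLeadingZeros(maskMax);
        // The constant has no set bits below its trailing-zero count.
        return maskBits <= Integer.numberOfTrailingZeros(con);
    }

    public static void main(String[] args) {
        int con = 0x100; // 8 trailing zero bits
        // Brute-force check of the claim for the mask range [0, 255].
        boolean allZero = true;
        for (int m = 0; m <= 255; m++) {
            allZero &= (con & m) == 0;
        }
        System.out.println(andIsAlwaysZero(con, 255) + " " + allZero); // prints "true true"
        System.out.println(andIsAlwaysZero(con, 256)); // prints "false"
    }
}
```

The sketch also hints at why the CCP worklist matters here: the answer depends on the current type of the constant-like input, so when that type changes during CCP (for example, when it narrows to a constant), the `And` node has to be revisited even if it is only reachable through intermediate cast nodes.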
Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision: Update test/hotspot/jtreg/compiler/ccp/TestAndConZeroCCP.java Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23871/files - new: https://git.openjdk.org/jdk/pull/23871/files/99134a01..4007e741 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23871&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23871&range=08-09 Stats: 8 lines in 1 file changed: 6 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23871.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23871/head:pull/23871 PR: https://git.openjdk.org/jdk/pull/23871 From chagedorn at openjdk.org Tue Apr 8 15:54:28 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 8 Apr 2025 15:54:28 GMT Subject: RFR: 8341976: C2: use_mem_state != load->find_exact_control(load->in(0)) assert failure [v9] In-Reply-To: References: Message-ID: On Tue, 8 Apr 2025 13:00:32 GMT, Roland Westrelin wrote: >> The `arraycopy` writes to a non escaping array so its `ArrayCopy` node >> is marked as having a narrow memory effect. One of the loads from the >> destination after the copy is transformed into a load from the source >> array (the rationale being that if there's no load from the >> destination of the copy, the `arraycopy` is not needed). The load from >> the source has the input memory state of the `ArrayCopy` as memory >> input. That load is then sunk out of the loop and its control is >> updated to be after the `ArrayCopy`. That's legal because the >> `ArrayCopy` only has a narrow memory effect and can't modify the >> source. The `ArrayCopy` can't be eliminated and is expanded. In the >> process, a `MemBar` that has a wide memory effect is added. 
The load >> from the source has control after the membar but memory state before >> and because the membar has a wide memory effect, the load is anti >> dependent on the membar: the graph is broken (the load can't be pinned >> after the membar and anti dependent on it). >> >> In short, the problem is that the graph is transformed under the >> assumption that the `ArrayCopy` has a narrow effect but the >> `ArrayCopy` is expanded to a subgraph that has a wide memory >> effect. The fix I propose is to not insert a membar with a wide memory >> effect. We still need a membar when the destination is non escaping >> because the expanded `ArrayCopy`, if it writes to a tightly allocated >> array, writes to raw memory and not to the destination memory slice. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 18 additional commits since the last revision: > > - Merge branch 'master' into JDK-8341976 > - review > - review > - Merge branch 'master' into JDK-8341976 > - review > - review > - Merge branch 'master' into JDK-8341976 > - -XX:+TraceLoopOpts fix > - review > - more > - ... and 8 more: https://git.openjdk.org/jdk/compare/3c42d568...347bb291 Still good! ------------- Marked as reviewed by chagedorn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/23465#pullrequestreview-2750509804 From roland at openjdk.org Tue Apr 8 15:54:29 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 8 Apr 2025 15:54:29 GMT Subject: RFR: 8341976: C2: use_mem_state != load->find_exact_control(load->in(0)) assert failure [v3] In-Reply-To: <5WJPojFKKlkAcp93avTRnRQiby4ug48YNOMI34kb00M=.908ad771-1d6e-4011-a709-48f4c26391aa@github.com> References: <5WJPojFKKlkAcp93avTRnRQiby4ug48YNOMI34kb00M=.908ad771-1d6e-4011-a709-48f4c26391aa@github.com> Message-ID: On Mon, 24 Mar 2025 12:03:08 GMT, Christian Hagedorn wrote: >> That looks reasonable. I've launched some testing and results look good so far (there is quite some load at the moment - will take a bit longer to complete than usual). > >> That looks reasonable. I've launched some testing and results look good so far (there is quite some load at the moment - will take a bit longer to complete than usual). > > Testing looked good (did not cover the `TraceLoopOpts` update). @chhagedorn @dafedafe thanks for the reviews and re-reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23465#issuecomment-2786907238 From roland at openjdk.org Tue Apr 8 15:54:30 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 8 Apr 2025 15:54:30 GMT Subject: Integrated: 8341976: C2: use_mem_state != load->find_exact_control(load->in(0)) assert failure In-Reply-To: References: Message-ID: <2w7wipIhkjfHnRwOryG6QmQPb0x9BgPGZpcVWUSao_Y=.a741d943-d4db-42e7-b53f-3867b4556ff7@github.com> On Wed, 5 Feb 2025 15:37:23 GMT, Roland Westrelin wrote: > The `arraycopy` writes to a non escaping array so its `ArrayCopy` node > is marked as having a narrow memory effect. One of the loads from the > destination after the copy is transformed into a load from the source > array (the rationale being that if there's no load from the > destination of the copy, the `arraycopy` is not needed). 
The load from > the source has the input memory state of the `ArrayCopy` as memory > input. That load is then sunk out of the loop and its control is > updated to be after the `ArrayCopy`. That's legal because the > `ArrayCopy` only has a narrow memory effect and can't modify the > source. The `ArrayCopy` can't be eliminated and is expanded. In the > process, a `MemBar` that has a wide memory effect is added. The load > from the source has control after the membar but memory state before > and because the membar has a wide memory effect, the load is anti > dependent on the membar: the graph is broken (the load can't be pinned > after the membar and anti dependent on it). > > In short, the problem is that the graph is transformed under the > assumption that the `ArrayCopy` has a narrow effect but the > `ArrayCopy` is expanded to a subgraph that has a wide memory > effect. The fix I propose is to not insert a membar with a wide memory > effect. We still need a membar when the destination is non escaping > because the expanded `ArrayCopy`, if it writes to a tightly allocated > array, writes to raw memory and not to the destination memory slice. This pull request has now been integrated.
Changeset: 4645ddbb Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/4645ddbb6be6b4456cc4d9f58188b0561a8e593d Stats: 127 lines in 6 files changed: 88 ins; 10 del; 29 mod 8341976: C2: use_mem_state != load->find_exact_control(load->in(0)) assert failure Reviewed-by: chagedorn, dfenacci ------------- PR: https://git.openjdk.org/jdk/pull/23465 From galder at openjdk.org Tue Apr 8 16:25:24 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Tue, 8 Apr 2025 16:25:24 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v12] In-Reply-To: <4G5Po8SEYFxSylfIJtndUpu0LLboJPgGgmE8FL3t1S4=.39c5d519-7a78-43c7-a1fe-8cef72901490@github.com> References: <4G5Po8SEYFxSylfIJtndUpu0LLboJPgGgmE8FL3t1S4=.39c5d519-7a78-43c7-a1fe-8cef72901490@github.com> Message-ID: On Mon, 7 Apr 2025 06:04:35 GMT, Emanuel Peter wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> For Christian > > @turbanoff Thanks for the two whitespace fixes :) @eme64 No, I don't think my review counts that much on this one. I think you need a reviewer who has more background on the history of this. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24224#issuecomment-2787002818 From epeter at openjdk.org Tue Apr 8 16:30:20 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 8 Apr 2025 16:30:20 GMT Subject: RFR: 8353237: [AArch64] Incorrect result of VectorizedHashCode intrinsic on some hardware In-Reply-To: References: Message-ID: On Mon, 7 Apr 2025 12:39:40 GMT, Aleksei Voitylov wrote: > The root of the problem is that VectorizedHashCode intrinsic introduced by JDK-8341194 is not aware of JDK-8079203. JDK-8079203 generates additional nop with madd instruction on Cortex-A53 as a workaround for Cortex-A53 erratum 835769 "AArch64 multiply-accumulate instruction might produce incorrect result".
Current VectorizedHashCode intrinsic calculates byte offset to jump inside the unrolled loop code. It assumes 2 instructions per each unrolled iteration (load and madd). JDK-8079203 adds additional nop for Cortex-A53, which breaks offset calculation logic. > > Current offset calculation logic is using shift instead of multiplication, as a power-of-2 number of instructions is present in each unrolled loop iteration. To keep it simple, this fix adds one more nop into each loop iteration on Cortex-A53 in order to have 4 instructions per iteration, which is also a power-of-2. To account for that, the shift argument for offset calculation logic is increased by 1, because each loop iteration has 2 times more instructions on Cortex-A53. > > This fix is tested on Raspberry Pi 3 (based on Cortex-A53) by running initially reported application and by running hotspot jtreg tests (not a single test could be run on Cortex-A53 before the fix). The performance gain from the intrinsic is also observed on Cortex-A53 using the ArraysHashCode benchmark. @voitylov How can this bug be reproduced? Can you write a reproducer to attach to this PR? Also: the title sounds a bit generic: "on some hardware" ... is it all aarch64, or something more specific? You could change the PR title. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24489#issuecomment-2787017972 From epeter at openjdk.org Tue Apr 8 16:32:24 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 8 Apr 2025 16:32:24 GMT Subject: RFR: 8315916: assert(C->live_nodes() <= C->max_node_limit()) failed: Live Node limit exceeded [v7] In-Reply-To: References: Message-ID: On Tue, 8 Apr 2025 05:40:16 GMT, Dhamoder Nalla wrote: >> Dhamoder Nalla has updated the pull request incrementally with one additional commit since the last revision: >> >> CR comments > > According to the latest comments on bug JDK-8315916, two more people have reported the issue. @dhanalla Are you still making changes or is this ready to review?
(if not ready just make it a draft ;) ) ------------- PR Comment: https://git.openjdk.org/jdk/pull/20504#issuecomment-2787022375 From epeter at openjdk.org Tue Apr 8 16:43:23 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 8 Apr 2025 16:43:23 GMT Subject: RFR: 8351660: C2: SIGFPE in unsigned_mod_value [v2] In-Reply-To: References: Message-ID: On Tue, 8 Apr 2025 13:15:50 GMT, Saranya Natarajan wrote: >> Description :: The test program performs a `Long.remainderUnsigned` which triggers the call to the function `unsigned_mod_value`. At the end of `unsigned_mod_value` the expression `return TypeClass::make(static_cast(dividend % divisor))` is computed, which leads to a SIGFPE as the divisor in the test program is zero. The same behaviour was observed when the `Long.remainderUnsigned` was replaced with `Integer.remainderUnsigned` in the test program. >> >> Solution :: The fix for [JDK-8345766](https://bugs.openjdk.org/browse/JDK-8345766) emitted specific ModF/ModD nodes, which are optimized and converted to runtime calls after optimizations. This was done during parsing prior to [JDK-8345766](https://bugs.openjdk.org/browse/JDK-8345766). In the scenario where there was an unsigned modulo operation as in this test, there was no check for modulo by zero that could trigger an exception during runtime. The below fix proposes a check for modulo by zero and throws an exception at runtime. >> >> A Jtreg test has been added as part of this fix. This test case is based on the original test that resulted in the bug. @eme64 is the contributor of the original test. Thank you @eme64. > > Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: > > added jtreg test @sarannat Good work :) test/hotspot/jtreg/compiler/integerArithmetic/TestUnsignedModByZero.java line 50: > 48: double x = 1.0; > 49: return Long.remainderUnsigned(1, (long)(x % x)); > 50: } These tests look identical...
Did you mean to use `Integer.remainderUnsigned`? ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24410#pullrequestreview-2750687987 PR Review Comment: https://git.openjdk.org/jdk/pull/24410#discussion_r2033609952 From dhanalla at openjdk.org Tue Apr 8 16:44:22 2025 From: dhanalla at openjdk.org (Dhamoder Nalla) Date: Tue, 8 Apr 2025 16:44:22 GMT Subject: RFR: 8315916: assert(C->live_nodes() <= C->max_node_limit()) failed: Live Node limit exceeded [v7] In-Reply-To: References: Message-ID: On Tue, 8 Apr 2025 05:40:16 GMT, Dhamoder Nalla wrote: >> Dhamoder Nalla has updated the pull request incrementally with one additional commit since the last revision: >> >> CR comments > > According to the latest comments on bug JDK-8315916, two more people have reported the issue. > @dhanalla Are you still making changes or is this ready to review? (if not ready just make it a draft ;) ) @eme64, This is ready for review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20504#issuecomment-2787061580 From epeter at openjdk.org Tue Apr 8 16:50:11 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 8 Apr 2025 16:50:11 GMT Subject: RFR: 8353842: C2: Add graph dumps before and after loop opts phase [v3] In-Reply-To: References: Message-ID: On Tue, 8 Apr 2025 14:10:42 GMT, Manuel Hässig wrote: >> This PR adds graph dumps before and after loop optimizations, but only if the compiled method actually contains loops. This helps to distinguish loop optimizations in IGV and to match loop related nodes like opaque template assertion predicates. >> >> I tested this by compiling a few test methods and looking at the ideal graph. Also, I ran tier1 through tier3 and Oracle internal testing. > > Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: > > Fix capitalization @mhaessig Looks good to me :) ------------- Marked as reviewed by epeter (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/24509#pullrequestreview-2750721929 From duke at openjdk.org Tue Apr 8 17:02:18 2025 From: duke at openjdk.org (Koutheir Attouchi) Date: Tue, 8 Apr 2025 17:02:18 GMT Subject: RFR: 8353237: [AArch64] Incorrect result of VectorizedHashCode intrinsic on some hardware In-Reply-To: References: Message-ID: <1ZOvfgxx-JhdjzWpOFwMd9UL7wqFwBguGbA-g_sGbIE=.e358b746-67d5-4680-9083-8938021f0d32@github.com> On Tue, 8 Apr 2025 16:27:32 GMT, Emanuel Peter wrote: > How can this bug be reproduced? Can you write a reproducer to attach to this PR? I define a Java source file like this: // A.java public class A { public static void main(String[] args) { System.out.println("HelloWorld!"); } } Then I build its class file on my computer (`javac A.java`). I transfer both files to [the Rock64 device](https://pine64.org/devices/rock64/) running Linux/AArch64, and I get the following behaviors. First, the good behaviors: # Load A.class: $ jdk-25/bin/java A HelloWorld! # Compile A.java then load it: $ jdk-25/bin/java -XX:+UnlockDiagnosticVMOptions -XX:-UseVectorizedHashCodeIntrinsic A.java HelloWorld! Then, the wrong behavior: $ jdk-25/bin/java A.java An exception has occurred in the compiler ((version info not available)). Please file a bug against the Java compiler via the Java bug reporting page (https://bugreport.java.com) after checking the Bug Database (https://bugs.java.com) for duplicates. Include your program, the following diagnostic, and the parameters passed to the Java compiler in your report. Thank you. 
java.lang.NoClassDefFoundError: com/sun/tools/javac/jvm/ClassReader$AttributeReader at jdk.compiler/com.sun.tools.javac.jvm.ClassReader.initAttributeReaders(ClassReader.java:885) at jdk.compiler/com.sun.tools.javac.jvm.ClassReader.<init>(ClassReader.java:308) at jdk.compiler/com.sun.tools.javac.jvm.ClassReader.instance(ClassReader.java:270) at jdk.compiler/com.sun.tools.javac.code.ClassFinder.<init>(ClassFinder.java:186) at jdk.compiler/com.sun.tools.javac.code.ClassFinder.instance(ClassFinder.java:178) at jdk.compiler/com.sun.tools.javac.main.JavaCompiler.<init>(JavaCompiler.java:400) at jdk.compiler/com.sun.tools.javac.main.JavaCompiler.instance(JavaCompiler.java:129) at jdk.compiler/com.sun.tools.javac.processing.JavacProcessingEnvironment.<init>(JavacProcessingEnvironment.java:209) at jdk.compiler/com.sun.tools.javac.processing.JavacProcessingEnvironment.instance(JavacProcessingEnvironment.java:194) at jdk.compiler/com.sun.tools.javac.api.BasicJavacTask.initPlugins(BasicJavacTask.java:218) at jdk.compiler/com.sun.tools.javac.api.JavacTaskImpl.prepareCompiler(JavacTaskImpl.java:204) at jdk.compiler/com.sun.tools.javac.api.JavacTaskImpl.parseInternal(JavacTaskImpl.java:257) at jdk.compiler/com.sun.tools.javac.api.JavacTaskImpl.invocationHelper(JavacTaskImpl.java:152) at jdk.compiler/com.sun.tools.javac.api.JavacTaskImpl.parse(JavacTaskImpl.java:248) at jdk.compiler/com.sun.tools.javac.launcher.ProgramDescriptor.of(ProgramDescriptor.java:71) at jdk.compiler/com.sun.tools.javac.launcher.SourceLauncher.run(SourceLauncher.java:132) at jdk.compiler/com.sun.tools.javac.launcher.SourceLauncher.main(SourceLauncher.java:76) Caused by: java.lang.ClassNotFoundException: com.sun.tools.javac.jvm.ClassReader$AttributeReader at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:580) at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:490) ...
17 more Exception in thread "main" java.lang.IllegalStateException: java.lang.NoClassDefFoundError: com/sun/tools/javac/jvm/ClassReader$AttributeReader at jdk.compiler/com.sun.tools.javac.api.JavacTaskImpl.parse(JavacTaskImpl.java:252) at jdk.compiler/com.sun.tools.javac.launcher.ProgramDescriptor.of(ProgramDescriptor.java:71) at jdk.compiler/com.sun.tools.javac.launcher.SourceLauncher.run(SourceLauncher.java:132) at jdk.compiler/com.sun.tools.javac.launcher.SourceLauncher.main(SourceLauncher.java:76) Caused by: java.lang.NoClassDefFoundError: com/sun/tools/javac/jvm/ClassReader$AttributeReader at jdk.compiler/com.sun.tools.javac.jvm.ClassReader.initAttributeReaders(ClassReader.java:885) at jdk.compiler/com.sun.tools.javac.jvm.ClassReader.<init>(ClassReader.java:308) at jdk.compiler/com.sun.tools.javac.jvm.ClassReader.instance(ClassReader.java:270) at jdk.compiler/com.sun.tools.javac.code.ClassFinder.<init>(ClassFinder.java:186) at jdk.compiler/com.sun.tools.javac.code.ClassFinder.instance(ClassFinder.java:178) at jdk.compiler/com.sun.tools.javac.main.JavaCompiler.<init>(JavaCompiler.java:400) at jdk.compiler/com.sun.tools.javac.main.JavaCompiler.instance(JavaCompiler.java:129) at jdk.compiler/com.sun.tools.javac.processing.JavacProcessingEnvironment.<init>(JavacProcessingEnvironment.java:209) at jdk.compiler/com.sun.tools.javac.processing.JavacProcessingEnvironment.instance(JavacProcessingEnvironment.java:194) at jdk.compiler/com.sun.tools.javac.api.BasicJavacTask.initPlugins(BasicJavacTask.java:218) at jdk.compiler/com.sun.tools.javac.api.JavacTaskImpl.prepareCompiler(JavacTaskImpl.java:204) at jdk.compiler/com.sun.tools.javac.api.JavacTaskImpl.parseInternal(JavacTaskImpl.java:257) at jdk.compiler/com.sun.tools.javac.api.JavacTaskImpl.invocationHelper(JavacTaskImpl.java:152) at jdk.compiler/com.sun.tools.javac.api.JavacTaskImpl.parse(JavacTaskImpl.java:248) ...
3 more Caused by: java.lang.ClassNotFoundException: com.sun.tools.javac.jvm.ClassReader$AttributeReader at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:580) at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:490) ... 17 more For reference: $ jdk-25/bin/java -version openjdk version "25-ea" 2025-09-16 OpenJDK Runtime Environment (build 25-ea+16-1816) OpenJDK 64-Bit Server VM (build 25-ea+16-1816, mixed mode, sharing) $ uname -a Linux rock64-aarch64-2 6.13.5-3-aarch64-ARCH #1 SMP PREEMPT_DYNAMIC Wed Mar 5 18:09:23 MST 2025 aarch64 GNU/Linux $ cat /proc/cpuinfo processor : 0 BogoMIPS : 48.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid CPU implementer : 0x41 CPU architecture: 8 CPU variant : 0x0 CPU part : 0xd03 CPU revision : 4 processor : 1 BogoMIPS : 48.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid CPU implementer : 0x41 CPU architecture: 8 CPU variant : 0x0 CPU part : 0xd03 CPU revision : 4 processor : 2 BogoMIPS : 48.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid CPU implementer : 0x41 CPU architecture: 8 CPU variant : 0x0 CPU part : 0xd03 CPU revision : 4 processor : 3 BogoMIPS : 48.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid CPU implementer : 0x41 CPU architecture: 8 CPU variant : 0x0 CPU part : 0xd03 CPU revision : 4 $ cat /sys/devices/system/cpu/cpu0/regs/identification/revidr_el1 0x0000000000000180 # The erratum 835769 for Cortex A53 is fixed on this CPU: $ python3 >>> (0x0000000000000180 >> 7) & 1 1 ------------- PR Comment: https://git.openjdk.org/jdk/pull/24489#issuecomment-2787098570 From epeter at openjdk.org Tue Apr 8 17:02:21 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 8 Apr 2025 17:02:21 GMT Subject: RFR: 8353551: C2: Constant folding for ReverseBytes nodes [v2] In-Reply-To: References: Message-ID: <1mMtj4ed0tqiFvCAIszptustbFaIf_mqfP9o3ZGQrfw=.0f0b21bf-3c70-46cd-b142-b7868ec06536@github.com> On Tue, 8 Apr 2025 08:15:55 
GMT, Hannes Greule wrote: >> This change implements constant folding for ReverseBytes nodes. >> >> Currently, `byteswap` is included transitively by `reverse_bits.hpp`. I'm not sure if this is fine or if I need to add an explicit include there. >> >> I appreciate any reviews and comments. > > Hannes Greule has updated the pull request incrementally with one additional commit since the last revision: > > add test cases with negative -> non-negative @SirYwell This looks generally good, but I'll let you have the conversation with @iwanowww. I launched some internal testing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24382#issuecomment-2787108139 From epeter at openjdk.org Tue Apr 8 17:08:24 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 8 Apr 2025 17:08:24 GMT Subject: RFR: 8353237: [AArch64] Incorrect result of VectorizedHashCode intrinsic on some hardware In-Reply-To: References: Message-ID: <0UMwRXJSGxV_nJDhJnzgck7rZ2NMmQWysb1gpm-7Jzg=.72fb5014-14a3-4e7c-87e5-d27f6ce3fade@github.com> On Mon, 7 Apr 2025 12:39:40 GMT, Aleksei Voitylov wrote: > The root of the problem is that VectorizedHashCode intrinsic introduced by JDK-8341194 is not aware of JDK-8079203. JDK-8079203 generates additional nop with madd instruction on Cortex-A53 as a workaround for Cortex-A53 erratum 835769 "AArch64 multiply-accumulate instruction might produce incorrect result". Current VectorizedHashCode intrinsic calculates byte offset to jump inside the unrolled loop code. It assumes 2 instructions per each unrolled iteration (load and madd). JDK-8079203 adds additional nop for Cortex-A53, which breaks offset calculation logic. > > Current offset calculation logic is using shift instead of multiplication, as a power-of-2 number of instructions is present in each unrolled loop iteration. To keep it simple, this fix adds one more nop into each loop iteration on Cortex-A53 in order to have 4 instructions per iteration, which is also a power-of-2.
To account for that, the shift argument for offset calculation logic is increased by 1, because each loop iteration has 2 times more instructions on Cortex-A53. > > This fix is tested on Raspberry Pi 3 (based on Cortex-A53) by running initially reported application and by running hotspot jtreg tests (not a single test could be run on Cortex-A53 before the fix). The performance gain from the intrinsic is also observed on Cortex-A53 using the ArraysHashCode benchmark. @voitylov @koutheir I am asking because I see no such reproducer attached with the PR, and that is generally required for approval of PRs ;) Or are you saying there is already a JTREG test that triggers this bug? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24489#issuecomment-2787121592 From duke at openjdk.org Tue Apr 8 17:12:25 2025 From: duke at openjdk.org (Koutheir Attouchi) Date: Tue, 8 Apr 2025 17:12:25 GMT Subject: RFR: 8353237: [AArch64] Incorrect result of VectorizedHashCode intrinsic on some hardware In-Reply-To: References: Message-ID: On Mon, 7 Apr 2025 12:39:40 GMT, Aleksei Voitylov wrote: > The root of the problem is that VectorizedHashCode intrinsic introduced by JDK-8341194 is not aware of JDK-8079203. JDK-8079203 generates additional nop with madd instruction on Cortex-A53 as a workaround for Cortex-A53 erratum 835769 "AArch64 multiply-accumulate instruction might produce incorrect result". Current VectorizedHashCode intrinsic calculates byte offset to jump inside the unrolled loop code. It assumes 2 instructions per each unrolled iteration (load and madd). JDK-8079203 adds additional nop for Cortex-A53, which breaks offset calculation logic. > > Current offset calculation logic is using shift instead of multiplication, as a power-of-2 number of instructions is present in each unrolled loop iteration. To keep it simple, this fix adds one more nop into each loop iteration on Cortex-A53 in order to have 4 instructions per iteration, which is also a power-of-2.
To account for that, the shift argument for offset calculation logic is increased by 1, because each loop iteration has 2 times more instructions on Cortex-A53. > > This fix is tested on Raspberry Pi 3 (based on Cortex-A53) by running initially reported application and by running hotspot jtreg tests (not a single test could be run on Cortex-A53 before the fix). The performance gain from the intrinsic is also observed on Cortex-A53 using the ArraysHashCode benchmark. I'm saying that a simple HelloWorld reproduces the issue every time on the Rock64. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24489#issuecomment-2787132353 From epeter at openjdk.org Tue Apr 8 17:15:19 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 8 Apr 2025 17:15:19 GMT Subject: RFR: 8353192: C2: Clean up x86 backend after 32-bit x86 removal [v2] In-Reply-To: References: Message-ID: On Mon, 7 Apr 2025 08:47:19 GMT, Aleksey Shipilev wrote: >> Piece-wise cleanup of C2 x86 backend. C2_MacroAssembler, x86.ad and related files are the target for this cleanup. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux x86_64 server fastdebug, `all` + `-XX:-TieredCompilation` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Cleanup ADLC as well > - Revert some accidental removals > - Merge branch 'master' into JDK-8353192-x86-c2-backend > - Touchup > - Fix Looks reasonable :) I launched some internal testing just in case, please ping me again in 24h :) src/hotspot/cpu/x86/x86.ad line 1659: > 1657: return false; > 1658: } > 1659: break; So just `!VM_Version::supports_bmi2()` is not possible any more?
------------- PR Review: https://git.openjdk.org/jdk/pull/24300#pullrequestreview-2750807491 PR Review Comment: https://git.openjdk.org/jdk/pull/24300#discussion_r2033680072 From shade at openjdk.org Tue Apr 8 17:23:27 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 8 Apr 2025 17:23:27 GMT Subject: RFR: 8353192: C2: Clean up x86 backend after 32-bit x86 removal [v2] In-Reply-To: References: Message-ID: <_9lPpDvu8eGRw7bdl5IMOH8E5-QG9FNlzomPuTvX5A4=.9822ebed-864a-4d55-90bc-c78637248d91@github.com> On Tue, 8 Apr 2025 17:11:47 GMT, Emanuel Peter wrote: >> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: >> >> - Cleanup ADLC as well >> - Revert some accidental removals >> - Merge branch 'master' into JDK-8353192-x86-c2-backend >> - Touchup >> - Fix > > src/hotspot/cpu/x86/x86.ad line 1659: > >> 1657: return false; >> 1658: } >> 1659: break; > > So just `!VM_Version::supports_bmi2()` is not possible any more? It is possible. I just coalesced the cases for `Op_CompressBits` and `Op_ExpandBits`, see a few lines below. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24300#discussion_r2033695690 From epeter at openjdk.org Tue Apr 8 17:26:25 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 8 Apr 2025 17:26:25 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v3] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 02:58:17 GMT, Jatin Bhateja wrote: >> - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel? products and will be the vector ISA of choice moving into the future. >> - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. 
Feature set supported by an AVX10 ISA version will be supported by all the versions above it.
>> - The initial, fully-featured version of Intel® AVX10 will be enumerated as Version 2 (denoted as Intel® AVX10.2). This will include the new ISA extension over the existing AVX512 instructions.
>> - An early version of Intel® AVX10 (Version 1, or Intel® AVX10.1) that only enumerates the Intel® AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling.
>> 
>> This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers.
>> 
>> The patch has been regressed through tier1 and jvmci tests
>> 
>> Please review and share your feedback.
>> 
>> Best Regards,
>> Jatin
>> 
>> [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html
> 
> Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision:
> 
> 8352675: Support Intel AVX10 converged vector ISA feature detection

Just leaving a few drive-by comments, I'm really not very familiar with this code.
It would be nice if someone from Intel reviewed this also.
Also: you should probably update some more copyright dates ;) src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotJVMCIBackendFactory.java line 60: > 58: Map constants, > 59: long features, > 60: long extra_features, All other arguments have a javadoc `@param` above, you should probably add one too then ;) ------------- PR Review: https://git.openjdk.org/jdk/pull/24329#pullrequestreview-2750831677 PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2033693455 From epeter at openjdk.org Tue Apr 8 17:26:26 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 8 Apr 2025 17:26:26 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v3] In-Reply-To: References: Message-ID: On Tue, 8 Apr 2025 17:18:53 GMT, Emanuel Peter wrote: >> Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: >> >> 8352675: Support Intel AVX10 converged vector ISA feature detection > > src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotJVMCIBackendFactory.java line 60: > >> 58: Map constants, >> 59: long features, >> 60: long extra_features, > > All other arguments have a javadoc `@param` above, you should probably add one too then ;) Ah, and is it not more "java" to use `extraFeatures`? 
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2033698184

From epeter at openjdk.org Tue Apr 8 17:27:13 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 8 Apr 2025 17:27:13 GMT
Subject: RFR: 8353192: C2: Clean up x86 backend after 32-bit x86 removal [v2]
In-Reply-To: <_9lPpDvu8eGRw7bdl5IMOH8E5-QG9FNlzomPuTvX5A4=.9822ebed-864a-4d55-90bc-c78637248d91@github.com>
References: <_9lPpDvu8eGRw7bdl5IMOH8E5-QG9FNlzomPuTvX5A4=.9822ebed-864a-4d55-90bc-c78637248d91@github.com>
Message-ID:

On Tue, 8 Apr 2025 17:20:12 GMT, Aleksey Shipilev wrote:

>> src/hotspot/cpu/x86/x86.ad line 1659:
>> 
>>> 1657: return false;
>>> 1658: }
>>> 1659: break;
>> 
>> So just `!VM_Version::supports_bmi2()` is not possible any more?
> 
> It is possible. I just coalesced the cases for `Op_CompressBits` and `Op_ExpandBits`, see a few lines below.

Aaaah right

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24300#discussion_r2033705441

From avoitylov at openjdk.org Tue Apr 8 17:36:27 2025
From: avoitylov at openjdk.org (Aleksei Voitylov)
Date: Tue, 8 Apr 2025 17:36:27 GMT
Subject: RFR: 8353237: [AArch64] Incorrect result of VectorizedHashCode intrinsic on Cortex-A53
In-Reply-To:
References:
Message-ID:

On Mon, 7 Apr 2025 12:39:40 GMT, Aleksei Voitylov wrote:

> The root of the problem is that VectorizedHashCode intrinsic introduced by JDK-8341194 is not aware of JDK-8079203. JDK-8079203 generates additional nop with madd instruction on Cortex-A53 as a workaround for Cortex-A53 erratum 835769 "AArch64 multiply-accumulate instruction might produce incorrect result". Current VectorizedHashCode intrinsic calculates byte offset to jump inside the unrolled loop code. It assumes 2 instructions per each unrolled iteration (load and madd). JDK-8079203 adds additional nop for Cortex-A53, which breaks offset calculation logic.
>
> Current offset calculation logic is using shift instead of multiplication, power-of-2 number instructions are present in each unrolled loop iteration. To keep it simple, this fix adds one more nop into each loop iteration on Cortex-A53 in order to have 4 instruction per iteration, which is also a power-of-2. To account for that, the shift argument for offset calculation logic is increased by 1, because each loop iteration has 2 times more instructions on Cortex-A53. > ? > This fix is tested on Raspberry Pi 3 (based on Cortex-A53) by running initially reported application and by running hotspot jtreg tests (not a single test could be run on Cortex-A53 before the fix). After the fix, the specialized test hotspot/jtreg/compiler/intrinsics/TestArraysHashCode.java passes. > > The performance gain from the intrinsic is also observed on Cortex-A53 using the ArraysHashCode benchmark. The title now reflects the concrete core affected by the issue. The description now lists a concrete jtreg test that can be used to reproduce the issue on that core, but like described previously, any hotspot jtreg test used to fail on that core before the fix. JDK 24 is basically DOA on Cortex-A53. 
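
For readers following the offset arithmetic in the description above, here is a minimal Java sketch of why a power-of-2 instruction count per iteration lets the jump offset be computed with a shift. It is only an illustration: the class and method names are made up, and the actual shift operand used by the intrinsic may be expressed differently. The one hard fact it relies on is that AArch64 instructions are 4 bytes wide.

```java
// Hypothetical illustration, not HotSpot code.
final class UnrolledLoopOffset {
    // AArch64 instructions have a fixed width of 4 bytes.
    static final int INSTRUCTION_BYTES = 4;

    // Shift amount that replaces a multiplication by the per-iteration code size.
    // Only valid when that size is a power of 2.
    static int shiftFor(int instructionsPerIteration) {
        int bytesPerIteration = instructionsPerIteration * INSTRUCTION_BYTES;
        if (Integer.bitCount(bytesPerIteration) != 1) {
            throw new IllegalArgumentException("code size per iteration must be a power of 2");
        }
        return Integer.numberOfTrailingZeros(bytesPerIteration);
    }

    // Byte offset covering the given number of unrolled iterations.
    static int byteOffset(int iterations, int instructionsPerIteration) {
        return iterations << shiftFor(instructionsPerIteration);
    }

    public static void main(String[] args) {
        // load + madd: 2 instructions per iteration.
        System.out.println("default shift    = " + shiftFor(2));
        // load + nop + madd + extra nop on Cortex-A53: 4 instructions per iteration.
        System.out.println("Cortex-A53 shift = " + shiftFor(4));
    }
}
```

With 3 instructions per iteration (load, nop, madd) the size is 12 bytes, which is not a power of 2, so a plain shift cannot express it. Padding to 4 instructions doubles the original size and raises the shift by exactly 1.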
Before:

    bellsoft@rpi3-0:~ $ ./jdk-24.0/bin/java -Xcomp -version
    openjdk version "24-internal" 2025-03-18
    OpenJDK Runtime Environment (build 24-internal-adhoc.bellsoft.jdkgit)
    null (build null, null)

After:

    bellsoft@rpi3-0:~ $ ./jdk-24.0/bin/java -Xcomp -version
    openjdk version "24-internal" 2025-03-18
    OpenJDK Runtime Environment (build 24-internal-adhoc.bellsoft.jdkgit)
    OpenJDK 64-Bit Server VM (build 24-internal-adhoc.bellsoft.jdkgit, compiled mode)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24489#issuecomment-2787186191
PR Comment: https://git.openjdk.org/jdk/pull/24489#issuecomment-2787190350

From duke at openjdk.org Tue Apr 8 17:50:11 2025
From: duke at openjdk.org (duke)
Date: Tue, 8 Apr 2025 17:50:11 GMT
Subject: RFR: 8352681: C2 compilation hits asserts "must set the initial type just once" [v2]
In-Reply-To:
References:
Message-ID:

On Mon, 7 Apr 2025 16:54:47 GMT, Cesar Soares Lucas wrote:

>> The reason for the error reported is that when RAM tries to reduce a field load through a Phi it ends up calling `step_through_mergemem` with `_delay_transform` set to true and, since `step_through_mergemem` assumes that `_delay_transform` is `false`, it calls `igvn->transform` passing a MergeMem that has been added to the graph long ago.
>> 
>> I didn't opt to make `_delay_transform` false during the RAM transformations because that seemed to be a risky move, nevertheless I'll create an RFE and keep investigating that option.
>> 
>> Also, while working on a test case I found that C2 doesn't remove (at least not before EA steps) the `if (param != param) {...}` and I'm going to investigate that as a separate RFE.
>> 
>> I tested this with JTREG Tier 1-3 on Linux x86_64.
> 
> Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision:
> 
> Fix typo & test

@JohnTortugo
Your change (at version fa6b678e8aef29f1be9f6f2e7c368bd24e73239a) is now ready to be sponsored by a Committer.
-------------

PR Comment: https://git.openjdk.org/jdk/pull/24471#issuecomment-2787222740

From vlivanov at openjdk.org Tue Apr 8 20:00:13 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Tue, 8 Apr 2025 20:00:13 GMT
Subject: RFR: 8353551: C2: Constant folding for ReverseBytes nodes [v2]
In-Reply-To:
References: <0Vve2dLoTSqwMKxx7LqaStM6f6_c6NadH-xc3OCSMMI=.c8dafa83-a661-4502-b170-fe7bdee19f2c@github.com>
Message-ID:

On Tue, 8 Apr 2025 08:12:09 GMT, Hannes Greule wrote:

>> I didn't intend to advocate for waiting for that PR. `try_cast` is tiny and could be added independently. I have just been looking for things that could generally help with unifying TypeInt/TypeLong implementations.
> 
> @j3graham I mainly want to avoid conflicts or duplicated solutions. I meant that after #17508 this code can be further generalized anyway, allowing to use the `try_cast` function then. This can obviously happen in a separate PR, if this one is integrated before.
> 
> @iwanowww I'm not sure if that's possible without more duplication. We need to choose the correct byteswap implementation depending on the node's type. Please let me know if I'm missing something.

It does have some duplication, but simply in a different place.
I wouldn't say there's more of it:

    static const Type* reverse_bytes(int op, const Type* con) {
      switch (op) {
        case Op_ReverseBytesS:  return TypeInt::make(byteswap(checked_cast<jshort>(con->is_int()->get_con())));
        case Op_ReverseBytesUS: return TypeInt::make(byteswap(checked_cast<jchar>(con->is_int()->get_con())));
        case Op_ReverseBytesI:  return TypeInt::make(byteswap(con->is_int()->get_con()));
        case Op_ReverseBytesL:  return TypeLong::make(byteswap(con->is_long()->get_con()));
        default: ShouldNotReachHere();
      }
    }

    const Type* ReverseBytesNode::Value(PhaseGVN* phase) const {
      const Type* type = phase->type(in(1));
      if (type == Type::TOP) {
        return Type::TOP;
      }
      if (type->singleton()) {
        return reverse_bytes(Opcode(), type);
      }
      return bottom_type();
    }

At some point, we should consider folding `ReverseBytes*Node` specializations into a single one (`ReverseBytesNode`) parameterized by element type (as was done for `ReverseBytesV` and other vector nodes). After it is done, there won't be a convenient way to piggyback on virtual calls to hook specialized versions, forcing us to do explicit dispatch anyway.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24382#discussion_r2033930617

From vlivanov at openjdk.org Tue Apr 8 20:00:14 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Tue, 8 Apr 2025 20:00:14 GMT
Subject: RFR: 8353551: C2: Constant folding for ReverseBytes nodes [v2]
In-Reply-To:
References:
Message-ID: <5kXZ7w9OIWGMearNP35fcQ3k44VApkRUKhH_CL1BVJ8=.248f3be7-773d-4ff4-8373-a564bc785155@github.com>

On Tue, 8 Apr 2025 08:13:00 GMT, Hannes Greule wrote:

>> test/hotspot/jtreg/compiler/c2/irTests/ReverseBytesConstantsTests.java line 72:
>> 
>>> 70: @DontCompile
>>> 71: public void assertResultI() {
>>> 72: Asserts.assertEQ(Integer.reverseBytes(0x04030201), testI1());
>> 
>> Please, add more test cases (specifically, with negative constants).
> 
> I added cases with a leading 0x80 byte now

Thanks!
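
As a small aside on why constants with a 0x80 byte are worth testing: a rough standalone Java sketch (not taken from the PR's test file; class and method names are made up) showing how the signed and unsigned 16-bit variants diverge once the swapped value has its top bit set. This mirrors the distinction between the `ReverseBytesS` and `ReverseBytesUS` cases discussed above.

```java
// Hypothetical illustration, independent of the jtreg test in the PR.
final class ReverseBytesNegativeConstants {
    // 16-bit signed swap, widened to int with sign extension (like short results).
    static int swapSigned16(int v) {
        return Short.reverseBytes((short) v);
    }

    // 16-bit unsigned swap, widened to int with zero extension (like char results).
    static int swapUnsigned16(int v) {
        return Character.reverseBytes((char) v);
    }

    public static void main(String[] args) {
        // 0x0180 swaps to 0x8001: negative as a short, positive as a char.
        System.out.println(swapSigned16(0x0180));
        System.out.println(swapUnsigned16(0x0180));
        // 32-bit and 64-bit constants with a leading 0x80 byte.
        System.out.printf("%08x%n", Integer.reverseBytes(0x80030201));
        System.out.printf("%016x%n", Long.reverseBytes(0x8000000000000001L));
    }
}
```

A constant folder that mixes up the two 16-bit flavours would pass tests with small positive bytes and only fail on inputs like these.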
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24382#discussion_r2033930819 From qamai at openjdk.org Tue Apr 8 20:15:50 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 8 Apr 2025 20:15:50 GMT Subject: RFR: 8346836: C2: Introduce a way to verify the correctness of ConstraintCastNodes at runtime [v5] In-Reply-To: References: Message-ID: On Fri, 7 Feb 2025 17:22:17 GMT, Vladimir Ivanov wrote: >> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: >> >> - Merge branch 'master' into verifycast >> - better comments >> - move test to a new file, add block_comment >> - add tests >> - make VerifyConstraintCast uint, better debug info >> - Merge branch 'master' into verifycast >> - Introduce VerifyConstraintCasts > > Very nice! @iwanowww Thanks for the reviews, I hope I have addressed all of your comments. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22880#issuecomment-2787552670 From qamai at openjdk.org Tue Apr 8 20:15:46 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 8 Apr 2025 20:15:46 GMT Subject: RFR: 8346836: C2: Introduce a way to verify the correctness of ConstraintCastNodes at runtime [v8] In-Reply-To: References: Message-ID: > Hi, > > This patch adds a develop flag `VerifyConstraintCasts`, which will verify the correctness of `CastIINode`s and `CastLLNode`s at runtime and crash the VM if the dynamic value lies outside the type value range. > > Please take a look, thanks a lot. 
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: reviews ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22880/files - new: https://git.openjdk.org/jdk/pull/22880/files/bc8b6af3..bc9262ff Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22880&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22880&range=06-07 Stats: 68 lines in 5 files changed: 19 ins; 13 del; 36 mod Patch: https://git.openjdk.org/jdk/pull/22880.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22880/head:pull/22880 PR: https://git.openjdk.org/jdk/pull/22880 From qamai at openjdk.org Tue Apr 8 20:15:51 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 8 Apr 2025 20:15:51 GMT Subject: RFR: 8346836: C2: Introduce a way to verify the correctness of ConstraintCastNodes at runtime [v7] In-Reply-To: References: Message-ID: On Tue, 8 Apr 2025 00:11:04 GMT, Vladimir Ivanov wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> make the flag diagnostic > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 860: > >> 858: movl(c_rarg1, type->_lo); >> 859: movl(c_rarg2, type->_hi); >> 860: call(RuntimeAddress(CAST_FROM_FN_PTR(address, abort_checked_cast_int))); > > Do you need to align stack pointer before the call? It may be the reason why stack traces aren't printed. I believe C2 stacks are 16-byte aligned so we should not need to manually do the alignment? > src/hotspot/cpu/x86/c2_MacroAssembler_x86.hpp line 47: > >> 45: void fast_unlock_lightweight(Register obj, Register reg_rax, Register t, Register thread); >> 46: >> 47: void checked_cast_int(const TypeInt* type, Register val); > > There's already some ambiguity around what "checked cast" really means (think of "CastPP" vs "CheckCastPP" vs "checkcast"). Let's not make it worse. I suggest to just name it as `verify_int[_in]_range`/`verify_long[_in]_range`. 
> > Also, unpacking `type` (and pass low/upper bounds as arguments) would make code on receiving end a bit simpler. > > And some invariants to assert: > * no constants (lo == hi); > * no empty ranges (lo > hi). Thanks for the suggestions, I have changed this to `verify_int_in_range`. Expanding the `Type` before passing into the function would be really bad after #17508, though. > src/hotspot/cpu/x86/x86_64.ad line 7646: > >> 7644: %{ >> 7645: predicate(VerifyConstraintCasts > 0 && >> 7646: Assembler::is_simm32(static_cast(n)->type()->is_long()->_lo) && > > I suggest to extract the range check into a helper method on CastLL and call it from `castLL_checked_L32` and `castLL_checked` predicates. > > Also, `static_cast<...>(n)` can be replaced with `n->as_CastLL()`. Done, I added it to `ad_x86.hpp` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22880#discussion_r2033946419 PR Review Comment: https://git.openjdk.org/jdk/pull/22880#discussion_r2033950065 PR Review Comment: https://git.openjdk.org/jdk/pull/22880#discussion_r2033947292 From qamai at openjdk.org Tue Apr 8 20:15:51 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 8 Apr 2025 20:15:51 GMT Subject: RFR: 8346836: C2: Introduce a way to verify the correctness of ConstraintCastNodes at runtime [v7] In-Reply-To: References: Message-ID: On Tue, 8 Apr 2025 00:44:05 GMT, Vladimir Ivanov wrote: >> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 867: >> >>> 865: >>> 866: static void abort_checked_cast_long(jlong val, jlong lo, jlong hi) { >>> 867: fatal("Invalid CastLL, val: %lld, lo: %lld, hi: %lld", (long long)val, (long long)lo, (long long)hi); >> >> There's `JLONG_FORMAT` to pretty-print jlongs. > > FTR it would be nice to include CastII/CastLL node ID to assist diagnosing the bug, but, unfortunately, there's no easy way to capture such information during matching. Added printing the `MachNode`'s idx. 
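
To make the suggested invariants concrete, here is a rough Java stand-in for what a `verify_int_in_range` check does conceptually. This is only a sketch under assumptions: the real check is emitted as x86 machine code and aborts the VM via `fatal`, whereas this version throws exceptions, and the names `RangeVerifier` and `verifyIntInRange` are hypothetical.

```java
// Hypothetical Java model of the runtime range check; not HotSpot code.
final class RangeVerifier {
    // idx stands in for the node index that is printed to help diagnosis.
    static void verifyIntInRange(int idx, int val, int lo, int hi) {
        // Invariants suggested in the review: no constants (lo == hi)
        // and no empty ranges (lo > hi).
        if (lo >= hi) {
            throw new IllegalArgumentException("range must satisfy lo < hi");
        }
        // The actual verification: the dynamic value must lie within [lo, hi].
        if (val < lo || val > hi) {
            throw new IllegalStateException(String.format(
                "Invalid CastII, idx: %d, val: %d, lo: %d, hi: %d", idx, val, lo, hi));
        }
    }

    public static void main(String[] args) {
        verifyIntInRange(42, 5, 0, 10);   // passes silently
        try {
            verifyIntInRange(42, 11, 0, 10);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```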
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22880#discussion_r2033948356 From dhanalla at openjdk.org Tue Apr 8 20:21:34 2025 From: dhanalla at openjdk.org (Dhamoder Nalla) Date: Tue, 8 Apr 2025 20:21:34 GMT Subject: RFR: 8341293: Split field loads through Nested Phis [v9] In-Reply-To: References: Message-ID: > This enhances the changes introduced in [JDK PR 12897](https://github.com/openjdk/jdk/pull/12897) by handling nested Phi nodes (phi -> phi -> AddP -> Load*) during scalar replacement. The primary goal is to split field loads (AddP -> Load*) involving nested Phi parent nodes, thereby increasing opportunities for scalar replacement and reducing memory allocations. > > > **Here is an illustration of the sequence of Ideal Graph Transformations applied to split through nested `Phi` nodes.** > > **1. Initial State (Before Transformation)** > The graph contains a nested Phi structure where two Allocate nodes merge via a Phi node. > > ![image](https://github.com/user-attachments/assets/c18e5ca0-c554-475c-814a-7cb288d96569) > > **2. After Splitting Through Child Phi** > The transformation separates field loads by introducing additional AddP and Load nodes for each Allocate input. > > ![image](https://github.com/user-attachments/assets/b279b5f2-9ec6-4d9b-a627-506451f1cf81) > > **3. After Splitting Load Field Through Parent Phi** > The field load operation (Load) is pushed even further up in the graph. > > Instead of merging AddP pointers in a Phi node and then performing a Load, the transformation ensures that each path has its AddP -> Load sequence before merging. > > This further eliminates the need to perform field loads on a Phi node, making the graph more conducive to scalar replacement. 
> 
> ![image](https://github.com/user-attachments/assets/f506b918-2dd0-4dbe-a440-ff253afa3961)
> 
> ### JMH Benchmark Results:
> 
> #### With Disabled RAM
> 
> | Benchmark | Mode | Count | Score | Error | Units |
> |-----------|------|-------|-------|-------|-------|
> | testBailOut_runner | avgt | 15 | 13.969 | ± 0.248 | ms/op |
> | testFieldEscapeWithMerge_runner | avgt | 15 | 80.300 | ± 4.306 | ms/op |
> | testMerge_TryCatchFinally_runner | avgt | 15 | 72.182 | ± 1.781 | ms/op |
> | testMultiParentPhi_runner | avgt | 15 | 2.983 | ± 0.001 | ms/op |
> | testNestedPhiPolymorphic_runner | avgt | 15 | 18.342 | ± 0.731 | ms/op |
> | testNestedPhiProcessOrder_runner | avgt | 15 | 14.315 | ± 0.443 | ms/op |
> | testNestedPhiWithLambda_runner | avgt | 15 | 18.511 | ± 1.212 | ms/op |
> | testNestedPhiWithTrap_runner | avgt | 15 | 66.277 | ± 1.478 | ms/op |
> | testNestedPhi_FieldLoad_runner | avgt | 15 | 17.968 | ± 0.306 | ms/op |
> | testNestedPhi_TryCatch_runner | avgt | 15 | 14.186 | ± 0.247 | ms/op |
> | testRematerialize_MultiObj_runner | avgt | 15 | 88.435 | ± 4.869 | ms/op |
> | testRematerialize_SingleObj_runner | avgt | 15 | 29560.130 | ± 48.797 ...
Dhamoder Nalla has updated the pull request incrementally with one additional commit since the last revision: address CR comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21270/files - new: https://git.openjdk.org/jdk/pull/21270/files/3c56f98d..7947053b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21270&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21270&range=07-08 Stats: 8 lines in 3 files changed: 0 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/21270.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21270/head:pull/21270 PR: https://git.openjdk.org/jdk/pull/21270 From dhanalla at openjdk.org Tue Apr 8 20:28:12 2025 From: dhanalla at openjdk.org (Dhamoder Nalla) Date: Tue, 8 Apr 2025 20:28:12 GMT Subject: RFR: 8341293: Split field loads through Nested Phis [v7] In-Reply-To: References: <18TQt6vxN9KxSVwyeQtAWde-ezaVuUEioAl_5_3sAeE=.e5e76fb6-04a7-4f6f-9377-f1e64837ada6@github.com> <-Fs-Nim4P8TQMnjE9bs2HBY34vQtzhzH2dsU7MDlZrI=.34991658-bed4-46ac-b213-f4988c0f9c8b@github.com> Message-ID: On Fri, 7 Feb 2025 16:36:23 GMT, Emanuel Peter wrote: >>> @dhanalla Would you like this to be reviewed? We generally don't re-review until we get pinged again. The idea is that you are maybe still working on it, and so there is no point in reviewing half-processed code. So once you are happy, you can let us know ;) >> Thanks, @eme64 for checking with me. Yes, it's ready for review. 
> > @dhanalla Testing failed for this test: > `compiler/c2/irTests/scalarReplacement/AllocationMergesNestedPhiTests.java` > With flags: > - `-server -Xcomp` > - `-ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation` > - `-XX:-TieredCompilation -XX:+AlwaysIncrementalInline` > - `-XX:-TieredCompilation -XX:+StressReflectiveCode -XX:-ReduceInitialCardMarks -XX:-ReduceBulkZeroing -XX:-ReduceFieldZeroing` > > We also have an internal test that is failing with the same assert: > > `# assert(jobj != nullptr && jobj != phantom_obj) failed: escaped allocation` @eme64, Thanks for reviewing the PR. I have addressed the CR comments, and it is now ready for review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21270#issuecomment-2787578996 From vlivanov at openjdk.org Tue Apr 8 21:41:10 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 8 Apr 2025 21:41:10 GMT Subject: RFR: 8346836: C2: Introduce a way to verify the correctness of ConstraintCastNodes at runtime [v8] In-Reply-To: References: Message-ID: On Tue, 8 Apr 2025 20:15:46 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds a develop flag `VerifyConstraintCasts`, which will verify the correctness of `CastIINode`s and `CastLLNode`s at runtime and crash the VM if the dynamic value lies outside the type value range. >> >> Please take a look, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > reviews Looks good. Speaking of bug synopsis, can you make it a bit more concrete and succinct? How about "C2: Verify CastII/CastLL bounds at runtime"? src/hotspot/cpu/x86/x86_64.ad line 431: > 429: source %{ > 430: > 431: bool castLL_is_imm32(const Node* n) { Please, assert that n is CastLL. ------------- Marked as reviewed by vlivanov (Reviewer). 
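
For context on the `castLL_is_imm32` helper mentioned in this review, a minimal Java sketch of the kind of check it performs. This is an assumption based on the predicate quoted earlier in the thread (which applies `Assembler::is_simm32` to the lower bound and is cut off after that); checking the upper bound as well is a guess, and the Java names are hypothetical.

```java
// Hypothetical Java model; the real helper lives in the x86_64 AD file.
final class CastLLImm32 {
    // True if v fits in a sign-extended 32-bit immediate.
    static boolean isSimm32(long v) {
        return v == (int) v;
    }

    // Assumed shape of the helper: both bounds of the long range must be
    // encodable as 32-bit immediates for the shorter instruction form.
    static boolean castLLIsImm32(long lo, long hi) {
        return isSimm32(lo) && isSimm32(hi);
    }

    public static void main(String[] args) {
        System.out.println(castLLIsImm32(Integer.MIN_VALUE, Integer.MAX_VALUE));
        System.out.println(castLLIsImm32(0L, 1L << 31));
    }
}
```

Ranges that fail this check would need their bounds materialized in registers before the comparison, which is presumably why the review separates a `castLL_checked_L32` form from the general `castLL_checked` one.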
PR Review: https://git.openjdk.org/jdk/pull/22880#pullrequestreview-2751463910 PR Comment: https://git.openjdk.org/jdk/pull/22880#issuecomment-2787721137 PR Review Comment: https://git.openjdk.org/jdk/pull/22880#discussion_r2034066958 From vlivanov at openjdk.org Tue Apr 8 21:41:49 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 8 Apr 2025 21:41:49 GMT Subject: RFR: 8346836: C2: Introduce a way to verify the correctness of ConstraintCastNodes at runtime [v7] In-Reply-To: References: Message-ID: On Tue, 8 Apr 2025 20:08:44 GMT, Quan Anh Mai wrote: >> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 860: >> >>> 858: movl(c_rarg1, type->_lo); >>> 859: movl(c_rarg2, type->_hi); >>> 860: call(RuntimeAddress(CAST_FROM_FN_PTR(address, abort_checked_cast_int))); >> >> Do you need to align stack pointer before the call? It may be the reason why stack traces aren't printed. > > I believe C2 stacks are 16-byte aligned so we should not need to manually do the alignment? Indeed. Still puzzled why stack traces are omitted. Do you have a reproducer to try? >> src/hotspot/cpu/x86/c2_MacroAssembler_x86.hpp line 47: >> >>> 45: void fast_unlock_lightweight(Register obj, Register reg_rax, Register t, Register thread); >>> 46: >>> 47: void checked_cast_int(const TypeInt* type, Register val); >> >> There's already some ambiguity around what "checked cast" really means (think of "CastPP" vs "CheckCastPP" vs "checkcast"). Let's not make it worse. I suggest to just name it as `verify_int[_in]_range`/`verify_long[_in]_range`. >> >> Also, unpacking `type` (and pass low/upper bounds as arguments) would make code on receiving end a bit simpler. >> >> And some invariants to assert: >> * no constants (lo == hi); >> * no empty ranges (lo > hi). > > Thanks for the suggestions, I have changed this to `verify_int_in_range`. Expanding the `Type` before passing into the function would be really bad after #17508, though. 
> Expanding the Type before passing into the function would be really bad after https://github.com/openjdk/jdk/pull/17508, though.

Can you elaborate, please? Are you saying that after #17508 there'll be more invariants to verify, so better to pass all the data encapsulated in a `Type`?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/22880#discussion_r2034058684
PR Review Comment: https://git.openjdk.org/jdk/pull/22880#discussion_r2034056637

From vlivanov at openjdk.org Tue Apr 8 21:42:07 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Tue, 8 Apr 2025 21:42:07 GMT
Subject: RFR: 8346836: C2: Introduce a way to verify the correctness of ConstraintCastNodes at runtime [v7]
In-Reply-To:
References:
Message-ID:

On Tue, 8 Apr 2025 20:10:17 GMT, Quan Anh Mai wrote:

>> FTR it would be nice to include CastII/CastLL node ID to assist diagnosing the bug, but, unfortunately, there's no easy way to capture such information during matching.
> 
> Added printing the `MachNode`'s idx.

Thanks. Primarily, I had ideal node id in mind, but mach node info is useful as well.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/22880#discussion_r2034064513

From eastigeevich at openjdk.org Tue Apr 8 21:56:57 2025
From: eastigeevich at openjdk.org (Evgeny Astigeevich)
Date: Tue, 8 Apr 2025 21:56:57 GMT
Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v7]
In-Reply-To:
References:
Message-ID: <8_hF-HdW2yVkf7L8V-dDQGuV2dcYjyhZSOaPCmhZgRE=.c20b70b3-dde8-4478-a17e-eb2982179372@github.com>

On Wed, 26 Mar 2025 13:03:43 GMT, Erik Österlund wrote:

>> Chad Rakoczy has updated the pull request incrementally with two additional commits since the last revision:
>> 
>> - Relocate nmethod at safepoint
>> - Fix windows build
> 
> I have only skimmed through what you are doing but what I have read makes me worried from a GC point of view.
In general, I am not fond of "special nmethods" that work subtly different to normal nmethods and have their own special life cycles. > It might be that some of my concerns are false because this is more of a drive by review to sanity check if you thought about the GC implications. These are just random things on top of my head. > 1) You can't just copy oops. Propagating stale pointers to a new nmethod is not valid and will make the GC vomit. The GC assumes that it can traverse a snapshot of nmethods, and that new nmethods created after that snapshot, will have sane valid oops initially, and hence do not need fixing. Copying stale oops to a new nmethod would violate those invariants and inevitably blow up. > 2) Class redefinition tracks in an external data structure which nmethods contained metadata that we want to eventually throw away. This is done to avoid walking the entire code cache just to keep tabs on the one nmethod that still uses the old metadata. If we clone the nmethod without putting it in said data structure, we will blow up. > 3) I'm worried about the initial state of the nmethod entry barrier guard value being copied from the source nmethod, instead of having the initial value we expect for newly created nmethods. It means that the initial invocation will not get the nmethod entry barrier callback. The GC traverses the nmethods assuming that new nmethods created during the traversal will not start off with weird stale values. > 4) I'm worried about copying the nmethod epoch counters used by virtual threads to mark which nmethods have been found on-stack. Copying it implies that this nmethod has been found on-stack even though it never has. To me, the implications are unknown, but perhaps you thought about it? > 5) You don't check if the nmethod is_unloading() when cloning it. 
That means you can create a new nmethod that has dead oops from the get go - that cannot be allowed
> 6) Have you checked what the JVMCI speculation data and JVMCI data contains and if your approach will break that? JVMCI has an nmethod mirror object that refers back to the nmethod - this is unlikely to work out of the box with cloning.
> 7) By running the operation in a safepoint you a) introduce an obvious latency problem, b) create a new source for stale nmethod pointers that will become stale and burn. The _nm of the safepoint operation might not survive a safepoint. For ...

Hi @fisk,
Thank you for the very valuable comment. It raises points we have not thought about.

> I am not fond of "special nmethods" that work subtly different to normal nmethods and have their own special life cycles.

It's not clear to me what you mean by "special nmethods". IMO we don't introduce any special nmethods.
From my point of view, a normal nmethod is an nmethod for an ordinary Java method. Nmethods for non-ordinary Java methods are special, e.g. native nmethods or method handle linkers (JDK-8263377).
I think normal nmethods should be relocatable within CodeCache.

> You can't just copy oops.

Yes, this is the main issue at the moment. Can we do this at a safepoint?

> I'm worried about copying the nmethod epoch counters

We should clear them. If not, it is a bug.

> You don't check if the nmethod is_unloading() when cloning it.

Should such nmethods be not entrant? We don't relocate not entrant nmethods.

> Have you checked what the JVMCI speculation data

Good point to check.

> By running the operation in a safepoint you a) introduce an obvious latency problem

Yes, we are going to measure it. We don't expect relocation to be a frequent operation.

> What are the consequences of copying the deoptimization generation?

What do you mean?
------------- PR Comment: https://git.openjdk.org/jdk/pull/23573#issuecomment-2787747282 From duke at openjdk.org Tue Apr 8 22:37:34 2025 From: duke at openjdk.org (Saranya Natarajan) Date: Tue, 8 Apr 2025 22:37:34 GMT Subject: RFR: 8351660: C2: SIGFPE in unsigned_mod_value [v3] In-Reply-To: References: Message-ID: <8BNSwQ-pIQVmkBcexGw96vI8U-SA_GTnBIjlqBuoJyE=.dbce203d-f283-4b0c-803f-c0d9e9b61c1d@github.com> > Description :: The test program performs a`Long.remainderUnsigned` which triggers the call to the function `unsigned_mod_value`. At the end of `unsigned_mod_value` the expression,` return TypeClass::make(static_cast(dividend % divisor))`, is computed which leads to a SIGFPE as the divisor in the test program is zero. The same behaviour was observed when the ` Long.remainderUnsigned` was replaced with `Integer.remainderUnsigned` in the test program. > > Solution :: The fix for [JDK-8345766](https://bugs.openjdk.org/browse/JDK-8345766) emitted specific ModF/ModD nodes, which is optimized and converted to runtime calls after optimizations. This was done during parsing prior to [JDK-8345766](https://bugs.openjdk.org/browse/JDK-8345766). In the scenario where there was unsigned modulo operation as in this test, there was no check for modulo by zero that could trigger an exception during runtime. The below fix proposes a check for modulo by zero and throws exception at runtime. > > A Jtreg test has been added as part of this fix. This test case is based on the original test that resulted in the bug. @eme64 is the contributor of the original test. Thank you @eme64. 
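For readers following the description above, here is a minimal Java sketch of the Java-level behaviour the fix is meant to preserve: an unsigned remainder with a zero divisor should raise an `ArithmeticException` rather than crash the VM. The divisor expression `(long)(x % x)` follows the reproducer quoted later in this thread; the class name `UnsignedModByZeroSketch` and its helper methods are hypothetical and are not taken from the PR's jtreg test.

```java
// Sketch only: class and method names are illustrative, not the PR's test.
final class UnsignedModByZeroSketch {

    // Divisor is (long)(x % x) with x = 1.0, i.e. 0, as in the quoted reproducer.
    static long remLong() {
        double x = 1.0;
        return Long.remainderUnsigned(1, (long) (x % x));
    }

    // Same shape for the int variant mentioned in the description.
    static int remInt() {
        double x = 1.0;
        return Integer.remainderUnsigned(1, (int) (x % x));
    }

    // Returns true if running r throws ArithmeticException, false if it completes.
    static boolean throwsArithmeticException(Runnable r) {
        try {
            r.run();
            return false;
        } catch (ArithmeticException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Repeat the calls so a JIT compiler has a chance to compile the methods.
        for (int i = 0; i < 10_000; i++) {
            if (!throwsArithmeticException(UnsignedModByZeroSketch::remLong)
                    || !throwsArithmeticException(UnsignedModByZeroSketch::remInt)) {
                throw new AssertionError("expected ArithmeticException for a zero divisor");
            }
        }
    }
}
```

The loop count is an arbitrary choice to give the compiler a chance to kick in; whether C2 actually compiles these methods depends on the VM flags in use.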
Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: correcting test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24410/files - new: https://git.openjdk.org/jdk/pull/24410/files/3bce2b2e..15bd99d9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24410&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24410&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24410.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24410/head:pull/24410 PR: https://git.openjdk.org/jdk/pull/24410 From duke at openjdk.org Tue Apr 8 22:37:35 2025 From: duke at openjdk.org (Saranya Natarajan) Date: Tue, 8 Apr 2025 22:37:35 GMT Subject: RFR: 8351660: C2: SIGFPE in unsigned_mod_value [v2] In-Reply-To: References: Message-ID: On Tue, 8 Apr 2025 16:39:13 GMT, Emanuel Peter wrote: >> Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: >> >> added jtreg test > > test/hotspot/jtreg/compiler/integerArithmetic/TestUnsignedModByZero.java line 50: > >> 48: double x = 1.0; >> 49: return Long.remainderUnsigned(1, (long)(x % x)); >> 50: } > > These tests look identical... Did you mean to use `Integer.remainderUnsigned`? Yes, that is correct. I have fixed this in the latest commit ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24410#discussion_r2034122439 From dlong at openjdk.org Tue Apr 8 22:51:35 2025 From: dlong at openjdk.org (Dean Long) Date: Tue, 8 Apr 2025 22:51:35 GMT Subject: RFR: 8353041: NeverBranchNode causes incorrect block frequency calculation [v2] In-Reply-To: References: Message-ID: > This fixes a quality of implementation issue for infinite loops using a NeverBranch node. We need Block::succ_prob() to return 1.0f for the 100% successful back-edge so that block frequencies are computed correctly. 
I also fixed Block_Stack::most_frequent_successor() to choose the correct successor. I verified that this corrects the huge frequency ratio that was detected and clamped by JDK-8346888. > > Currently this bug is labeled noreg-hard with no new regression test, as it's not obvious how to write such a test. Dean Long has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/opto/domgraph.cpp Co-authored-by: Roberto Castañeda Lozano ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24390/files - new: https://git.openjdk.org/jdk/pull/24390/files/ac25f7ac..2c9c3648 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24390&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24390&range=00-01 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24390.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24390/head:pull/24390 PR: https://git.openjdk.org/jdk/pull/24390 From sviswanathan at openjdk.org Tue Apr 8 23:16:32 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 8 Apr 2025 23:16:32 GMT Subject: RFR: 8346236: Auto vectorization support for various Float16 operations [v11] In-Reply-To: <0oYqgnHHKaYHu_AH2bVR2ZbC45JgK-evjGeFwuN0MSg=.94374b61-e094-499f-95af-a1bfbb70db4d@github.com> References: <0oYqgnHHKaYHu_AH2bVR2ZbC45JgK-evjGeFwuN0MSg=.94374b61-e094-499f-95af-a1bfbb70db4d@github.com> Message-ID: <7x-beqKpFH9EosunQgu3S8L15AeOmIQbPpceBTtGbXI=.86cd56ad-4b4e-45f3-8226-d7a954840c79@github.com> On Fri, 4 Apr 2025 02:10:35 GMT, Jatin Bhateja wrote: >> This is a follow-up PR for https://github.com/openjdk/jdk/pull/22754 >> >> The patch adds support to vectorize various float16 scalar operations (add/subtract/divide/multiply/sqrt/fma). >> >> Summary of changes included with the patch: >> 1. C2 compiler New Vector IR creation. >> 2. Auto-vectorization support. >> 3. x86 backend implementation. >> 4.
New IR verification test for each newly supported vector operation. >> >> Following are the performance numbers of Float16OperationsBenchmark >> >> System : Intel(R) Xeon(R) Processor code-named Granite rapids >> Frequency fixed at 2.5 GHz >> >> >> Baseline >> Benchmark (vectorDim) Mode Cnt Score Error Units >> Float16OperationsBenchmark.absBenchmark 1024 thrpt 2 4191.787 ops/ms >> Float16OperationsBenchmark.addBenchmark 1024 thrpt 2 1211.978 ops/ms >> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 1024 thrpt 2 493.026 ops/ms >> Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 1024 thrpt 2 612.430 ops/ms >> Float16OperationsBenchmark.cosineSimilaritySingleRoundingFP16 1024 thrpt 2 616.012 ops/ms >> Float16OperationsBenchmark.divBenchmark 1024 thrpt 2 604.882 ops/ms >> Float16OperationsBenchmark.dotProductFP16 1024 thrpt 2 410.798 ops/ms >> Float16OperationsBenchmark.euclideanDistanceDequantizedFP16 1024 thrpt 2 602.863 ops/ms >> Float16OperationsBenchmark.euclideanDistanceFP16 1024 thrpt 2 640.348 ops/ms >> Float16OperationsBenchmark.fmaBenchmark 1024 thrpt 2 809.175 ops/ms >> Float16OperationsBenchmark.getExponentBenchmark 1024 thrpt 2 2682.764 ops/ms >> Float16OperationsBenchmark.isFiniteBenchmark 1024 thrpt 2 3373.901 ops/ms >> Float16OperationsBenchmark.isFiniteCMovBenchmark 1024 thrpt 2 1881.652 ops/ms >> Float16OperationsBenchmark.isFiniteStoreBenchmark 1024 thrpt 2 2273.745 ops/ms >> Float16OperationsBenchmark.isInfiniteBenchmark 1024 thrpt 2 2147.913 ops/ms >> Float16OperationsBenchmark.isInfiniteCMovBen... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Adding missing feature check Looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/22755#pullrequestreview-2751593568 From dlong at openjdk.org Tue Apr 8 23:27:34 2025 From: dlong at openjdk.org (Dean Long) Date: Tue, 8 Apr 2025 23:27:34 GMT Subject: RFR: 8353041: NeverBranchNode causes incorrect block frequency calculation In-Reply-To: <0WR4VFMG1KmBQTShtDjr5w6JJosUBS8BIBabWhdH9Bw=.e1ad86d8-f675-47a9-abae-1d706f16ba3a@github.com> References: <0WR4VFMG1KmBQTShtDjr5w6JJosUBS8BIBabWhdH9Bw=.e1ad86d8-f675-47a9-abae-1d706f16ba3a@github.com> Message-ID: On Tue, 8 Apr 2025 09:45:25 GMT, Roberto Castañeda Lozano wrote: >> This fixes a quality of implementation issue for infinite loops using a NeverBranch node. We need Block::succ_prob() to return 1.0f for the 100% successful back-edge so that block frequencies are computed correctly. I also fixed Block_Stack::most_frequent_successor() to choose the correct successor. I verified that this corrects the huge frequency ratio that was detected and clamped by JDK-8346888. >> >> Currently this bug is labeled noreg-hard with no new regression test, as it's not obvious how to write such a test. > >> Currently this bug is labeled noreg-hard with no new regression test, as it's not obvious how to write such a test. > > The only idea I can think of would be matching and asserting on the output of `-XX:+PrintCFGBlockFreq`, e.g. for the first test in `compiler/loopopts/TestPhaseCFGNeverBranchToGotoMain.java` I get the line > > Loop: 1 trip_count: 1000000 freq: 0 > > before this fix, and the line > > Loop: 1 trip_count: 1000000 freq: 900000 > > after the fix. You could assert that you expect a `freq` greater than 0 for every encountered loop, or something similar. But I guess such a test would be quite fragile. @robcasloz, that's a good suggestion to use -XX:+PrintCFGBlockFreq to check the result, but I agree, writing a test based on it does seem fragile.
Given that this code rarely changes (the bug has existed for 15+ years before it was noticed), I would expect the cost of the test (maintenance to prevent false positives) would exceed its value in finding actual regressions in this code. Are reviewers OK with pushing this as-is w/o a regression test? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24390#issuecomment-2787856914 From kvn at openjdk.org Wed Apr 9 00:10:35 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 9 Apr 2025 00:10:35 GMT Subject: RFR: 8334046: Set different values for CompLevel_any and CompLevel_all [v2] In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 03:43:03 GMT, Cesar Soares Lucas wrote: >> Please review this trivial patch to set different values for CompLevel_any and CompLevel_all. >> Setting different values for these fields make the implementation of [this other issue](https://bugs.openjdk.org/browse/JDK-8313713) much cleaner/easier. >> Tested on OSX/Linux Aarch64/x86_64 with JTREG. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Fix WhiteBox constants. Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24298#pullrequestreview-2751650226 From duke at openjdk.org Wed Apr 9 03:40:32 2025 From: duke at openjdk.org (duke) Date: Wed, 9 Apr 2025 03:40:32 GMT Subject: RFR: 8334046: Set different values for CompLevel_any and CompLevel_all [v2] In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 03:43:03 GMT, Cesar Soares Lucas wrote: >> Please review this trivial patch to set different values for CompLevel_any and CompLevel_all. >> Setting different values for these fields make the implementation of [this other issue](https://bugs.openjdk.org/browse/JDK-8313713) much cleaner/easier. >> Tested on OSX/Linux Aarch64/x86_64 with JTREG. 
> > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Fix WhiteBox constants. @JohnTortugo Your change (at version b121160feafb225c488408dbd4c9a52b2982c3c5) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24298#issuecomment-2788198017 From cslucas at openjdk.org Wed Apr 9 05:36:39 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 9 Apr 2025 05:36:39 GMT Subject: Integrated: 8352681: C2 compilation hits asserts "must set the initial type just once" In-Reply-To: References: Message-ID: On Sun, 6 Apr 2025 04:04:56 GMT, Cesar Soares Lucas wrote: > The reason for the error reported is that when RAM tries to reduce a field load through a Phi it ends up calling `step_through_mergemem` with `_delay_transform` set to true and, since `step_through_mergemem` assumes that `_delay_transform` is `false`, it calls `igvn->transform` passing a MergeMem that has been added to the graph long ago. > > I didn't opt to make `_delay_transform` false during the RAM transformations because that seemed to be a risky move, nevertheless I'll create an RFE and keep investigating that option. > > Also, while working on a test case I found that C2 doesn't remove (at least not before EA steps) the `if (param != param) {...}` and I'm going to investigate that as a separate RFE. > > I tested this with JTREG Tier 1-3 on Linux x86_64. This pull request has now been integrated. 
Changeset: b045e3fb Author: Cesar Soares Lucas Committer: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/b045e3fbd7920465b5b67d43e35db98b935241d5 Stats: 73 lines in 2 files changed: 69 ins; 0 del; 4 mod 8352681: C2 compilation hits asserts "must set the initial type just once" Reviewed-by: chagedorn, dfenacci ------------- PR: https://git.openjdk.org/jdk/pull/24471 From chagedorn at openjdk.org Wed Apr 9 05:39:25 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 9 Apr 2025 05:39:25 GMT Subject: RFR: 8353842: C2: Add graph dumps before and after loop opts phase [v3] In-Reply-To: References: Message-ID: <8WC2kiOZ23hjYFNwRj88JRFtjX6dxfJdBhr0gKUNPkw=.1d9160e4-6e26-47db-8153-7df1b3f7e28e@github.com> On Tue, 8 Apr 2025 14:10:42 GMT, Manuel Hässig wrote: >> This PR adds graph dumps before and after loop optimizations, but only if the compiled method actually contains loops. This helps to distinguish loop optimizations in IGV and to match loop related nodes like opaque template assertion predicates. >> >> I tested this by compiling a few test methods and looking at the ideal graph. Also, I ran tier1 through tier3 and Oracle internal testing. > > Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: > > Fix capitalization Update looks good, thanks! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24509#pullrequestreview-2752094637 From chagedorn at openjdk.org Wed Apr 9 05:44:45 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 9 Apr 2025 05:44:45 GMT Subject: RFR: 8315916: assert(C->live_nodes() <= C->max_node_limit()) failed: Live Node limit exceeded [v9] In-Reply-To: References: Message-ID: On Tue, 8 Apr 2025 05:52:44 GMT, Dhamoder Nalla wrote: >> In the debug build, the assert is triggered during the parsing (before Code_Gen).
In the Release build, however, the compilation bails out at `Compile::check_node_count()` during the code generation phase and completes execution without any issues. >> >> When I commented out the assert(C->live_nodes() <= C->max_node_limit()), both the debug and release builds exhibited the same behavior: the compilation bails out during code_gen after building the ideal graph with more than 80K nodes. >> >> The proposed fix will check the live node count and bail out during compilation while building the graph for scalarization of the elements in the array when the live node count crosses the limit of 80K, instead of unnecessarily building the entire graph and bailing out in code_gen. > > Dhamoder Nalla has updated the pull request incrementally with one additional commit since the last revision: > > Modify jtreg test test/hotspot/jtreg/compiler/escapeAnalysis/TestScalarizeBailout.java line 28: > 26: * @bug 8315916 > 27: * @summary Test early bailout during the creation of graph nodes for the scalarization of array fields, rather than during code generation. > 28: * @run main/othervm/timeout=240000 That's a huge timeout. How long does this test need on your machine? If too long, can you also trigger the issue with a smaller `EliminateAllocationArraySizeLimit` than currently used? test/hotspot/jtreg/compiler/escapeAnalysis/TestScalarizeBailout.java line 29: > 27: * @summary Test early bailout during the creation of graph nodes for the scalarization of array fields, rather than during code generation. > 28: * @run main/othervm/timeout=240000 > 29: * -Xcomp Can you limit the performed compilations to a single or few methods with `compileonly` to trigger the assert? This will reduce the required time to run this test with `Xcomp`. 
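As an illustration of the `compileonly` suggestion above, the following is a hedged sketch of what a jtreg header restricting `-Xcomp` compilation to a single method could look like. The class `ScalarizeBailoutSketch` and its method `allocateAndSum` are hypothetical stand-ins, not the contents of `TestScalarizeBailout.java`, and the `-XX:EliminateAllocationArraySizeLimit` value used by the real test is not shown in this thread, so it is left out here.

```java
/*
 * Hypothetical jtreg header: only allocateAndSum is compiled under -Xcomp,
 * which should keep the run time low without an oversized timeout.
 *
 * @test
 * @summary Sketch of limiting -Xcomp compilation to one method via compileonly.
 * @run main/othervm -Xcomp
 *      -XX:CompileCommand=compileonly,ScalarizeBailoutSketch::allocateAndSum
 *      ScalarizeBailoutSketch
 */
final class ScalarizeBailoutSketch {

    // A small non-escaping array allocation; a stand-in for the array that the
    // real test expects escape analysis to scalarize.
    static int allocateAndSum(int seed) {
        int[] a = new int[64];
        for (int i = 0; i < a.length; i++) {
            a[i] = seed + i;
        }
        int sum = 0;
        for (int v : a) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        // 64 * seed + (0 + 1 + ... + 63) = 64 + 2016 for seed 1.
        if (allocateAndSum(1) != 2080) {
            throw new AssertionError("unexpected sum");
        }
    }
}
```

Whether a restricted compile set still reproduces the original assert has to be checked against the actual test; the sketch only shows the flag syntax.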
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20504#discussion_r2034482721 PR Review Comment: https://git.openjdk.org/jdk/pull/20504#discussion_r2034484564 From chagedorn at openjdk.org Wed Apr 9 05:49:32 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 9 Apr 2025 05:49:32 GMT Subject: RFR: 8351660: C2: SIGFPE in unsigned_mod_value [v3] In-Reply-To: <8BNSwQ-pIQVmkBcexGw96vI8U-SA_GTnBIjlqBuoJyE=.dbce203d-f283-4b0c-803f-c0d9e9b61c1d@github.com> References: <8BNSwQ-pIQVmkBcexGw96vI8U-SA_GTnBIjlqBuoJyE=.dbce203d-f283-4b0c-803f-c0d9e9b61c1d@github.com> Message-ID: On Tue, 8 Apr 2025 22:37:34 GMT, Saranya Natarajan wrote: >> Description :: The test program performs a`Long.remainderUnsigned` which triggers the call to the function `unsigned_mod_value`. At the end of `unsigned_mod_value` the expression,` return TypeClass::make(static_cast(dividend % divisor))`, is computed which leads to a SIGFPE as the divisor in the test program is zero. The same behaviour was observed when the ` Long.remainderUnsigned` was replaced with `Integer.remainderUnsigned` in the test program. >> >> Solution :: The fix for [JDK-8345766](https://bugs.openjdk.org/browse/JDK-8345766) emitted specific ModF/ModD nodes, which is optimized and converted to runtime calls after optimizations. This was done during parsing prior to [JDK-8345766](https://bugs.openjdk.org/browse/JDK-8345766). In the scenario where there was unsigned modulo operation as in this test, there was no check for modulo by zero that could trigger an exception during runtime. The below fix proposes a check for modulo by zero and throws exception at runtime. >> >> A Jtreg test has been added as part of this fix. This test case is based on the original test that resulted in the bug. @eme64 is the contributor of the original test. Thank you @eme64. > > Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: > > correcting test Looks good! 
src/hotspot/share/opto/divnode.cpp line 1322: > 1320: } > 1321: > 1322: // Mod by zero? Throw exception at runtime! Suggestion: // Mod by zero? Throw an exception at runtime! test/hotspot/jtreg/compiler/integerArithmetic/TestUnsignedModByZero.java line 53: > 51: > 52: public static void main(String[] args) { > 53: for (int i =0; i < 10_000; i++) { Suggestion: for (int i = 0; i < 10_000; i++) { ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24410#pullrequestreview-2752111076 PR Review Comment: https://git.openjdk.org/jdk/pull/24410#discussion_r2034489590 PR Review Comment: https://git.openjdk.org/jdk/pull/24410#discussion_r2034489938 From chagedorn at openjdk.org Wed Apr 9 06:14:44 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 9 Apr 2025 06:14:44 GMT Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v9] In-Reply-To: References: Message-ID: On Tue, 8 Apr 2025 12:34:01 GMT, Roland Westrelin wrote: >> This is primarily motivated by 8275202 (C2: optimize out more >> redundant conditions). In the following code snippet: >> >> >> int[] array = new int[arraySize]; >> if (j <= arraySize) { >> if (i >= 0) { >> if (i < j) { >> int v = array[i]; >> >> >> (`arraySize` is a constant) >> >> at the range check, `j` is known to be in `[min, arraySize]` as a >> consequence, `i` is known to be `[0, arraySize-1]`. The range check >> can be eliminated. >> >> Now, if later, `i` constant folds to some value that's positive but >> out of range for the array: >> >> - if that happens when the new pass runs, then it can prove that: >> >> if (i < j) { >> >> is never taken. >> >> - if that happens during IGVN or CCP however, that condition is not >> constant folded. And because the range check was removed, there's no >> guard protecting the range check `CastII`. It becomes `top` and, as >> a result, the graph can become broken. 
>> >> What I propose here is that when the `CastII` becomes dead, any CFG >> paths that use the `CastII` node is made unreachable. So in pseudo code: >> >> >> int[] array = new int[arraySize]; >> if (j <= arraySize) { >> if (i >= 0) { >> if (i < j) { >> halt(); >> >> >> Finding the CFG paths is implemented in the patch by following the >> uses of the node until a CFG node or a `Phi` is encountered. >> >> The patch applies this to all `Type` nodes as with 8275202, I also ran >> in some rare corner cases with other types of nodes. The exception is >> `Phi` nodes which may not be as easy to handle (and for which I had no >> issue with 8275202). >> >> Finally, the patch includes a test case that's unrelated to the >> discussion of 8275202 above. In that test case, a `CastII` becomes top >> but the test that guards it doesn't constant fold. The root cause is a >> transformation of: >> >> >> (CastII (AddI >> >> >> into >> >> >> (AddI (CastII ) (CastII)` >> >> >> which causes the resulting node to have a wider type. The `CastII` >> captures a type before the transformation above happens. Once it has >> happened, the guard for the `CastII` can't be constant folded when an >> out of bound value occurs. >> >> This is likely fixable some other way (eventhough it doesn't seem >> straightforward). Given the long history of similar issues (and the >> test case that shows that they are more hiding), I think it would >> make sense to try some other way of approaching them. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 17 additional commits since the last revision: > > - review > - Merge branch 'master' into JDK-8349479 > - Update src/hotspot/share/opto/node.cpp > > Co-authored-by: Christian Hagedorn > - Update test/hotspot/jtreg/compiler/c2/TestGuardOfCastIIDoesntFold.java > > Co-authored-by: Christian Hagedorn > - review > - review > - review > - Merge branch 'master' into JDK-8349479 > - Update src/hotspot/share/opto/convertnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/convertnode.cpp > > Co-authored-by: Christian Hagedorn > - ... and 7 more: https://git.openjdk.org/jdk/compare/8912f806...d9f53010 Testing looked good! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23468#issuecomment-2788380903 From chagedorn at openjdk.org Wed Apr 9 06:24:28 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 9 Apr 2025 06:24:28 GMT Subject: RFR: 8350563: C2 compilation fails because PhaseCCP does not reach a fixpoint [v10] In-Reply-To: References: Message-ID: <80OrtF80vVrZTZMok3N7t6hbPpJAISqxAbbTd5bJvxE=.d54cc0a1-f78b-4597-b9dc-301f53ba86cf@github.com> On Tue, 8 Apr 2025 15:23:06 GMT, Liam Miller-Cushon wrote: >> Hello, please consider this fix for [JDK-8350563](https://bugs.openjdk.org/browse/JDK-8350563) contributed by my colleague Matthias Ernst. >> >> https://github.com/openjdk/jdk/pull/22856 introduced a new `Value()` optimization for the pattern `AndIL(Con, Mask)`. >> This optimization can look through CastNodes, and therefore requires additional logic in CCP to push these >> transitive uses to the worklist. >> >> The optimization is closely related to analogous optimizations for SHIFT nodes, and we also extend the existing logic for >> CCP worklist handling: the current logic is "if the shift input to a SHIFT node changes, push indirect AND node uses to the CCP worklist". >> We extend it by adding "if the (new) type of a node is an IntegerType that `is_con, ...` to the predicate. 
> > Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/compiler/ccp/TestAndConZeroCCP.java > > Co-authored-by: Christian Hagedorn Testing looked good, thanks for the updates! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23871#pullrequestreview-2752176468 From thartmann at openjdk.org Wed Apr 9 06:25:34 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 9 Apr 2025 06:25:34 GMT Subject: RFR: 8353041: NeverBranchNode causes incorrect block frequency calculation [v2] In-Reply-To: References: Message-ID: On Tue, 8 Apr 2025 22:51:35 GMT, Dean Long wrote: >> This fixes a quality of implementation issue for infinite loops using a NeverBranch node. We need Block::succ_prob() to return 1.0f for the 100% successful back-edge so that block frequencies are computed correctly. I also fixed Block_Stack::most_frequent_successor() to choose the correct successor. I verified that this corrects the huge frequency ratio that was detected and clamped by JDK-8346888. >> >> Currently this bug is labeled noreg-hard with no new regression test, as it's not obvious how to write such a test. > > Dean Long has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/opto/domgraph.cpp > > Co-authored-by: Roberto Castañeda Lozano > Are reviewers OK with pushing this as-is w/o a regression test? Fine with me. ------------- Marked as reviewed by thartmann (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/24390#pullrequestreview-2752181119 From chagedorn at openjdk.org Wed Apr 9 07:05:42 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 9 Apr 2025 07:05:42 GMT Subject: RFR: 8353841: [jittester] Fix JITTester build after asm removal [v2] In-Reply-To: References: Message-ID: On Tue, 8 Apr 2025 06:38:26 GMT, Evgeny Nikitin wrote: >> [JDK-8346981](https://bugs.openjdk.org/browse/JDK-8346981) removed jdk.internal.objectweb.asm packages from java.base, causing JITTester build to fail. >> >> This PR fixes the build by building ASM prior to the testlibrary and JITTester builds. >> Testing: Local runs of targets `COMPILE` and `all`, no errors found. > > Evgeny Nikitin has updated the pull request incrementally with one additional commit since the last revision: > > Simplify compile_testlib as well > > Co-authored-by: Chen Liang Looks reasonable to me. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24487#pullrequestreview-2752308837 From enikitin at openjdk.org Wed Apr 9 07:05:42 2025 From: enikitin at openjdk.org (Evgeny Nikitin) Date: Wed, 9 Apr 2025 07:05:42 GMT Subject: Integrated: 8353841: [jittester] Fix JITTester build after asm removal In-Reply-To: References: Message-ID: On Mon, 7 Apr 2025 11:24:13 GMT, Evgeny Nikitin wrote: > [JDK-8346981](https://bugs.openjdk.org/browse/JDK-8346981) removed jdk.internal.objectweb.asm packages from java.base, causing JITTester build to fail. > > This PR fixes the build by building ASM prior to the testlibrary and JITTester builds. > Testing: Local runs of targets `COMPILE` and `all`, no errors found. This pull request has now been integrated. 
Changeset: 0f70aae1 Author: Evgeny Nikitin Committer: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/0f70aae1cc4fd48ef2de3b0fe4741a32660ed4f9 Stats: 10 lines in 1 file changed: 6 ins; 0 del; 4 mod 8353841: [jittester] Fix JITTester build after asm removal Reviewed-by: thartmann, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/24487 From mchevalier at openjdk.org Wed Apr 9 07:20:00 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 9 Apr 2025 07:20:00 GMT Subject: RFR: 8348853: Fold layout helper check for objects implementing non-array interfaces [v3] In-Reply-To: References: Message-ID: > If `TypeInstKlassPtr` represents an array type, it has to be `java.lang.Object`. By contraposition, if it is not `java.lang.Object`, we can conclude it is not an array, and we can skip some array checks, for instance. > > In this PR, we improve this deduction with interface-based reasoning: arrays implement only Cloneable and Serializable, so if a type implements anything else, it cannot be an array. > > This change partially reverts the changes from [JDK-8348631](https://bugs.openjdk.org/browse/JDK-8348631) (#23331) (in `LibraryCallKit::generate_array_guard_common`) and the test still passes. > > The way interfaces are checked could be done differently. The current situation is a balance between visibility (not leaking too many things that are explicitly private), not having overly general methods for one use case, and avoiding overly concrete (and brittle) interfaces. > > Tested with tier1..3, hs-precheckin-comp and hs-comp-stress > > Thanks, > Marc Marc Chevalier has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains five additional commits since the last revision: - Merge branch 'master' into feat/Fold-layout-helper-check-for-objects-implementing-non-array-interfaces - Merge branch 'master' into feat/Fold-layout-helper-check-for-objects-implementing-non-array-interfaces - not reinventing the wheel - Revert now useless fix - Generalize the not-array proof ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24245/files - new: https://git.openjdk.org/jdk/pull/24245/files/daaaf9ae..b1fb82a2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24245&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24245&range=01-02 Stats: 74611 lines in 2220 files changed: 27997 ins; 41827 del; 4787 mod Patch: https://git.openjdk.org/jdk/pull/24245.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24245/head:pull/24245 PR: https://git.openjdk.org/jdk/pull/24245 From rcastanedalo at openjdk.org Wed Apr 9 07:31:40 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 9 Apr 2025 07:31:40 GMT Subject: RFR: 8353041: NeverBranchNode causes incorrect block frequency calculation [v2] In-Reply-To: References: Message-ID: On Tue, 8 Apr 2025 22:51:35 GMT, Dean Long wrote: >> This fixes a quality of implementation issue for infinite loops using a NeverBranch node. We need Block::succ_prob() to return 1.0f for the 100% successful back-edge so that block frequencies are computed correctly. I also fixed Block_Stack::most_frequent_successor() to choose the correct successor. I verified that this corrects the huge frequency ratio that was detected and clamped by JDK-8346888. >> >> Currently this bug is labeled noreg-hard with no new regression test, as it's not obvious how to write such a test.
> > Dean Long has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/opto/domgraph.cpp > > Co-authored-by: Roberto Castañeda Lozano Marked as reviewed by rcastanedalo (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24390#pullrequestreview-2752391764 From shade at openjdk.org Wed Apr 9 07:31:45 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 9 Apr 2025 07:31:45 GMT Subject: RFR: 8353192: C2: Clean up x86 backend after 32-bit x86 removal [v2] In-Reply-To: References: Message-ID: On Tue, 8 Apr 2025 17:12:54 GMT, Emanuel Peter wrote: >> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: >> >> - Cleanup ADLC as well >> - Revert some accidental removals >> - Merge branch 'master' into JDK-8353192-x86-c2-backend >> - Touchup >> - Fix > > Looks reasonable :) > > I launched some internal testing just in case, please ping me again in 24h :) @eme64 -- just checking ahead of 24 hours, maybe the testing is complete? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24300#issuecomment-2788626644 From rcastanedalo at openjdk.org Wed Apr 9 07:35:31 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 9 Apr 2025 07:35:31 GMT Subject: RFR: 8353041: NeverBranchNode causes incorrect block frequency calculation In-Reply-To: <0WR4VFMG1KmBQTShtDjr5w6JJosUBS8BIBabWhdH9Bw=.e1ad86d8-f675-47a9-abae-1d706f16ba3a@github.com> References: <0WR4VFMG1KmBQTShtDjr5w6JJosUBS8BIBabWhdH9Bw=.e1ad86d8-f675-47a9-abae-1d706f16ba3a@github.com> Message-ID: On Tue, 8 Apr 2025 09:45:25 GMT, Roberto Castañeda Lozano wrote: >> This fixes a quality of implementation issue for infinite loops using a NeverBranch node.
We need Block::succ_prob() to return 1.0f for the 100% successful back-edge so that block frequencies are computed correctly. I also fixed Block_Stack::most_frequent_successor() to choose the correct successor. I verified that this corrects the huge frequency ratio that was detected and clamped by JDK-8346888. >> >> Currently this bug is labeled noreg-hard with no new regression test, as it's not obvious how to write such a test. > >> Currently this bug is labeled noreg-hard with no new regression test, as it's not obvious how to write such a test. > > The only idea I can think of would be matching and asserting on the output of `-XX:+PrintCFGBlockFreq`, e.g. for the first test in `compiler/loopopts/TestPhaseCFGNeverBranchToGotoMain.java` I get the line > > Loop: 1 trip_count: 1000000 freq: 0 > > before this fix, and the line > > Loop: 1 trip_count: 1000000 freq: 900000 > > after the fix. You could assert that you expect a `freq` greater than 0 for every encountered loop, or something similar. But I guess such a test would be quite fragile. > @robcasloz, that's a good suggestion to use -XX:+PrintCFGBlockFreq to check the result, but I agree, writing a test based on it does seem fragile. Given that this code rarely changes (the bug existed for 15+ years before it was noticed), I would expect the cost of the test (maintenance to prevent false positives) to exceed its value in finding actual regressions in this code. Are reviewers OK with pushing this as-is w/o a regression test? Sure, I agree with your cost/benefit analysis. If we ever find more bugs in the future due to unexpected successor order for `NeverBranch` or other node classes, we should consider enforcing a canonical successor order during or right after control-flow graph construction.
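To make the reasoning in this thread concrete, here is a toy sketch of how a successor probability feeds into block frequencies. It is not HotSpot code: the class and method names are hypothetical, and it assumes the simplest possible model, freq(successor) = freq(block) * prob(edge). It only shows why a probability of 0 on the edge that is always taken would leave the loop body with no frequency, while 1.0 passes the full frequency through.

```java
// Toy model only; this does not reproduce HotSpot's CFG or its frequency estimator.
class NeverBranchFreqSketch {
    /**
     * Splits the frequency of a two-way block between its successors.
     * Index 0 is the "taken" successor, index 1 is the other one.
     */
    static double[] successorFreqs(double blockFreq, double probTaken) {
        return new double[] { blockFreq * probTaken, blockFreq * (1.0 - probTaken) };
    }

    public static void main(String[] args) {
        double headerFreq = 1_000_000.0; // hypothetical frequency of the loop header

        // Assumed buggy case: the always-taken edge is given probability 0.
        double[] buggy = successorFreqs(headerFreq, 0.0);
        // Assumed fixed case: the always-taken edge is given probability 1.
        double[] fixed = successorFreqs(headerFreq, 1.0);

        System.out.println("buggy: loop body = " + buggy[0] + ", fake exit = " + buggy[1]);
        System.out.println("fixed: loop body = " + fixed[0] + ", fake exit = " + fixed[1]);
    }
}
```

In this model the choice of which successor counts as "taken" matters as much as the probability itself, which is presumably why successor order comes up in the discussion.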
------------- PR Comment: https://git.openjdk.org/jdk/pull/24390#issuecomment-2788637843 From shade at openjdk.org Wed Apr 9 08:04:46 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 9 Apr 2025 08:04:46 GMT Subject: RFR: 8334046: Set different values for CompLevel_any and CompLevel_all [v2] In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 03:43:03 GMT, Cesar Soares Lucas wrote: >> Please review this trivial patch to set different values for CompLevel_any and CompLevel_all. >> Setting different values for these fields make the implementation of [this other issue](https://bugs.openjdk.org/browse/JDK-8313713) much cleaner/easier. >> Tested on OSX/Linux Aarch64/x86_64 with JTREG. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Fix WhiteBox constants. Huh, I thought you are a Committer now. I'll check with Registrar. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24298#issuecomment-2788700993 From cslucas at openjdk.org Wed Apr 9 08:04:47 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 9 Apr 2025 08:04:47 GMT Subject: Integrated: 8334046: Set different values for CompLevel_any and CompLevel_all In-Reply-To: References: Message-ID: <_mS5x7C6isihi7VgdnT-_PS_slnyeF_naz6TirOCti8=.5f172c1e-3693-46f0-b793-fd09c66c27c1@github.com> On Fri, 28 Mar 2025 16:12:47 GMT, Cesar Soares Lucas wrote: > Please review this trivial patch to set different values for CompLevel_any and CompLevel_all. > Setting different values for these fields make the implementation of [this other issue](https://bugs.openjdk.org/browse/JDK-8313713) much cleaner/easier. > Tested on OSX/Linux Aarch64/x86_64 with JTREG. This pull request has now been integrated. 
Changeset: 9ee55903 Author: Cesar Soares Lucas Committer: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/9ee5590328e7d5f5070efdbd7ffc44cb660005cc Stats: 8 lines in 3 files changed: 2 ins; 0 del; 6 mod 8334046: Set different values for CompLevel_any and CompLevel_all Reviewed-by: shade, kvn ------------- PR: https://git.openjdk.org/jdk/pull/24298 From epeter at openjdk.org Wed Apr 9 08:08:39 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 9 Apr 2025 08:08:39 GMT Subject: RFR: 8353192: C2: Clean up x86 backend after 32-bit x86 removal [v2] In-Reply-To: References: Message-ID: On Mon, 7 Apr 2025 08:47:19 GMT, Aleksey Shipilev wrote: >> Piece-wise cleanup of C2 x86 backend. C2_MacroAssembler, x86.ad and related files are the target for this cleanup. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux x86_64 server fastdebug, `all` + `-XX:-TieredCompilation` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Cleanup ADLC as well > - Revert some accidental removals > - Merge branch 'master' into JDK-8353192-x86-c2-backend > - Touchup > - Fix Testing looks :green_circle: Thanks for the cleanup :) ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24300#pullrequestreview-2752494862 From duke at openjdk.org Wed Apr 9 08:18:01 2025 From: duke at openjdk.org (Saranya Natarajan) Date: Wed, 9 Apr 2025 08:18:01 GMT Subject: RFR: 8351660: C2: SIGFPE in unsigned_mod_value [v4] In-Reply-To: References: Message-ID: > Description :: The test program performs a`Long.remainderUnsigned` which triggers the call to the function `unsigned_mod_value`. 
At the end of `unsigned_mod_value`, the expression `return TypeClass::make(static_cast(dividend % divisor))` is computed, which leads to a SIGFPE because the divisor in the test program is zero. The same behaviour was observed when `Long.remainderUnsigned` was replaced with `Integer.remainderUnsigned` in the test program. > > Solution :: The fix for [JDK-8345766](https://bugs.openjdk.org/browse/JDK-8345766) emitted specific ModF/ModD nodes, which are optimized and converted to runtime calls after optimizations. This was done during parsing prior to [JDK-8345766](https://bugs.openjdk.org/browse/JDK-8345766). In the scenario where there was an unsigned modulo operation, as in this test, there was no check for modulo by zero that could trigger an exception at runtime. The fix below adds a check for modulo by zero and throws an exception at runtime. > > A Jtreg test has been added as part of this fix. This test case is based on the original test that resulted in the bug. @eme64 is the contributor of the original test. Thank you @eme64.
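For reference, a small Java sketch of the behaviour described above: `Long.remainderUnsigned` and `Integer.remainderUnsigned` are expected to throw `ArithmeticException` for a zero divisor rather than crash the VM. This is not the jtreg test added by the PR; the class and helper names are made up for illustration, and the loop is only an attempt to get the helpers JIT-compiled, with no guarantee that C2 is actually reached.

```java
// Illustrative only; not the jtreg test from the PR.
class UnsignedModByZeroSketch {
    /** Returns true if the unsigned long remainder throws ArithmeticException. */
    static boolean longModThrows(long dividend, long divisor) {
        try {
            Long.remainderUnsigned(dividend, divisor);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    /** Returns true if the unsigned int remainder throws ArithmeticException. */
    static boolean intModThrows(int dividend, int divisor) {
        try {
            Integer.remainderUnsigned(dividend, divisor);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Repeat the calls so that a JIT compiler has a chance to compile the helpers.
        for (int i = 0; i < 20_000; i++) {
            if (!longModThrows(i, 0L) || !intModThrows(i, 0)) {
                throw new AssertionError("expected ArithmeticException for a zero divisor");
            }
        }
        // Non-zero divisors: -1 is the largest unsigned value in both widths.
        System.out.println(Long.remainderUnsigned(-1L, 10L));   // prints 5 (2^64 - 1 ends in 5)
        System.out.println(Integer.remainderUnsigned(-1, 10));  // prints 5 (2^32 - 1 ends in 5)
    }
}
```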
Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: implementing reviewer comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24410/files - new: https://git.openjdk.org/jdk/pull/24410/files/15bd99d9..667c7e7a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24410&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24410&range=02-03 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24410.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24410/head:pull/24410 PR: https://git.openjdk.org/jdk/pull/24410 From dfenacci at openjdk.org Wed Apr 9 08:20:27 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 9 Apr 2025 08:20:27 GMT Subject: RFR: 8351660: C2: SIGFPE in unsigned_mod_value [v4] In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 08:18:01 GMT, Saranya Natarajan wrote: >> Description :: The test program performs a`Long.remainderUnsigned` which triggers the call to the function `unsigned_mod_value`. At the end of `unsigned_mod_value` the expression,` return TypeClass::make(static_cast(dividend % divisor))`, is computed which leads to a SIGFPE as the divisor in the test program is zero. The same behaviour was observed when the ` Long.remainderUnsigned` was replaced with `Integer.remainderUnsigned` in the test program. >> >> Solution :: The fix for [JDK-8345766](https://bugs.openjdk.org/browse/JDK-8345766) emitted specific ModF/ModD nodes, which is optimized and converted to runtime calls after optimizations. This was done during parsing prior to [JDK-8345766](https://bugs.openjdk.org/browse/JDK-8345766). In the scenario where there was unsigned modulo operation as in this test, there was no check for modulo by zero that could trigger an exception during runtime. The below fix proposes a check for modulo by zero and throws exception at runtime. >> >> A Jtreg test has been added as part of this fix. 
This test case is based on the original test that resulted in the bug. @eme64 is the contributor of the original test. Thank you @eme64. > > Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: > > implementing reviewer comment Looks good to me otherwise. ------------- Marked as reviewed by dfenacci (Committer). PR Review: https://git.openjdk.org/jdk/pull/24410#pullrequestreview-2752483798 From epeter at openjdk.org Wed Apr 9 08:20:27 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 9 Apr 2025 08:20:27 GMT Subject: RFR: 8351660: C2: SIGFPE in unsigned_mod_value [v4] In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 08:18:01 GMT, Saranya Natarajan wrote: >> Description :: The test program performs a`Long.remainderUnsigned` which triggers the call to the function `unsigned_mod_value`. At the end of `unsigned_mod_value` the expression,` return TypeClass::make(static_cast(dividend % divisor))`, is computed which leads to a SIGFPE as the divisor in the test program is zero. The same behaviour was observed when the ` Long.remainderUnsigned` was replaced with `Integer.remainderUnsigned` in the test program. >> >> Solution :: The fix for [JDK-8345766](https://bugs.openjdk.org/browse/JDK-8345766) emitted specific ModF/ModD nodes, which is optimized and converted to runtime calls after optimizations. This was done during parsing prior to [JDK-8345766](https://bugs.openjdk.org/browse/JDK-8345766). In the scenario where there was unsigned modulo operation as in this test, there was no check for modulo by zero that could trigger an exception during runtime. The below fix proposes a check for modulo by zero and throws exception at runtime. >> >> A Jtreg test has been added as part of this fix. This test case is based on the original test that resulted in the bug. @eme64 is the contributor of the original test. Thank you @eme64. 
> > Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: > > implementing reviewer comment Marked as reviewed by epeter (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24410#pullrequestreview-2752528562 From dfenacci at openjdk.org Wed Apr 9 08:20:28 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 9 Apr 2025 08:20:28 GMT Subject: RFR: 8351660: C2: SIGFPE in unsigned_mod_value [v3] In-Reply-To: <8BNSwQ-pIQVmkBcexGw96vI8U-SA_GTnBIjlqBuoJyE=.dbce203d-f283-4b0c-803f-c0d9e9b61c1d@github.com> References: <8BNSwQ-pIQVmkBcexGw96vI8U-SA_GTnBIjlqBuoJyE=.dbce203d-f283-4b0c-803f-c0d9e9b61c1d@github.com> Message-ID: On Tue, 8 Apr 2025 22:37:34 GMT, Saranya Natarajan wrote: >> Description :: The test program performs a`Long.remainderUnsigned` which triggers the call to the function `unsigned_mod_value`. At the end of `unsigned_mod_value` the expression,` return TypeClass::make(static_cast(dividend % divisor))`, is computed which leads to a SIGFPE as the divisor in the test program is zero. The same behaviour was observed when the ` Long.remainderUnsigned` was replaced with `Integer.remainderUnsigned` in the test program. >> >> Solution :: The fix for [JDK-8345766](https://bugs.openjdk.org/browse/JDK-8345766) emitted specific ModF/ModD nodes, which is optimized and converted to runtime calls after optimizations. This was done during parsing prior to [JDK-8345766](https://bugs.openjdk.org/browse/JDK-8345766). In the scenario where there was unsigned modulo operation as in this test, there was no check for modulo by zero that could trigger an exception during runtime. The below fix proposes a check for modulo by zero and throws exception at runtime. >> >> A Jtreg test has been added as part of this fix. This test case is based on the original test that resulted in the bug. @eme64 is the contributor of the original test. Thank you @eme64. 
> > Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: > > correcting test test/hotspot/jtreg/compiler/integerArithmetic/TestUnsignedModByZero.java line 27: > 25: * @test > 26: * @bug 8351660 > 27: * @summary Test that modulo by zero throws exception at runtime in case of unsigned values. Suggestion: * @summary Test that modulo by zero throws an exception at runtime in case of unsigned values. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24410#discussion_r2034733121 From shade at openjdk.org Wed Apr 9 08:25:42 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 9 Apr 2025 08:25:42 GMT Subject: RFR: 8353192: C2: Clean up x86 backend after 32-bit x86 removal [v2] In-Reply-To: References: Message-ID: On Mon, 7 Apr 2025 08:47:19 GMT, Aleksey Shipilev wrote: >> Piece-wise cleanup of C2 x86 backend. C2_MacroAssembler, x86.ad and related files are the target for this cleanup. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux x86_64 server fastdebug, `all` + `-XX:-TieredCompilation` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Cleanup ADLC as well > - Revert some accidental removals > - Merge branch 'master' into JDK-8353192-x86-c2-backend > - Touchup > - Fix Great! I am integrating then. I merged with current mainline locally, and there were no troubles. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24300#issuecomment-2788756733 From shade at openjdk.org Wed Apr 9 08:25:42 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 9 Apr 2025 08:25:42 GMT Subject: Integrated: 8353192: C2: Clean up x86 backend after 32-bit x86 removal In-Reply-To: References: Message-ID: On Fri, 28 Mar 2025 17:09:19 GMT, Aleksey Shipilev wrote: > Piece-wise cleanup of C2 x86 backend. C2_MacroAssembler, x86.ad and related files are the target for this cleanup. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux x86_64 server fastdebug, `all` + `-XX:-TieredCompilation` This pull request has now been integrated. Changeset: 250eb743 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/250eb743c112fbcc45bf2b3ded1c644b19893577 Stats: 493 lines in 6 files changed: 1 ins; 420 del; 72 mod 8353192: C2: Clean up x86 backend after 32-bit x86 removal Reviewed-by: kvn, epeter ------------- PR: https://git.openjdk.org/jdk/pull/24300 From duke at openjdk.org Wed Apr 9 08:54:57 2025 From: duke at openjdk.org (Saranya Natarajan) Date: Wed, 9 Apr 2025 08:54:57 GMT Subject: RFR: 8351660: C2: SIGFPE in unsigned_mod_value [v5] In-Reply-To: References: Message-ID: > Description :: The test program performs a`Long.remainderUnsigned` which triggers the call to the function `unsigned_mod_value`. At the end of `unsigned_mod_value` the expression,` return TypeClass::make(static_cast(dividend % divisor))`, is computed which leads to a SIGFPE as the divisor in the test program is zero. The same behaviour was observed when the ` Long.remainderUnsigned` was replaced with `Integer.remainderUnsigned` in the test program. > > Solution :: The fix for [JDK-8345766](https://bugs.openjdk.org/browse/JDK-8345766) emitted specific ModF/ModD nodes, which is optimized and converted to runtime calls after optimizations. This was done during parsing prior to [JDK-8345766](https://bugs.openjdk.org/browse/JDK-8345766). 
In the scenario where there was unsigned modulo operation as in this test, there was no check for modulo by zero that could trigger an exception during runtime. The below fix proposes a check for modulo by zero and throws exception at runtime. > > A Jtreg test has been added as part of this fix. This test case is based on the original test that resulted in the bug. @eme64 is the contributor of the original test. Thank you @eme64. Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: correcting comments in included test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24410/files - new: https://git.openjdk.org/jdk/pull/24410/files/667c7e7a..21623249 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24410&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24410&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24410.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24410/head:pull/24410 PR: https://git.openjdk.org/jdk/pull/24410 From chagedorn at openjdk.org Wed Apr 9 08:54:57 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 9 Apr 2025 08:54:57 GMT Subject: RFR: 8351660: C2: SIGFPE in unsigned_mod_value [v5] In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 08:50:08 GMT, Saranya Natarajan wrote: >> Description :: The test program performs a`Long.remainderUnsigned` which triggers the call to the function `unsigned_mod_value`. At the end of `unsigned_mod_value` the expression,` return TypeClass::make(static_cast(dividend % divisor))`, is computed which leads to a SIGFPE as the divisor in the test program is zero. The same behaviour was observed when the ` Long.remainderUnsigned` was replaced with `Integer.remainderUnsigned` in the test program. 
>> >> Solution :: The fix for [JDK-8345766](https://bugs.openjdk.org/browse/JDK-8345766) emitted specific ModF/ModD nodes, which is optimized and converted to runtime calls after optimizations. This was done during parsing prior to [JDK-8345766](https://bugs.openjdk.org/browse/JDK-8345766). In the scenario where there was unsigned modulo operation as in this test, there was no check for modulo by zero that could trigger an exception during runtime. The below fix proposes a check for modulo by zero and throws exception at runtime. >> >> A Jtreg test has been added as part of this fix. This test case is based on the original test that resulted in the bug. @eme64 is the contributor of the original test. Thank you @eme64. > > Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: > > correcting comments in included test Marked as reviewed by chagedorn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24410#pullrequestreview-2752619842 From dfenacci at openjdk.org Wed Apr 9 08:54:57 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 9 Apr 2025 08:54:57 GMT Subject: RFR: 8351660: C2: SIGFPE in unsigned_mod_value [v5] In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 08:50:08 GMT, Saranya Natarajan wrote: >> Description :: The test program performs a`Long.remainderUnsigned` which triggers the call to the function `unsigned_mod_value`. At the end of `unsigned_mod_value` the expression,` return TypeClass::make(static_cast(dividend % divisor))`, is computed which leads to a SIGFPE as the divisor in the test program is zero. The same behaviour was observed when the ` Long.remainderUnsigned` was replaced with `Integer.remainderUnsigned` in the test program. >> >> Solution :: The fix for [JDK-8345766](https://bugs.openjdk.org/browse/JDK-8345766) emitted specific ModF/ModD nodes, which is optimized and converted to runtime calls after optimizations. 
This was done during parsing prior to [JDK-8345766](https://bugs.openjdk.org/browse/JDK-8345766). In the scenario where there was unsigned modulo operation as in this test, there was no check for modulo by zero that could trigger an exception during runtime. The below fix proposes a check for modulo by zero and throws exception at runtime. >> >> A Jtreg test has been added as part of this fix. This test case is based on the original test that resulted in the bug. @eme64 is the contributor of the original test. Thank you @eme64. > > Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: > > correcting comments in included test Marked as reviewed by dfenacci (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24410#pullrequestreview-2752623157 From amitkumar at openjdk.org Wed Apr 9 08:57:40 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 9 Apr 2025 08:57:40 GMT Subject: RFR: 8353500: [s390x] Intrinsify Unsafe::setMemory [v2] In-Reply-To: References: Message-ID: > Unsafe::setMemory intrinsic implementation for s390x. 
> > Stub Code: > > > StubRoutines::unsafe_setmemory [0x000003ffb04b63c0, 0x000003ffb04b64d0] (272 bytes) > -------------------------------------------------------------------------------- > 0x000003ffb04b63c0: ogrk %r1,%r2,%r3 > 0x000003ffb04b63c4: nill %r1,7 > 0x000003ffb04b63c8: je 0x000003ffb04b6410 > 0x000003ffb04b63cc: nill %r1,3 > 0x000003ffb04b63d0: je 0x000003ffb04b6460 > 0x000003ffb04b63d4: nill %r1,1 > 0x000003ffb04b63d8: jlh 0x000003ffb04b64a0 > 0x000003ffb04b63dc: risbg %r4,%r4,48,55,8 > 0x000003ffb04b63e2: risbgz %r1,%r3,32,63,62 > 0x000003ffb04b63e8: je 0x000003ffb04b6402 > 0x000003ffb04b63ec: nopr > 0x000003ffb04b63ee: nopr > 0x000003ffb04b63f0: sth %r4,0(%r2) > 0x000003ffb04b63f4: sth %r4,2(%r2) > 0x000003ffb04b63f8: agfi %r2,4 > 0x000003ffb04b63fe: brct %r1,0x000003ffb04b63f0 > 0x000003ffb04b6402: nilf %r3,2 > 0x000003ffb04b6408: ber %r14 > 0x000003ffb04b640a: sth %r4,0(%r2) > 0x000003ffb04b640e: br %r14 > 0x000003ffb04b6410: risbg %r4,%r4,48,55,8 > 0x000003ffb04b6416: risbg %r4,%r4,32,47,16 > 0x000003ffb04b641c: risbg %r4,%r4,0,31,32 > 0x000003ffb04b6422: risbgz %r1,%r3,32,63,60 > 0x000003ffb04b6428: je 0x000003ffb04b6446 > 0x000003ffb04b642c: nopr > 0x000003ffb04b642e: nopr > 0x000003ffb04b6430: stg %r4,0(%r2) > 0x000003ffb04b6436: stg %r4,8(%r2) > 0x000003ffb04b643c: agfi %r2,16 > 0x000003ffb04b6442: brct %r1,0x000003ffb04b6430 > 0x000003ffb04b6446: nilf %r3,8 > 0x000003ffb04b644c: ber %r14 > 0x000003ffb04b644e: stg %r4,0(%r2) > 0x000003ffb04b6454: br %r14 > 0x000003ffb04b6456: nopr > 0x000003ffb04b6458: nopr > 0x000003ffb04b645a: nopr > 0x000003ffb04b645c: nopr > 0x000003ffb04b645e: nopr > 0x000003ffb04b6460: risbg %r4,%r4,48,55,8 > 0x000003ffb04b6466: risbg %r4,%r4,32,47,16 > 0x000003ffb04b646c: risbgz %r1,%r3,32,63,61 > 0x000003ffb04b6472: je 0x000003ffb04b6492 > 0x000003ffb04b6476: nopr > 0x000003ffb04b6478: nopr > 0x000003ffb04b647a: nopr > 0x000003ffb04b647c: nopr > 0x000003ffb04b647e: nopr > 0x000003ffb04b6480: st %r4,0(%r2) > 
0x000003ffb04b6484: st %r4,4(%r2) > 0x000003ffb04b6488: agfi %r2,8 > 0x000003ffb04b648e: brct %r1,0x000003ffb04b6480 > 0x000003ffb04b6492: nilf %r3,4 > 0x000003ffb04b6498: ber %r14 > 0x000003ffb04b649a: st %r4,0(%r2) > 0x000003ffb04b649e: br %r14 > 0x000003ffb04b64a0: risbgz %r1,%r3,32,63,63 > 0x000003ffb04b64a6: je 0x000003ffb04b64c2 > 0x000003... Amit Kumar has updated the pull request incrementally with four additional commits since the last revision: - reviews for Martin - Revert "minor improvement" This reverts commit a6af6da26d1e0590dc24486131d1bc752e047f98. - minor improvement - reviews from Lutz and Martin ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24480/files - new: https://git.openjdk.org/jdk/pull/24480/files/e7cf3a83..1b8ea8bb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24480&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24480&range=00-01 Stats: 21 lines in 2 files changed: 0 ins; 7 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/24480.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24480/head:pull/24480 PR: https://git.openjdk.org/jdk/pull/24480 From mdoerr at openjdk.org Wed Apr 9 08:57:41 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 9 Apr 2025 08:57:41 GMT Subject: RFR: 8353500: [s390x] Intrinsify Unsafe::setMemory [v2] In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 08:53:23 GMT, Amit Kumar wrote: >> Unsafe::setMemory intrinsic implementation for s390x. 
>> >> Stub Code: >> >> >> StubRoutines::unsafe_setmemory [0x000003ffb04b63c0, 0x000003ffb04b64d0] (272 bytes) >> -------------------------------------------------------------------------------- >> 0x000003ffb04b63c0: ogrk %r1,%r2,%r3 >> 0x000003ffb04b63c4: nill %r1,7 >> 0x000003ffb04b63c8: je 0x000003ffb04b6410 >> 0x000003ffb04b63cc: nill %r1,3 >> 0x000003ffb04b63d0: je 0x000003ffb04b6460 >> 0x000003ffb04b63d4: nill %r1,1 >> 0x000003ffb04b63d8: jlh 0x000003ffb04b64a0 >> 0x000003ffb04b63dc: risbg %r4,%r4,48,55,8 >> 0x000003ffb04b63e2: risbgz %r1,%r3,32,63,62 >> 0x000003ffb04b63e8: je 0x000003ffb04b6402 >> 0x000003ffb04b63ec: nopr >> 0x000003ffb04b63ee: nopr >> 0x000003ffb04b63f0: sth %r4,0(%r2) >> 0x000003ffb04b63f4: sth %r4,2(%r2) >> 0x000003ffb04b63f8: agfi %r2,4 >> 0x000003ffb04b63fe: brct %r1,0x000003ffb04b63f0 >> 0x000003ffb04b6402: nilf %r3,2 >> 0x000003ffb04b6408: ber %r14 >> 0x000003ffb04b640a: sth %r4,0(%r2) >> 0x000003ffb04b640e: br %r14 >> 0x000003ffb04b6410: risbg %r4,%r4,48,55,8 >> 0x000003ffb04b6416: risbg %r4,%r4,32,47,16 >> 0x000003ffb04b641c: risbg %r4,%r4,0,31,32 >> 0x000003ffb04b6422: risbgz %r1,%r3,32,63,60 >> 0x000003ffb04b6428: je 0x000003ffb04b6446 >> 0x000003ffb04b642c: nopr >> 0x000003ffb04b642e: nopr >> 0x000003ffb04b6430: stg %r4,0(%r2) >> 0x000003ffb04b6436: stg %r4,8(%r2) >> 0x000003ffb04b643c: agfi %r2,16 >> 0x000003ffb04b6442: brct %r1,0x000003ffb04b6430 >> 0x000003ffb04b6446: nilf %r3,8 >> 0x000003ffb04b644c: ber %r14 >> 0x000003ffb04b644e: stg %r4,0(%r2) >> 0x000003ffb04b6454: br %r14 >> 0x000003ffb04b6456: nopr >> 0x000003ffb04b6458: nopr >> 0x000003ffb04b645a: nopr >> 0x000003ffb04b645c: nopr >> 0x000003ffb04b645e: nopr >> 0x000003ffb04b6460: risbg %r4,%r4,48,55,8 >> 0x000003ffb04b6466: risbg %r4,%r4,32,47,16 >> 0x000003ffb04b646c: risbgz %r1,%r3,32,63,61 >> 0x000003ffb04b6472: je 0x000003ffb04b6492 >> 0x000003ffb04b6476: nopr >> 0x000003ffb04b6478: nopr >> 0x000003ffb04b647a: nopr >> 0x000003ffb04b647c: nopr >> 0x000003ffb04b647e: 
nopr >> 0x000003ffb04b6480: st %r4,0(%r2) >> 0x000003ffb04b6484: st %r4,4(%r2) >> 0x000003ffb04b6488: agfi %r2,8 >> 0x000003ffb04b648e: brct %r1,0x000003ffb04b6480 >> 0x000003ffb04b6492: nilf %r3,4 >> 0x000003ffb04b6498: ber %r14 >> 0x000003ffb04b649a: st %r4,0(%r2) >> 0x0000... > > Amit Kumar has updated the pull request incrementally with four additional commits since the last revision: > > - reviews for Martin > - Revert "minor improvement" > > This reverts commit a6af6da26d1e0590dc24486131d1bc752e047f98. > - minor improvement > - reviews from Lutz and Martin src/hotspot/cpu/s390/stubGenerator_s390.cpp line 1482: > 1480: > 1481: if (elem_size > 1) { > 1482: __ rotate_then_insert(byteVal, byteVal, 64 - 2 * 8 , 63 - 8, 8, 0); The last argument seems to be a `bool`. The value should better be `false`. src/hotspot/cpu/s390/stubGenerator_s390.cpp line 1542: > 1540: __ z_nill(rScratch1, 7); > 1541: __ z_braz(L_fill8Bytes); // branch if 0 > 1542: Extra newline. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24480#discussion_r2033102883 PR Review Comment: https://git.openjdk.org/jdk/pull/24480#discussion_r2033103489 From amitkumar at openjdk.org Wed Apr 9 08:57:41 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 9 Apr 2025 08:57:41 GMT Subject: RFR: 8353500: [s390x] Intrinsify Unsafe::setMemory [v2] In-Reply-To: References: Message-ID: On Tue, 8 Apr 2025 12:39:24 GMT, Martin Doerr wrote: >> Amit Kumar has updated the pull request incrementally with four additional commits since the last revision: >> >> - reviews for Martin >> - Revert "minor improvement" >> >> This reverts commit a6af6da26d1e0590dc24486131d1bc752e047f98. >> - minor improvement >> - reviews from Lutz and Martin > > src/hotspot/cpu/s390/stubGenerator_s390.cpp line 1542: > >> 1540: __ z_nill(rScratch1, 7); >> 1541: __ z_braz(L_fill8Bytes); // branch if 0 >> 1542: > > Extra newline. Updated. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24480#discussion_r2034825612 From jbhateja at openjdk.org Wed Apr 9 08:59:08 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 9 Apr 2025 08:59:08 GMT Subject: RFR: 8346236: Auto vectorization support for various Float16 operations [v12] In-Reply-To: References: Message-ID: > This is a follow-up PR for https://github.com/openjdk/jdk/pull/22754 > > The patch adds support to vectorize various float16 scalar operations (add/subtract/divide/multiply/sqrt/fma). > > Summary of changes included with the patch: > 1. C2 compiler New Vector IR creation. > 2. Auto-vectorization support. > 3. x86 backend implementation. > 4. New IR verification test for each newly supported vector operation. > > Following are the performance numbers of Float16OperationsBenchmark > > System : Intel(R) Xeon(R) Processor code-named Granite rapids > Frequency fixed at 2.5 GHz > > > Baseline > Benchmark (vectorDim) Mode Cnt Score Error Units > Float16OperationsBenchmark.absBenchmark 1024 thrpt 2 4191.787 ops/ms > Float16OperationsBenchmark.addBenchmark 1024 thrpt 2 1211.978 ops/ms > Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 1024 thrpt 2 493.026 ops/ms > Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 1024 thrpt 2 612.430 ops/ms > Float16OperationsBenchmark.cosineSimilaritySingleRoundingFP16 1024 thrpt 2 616.012 ops/ms > Float16OperationsBenchmark.divBenchmark 1024 thrpt 2 604.882 ops/ms > Float16OperationsBenchmark.dotProductFP16 1024 thrpt 2 410.798 ops/ms > Float16OperationsBenchmark.euclideanDistanceDequantizedFP16 1024 thrpt 2 602.863 ops/ms > Float16OperationsBenchmark.euclideanDistanceFP16 1024 thrpt 2 640.348 ops/ms > Float16OperationsBenchmark.fmaBenchmark 1024 thrpt 2 809.175 ops/ms > Float16OperationsBenchmark.getExponentBenchmark 1024 thrpt 2 2682.764 ops/ms > Float16OperationsBenchmark.isFiniteBenchmark 1024 thrpt 2 3373.901 ops/ms > 
Float16OperationsBenchmark.isFiniteCMovBenchmark 1024 thrpt 2 1881.652 ops/ms > Float16OperationsBenchmark.isFiniteStoreBenchmark 1024 thrpt 2 2273.745 ops/ms > Float16OperationsBenchmark.isInfiniteBenchmark 1024 thrpt 2 2147.913 ops/ms > Float16OperationsBenchmark.isInfiniteCMovBenchmark 1024 thrpt 2 1962.579 ops/ms... Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Minor tuning in selection pattern ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22755/files - new: https://git.openjdk.org/jdk/pull/22755/files/2c09e816..14bfe9b0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22755&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22755&range=10-11 Stats: 7 lines in 2 files changed: 0 ins; 2 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/22755.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22755/head:pull/22755 PR: https://git.openjdk.org/jdk/pull/22755 From jbhateja at openjdk.org Wed Apr 9 08:59:09 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 9 Apr 2025 08:59:09 GMT Subject: RFR: 8346236: Auto vectorization support for various Float16 operations [v8] In-Reply-To: References: Message-ID: <2pz-rUzSGEmewJniiogeaEdy5NXgUQJr0GUVz-sTK80=.5edf4309-3ca5-497b-842d-374c3f06358f@github.com> On Fri, 28 Mar 2025 11:28:44 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Some re-factoring > > Changes requested by epeter (Reviewer). Hi @eme64 , Please let us know if there are further comments. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22755#issuecomment-2788847753 From epeter at openjdk.org Wed Apr 9 09:10:33 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 9 Apr 2025 09:10:33 GMT Subject: RFR: 8346236: Auto vectorization support for various Float16 operations [v8] In-Reply-To: <2pz-rUzSGEmewJniiogeaEdy5NXgUQJr0GUVz-sTK80=.5edf4309-3ca5-497b-842d-374c3f06358f@github.com> References: <2pz-rUzSGEmewJniiogeaEdy5NXgUQJr0GUVz-sTK80=.5edf4309-3ca5-497b-842d-374c3f06358f@github.com> Message-ID: <5IbBJHn_SugdrGTRoClg7P1GNHpPYmBgTou5Z9rTeeE=.c3f8a6a1-21e0-4e93-ab02-8783f8fe95e2@github.com> On Wed, 9 Apr 2025 08:54:06 GMT, Jatin Bhateja wrote: >> Changes requested by epeter (Reviewer). > > Hi @eme64 , Please let us know if there are further comments. @jatin-bhateja I think it looks good now, but let me run some more tests :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/22755#issuecomment-2788885047 From duke at openjdk.org Wed Apr 9 09:19:40 2025 From: duke at openjdk.org (duke) Date: Wed, 9 Apr 2025 09:19:40 GMT Subject: RFR: 8348853: Fold layout helper check for objects implementing non-array interfaces [v3] In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 07:20:00 GMT, Marc Chevalier wrote: >> If `TypeInstKlassPtr` represents an array type, it has to be `java.lang.Object`. From contraposition, if it is not `java.lang.Object`, we can conclude it is not an array, and we can skip some array checks, for instance. >> >> In this PR, we improve this deduction with an interface base reasoning: arrays implements only Cloneable and Serializable, so if a type implements anything else, it cannot be an array. >> >> This change partially reverts the changes from [JDK-8348631](https://bugs.openjdk.org/browse/JDK-8348631) (#23331) (in `LibraryCallKit::generate_array_guard_common`) and the test still passes. >> >> The way interfaces are check might be done differently. 
The current situation is a balance between visibility (not to leak too much things explicitly private), having not overly general methods for one use-case and avoiding too concrete (and brittle) interfaces. >> >> Tested with tier1..3, hs-precheckin-comp and hs-comp-stress >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge branch 'master' into feat/Fold-layout-helper-check-for-objects-implementing-non-array-interfaces > - Merge branch 'master' into feat/Fold-layout-helper-check-for-objects-implementing-non-array-interfaces > - not reinventing the wheel > - Revert now useless fix > - Generalize the not-array proof @marc-chevalier Your change (at version b1fb82a28a4f6c3f126d312727cb9f89a9f51669) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24245#issuecomment-2788951877 From mchevalier at openjdk.org Wed Apr 9 09:19:39 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 9 Apr 2025 09:19:39 GMT Subject: RFR: 8348853: Fold layout helper check for objects implementing non-array interfaces [v3] In-Reply-To: References: Message-ID: <1kzdW30AxgS9RnVEzxrwkkQk8dwxOT79wloukL-Vz38=.8c62c7ef-10b0-4011-8744-7ae24211362c@github.com> On Wed, 9 Apr 2025 07:20:00 GMT, Marc Chevalier wrote: >> If `TypeInstKlassPtr` represents an array type, it has to be `java.lang.Object`. From contraposition, if it is not `java.lang.Object`, we can conclude it is not an array, and we can skip some array checks, for instance. >> >> In this PR, we improve this deduction with an interface base reasoning: arrays implements only Cloneable and Serializable, so if a type implements anything else, it cannot be an array. 
>> >> This change partially reverts the changes from [JDK-8348631](https://bugs.openjdk.org/browse/JDK-8348631) (#23331) (in `LibraryCallKit::generate_array_guard_common`) and the test still passes. >> >> The way interfaces are check might be done differently. The current situation is a balance between visibility (not to leak too much things explicitly private), having not overly general methods for one use-case and avoiding too concrete (and brittle) interfaces. >> >> Tested with tier1..3, hs-precheckin-comp and hs-comp-stress >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge branch 'master' into feat/Fold-layout-helper-check-for-objects-implementing-non-array-interfaces > - Merge branch 'master' into feat/Fold-layout-helper-check-for-objects-implementing-non-array-interfaces > - not reinventing the wheel > - Revert now useless fix > - Generalize the not-array proof The branch was a bit old, so I've merged master in it and run tests. It seems all good! Thanks @TobiHartmann and @rwestrel for the reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24245#issuecomment-2788948220 From mchevalier at openjdk.org Wed Apr 9 09:31:49 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 9 Apr 2025 09:31:49 GMT Subject: Integrated: 8348853: Fold layout helper check for objects implementing non-array interfaces In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 09:16:17 GMT, Marc Chevalier wrote: > If `TypeInstKlassPtr` represents an array type, it has to be `java.lang.Object`. From contraposition, if it is not `java.lang.Object`, we can conclude it is not an array, and we can skip some array checks, for instance. 
> > In this PR, we improve this deduction with an interface base reasoning: arrays implements only Cloneable and Serializable, so if a type implements anything else, it cannot be an array. > > This change partially reverts the changes from [JDK-8348631](https://bugs.openjdk.org/browse/JDK-8348631) (#23331) (in `LibraryCallKit::generate_array_guard_common`) and the test still passes. > > The way interfaces are check might be done differently. The current situation is a balance between visibility (not to leak too much things explicitly private), having not overly general methods for one use-case and avoiding too concrete (and brittle) interfaces. > > Tested with tier1..3, hs-precheckin-comp and hs-comp-stress > > Thanks, > Marc This pull request has now been integrated. Changeset: a1d566ce Author: Marc Chevalier Committer: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/a1d566ce4b0315591ece489347c5d1c253f06be9 Stats: 34 lines in 5 files changed: 23 ins; 7 del; 4 mod 8348853: Fold layout helper check for objects implementing non-array interfaces Reviewed-by: thartmann, roland ------------- PR: https://git.openjdk.org/jdk/pull/24245 From hgreule at openjdk.org Wed Apr 9 09:37:36 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Wed, 9 Apr 2025 09:37:36 GMT Subject: RFR: 8353551: C2: Constant folding for ReverseBytes nodes [v2] In-Reply-To: References: <0Vve2dLoTSqwMKxx7LqaStM6f6_c6NadH-xc3OCSMMI=.c8dafa83-a661-4502-b170-fe7bdee19f2c@github.com> Message-ID: On Tue, 8 Apr 2025 19:56:14 GMT, Vladimir Ivanov wrote: >> @j3graham I mainly want to avoid conflicts or duplicated solutions. I meant that after #17508 this code can be further generalized anyway, allowing to use the `try_cast` function then. This can obviously happen in a separate PR, if this one is integrated before. >> >> @iwanowww I'm not sure if that's possible without more duplication. We need to choose the correct byteswap implementation depending on the node's type. 
Please let me know if I'm missing something. > > It does have some duplication, but simply in a different place. I wouldn't say there's more of it: > > static const Type* reverse_bytes(int op, const Type* con) { > switch (op) { > case Op_ReverseBytesS: return TypeInt::make(byteswap(checked_cast<jshort>(con->is_int()->get_con()))); > case Op_ReverseBytesUS: return TypeInt::make(byteswap(checked_cast<jchar> (con->is_int()->get_con()))); > case Op_ReverseBytesI: return TypeInt::make(byteswap(con->is_int()->get_con())); > case Op_ReverseBytesL: return TypeLong::make(byteswap(con->is_long()->get_con())); > > default: ShouldNotReachHere(); > } > } > > const Type* ReverseBytesNode::Value(PhaseGVN* phase) const { > const Type* type = phase->type(in(1)); > if (type == Type::TOP) { > return Type::TOP; > } > if (type->singleton()) { > return reverse_bytes(Opcode(), type); > } > return bottom_type(); > } > > > At some point, we should consider folding `ReverseBytes*Node` specializations into a single one (`ReverseBytesNode`) parameterized by element type (as was done for `ReverseBytesV` and other vector nodes). After it is done, there won't be a convenient way to piggyback on virtual calls to hook specialized versions forcing us to do explicit dispatch anyway. Okay, I see. That works for me. One problem though, there currently isn't a common `ReverseBytesNode` type, so I'd need to add that. And I assume that should be a `TypeNode` then? In that case, as `ReverseBytes*Node`s are `InvolutionNode`s, would `InvolutionNode` need to be a `TypeNode`? Or can I use multiple inheritance? (I didn't see any example of that in current Node types). Both are doable, and the first might even make sense, but I'm not sure if it's a bit much for this PR. Or do you want me to (temporarily) duplicate the Value code (i.e. move more code from the templated function to the Value functions, and only keep the simple `reverse_bytes` from your snippet)?
Alternatively, I'd also be fine to put this PR on hold and make InvolutionNode a TypeNode first, or whatever you think is best. Please let me know what you think. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24382#discussion_r2034940786 From yzheng at openjdk.org Wed Apr 9 10:16:46 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 9 Apr 2025 10:16:46 GMT Subject: RFR: 8354130: Assert failure at CompilationPolicy::can_be_compiled after JDK-8334046 In-Reply-To: References: Message-ID: <5lYw07Fu-F_K_EzpyPk5wVY0OergV20-fa5bzDPbrT0=.68c1ac87-6e39-490a-af3e-8440a180f7d4@github.com> On Wed, 9 Apr 2025 10:11:36 GMT, Yudi Zheng wrote: > With https://github.com/openjdk/jdk/pull/24298 , `CompLevel_any` and `CompLevel_all` are now different, causing an assertion failure in CompilationPolicy::can_be_compiled. This PR includes `CompLevel_all` in the assertion condition. src/hotspot/share/compiler/compilationPolicy.cpp line 125: > 123: bool CompilationPolicy::can_be_compiled(const methodHandle& m, int comp_level) { > 124: // allow any levels for WhiteBox > 125: assert(WhiteBoxAPI || comp_level == CompLevel_any || comp_level == CompLevel_all || is_compile(comp_level), "illegal compilation level"); @JohnTortugo could you please review this? I assume you have audited all other places where we use `CompLevel_any` and `CompLevel_all `. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24540#discussion_r2035042211 From yzheng at openjdk.org Wed Apr 9 10:16:46 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 9 Apr 2025 10:16:46 GMT Subject: RFR: 8354130: Assert failure at CompilationPolicy::can_be_compiled after JDK-8334046 Message-ID: With https://github.com/openjdk/jdk/pull/24298 , `CompLevel_any` and `CompLevel_all` are now different, causing an assertion failure in CompilationPolicy::can_be_compiled. This PR includes `CompLevel_all` in the assertion condition. 
------------- Commit messages: - Assert failure at CompilationPolicy::can_be_compiled after JDK-8334046 Changes: https://git.openjdk.org/jdk/pull/24540/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24540&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8354130 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24540.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24540/head:pull/24540 PR: https://git.openjdk.org/jdk/pull/24540 From dlunden at openjdk.org Wed Apr 9 11:34:41 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Wed, 9 Apr 2025 11:34:41 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v16] In-Reply-To: References: Message-ID: On Mon, 7 Apr 2025 15:18:46 GMT, Daniel Lundén wrote: >> If a method has a large number of parameters, we currently bail out from C2 compilation. >> >> ### Changeset >> >> Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes. >> >> Changes: >> - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed.
>> - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable. >> - Remove all `can_represent` checks and bailouts. >> - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`. >> - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450) >> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. >> - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement. >> - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, no... > > Daniel Lundén has updated the pull request incrementally with two additional commits since the last revision: > > - Update comments > - Revise TestNestedSynchronize to make use of CompileFramework This is a rather intrusive changeset that may also affect ports not supported by Oracle. @offamitkumar @TheRealMDoerr @snazarkin @bulasevich @RealFYang: Can you please test these changes on your respective ports?
In particular, please make sure to run the tests `compiler/arguments/TestMaxMethodArguments.java` and `compiler/locks/TestNestedSynchronize.java`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20404#issuecomment-2789375929 From aph at openjdk.org Wed Apr 9 11:52:31 2025 From: aph at openjdk.org (Andrew Haley) Date: Wed, 9 Apr 2025 11:52:31 GMT Subject: RFR: 8353237: [AArch64] Incorrect result of VectorizedHashCode intrinsic on Cortex-A53 In-Reply-To: References: Message-ID: <2YNBZ_h9fT8Gvdc3SI6sfKMVd91ASQv-uwNKty4L03M=.2e4d82b0-5c3a-4cf2-8e38-fe089f945cea@github.com> On Mon, 7 Apr 2025 12:39:40 GMT, Aleksei Voitylov wrote: > The root of the problem is that VectorizedHashCode intrinsic introduced by JDK-8341194 is not aware of JDK-8079203. JDK-8079203 generates additional nop with madd instruction on Cortex-A53 as a workaround for Cortex-A53 erratum 835769 "AArch64 multiply-accumulate instruction might produce incorrect result". Current VectorizedHashCode intrinsic calculates byte offset to jump inside the unrolled loop code. It assumes 2 instructions per each unrolled iteration (load and madd). JDK-8079203 adds additional nop for Cortex-A53, which breaks offset calculation logic. > > Current offset calculation logic is using shift instead of multiplication, power-of-2 number instructions are present in each unrolled loop iteration. To keep it simple, this fix adds one more nop into each loop iteration on Cortex-A53 in order to have 4 instruction per iteration, which is also a power-of-2. To account for that, the shift argument for offset calculation logic is increased by 1, because each loop iteration has 2 times more instructions on Cortex-A53. > > This fix is tested on Raspberry Pi 3 (based on Cortex-A53) by running initially reported application and by running hotspot jtreg tests (not a single test could be run on Cortex-A53 before the fix). After the fix, the specialized test hotspot/jtreg/compiler/intrinsics/TestArraysHashCode.java passes.
> > The performance gain from the intrinsic is also observed on Cortex-A53 using the ArraysHashCode benchmark. I see that the hashcode acceleration is a recent addition, so this doesn't need backports. Whatever we do here is going to be ugly, as far as I can see. I suppose the interesting question is how long we intend to support Cortex A53 for. When you build HotSpot for Cortex A53, do you use the `-mfix-cortex-a53-835769` GCC option, or is that option always on, by default, on the GCC that you use? I have no objection to this patch, but I wonder if this Cortex A53 bug might also fall foul of GCC-generated code on standard HotSpot builds. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24489#issuecomment-2789427121 From thartmann at openjdk.org Wed Apr 9 12:26:46 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 9 Apr 2025 12:26:46 GMT Subject: RFR: 8334046: Set different values for CompLevel_any and CompLevel_all [v2] In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 03:43:03 GMT, Cesar Soares Lucas wrote: >> Please review this trivial patch to set different values for CompLevel_any and CompLevel_all. >> Setting different values for these fields make the implementation of [this other issue](https://bugs.openjdk.org/browse/JDK-8313713) much cleaner/easier. >> Tested on OSX/Linux Aarch64/x86_64 with JTREG. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Fix WhiteBox constants. > Please review this trivial patch This patch is definitely not trivial and caused a regression, see [JDK-8354130](https://bugs.openjdk.org/browse/JDK-8354130). 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24298#issuecomment-2789525759 From thartmann at openjdk.org Wed Apr 9 12:29:31 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 9 Apr 2025 12:29:31 GMT Subject: RFR: 8354130: Assert failure at CompilationPolicy::can_be_compiled after JDK-8334046 In-Reply-To: <5lYw07Fu-F_K_EzpyPk5wVY0OergV20-fa5bzDPbrT0=.68c1ac87-6e39-490a-af3e-8440a180f7d4@github.com> References: <5lYw07Fu-F_K_EzpyPk5wVY0OergV20-fa5bzDPbrT0=.68c1ac87-6e39-490a-af3e-8440a180f7d4@github.com> Message-ID: On Wed, 9 Apr 2025 10:13:20 GMT, Yudi Zheng wrote: >> With https://github.com/openjdk/jdk/pull/24298 , `CompLevel_any` and `CompLevel_all` are now different, causing an assertion failure in CompilationPolicy::can_be_compiled. This PR includes `CompLevel_all` in the assertion condition. > > src/hotspot/share/compiler/compilationPolicy.cpp line 125: > >> 123: bool CompilationPolicy::can_be_compiled(const methodHandle& m, int comp_level) { >> 124: // allow any levels for WhiteBox >> 125: assert(WhiteBoxAPI || comp_level == CompLevel_any || comp_level == CompLevel_all || is_compile(comp_level), "illegal compilation level"); > > @JohnTortugo could you please review this? I assume you have audited all other places where we use `CompLevel_any` and `CompLevel_all `. Looking at other usages of `CompLevel_any`/`CompLevel_all`, I'm concerned that JDK-8334046 accidentally introduced some semantic changes. I think we should back it out and redo. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24540#discussion_r2035260067 From yzheng at openjdk.org Wed Apr 9 12:54:49 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 9 Apr 2025 12:54:49 GMT Subject: RFR: 8334046: Set different values for CompLevel_any and CompLevel_all [v2] In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 03:43:03 GMT, Cesar Soares Lucas wrote: >> Please review this trivial patch to set different values for CompLevel_any and CompLevel_all. 
>> Setting different values for these fields make the implementation of [this other issue](https://bugs.openjdk.org/browse/JDK-8313713) much cleaner/easier. >> Tested on OSX/Linux Aarch64/x86_64 with JTREG. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Fix WhiteBox constants. I am auditing `CompLevel_any` and `CompLevel_all` usages and am uncertain about if we have more mixed up usages in the code base than [JDK-8354130](https://bugs.openjdk.org/browse/JDK-8354130). I think we should backout this and run higher tiers in a redo. Backout PR at https://github.com/openjdk/jdk/pull/24540 ------------- PR Comment: https://git.openjdk.org/jdk/pull/24298#issuecomment-2789600255 From yzheng at openjdk.org Wed Apr 9 12:55:36 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 9 Apr 2025 12:55:36 GMT Subject: RFR: 8354181: [Backout] 8334046: Set different values for CompLevel_any and CompLevel_all Message-ID: https://github.com/openjdk/jdk/commit/9ee5590328e7d5f5070efdbd7ffc44cb660005cc fails in tier1 TestEnableJVMCIProduct. Clean backout. 
------------- Commit messages: - Revert "8334046: Set different values for CompLevel_any and CompLevel_all" Changes: https://git.openjdk.org/jdk/pull/24544/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24544&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8354181 Stats: 8 lines in 3 files changed: 0 ins; 2 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/24544.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24544/head:pull/24544 PR: https://git.openjdk.org/jdk/pull/24544 From thartmann at openjdk.org Wed Apr 9 12:55:37 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 9 Apr 2025 12:55:37 GMT Subject: RFR: 8354181: [Backout] 8334046: Set different values for CompLevel_any and CompLevel_all In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 12:45:47 GMT, Yudi Zheng wrote: > https://github.com/openjdk/jdk/commit/9ee5590328e7d5f5070efdbd7ffc44cb660005cc fails in tier1 TestEnableJVMCIProduct. Clean backout. Looks good and trivial. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24544#pullrequestreview-2753361178 From avoitylov at openjdk.org Wed Apr 9 13:02:32 2025 From: avoitylov at openjdk.org (Aleksei Voitylov) Date: Wed, 9 Apr 2025 13:02:32 GMT Subject: RFR: 8353237: [AArch64] Incorrect result of VectorizedHashCode intrinsic on Cortex-A53 In-Reply-To: References: Message-ID: On Mon, 7 Apr 2025 12:39:40 GMT, Aleksei Voitylov wrote: > The root of the problem is that VectorizedHashCode intrinsic introduced by JDK-8341194 is not aware of JDK-8079203. JDK-8079203 generates additional nop with madd instruction on Cortex-A53 as a workaround for Cortex-A53 erratum 835769 "AArch64 multiply-accumulate instruction might produce incorrect result". Current VectorizedHashCode intrinsic calculates byte offset to jump inside the unrolled loop code. It assumes 2 instructions per each unrolled iteration (load and madd). 
JDK-8079203 adds additional nop for Cortex-A53, which breaks offset calculation logic. > > Current offset calculation logic is using shift instead of multiplication, power-of-2 number instructions are present in each unrolled loop iteration. To keep it simple, this fix adds one more nop into each loop iteration on Cortex-A53 in order to have 4 instruction per iteration, which is also a power-of-2. To account for that, the shift argument for offset calculation logic is increased by 1, because each loop iteration has 2 times more instructions on Cortex-A53. > > This fix is tested on Raspberry Pi 3 (based on Cortex-A53) by running initially reported application and by running hotspot jtreg tests (not a single test could be run on Cortex-A53 before the fix). After the fix, the specialized test hotspot/jtreg/compiler/intrinsics/TestArraysHashCode.java passes. > > The performance gain from the intrinsic is also observed on Cortex-A53 using the ArraysHashCode benchmark. It will have to be backported to 21u, as the intrinsic is already in 21u-dev. I'll do that once we address 8353237 in jdk/jdk. Our cross compiler has, among other erratum flags, `-mfix-cortex-a53-835769` on by default. IIRC Linaro also does this, but hard to tell if all vendors building OpenJDK do that. Note that JDK-8079203 does not check the specific erratum flags in a processor. Instead, it adds nop based on processor type, penalizing all A53s. I could not find the history of that decision, maybe because not all A53 implementors used the erratum flags, or for the sake of simplicity. It's also consistent with a typical GCC setting. I decided not to touch this part because of that.
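The shift-based offset arithmetic from the quoted description can be sketched as follows. This is only an illustrative model, not the intrinsic's code: the class and method names are hypothetical, and the only facts assumed are that AArch64 instructions are 4 bytes wide and that an iteration holds 2 instructions normally and 4 with the Cortex-A53 padding.

```java
public class UnrolledLoopOffset {
    // AArch64 instructions have a fixed width of 4 bytes.
    static final int INSTRUCTION_BYTES = 4;

    // Hypothetical helper: log2 of the code size of one unrolled iteration.
    // A shift can only replace the multiplication when that size is a power of two.
    static int shiftFor(int instructionsPerIteration) {
        int bytes = instructionsPerIteration * INSTRUCTION_BYTES;
        if (bytes <= 0 || Integer.bitCount(bytes) != 1) {
            throw new IllegalArgumentException(
                "iteration size is not a power of two: " + bytes + " bytes");
        }
        return Integer.numberOfTrailingZeros(bytes);
    }

    // Hypothetical helper: byte distance covered by a number of unrolled iterations.
    static int byteOffset(int iterations, int instructionsPerIteration) {
        return iterations << shiftFor(instructionsPerIteration);
    }

    public static void main(String[] args) {
        // load + madd: 8 bytes per iteration.
        System.out.println("shift without padding: " + shiftFor(2));
        // load + nop + madd + nop: 16 bytes per iteration, so the shift grows by 1.
        System.out.println("shift with A53 padding: " + shiftFor(4));
        // Three instructions (load + nop + madd) would not fit a shift at all.
        try {
            shiftFor(3);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In this model, padding to 4 instructions rather than stopping at 3 is what keeps the offset computable with a single shift.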
------------- PR Comment: https://git.openjdk.org/jdk/pull/24489#issuecomment-2789622909 From chagedorn at openjdk.org Wed Apr 9 13:05:40 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 9 Apr 2025 13:05:40 GMT Subject: RFR: 8354181: [Backout] 8334046: Set different values for CompLevel_any and CompLevel_all In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 12:45:47 GMT, Yudi Zheng wrote: > https://github.com/openjdk/jdk/commit/9ee5590328e7d5f5070efdbd7ffc44cb660005cc fails in tier1 TestEnableJVMCIProduct. Clean backout. Marked as reviewed by chagedorn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24544#pullrequestreview-2753401386 From yzheng at openjdk.org Wed Apr 9 13:06:40 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 9 Apr 2025 13:06:40 GMT Subject: Withdrawn: 8354130: Assert failure at CompilationPolicy::can_be_compiled after JDK-8334046 In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 10:11:36 GMT, Yudi Zheng wrote: > With https://github.com/openjdk/jdk/pull/24298 , `CompLevel_any` and `CompLevel_all` are now different, causing an assertion failure in CompilationPolicy::can_be_compiled. This PR includes `CompLevel_all` in the assertion condition. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/24540 From yzheng at openjdk.org Wed Apr 9 13:11:43 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 9 Apr 2025 13:11:43 GMT Subject: RFR: 8354181: [Backout] 8334046: Set different values for CompLevel_any and CompLevel_all In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 12:45:47 GMT, Yudi Zheng wrote: > https://github.com/openjdk/jdk/commit/9ee5590328e7d5f5070efdbd7ffc44cb660005cc fails in tier1 TestEnableJVMCIProduct. Clean backout. Thanks for the review! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24544#issuecomment-2789644551 From yzheng at openjdk.org Wed Apr 9 13:11:44 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 9 Apr 2025 13:11:44 GMT Subject: Integrated: 8354181: [Backout] 8334046: Set different values for CompLevel_any and CompLevel_all In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 12:45:47 GMT, Yudi Zheng wrote: > https://github.com/openjdk/jdk/commit/9ee5590328e7d5f5070efdbd7ffc44cb660005cc fails in tier1 TestEnableJVMCIProduct. Clean backout. This pull request has now been integrated. Changeset: 9d8b93b6 Author: Yudi Zheng URL: https://git.openjdk.org/jdk/commit/9d8b93b6e2fa7a6c81d96f82ae8f5de222027879 Stats: 8 lines in 3 files changed: 0 ins; 2 del; 6 mod 8354181: [Backout] 8334046: Set different values for CompLevel_any and CompLevel_all Reviewed-by: thartmann, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/24544 From aph at openjdk.org Wed Apr 9 13:21:38 2025 From: aph at openjdk.org (Andrew Haley) Date: Wed, 9 Apr 2025 13:21:38 GMT Subject: RFR: 8353237: [AArch64] Incorrect result of VectorizedHashCode intrinsic on Cortex-A53 In-Reply-To: References: Message-ID: <5gLgTwqtwzM37EigG0q9SbWSSb8LlA9ok9_X4tdnFVw=.b6ecc041-cdea-449d-887d-c05db5b46257@github.com> On Wed, 9 Apr 2025 12:59:24 GMT, Aleksei Voitylov wrote: > Our cross compiler has, among other erratum flags, `-mfix-cortex-a53-835769` on by default. IIRC Linaro also does this, but hard to tell if all vendors building OpenJDK do that. Hmm. If we're going to fix this problem properly I guess we should add that flag to the build config. Maybe another PR? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24489#issuecomment-2789677807 From avoitylov at openjdk.org Wed Apr 9 13:28:26 2025 From: avoitylov at openjdk.org (Aleksei Voitylov) Date: Wed, 9 Apr 2025 13:28:26 GMT Subject: RFR: 8353237: [AArch64] Incorrect result of VectorizedHashCode intrinsic on Cortex-A53 In-Reply-To: References: Message-ID: On Mon, 7 Apr 2025 12:39:40 GMT, Aleksei Voitylov wrote: > The root of the problem is that VectorizedHashCode intrinsic introduced by JDK-8341194 is not aware of JDK-8079203. JDK-8079203 generates additional nop with madd instruction on Cortex-A53 as a workaround for Cortex-A53 erratum 835769 "AArch64 multiply-accumulate instruction might produce incorrect result". Current VectorizedHashCode intrinsic calculates byte offset to jump inside the unrolled loop code. It assumes 2 instructions per each unrolled iteration (load and madd). JDK-8079203 adds additional nop for Cortex-A53, which breaks offset calculation logic. > > Current offset calculation logic is using shift instead of multiplication, power-of-2 number instructions are present in each unrolled loop iteration. To keep it simple, this fix adds one more nop into each loop iteration on Cortex-A53 in order to have 4 instruction per iteration, which is also a power-of-2. To account for that, the shift argument for offset calculation logic is increased by 1, because each loop iteration has 2 times more instructions on Cortex-A53. > > This fix is tested on Raspberry Pi 3 (based on Cortex-A53) by running initially reported application and by running hotspot jtreg tests (not a single test could be run on Cortex-A53 before the fix). After the fix, the specialized test hotspot/jtreg/compiler/intrinsics/TestArraysHashCode.java passes. > > The performance gain from the intrinsic is also observed on Cortex-A53 using the ArraysHashCode benchmark. Maybe, but I'm not sure this is the right thing to enforce in the build system across the board either.
A cloud vendor with a big Arm fleet without A53s may find such a setting undesirable. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24489#issuecomment-2789725710 From rcastanedalo at openjdk.org Wed Apr 9 13:46:44 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 9 Apr 2025 13:46:44 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v16] In-Reply-To: References: Message-ID: On Mon, 7 Apr 2025 15:18:46 GMT, Daniel Lundén wrote: >> If a method has a large number of parameters, we currently bail out from C2 compilation. >> >> ### Changeset >> >> Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes. >> >> Changes: >> - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed. >> - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable. >> - Remove all `can_represent` checks and bailouts. >> - Performance tuning.
A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`. >> - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450) >> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. >> - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement. >> - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, no... > > Daniel Lundén has updated the pull request incrementally with two additional commits since the last revision: > > - Update comments > - Revise TestNestedSynchronize to make use of CompileFramework Marked as reviewed by rcastanedalo (Reviewer). test/hotspot/jtreg/compiler/locks/TestNestedSynchronize.java line 1: > 1: /* Great idea to generate these tests! Just a nit: the previous version contained 100 instances (from `test1` to `test100`), but this version generates up to `test99` only.
------------- PR Review: https://git.openjdk.org/jdk/pull/20404#pullrequestreview-2753544334 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2035409926 From epeter at openjdk.org Wed Apr 9 14:40:52 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 9 Apr 2025 14:40:52 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v16] In-Reply-To: References: Message-ID: <_GLaoVcaiLFvpL4yziOtrgd5QscbH-zrEJ2w3c5iMOY=.994428df-9217-4497-b9e1-5a8b8a24ec8f@github.com> On Mon, 7 Apr 2025 15:18:46 GMT, Daniel Lund?n wrote: >> If a method has a large number of parameters, we currently bail out from C2 compilation. >> >> ### Changeset >> >> Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes. >> >> Changes: >> - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed. >> - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable. >> - Remove all `can_represent` checks and bailouts. >> - Performance tuning. 
A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`. >> - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450) >> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. >> - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement. >> - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, no... > > Daniel Lund?n has updated the pull request incrementally with two additional commits since the last revision: > > - Update comments > - Revise TestNestedSynchronize to make use of CompileFramework test/hotspot/jtreg/compiler/locks/TestNestedSynchronize.java line 106: > 104: %s > 105: }""", test_class_name, inner); > 106: } Probably this is not so bad with only 100 repetitions ... but this leads to a lot of successive strings generated, resulting in larger and larger strings allocated, essentially requiring quadratic memory. A StringBuilder would help here. Maybe this is just an FYI: the TemplateFramework internally uses a StringBuilder to avoid this successive concatenation / allocation issue. 
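As a rough illustration of the point above (a minimal sketch only, not the code from the PR; the class `NestedSyncSketch` and the methods `nestConcat` and `nestBuilder` are hypothetical names), the two ways of building a deeply nested `synchronized` body could look like this:

```java
public class NestedSyncSketch {
    // Repeated concatenation: every iteration allocates a new String and
    // copies everything built so far, so total copying grows quadratically
    // with the nesting depth.
    static String nestConcat(int depth) {
        String inner = "";
        for (int i = 0; i < depth; i++) {
            inner = "synchronized (this) {" + inner + "}";
        }
        return inner;
    }

    // StringBuilder: append all openers, then all closers, into one
    // growable buffer and materialize the String once at the end.
    static String nestBuilder(int depth) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            sb.append("synchronized (this) {");
        }
        for (int i = 0; i < depth; i++) {
            sb.append("}");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Both approaches are meant to produce the same source text.
        System.out.println(nestConcat(100).equals(nestBuilder(100)));
    }
}
```

With only around 100 repetitions the difference is unlikely to matter in practice, which matches the "probably not so bad" remark; the builder variant mainly avoids the pattern becoming a problem if the repetition count grows.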
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2035523940 From ihse at openjdk.org Wed Apr 9 15:09:58 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Wed, 9 Apr 2025 15:09:58 GMT Subject: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v6] In-Reply-To: References: <0MB7FLFNfaGEWssr9X54UJ_iZNFWBJkxQ1yusP7fsuY=.3f9f3de5-fe84-48e6-9449-626cac42da0b@github.com> Message-ID: <_YOUyzMbSEXFduCKVgyis37kwTlGSjBbP8VlFu3xQpU=.9b668e2a-8f91-476d-8914-13dc33a0b9e5@github.com> On Thu, 11 May 2023 20:21:57 GMT, Justin Lu wrote: >> This PR converts Unicode sequences to UTF-8 native in .properties file. (Excluding the Unicode space and tab sequence). The conversion was done using native2ascii. >> >> In addition, the build logic is adjusted to support reading in the .properties files as UTF-8 during the conversion from .properties file to .java ListResourceBundle file. > > Justin Lu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits: > > - Convert the merged master changes to UTF-8 > - Merge master and fix conflicts > - Close streams when finished loading into props > - Adjust CF test to read in with UTF-8 to fix failing test > - Reconvert CS.properties to UTF-8 > - Revert all changes to CurrencySymbols.properties > - Bug6204853 should not be converted > - Copyright year for CompileProperties > - Redo translation for CS.properties > - Spot convert CurrencySymbols.properties > - ... and 6 more: https://git.openjdk.org/jdk/compare/4386d42d...f15b373a src/java.xml/share/classes/com/sun/org/apache/xml/internal/serializer/Encodings.properties line 22: > 20: # Peter Smolik > 21: Cp1250 WINDOWS-1250 0x00FF > 22: # Patch attributed to havardw at underdusken.no (H?vard Wigtil) This does not seem to have been a correct conversion. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12726#discussion_r2035582242 From mdoerr at openjdk.org Wed Apr 9 15:15:39 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 9 Apr 2025 15:15:39 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v16] In-Reply-To: References: Message-ID: On Mon, 7 Apr 2025 15:18:46 GMT, Daniel Lund?n wrote: >> If a method has a large number of parameters, we currently bail out from C2 compilation. >> >> ### Changeset >> >> Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes. >> >> Changes: >> - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed. >> - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable. >> - Remove all `can_represent` checks and bailouts. >> - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`. 
>> - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450) >> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. >> - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement. >> - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, no... > > Daniel Lund?n has updated the pull request incrementally with two additional commits since the last revision: > > - Update comments > - Revise TestNestedSynchronize to make use of CompileFramework The included tests have passed on PPC64. Thanks for the ping! ------------- PR Comment: https://git.openjdk.org/jdk/pull/20404#issuecomment-2790071311 From aph at openjdk.org Wed Apr 9 15:49:51 2025 From: aph at openjdk.org (Andrew Haley) Date: Wed, 9 Apr 2025 15:49:51 GMT Subject: RFR: 8353237: [AArch64] Incorrect result of VectorizedHashCode intrinsic on Cortex-A53 In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 13:26:06 GMT, Aleksei Voitylov wrote: > Maybe, but I'm not sure this is the right thing to enforce in the build system across the board either. For example, a cloud vendor with a big Arm fleet without A53s may find such a setting undesirable. I don't think there is any possibility that a big cloud vendor would notice. 
From what I see of GCC-generated code, added NOPs are very rare, because GCC tends to schedule loads long before uses. It's likely that there would be no difference to any GCC-generated code. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24489#issuecomment-2790170370 From kxu at openjdk.org Wed Apr 9 15:52:55 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Wed, 9 Apr 2025 15:52:55 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() In-Reply-To: References: Message-ID: On Fri, 4 Apr 2025 20:33:47 GMT, Kangcheng Xu wrote: > This PR refactors `PhaseIdealLoop::is_counted_loop()` into (mostly) `CountedLoopConverter::is_counted_loop()` and `CountedLoopConverter::convert()` to decouple the detection and conversion code. This enables us to try different loop configurations easily and finally convert once a counted loop is found. > > A nested `PhaseIdealLoop::CountedLoopConverter` class is created to handle the context, but I'm not sure if this is the best name or place for it. Please let me know what you think. > > Blocks [JDK-8336759](https://bugs.openjdk.org/browse/JDK-8336759). C2 changes ------------- PR Comment: https://git.openjdk.org/jdk/pull/24458#issuecomment-2783724710 From kxu at openjdk.org Wed Apr 9 15:52:55 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Wed, 9 Apr 2025 15:52:55 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() Message-ID: This PR refactors `PhaseIdealLoop::is_counted_loop()` into (mostly) `CountedLoopConverter::is_counted_loop()` and `CountedLoopConverter::convert()` to decouple the detection and conversion code. This enables us to try different loop configurations easily and finally convert once a counted loop is found. A nested `PhaseIdealLoop::CountedLoopConverter` class is created to handle the context, but I'm not sure if this is the best name or place for it. Please let me know what you think. Blocks [JDK-8336759](https://bugs.openjdk.org/browse/JDK-8336759).
------------- Commit messages: - line break - remove TODOs - Revert "improve formatting, naming, comments" - improve formatting, naming, comments - Merge branch 'master' into counted-loop-refactor - improve formatting, naming, comments - rearrange code for a more sensible diff - clean up code - extract stress_long_counted_loop() - remove debug lines - ... and 5 more: https://git.openjdk.org/jdk/compare/9a391f44...3db745d2 Changes: https://git.openjdk.org/jdk/pull/24458/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24458&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8353290 Stats: 544 lines in 3 files changed: 179 ins; 86 del; 279 mod Patch: https://git.openjdk.org/jdk/pull/24458.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24458/head:pull/24458 PR: https://git.openjdk.org/jdk/pull/24458 From cslucas at openjdk.org Wed Apr 9 16:32:36 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 9 Apr 2025 16:32:36 GMT Subject: RFR: 8334046: Set different values for CompLevel_any and CompLevel_all [v2] In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 03:43:03 GMT, Cesar Soares Lucas wrote: >> Please review this trivial patch to set different values for CompLevel_any and CompLevel_all. >> Setting different values for these fields make the implementation of [this other issue](https://bugs.openjdk.org/browse/JDK-8313713) much cleaner/easier. >> Tested on OSX/Linux Aarch64/x86_64 with JTREG. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Fix WhiteBox constants. Sorry about the mess and thank you for reverting. I'm not sure why I didn't catch the failure in my tests, I'll look into that. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24298#issuecomment-2790327705 From snazarki at openjdk.org Wed Apr 9 17:17:44 2025 From: snazarki at openjdk.org (Sergey Nazarkin) Date: Wed, 9 Apr 2025 17:17:44 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v16] In-Reply-To: References: Message-ID: On Mon, 7 Apr 2025 15:18:46 GMT, Daniel Lund?n wrote: >> If a method has a large number of parameters, we currently bail out from C2 compilation. >> >> ### Changeset >> >> Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes. >> >> Changes: >> - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed. >> - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable. >> - Remove all `can_represent` checks and bailouts. >> - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`. 
>> - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450) >> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. >> - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement. >> - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, no... > > Daniel Lund?n has updated the pull request incrementally with two additional commits since the last revision: > > - Update comments > - Revise TestNestedSynchronize to make use of CompileFramework The recommended tests have passed on ARM32. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20404#issuecomment-2790429096 From vlivanov at openjdk.org Wed Apr 9 17:58:43 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 9 Apr 2025 17:58:43 GMT Subject: RFR: 8353551: C2: Constant folding for ReverseBytes nodes [v2] In-Reply-To: References: <0Vve2dLoTSqwMKxx7LqaStM6f6_c6NadH-xc3OCSMMI=.c8dafa83-a661-4502-b170-fe7bdee19f2c@github.com> Message-ID: On Wed, 9 Apr 2025 09:34:28 GMT, Hannes Greule wrote: > And I assume that should be a TypeNode then? Why do you think it benefits from becoming a `TypeNode`? > In that case, as ReverseBytes*Nodes are InvolutionNodes, would InvolutionNode need to be a TypeNode? Or can I use multiple inheritance? 
We try to avoid multiple inheritance in JVM and C2 doesn't use any AFAIK. Actually, I had an afterthought about `InvolutionNode` after approving it. It looks a bit weird to model "involution" property through inheritance. (Primarily, because it's hard to mix multiple properties.) Node flags would be a better fit IMO. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24382#discussion_r2035858736 From vlivanov at openjdk.org Wed Apr 9 17:58:44 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 9 Apr 2025 17:58:44 GMT Subject: RFR: 8353551: C2: Constant folding for ReverseBytes nodes [v2] In-Reply-To: References: <0Vve2dLoTSqwMKxx7LqaStM6f6_c6NadH-xc3OCSMMI=.c8dafa83-a661-4502-b170-fe7bdee19f2c@github.com> Message-ID: On Wed, 9 Apr 2025 17:53:48 GMT, Vladimir Ivanov wrote: >> Okay, I see. That works for me. One problem though, there currently isn't a common `ReverseBytesNode` type, so I'd need to add that. And I assume that should be a `TypeNode` then? In that case, as `ReverseBytes*Node`s are `InvolutionNode`s, would `InvolutionNode` need to be a `TypeNode`? Or can I use multiple inheritance? (I didn't see any example of that in current Node types). Both are doable, and the first might even make sense, but I'm not sure if it's a bit much for this PR. >> >> Or do you want me to (temporarily) duplicate the Value code (i.e. move more code from the templated function to the Value functions, and only keep the simple `reverse_bytes` from your snippet)? >> >> Alternatively, I'd also be fine to put this PR on hold and make InvolutionNode a TypeNode first, or whatever you think is best. Please let me know what you think. > >> And I assume that should be a TypeNode then? > > Why do you think it benefits from becoming a `TypeNode`? > >> In that case, as ReverseBytes*Nodes are InvolutionNodes, would InvolutionNode need to be a TypeNode? Or can I use multiple inheritance? > > We try to avoid multiple inheritance in JVM and C2 doesn't use any AFAIK. 
> > Actually, I had an afterthought about `InvolutionNode` after approving it. It looks a bit weird to model the "involution" property through inheritance. (Primarily, because it's hard to mix multiple properties.) Node flags would be a better fit IMO. For now, I suggest just adding a superclass `ReverseBytesNode` which extends `InvolutionNode` and placing `Value` there. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24382#discussion_r2035862453 From qamai at openjdk.org Wed Apr 9 18:11:09 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 9 Apr 2025 18:11:09 GMT Subject: RFR: 8346836: C2: Verify CastII/CastLL bounds at runtime [v9] In-Reply-To: References: Message-ID: <7YA6L2iyqBhZPOUZ8Ps_DsawE9VofyXnEqmJfoxJ0Ng=.b3acc12e-c0fc-40c3-a0fe-f88a2cd382af@github.com> > Hi, > > This patch adds a develop flag `VerifyConstraintCasts`, which will verify the correctness of `CastIINode`s and `CastLLNode`s at runtime and crash the VM if the dynamic value lies outside the type value range. > > Please take a look, thanks a lot.
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: assert CastLL ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22880/files - new: https://git.openjdk.org/jdk/pull/22880/files/bc9262ff..45b45495 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22880&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22880&range=07-08 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22880.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22880/head:pull/22880 PR: https://git.openjdk.org/jdk/pull/22880 From qamai at openjdk.org Wed Apr 9 18:11:10 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 9 Apr 2025 18:11:10 GMT Subject: RFR: 8346836: C2: Verify CastII/CastLL bounds at runtime [v7] In-Reply-To: References: Message-ID: On Tue, 8 Apr 2025 21:27:33 GMT, Vladimir Ivanov wrote: >> I believe C2 stacks are 16-byte aligned so we should not need to manually do the alignment? > > Indeed. Still puzzled why stack traces are omitted. Do you have a reproducer to try? You can just run `compiler/arraycopy/TestArrayCopyConjoint.java` with `-XX:+StressGCM -XX:VerifyConstraintCasts=2`. >> Thanks for the suggestions, I have changed this to `verify_int_in_range`. Expanding the `Type` before passing into the function would be really bad after #17508, though. > >> Expanding the Type before passing into the function would be really bad after https://github.com/openjdk/jdk/pull/17508, though. > > Can you elaborate, please? Are you saying that after #17508 there'll be more invariants to verify, so better to pass all the data encapsulated in a `Type`? Yes that's what I mean, expanding all of them to feed into this method seems inelegant. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22880#discussion_r2035872375 PR Review Comment: https://git.openjdk.org/jdk/pull/22880#discussion_r2035873092 From qamai at openjdk.org Wed Apr 9 18:11:10 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 9 Apr 2025 18:11:10 GMT Subject: RFR: 8346836: C2: Verify CastII/CastLL bounds at runtime [v8] In-Reply-To: References: Message-ID: On Tue, 8 Apr 2025 21:37:40 GMT, Vladimir Ivanov wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> reviews > > Speaking of bug synopsis, can you make it a bit more concrete and succinct? > > How about "C2: Verify CastII/CastLL bounds at runtime"? @iwanowww Thanks a lot for the reviews, I have updated according to your suggestions. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22880#issuecomment-2790541635 From dhanalla at openjdk.org Wed Apr 9 18:17:58 2025 From: dhanalla at openjdk.org (Dhamoder Nalla) Date: Wed, 9 Apr 2025 18:17:58 GMT Subject: RFR: 8315916: assert(C->live_nodes() <= C->max_node_limit()) failed: Live Node limit exceeded [v9] In-Reply-To: References: Message-ID: <__DGCaYr00WHOOcj95KQL-ciFXBnEM2RA15XFwhObYo=.231ad51f-96f4-4945-a62b-5b2eed90f2b5@github.com> On Wed, 9 Apr 2025 05:39:51 GMT, Christian Hagedorn wrote: >> Dhamoder Nalla has updated the pull request incrementally with one additional commit since the last revision: >> >> Modify jtreg test > > test/hotspot/jtreg/compiler/escapeAnalysis/TestScalarizeBailout.java line 28: > >> 26: * @bug 8315916 >> 27: * @summary Test early bailout during the creation of graph nodes for the scalarization of array fields, rather than during code generation. >> 28: * @run main/othervm/timeout=240000 > > That's a huge timeout. How long does this test need on your machine? If too long, can you also trigger the issue with a smaller `EliminateAllocationArraySizeLimit` than currently used? 
updated the EliminateAllocationArraySizeLimit to 32k by reducing the array size. > test/hotspot/jtreg/compiler/escapeAnalysis/TestScalarizeBailout.java line 29: > >> 27: * @summary Test early bailout during the creation of graph nodes for the scalarization of array fields, rather than during code generation. >> 28: * @run main/othervm/timeout=240000 >> 29: * -Xcomp > > Can you limit the performed compilations to a single or few methods with `compileonly` to trigger the assert? This will reduce the required time to run this test with `Xcomp`. added compileonly options ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20504#discussion_r2035885049 PR Review Comment: https://git.openjdk.org/jdk/pull/20504#discussion_r2035886103 From dhanalla at openjdk.org Wed Apr 9 18:17:57 2025 From: dhanalla at openjdk.org (Dhamoder Nalla) Date: Wed, 9 Apr 2025 18:17:57 GMT Subject: RFR: 8315916: assert(C->live_nodes() <= C->max_node_limit()) failed: Live Node limit exceeded [v10] In-Reply-To: References: Message-ID: > In the debug build, the assert is triggered during the parsing (before Code_Gen). In the Release build, however, the compilation bails out at `Compile::check_node_count()` during the code generation phase and completes execution without any issues. > > When I commented out the assert(C->live_nodes() <= C->max_node_limit()), both the debug and release builds exhibited the same behavior: the compilation bails out during code_gen after building the ideal graph with more than 80K nodes. > > The proposed fix will check the live node count and bail out during compilation while building the graph for scalarization of the elements in the array when the live node count crosses the limit of 80K, instead of unnecessarily building the entire graph and bailing out in code_gen. 
Dhamoder Nalla has updated the pull request incrementally with two additional commits since the last revision: - reduce array/node size limts and remove the timeout - reduce array/node size limts and remove the timeout ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20504/files - new: https://git.openjdk.org/jdk/pull/20504/files/8cb1e939..a8cb47d6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20504&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20504&range=08-09 Stats: 13 lines in 2 files changed: 3 ins; 1 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/20504.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20504/head:pull/20504 PR: https://git.openjdk.org/jdk/pull/20504 From vlivanov at openjdk.org Wed Apr 9 18:35:34 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 9 Apr 2025 18:35:34 GMT Subject: RFR: 8346836: C2: Verify CastII/CastLL bounds at runtime [v7] In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 18:04:10 GMT, Quan Anh Mai wrote: >>> Expanding the Type before passing into the function would be really bad after https://github.com/openjdk/jdk/pull/17508, though. >> >> Can you elaborate, please? Are you saying that after #17508 there'll be more invariants to verify, so better to pass all the data encapsulated in a `Type`? > > Yes that's what I mean, expanding all of them to feed into this method seems inelegant. Ok, it makes perfect sense. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22880#discussion_r2035912582 From vlivanov at openjdk.org Wed Apr 9 19:28:27 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 9 Apr 2025 19:28:27 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v3] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 02:58:17 GMT, Jatin Bhateja wrote: >> - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel®
products and will be the vector ISA of choice moving into the future. >> - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. Feature set supported by an AVX10 ISA version will be supported by all the versions above it. >> - The initial, fully-featured version of Intel® AVX10 will be enumerated as Version 2 (denoted as Intel® AVX10.2). This will include the new ISA extension over the existing AVX512 instructions. >> - An early version of Intel® AVX10 (Version 1, or Intel® AVX10.1) that only enumerates the Intel® AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling. >> >> This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers. >> >> The patch has been regressed through tier1 and jvmci tests >> >> Please review and share your feedback. >> >> Best Regards, >> Jatin >> >> [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html > > Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > 8352675: Support Intel AVX10 converged vector ISA feature detection src/hotspot/cpu/x86/vm_version_x86.cpp line 895: > 893: _features = _cpuid_info.feature_flags(); // These can be changed by VM settings > 894: _extra_features = _cpuid_info.extra_feature_flags(); // These can be changed by VM settings > 895: _cpu_features = _features; // Preserve features `_cpu_features` now captures only part of CPU features JVM cares about.
src/hotspot/cpu/x86/vm_version_x86.cpp line 1117: > 1115: cpu_family(), _model, _stepping, os::cpu_microcode_revision()); > 1116: assert(res > 0, "not enough temporary space allocated"); > 1117: insert_features_names(_features, buf + res, sizeof(buf) - res, _features_names); x86 is the only platform which uses `insert_features_names`. Other platforms rely on macros. Maybe it's time to do the same on x86? src/hotspot/share/runtime/abstract_vm_version.hpp line 61: > 59: // Extra CPU feature flags used when all 64 bits of _features are exhausted for > 60: // on a given target, currently only used for x86_64, can be affected by VM settings. > 61: static uint64_t _extra_features; That's unfortunate. Maybe it's time to turn `_features` into a fixed size (platform-specific) bitmap instead? (`RegMask` is one existing example.) Having 2 independent fields is error-prone (look at `_cpu_features`). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2036008552 PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2035996215 PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2036007023 From dlong at openjdk.org Wed Apr 9 20:09:34 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 9 Apr 2025 20:09:34 GMT Subject: RFR: 8353041: NeverBranchNode causes incorrect block frequency calculation [v2] In-Reply-To: References: Message-ID: On Tue, 8 Apr 2025 22:51:35 GMT, Dean Long wrote: >> This fixes a quality of implementation issue for infinite loops using a NeverBranch node. We need Block::succ_prob() to return 1.0f for the 100% successful back-edge so that block frequencies are computed correctly. I also fixed Block_Stack::most_frequent_successor() to choose the correct successor. I verified that this corrects the huge frequency ratio that was detected and clamped by JDK-8346888. >> >> Currently this bug is labeled noreg-hard with no new regression test, as it's not obvious how to write such as test. 
> > Dean Long has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/opto/domgraph.cpp > > Co-authored-by: Roberto Castañeda Lozano Thanks Roberto and Tobias. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24390#issuecomment-2790875974 From dlong at openjdk.org Wed Apr 9 20:09:35 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 9 Apr 2025 20:09:35 GMT Subject: Integrated: 8353041: NeverBranchNode causes incorrect block frequency calculation In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 19:51:37 GMT, Dean Long wrote: > This fixes a quality of implementation issue for infinite loops using a NeverBranch node. We need Block::succ_prob() to return 1.0f for the 100% successful back-edge so that block frequencies are computed correctly. I also fixed Block_Stack::most_frequent_successor() to choose the correct successor. I verified that this corrects the huge frequency ratio that was detected and clamped by JDK-8346888. > > Currently this bug is labeled noreg-hard with no new regression test, as it's not obvious how to write such a test. This pull request has now been integrated. Changeset: 776e1cf1 Author: Dean Long URL: https://git.openjdk.org/jdk/commit/776e1cf1dfefd7cb1a0190ab71f71ad5ff25d0e4 Stats: 20 lines in 2 files changed: 18 ins; 1 del; 1 mod 8353041: NeverBranchNode causes incorrect block frequency calculation Reviewed-by: thartmann, rcastanedalo ------------- PR: https://git.openjdk.org/jdk/pull/24390 From cushon at openjdk.org Wed Apr 9 20:23:35 2025 From: cushon at openjdk.org (Liam Miller-Cushon) Date: Wed, 9 Apr 2025 20:23:35 GMT Subject: RFR: 8350563: C2 compilation fails because PhaseCCP does not reach a fixpoint [v10] In-Reply-To: References: Message-ID: On Tue, 8 Apr 2025 15:23:06 GMT, Liam Miller-Cushon wrote: >> Hello, please consider this fix for [JDK-8350563](https://bugs.openjdk.org/browse/JDK-8350563) contributed by my colleague Matthias Ernst.
>>
>> https://github.com/openjdk/jdk/pull/22856 introduced a new `Value()` optimization for the pattern `AndIL(Con, Mask)`.
>> This optimization can look through CastNodes, and therefore requires additional logic in CCP to push these
>> transitive uses to the worklist.
>>
>> The optimization is closely related to analogous optimizations for SHIFT nodes, and we also extend the existing logic for
>> CCP worklist handling: the current logic is "if the shift input to a SHIFT node changes, push indirect AND node uses to the CCP worklist".
>> We extend it by adding "if the (new) type of a node is an IntegerType that `is_con, ...` to the predicate.
>
> Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision:
>
>   Update test/hotspot/jtreg/compiler/ccp/TestAndConZeroCCP.java
>
>   Co-authored-by: Christian Hagedorn

From Matthias: Thank you all! Will integrate this now. I hope this addresses all remaining issues with the optimization.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23871#issuecomment-2790908734

From cushon at openjdk.org Wed Apr 9 20:23:36 2025
From: cushon at openjdk.org (Liam Miller-Cushon)
Date: Wed, 9 Apr 2025 20:23:36 GMT
Subject: Integrated: 8350563: C2 compilation fails because PhaseCCP does not reach a fixpoint
In-Reply-To:
References:
Message-ID:

On Mon, 3 Mar 2025 18:45:52 GMT, Liam Miller-Cushon wrote:

> Hello, please consider this fix for [JDK-8350563](https://bugs.openjdk.org/browse/JDK-8350563) contributed by my colleague Matthias Ernst.
>
> https://github.com/openjdk/jdk/pull/22856 introduced a new `Value()` optimization for the pattern `AndIL(Con, Mask)`.
> This optimization can look through CastNodes, and therefore requires additional logic in CCP to push these
> transitive uses to the worklist.
> > The optimization is closely related to analogous optimizations for SHIFT nodes, and we also extend the existing logic for > CCP worklist handling: the current logic is "if the shift input to a SHIFT node changes, push indirect AND node uses to the CCP worklist". > We extend it by adding "if the (new) type of a node is an IntegerType that `is_con, ...` to the predicate. This pull request has now been integrated. Changeset: 4954a336 Author: Liam Miller-Cushon URL: https://git.openjdk.org/jdk/commit/4954a336f88865a4c9b269ed2c152658275e9221 Stats: 77 lines in 3 files changed: 70 ins; 1 del; 6 mod 8350563: C2 compilation fails because PhaseCCP does not reach a fixpoint Co-authored-by: Matthias Ernst Reviewed-by: chagedorn, epeter ------------- PR: https://git.openjdk.org/jdk/pull/23871 From jlu at openjdk.org Wed Apr 9 21:28:41 2025 From: jlu at openjdk.org (Justin Lu) Date: Wed, 9 Apr 2025 21:28:41 GMT Subject: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v6] In-Reply-To: <_YOUyzMbSEXFduCKVgyis37kwTlGSjBbP8VlFu3xQpU=.9b668e2a-8f91-476d-8914-13dc33a0b9e5@github.com> References: <0MB7FLFNfaGEWssr9X54UJ_iZNFWBJkxQ1yusP7fsuY=.3f9f3de5-fe84-48e6-9449-626cac42da0b@github.com> <_YOUyzMbSEXFduCKVgyis37kwTlGSjBbP8VlFu3xQpU=.9b668e2a-8f91-476d-8914-13dc33a0b9e5@github.com> Message-ID: On Wed, 9 Apr 2025 15:06:32 GMT, Magnus Ihse Bursie wrote: >> Justin Lu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits: >> >> - Convert the merged master changes to UTF-8 >> - Merge master and fix conflicts >> - Close streams when finished loading into props >> - Adjust CF test to read in with UTF-8 to fix failing test >> - Reconvert CS.properties to UTF-8 >> - Revert all changes to CurrencySymbols.properties >> - Bug6204853 should not be converted >> - Copyright year for CompileProperties >> - Redo translation for CS.properties >> - Spot convert CurrencySymbols.properties >> - ... 
and 6 more: https://git.openjdk.org/jdk/compare/4386d42d...f15b373a > > src/java.xml/share/classes/com/sun/org/apache/xml/internal/serializer/Encodings.properties line 22: > >> 20: # Peter Smolik >> 21: Cp1250 WINDOWS-1250 0x00FF >> 22: # Patch attributed to havardw at underdusken.no (H?vard Wigtil) > > This does not seem to have been a correct conversion. Right, that `?` looks to have been incorrectly converted during the ISO-8859-1 to UTF-8 conversion. (I can't find the script used for conversion as this change is from some time ago.) Since the change occurs in a comment (thankfully), it should be harmless and the next upstream update of this file would overwrite this incorrect change. However, this file does not seem to be updated that often, so I can also file an issue to correct this if you would prefer that. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12726#discussion_r2036165417 From mdoerr at openjdk.org Wed Apr 9 21:31:25 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 9 Apr 2025 21:31:25 GMT Subject: RFR: 8353500: [s390x] Intrinsify Unsafe::setMemory [v2] In-Reply-To: References: Message-ID: <3IOfTTpkYSmmQcYRagn4f5uLP4wU8ArZRsGmpHdnV2I=.d9c57e7b-5275-435a-abdb-2a7a28734fec@github.com> On Wed, 9 Apr 2025 08:57:40 GMT, Amit Kumar wrote: >> Unsafe::setMemory intrinsic implementation for s390x. 
>>
>> Stub Code:
>>
>>
>> StubRoutines::unsafe_setmemory [0x000003ffb04b63c0, 0x000003ffb04b64d0] (272 bytes)
>> --------------------------------------------------------------------------------
>> 0x000003ffb04b63c0: ogrk %r1,%r2,%r3
>> 0x000003ffb04b63c4: nill %r1,7
>> 0x000003ffb04b63c8: je 0x000003ffb04b6410
>> 0x000003ffb04b63cc: nill %r1,3
>> 0x000003ffb04b63d0: je 0x000003ffb04b6460
>> 0x000003ffb04b63d4: nill %r1,1
>> 0x000003ffb04b63d8: jlh 0x000003ffb04b64a0
>> 0x000003ffb04b63dc: risbg %r4,%r4,48,55,8
>> 0x000003ffb04b63e2: risbgz %r1,%r3,32,63,62
>> 0x000003ffb04b63e8: je 0x000003ffb04b6402
>> 0x000003ffb04b63ec: nopr
>> 0x000003ffb04b63ee: nopr
>> 0x000003ffb04b63f0: sth %r4,0(%r2)
>> 0x000003ffb04b63f4: sth %r4,2(%r2)
>> 0x000003ffb04b63f8: agfi %r2,4
>> 0x000003ffb04b63fe: brct %r1,0x000003ffb04b63f0
>> 0x000003ffb04b6402: nilf %r3,2
>> 0x000003ffb04b6408: ber %r14
>> 0x000003ffb04b640a: sth %r4,0(%r2)
>> 0x000003ffb04b640e: br %r14
>> 0x000003ffb04b6410: risbg %r4,%r4,48,55,8
>> 0x000003ffb04b6416: risbg %r4,%r4,32,47,16
>> 0x000003ffb04b641c: risbg %r4,%r4,0,31,32
>> 0x000003ffb04b6422: risbgz %r1,%r3,32,63,60
>> 0x000003ffb04b6428: je 0x000003ffb04b6446
>> 0x000003ffb04b642c: nopr
>> 0x000003ffb04b642e: nopr
>> 0x000003ffb04b6430: stg %r4,0(%r2)
>> 0x000003ffb04b6436: stg %r4,8(%r2)
>> 0x000003ffb04b643c: agfi %r2,16
>> 0x000003ffb04b6442: brct %r1,0x000003ffb04b6430
>> 0x000003ffb04b6446: nilf %r3,8
>> 0x000003ffb04b644c: ber %r14
>> 0x000003ffb04b644e: stg %r4,0(%r2)
>> 0x000003ffb04b6454: br %r14
>> 0x000003ffb04b6456: nopr
>> 0x000003ffb04b6458: nopr
>> 0x000003ffb04b645a: nopr
>> 0x000003ffb04b645c: nopr
>> 0x000003ffb04b645e: nopr
>> 0x000003ffb04b6460: risbg %r4,%r4,48,55,8
>> 0x000003ffb04b6466: risbg %r4,%r4,32,47,16
>> 0x000003ffb04b646c: risbgz %r1,%r3,32,63,61
>> 0x000003ffb04b6472: je 0x000003ffb04b6492
>> 0x000003ffb04b6476: nopr
>> 0x000003ffb04b6478: nopr
>> 0x000003ffb04b647a: nopr
>> 0x000003ffb04b647c: nopr
>> 0x000003ffb04b647e: nopr
>> 0x000003ffb04b6480: st %r4,0(%r2)
>> 0x000003ffb04b6484: st %r4,4(%r2)
>> 0x000003ffb04b6488: agfi %r2,8
>> 0x000003ffb04b648e: brct %r1,0x000003ffb04b6480
>> 0x000003ffb04b6492: nilf %r3,4
>> 0x000003ffb04b6498: ber %r14
>> 0x000003ffb04b649a: st %r4,0(%r2)
>> 0x0000...
>
> Amit Kumar has updated the pull request incrementally with four additional commits since the last revision:
>
> - reviews for Martin
> - Revert "minor improvement"
>
>   This reverts commit a6af6da26d1e0590dc24486131d1bc752e047f98.
> - minor improvement
> - reviews from Lutz and Martin

This looks good to me. I suggest measuring performance with the latest version.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24480#issuecomment-2791036141

From sviswanathan at openjdk.org Wed Apr 9 23:31:37 2025
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Wed, 9 Apr 2025 23:31:37 GMT
Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same
In-Reply-To:
References:
Message-ID: <679T11zTC_n620-mbQBrge_YYoGCWhWu8E-8c-ABZxg=.1e347d21-7347-4114-82f7-ca76abacc71d@github.com>

On Fri, 4 Apr 2025 01:15:36 GMT, Srinivas Vamsi Parasa wrote:

> The current scheme for Intel APX NDD code generation favors the emission of NDD instruction on APX-enabled targets, even if destination and source registers are the same. To prevent this, this PR extends the assembler layer to demote EEVEX to REX encoding if dst matches with source operands.

src/hotspot/cpu/x86/assembler_x86.cpp line 13770:

> 13768: InstructionAttr *attributes, bool no_flags, bool use_prefixq) {
> 13769: // Demote RegRegReg instructions
> 13770: if (!no_flags && dst_enc == nds_enc) {

This could be replaced by call to is_demotable().

src/hotspot/cpu/x86/assembler_x86.cpp line 13771:

> 13769: // Demote RegRegReg instructions
> 13770: if (!no_flags && dst_enc == nds_enc) {
> 13771: return use_prefixq? 
prefixq_and_encode(dst_enc, src_enc) : prefix_and_encode(dst_enc, src_enc); Nit pick, need space before ? as below: use_prefixq ? prefixq_and_encode src/hotspot/cpu/x86/assembler_x86.cpp line 13780: > 13778: InstructionAttr *attributes, bool no_flags, bool use_prefixq) { > 13779: // Demote RegReg and RegRegImm instructions > 13780: if (!no_flags && dst_enc == nds_enc) { This could be replaced by call to is_demotable(). src/hotspot/cpu/x86/assembler_x86.cpp line 13818: > 13816: } > 13817: > 13818: bool Assembler::is_demotable(bool no_flags, int dst_enc, int nds_enc, int src_enc) { src_enc is not being used in this method so could be removed. src/hotspot/cpu/x86/assembler_x86.cpp line 17185: > 17183: InstructionAttr attributes(AVX_128bit, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); > 17184: // Encoding Format : eevex_prefix (4 bytes) | opcode_cc | modrm > 17185: int encode = evex_prefix_and_encode_ndd(0, 0, dst->encoding(), VEX_SIMD_F2, /* MAP4 */VEX_OPCODE_0F_3C, &attributes); //TODO: check this This should not be demoted. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2036263931 PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2036260211 PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2036264622 PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2036257623 PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2036258489 From duke at openjdk.org Wed Apr 9 23:57:51 2025 From: duke at openjdk.org (Mohamed Issa) Date: Wed, 9 Apr 2025 23:57:51 GMT Subject: RFR: 8348638: Performance regression in Math.tanh [v3] In-Reply-To: References: Message-ID: > The changes described below are meant to resolve the performance regression introduced by the **x86_64 tanh** double precision floating point scalar intrinsic in #20657. 
Additionally, a new micro-benchmark is included to check the performance of specific input value ranges to help prevent regressions in the future.
>
> 1. Check and handle high magnitude input values before those in other ranges. If found, **+/- 1** is returned almost immediately without having to go through too many computations or branches.
> 2. Reduce the lower bound of the input range that triggers a quick **+/- 1** return from **|x| >= 32** to **|x| >= 22**. This new endpoint is the exact value required for correctness that's used by the original OpenJDK implementation.
>
> The results of all tests posted below were captured with an [Intel® Xeon 6761P](https://www.intel.com/content/www/us/en/products/sku/241842/intel-xeon-6761p-processor-336m-cache-2-50-ghz/specifications.html) using [OpenJDK v25-b15](https://github.com/openjdk/jdk/releases/tag/jdk-25%2B15) as the baseline version.
>
> For the first set of performance data collected with the new built-in **tanhRange** micro-benchmark, see the table below. Each result is the mean of 8 individual runs, and the input ranges used match those in the bug report with two additional ones included. In the high value scenarios (100, 1000, 10000, 100000), the changes significantly increase throughput values over the baseline. Also, there is almost no impact to the low value (1, 2, 10, 20) scenarios.
>
> | Input range(s) | Baseline (ops/s) | Change (ops/s) | Change vs baseline (%) |
> | :-------------------: | :----------------: | :----------------: | :------------------------: |
> | [-1, 1] | 26.043 | 25.929 | -0.44 |
> | [-2, 2] | 25.330 | 25.260 | -0.28 |
> | [-10, 10] | 24.930 | 24.936 | +0.02 |
> | [-20, 20] | 24.908 | 24.844 | -0.26 |
> | [-100, 100] | 53.813 | 76.650 | +42.44 |
> | [-1000, 1000] | 84.459 | 115.106 | +36.29 |
> | [-10000, 10000] | 93.980 | 123.320 | +31.22 |
> | [-100000, 1000...
Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision:

  Add new tanh micro-benchmark that covers different ranges of input values

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/23889/files
  - new: https://git.openjdk.org/jdk/pull/23889/files/e563fd73..4a9ad41a

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=23889&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23889&range=01-02

  Stats: 3422 lines in 2 files changed: 60 ins; 2897 del; 465 mod
  Patch: https://git.openjdk.org/jdk/pull/23889.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23889/head:pull/23889

PR: https://git.openjdk.org/jdk/pull/23889

From duke at openjdk.org Thu Apr 10 00:02:50 2025
From: duke at openjdk.org (Mohamed Issa)
Date: Thu, 10 Apr 2025 00:02:50 GMT
Subject: RFR: 8348638: Performance regression in Math.tanh [v4]
In-Reply-To:
References:
Message-ID:

> The changes described below are meant to resolve the performance regression introduced by the **x86_64 tanh** double precision floating point scalar intrinsic in #20657. Additionally, a new micro-benchmark is included to check the performance of specific input value ranges to help prevent regressions in the future.
>
> 1. Check and handle high magnitude input values before those in other ranges. If found, **+/- 1** is returned almost immediately without having to go through too many computations or branches.
> 2. Reduce the lower bound of the input range that triggers a quick **+/- 1** return from **|x| >= 32** to **|x| >= 22**. This new endpoint is the exact value required for correctness that's used by the original OpenJDK implementation.
>
> The results of all tests posted below were captured with an [Intel® Xeon 6761P](https://www.intel.com/content/www/us/en/products/sku/241842/intel-xeon-6761p-processor-336m-cache-2-50-ghz/specifications.html) using [OpenJDK v25-b15](https://github.com/openjdk/jdk/releases/tag/jdk-25%2B15) as the baseline version.
>
> For the first set of performance data collected with the new built-in **tanhRange** micro-benchmark, see the table below. Each result is the mean of 8 individual runs, and the input ranges used match those in the bug report with two additional ones included. In the high value scenarios (100, 1000, 10000, 100000), the changes significantly increase throughput values over the baseline. Also, there is almost no impact to the low value (1, 2, 10, 20) scenarios.
>
> | Input range(s) | Baseline (ops/s) | Change (ops/s) | Change vs baseline (%) |
> | :-------------------: | :----------------: | :----------------: | :------------------------: |
> | [-1, 1] | 26.043 | 25.929 | -0.44 |
> | [-2, 2] | 25.330 | 25.260 | -0.28 |
> | [-10, 10] | 24.930 | 24.936 | +0.02 |
> | [-20, 20] | 24.908 | 24.844 | -0.26 |
> | [-100, 100] | 53.813 | 76.650 | +42.44 |
> | [-1000, 1000] | 84.459 | 115.106 | +36.29 |
> | [-10000, 10000] | 93.980 | 123.320 | +31.22 |
> | [-100000, 1000...

Mohamed Issa has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR.
The pull request contains one new commit since the last revision:

  Add new tanh micro-benchmark that covers different ranges of input values

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/23889/files
  - new: https://git.openjdk.org/jdk/pull/23889/files/4a9ad41a..4fd92c0e

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=23889&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23889&range=02-03

  Stats: 3434 lines in 2 files changed: 2871 ins; 1 del; 562 mod
  Patch: https://git.openjdk.org/jdk/pull/23889.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23889/head:pull/23889

PR: https://git.openjdk.org/jdk/pull/23889

From duke at openjdk.org Thu Apr 10 00:12:07 2025
From: duke at openjdk.org (Mohamed Issa)
Date: Thu, 10 Apr 2025 00:12:07 GMT
Subject: RFR: 8348638: Performance regression in Math.tanh [v5]
In-Reply-To:
References:
Message-ID: <1bOfSMpeLQHGXAGf_vLHc3GVFz2TlEiPPbXzkeOU0N0=.f82dcfb1-055f-40d5-b043-1dab17a08d0a@github.com>

> The changes described below are meant to resolve the performance regression introduced by the **x86_64 tanh** double precision floating point scalar intrinsic in #20657. Additionally, a new micro-benchmark is included to check the performance of specific input value ranges to help prevent regressions in the future.
>
> 1. Check and handle high magnitude input values before those in other ranges. If found, **+/- 1** is returned almost immediately without having to go through too many computations or branches.
> 2. Reduce the lower bound of the input range that triggers a quick **+/- 1** return from **|x| >= 32** to **|x| >= 22**. This new endpoint is the exact value required for correctness that's used by the original OpenJDK implementation.
>
> The results of all tests posted below were captured with an [Intel® Xeon 6761P](https://www.intel.com/content/www/us/en/products/sku/241842/intel-xeon-6761p-processor-336m-cache-2-50-ghz/specifications.html) using [OpenJDK v25-b15](https://github.com/openjdk/jdk/releases/tag/jdk-25%2B15) as the baseline version.
>
> For the first set of performance data collected with the new built-in **tanhRange** micro-benchmark, see the table below. Each result is the mean of 8 individual runs, and the input ranges used match those in the bug report with two additional ones included. In the high value scenarios (100, 1000, 10000, 100000), the changes significantly increase throughput values over the baseline. Also, there is almost no impact to the low value (1, 2, 10, 20) scenarios.
>
> | Input range(s) | Baseline (ops/s) | Change (ops/s) | Change vs baseline (%) |
> | :-------------------: | :----------------: | :----------------: | :------------------------: |
> | [-1, 1] | 26.043 | 25.929 | -0.44 |
> | [-2, 2] | 25.330 | 25.260 | -0.28 |
> | [-10, 10] | 24.930 | 24.936 | +0.02 |
> | [-20, 20] | 24.908 | 24.844 | -0.26 |
> | [-100, 100] | 53.813 | 76.650 | +42.44 |
> | [-1000, 1000] | 84.459 | 115.106 | +36.29 |
> | [-10000, 10000] | 93.980 | 123.320 | +31.22 |
> | [-100000, 1000...

Mohamed Issa has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR.
The pull request contains one new commit since the last revision: Add new tanh micro-benchmark that covers different ranges of input values ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23889/files - new: https://git.openjdk.org/jdk/pull/23889/files/4fd92c0e..ced66426 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23889&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23889&range=03-04 Stats: 3398 lines in 1 file changed: 0 ins; 3 del; 3395 mod Patch: https://git.openjdk.org/jdk/pull/23889.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23889/head:pull/23889 PR: https://git.openjdk.org/jdk/pull/23889 From vlivanov at openjdk.org Thu Apr 10 00:40:32 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 10 Apr 2025 00:40:32 GMT Subject: RFR: 8346836: C2: Verify CastII/CastLL bounds at runtime [v9] In-Reply-To: <7YA6L2iyqBhZPOUZ8Ps_DsawE9VofyXnEqmJfoxJ0Ng=.b3acc12e-c0fc-40c3-a0fe-f88a2cd382af@github.com> References: <7YA6L2iyqBhZPOUZ8Ps_DsawE9VofyXnEqmJfoxJ0Ng=.b3acc12e-c0fc-40c3-a0fe-f88a2cd382af@github.com> Message-ID: On Wed, 9 Apr 2025 18:11:09 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds a develop flag `VerifyConstraintCasts`, which will verify the correctness of `CastIINode`s and `CastLLNode`s at runtime and crash the VM if the dynamic value lies outside the type value range. >> >> Please take a look, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > assert CastLL FTR here's AArch64 support: https://github.com/openjdk/jdk/commit/7ed34d09560d9db79183a80df379f3003f79bb1b Feel free to incorporate it in this PR or I'll upstream it separately.. 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/22880#issuecomment-2791267118

From vlivanov at openjdk.org Thu Apr 10 00:57:39 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Thu, 10 Apr 2025 00:57:39 GMT
Subject: RFR: 8346836: C2: Verify CastII/CastLL bounds at runtime [v7]
In-Reply-To:
References:
Message-ID:

On Wed, 9 Apr 2025 18:03:36 GMT, Quan Anh Mai wrote:

>> Indeed. Still puzzled why stack traces are omitted. Do you have a reproducer to try?
>
> You can just run `compiler/arraycopy/TestArrayCopyConjoint.java` with `-XX:+StressGCM -XX:VerifyConstraintCasts=2`.

Mystery solved: it is the consequence of the fact that JVM doesn't preserve frame pointer in generated code. With `-XX:+PreserveFramePointer` everything works fine.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/22880#discussion_r2036336004

From fyang at openjdk.org Thu Apr 10 01:29:35 2025
From: fyang at openjdk.org (Fei Yang)
Date: Thu, 10 Apr 2025 01:29:35 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v16]
In-Reply-To:
References:
Message-ID: <9B0TcQ921YuBjvy7tQL0kN0Yyi90W0NWu8erV3itIXw=.bbf0953f-ebf9-4ddc-b3c1-a6ba5e00e214@github.com>

On Wed, 9 Apr 2025 11:32:16 GMT, Daniel Lundén wrote:

>> Daniel Lundén has updated the pull request incrementally with two additional commits since the last revision:
>>
>> - Update comments
>> - Revise TestNestedSynchronize to make use of CompileFramework
>
> This is a rather intrusive changeset that may also affect ports not supported by Oracle. @offamitkumar @TheRealMDoerr @snazarkin @bulasevich @RealFYang: Can you please test these changes on your respective ports? In particular, please make sure to run the tests `compiler/arguments/TestMaxMethodArguments.java` and `compiler/locks/TestNestedSynchronize.java`.

@dlunde : Thanks for the ping! The included tests passed on linux-riscv64 platform as well.
------------- PR Comment: https://git.openjdk.org/jdk/pull/20404#issuecomment-2791318426 From qxing at openjdk.org Thu Apr 10 02:25:28 2025 From: qxing at openjdk.org (Qizheng Xing) Date: Thu, 10 Apr 2025 02:25:28 GMT Subject: RFR: 8347499: C2: Make `PhaseIdealLoop` eliminate more redundant safepoints in loops [v2] In-Reply-To: References: Message-ID: <39qHmgtsAG41nnCfeuBDU0s7khnIjAXZyDILN-E1tq4=.6aa846ea-71cf-4347-a88a-53e7480e69e0@github.com> On Wed, 2 Apr 2025 07:19:21 GMT, Emanuel Peter wrote: > Wow, that sounds like we do not safepoint for half a second in that case. That could be a bug. Could you please tell me what test it is, and how you ran it? We may want to file a bug and investigate it. @eme64 I try to run `jdk_foreign` tests with commit 45b7c748737f38c33c4666d17101b051b2fbe2ae on x86-64, and the following `make` command always results in a safepoint timeout in `TestStringEncodingJumbo.java`: make test CONF=release TEST="jtreg:test/jdk:jdk_foreign" JTREG="JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:+SafepointTimeout -XX:+AbortVMOnSafepointTimeout -XX:SafepointTimeoutDelay=500" > And what about late inlining? Does that not happen after loop opts? Maybe we insert new SafePoints when inlining, I simply don't know enough about that. It seems that late inlining after loop opts only performs strength reduction, i.e. converts those dynamic calls to direct calls, and does not incrementally inline callers into callees. Note that `inlining_incrementally` should always be false when calling `process_late_inline_calls_no_inline`, so no inlining is allowed: https://github.com/openjdk/jdk/blob/45b7c748737f38c33c4666d17101b051b2fbe2ae/src/hotspot/share/opto/compile.cpp#L2215-L2220 > Would you mind improving the documentation comments, so that they are easier to understand? Maybe you can even add more comments around your code change, to "prove" why it is ok to do what we would do with your change? I'll do that later, thanks for your suggestions! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23057#issuecomment-2791386007 From jbhateja at openjdk.org Thu Apr 10 02:33:31 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 10 Apr 2025 02:33:31 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v3] In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 19:25:12 GMT, Vladimir Ivanov wrote: >> Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: >> >> 8352675: Support Intel AVX10 converged vector ISA feature detection > > src/hotspot/cpu/x86/vm_version_x86.cpp line 895: > >> 893: _features = _cpuid_info.feature_flags(); // These can be changed by VM settings >> 894: _extra_features = _cpuid_info.extra_feature_flags(); // These can be changed by VM settings >> 895: _cpu_features = _features; // Preserve features > > `_cpu_features` now captures only part of CPU features JVM cares about. 
It has limited usage and was introduced by JDK-8324734; the comment over it mentions
+ // Feature identification not affected by VM flags
https://github.com/openjdk/jdk/pull/17590#issuecomment-1912689152

It's not even captured by JVMCI
https://github.com/openjdk/jdk/blob/master/src/hotspot/share/jvmci/vmStructs_jvmci.cpp#L149

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2036394043

From vlivanov at openjdk.org Thu Apr 10 02:56:30 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Thu, 10 Apr 2025 02:56:30 GMT
Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v3]
In-Reply-To:
References:
Message-ID:

On Thu, 10 Apr 2025 02:27:45 GMT, Jatin Bhateja wrote:

>> src/hotspot/cpu/x86/vm_version_x86.cpp line 895:
>>
>>> 893: _features = _cpuid_info.feature_flags(); // These can be changed by VM settings
>>> 894: _extra_features = _cpuid_info.extra_feature_flags(); // These can be changed by VM settings
>>> 895: _cpu_features = _features; // Preserve features
>>
>> `_cpu_features` now captures only part of CPU features JVM cares about.
>
> It has limited usage and was introduced by JDK-8324734; the comment over it mentions
> + // Feature identification not affected by VM flags
> https://github.com/openjdk/jdk/pull/17590#issuecomment-1912689152
>
> Unlike _features, _cpu_features is not even captured by JVMCI
> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/jvmci/vmStructs_jvmci.cpp#L149

I don't see how JVMCI is relevant here.

JDK-8324734 was related to AVX512. I don't see why a very similar scenario won't be applicable to AVX10.
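
[Editorial illustration, not part of the original message.] The concern in this thread is that `_cpu_features = _features` snapshots only one of the two feature words once `_extra_features` exists. A minimal sketch of the fixed-size bitmap idea raised earlier in the review, under stated assumptions, could look like the following. Everything here is hypothetical: `FeatureBitmap`, the `Feature` enumerators and their bit positions, and `snapshot_keeps_disabled_feature()` are made up for illustration and are not HotSpot's `VM_Version` or `RegMask` API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical feature indices (illustrative only, not HotSpot's real values).
enum Feature : std::size_t {
  FEAT_SSE2    = 3,
  FEAT_AVX512F = 40,
  FEAT_AVX10_1 = 70,   // deliberately beyond bit 63: needs a second word
  FEAT_COUNT   = 128
};

// Fixed-size bitmap whose word count is chosen at compile time, so one
// object replaces a pair of independent 64-bit fields.
template <std::size_t NBits>
class FeatureBitmap {
  static constexpr std::size_t kWords = (NBits + 63) / 64;
  std::uint64_t _words[kWords] = {};

 public:
  void set(std::size_t bit) {
    assert(bit < NBits);
    _words[bit / 64] |= (std::uint64_t{1} << (bit % 64));
  }
  void clear(std::size_t bit) {
    assert(bit < NBits);
    _words[bit / 64] &= ~(std::uint64_t{1} << (bit % 64));
  }
  bool test(std::size_t bit) const {
    assert(bit < NBits);
    return (_words[bit / 64] >> (bit % 64)) & 1u;
  }
  bool operator==(const FeatureBitmap& other) const {
    for (std::size_t i = 0; i < kWords; i++) {
      if (_words[i] != other._words[i]) return false;
    }
    return true;
  }
};

// Mimics the "preserve features" step: a single copy captures every word,
// so a feature disabled later stays visible in the snapshot.
inline bool snapshot_keeps_disabled_feature() {
  FeatureBitmap<FEAT_COUNT> features;
  features.set(FEAT_SSE2);
  features.set(FEAT_AVX512F);
  features.set(FEAT_AVX10_1);

  FeatureBitmap<FEAT_COUNT> cpu_features = features;  // snapshot of all words

  features.clear(FEAT_AVX10_1);  // stands in for a VM setting turning it off

  return cpu_features.test(FEAT_AVX10_1) && !features.test(FEAT_AVX10_1);
}
```

With a single bitmap object, the snapshot cannot silently miss the features stored in a second field, which is the failure mode pointed out above. Whether HotSpot would actually adopt such a layout, and how it would interact with the existing macros and JVMCI exports, is outside what this thread settles.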
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2036411432

From jbhateja at openjdk.org Thu Apr 10 03:14:24 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Thu, 10 Apr 2025 03:14:24 GMT
Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v3]
In-Reply-To:
References:
Message-ID:

On Thu, 10 Apr 2025 02:54:05 GMT, Vladimir Ivanov wrote:

>> It has limited usage and was introduced by JDK-8324734; the comment over it mentions
>> + // Feature identification not affected by VM flags
>> https://github.com/openjdk/jdk/pull/17590#issuecomment-1912689152
>>
>> Unlike _features, _cpu_features is not even captured by JVMCI
>> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/jvmci/vmStructs_jvmci.cpp#L149
>
> I don't see how JVMCI is relevant here.
>
> JDK-8324734 was related to AVX512. I don't see why a very similar scenario won't be applicable to AVX10.

Yes, it has limited usage, only in the C2 context.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2036422554

From vlivanov at openjdk.org Thu Apr 10 03:15:32 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Thu, 10 Apr 2025 03:15:32 GMT
Subject: RFR: 8346836: C2: Verify CastII/CastLL bounds at runtime [v7]
In-Reply-To:
References:
Message-ID:

On Thu, 10 Apr 2025 00:55:18 GMT, Vladimir Ivanov wrote:

>> You can just run `compiler/arraycopy/TestArrayCopyConjoint.java` with `-XX:+StressGCM -XX:VerifyConstraintCasts=2`.
>
> Mystery solved: it is the consequence of the fact that JVM doesn't preserve frame pointer in generated code. With `-XX:+PreserveFramePointer` everything works fine.

FTR with the following patch I see stack traces both w/ and w/o `-XX:+PreserveFramePointer`:
https://github.com/iwanowww/jdk/commit/bbcd8d4dc7ab24152776a3e937ebd27367cf7331

I'll file a bug to fix the underlying issue.
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/22880#discussion_r2036422809

From amitkumar at openjdk.org Thu Apr 10 06:17:39 2025
From: amitkumar at openjdk.org (Amit Kumar)
Date: Thu, 10 Apr 2025 06:17:39 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v16]
In-Reply-To:
References:
Message-ID: <8u10cuHeKaKRylyrV8WnjIaW9SYhWI1dHRcih1ucgvc=.fd81b31e-aae4-4ec8-a53a-f1c801e715f2@github.com>

On Mon, 7 Apr 2025 15:18:46 GMT, Daniel Lundén wrote:

>> If a method has a large number of parameters, we currently bail out from C2 compilation.
>>
>> ### Changeset
>>
>> Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes.
>>
>> Changes:
>> - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed.
>> - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable.
>> - Remove all `can_represent` checks and bailouts.
>> - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`.
>> - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`.
>>
>> ### Testing
>>
>> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450)
>> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64.
>> - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement.
>> - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, no...
>
> Daniel Lundén has updated the pull request incrementally with two additional commits since the last revision:
>
> - Update comments
> - Revise TestNestedSynchronize to make use of CompileFramework

Thanks for the ping, I see that suggested tests pass on s390x as well.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20404#issuecomment-2791665166

From hgreule at openjdk.org Thu Apr 10 06:55:25 2025
From: hgreule at openjdk.org (Hannes Greule)
Date: Thu, 10 Apr 2025 06:55:25 GMT
Subject: RFR: 8353551: C2: Constant folding for ReverseBytes nodes [v2]
In-Reply-To:
References: <0Vve2dLoTSqwMKxx7LqaStM6f6_c6NadH-xc3OCSMMI=.c8dafa83-a661-4502-b170-fe7bdee19f2c@github.com>
Message-ID:

On Wed, 9 Apr 2025 17:56:06 GMT, Vladimir Ivanov wrote:

>>> And I assume that should be a TypeNode then?
>> >> Why do you think it benefits from becoming a `TypeNode`? >> >>> In that case, as ReverseBytes*Nodes are InvolutionNodes, would InvolutionNode need to be a TypeNode? Or can I use multiple inheritance? >> >> We try to avoid multiple inheritance in JVM and C2 doesn't use any AFAIK. >> >> Actually, I had an afterthought about `InvolutionNode` after approving it. It looks a bit weird to model "involution" property through inheritance. (Primarily, because it's hard to mix multiple properties.) Node flags would be a better fit IMO. > > For now, I suggest to just add a superclass`ReverseBytesNode` which extends `InvolutionNode` and place `Value` there. > Why do you think it benefits from becoming a `TypeNode`? Oh yeah I somehow confused myself there successfully. > Actually, I had an afterthought about `InvolutionNode` after approving it. It looks a bit weird to model "involution" property through inheritance. (Primarily, because it's hard to mix multiple properties.) Node flags would be a better fit IMO. That would work, although it would make the common implementation more difficult I think. Also AddNode similarly models the "addition in a (semi)-ring" property, but there's clearly more shared code there (and the property can't be modeled as a flag there because the respective multiplicative operation is defined there too). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24382#discussion_r2036632744 From jbhateja at openjdk.org Thu Apr 10 06:56:08 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 10 Apr 2025 06:56:08 GMT Subject: RFR: 8350896: Integer/Long.compress gets wrong type from CompressBitsNode::Value [v3] In-Reply-To: References: Message-ID: > Hi All, > > This bugfix patch fixes incorrect value computation for Integer/Long. compress APIs. 
> > Problems occur with a constant input and variable mask where the input's value is equal to the lower bound of the mask value., In this case, an erroneous value range estimation results in a constant value. Existing value routine first attempts to constant fold the compression operation if both input and compression mask are constant values; otherwise, it attempts to constrain the value range of result based on the upper and lower bounds of mask type. > > New IR test covers the issue reported in the bug report along with a case for value range based logic pruning. > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolutions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23947/files - new: https://git.openjdk.org/jdk/pull/23947/files/ae48895b..73e1118e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23947&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23947&range=01-02 Stats: 42 lines in 2 files changed: 35 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/23947.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23947/head:pull/23947 PR: https://git.openjdk.org/jdk/pull/23947 From jbhateja at openjdk.org Thu Apr 10 06:56:08 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 10 Apr 2025 06:56:08 GMT Subject: RFR: 8350896: Integer/Long.compress gets wrong type from CompressBitsNode::Value [v2] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 09:44:06 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments resolutions > > src/hotspot/share/opto/intrinsicnode.cpp line 266: > >> 264: if ( opc == Op_CompressBits) { >> 265: // Pattern: Integer/Long.compress(src_type, mask_type) >> 266: int max_mask_bit_width; > > Suggestion: > > int result_bit_width; > > 
Is this bit width not about the result? It is really not about the mask. > > Example: > `mask_type->hi_as_long() < -1L` > > Here, the mask has the uppermost bit set, and so the bit width of it is the maximum 32 / 64 bits. > > But we still can deduce that the result has one leading zero bit, and so the bit width of the result is either 31 or 63. Your interpretation is correct. But this is indeed the max_mask_bit_width, which constrains the upper bound of the result value. > src/hotspot/share/opto/intrinsicnode.cpp line 292: > >> 290: // compression result will never be a -ve value and we can safely set the >> 291: // lower bound of the result value range to zero. >> 292: lo = max_mask_bit_width == mask_bit_width ? lo : 0L; > > Can you please add an assert that we are not making `lo` worse than what we already have? Someone may insert optimizations above that set `lo > 0`, and then you may lower it again here. > Suggestion: > > assert(lo < 0, "we should not lower the value of lo"); > lo = max_mask_bit_width == mask_bit_width ? lo : 0L; We are already initializing 'lo' to min_jint/jlong value upfront and nothing before this line is modifying its value. Since initialization and use both are part of this function hense adding an assertion looks redundant. We generally add assertions to set constraints under which a logic is implemented. > src/hotspot/share/opto/intrinsicnode.cpp line 298: > >> 296: // in case input equals above estimated lower bound. >> 297: hi = src_type->hi_as_long() == lo ? hi : src_type->hi_as_long(); >> 298: hi = max_mask_bit_width < mask_bit_width ? (1L << max_mask_bit_width) - 1 : hi; > > I still don't understand your comment here. For example, I don't see a `max_int` in the code... And I also don't see anything that deals with constants in the code explicitly. > > And similarly as above, how do we ensure that `hi` is not raised accidentally? 'hi' is initialized to 'max_int' and nothing before line 297 is modifying 'hi'. 
'lo' is either 'min_int' or '0'. For a constant input equal to 'min_int', we want to prevent incorrect constant folding of the result, since for any integral constant the lo and hi bounds point to the same value. For zero input, we have added a special value transform for compress/expand, which is related to the fix. > src/hotspot/share/opto/intrinsicnode.cpp line 391: > >> 389: return TypeInteger::zero(bt); >> 390: } >> 391: > > Is this change related to the PR title? And do you have any tests for it? The zero value transform is related to the fix, and I have added new test points for it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r2036631948 PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r2036631830 PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r2036631742 PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r2036632791 From jbhateja at openjdk.org Thu Apr 10 06:59:28 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 10 Apr 2025 06:59:28 GMT Subject: RFR: 8350896: Integer/Long.compress gets wrong type from CompressBitsNode::Value [v2] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 09:50:42 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments resolutions > > @jatin-bhateja Thanks for the updates! I have a few more requests :) Hi @eme64 , I have addressed and responded to your comments, please verify. > test/hotspot/jtreg/compiler/c2/TestBitCompressValueTransform.java line 2: > >> 1: /* >> 2: * Copyright (c) 2025, Oracle and/or its affiliates. All rights reserved. > > Ah, I just noticed the test directory. I think we can put it in a more specific location. There are operation-specific tests under compiler/c2; let's keep it this way.
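For readers following the bit-width argument in this thread, the Java-level property behind it can be illustrated with a small sketch. This is not code from the PR: the class name `CompressRangeSketch` and the helper `upperBound` are hypothetical, and the sketch reasons about a concrete mask value, whereas the C2 code reasons about the range of the mask type. It assumes JDK 19 or later, where `Integer.compress` is available.

```java
public class CompressRangeSketch {
    // Hypothetical helper: an unsigned upper bound on Integer.compress(src, mask)
    // that depends only on the mask. The result occupies at most
    // bitCount(mask) low-order bits.
    static long upperBound(int mask) {
        return (1L << Integer.bitCount(mask)) - 1;
    }

    public static void main(String[] args) {
        int src = 0b1011_0110;
        int mask = 0b0011_1100;
        // The bits of src selected by mask are packed into the low-order bits.
        System.out.println(Integer.compress(src, mask));
        System.out.println(upperBound(mask));
        // With at least one zero bit in the mask, the sign bit of the result
        // is always clear, so the result cannot be negative.
        System.out.println(Integer.compress(-1, 0x7FFF_FFFF) >= 0);
        // Only the all-ones mask lets a negative input pass through unchanged.
        System.out.println(Integer.compress(Integer.MIN_VALUE, -1) == Integer.MIN_VALUE);
    }
}
```

The last two lines show why the lower bound can be raised to zero only when the mask is known not to be all ones, which appears to be the case distinction being discussed for `lo`.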
------------- PR Comment: https://git.openjdk.org/jdk/pull/23947#issuecomment-2791743206 PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r2036638391 From jbhateja at openjdk.org Thu Apr 10 07:06:37 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 10 Apr 2025 07:06:37 GMT Subject: RFR: 8346236: Auto vectorization support for various Float16 operations [v8] In-Reply-To: <2pz-rUzSGEmewJniiogeaEdy5NXgUQJr0GUVz-sTK80=.5edf4309-3ca5-497b-842d-374c3f06358f@github.com> References: <2pz-rUzSGEmewJniiogeaEdy5NXgUQJr0GUVz-sTK80=.5edf4309-3ca5-497b-842d-374c3f06358f@github.com> Message-ID: On Wed, 9 Apr 2025 08:54:06 GMT, Jatin Bhateja wrote: >> Changes requested by epeter (Reviewer). > > Hi @eme64 , Please let us know if there are further comments. > @jatin-bhateja I think it looks good now, but let me run some more tests :) Hi @eme64 , please let us know if this is good to land :-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/22755#issuecomment-2791758591 From roland at openjdk.org Thu Apr 10 07:07:43 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 10 Apr 2025 07:07:43 GMT Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v7] In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 22:00:31 GMT, Vladimir Ivanov wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > Marked as reviewed by vlivanov (Reviewer). 
@iwanowww thanks for the review @chhagedorn thanks for the review and testing @eme64 thanks for the comments ------------- PR Comment: https://git.openjdk.org/jdk/pull/23468#issuecomment-2791757765 From roland at openjdk.org Thu Apr 10 07:07:44 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 10 Apr 2025 07:07:44 GMT Subject: Integrated: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable In-Reply-To: References: Message-ID: On Wed, 5 Feb 2025 16:42:02 GMT, Roland Westrelin wrote: > This is primarily motivated by 8275202 (C2: optimize out more > redundant conditions). In the following code snippet: > > > int[] array = new int[arraySize]; > if (j <= arraySize) { > if (i >= 0) { > if (i < j) { > int v = array[i]; > > > (`arraySize` is a constant) > > at the range check, `j` is known to be in `[min, arraySize]` as a > consequence, `i` is known to be `[0, arraySize-1]`. The range check > can be eliminated. > > Now, if later, `i` constant folds to some value that's positive but > out of range for the array: > > - if that happens when the new pass runs, then it can prove that: > > if (i < j) { > > is never taken. > > - if that happens during IGVN or CCP however, that condition is not > constant folded. And because the range check was removed, there's no > guard protecting the range check `CastII`. It becomes `top` and, as > a result, the graph can become broken. > > What I propose here is that when the `CastII` becomes dead, any CFG > paths that use the `CastII` node is made unreachable. So in pseudo code: > > > int[] array = new int[arraySize]; > if (j <= arraySize) { > if (i >= 0) { > if (i < j) { > halt(); > > > Finding the CFG paths is implemented in the patch by following the > uses of the node until a CFG node or a `Phi` is encountered. > > The patch applies this to all `Type` nodes as with 8275202, I also ran > in some rare corner cases with other types of nodes. 
The exception is > `Phi` nodes which may not be as easy to handle (and for which I had no > issue with 8275202). > > Finally, the patch includes a test case that's unrelated to the > discussion of 8275202 above. In that test case, a `CastII` becomes top > but the test that guards it doesn't constant fold. The root cause is a > transformation of: > > > (CastII (AddI > > > into > > > (AddI (CastII ) (CastII)` > > > which causes the resulting node to have a wider type. The `CastII` > captures a type before the transformation above happens. Once it has > happened, the guard for the `CastII` can't be constant folded when an > out of bound value occurs. > > This is likely fixable some other way (eventhough it doesn't seem > straightforward). Given the long history of similar issues (and the > test case that shows that they are more hiding), I think it would > make sense to try some other way of approaching them. This pull request has now been integrated. Changeset: bcac42aa Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/bcac42aabce5b57525f776037d73b51d0afcbaf5 Stats: 249 lines in 20 files changed: 235 ins; 3 del; 11 mod 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable Reviewed-by: chagedorn, vlivanov ------------- PR: https://git.openjdk.org/jdk/pull/23468 From duke at openjdk.org Thu Apr 10 07:11:32 2025 From: duke at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Thu, 10 Apr 2025 07:11:32 GMT Subject: RFR: 8353842: C2: Add graph dumps before and after loop opts phase [v3] In-Reply-To: References: Message-ID: On Tue, 8 Apr 2025 14:10:42 GMT, Manuel H?ssig wrote: >> This PR adds graph dumps before and after loop optimizations, but only if the compiled method actually contains loops. This helps to distinguish loop optimizations in IGV and to match loop related nodes like opaque template assertion predicates. >> >> I tested this by compiling a few test methods and looking at the ideal graph. 
Also, I ran tier1 through tier3 and Oracle internal testing. > > Manuel H?ssig has updated the pull request incrementally with one additional commit since the last revision: > > Fix capitalization Thank you for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24509#issuecomment-2791767407 From duke at openjdk.org Thu Apr 10 07:11:32 2025 From: duke at openjdk.org (duke) Date: Thu, 10 Apr 2025 07:11:32 GMT Subject: RFR: 8353842: C2: Add graph dumps before and after loop opts phase [v3] In-Reply-To: References: Message-ID: On Tue, 8 Apr 2025 14:10:42 GMT, Manuel H?ssig wrote: >> This PR adds graph dumps before and after loop optimizations, but only if the compiled method actually contains loops. This helps to distinguish loop optimizations in IGV and to match loop related nodes like opaque template assertion predicates. >> >> I tested this by compiling a few test methods and looking at the ideal graph. Also, I ran tier1 through tier3 and Oracle internal testing. > > Manuel H?ssig has updated the pull request incrementally with one additional commit since the last revision: > > Fix capitalization @mhaessig Your change (at version 951c516e60f248627770df058df07aa6a0a08b48) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24509#issuecomment-2791769314 From epeter at openjdk.org Thu Apr 10 07:17:28 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 10 Apr 2025 07:17:28 GMT Subject: RFR: 8346236: Auto vectorization support for various Float16 operations [v12] In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 08:59:08 GMT, Jatin Bhateja wrote: >> This is a follow-up PR for https://github.com/openjdk/jdk/pull/22754 >> >> The patch adds support to vectorize various float16 scalar operations (add/subtract/divide/multiply/sqrt/fma). >> >> Summary of changes included with the patch: >> 1. C2 compiler New Vector IR creation. >> 2. Auto-vectorization support. >> 3. 
x86 backend implementation. >> 4. New IR verification test for each newly supported vector operation. >> >> Following are the performance numbers of Float16OperationsBenchmark >> >> System : Intel(R) Xeon(R) Processor code-named Granite rapids >> Frequency fixed at 2.5 GHz >> >> >> Baseline >> Benchmark (vectorDim) Mode Cnt Score Error Units >> Float16OperationsBenchmark.absBenchmark 1024 thrpt 2 4191.787 ops/ms >> Float16OperationsBenchmark.addBenchmark 1024 thrpt 2 1211.978 ops/ms >> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 1024 thrpt 2 493.026 ops/ms >> Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 1024 thrpt 2 612.430 ops/ms >> Float16OperationsBenchmark.cosineSimilaritySingleRoundingFP16 1024 thrpt 2 616.012 ops/ms >> Float16OperationsBenchmark.divBenchmark 1024 thrpt 2 604.882 ops/ms >> Float16OperationsBenchmark.dotProductFP16 1024 thrpt 2 410.798 ops/ms >> Float16OperationsBenchmark.euclideanDistanceDequantizedFP16 1024 thrpt 2 602.863 ops/ms >> Float16OperationsBenchmark.euclideanDistanceFP16 1024 thrpt 2 640.348 ops/ms >> Float16OperationsBenchmark.fmaBenchmark 1024 thrpt 2 809.175 ops/ms >> Float16OperationsBenchmark.getExponentBenchmark 1024 thrpt 2 2682.764 ops/ms >> Float16OperationsBenchmark.isFiniteBenchmark 1024 thrpt 2 3373.901 ops/ms >> Float16OperationsBenchmark.isFiniteCMovBenchmark 1024 thrpt 2 1881.652 ops/ms >> Float16OperationsBenchmark.isFiniteStoreBenchmark 1024 thrpt 2 2273.745 ops/ms >> Float16OperationsBenchmark.isInfiniteBenchmark 1024 thrpt 2 2147.913 ops/ms >> Float16OperationsBenchmark.isInfiniteCMovBen... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Minor tuning in selection pattern No related test failures :) :green_circle: Approved, thanks for the work @jatin-bhateja ? ------------- Marked as reviewed by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/22755#pullrequestreview-2755592190 From chagedorn at openjdk.org Thu Apr 10 07:24:29 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 10 Apr 2025 07:24:29 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() In-Reply-To: References: Message-ID: <5hNAkBEy0dU1EchyMu-E1YYBrtli0I5ZTgUhkQl7qgo=.145177e1-65ed-4f4b-99a3-cb9e0c03c757@github.com> On Fri, 4 Apr 2025 20:33:47 GMT, Kangcheng Xu wrote: > This PR refactors `PhaseIdealLoop::is_counted_loop()` into (mostly) `CountedLoopConverter::is_counted_loop()` and `CountedLoopConverter::convert()` to decouple to detection and conversion code. This enables us to try different loop configurations easy and finally convert once a counted loop is found. > > A nested `PhaseIdealLoop::CountedLoopConverter` class is created to handle the context, but I'm not if this is the best name or place for it. Please let me know what you think. > > Blocks [JDK-8336759](https://bugs.openjdk.org/browse/JDK-8336759). That's a good idea to refactor this, thanks for tackling it! Some initial general comments, I will have a closer look at your patch later again. src/hotspot/share/opto/loopnode.cpp line 313: > 311: // clone the inner loop etc. No optimizations need to change the outer > 312: // strip mined loop as it is only a skeleton. > 313: IdealLoopTree* PhaseIdealLoop::create_outer_strip_mined_loop(Node *init_control, Generally, for the touched code, you can also fix the `*` placement to be at the type. Suggestion: IdealLoopTree* PhaseIdealLoop::create_outer_strip_mined_loop(Node* init_control, src/hotspot/share/opto/loopnode.cpp line 367: > 365: } > 366: > 367: Node* PhaseIdealLoop::loop_exit_control(const Node* x, const IdealLoopTree* loop) const { There are quite some places where we use `x` to denote a `loop_head`. Maybe you can also fix that with this patch. 
src/hotspot/share/opto/loopnode.cpp line 1465: > 1463: assert(back_control != nullptr, "no back control"); > 1464: > 1465: LoopExitTest exit_test = loop_exit_test(back_control, loop); Just an idea: Would it make sense to have a single `LoopStructure` (or another name) class that contains all the information stored now with separate structs `LoopExitTest`, `LoopIvIncr` and `LoopIvStride`? Then we could offer query methods for validation like `is_valid()`, `is_stride_valid()` etc., or check for existence like `has_stride()`, or access specific nodes like `stride()`, `incr()` etc. src/hotspot/share/opto/loopnode.hpp line 1310: > 1308: Node*& entry_control, Node*& iffalse); > 1309: > 1310: class CountedLoopConverter { Probably quite a subjective matter, but what about just naming it `CountedLoop`? Then you have `counted_loop.convert()`. src/hotspot/share/opto/loopnode.hpp line 1311: > 1309: > 1310: class CountedLoopConverter { > 1311: PhaseIdealLoop* _phase; I suggest to add `const` whenever you can. For example, here you will probably not change the `_phase` pointer anymore: Suggestion: PhaseIdealLoop* const _phase; src/hotspot/share/opto/loopnode.hpp line 1331: > 1329: bool _includes_limit; > 1330: BoolTest::mask _mask; > 1331: Node* _increment; Here you have `(_phi_)incr` and `increment`. I suggest to make the naming consistent. I personally prefer full names whenever we can if it's not a universal abbreviation like `cmp` or `iv`. But what we count as well-known abbreviation is debatable and also depends on personal taste. src/hotspot/share/opto/loopnode.hpp line 1336: > 1334: Node* _sfpt; > 1335: jlong _final_correction; > 1336: Node* _trunc1; `trunc1` is a little hard to understand. Can we find a better name? 
------------- PR Review: https://git.openjdk.org/jdk/pull/24458#pullrequestreview-2755561053 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2036648877 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2036651203 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2036660181 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2036665370 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2036666689 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2036668900 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2036673872 From duke at openjdk.org Thu Apr 10 07:27:37 2025 From: duke at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Thu, 10 Apr 2025 07:27:37 GMT Subject: Integrated: 8353842: C2: Add graph dumps before and after loop opts phase In-Reply-To: References: Message-ID: On Tue, 8 Apr 2025 13:28:59 GMT, Manuel H?ssig wrote: > This PR adds graph dumps before and after loop optimizations, but only if the compiled method actually contains loops. This helps to distinguish loop optimizations in IGV and to match loop related nodes like opaque template assertion predicates. > > I tested this by compiling a few test methods and looking at the ideal graph. Also, I ran tier1 through tier3 and Oracle internal testing. This pull request has now been integrated. 
Changeset: 4f80437e Author: Manuel H?ssig Committer: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/4f80437ee05e4a3f755a166140669c0fd631f56d Stats: 13 lines in 3 files changed: 11 ins; 0 del; 2 mod 8353842: C2: Add graph dumps before and after loop opts phase Reviewed-by: chagedorn, epeter ------------- PR: https://git.openjdk.org/jdk/pull/24509 From ihse at openjdk.org Thu Apr 10 07:34:37 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Thu, 10 Apr 2025 07:34:37 GMT Subject: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v6] In-Reply-To: References: <0MB7FLFNfaGEWssr9X54UJ_iZNFWBJkxQ1yusP7fsuY=.3f9f3de5-fe84-48e6-9449-626cac42da0b@github.com> <_YOUyzMbSEXFduCKVgyis37kwTlGSjBbP8VlFu3xQpU=.9b668e2a-8f91-476d-8914-13dc33a0b9e5@github.com> Message-ID: On Wed, 9 Apr 2025 21:26:15 GMT, Justin Lu wrote: >> src/java.xml/share/classes/com/sun/org/apache/xml/internal/serializer/Encodings.properties line 22: >> >>> 20: # Peter Smolik >>> 21: Cp1250 WINDOWS-1250 0x00FF >>> 22: # Patch attributed to havardw at underdusken.no (H?vard Wigtil) >> >> This does not seem to have been a correct conversion. > > Right, that `?` looks to have been incorrectly converted during the ISO-8859-1 to UTF-8 conversion. (I can't find the script used for conversion as this change is from some time ago.) > > Since the change occurs in a comment (thankfully), it should be harmless and the next upstream update of this file would overwrite this incorrect change. However, this file does not seem to be updated that often, so I can also file an issue to correct this if you would prefer that. You don't have to do that, I'm working on an omnibus UTF-8 fixing PR right now, where I will include a fix for this as well. 
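A check for this kind of conversion damage could be sketched as follows. This is only an illustration, not the tooling used by anyone in this thread: the class `ReplacementCharScan` and its method are hypothetical names. It relies on the fact that `new String(bytes, UTF_8)` substitutes U+FFFD for malformed input, so the count covers both a U+FFFD that is already stored in the file and raw bytes that are not valid UTF-8.

```java
import java.nio.charset.StandardCharsets;

public class ReplacementCharScan {
    // Hypothetical helper: decodes the bytes as UTF-8 and counts occurrences
    // of U+FFFD, the Unicode replacement character.
    static long countReplacementChars(byte[] bytes) {
        String text = new String(bytes, StandardCharsets.UTF_8);
        return text.chars().filter(c -> c == 0xFFFD).count();
    }

    public static void main(String[] args) {
        // Correctly converted text: no replacement characters.
        byte[] good = "H\u00E5vard".getBytes(StandardCharsets.UTF_8);
        // Text that already went through a lossy conversion.
        byte[] damaged = "H\uFFFDvard".getBytes(StandardCharsets.UTF_8);
        // ISO-8859-1 bytes that were never converted at all.
        byte[] latin1 = "H\u00E5vard".getBytes(StandardCharsets.ISO_8859_1);

        System.out.println(countReplacementChars(good));
        System.out.println(countReplacementChars(damaged));
        System.out.println(countReplacementChars(latin1));
    }
}
```

Running such a count over the files touched by a conversion script would flag candidates for manual review; it cannot tell which character was originally intended.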
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12726#discussion_r2036695622 From ihse at openjdk.org Thu Apr 10 07:34:37 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Thu, 10 Apr 2025 07:34:37 GMT Subject: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v6] In-Reply-To: References: <0MB7FLFNfaGEWssr9X54UJ_iZNFWBJkxQ1yusP7fsuY=.3f9f3de5-fe84-48e6-9449-626cac42da0b@github.com> <_YOUyzMbSEXFduCKVgyis37kwTlGSjBbP8VlFu3xQpU=.9b668e2a-8f91-476d-8914-13dc33a0b9e5@github.com> Message-ID: On Thu, 10 Apr 2025 07:31:37 GMT, Magnus Ihse Bursie wrote: >> Right, that `?` looks to have been incorrectly converted during the ISO-8859-1 to UTF-8 conversion. (I can't find the script used for conversion as this change is from some time ago.) >> >> Since the change occurs in a comment (thankfully), it should be harmless and the next upstream update of this file would overwrite this incorrect change. However, this file does not seem to be updated that often, so I can also file an issue to correct this if you would prefer that. > > You don't have to do that, I'm working on an omnibus UTF-8 fixing PR right now, where I will include a fix for this as well. If anything, I might be a bit worried that there are more incorrect conversions stemming from this PR, that my automated tools and manual scanning has not revealed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12726#discussion_r2036696723 From duke at openjdk.org Thu Apr 10 07:50:34 2025 From: duke at openjdk.org (Saranya Natarajan) Date: Thu, 10 Apr 2025 07:50:34 GMT Subject: RFR: 8351660: C2: SIGFPE in unsigned_mod_value [v5] In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 08:54:57 GMT, Saranya Natarajan wrote: >> Description :: The test program performs a`Long.remainderUnsigned` which triggers the call to the function `unsigned_mod_value`. 
At the end of `unsigned_mod_value` the expression,` return TypeClass::make(static_cast(dividend % divisor))`, is computed which leads to a SIGFPE as the divisor in the test program is zero. The same behaviour was observed when the ` Long.remainderUnsigned` was replaced with `Integer.remainderUnsigned` in the test program. >> >> Solution :: The fix for [JDK-8345766](https://bugs.openjdk.org/browse/JDK-8345766) emitted specific ModF/ModD nodes, which is optimized and converted to runtime calls after optimizations. This was done during parsing prior to [JDK-8345766](https://bugs.openjdk.org/browse/JDK-8345766). In the scenario where there was unsigned modulo operation as in this test, there was no check for modulo by zero that could trigger an exception during runtime. The below fix proposes a check for modulo by zero and throws exception at runtime. >> >> A Jtreg test has been added as part of this fix. This test case is based on the original test that resulted in the bug. @eme64 is the contributor of the original test. Thank you @eme64. > > Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: > > correcting comments in included test Thank you for the reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24410#issuecomment-2791861762 From duke at openjdk.org Thu Apr 10 07:54:34 2025 From: duke at openjdk.org (duke) Date: Thu, 10 Apr 2025 07:54:34 GMT Subject: RFR: 8351660: C2: SIGFPE in unsigned_mod_value [v5] In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 08:54:57 GMT, Saranya Natarajan wrote: >> Description :: The test program performs a`Long.remainderUnsigned` which triggers the call to the function `unsigned_mod_value`. At the end of `unsigned_mod_value` the expression,` return TypeClass::make(static_cast(dividend % divisor))`, is computed which leads to a SIGFPE as the divisor in the test program is zero. 
The same behaviour was observed when the ` Long.remainderUnsigned` was replaced with `Integer.remainderUnsigned` in the test program. >> >> Solution :: The fix for [JDK-8345766](https://bugs.openjdk.org/browse/JDK-8345766) emitted specific ModF/ModD nodes, which is optimized and converted to runtime calls after optimizations. This was done during parsing prior to [JDK-8345766](https://bugs.openjdk.org/browse/JDK-8345766). In the scenario where there was unsigned modulo operation as in this test, there was no check for modulo by zero that could trigger an exception during runtime. The below fix proposes a check for modulo by zero and throws exception at runtime. >> >> A Jtreg test has been added as part of this fix. This test case is based on the original test that resulted in the bug. @eme64 is the contributor of the original test. Thank you @eme64. > > Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: > > correcting comments in included test @sarannat Your change (at version 21623249d7e09116c54d9f6b81f3a819cab22a18) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24410#issuecomment-2791868892 From duke at openjdk.org Thu Apr 10 08:01:38 2025 From: duke at openjdk.org (Saranya Natarajan) Date: Thu, 10 Apr 2025 08:01:38 GMT Subject: Integrated: 8351660: C2: SIGFPE in unsigned_mod_value In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 10:29:08 GMT, Saranya Natarajan wrote: > Description :: The test program performs a`Long.remainderUnsigned` which triggers the call to the function `unsigned_mod_value`. At the end of `unsigned_mod_value` the expression,` return TypeClass::make(static_cast(dividend % divisor))`, is computed which leads to a SIGFPE as the divisor in the test program is zero. The same behaviour was observed when the ` Long.remainderUnsigned` was replaced with `Integer.remainderUnsigned` in the test program. 
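The Java-level behaviour that the description relies on can be checked with a short sketch. It is not the jtreg test from the PR; the class `UnsignedRemainderByZero` and the helper `tryRemainder` are hypothetical. It only shows that an unsigned remainder with a zero divisor is expected to raise `ArithmeticException` rather than crash the VM.

```java
public class UnsignedRemainderByZero {
    // Hypothetical helper: returns the unsigned remainder as a string, or the
    // name of the exception when the divisor is zero.
    static String tryRemainder(long dividend, long divisor) {
        try {
            return Long.toUnsignedString(Long.remainderUnsigned(dividend, divisor));
        } catch (ArithmeticException e) {
            return "ArithmeticException";
        }
    }

    public static void main(String[] args) {
        // -1L is 2^64 - 1 when interpreted as an unsigned value.
        System.out.println(tryRemainder(-1L, 10L));
        // A zero divisor must surface as a Java exception.
        System.out.println(tryRemainder(7L, 0L));
        // The int variant behaves the same way.
        try {
            Integer.remainderUnsigned(7, 0);
        } catch (ArithmeticException e) {
            System.out.println("ArithmeticException");
        }
    }
}
```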
> > Solution :: The fix for [JDK-8345766](https://bugs.openjdk.org/browse/JDK-8345766) emitted specific ModF/ModD nodes, which is optimized and converted to runtime calls after optimizations. This was done during parsing prior to [JDK-8345766](https://bugs.openjdk.org/browse/JDK-8345766). In the scenario where there was unsigned modulo operation as in this test, there was no check for modulo by zero that could trigger an exception during runtime. The below fix proposes a check for modulo by zero and throws exception at runtime. > > A Jtreg test has been added as part of this fix. This test case is based on the original test that resulted in the bug. @eme64 is the contributor of the original test. Thank you @eme64. This pull request has now been integrated. Changeset: 04e2a062 Author: Saranya Natarajan Committer: Damon Fenacci URL: https://git.openjdk.org/jdk/commit/04e2a0621d80f23cf70b4649ec4c24dad28e8e2d Stats: 63 lines in 2 files changed: 63 ins; 0 del; 0 mod 8351660: C2: SIGFPE in unsigned_mod_value Co-authored-by: Emanuel Peter Reviewed-by: chagedorn, dfenacci, epeter ------------- PR: https://git.openjdk.org/jdk/pull/24410 From eirbjo at openjdk.org Thu Apr 10 08:10:42 2025 From: eirbjo at openjdk.org (Eirik =?UTF-8?B?QmrDuHJzbsO4cw==?=) Date: Thu, 10 Apr 2025 08:10:42 GMT Subject: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v6] In-Reply-To: References: <0MB7FLFNfaGEWssr9X54UJ_iZNFWBJkxQ1yusP7fsuY=.3f9f3de5-fe84-48e6-9449-626cac42da0b@github.com> <_YOUyzMbSEXFduCKVgyis37kwTlGSjBbP8VlFu3xQpU=.9b668e2a-8f91-476d-8914-13dc33a0b9e5@github.com> Message-ID: <6c6DqyCqyPonBZgUU8BpYJR3JQvMXjWm9ulq4SN25Do=.77775825-716d-4908-ae24-c4cf1ead78a5@github.com> On Thu, 10 Apr 2025 07:32:18 GMT, Magnus Ihse Bursie wrote: >> You don't have to do that, I'm working on an omnibus UTF-8 fixing PR right now, where I will include a fix for this as well. 
> > If anything, I might be a bit worried that there are more incorrect conversions stemming from this PR, that my automated tools and manual scanning has not revealed. Some observations: 1: This PR seems to have been abandoned, so perhaps this discussion belongs in #15694? 2: The `å` (Unicode 'Latin small letter a with ring above', U+00E5) was correctly encoded as `0xE5` in ISO-8859-1 prior to this change. 3: The conversion changed this `0xE5` to the three-byte sequence `ef bf bd`, which is the UTF-8 encoding of U+FFFD (the replacement character). 4: This is as if the file was incorrectly decoded using UTF-8, then encoded using UTF-8:
```
// "å" is the single byte 0xE5 in ISO-8859-1
byte[] origBytes = "å".getBytes(StandardCharsets.ISO_8859_1);
// 0xE5 on its own is malformed UTF-8, so it decodes to U+FFFD
String decoded = new String(origBytes, StandardCharsets.UTF_8);
byte[] encoded = decoded.getBytes(StandardCharsets.UTF_8);
String hex = HexFormat.of().formatHex(encoded);
assertEquals("efbfbd", hex);
```
Like @magicus I'm worried that similar incorrect decoding could have been introduced by the same script in other files. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12726#discussion_r2036767319 From duke at openjdk.org Thu Apr 10 08:35:46 2025 From: duke at openjdk.org (Manuel Hässig) Date: Thu, 10 Apr 2025 08:35:46 GMT Subject: RFR: 8353730: TestSubNodeFloatDoubleNegation.java fails with native Float16 support Message-ID: <1oQpmiZAIEkZeemgdGsFyYISkrexaQHFNz1vapLwWcQ=.a6408873-1ed3-4766-b7ee-f4f67b1880f4@github.com> Due to insufficient testing on machines supporting FP16 arithmetic in their ISA, I missed that these machines generate two `SUB_HF` nodes and, crucially, an additional `SUB_F` node. We suspect that this comes from some kind of fallback code path ([open issue](https://bugs.openjdk.org/browse/JDK-8353732); also see [discussion in RISC-V PR fixing the same issue](https://github.com/openjdk/jdk/pull/24421#issuecomment-2777755042)). This PR fixes this issue for all architectures that support FP16 instructions (that I know of), by only matching `SUB_HF` nodes when the CPU supports FP16.
The tests for ARM are currently commented out, due to the support for Float16 still being a work in progress (see PR #23748).

I tested the fix using software emulation of an x86_64 CPU with the `avx512_fp16` feature. I also ran the [sanity checks](https://github.com/mhaessig/jdk/actions/runs/14375898384) as well as tier1 through tier3 and Oracle internal testing.

-------------

Commit messages:
 - Fix float16 double negation test

Changes: https://git.openjdk.org/jdk/pull/24565/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24565&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8353730
Stats: 8 lines in 1 file changed: 3 ins; 0 del; 5 mod
Patch: https://git.openjdk.org/jdk/pull/24565.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/24565/head:pull/24565

PR: https://git.openjdk.org/jdk/pull/24565

From ihse at openjdk.org Thu Apr 10 08:38:38 2025
From: ihse at openjdk.org (Magnus Ihse Bursie)
Date: Thu, 10 Apr 2025 08:38:38 GMT
Subject: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v6]
In-Reply-To: <6c6DqyCqyPonBZgUU8BpYJR3JQvMXjWm9ulq4SN25Do=.77775825-716d-4908-ae24-c4cf1ead78a5@github.com>
References: <0MB7FLFNfaGEWssr9X54UJ_iZNFWBJkxQ1yusP7fsuY=.3f9f3de5-fe84-48e6-9449-626cac42da0b@github.com> <_YOUyzMbSEXFduCKVgyis37kwTlGSjBbP8VlFu3xQpU=.9b668e2a-8f91-476d-8914-13dc33a0b9e5@github.com> <6c6DqyCqyPonBZgUU8BpYJR3JQvMXjWm9ulq4SN25Do=.77775825-716d-4908-ae24-c4cf1ead78a5@github.com>
Message-ID:

On Thu, 10 Apr 2025 08:08:02 GMT, Eirik Bjørsnøs wrote:

>> If anything, I might be a bit worried that there are more incorrect conversions stemming from this PR that my automated tools and manual scanning have not revealed.
>
> Some observations:
>
> 1: This PR seems to have been abandoned, so perhaps this discussion belongs in #15694?
>
> 2: The `å` (Unicode 'Latin small letter a with ring above', U+00E5) was correctly encoded as `0xE5` in ISO-8859-1 prior to this change.
>
> 3: The conversion changed this `0xE5` to the three-byte sequence `ef bf bd`.
>
> 4: This is as if the file was incorrectly decoded using UTF-8, then encoded using UTF-8:
>
> ```java
> byte[] origBytes = "å".getBytes(StandardCharsets.ISO_8859_1);
> String decoded = new String(origBytes, StandardCharsets.UTF_8);
> byte[] encoded = decoded.getBytes(StandardCharsets.UTF_8);
> String hex = HexFormat.of().formatHex(encoded);
> assertEquals("efbfbd", hex);
> ```
>
> Like @magicus I'm worried that similar incorrect decoding could have been introduced by the same script in other files.

> This PR seems to have been abandoned, so perhaps this discussion belongs in https://github.com/openjdk/jdk/pull/15694?

Oh, I didn't notice this was supplanted by another PR. It might be better to continue there, yes. Even if closed PRs are seldom the best places to conduct discussions, I think it might be a good idea to scrutinize all files modified by this script.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/12726#discussion_r2036820765

From duke at openjdk.org Thu Apr 10 08:55:28 2025
From: duke at openjdk.org (Anjian-Wen)
Date: Thu, 10 Apr 2025 08:55:28 GMT
Subject: RFR: 8329887: RISC-V: C2: Support Zvbb Vector And-Not instruction [v5]
In-Reply-To: <1KHNbMIgOO7jSZ1Fm4HzxadYaNzE4Xbq4nTitlKy3Po=.17d7860b-10de-4f19-87d8-87fc17313ce2@github.com>
References: <1KHNbMIgOO7jSZ1Fm4HzxadYaNzE4Xbq4nTitlKy3Po=.17d7860b-10de-4f19-87d8-87fc17313ce2@github.com>
Message-ID:

> support Zvbb Vector And-Not vandn.vv match rule and add test

Anjian-Wen has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains six additional commits since the last revision: - Merge branch 'openjdk:master' into JDK-8329887 - add vand_not_masked test - add vand_not_L test - Merge branch 'openjdk:master' into JDK-8329887 - RISC-V: C2: Support Zvbb Vector And-Not instruction fix match rule for format - RISC-V: C2: Support Zvbb Vector And-Not instruction add Vector And-Not match rule and tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24129/files - new: https://git.openjdk.org/jdk/pull/24129/files/bc233f6c..ec0413af Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24129&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24129&range=03-04 Stats: 20728 lines in 381 files changed: 7008 ins; 12227 del; 1493 mod Patch: https://git.openjdk.org/jdk/pull/24129.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24129/head:pull/24129 PR: https://git.openjdk.org/jdk/pull/24129 From hgreule at openjdk.org Thu Apr 10 09:04:39 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Thu, 10 Apr 2025 09:04:39 GMT Subject: RFR: 8353551: C2: Constant folding for ReverseBytes nodes [v3] In-Reply-To: References: Message-ID: > This change implements constant folding for ReverseBytes nodes. > > Currently, `byteswap` is included transitively by `reverse_bits.hpp`. I'm not sure if this is fine or if I need to add an explicit include there. > > I appreciate any reviews and comments. 
Hannes Greule has updated the pull request incrementally with one additional commit since the last revision: introduce common ReverseBytesNode ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24382/files - new: https://git.openjdk.org/jdk/pull/24382/files/dc046768..90f2e9de Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24382&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24382&range=01-02 Stats: 53 lines in 2 files changed: 17 ins; 22 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/24382.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24382/head:pull/24382 PR: https://git.openjdk.org/jdk/pull/24382 From hgreule at openjdk.org Thu Apr 10 09:04:40 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Thu, 10 Apr 2025 09:04:40 GMT Subject: RFR: 8353551: C2: Constant folding for ReverseBytes nodes [v3] In-Reply-To: References: <0Vve2dLoTSqwMKxx7LqaStM6f6_c6NadH-xc3OCSMMI=.c8dafa83-a661-4502-b170-fe7bdee19f2c@github.com> Message-ID: On Wed, 9 Apr 2025 17:56:06 GMT, Vladimir Ivanov wrote: >>> And I assume that should be a TypeNode then? >> >> Why do you think it benefits from becoming a `TypeNode`? >> >>> In that case, as ReverseBytes*Nodes are InvolutionNodes, would InvolutionNode need to be a TypeNode? Or can I use multiple inheritance? >> >> We try to avoid multiple inheritance in JVM and C2 doesn't use any AFAIK. >> >> Actually, I had an afterthought about `InvolutionNode` after approving it. It looks a bit weird to model "involution" property through inheritance. (Primarily, because it's hard to mix multiple properties.) Node flags would be a better fit IMO. > > For now, I suggest to just add a superclass`ReverseBytesNode` which extends `InvolutionNode` and place `Value` there. @iwanowww I applied your suggestion now. Please let me know if this is good now. 
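As a rough illustration of what folding a ReverseBytes node on a constant input amounts to, here is a small Java-level sketch. It is not the C2 code from this PR: the class name `ReverseBytesFoldingSketch` and the helper `swap32` are made up for the example, and only the JDK method `Integer.reverseBytes` is real. It computes the byte swap of a constant by hand and checks the involution property (applying the operation twice gives back the input) that the `InvolutionNode` discussion refers to.

```java
public class ReverseBytesFoldingSketch {
    // Hypothetical helper: hand-rolled 32-bit byte swap, i.e. the value a
    // compiler could substitute when the input is a compile-time constant.
    static int swap32(int x) {
        return (x << 24)
             | ((x & 0x0000FF00) << 8)
             | ((x >>> 8) & 0x0000FF00)
             | (x >>> 24);
    }

    public static void main(String[] args) {
        int c = 0x11223344;
        // The hand-rolled swap agrees with the JDK method on this constant.
        System.out.println(Integer.toHexString(swap32(c)));               // prints 44332211
        System.out.println(Integer.toHexString(Integer.reverseBytes(c))); // prints 44332211

        // Involution: reversing the bytes twice restores the original value.
        int[] samples = {0, 1, -1, 0x11223344, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int x : samples) {
            if (Integer.reverseBytes(Integer.reverseBytes(x)) != x) {
                throw new AssertionError("not an involution for " + x);
            }
        }
    }
}
```

The same reasoning applies to the `short`, `char`, and `long` variants (`Short.reverseBytes`, `Character.reverseBytes`, `Long.reverseBytes`), only with different widths.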
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24382#discussion_r2036870081

From duke at openjdk.org Thu Apr 10 09:29:19 2025
From: duke at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=)
Date: Thu, 10 Apr 2025 09:29:19 GMT
Subject: RFR: 8353730: TestSubNodeFloatDoubleNegation.java fails with native Float16 support [v2]
In-Reply-To: <1oQpmiZAIEkZeemgdGsFyYISkrexaQHFNz1vapLwWcQ=.a6408873-1ed3-4766-b7ee-f4f67b1880f4@github.com>
References: <1oQpmiZAIEkZeemgdGsFyYISkrexaQHFNz1vapLwWcQ=.a6408873-1ed3-4766-b7ee-f4f67b1880f4@github.com>
Message-ID:

> Due to insufficient testing on machines supporting FP16 arithmetic in their ISA, I missed that these machines generate two `SUB_HF` nodes and, crucially, an additional `SUB_F` node. We suspect that this comes from some kind of fallback code path ([open issue](https://bugs.openjdk.org/browse/JDK-8353732); also see [discussion in RISC-V PR fixing the same issue](https://github.com/openjdk/jdk/pull/24421#issuecomment-2777755042)).
>
> This PR fixes this issue for all architectures that support FP16 instructions (that I know of), by only matching `SUB_HF` nodes when the CPU supports FP16. The tests for ARM are currently commented out, due to the support for Float16 still being a work in progress (see PR #23748).
>
> I tested the fix using software emulation of an x86_64 CPU with the `avx512_fp16` feature. I also ran the [sanity checks](https://github.com/mhaessig/jdk/actions/runs/14376762241) (the Alpine Linux build fails at `configure`, which is unrelated to this change) as well as tier1 through tier3 and Oracle internal testing.

Manuel Hässig has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains two additional commits since the last revision: - Merge branch 'master' into jdk-8353730-fp16 - Fix float16 double negation test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24565/files - new: https://git.openjdk.org/jdk/pull/24565/files/38de7c7a..5eaedf2b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24565&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24565&range=00-01 Stats: 19179 lines in 338 files changed: 5693 ins; 12153 del; 1333 mod Patch: https://git.openjdk.org/jdk/pull/24565.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24565/head:pull/24565 PR: https://git.openjdk.org/jdk/pull/24565 From dlunden at openjdk.org Thu Apr 10 09:44:14 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Thu, 10 Apr 2025 09:44:14 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v17] In-Reply-To: References: Message-ID: > If a method has a large number of parameters, we currently bail out from C2 compilation. > > ### Changeset > > Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes. > > Changes: > - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed. 
> - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable. > - Remove all `can_represent` checks and bailouts. > - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`. > - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`. > > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450) > - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. > - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement. > - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, not worth it). > > ![c2-regression](https:/... 
Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision:

  Refactor and improve TestNestedSynchronize.java

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20404/files
  - new: https://git.openjdk.org/jdk/pull/20404/files/b20d14e5..b7aa0351

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20404&range=16
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20404&range=15-16

Stats: 31 lines in 1 file changed: 1 ins; 12 del; 18 mod
Patch: https://git.openjdk.org/jdk/pull/20404.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/20404/head:pull/20404

PR: https://git.openjdk.org/jdk/pull/20404

From jbhateja at openjdk.org Thu Apr 10 09:49:47 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Thu, 10 Apr 2025 09:49:47 GMT
Subject: Integrated: 8346236: Auto vectorization support for various Float16 operations
In-Reply-To:
References:
Message-ID: <_jvpmWCyL4N-2Ws8YMjq-NbFQMhwQF-lD5dz4gYRixs=.6af944c2-7717-4108-8247-a3826c97bc11@github.com>

On Sun, 15 Dec 2024 20:06:40 GMT, Jatin Bhateja wrote:

> This is a follow-up PR for https://github.com/openjdk/jdk/pull/22754
>
> The patch adds support to vectorize various float16 scalar operations (add/subtract/divide/multiply/sqrt/fma).
>
> Summary of changes included with the patch:
> 1. C2 compiler New Vector IR creation.
> 2. Auto-vectorization support.
> 3. x86 backend implementation.
> 4. New IR verification test for each newly supported vector operation.
>
> Following are the performance numbers of Float16OperationsBenchmark
>
> System : Intel(R) Xeon(R) Processor code-named Granite Rapids
> Frequency fixed at 2.5 GHz
>
>
> Baseline
> Benchmark                                                      (vectorDim)   Mode  Cnt     Score  Error  Units
> Float16OperationsBenchmark.absBenchmark                               1024  thrpt    2  4191.787         ops/ms
> Float16OperationsBenchmark.addBenchmark                               1024  thrpt    2  1211.978         ops/ms
> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16            1024  thrpt    2   493.026         ops/ms
> Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16         1024  thrpt    2   612.430         ops/ms
> Float16OperationsBenchmark.cosineSimilaritySingleRoundingFP16         1024  thrpt    2   616.012         ops/ms
> Float16OperationsBenchmark.divBenchmark                               1024  thrpt    2   604.882         ops/ms
> Float16OperationsBenchmark.dotProductFP16                             1024  thrpt    2   410.798         ops/ms
> Float16OperationsBenchmark.euclideanDistanceDequantizedFP16           1024  thrpt    2   602.863         ops/ms
> Float16OperationsBenchmark.euclideanDistanceFP16                      1024  thrpt    2   640.348         ops/ms
> Float16OperationsBenchmark.fmaBenchmark                               1024  thrpt    2   809.175         ops/ms
> Float16OperationsBenchmark.getExponentBenchmark                       1024  thrpt    2  2682.764         ops/ms
> Float16OperationsBenchmark.isFiniteBenchmark                          1024  thrpt    2  3373.901         ops/ms
> Float16OperationsBenchmark.isFiniteCMovBenchmark                      1024  thrpt    2  1881.652         ops/ms
> Float16OperationsBenchmark.isFiniteStoreBenchmark                     1024  thrpt    2  2273.745         ops/ms
> Float16OperationsBenchmark.isInfiniteBenchmark                        1024  thrpt    2  2147.913         ops/ms
> Float16OperationsBenchmark.isInfiniteCMovBenchmark                    1024  thrpt    2  1962.579         ops/ms...

This pull request has now been integrated.
Changeset: 9a3f9997
Author: Jatin Bhateja
URL: https://git.openjdk.org/jdk/commit/9a3f9997b68a1f64e53b9711b878fb073c3c9b90
Stats: 1165 lines in 23 files changed: 1078 ins; 12 del; 75 mod

8346236: Auto vectorization support for various Float16 operations

Reviewed-by: epeter, sviswanathan

-------------

PR: https://git.openjdk.org/jdk/pull/22755

From jbhateja at openjdk.org Thu Apr 10 09:49:46 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Thu, 10 Apr 2025 09:49:46 GMT
Subject: RFR: 8346236: Auto vectorization support for various Float16 operations [v2]
In-Reply-To:
References:
Message-ID:

On Wed, 19 Mar 2025 23:45:52 GMT, Sandhya Viswanathan wrote:

>> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase.
>
> There is a test failure in GHA. A merge with master would be good.

Thanks @sviswa7 and @eme64 for review and approvals :-)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/22755#issuecomment-2792180998

From dlunden at openjdk.org Thu Apr 10 09:50:49 2025
From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=)
Date: Thu, 10 Apr 2025 09:50:49 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v16]
In-Reply-To:
References:
Message-ID:

On Wed, 9 Apr 2025 13:43:48 GMT, Roberto Castañeda Lozano wrote:

>> Daniel Lundén has updated the pull request incrementally with two additional commits since the last revision:
>>
>> - Update comments
>> - Revise TestNestedSynchronize to make use of CompileFramework
>
> test/hotspot/jtreg/compiler/locks/TestNestedSynchronize.java line 1:
>
>> 1: /*
>
> Great idea to generate these tests! Just a nit: the previous version contained 100 instances (from `test1` to `test100`), but this version generates up to `test99` only.

Thanks, fixed in my latest commit!
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2036964816

From dlunden at openjdk.org Thu Apr 10 09:50:49 2025
From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=)
Date: Thu, 10 Apr 2025 09:50:49 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v16]
In-Reply-To: <_GLaoVcaiLFvpL4yziOtrgd5QscbH-zrEJ2w3c5iMOY=.994428df-9217-4497-b9e1-5a8b8a24ec8f@github.com>
References: <_GLaoVcaiLFvpL4yziOtrgd5QscbH-zrEJ2w3c5iMOY=.994428df-9217-4497-b9e1-5a8b8a24ec8f@github.com>
Message-ID:

On Wed, 9 Apr 2025 14:37:38 GMT, Emanuel Peter wrote:

>> Daniel Lundén has updated the pull request incrementally with two additional commits since the last revision:
>>
>> - Update comments
>> - Revise TestNestedSynchronize to make use of CompileFramework
>
> test/hotspot/jtreg/compiler/locks/TestNestedSynchronize.java line 106:
>
>> 104: %s
>> 105: }""", test_class_name, inner);
>> 106: }
>
> Probably this is not so bad with only 100 repetitions ... but this leads to a lot of successive strings generated, resulting in larger and larger strings allocated, essentially requiring quadratic memory. A StringBuilder would help here. Maybe this is just an FYI: the TemplateFramework internally uses a StringBuilder to avoid this successive concatenation / allocation issue.

Thanks for the comment; I implemented it in this way for readability. I refactored it now to avoid the quadratic behavior (assuming `addAll`, `addFirst`, and `addLast` are O(1) for `LinkedList`, and that `String.join` is linear). Please have a look! `StringBuilder` is efficient for appending, but not prepending, right?
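The prepending concern above can be sketched as follows. This is an illustration only and not the code in `TestNestedSynchronize.java`: the class name `NestedBlockSketch`, the method `nest`, and the generated text are hypothetical. Each nesting level adds one opening line at the front and one closing line at the back of a deque, both constant-time operations on `ArrayDeque` (amortized), and `String.join` concatenates everything once at the end.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class NestedBlockSketch {
    // Hypothetical generator: wraps `body` in `depth` nested synchronized blocks.
    static String nest(String body, int depth) {
        Deque<String> lines = new ArrayDeque<>();
        lines.add(body);
        for (int i = 0; i < depth; i++) {
            lines.addFirst("synchronized (this) {"); // prepend the opening line
            lines.addLast("}");                      // append the matching close
        }
        // One linear pass over all pieces instead of repeated string copies.
        return String.join("\n", lines);
    }

    public static void main(String[] args) {
        System.out.println(nest("counter++;", 3));
    }
}
```

By contrast, `StringBuilder.insert(0, ...)` has to shift the existing contents on every call, so repeated prepending with it is quadratic in the total length, which matches the intuition in the question.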
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2036961838

From amitkumar at openjdk.org Thu Apr 10 10:17:44 2025
From: amitkumar at openjdk.org (Amit Kumar)
Date: Thu, 10 Apr 2025 10:17:44 GMT
Subject: RFR: 8353500: [s390x] Intrinsify Unsafe::setMemory [v2]
In-Reply-To:
References:
Message-ID:

On Wed, 9 Apr 2025 08:57:40 GMT, Amit Kumar wrote:

>> Unsafe::setMemory intrinsic implementation for s390x.
>>
>> Stub Code:
>>
>>
>> StubRoutines::unsafe_setmemory [0x000003ffb04b63c0, 0x000003ffb04b64d0] (272 bytes)
>> --------------------------------------------------------------------------------
>> 0x000003ffb04b63c0: ogrk %r1,%r2,%r3
>> 0x000003ffb04b63c4: nill %r1,7
>> 0x000003ffb04b63c8: je 0x000003ffb04b6410
>> 0x000003ffb04b63cc: nill %r1,3
>> 0x000003ffb04b63d0: je 0x000003ffb04b6460
>> 0x000003ffb04b63d4: nill %r1,1
>> 0x000003ffb04b63d8: jlh 0x000003ffb04b64a0
>> 0x000003ffb04b63dc: risbg %r4,%r4,48,55,8
>> 0x000003ffb04b63e2: risbgz %r1,%r3,32,63,62
>> 0x000003ffb04b63e8: je 0x000003ffb04b6402
>> 0x000003ffb04b63ec: nopr
>> 0x000003ffb04b63ee: nopr
>> 0x000003ffb04b63f0: sth %r4,0(%r2)
>> 0x000003ffb04b63f4: sth %r4,2(%r2)
>> 0x000003ffb04b63f8: agfi %r2,4
>> 0x000003ffb04b63fe: brct %r1,0x000003ffb04b63f0
>> 0x000003ffb04b6402: nilf %r3,2
>> 0x000003ffb04b6408: ber %r14
>> 0x000003ffb04b640a: sth %r4,0(%r2)
>> 0x000003ffb04b640e: br %r14
>> 0x000003ffb04b6410: risbg %r4,%r4,48,55,8
>> 0x000003ffb04b6416: risbg %r4,%r4,32,47,16
>> 0x000003ffb04b641c: risbg %r4,%r4,0,31,32
>> 0x000003ffb04b6422: risbgz %r1,%r3,32,63,60
>> 0x000003ffb04b6428: je 0x000003ffb04b6446
>> 0x000003ffb04b642c: nopr
>> 0x000003ffb04b642e: nopr
>> 0x000003ffb04b6430: stg %r4,0(%r2)
>> 0x000003ffb04b6436: stg %r4,8(%r2)
>> 0x000003ffb04b643c: agfi %r2,16
>> 0x000003ffb04b6442: brct %r1,0x000003ffb04b6430
>> 0x000003ffb04b6446: nilf %r3,8
>> 0x000003ffb04b644c: ber %r14
>> 0x000003ffb04b644e: stg %r4,0(%r2)
>> 0x000003ffb04b6454: br %r14
>> 0x000003ffb04b6456: nopr
>> 0x000003ffb04b6458: nopr
>> 0x000003ffb04b645a: nopr
>> 0x000003ffb04b645c: nopr
>> 0x000003ffb04b645e: nopr
>> 0x000003ffb04b6460: risbg %r4,%r4,48,55,8
>> 0x000003ffb04b6466: risbg %r4,%r4,32,47,16
>> 0x000003ffb04b646c: risbgz %r1,%r3,32,63,61
>> 0x000003ffb04b6472: je 0x000003ffb04b6492
>> 0x000003ffb04b6476: nopr
>> 0x000003ffb04b6478: nopr
>> 0x000003ffb04b647a: nopr
>> 0x000003ffb04b647c: nopr
>> 0x000003ffb04b647e: nopr
>> 0x000003ffb04b6480: st %r4,0(%r2)
>> 0x000003ffb04b6484: st %r4,4(%r2)
>> 0x000003ffb04b6488: agfi %r2,8
>> 0x000003ffb04b648e: brct %r1,0x000003ffb04b6480
>> 0x000003ffb04b6492: nilf %r3,4
>> 0x000003ffb04b6498: ber %r14
>> 0x000003ffb04b649a: st %r4,0(%r2)
>> 0x0000...
>
> Amit Kumar has updated the pull request incrementally with four additional commits since the last revision:
>
> - reviews for Martin
> - Revert "minor improvement"
>
>   This reverts commit a6af6da26d1e0590dc24486131d1bc752e047f98.
> - minor improvement
> - reviews from Lutz and Martin

The results look almost the same:

Benchmark                       (aligned)  (size)  Mode  Cnt   Score   Error  Units
MemorySegmentZeroUnsafe.panama       true       1  avgt   30   2.349 ± 0.012  ns/op
MemorySegmentZeroUnsafe.panama       true       2  avgt   30   2.647 ± 0.004  ns/op
MemorySegmentZeroUnsafe.panama       true       3  avgt   30   2.614 ± 0.005  ns/op
MemorySegmentZeroUnsafe.panama       true       4  avgt   30   2.779 ± 0.003  ns/op
MemorySegmentZeroUnsafe.panama       true       5  avgt   30   2.759 ± 0.016  ns/op
MemorySegmentZeroUnsafe.panama       true       6  avgt   30   2.887 ± 0.003  ns/op
MemorySegmentZeroUnsafe.panama       true       7  avgt   30   2.697 ± 0.004  ns/op
MemorySegmentZeroUnsafe.panama       true       8  avgt   30   2.771 ± 0.034  ns/op
MemorySegmentZeroUnsafe.panama       true      15  avgt   30   3.700 ± 0.006  ns/op
MemorySegmentZeroUnsafe.panama       true      16  avgt   30   3.165 ± 0.042  ns/op
MemorySegmentZeroUnsafe.panama       true      63  avgt   30  17.266 ± 0.830  ns/op
MemorySegmentZeroUnsafe.panama       true      64  avgt   30   4.479 ± 0.019  ns/op
MemorySegmentZeroUnsafe.panama       true     255  avgt   30  54.563 ± 1.222  ns/op
MemorySegmentZeroUnsafe.panama       true     256  avgt   30   9.141 ± 0.069  ns/op
MemorySegmentZeroUnsafe.panama      false       1  avgt   30   2.338 ± 0.013  ns/op
MemorySegmentZeroUnsafe.panama      false       2  avgt   30   2.647 ± 0.004  ns/op
MemorySegmentZeroUnsafe.panama      false       3  avgt   30   2.618 ± 0.009  ns/op
MemorySegmentZeroUnsafe.panama      false       4  avgt   30   2.780 ± 0.003  ns/op
MemorySegmentZeroUnsafe.panama      false       5  avgt   30   2.752 ± 0.003  ns/op
MemorySegmentZeroUnsafe.panama      false       6  avgt   30   2.889 ± 0.006  ns/op
MemorySegmentZeroUnsafe.panama      false       7  avgt   30   2.695 ± 0.002  ns/op
MemorySegmentZeroUnsafe.panama      false       8  avgt   30   2.763 ± 0.009  ns/op
MemorySegmentZeroUnsafe.panama      false      15  avgt   30   3.684 ± 0.013  ns/op
MemorySegmentZeroUnsafe.panama      false      16  avgt   30   3.115 ± 0.005  ns/op
MemorySegmentZeroUnsafe.panama      false      63  avgt   30  16.376 ± 0.018  ns/op
MemorySegmentZeroUnsafe.panama      false      64  avgt   30  15.394 ± 0.080  ns/op
MemorySegmentZeroUnsafe.panama      false     255  avgt   30  55.838 ± 1.325  ns/op
MemorySegmentZeroUnsafe.panama      false     256  avgt   30  52.927 ± 0.874  ns/op
MemorySegmentZeroUnsafe.unsafe       true       1  avgt   30   2.281 ± 0.206  ns/op
MemorySegmentZeroUnsafe.unsafe       true       2  avgt   30   2.076 ± 0.147  ns/op
MemorySegmentZeroUnsafe.unsafe       true       3  avgt   30   2.562 ± 0.004  ns/op
MemorySegmentZeroUnsafe.unsafe       true       4  avgt   30   2.020 ± 0.105  ns/op
MemorySegmentZeroUnsafe.unsafe       true       5  avgt   30   2.938 ± 0.052  ns/op
MemorySegmentZeroUnsafe.unsafe       true       6  avgt   30   2.412 ± 0.007  ns/op
MemorySegmentZeroUnsafe.unsafe       true       7  avgt   30   3.349 ± 0.011  ns/op
MemorySegmentZeroUnsafe.unsafe       true       8  avgt   30   2.304 ± 0.220  ns/op
MemorySegmentZeroUnsafe.unsafe       true      15  avgt   30   5.005 ± 0.005  ns/op
MemorySegmentZeroUnsafe.unsafe       true      16  avgt   30   2.113 ± 0.110  ns/op
MemorySegmentZeroUnsafe.unsafe       true      63  avgt   30  14.160 ± 0.401  ns/op
MemorySegmentZeroUnsafe.unsafe       true      64  avgt   30   3.200 ± 0.170  ns/op
MemorySegmentZeroUnsafe.unsafe       true     255  avgt   30  55.619 ± 0.672  ns/op
MemorySegmentZeroUnsafe.unsafe       true     256  avgt   30   7.613 ± 0.186  ns/op
MemorySegmentZeroUnsafe.unsafe      false       1  avgt   30   2.324 ± 0.224  ns/op
MemorySegmentZeroUnsafe.unsafe      false       2  avgt   30   2.483 ± 0.004  ns/op
MemorySegmentZeroUnsafe.unsafe      false       3  avgt   30   2.565 ± 0.005  ns/op
MemorySegmentZeroUnsafe.unsafe      false       4  avgt   30   2.669 ± 0.011  ns/op
MemorySegmentZeroUnsafe.unsafe      false       5  avgt   30   2.916 ± 0.031  ns/op
MemorySegmentZeroUnsafe.unsafe      false       6  avgt   30   3.042 ± 0.029  ns/op
MemorySegmentZeroUnsafe.unsafe      false       7  avgt   30   3.360 ± 0.037  ns/op
MemorySegmentZeroUnsafe.unsafe      false       8  avgt   30   3.401 ± 0.074  ns/op
MemorySegmentZeroUnsafe.unsafe      false      15  avgt   30   5.012 ± 0.014  ns/op
MemorySegmentZeroUnsafe.unsafe      false      16  avgt   30   4.592 ± 0.156  ns/op
MemorySegmentZeroUnsafe.unsafe      false      63  avgt   30  13.981 ± 0.392  ns/op
MemorySegmentZeroUnsafe.unsafe      false      64  avgt   30  14.876 ± 0.894  ns/op
MemorySegmentZeroUnsafe.unsafe      false     255  avgt   30  55.273 ± 0.546  ns/op
MemorySegmentZeroUnsafe.unsafe      false     256  avgt   30  53.228 ± 1.325  ns/op
Finished running test 'micro:java.lang.foreign.MemorySegmentZeroUnsafe'

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24480#issuecomment-2792255624

From chagedorn at openjdk.org Thu Apr 10 10:41:36 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Thu, 10 Apr 2025 10:41:36 GMT
Subject: RFR: 8353730: TestSubNodeFloatDoubleNegation.java fails with native Float16 support [v2]
In-Reply-To:
References: <1oQpmiZAIEkZeemgdGsFyYISkrexaQHFNz1vapLwWcQ=.a6408873-1ed3-4766-b7ee-f4f67b1880f4@github.com>
Message-ID: <494IyTw2GUCRiOC3cD9sQdAxOuk8sUoe2wfTq5MdG88=.cff426f7-f6af-4de7-84cb-47616142dff8@github.com>

On Thu, 10 Apr 2025 09:29:19 GMT, Manuel Hässig wrote:

>> Due to insufficient testing on machines supporting FP16 arithmetic in their ISA, I missed that these machines generate two `SUB_HF` nodes and, crucially, an additional `SUB_F` node.
We suspect that this comes from some kind of fallback code path ([open issue](https://bugs.openjdk.org/browse/JDK-8353732); also see [discussion in RISC-V PR fixing the same issue](https://github.com/openjdk/jdk/pull/24421#issuecomment-2777755042)).
>>
>> This PR fixes this issue for all architectures that support FP16 instructions (that I know of), by only matching `SUB_HF` nodes when the CPU supports FP16. The tests for ARM are currently commented out, due to the support for Float16 still being a work in progress (see PR #23748).
>>
>> I tested the fix using software emulation of an x86_64 CPU with the `avx512_fp16` feature. I also ran the [sanity checks](https://github.com/mhaessig/jdk/actions/runs/14376762241) (the Alpine Linux build fails at `configure`, which is unrelated to this change) as well as tier1 through tier3 and Oracle internal testing.
>
> Manuel Hässig has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision:
>
> - Merge branch 'master' into jdk-8353730-fp16
> - Fix float16 double negation test

Looks good to me.

test/hotspot/jtreg/compiler/floatingpoint/TestSubNodeFloatDoubleNegation.java line 58:

> 56: @IR(counts = { IRNode.SUB, "2" }, applyIfPlatform = {"x64", "true"}, applyIfCPUFeature = {"avx512_fp16", "false"})
> 57: @IR(counts = { IRNode.SUB_HF, "2" }, applyIfPlatform = {"x64", "true"}, applyIfCPUFeature = {"avx512_fp16", "true"})
> 58: // TODO: uncomment once Float16 support lands in aarch64

Maybe you can also add the actual issue number:

Suggestion:

// TODO: uncomment once Float16 support lands in aarch64 with JDK-8345125

-------------

Marked as reviewed by chagedorn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/24565#pullrequestreview-2756207593
PR Review Comment: https://git.openjdk.org/jdk/pull/24565#discussion_r2037050255

From duke at openjdk.org Thu Apr 10 10:55:51 2025
From: duke at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=)
Date: Thu, 10 Apr 2025 10:55:51 GMT
Subject: RFR: 8353730: TestSubNodeFloatDoubleNegation.java fails with native Float16 support [v3]
In-Reply-To: <1oQpmiZAIEkZeemgdGsFyYISkrexaQHFNz1vapLwWcQ=.a6408873-1ed3-4766-b7ee-f4f67b1880f4@github.com>
References: <1oQpmiZAIEkZeemgdGsFyYISkrexaQHFNz1vapLwWcQ=.a6408873-1ed3-4766-b7ee-f4f67b1880f4@github.com>
Message-ID:

> Due to insufficient testing on machines supporting FP16 arithmetic in their ISA, I missed that these machines generate two `SUB_HF` nodes and, crucially, an additional `SUB_F` node. We suspect that this comes from some kind of fallback code path ([open issue](https://bugs.openjdk.org/browse/JDK-8353732); also see [discussion in RISC-V PR fixing the same issue](https://github.com/openjdk/jdk/pull/24421#issuecomment-2777755042)).
>
> This PR fixes this issue for all architectures that support FP16 instructions (that I know of), by only matching `SUB_HF` nodes when the CPU supports FP16. The tests for ARM are currently commented out, due to the support for Float16 still being a work in progress (see PR #23748).
>
> I tested the fix using software emulation of an x86_64 CPU with the `avx512_fp16` feature. I also ran the [sanity checks](https://github.com/mhaessig/jdk/actions/runs/14376762241) (the Alpine Linux build fails at `configure`, which is unrelated to this change) as well as tier1 through tier3 and Oracle internal testing.
Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision:

  Add issue number to comment

  Co-authored-by: Christian Hagedorn

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/24565/files
  - new: https://git.openjdk.org/jdk/pull/24565/files/5eaedf2b..176cb067

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=24565&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24565&range=01-02

Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
Patch: https://git.openjdk.org/jdk/pull/24565.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/24565/head:pull/24565

PR: https://git.openjdk.org/jdk/pull/24565

From duke at openjdk.org Thu Apr 10 11:19:11 2025
From: duke at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=)
Date: Thu, 10 Apr 2025 11:19:11 GMT
Subject: RFR: 8346552: C2: Add IR tests to check that Parse Predicate cloning in Loop Unswitching works as expected [v2]
In-Reply-To:
References:
Message-ID:

> # Issue Summary
>
> When a loop is unswitched, all parse predicates from the original loop must be cloned to the second loop that is created. Forgetting to clone a parse predicate is a common error during development on loop unswitching code that we could not catch previously. Since we have the IR-framework now, this PR introduces a test to catch this error.
>
> # Changes
>
> The main contribution of this PR is a test to ensure that all predicates have been cloned into an unswitched loop.
This also required some related changes:
> - add `OPAQUE_TEMPLATE_ASSERTION_PREDICATE_NODE` to the IR-framework,
> - add some missing parse predicate nodes to the IR-framework,
> - change the output of the labels of parse predicate nodes in the ideal graph so they can be recognized reliably by the IR-framework (the main problem was that `Loop ` is a prefix of `Loop Limit Check` that is hard to distinguish with spaces instead of underscores),
> - rework the regex for detecting parse predicates in the IR-framework,
> - add a test to ensure parse predicates are cloned into unswitched loops.
>
> # Testing
>
> - [GitHub Actions](https://github.com/mhaessig/jdk/actions/runs/14266369099)
> - tier1 through tier3 plus Oracle internal testing

Manuel Hässig has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision:

 - ir-framework: use new before/after loop opts phases
 - Merge branch 'master' into JDK-8346552-predicate-cloning
 - Add IR test for predicate cloning
 - ir-framework: make the parse predicate node regex more robust
 - ir-framework: add auto vectorization check node
 - ir-framework: add opaque template assertion predicate node

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/24479/files
  - new: https://git.openjdk.org/jdk/pull/24479/files/50511998..fc3d5d11

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=24479&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24479&range=00-01

Stats: 34074 lines in 669 files changed: 16942 ins; 14756 del; 2376 mod
Patch: https://git.openjdk.org/jdk/pull/24479.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/24479/head:pull/24479

PR: https://git.openjdk.org/jdk/pull/24479

From duke at openjdk.org Thu Apr 10 11:19:11 2025
From: duke at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=)
Date: Thu, 10 Apr 2025 11:19:11 GMT
Subject: RFR: 8346552: C2: Add IR tests to check that Parse Predicate cloning in Loop Unswitching works as expected In-Reply-To: References: Message-ID: On Mon, 7 Apr 2025 07:49:08 GMT, Manuel Hässig wrote: > # Issue Summary > > When a loop is unswitched, all parse predicates from the original loop must be cloned to the second loop that is created. Forgetting to clone a parse predicate is a common error during development on loop unswitching code that we could not catch previously. Since we have the IR-framework now, this PR introduces a test to catch this error. > > # Changes > > The main contribution of this PR is a test to ensure that all predicates have been cloned into an unswitched loop. This also required some related changes: > - add `OPAQUE_TEMPLATE_ASSERTION_PREDICATE_NODE` to the IR-framework, > - add some missing parse predicate nodes to the IR-framework, > - change the output of the labels of parse predicate nodes in the ideal graph so they can be recognized reliably by the IR-framework (the main problem was that `Loop ` is a prefix of `Loop Limit Check` that is hard to distinguish with spaces instead of underlines), > - rework the regex for detecting parse predicates in the IR-framework, > - add a test to ensure parse predicates are cloned into unswitched loops. > > # Testing > > - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14266369099) > - tier1 through tier3 plus Oracle internal testing After some discussion, we decided that we should add before and after loop phases to help with the IR phase matching.
Filed [JDK-8353842](https://bugs.openjdk.org/browse/JDK-8353842) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24479#issuecomment-2782972982 From dlunden at openjdk.org Thu Apr 10 11:37:17 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Thu, 10 Apr 2025 11:37:17 GMT Subject: RFR: 8351833: Unexpected increase in live nodes when splitting Phis through MergeMems in PhiNode::Ideal [v2] In-Reply-To: References: Message-ID: > After the changes for [JDK-8333393](https://bugs.openjdk.org/browse/JDK-8333393), we apply a Phi idealization, involving splitting Phis through MergeMems, a lot more frequently. This idealization internally applies further idealizations for new Phi nodes generated during the idealization. In certain cases, these internal idealizations result in a large increase of live nodes within a single iteration of the main IGVN loop in `PhaseIterGVN::optimize`. In particular, when we are close to the `MaxNodeLimit` (80 000 by default), it can happen that we go from below `MaxNodeLimit - NodeLimitFudgeFactor * 2` (= 76 000 by default) to more than 80 000 nodes in a single iteration. In such cases, the node count bailout at the top of the `PhaseIterGVN::optimize` loop does not trigger as expected and we instead crash at an assert in node creation as we surpass `MaxNodeLimit` nodes. > > ### Changeset > > Changes: > - Do not immediately transform new Phi nodes after splitting Phis through MergeMems. The Phi nodes are put on the IGVN worklist and are transformed later on in any case. > - Add an assert in the `PhaseIterGVN::optimize` loop that ensures we never increase the live node count with more than `NodeLimitFudgeFactor * 2` in a single loop iteration. This assert allows us to catch the issue earlier and much more frequently during IGVN. > - Add a new regression test `TestSplitPhiThroughMergeMem.java`.
The new assert above triggers the issue in a large number of existing tests already, but I added this new test as well for good measure. > > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/14124983489) > - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. > - Performance testing > - DaCapo 23, Renaissance, SPECjbb 2005, and SPECjvm 2008 on Windows x64, Linux x64, macOS x64, and macOS aarch64. No statistically significant improvements nor regressions. > - Compilation time benchmarking for DaCapo 23. No statistically significant improvements nor regressions. > > ### Additional issue investigation > > For the particular failure reported as part of this issue, the additional Phi idealizations after [JDK-8333393](https://bugs.openjdk.org/browse/JDK-8333393) cause a dramatic local increase in the number of nodes during IGVN compared to before. Therefore, it is justified to further investigate if this increase in live nodes is, in general, an issue in itself. In the below, I consider and refer ... 
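The node-budget arithmetic in the description above can be sketched in a few lines of Java. This is a simplified model, not HotSpot code: the class and method names are hypothetical, the fudge factor of 2 000 is only inferred from the quoted defaults (80 000 - 2 * 2 000 = 76 000), and the exact form of the bailout comparison is an assumption.

```java
// Simplified model of the node budget described above; names are hypothetical.
final class NodeBudgetSketch {
    // Default quoted in the PR description.
    static final int MAX_NODE_LIMIT = 80_000;
    // Inferred from "MaxNodeLimit - NodeLimitFudgeFactor * 2 (= 76 000 by default)".
    static final int NODE_LIMIT_FUDGE_FACTOR = 2_000;
    static final int MAX_INCREASE_PER_ITERATION = NODE_LIMIT_FUDGE_FACTOR * 2;

    // Assumed shape of the bailout check at the top of the loop:
    // bail out when fewer than 2 * fudge factor nodes remain.
    static boolean shouldBailOut(int liveNodes) {
        return liveNodes + MAX_INCREASE_PER_ITERATION > MAX_NODE_LIMIT;
    }

    // The invariant that the new assert is described as enforcing.
    static boolean increaseWithinBudget(int liveBefore, int liveAfter) {
        return liveAfter - liveBefore <= MAX_INCREASE_PER_ITERATION;
    }

    public static void main(String[] args) {
        int before = 75_900; // just below the bailout threshold
        int after = 80_100;  // one iteration later, past the hard limit
        System.out.println("bailout at top of loop: " + shouldBailOut(before));
        System.out.println("increase within budget: " + increaseWithinBudget(before, after));
        System.out.println("hard limit exceeded:    " + (after > MAX_NODE_LIMIT));
    }
}
```

Under this model, if the bailout check does not fire and the per-iteration increase stays within `2 * NodeLimitFudgeFactor`, the live node count cannot exceed `MaxNodeLimit` during that iteration, which is why an iteration that adds more nodes than that slips past the bailout.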
Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: Clean up code related to eager transformation and rename itergvn to igvn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24325/files - new: https://git.openjdk.org/jdk/pull/24325/files/eabf1c0c..76240722 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24325&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24325&range=00-01 Stats: 11 lines in 3 files changed: 0 ins; 9 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24325.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24325/head:pull/24325 PR: https://git.openjdk.org/jdk/pull/24325 From dlunden at openjdk.org Thu Apr 10 11:37:18 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Thu, 10 Apr 2025 11:37:18 GMT Subject: RFR: 8351833: Unexpected increase in live nodes when splitting Phis through MergeMems in PhiNode::Ideal [v2] In-Reply-To: <33eWiJcoMJMNOS5Cn6QPgg1gUzUF4UqA_q7QPfOwSt4=.4d7cb54b-752d-4a41-baa2-527850b9c18b@github.com> References: <33eWiJcoMJMNOS5Cn6QPgg1gUzUF4UqA_q7QPfOwSt4=.4d7cb54b-752d-4a41-baa2-527850b9c18b@github.com> Message-ID: On Mon, 7 Apr 2025 12:38:59 GMT, Christian Hagedorn wrote: >> src/hotspot/share/opto/cfgnode.cpp line 2580: >> >>> 2578: // IGVN iteration. We have put the Phi nodes on the IGVN worklist, so >>> 2579: // they are transformed later on in any case. >>> 2580: hook->destruct(igvn); >> >> Do you need this `hook` node to preserve `new_base` now that you are not calling `PhaseGVN::transform` anymore? > > We then probably also do not need the code directly above anymore added by [JDK-8275326](https://bugs.openjdk.org/browse/JDK-8275326) which was only required due to this eager phi transformation. Could you find out why this eager transformation was added in the first place? Thanks for the comments!
I have now cleaned up according to your suggestions and checked that it still works (tests and code inspection). I also investigated why the eager transformation was added in the first place, but was unable to find much information. It is from initial load, and looking back even further in the pre-initial-load history, I see that the code was introduced back in 2000 already. The commit message is not really helpful, unfortunately. My best guess is that eagerly transforming the Phis may perhaps result in a preferable ordering of idealizations that introduces fewer (temporary) nodes. The extrema for "target" and "target-old" give an indication of this (see the plots in the PR description), but in practice it has no significant observable effect. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24325#discussion_r2037141048 From dlunden at openjdk.org Thu Apr 10 11:43:40 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Thu, 10 Apr 2025 11:43:40 GMT Subject: RFR: 8351833: Unexpected increase in live nodes when splitting Phis through MergeMems in PhiNode::Ideal [v2] In-Reply-To: References: Message-ID: On Mon, 7 Apr 2025 12:36:57 GMT, Christian Hagedorn wrote: >> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: >> >> Clean up code related to eager transformation and rename itergvn to igvn > > test/hotspot/jtreg/TEST.groups line 187: > >> 185: compiler/interpreter/ \ >> 186: compiler/jvmci/ \ >> 187: compiler/itergvn/ \ > > I suggest to use the short name `igvn` > > On a separate note, I think we should go through all our test folders in `jtreg/compiler` and check if we should add more folders to tier1. For example, `splitif` or `predicates` tests are currently not executed in tier1 but they probably should. Thanks, fixed! Do you want to create an RFE for adding the additional folders to tier1 or should I create one?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24325#discussion_r2037152511 From dlunden at openjdk.org Thu Apr 10 11:43:39 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Thu, 10 Apr 2025 11:43:39 GMT Subject: RFR: 8351833: Unexpected increase in live nodes when splitting Phis through MergeMems in PhiNode::Ideal [v2] In-Reply-To: References: Message-ID: On Mon, 7 Apr 2025 11:44:49 GMT, Roberto Castañeda Lozano wrote: >> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: >> >> Clean up code related to eager transformation and rename itergvn to igvn > > src/hotspot/share/opto/phaseX.cpp line 1058: > >> 1056: // Ensure we did not increase the live node count with more than >> 1057: // max_live_nodes_increase_per_iteration during the call to transform_old >> 1058: DEBUG_ONLY(int increase = live_nodes_after - live_nodes_before;) > > For consistency with the surrounding code, maybe you could define these as `NOT_PRODUCT`, and possibly group them under `#ifndef PRODUCT` blocks? As we discussed offline, there is a subtle difference between `NOT_PRODUCT` and `DEBUG_ONLY`. `NOT_PRODUCT` also runs in non-product "optimized" builds where asserts are disabled, and I really only want this code to run when asserts are enabled. Therefore, I believe `DEBUG_ONLY` (or alternatively `#ifdef ASSERT`) is most suitable here. > test/hotspot/jtreg/compiler/itergvn/TestSplitPhiThroughMergeMem.java line 67: > >> 65: new String("abcdef" + param2); >> 66: new String("ghijklmn" + param1); >> 67: new String("ghijklmn" + param1); > > This test illustrates an interesting behavior: C2 generates around 12 Kb of code for this rather infrequent code path (and the frequency can be further reduced without affecting C2's outcome). This seems to contradict C2's general philosophy of focusing the compilation effort (and code cache usage) on hot code.
It would be interesting to investigate whether there is an opportunity to make some heuristic more execution-frequency aware here. Yes, for sure interesting. Let us create a separate RFE to investigate. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24325#discussion_r2037147517 PR Review Comment: https://git.openjdk.org/jdk/pull/24325#discussion_r2037150198 From roland at openjdk.org Thu Apr 10 11:54:20 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 10 Apr 2025 11:54:20 GMT Subject: RFR: 8327963: [Umbrella] Incorrect result of C2 compiled code since JDK-8237581 Message-ID: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> An `Initialize` node for an `Allocate` node is created with a memory `Proj` of adr type raw memory. In order for stores to be captured, the memory state out of the allocation is a `MergeMem` with slices for the various object fields/array element set to the raw memory `Proj` of the `Initialize` node. If `Phi`s need to be created during later transformations from this memory state, the `Phi` for a particular slice gets its adr type from the type of the `Proj` which is raw memory. If during macro expansion, the `Allocate` is found to have no use and so can be removed, the `Proj` out of the `Initialize` is replaced by the memory state on input to the `Allocate`. A `Phi` for some slice for a field of an object will end up with the raw memory state on input to the `Allocate` node. As a result, memory state at the `Phi` is incorrect and incorrect execution can happen. The fix I propose is, rather than have a single `Proj` for the memory state out of the `Initialize` with adr type raw memory, to use one `Proj` per slice added to the memory state after the `Initialize`. Each of the `Proj` should return the right adr type for its slice. For that I propose having a new type of `Proj`: `NarrowMemProj` that captures the right adr type.
Logic for the construction of the `Allocate`/`Initialize` subgraph is tweaked so that the right adr type, captured in its own `NarrowMemProj`, is added to the memory subgraph. Code that removes an allocation or moves it also has to be changed so it correctly takes the multiple memory projections out of the `Initialize` node into account. One tricky issue is that when EA splits types for a scalar replaceable `Allocate` node: 1- the adr type captured in the `NarrowMemProj` becomes out of sync with the type of the slices for the allocation 2- before EA, the memory state for one particular field out of the `Initialize` node can be used for a `Store` to the just allocated object or some other. So we can have a chain of `Store`s, some to the newly allocated object, some to some other objects, all of them using the state of `NarrowMemProj` out of the `Initialize`. After split unique types, the `NarrowMemProj` is for the slice of a particular allocation. So `Store`s to some other objects shouldn't use that memory state but the memory state before the `Allocate`. For that, I added logic to update the adr type of `NarrowMemProj` during split unique types and update the memory input of `Store`s that don't depend on the memory state out of the allocation site. I also wrote a verification pass to check that, in the memory graph, nodes on a particular slice have the right adr type. That verification pass is not included here because it uncovered issues that are unrelated to this particular issue so I intend to propose it (with fixes for those other issues) separately. I reused the test cases that Emanuel included in https://github.com/openjdk/jdk/pull/18265 and went over all issues linked to this one, tried the test cases and added the ones that I could reproduce.
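The failure mode and the proposed fix described above can be illustrated with a toy model. The sketch below is not C2 code: memory states are plain strings, a projection is reduced to the address type it reports, and the class name, method name, and slice names (`raw`, `A.f`) are all hypothetical. It only shows why a projection that reports raw memory picks up the wrong state once the allocation is removed, while a per-slice projection picks up the state of its own slice.

```java
import java.util.Map;

// Toy model of memory slices; every name here is hypothetical.
final class SliceProjSketch {
    // When the allocation is removed, a projection is replaced by the state
    // that the allocation's input memory holds for the projection's address type.
    static String stateAfterRemoval(String projAdrType, Map<String, String> inputMemory) {
        return inputMemory.get(projAdrType);
    }

    public static void main(String[] args) {
        // Memory state on input to the allocation, keyed by slice.
        Map<String, String> inputMemory = Map.of(
                "raw", "raw state before Allocate",
                "A.f", "A.f state before Allocate");

        // Single projection of raw address type used for the A.f slice:
        // the A.f slice ends up seeing the raw state.
        System.out.println("single Proj:   " + stateAfterRemoval("raw", inputMemory));

        // One projection per slice: the A.f slice sees the A.f state.
        System.out.println("per-slice Proj: " + stateAfterRemoval("A.f", inputMemory));
    }
}
```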
------------- Commit messages: - whitespace - fix & extra tests - more tests, WIP - initializing store capturing: test stub - test4 - 8327012 Changes: https://git.openjdk.org/jdk/pull/24570/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24570&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8327963 Stats: 574 lines in 11 files changed: 539 ins; 15 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/24570.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24570/head:pull/24570 PR: https://git.openjdk.org/jdk/pull/24570 From dlunden at openjdk.org Thu Apr 10 11:59:26 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Thu, 10 Apr 2025 11:59:26 GMT Subject: RFR: 8351833: Unexpected increase in live nodes when splitting Phis through MergeMems in PhiNode::Ideal In-Reply-To: References: Message-ID: On Mon, 7 Apr 2025 12:03:29 GMT, Roberto Castañeda Lozano wrote: > it would be good, for completeness, to study how many (if any) of those bailouts in Renaissance, SPECjvm, and SPECjbb are due to excessive IGVN node counts, at least for the "baseline" and "target" configurations. I investigated and it turns out there is one IGVN node count bailout in "target" and zero in "baseline". The bailed out compilation maxes out at 81000 nodes (just above the bailout limit) and looks related to the original failure that resulted in this issue: - Original compilation failure: `com.sun.org.apache.xerces.internal.impl.dtd.XMLDTDValidator::reset` - Compilation bailout (in SPECjvm 2008): `com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl::reset` Given that it is such a rare occurrence and not significant in practice, I suggest we investigate if we can avoid the bailout in a separate RFE. @robcasloz and I had a discussion offline, and it looks like the general problem is that we sometimes have far too many alias categories in the memory graph.
We should perhaps limit the number of alias categories in some way, or simply retry compilation with aliasing disabled if things get out of hand. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24325#issuecomment-2792498724 From dlunden at openjdk.org Thu Apr 10 12:10:01 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Thu, 10 Apr 2025 12:10:01 GMT Subject: RFR: 8351833: Unexpected increase in live nodes when splitting Phis through MergeMems in PhiNode::Ideal [v3] In-Reply-To: References: Message-ID: <8cQweFEP9gLJuwgDDwUsAqS2GYg0bGnuCL39kQETrPQ=.7c315669-835d-4aa7-b946-5c417230814d@github.com> > After the changes for [JDK-8333393](https://bugs.openjdk.org/browse/JDK-8333393), we apply a Phi idealization, involving splitting Phis through MergeMems, a lot more frequently. This idealization internally applies further idealizations for new Phi nodes generated during the idealization. In certain cases, these internal idealizations result in a large increase of live nodes within a single iteration of the main IGVN loop in `PhaseIterGVN::optimize`. In particular, when we are close to the `MaxNodeLimit` (80 000 by default), it can happen that we go from below `MaxNodeLimit - NodeLimitFudgeFactor * 2` (= 76 000 by default) to more than 80 000 nodes in a single iteration. In such cases, the node count bailout at the top of the `PhaseIterGVN::optimize` loop does not trigger as expected and we instead crash at an assert in node creation as we surpass `MaxNodeLimit` nodes. > > ### Changeset > > Changes: > - Do not immediately transform new Phi nodes after splitting Phis through MergeMems. The Phi nodes are put on the IGVN worklist and are transformed later on in any case. > - Add an assert in the `PhaseIterGVN::optimize` loop that ensures we never increase the live node count with more than `NodeLimitFudgeFactor * 2` in a single loop iteration. This assert allows us to catch the issue earlier and much more frequently during IGVN.
> - Add a new regression test `TestSplitPhiThroughMergeMem.java`. The new assert above triggers the issue in a large number of existing tests already, but I added this new test as well for good measure. > > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/14124983489) > - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. > - Performance testing > - DaCapo 23, Renaissance, SPECjbb 2005, and SPECjvm 2008 on Windows x64, Linux x64, macOS x64, and macOS aarch64. No statistically significant improvements nor regressions. > - Compilation time benchmarking for DaCapo 23. No statistically significant improvements nor regressions. > > ### Additional issue investigation > > For the particular failure reported as part of this issue, the additional Phi idealizations after [JDK-8333393](https://bugs.openjdk.org/browse/JDK-8333393) cause a dramatic local increase in the number of nodes during IGVN compared to before. Therefore, it is justified to further investigate if this increase in live nodes is, in general, an issue in itself. In the below, I consider and refer ... 
Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: Apply suggestions from code review Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24325/files - new: https://git.openjdk.org/jdk/pull/24325/files/76240722..6d590d14 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24325&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24325&range=01-02 Stats: 5 lines in 2 files changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/24325.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24325/head:pull/24325 PR: https://git.openjdk.org/jdk/pull/24325 From chagedorn at openjdk.org Thu Apr 10 12:10:01 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 10 Apr 2025 12:10:01 GMT Subject: RFR: 8351833: Unexpected increase in live nodes when splitting Phis through MergeMems in PhiNode::Ideal [v2] In-Reply-To: References: Message-ID: <0ZCeg1eGFhaFDQ9J50vmYKChV-Hd4ddAu3YCkXpHW70=.a1a40f38-b8c7-4c95-acaa-6e2d2a12730a@github.com> On Thu, 10 Apr 2025 11:37:17 GMT, Daniel Lundén wrote: >> After the changes for [JDK-8333393](https://bugs.openjdk.org/browse/JDK-8333393), we apply a Phi idealization, involving splitting Phis through MergeMems, a lot more frequently. This idealization internally applies further idealizations for new Phi nodes generated during the idealization. In certain cases, these internal idealizations result in a large increase of live nodes within a single iteration of the main IGVN loop in `PhaseIterGVN::optimize`. In particular, when we are close to the `MaxNodeLimit` (80 000 by default), it can happen that we go from below `MaxNodeLimit - NodeLimitFudgeFactor * 2` (= 76 000 by default) to more than 80 000 nodes in a single iteration.
In such cases, the node count bailout at the top of the `PhaseIterGVN::optimize` loop does not trigger as expected and we instead crash at an assert in node creation as we surpass `MaxNodeLimit` nodes. >> >> ### Changeset >> >> Changes: >> - Do not immediately transform new Phi nodes after splitting Phis through MergeMems. The Phi nodes are put on the IGVN worklist and are transformed later on in any case. >> - Add an assert in the `PhaseIterGVN::optimize` loop that ensures we never increase the live node count with more than `NodeLimitFudgeFactor * 2` in a single loop iteration. This assert allows us to catch the issue earlier and much more frequently during IGVN. >> - Add a new regression test `TestSplitPhiThroughMergeMem.java`. The new assert above triggers the issue in a large number of existing tests already, but I added this new test as well for good measure. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/14124983489) >> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. >> - Performance testing >> - DaCapo 23, Renaissance, SPECjbb 2005, and SPECjvm 2008 on Windows x64, Linux x64, macOS x64, and macOS aarch64. No statistically significant improvements nor regressions. >> - Compilation time benchmarking for DaCapo 23. No statistically significant improvements nor regressions. >> >> ### Additional issue investigation >> >> For the particular failure reported as part of this issue, the additional Phi idealizations after [JDK-8333393](https://bugs.openjdk.org/browse/JDK-8333393) cause a dramatic local increase in the number of nodes during IGVN compared to before. Therefore, it is justified to further investigate if this increase in live nodes is, in general, an issue in its... 
> > Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: > > Clean up code related to eager transformation and rename itergvn to igvn Two more comments, otherwise, it looks good to me! src/hotspot/share/opto/phaseX.cpp line 1037: > 1035: // Pull from worklist and transform the node. If the node has changed, > 1036: // update edge info and put uses on worklist. > 1037: while(_worklist.size()) { I guess we could fix that here as well to be explicit: Suggestion: while (_worklist.size() > 0) { test/hotspot/jtreg/compiler/igvn/TestSplitPhiThroughMergeMem.java line 42: > 40: */ > 41: > 42: package compiler.itergvn; You should also update the package name accordingly Suggestion: * @run main/othervm -Xbatch * -XX:CompileCommand=CompileOnly,compiler.igvn.TestSplitPhiThroughMergeMem::test * compiler.igvn.TestSplitPhiThroughMergeMem * @run main compiler.igvn.TestSplitPhiThroughMergeMem */ package compiler.igvn; ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24325#pullrequestreview-2756410922 PR Review Comment: https://git.openjdk.org/jdk/pull/24325#discussion_r2037177508 PR Review Comment: https://git.openjdk.org/jdk/pull/24325#discussion_r2037182994 From chagedorn at openjdk.org Thu Apr 10 12:10:01 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 10 Apr 2025 12:10:01 GMT Subject: RFR: 8351833: Unexpected increase in live nodes when splitting Phis through MergeMems in PhiNode::Ideal [v3] In-Reply-To: References: <33eWiJcoMJMNOS5Cn6QPgg1gUzUF4UqA_q7QPfOwSt4=.4d7cb54b-752d-4a41-baa2-527850b9c18b@github.com> Message-ID: On Thu, 10 Apr 2025 11:33:34 GMT, Daniel Lundén wrote: >> We then probably also do not need the code directly above anymore added by [JDK-8275326](https://bugs.openjdk.org/browse/JDK-8275326) which was only required due to this eager phi transformation. Could you find out why this eager transformation was added in the first place?
> > Thanks for the comments! I have now cleaned up according to your suggestions and checked that it still works (tests and code inspection). I also investigated why the eager transformation was added in the first place, but was unable to find much information. It is from initial load, and looking back even further in the pre-initial-load history, I see that the code was introduced back in 2000 already. The commit message is not really helpful, unfortunately. My best guess is that eagerly transforming the Phis may perhaps result in a preferable ordering of idealizations that introduces fewer (temporary) nodes. The extrema for "target" and "target-old" give an indication of this (see the plots in the PR description), but in practice it has no significant observable effect. Thanks for digging deeper! I almost expected that the history is probably lost about this decision. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24325#discussion_r2037187066 From dlunden at openjdk.org Thu Apr 10 12:10:01 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Thu, 10 Apr 2025 12:10:01 GMT Subject: RFR: 8351833: Unexpected increase in live nodes when splitting Phis through MergeMems in PhiNode::Ideal [v2] In-Reply-To: <0ZCeg1eGFhaFDQ9J50vmYKChV-Hd4ddAu3YCkXpHW70=.a1a40f38-b8c7-4c95-acaa-6e2d2a12730a@github.com> References: <0ZCeg1eGFhaFDQ9J50vmYKChV-Hd4ddAu3YCkXpHW70=.a1a40f38-b8c7-4c95-acaa-6e2d2a12730a@github.com> Message-ID: <3hCLpwSfnd5xIHrNq49a6Qi0DVnUJbsPiyj0y8JiXKE=.67e241e0-88e1-412b-b218-39d95678fc8b@github.com> On Thu, 10 Apr 2025 11:56:53 GMT, Christian Hagedorn wrote: >> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: >> >> Clean up code related to eager transformation and rename itergvn to igvn > > src/hotspot/share/opto/phaseX.cpp line 1037: > >> 1035: // Pull from worklist and transform the node.
If the node has changed, >> 1036: // update edge info and put uses on worklist. >> 1037: while(_worklist.size()) { > > I guess we could fix that here as well to be explicit: > Suggestion: > > while (_worklist.size() > 0) { Sure, I committed the suggestion! > test/hotspot/jtreg/compiler/igvn/TestSplitPhiThroughMergeMem.java line 42: > >> 40: */ >> 41: >> 42: package compiler.itergvn; > > You should also update the package name accordingly > > Suggestion: > > * @run main/othervm -Xbatch > * -XX:CompileCommand=CompileOnly,compiler.igvn.TestSplitPhiThroughMergeMem::test > * compiler.igvn.TestSplitPhiThroughMergeMem > * @run main compiler.igvn.TestSplitPhiThroughMergeMem > */ > > package compiler.igvn; Oops, thanks Christian... committed now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24325#discussion_r2037190042 PR Review Comment: https://git.openjdk.org/jdk/pull/24325#discussion_r2037190758 From chagedorn at openjdk.org Thu Apr 10 12:26:50 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 10 Apr 2025 12:26:50 GMT Subject: RFR: 8346552: C2: Add IR tests to check that Parse Predicate cloning in Loop Unswitching works as expected [v2] In-Reply-To: References: Message-ID: On Thu, 10 Apr 2025 11:19:11 GMT, Manuel Hässig wrote: >> # Issue Summary >> >> When a loop is unswitched, all parse predicates from the original loop must be cloned to the second loop that is created. Forgetting to clone a parse predicate is a common error during development on loop unswitching code that we could not catch previously. Since we have the IR-framework now, this PR introduces a test to catch this error. >> >> # Changes >> >> The main contribution of this PR is a test to ensure that all predicates have been cloned into an unswitched loop.
This also required some related changes: >> - add `OPAQUE_TEMPLATE_ASSERTION_PREDICATE_NODE` to the IR-framework, >> - add some missing parse predicate nodes to the IR-framework, >> - change the output of the labels of parse predicate nodes in the ideal graph so they can be recognized reliably by the IR-framework (the main problem was that `Loop ` is a prefix of `Loop Limit Check` that is hard to distinguish with spaces instead of underlines), >> - rework the regex for detecting parse predicates in the IR-framework, >> - add a test to ensure parse predicates are cloned into unswitched loops. >> >> # Testing >> >> - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14266369099) >> - tier1 through tier3 plus Oracle internal testing > > Manuel Hässig has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - ir-framework: use new before/after loop opts phases > - Merge branch 'master' into JDK-8346552-predicate-cloning > - Add IR test for predicate cloning > - ir-framework: make the parse predicate node regex more robust > - ir-framework: add auto vectorization check node > - ir-framework: add opaque template assertion predicate node Thanks for working on that! A few suggestions. test/hotspot/jtreg/compiler/loopopts/TestUnswitchPredicateCloning.java line 43: > 41: > 42: public static void main(String[] strArr) { > 43: TestFramework.runWithFlags("-Xcomp"); Do you really need `-Xcomp` for all compilations? If you only need to have it for the test method, you can add `@Warmup(0)`. In case you would have multiple test methods, you can also use `TestFramework.setDefaultWarmup()`.
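As background for the transformation this test targets, the sketch below shows loop unswitching written out by hand at the Java source level: a loop-invariant condition is hoisted out of the loop and the loop is duplicated, once for each outcome. It is only an illustration with hypothetical names; C2 performs the transformation on its own IR, and nothing here claims that C2 would unswitch this particular loop or shows where the parse predicates are placed.

```java
// Source-level illustration of loop unswitching; names are hypothetical.
final class LoopUnswitchingSketch {
    // Original shape: the loop-invariant flag is tested on every iteration.
    static int original(int[] data, boolean flag) {
        int sum = 0;
        for (int i = 0; i < data.length; i++) {
            if (flag) {
                sum += data[i];
            } else {
                sum -= data[i];
            }
        }
        return sum;
    }

    // Unswitched shape: the invariant test is hoisted and the loop is
    // duplicated into a true-path loop and a false-path loop.
    static int unswitched(int[] data, boolean flag) {
        int sum = 0;
        if (flag) {
            for (int i = 0; i < data.length; i++) {
                sum += data[i];
            }
        } else {
            for (int i = 0; i < data.length; i++) {
                sum -= data[i];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4};
        System.out.println(original(data, true) == unswitched(data, true));
        System.out.println(original(data, false) == unswitched(data, false));
    }
}
```

Because the result is two loops where there was one, anything guarding the original loop has to exist for both copies, which is the property the PR's IR test checks for parse predicates.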
test/hotspot/jtreg/compiler/loopopts/TestUnswitchPredicateCloning.java line 64: > 62: @Test > 63: // Check that loop unswitching the number of parse predicates inside the unswitched > 64: // loop have doubled. Suggestion: // Check that Loop Unswitching doubled the number of Parse Predicates: We have them at the true- and false-path-loop. Note that the Loop Limit Check Parse Predicate is not cloned when we already have a counted loop. While writing this suggestion, I think it would also be good to have the following tests: - A test where we unswitch a `LoopNode` (i.e. non-counted) to check that the Loop Limit Check Parse Predicate is also cloned to both unswitched loop versions. - Since you already added some verification for Assertion Predicates as well, I suggest to go a step further. We could add a test where we first apply Loop Predication and then Loop Unswitching to check if the number of Template Assertion Predicates also doubled. Here we need to be careful that we don't miscount. We only mark the old Template Assertion Predicates useless in Loop Unswitching and then clean them up in IGVN. So, we would need to check before loop opts phase `n`: `x` Template Assertion Predicates and then before loop opts phase `n + 1`: `2*x` Template Assertion Predicates. You could then update the issue/PR title to: C2: Add IR tests to check that predicate cloning in Loop Unswitching works as expected. ------------- Changes requested by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24479#pullrequestreview-2756445141 PR Review Comment: https://git.openjdk.org/jdk/pull/24479#discussion_r2037222466 PR Review Comment: https://git.openjdk.org/jdk/pull/24479#discussion_r2037202818 From chagedorn at openjdk.org Thu Apr 10 12:31:36 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 10 Apr 2025 12:31:36 GMT Subject: RFR: 8351833: Unexpected increase in live nodes when splitting Phis through MergeMems in PhiNode::Ideal [v3] In-Reply-To: <8cQweFEP9gLJuwgDDwUsAqS2GYg0bGnuCL39kQETrPQ=.7c315669-835d-4aa7-b946-5c417230814d@github.com> References: <8cQweFEP9gLJuwgDDwUsAqS2GYg0bGnuCL39kQETrPQ=.7c315669-835d-4aa7-b946-5c417230814d@github.com> Message-ID: <9Bw-QfApEOEKaZrOt0jNwuq462SzmyICdl7MDTPQZ7o=.a608376f-09ff-4909-acb3-4486a1ffd8a9@github.com> On Thu, 10 Apr 2025 12:10:01 GMT, Daniel Lundén wrote: >> After the changes for [JDK-8333393](https://bugs.openjdk.org/browse/JDK-8333393), we apply a Phi idealization, involving splitting Phis through MergeMems, a lot more frequently. This idealization internally applies further idealizations for new Phi nodes generated during the idealization. In certain cases, these internal idealizations result in a large increase of live nodes within a single iteration of the main IGVN loop in `PhaseIterGVN::optimize`. In particular, when we are close to the `MaxNodeLimit` (80 000 by default), it can happen that we go from below `MaxNodeLimit - NodeLimitFudgeFactor * 2` (= 76 000 by default) to more than 80 000 nodes in a single iteration. In such cases, the node count bailout at the top of the `PhaseIterGVN::optimize` loop does not trigger as expected and we instead crash at an assert in node creation as we surpass `MaxNodeLimit` nodes. >> >> ### Changeset >> >> Changes: >> - Do not immediately transform new Phi nodes after splitting Phis through MergeMems. The Phi nodes are put on the IGVN worklist and are transformed later on in any case.
>> - Add an assert in the `PhaseIterGVN::optimize` loop that ensures we never increase the live node count by more than `NodeLimitFudgeFactor * 2` in a single loop iteration. This assert allows us to catch the issue earlier and much more frequently during IGVN. >> - Add a new regression test `TestSplitPhiThroughMergeMem.java`. The new assert above triggers the issue in a large number of existing tests already, but I added this new test as well for good measure. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/14124983489) >> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. >> - Performance testing >> - DaCapo 23, Renaissance, SPECjbb 2005, and SPECjvm 2008 on Windows x64, Linux x64, macOS x64, and macOS aarch64. No statistically significant improvements or regressions. >> - Compilation time benchmarking for DaCapo 23. No statistically significant improvements or regressions. >> >> ### Additional issue investigation >> >> For the particular failure reported as part of this issue, the additional Phi idealizations after [JDK-8333393](https://bugs.openjdk.org/browse/JDK-8333393) cause a dramatic local increase in the number of nodes during IGVN compared to before. Therefore, it is justified to further investigate if this increase in live nodes is, in general, an issue in its... > > Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: > > Apply suggestions from code review > > Co-authored-by: Christian Hagedorn That looks good to me, thanks for the updates! ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24325#pullrequestreview-2756491428 From chagedorn at openjdk.org Thu Apr 10 12:31:37 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 10 Apr 2025 12:31:37 GMT Subject: RFR: 8351833: Unexpected increase in live nodes when splitting Phis through MergeMems in PhiNode::Ideal [v3] In-Reply-To: References: Message-ID: On Thu, 10 Apr 2025 11:41:11 GMT, Daniel Lundén wrote: >> test/hotspot/jtreg/TEST.groups line 187: >> >>> 185: compiler/interpreter/ \ >>> 186: compiler/jvmci/ \ >>> 187: compiler/itergvn/ \ >> >> I suggest using the short name `igvn` >> >> On a separate note, I think we should go through all our test folders in `jtreg/compiler` and check if we should add more folders to tier1. For example, `splitif` or `predicates` tests are currently not executed in tier1 but they probably should be. > > Thanks, fixed! Do you want to create an RFE for adding the additional folders to tier1 or should I create one? I can create one. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24325#discussion_r2037228715 From chagedorn at openjdk.org Thu Apr 10 12:31:37 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 10 Apr 2025 12:31:37 GMT Subject: RFR: 8351833: Unexpected increase in live nodes when splitting Phis through MergeMems in PhiNode::Ideal [v3] In-Reply-To: References: Message-ID: On Thu, 10 Apr 2025 12:27:20 GMT, Christian Hagedorn wrote: >> Thanks, fixed! Do you want to create an RFE for adding the additional folders to tier1 or should I create one? > > I can create one. 
Here it is: [JDK-8354284](https://bugs.openjdk.org/browse/JDK-8354284) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24325#discussion_r2037231890 From qamai at openjdk.org Thu Apr 10 12:32:35 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 10 Apr 2025 12:32:35 GMT Subject: RFR: 8327963: [Umbrella] Incorrect result of C2 compiled code since JDK-8237581 In-Reply-To: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> Message-ID: On Thu, 10 Apr 2025 11:39:36 GMT, Roland Westrelin wrote: > An `Initialize` node for an `Allocate` node is created with a memory > `Proj` of adr type raw memory. In order for stores to be captured, the > memory state out of the allocation is a `MergeMem` with slices for the > various object fields/array elements set to the raw memory `Proj` of > the `Initialize` node. If `Phi`s need to be created during later > transformations from this memory state, the `Phi` for a particular > slice gets its adr type from the type of the `Proj` which is raw > memory. If during macro expansion, the `Allocate` is found to have no > use and so can be removed, the `Proj` out of the `Initialize` is > replaced by the memory state on input to the `Allocate`. A `Phi` for > some slice for a field of an object will end up with the raw memory > state on input to the `Allocate` node. As a result, memory state at > the `Phi` is incorrect and incorrect execution can happen. > > The fix I propose is, rather than have a single `Proj` for the memory > state out of the `Initialize` with adr type raw memory, to use one > `Proj` per slice added to the memory state after the `Initialize`. Each > of the `Proj` should return the right adr type for its slice. For that > I propose having a new type of `Proj`: `NarrowMemProj` that captures > the right adr type. 
> > Logic for the construction of the `Allocate`/`Initialize` subgraph is > tweaked so the right adr type captured in its own `NarrowMemProj` is > added to the memory subgraph. Code that removes an allocation or moves > it also has to be changed so it correctly takes the multiple memory > projections out of the `Initialize` node into account. > > One tricky issue is that when EA splits types for a scalar replaceable > `Allocate` node: > > 1- the adr type captured in the `NarrowMemProj` becomes out of sync > with the type of the slices for the allocation > > 2- before EA, the memory state for one particular field out of the > `Initialize` node can be used for a `Store` to the just allocated > object or some other. So we can have a chain of `Store`s, some to > the newly allocated object, some to some other objects, all of them > using the state of `NarrowMemProj` out of the `Initialize`. After > split unique types, the `NarrowMemProj` is for the slice of a > particular allocation. So `Store`s to some other objects shouldn't > use that memory state but the memory state before the `Allocate`. > > For that, I added logic to update the adr type of `NarrowMemProj` > during split unique types and update the memory input of `Store`s that > don't depend on the memory state ... It would be great if we had union memory slices for this. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24570#issuecomment-2792582150 From thartmann at openjdk.org Thu Apr 10 13:08:41 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 10 Apr 2025 13:08:41 GMT Subject: RFR: 8353730: TestSubNodeFloatDoubleNegation.java fails with native Float16 support [v3] In-Reply-To: References: <1oQpmiZAIEkZeemgdGsFyYISkrexaQHFNz1vapLwWcQ=.a6408873-1ed3-4766-b7ee-f4f67b1880f4@github.com> Message-ID: <230msiVVwwhkOo3-9OuGZfsBCzrUg2004bgyBm40ocY=.31b08292-4bee-4443-9e6f-e2ba967954c5@github.com> On Thu, 10 Apr 2025 10:55:51 GMT, Manuel Hässig wrote: >> Due to insufficient testing on machines supporting FP16 arithmetic in their ISA, I missed that these machines generate two `SUB_HF` nodes and, crucially, an additional `SUB_F` node. We suspect that this comes from some kind of fallback code path ([open issue](https://bugs.openjdk.org/browse/JDK-8353732); also see [discussion in RISC-V PR fixing the same issue](https://github.com/openjdk/jdk/pull/24421#issuecomment-2777755042)). >> >> This PR fixes this issue for all architectures that support FP16 instructions (that I know of), by only matching `SUB_HF` nodes when the CPU supports FP16. The tests for ARM are currently commented out, due to the support for Float16 still being a work in progress (see PR #23748). >> >> I tested the fix using software emulation of an x86_64 CPU with the `avx512_fp16` feature. I also ran the [sanity checks](https://github.com/mhaessig/jdk/actions/runs/14376762241) (the Alpine Linux build fails at `configure`, which is unrelated to this change) as well as tier1 through tier3 and Oracle internal testing. > > Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: > > Add issue number to comment > > Co-authored-by: Christian Hagedorn Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24565#pullrequestreview-2756669890 From chagedorn at openjdk.org Thu Apr 10 13:20:31 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 10 Apr 2025 13:20:31 GMT Subject: RFR: 8353730: TestSubNodeFloatDoubleNegation.java fails with native Float16 support [v3] In-Reply-To: References: <1oQpmiZAIEkZeemgdGsFyYISkrexaQHFNz1vapLwWcQ=.a6408873-1ed3-4766-b7ee-f4f67b1880f4@github.com> Message-ID: On Thu, 10 Apr 2025 10:55:51 GMT, Manuel Hässig wrote: >> Due to insufficient testing on machines supporting FP16 arithmetic in their ISA, I missed that these machines generate two `SUB_HF` nodes and, crucially, an additional `SUB_F` node. We suspect that this comes from some kind of fallback code path ([open issue](https://bugs.openjdk.org/browse/JDK-8353732); also see [discussion in RISC-V PR fixing the same issue](https://github.com/openjdk/jdk/pull/24421#issuecomment-2777755042)). >> >> This PR fixes this issue for all architectures that support FP16 instructions (that I know of), by only matching `SUB_HF` nodes when the CPU supports FP16. The tests for ARM are currently commented out, due to the support for Float16 still being a work in progress (see PR #23748). >> >> I tested the fix using software emulation of an x86_64 CPU with the `avx512_fp16` feature. I also ran the [sanity checks](https://github.com/mhaessig/jdk/actions/runs/14376762241) (the Alpine Linux build fails at `configure`, which is unrelated to this change) as well as tier1 through tier3 and Oracle internal testing. > > Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: > > Add issue number to comment > > Co-authored-by: Christian Hagedorn Marked as reviewed by chagedorn (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/24565#pullrequestreview-2756739803 From enikitin at openjdk.org Thu Apr 10 13:21:42 2025 From: enikitin at openjdk.org (Evgeny Nikitin) Date: Thu, 10 Apr 2025 13:21:42 GMT Subject: RFR: 8354255: [jittester] Remove TempDir debug output Message-ID: JITTester's TempDir prints debug information about creation and deletion of a temporary folder, like this: DBG: Temp folder created: '/tmp/java_tests8412639693749199985' DBG: Temp folder deleted: '/tmp/java_tests8412639693749199985' As jittester is a library, TempDir can be used in other tools. Debug outputs mess up logs, confuse output comparison tools, etc. And do not give any valuable information (as temp folder with its contents is deleted after VM shutdown). This PR removes the debug outputs. ------------- Commit messages: - 8354255: [jittester] Remove TempDir debug output Changes: https://git.openjdk.org/jdk/pull/24573/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24573&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8354255 Stats: 4 lines in 1 file changed: 0 ins; 3 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24573.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24573/head:pull/24573 PR: https://git.openjdk.org/jdk/pull/24573 From chagedorn at openjdk.org Thu Apr 10 13:56:41 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 10 Apr 2025 13:56:41 GMT Subject: RFR: 8354255: [jittester] Remove TempDir debug output In-Reply-To: References: Message-ID: On Thu, 10 Apr 2025 13:17:26 GMT, Evgeny Nikitin wrote: > JITTester's TempDir prints debug information about creation and deletion of a temporary folder, like this: > > DBG: Temp folder created: '/tmp/java_tests8412639693749199985' > DBG: Temp folder deleted: '/tmp/java_tests8412639693749199985' > > As jittester is a library, TempDir can be used in other tools. Debug outputs mess up logs, confuse output comparison tools, etc. 
And do not give any valuable information (as temp folder with its contents is deleted after VM shutdown). > > This PR removes the debug outputs. That's reasonable. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24573#pullrequestreview-2756936874 From duke at openjdk.org Thu Apr 10 14:21:40 2025 From: duke at openjdk.org (Manuel Hässig) Date: Thu, 10 Apr 2025 14:21:40 GMT Subject: RFR: 8346552: C2: Add IR tests to check that Parse Predicate cloning in Loop Unswitching works as expected [v2] In-Reply-To: References: Message-ID: On Thu, 10 Apr 2025 12:23:43 GMT, Christian Hagedorn wrote: >> Manuel Hässig has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: >> >> - ir-framework: use new before/after loop opts phases >> - Merge branch 'master' into JDK-8346552-predicate-cloning >> - Add IR test for predicate cloning >> - ir-framework: make the parse predicate node regex more robust >> - ir-framework: add auto vectorization check node >> - ir-framework: add opaque template assertion predicate node > > test/hotspot/jtreg/compiler/loopopts/TestUnswitchPredicateCloning.java line 43: > >> 41: >> 42: public static void main(String[] strArr) { >> 43: TestFramework.runWithFlags("-Xcomp"); > > Do you really need `-Xcomp` for all compilations? If you only need to have it for the test method, you can add `@Warmup(0)`. In case you would have multiple test methods, you can also use `TestFramework.setDefaultWarmup()`. Indeed, `@Warmup(0)` does the trick. Fixed it in f2dd35018cf. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24479#discussion_r2037541718 From kxu at openjdk.org Thu Apr 10 14:23:50 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Thu, 10 Apr 2025 14:23:50 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v10] In-Reply-To: References: Message-ID: > [JDK-8347555](https://bugs.openjdk.org/browse/JDK-8347555) is a redo of [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495), which was [first merged](https://git.openjdk.org/jdk/pull/20754) and then backed out due to a regression. This patch redoes the feature and fixes the bit shift overflow problem. For more information please refer to the previous PR. > > When constantizing multiplications (possibly in the form of `lshift`s), the multiplier is upgraded to long and then later narrowed to int if needed. However, when an `lshift` operand is exactly `32`, overflowing an int, using long has an unexpected result (i.e., `(1 << 32) = 1` and `(int) (1L << 32) = 0`). > > The following was implemented to address this issue. > > if (UseNewCode2) { > *multiplier = bt == T_INT > ? (jlong) (1 << con->get_int()) // loss of precision is expected for int as it overflows > : ((jlong) 1) << con->get_int(); > } else { > *multiplier = ((jlong) 1 << con->get_int()); > } > > > Two new bitshift overflow tests were added. 
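Editorial illustration, not part of the patch or the thread: the shift behaviour described above can be reproduced at the Java level, which is the semantics the compiled code has to preserve. Java masks the shift count of an `int` shift to its low five bits, so `1 << 32` is `1`, whereas widening to `long` first and narrowing afterwards yields `0`. The class name `ShiftOverflowDemo` and the helper `multiplierFor` are hypothetical names chosen for this sketch; the helper only mirrors the shape of the quoted `T_INT` branch and is not the HotSpot code.

```java
// Minimal sketch, assuming plain Java semantics only (no HotSpot internals).
final class ShiftOverflowDemo {

    // Hypothetical helper: the multiplier that `x << shift` stands for.
    // For int operations the shift is evaluated in int arithmetic first,
    // so the shift count is masked to 5 bits exactly as Java would do.
    static long multiplierFor(int shift, boolean isInt) {
        return isInt
            ? (long) (1 << shift)   // int shift: count is taken modulo 32
            : 1L << shift;          // long shift: count is taken modulo 64
    }

    public static void main(String[] args) {
        System.out.println(1 << 32);                  // prints 1
        System.out.println((int) (1L << 32));         // prints 0
        System.out.println(multiplierFor(32, true));  // prints 1
        System.out.println(multiplierFor(32, false)); // prints 4294967296
    }
}
```

With a shift count of 32 on an int operation, the two computations disagree (1 versus 0), which is the mismatch the description attributes to always computing the multiplier as a long.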
Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: reviewer suggested changes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23506/files - new: https://git.openjdk.org/jdk/pull/23506/files/d5c6013d..c840e490 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23506&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23506&range=08-09 Stats: 57 lines in 3 files changed: 40 ins; 10 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/23506.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23506/head:pull/23506 PR: https://git.openjdk.org/jdk/pull/23506 From duke at openjdk.org Thu Apr 10 14:25:21 2025 From: duke at openjdk.org (Manuel Hässig) Date: Thu, 10 Apr 2025 14:25:21 GMT Subject: RFR: 8346552: C2: Add IR tests to check that Parse Predicate cloning in Loop Unswitching works as expected [v3] In-Reply-To: References: Message-ID: > # Issue Summary > > When a loop is unswitched, all parse predicates from the original loop must be cloned to the second loop that is created. Forgetting to clone a parse predicate is a common error during development on loop unswitching code that we could not catch previously. Since we have the IR-framework now, this PR introduces a test to catch this error. > > # Changes > > The main contribution of this PR is a test to ensure that all predicates have been cloned into an unswitched loop. 
This also required some related changes: > - add `OPAQUE_TEMPLATE_ASSERTION_PREDICATE_NODE` to the IR-framework, > - add some missing parse predicate nodes to the IR-framework, > - change the output of the labels of parse predicate nodes in the ideal graph so they can be recognized reliably by the IR-framework (the main problem was that `Loop ` is a prefix of `Loop Limit Check` that is hard to distinguish with spaces instead of underlines), > - rework the regex for detecting parse predicates in the IR-framework, > - add a test to ensure parse predicates are cloned into unswitched loops. > > # Testing > > - [GitHub Actions](https://github.com/mhaessig/jdk/actions/runs/14266369099) > - tier1 through tier3 plus Oracle internal testing Manuel Hässig has updated the pull request incrementally with two additional commits since the last revision: - Add suggested comment Co-authored-by: Christian Hagedorn - Remove -Xcomp and replace with Warmup(0) ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24479/files - new: https://git.openjdk.org/jdk/pull/24479/files/fc3d5d11..0a4d89ff Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24479&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24479&range=01-02 Stats: 6 lines in 1 file changed: 2 ins; 1 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/24479.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24479/head:pull/24479 PR: https://git.openjdk.org/jdk/pull/24479 From duke at openjdk.org Thu Apr 10 14:57:37 2025 From: duke at openjdk.org (Manuel Hässig) Date: Thu, 10 Apr 2025 14:57:37 GMT Subject: RFR: 8346552: C2: Add IR tests to check that Parse Predicate cloning in Loop Unswitching works as expected [v2] In-Reply-To: References: Message-ID: <2Ruy8sL0O4Pw73q06ciylmnZt510nGejeD2zK2Kn66k=.dda8b16b-6622-4c7a-8cdf-001763e96b6a@github.com> On Thu, 10 Apr 2025 12:12:58 GMT, Christian Hagedorn wrote: >> Manuel Hässig has updated the pull request with a new target base due to 
a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: >> >> - ir-framework: use new before/after loop opts phases >> - Merge branch 'master' into JDK-8346552-predicate-cloning >> - Add IR test for predicate cloning >> - ir-framework: make the parse predicate node regex more robust >> - ir-framework: add auto vectorization check node >> - ir-framework: add opaque template assertion predicate node > > test/hotspot/jtreg/compiler/loopopts/TestUnswitchPredicateCloning.java line 64: > >> 62: @Test >> 63: // Check that loop unswitching the number of parse predicates inside the unswitched >> 64: // loop have doubled. > > Suggestion: > > // Check that Loop Unswitching doubled the number of Parse Predicates: We have them at the true- and false-path-loop. Note that the Loop Limit Check Parse Predicate is not cloned when we already have a counted loop. > > > While writing this suggestion, I think it would also be good to have the following tests: > - A test where we unswitch a `LoopNode` (i.e. non-counted) to check that the Loop Limit Check Parse Predicate is also cloned to both unswitched loop versions. > - Since you already added some verification for Assertion Predicates as well, I suggest to go a step further. We could add a test where we first apply Loop Predication and then Loop Unswitching to check if the number of Template Assertion Predicates also doubled. Here we need to be careful that we don't miscount. We only mark the old Template Assertion Predicates useless in Loop Unswitching and then clean them up in IGVN. So, we would need to check before loop opts phase `n`: `x` Template Assertion Predicates and then before loop opts phase `n + 1`: `2*x` Template Assertion Predicates. > > You could then update the issue/PR title to: > C2: Add IR tests to check that predicate cloning in Loop Unswitching works as expected. 
I committed your suggestion in 0a4d89ff57c. However, in this test the unswitched loop is a counted loop and yet the loop limit checks are cloned. I think this test has already found a bug? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24479#discussion_r2037622175 From roland at openjdk.org Thu Apr 10 15:23:18 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 10 Apr 2025 15:23:18 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs Message-ID: This is a variant of 8332827. In 8332827, an array access becomes dependent on a range check `CastII` for another array access. When, after loop opts are over, that RC `CastII` was removed, the array access could float and an out-of-bounds access happened. With the fix for 8332827, RC `CastII`s are no longer removed. With this one what happens is that some transformations applied after loop opts are over widen the type of the RC `CastII`. As a result, the type of the RC `CastII` is no longer narrower than that of its input, the `CastII` is removed and the dependency is lost. There are 2 transformations that cause this to happen: - after loop opts are over, the types of the `CastII` nodes are widened so nodes that have the same inputs but a slightly different type can common. - When pushing a `CastII` through an `Add`, if the types of both inputs of the `Add` are non-constant, then we end up widening the type (the resulting `Add` has a type that's wider than that of the initial `CastII`). There are already 3 types of `Cast` nodes depending on the optimizations that are allowed. Either the `Cast` is floating (`depends_only_test()` returns `true`) or pinned. Either the `Cast` can be removed if it no longer narrows the type of its input or not. We already have variants of the `CastII`: - if the Cast can float and be removed when it doesn't narrow the type of its input. 
- if the Cast is pinned and can be removed when it doesn't narrow the type of its input. - if the Cast is pinned and can't be removed when it doesn't narrow the type of its input. What we need here, I think, is the 4th combination: - if the Cast can float and can't be removed when it doesn't narrow the type of its input. Anyway, things are becoming confusing with all these different variants named in ways that don't always help figure out what constraints each of them operates under. So I refactored this and that's the biggest part of this change. The fix consists in marking `Cast` nodes when their type is widened in a way that prevents them from being optimized out. Tobias ran performance testing with a slightly different version of this change and there was no regression. ------------- Commit messages: - fix & test Changes: https://git.openjdk.org/jdk/pull/24575/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24575&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8354282 Stats: 275 lines in 4 files changed: 200 ins; 24 del; 51 mod Patch: https://git.openjdk.org/jdk/pull/24575.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24575/head:pull/24575 PR: https://git.openjdk.org/jdk/pull/24575 From kxu at openjdk.org Thu Apr 10 15:30:24 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Thu, 10 Apr 2025 15:30:24 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v2] In-Reply-To: References: Message-ID: > This PR refactors `PhaseIdealLoop::is_counted_loop()` into (mostly) `CountedLoopConverter::is_counted_loop()` and `CountedLoopConverter::convert()` to decouple the detection and conversion code. This enables us to try different loop configurations easily and finally convert once a counted loop is found. > > A nested `PhaseIdealLoop::CountedLoopConverter` class is created to handle the context, but I'm not sure if this is the best name or place for it. Please let me know what you think. 
> > Blocks [JDK-8336759](https://bugs.openjdk.org/browse/JDK-8336759). Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: reviewer suggested changes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24458/files - new: https://git.openjdk.org/jdk/pull/24458/files/3db745d2..b72e7714 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24458&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24458&range=00-01 Stats: 29 lines in 2 files changed: 0 ins; 2 del; 27 mod Patch: https://git.openjdk.org/jdk/pull/24458.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24458/head:pull/24458 PR: https://git.openjdk.org/jdk/pull/24458 From kxu at openjdk.org Thu Apr 10 15:30:28 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Thu, 10 Apr 2025 15:30:28 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v2] In-Reply-To: <5hNAkBEy0dU1EchyMu-E1YYBrtli0I5ZTgUhkQl7qgo=.145177e1-65ed-4f4b-99a3-cb9e0c03c757@github.com> References: <5hNAkBEy0dU1EchyMu-E1YYBrtli0I5ZTgUhkQl7qgo=.145177e1-65ed-4f4b-99a3-cb9e0c03c757@github.com> Message-ID: On Thu, 10 Apr 2025 07:03:20 GMT, Christian Hagedorn wrote: >> Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: >> >> reviewer suggested changes > > src/hotspot/share/opto/loopnode.cpp line 313: > >> 311: // clone the inner loop etc. No optimizations need to change the outer >> 312: // strip mined loop as it is only a skeleton. >> 313: IdealLoopTree* PhaseIdealLoop::create_outer_strip_mined_loop(Node *init_control, > > Generally, for the touched code, you can also fix the `*` placement to be at the type. > Suggestion: > > IdealLoopTree* PhaseIdealLoop::create_outer_strip_mined_loop(Node* init_control, Good point. I tried to fix them as I go but missed this one. 
> src/hotspot/share/opto/loopnode.cpp line 367: > >> 365: } >> 366: >> 367: Node* PhaseIdealLoop::loop_exit_control(const Node* x, const IdealLoopTree* loop) const { > > There are quite some places where we use `x` to denote a `loop_head`. Maybe you can also fix that with this patch. I would prefer a simple `head` for functions named `.*loop.*`. Updated all relevant functions I touched. > src/hotspot/share/opto/loopnode.cpp line 1465: > >> 1463: assert(back_control != nullptr, "no back control"); >> 1464: >> 1465: LoopExitTest exit_test = loop_exit_test(back_control, loop); > > Just an idea: Would it make sense to have a single `LoopStructure` (or another name) class that contains all the information stored now with separate structs `LoopExitTest`, `LoopIvIncr` and `LoopIvStride`? Then we could offer query methods for validation like `is_valid()`, `is_stride_valid()` etc., or check for existence like `has_stride()`, or access specific nodes like `stride()`, `incr()` etc. Let me see what I can do! :) > src/hotspot/share/opto/loopnode.hpp line 1310: > >> 1308: Node*& entry_control, Node*& iffalse); >> 1309: >> 1310: class CountedLoopConverter { > > Probably quite a subjective matter, but what about just naming it `CountedLoop`? Then you have `counted_loop.convert()`. I disagree: I don't want to confuse it with `CountedLoopNode` or make it look like it extends `IdealLoopTree`. Besides, it would not necessarily guarantee a counted loop until confirmed by `is_counted_loop()`. > src/hotspot/share/opto/loopnode.hpp line 1311: > >> 1309: >> 1310: class CountedLoopConverter { >> 1311: PhaseIdealLoop* _phase; > > I suggest to add `const` whenever you can. For example, here you will probably not change the `_phase` pointer anymore: > Suggestion: > > PhaseIdealLoop* const _phase; That was my initial thought, too; however, `PhaseIdealLoop::insert_loop_limit_check_predicate()`, `::lazy_replace()`, `::set_subtree_ctrl()` and many other mutations make this impossible. 
> src/hotspot/share/opto/loopnode.hpp line 1331: > >> 1329: bool _includes_limit; >> 1330: BoolTest::mask _mask; >> 1331: Node* _increment; > > Here you have `(_phi_)incr` and `increment`. I suggest to make the naming consistent. I personally prefer full names whenever we can if it's not a universal abbreviation like `cmp` or `iv`. But what we count as well-known abbreviation is debatable and also depends on personal taste. Updated `s/_phi_incr/_phi_increment` > src/hotspot/share/opto/loopnode.hpp line 1336: > >> 1334: Node* _sfpt; >> 1335: jlong _final_correction; >> 1336: Node* _trunc1; > > `trunc1` is a little hard to understand. Can we find a better name? Dropped `Node* _trunc1` in favour of `TypeInteger* _increment_truncation_type`. Added a note in `should_stress_long_counted_loop()`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2037682520 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2037682664 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2037682831 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2037683322 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2037683588 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2037683863 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2037684011 From mcimadamore at openjdk.org Thu Apr 10 15:55:35 2025 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Thu, 10 Apr 2025 15:55:35 GMT Subject: RFR: 8348556: Inlining fails earlier for MemorySegment::reinterpret [v6] In-Reply-To: <8GSYKwHUhVFL-XjxKB9SzUyiEXVcGWnxoEr6YTnZfPE=.080d6c74-505e-41c0-84ad-2fabe5356365@github.com> References: <3LuKBc-mbghi2A2-OnXrJD5zwvOm8URerns7Ud0Zz4c=.583514d1-d0fc-4005-b810-f4db92fcb60c@github.com> <6wb-C25gjMYaQZXn2BFAQ2okygXAiHNxciy-zmDsy9M=.7555f6bb-2112-42d8-87ef-4ab63fab6d62@github.com> 
<8GSYKwHUhVFL-XjxKB9SzUyiEXVcGWnxoEr6YTnZfPE=.080d6c74-505e-41c0-84ad-2fabe5356365@github.com> Message-ID: On Tue, 8 Apr 2025 12:13:25 GMT, Per Minborg wrote:

> Baseline:
>
> ```
> Benchmark                            (offsetCount)  (segmentSize)   Mode  Cnt        Score       Error  Units
> FFMVarHandleInlineTest.t0_reference           2048           1024  thrpt   25  1552613.262 ± 14295.035  ops/s
> FFMVarHandleInlineTest.t1_level8              2048           1024  thrpt   25  1558465.228 ±  8458.874  ops/s
> FFMVarHandleInlineTest.t2_level9              2048           1024  thrpt   25  1542009.100 ± 10240.173  ops/s
> FFMVarHandleInlineTest.t3_level10             2048           1024  thrpt   25  1553407.503 ± 10834.133  ops/s
> FFMVarHandleInlineTest.t4_level11             2048           1024  thrpt   25    87666.558 ±   765.848  ops/s. <-- We hit the inline limit here
> ```
>
> Patch without `Object` changes:
>
> ```
> Benchmark                         (offsetCount)  (segmentSize)   Mode  Cnt       Score      Error  Units
> FFMVarHandleInlineTest.t_level11           2048           1024  thrpt    6  149221.636 ± 1928.957  ops/s
> FFMVarHandleInlineTest.t_level12           2048           1024  thrpt    6   74478.268 ± 1093.429  ops/s
> FFMVarHandleInlineTest.t_level13           2048           1024  thrpt    6   74490.972 ±  623.857  ops/s
> FFMVarHandleInlineTest.t_level14           2048           1024  thrpt    6   69637.549 ± 2497.253  ops/s
> FFMVarHandleInlineTest.t_level15           2048           1024  thrpt    6    3495.106 ±   87.465  ops/s
> ```
>
> Patch with `Object` changes:
>
> ```
> Benchmark                         (offsetCount)  (segmentSize)   Mode  Cnt        Score       Error  Units
> FFMVarHandleInlineTest.t_level11           2048           1024  thrpt    6  1545991.924 ± 21206.450  ops/s
> FFMVarHandleInlineTest.t_level12           2048           1024  thrpt    6  1542234.193 ± 18002.511  ops/s
> FFMVarHandleInlineTest.t_level13           2048           1024  thrpt    6  1542601.822 ± 15041.864  ops/s
> FFMVarHandleInlineTest.t_level14           2048           1024  thrpt    6   179053.325 ±  2496.002  ops/s
> FFMVarHandleInlineTest.t_level15           2048           1024  thrpt    6     3433.861 ±   165.847  ops/s
> ```

am I reading this the right way around? Baseline is faster than the patched version? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23460#issuecomment-2794329053 From mcimadamore at openjdk.org Thu Apr 10 15:55:37 2025 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Thu, 10 Apr 2025 15:55:37 GMT Subject: RFR: 8348556: Inlining fails earlier for MemorySegment::reinterpret [v7] In-Reply-To: References: <3LuKBc-mbghi2A2-OnXrJD5zwvOm8URerns7Ud0Zz4c=.583514d1-d0fc-4005-b810-f4db92fcb60c@github.com> Message-ID: On Tue, 8 Apr 2025 12:24:41 GMT, Per Minborg wrote: >> This PR proposes to add some `@ForceInline` annotations in the `Module` class in order to assist inlining of FFM var/method handles. >> >> There are also some changes in other classes which, if implemented, can take us three additional levels of inlining. I drew a line there. There is a tradeoff with adding `@ForceInline` and just trying to get as deep as possible for a specific use case is probably not the best idea. >> >> Updating the `j.l.Object` constructor is crucial for the higher depths. >> >> Tested and passed tier1-3 > > Per Minborg has updated the pull request incrementally with one additional commit since the last revision: > > Reintroduce Object changes src/java.base/share/classes/java/lang/Module.java line 180: > 178: * @jls 7.7.5 Unnamed Modules > 179: */ > 180: @ForceInline Doesn't the `ensureNativeAccess` code only depend on this one? Also, I'm having an hard time thinking that C2 can't inline this simple method? Isn't this an "accessor" ? src/java.base/share/classes/jdk/internal/foreign/MemorySessionImpl.java line 213: > 211: * a confined session and this method is called outside the owner thread. > 212: */ > 213: @ForceInline I presume this is added because it's called by the reinterpret implementation? src/java.base/share/classes/jdk/internal/foreign/NativeMemorySegmentImpl.java line 100: > 98: > 99: @Override > 100: @ForceInline I'm not sure this is needed -- this is an "accessor" method - C2 typically inline those. 
But, do we depend on `maxAlignMask` for reinterpret? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23460#discussion_r2037723817 PR Review Comment: https://git.openjdk.org/jdk/pull/23460#discussion_r2037726741 PR Review Comment: https://git.openjdk.org/jdk/pull/23460#discussion_r2037730266 From roland at openjdk.org Thu Apr 10 15:56:25 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 10 Apr 2025 15:56:25 GMT Subject: RFR: 8327963: [Umbrella] Incorrect result of C2 compiled code since JDK-8237581 [v2] In-Reply-To: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> Message-ID: > An `Initialize` node for an `Allocate` node is created with a memory > `Proj` of adr type raw memory. In order for stores to be captured, the > memory state out of the allocation is a `MergeMem` with slices for the > various object fields/array elements set to the raw memory `Proj` of > the `Initialize` node. If `Phi`s need to be created during later > transformations from this memory state, the `Phi` for a particular > slice gets its adr type from the type of the `Proj` which is raw > memory. If during macro expansion, the `Allocate` is found to have no > use and so can be removed, the `Proj` out of the `Initialize` is > replaced by the memory state on input to the `Allocate`. A `Phi` for > some slice for a field of an object will end up with the raw memory > state on input to the `Allocate` node. As a result, memory state at > the `Phi` is incorrect and incorrect execution can happen. > > The fix I propose is, rather than have a single `Proj` for the memory > state out of the `Initialize` with adr type raw memory, to use one > `Proj` per slice added to the memory state after the `Initialize`. Each > of the `Proj` should return the right adr type for its slice.
For that > I propose having a new type of `Proj`: `NarrowMemProj` that captures > the right adr type. > > Logic for the construction of the `Allocate`/`Initialize` subgraph is > tweaked so the right adr type captured in its own `NarrowMemProj` is > added to the memory subgraph. Code that removes an allocation or moves > it also has to be changed so it correctly takes the multiple memory > projections out of the `Initialize` node into account. > > One tricky issue is that when EA splits types for a scalar replaceable > `Allocate` node: > > 1- the adr type captured in the `NarrowMemProj` becomes out of sync > with the type of the slices for the allocation > > 2- before EA, the memory state for one particular field out of the > `Initialize` node can be used for a `Store` to the just allocated > object or some other. So we can have a chain of `Store`s, some to > the newly allocated object, some to some other objects, all of them > using the state of `NarrowMemProj` out of the `Initialize`. After > split unique types, the `NarrowMemProj` is for the slice of a > particular allocation. So `Store`s to some other objects shouldn't > use that memory state but the memory state before the `Allocate`. > > For that, I added logic to update the adr type of `NarrowMemProj` > during split unique types and update the memory input of `Store`s that > don't depend on the memory state ...
Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: TestIterativeEA fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24570/files - new: https://git.openjdk.org/jdk/pull/24570/files/71fd8315..a4031f3c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24570&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24570&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24570.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24570/head:pull/24570 PR: https://git.openjdk.org/jdk/pull/24570 From mcimadamore at openjdk.org Thu Apr 10 15:58:29 2025 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Thu, 10 Apr 2025 15:58:29 GMT Subject: RFR: 8348556: Inlining fails earlier for MemorySegment::reinterpret [v6] In-Reply-To: References: <3LuKBc-mbghi2A2-OnXrJD5zwvOm8URerns7Ud0Zz4c=.583514d1-d0fc-4005-b810-f4db92fcb60c@github.com> <6wb-C25gjMYaQZXn2BFAQ2okygXAiHNxciy-zmDsy9M=.7555f6bb-2112-42d8-87ef-4ab63fab6d62@github.com> <8GSYKwHUhVFL-XjxKB9SzUyiEXVcGWnxoEr6YTnZfPE=.080d6c74-505e-41c0-84ad-2fabe5356365@github.com> Message-ID: <-Fi-Auy3YpXOAzmBgoQXZzK5kXpPGKXYqnEv1h0Cv6k=.2d741644-aea8-463a-85b5-b1652f48f984@github.com> On Thu, 10 Apr 2025 15:52:38 GMT, Maurizio Cimadamore wrote:

> > Baseline:
> > ```
> > Benchmark                            (offsetCount)  (segmentSize)   Mode  Cnt        Score       Error  Units
> > FFMVarHandleInlineTest.t0_reference           2048           1024  thrpt   25  1552613.262 ± 14295.035  ops/s
> > FFMVarHandleInlineTest.t1_level8              2048           1024  thrpt   25  1558465.228 ±  8458.874  ops/s
> > FFMVarHandleInlineTest.t2_level9              2048           1024  thrpt   25  1542009.100 ± 10240.173  ops/s
> > FFMVarHandleInlineTest.t3_level10             2048           1024  thrpt   25  1553407.503 ± 10834.133  ops/s
> > FFMVarHandleInlineTest.t4_level11             2048           1024  thrpt   25    87666.558 ±   765.848  ops/s.
<-- We hit the inline limit here
> > ```
> >
> > Patch without `Object` changes:
> > ```
> > Benchmark                         (offsetCount)  (segmentSize)   Mode  Cnt       Score      Error  Units
> > FFMVarHandleInlineTest.t_level11           2048           1024  thrpt    6  149221.636 ± 1928.957  ops/s
> > FFMVarHandleInlineTest.t_level12           2048           1024  thrpt    6   74478.268 ± 1093.429  ops/s
> > FFMVarHandleInlineTest.t_level13           2048           1024  thrpt    6   74490.972 ±  623.857  ops/s
> > FFMVarHandleInlineTest.t_level14           2048           1024  thrpt    6   69637.549 ± 2497.253  ops/s
> > FFMVarHandleInlineTest.t_level15           2048           1024  thrpt    6    3495.106 ±   87.465  ops/s
> > ```
> >
> > Patch with `Object` changes:
> > ```
> > Benchmark                         (offsetCount)  (segmentSize)   Mode  Cnt        Score       Error  Units
> > FFMVarHandleInlineTest.t_level11           2048           1024  thrpt    6  1545991.924 ± 21206.450  ops/s
> > FFMVarHandleInlineTest.t_level12           2048           1024  thrpt    6  1542234.193 ± 18002.511  ops/s
> > FFMVarHandleInlineTest.t_level13           2048           1024  thrpt    6  1542601.822 ± 15041.864  ops/s
> > FFMVarHandleInlineTest.t_level14           2048           1024  thrpt    6   179053.325 ±  2496.002  ops/s
> > FFMVarHandleInlineTest.t_level15           2048           1024  thrpt    6     3433.861 ±   165.847  ops/s
> > ```
>
> am I reading this the right way around? Baseline is faster than the patched version?

Ah ok - you gain 3-4 more levels. Again... I'm really not sure about how reliable this all is. I'd suggest for somebody like @iwanowww to validate this -- while JMH gives some better results here, I'm skeptical that real world usage is going to see any difference here, as the benchmarking conditions seem a little too on the artificial side.
------------- PR Comment: https://git.openjdk.org/jdk/pull/23460#issuecomment-2794338281 From simonis at openjdk.org Thu Apr 10 16:13:30 2025 From: simonis at openjdk.org (Volker Simonis) Date: Thu, 10 Apr 2025 16:13:30 GMT Subject: RFR: 8353237: [AArch64] Incorrect result of VectorizedHashCode intrinsic on Cortex-A53 In-Reply-To: References: Message-ID: On Mon, 7 Apr 2025 12:39:40 GMT, Aleksei Voitylov wrote: > The root of the problem is that VectorizedHashCode intrinsic introduced by JDK-8341194 is not aware of JDK-8079203. JDK-8079203 generates additional nop with madd instruction on Cortex-A53 as a workaround for Cortex-A53 erratum 835769 "AArch64 multiply-accumulate instruction might produce incorrect result". Current VectorizedHashCode intrinsic calculates byte offset to jump inside the unrolled loop code. It assumes 2 instructions per each unrolled iteration (load and madd). JDK-8079203 adds additional nop for Cortex-A53, which breaks offset calculation logic. > ? > Current offset calculation logic is using shift instead of multiplication, power-of-2 number instructions are present in each unrolled loop iteration. To keep it simple, this fix adds one more nop into each loop iteration on Cortex-A53 in order to have 4 instruction per iteration, which is also a power-of-2. To account for that, the shift argument for offset calculation logic is increased by 1, because each loop iteration has 2 times more instructions on Cortex-A53. > ? > This fix is tested on Raspberry Pi 3 (based on Cortex-A53) by running initially reported application and by running hotspot jtreg tests (not a single test could be run on Cortex-A53 before the fix). After the fix, the specialized test hotspot/jtreg/compiler/intrinsics/TestArraysHashCode.java passes. > > The performance gain from the intrinsic is also observed on Cortex-A53 using the ArraysHashCode benchmark. This sounds a little scary. 
Is there a comprehensive list of available ARM erratas that require special workarounds in the native compiler and the HotSpot's own code generation? @stooart-mon? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24489#issuecomment-2794386894 From kvn at openjdk.org Thu Apr 10 16:58:33 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 10 Apr 2025 16:58:33 GMT Subject: RFR: 8351833: Unexpected increase in live nodes when splitting Phis through MergeMems in PhiNode::Ideal [v3] In-Reply-To: <8cQweFEP9gLJuwgDDwUsAqS2GYg0bGnuCL39kQETrPQ=.7c315669-835d-4aa7-b946-5c417230814d@github.com> References: <8cQweFEP9gLJuwgDDwUsAqS2GYg0bGnuCL39kQETrPQ=.7c315669-835d-4aa7-b946-5c417230814d@github.com> Message-ID: On Thu, 10 Apr 2025 12:10:01 GMT, Daniel Lund?n wrote: >> After the changes for [JDK-8333393](https://bugs.openjdk.org/browse/JDK-8333393), we apply a Phi idealization, involving splitting Phis through MergeMems, a lot more frequently. This idealization internally applies further idealizations for new Phi nodes generated during the idealization. In certain cases, these internal idealizations result in a large increase of live nodes within a single iteration of the main IGVN loop in `PhaseIterGVN::optimize`. In particular, when we are close to the `MaxNodeLimit` (80 000 by default), it can happen that we go from below `MaxNodeLimit - NodeLimitFudgeFactor * 2` (= 76 000 by default) to more than 80 000 nodes in a single iteration. In such cases, the node count bailout at the top of the `PhaseIterGVN::optimize` loop does not trigger as expected and we instead crash at an assert in node creation as we surpass `MaxNodeLimit` nodes. >> >> ### Changeset >> >> Changes: >> - Do not immediately transform new Phi nodes after splitting Phis through MergeMems. The Phi nodes are put on the IGVN worklist and are transformed later on in any case. 
>> - Add an assert in the `PhaseIterGVN::optimize` loop that ensures we never increase the live node count with more than `NodeLimitFudgeFactor * 2` in a single loop iteration. This assert allows us to catch the issue earlier and much more frequently during IGVN. >> - Add a new regression test `TestSplitPhiThroughMergeMem.java`. The new assert above triggers the issue in a large number of existing tests already, but I added this new test as well for good measure. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/14124983489) >> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. >> - Performance testing >> - DaCapo 23, Renaissance, SPECjbb 2005, and SPECjvm 2008 on Windows x64, Linux x64, macOS x64, and macOS aarch64. No statistically significant improvements nor regressions. >> - Compilation time benchmarking for DaCapo 23. No statistically significant improvements nor regressions. >> >> ### Additional issue investigation >> >> For the particular failure reported as part of this issue, the additional Phi idealizations after [JDK-8333393](https://bugs.openjdk.org/browse/JDK-8333393) cause a dramatic local increase in the number of nodes during IGVN compared to before. Therefore, it is justified to further investigate if this increase in live nodes is, in general, an issue in its... > > Daniel Lund?n has updated the pull request incrementally with one additional commit since the last revision: > > Apply suggestions from code review > > Co-authored-by: Christian Hagedorn Working back on EA I hit issue when we infinitely generate Phi nodes in loops when splitting it for allocation instance memory slice: [memnode.cpp#L1310](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/memnode.cpp#L1310) There are other places where we try to avoid infinite Phi nodes generation. What happens in your case? Did this happens in loop? 
Is it also infinite generation or just excessive number of nodes and it passed after this fix? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24325#issuecomment-2794542033 From smonteith at openjdk.org Thu Apr 10 17:05:39 2025 From: smonteith at openjdk.org (Stuart Monteith) Date: Thu, 10 Apr 2025 17:05:39 GMT Subject: RFR: 8353237: [AArch64] Incorrect result of VectorizedHashCode intrinsic on Cortex-A53 In-Reply-To: References: Message-ID: On Thu, 10 Apr 2025 16:10:26 GMT, Volker Simonis wrote: >> The root of the problem is that VectorizedHashCode intrinsic introduced by JDK-8341194 is not aware of JDK-8079203. JDK-8079203 generates additional nop with madd instruction on Cortex-A53 as a workaround for Cortex-A53 erratum 835769 "AArch64 multiply-accumulate instruction might produce incorrect result". Current VectorizedHashCode intrinsic calculates byte offset to jump inside the unrolled loop code. It assumes 2 instructions per each unrolled iteration (load and madd). JDK-8079203 adds additional nop for Cortex-A53, which breaks offset calculation logic. >> ? >> Current offset calculation logic is using shift instead of multiplication, power-of-2 number instructions are present in each unrolled loop iteration. To keep it simple, this fix adds one more nop into each loop iteration on Cortex-A53 in order to have 4 instruction per iteration, which is also a power-of-2. To account for that, the shift argument for offset calculation logic is increased by 1, because each loop iteration has 2 times more instructions on Cortex-A53. >> ? >> This fix is tested on Raspberry Pi 3 (based on Cortex-A53) by running initially reported application and by running hotspot jtreg tests (not a single test could be run on Cortex-A53 before the fix). After the fix, the specialized test hotspot/jtreg/compiler/intrinsics/TestArraysHashCode.java passes. >> >> The performance gain from the intrinsic is also observed on Cortex-A53 using the ArraysHashCode benchmark. 
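For readers following the arithmetic in the quoted description, here is a minimal sketch in Java (not the actual AArch64 stub generator code). It assumes only what the description states, plus the fact that AArch64 instructions are a fixed 4 bytes wide. The class and method names are hypothetical and exist only for illustration.

```java
final class UnrolledLoopOffset {
    // AArch64 instructions have a fixed width of 4 bytes.
    static final int INSTRUCTION_BYTES = 4;

    // Shift amount that turns "number of unrolled iterations" into a byte offset.
    // A shift can only replace the multiplication when the iteration size is a power of two.
    static int shiftFor(int instructionsPerIteration) {
        int bytesPerIteration = instructionsPerIteration * INSTRUCTION_BYTES;
        if (instructionsPerIteration <= 0 || Integer.bitCount(bytesPerIteration) != 1) {
            throw new IllegalArgumentException(
                "iteration size is not a power of two: " + instructionsPerIteration);
        }
        return Integer.numberOfTrailingZeros(bytesPerIteration);
    }

    // Byte offset covering the given number of unrolled iterations.
    static int byteOffset(int iterations, int instructionsPerIteration) {
        return iterations << shiftFor(instructionsPerIteration);
    }

    public static void main(String[] args) {
        // Two instructions per iteration (load + madd): 8 bytes per iteration.
        System.out.println("shift for 2 instructions: " + shiftFor(2));
        // Four instructions (load + nop + madd + extra nop): 16 bytes per iteration.
        System.out.println("shift for 4 instructions: " + shiftFor(4));
        // Three instructions (load + nop + madd) cannot be expressed as a shift.
        try {
            shiftFor(3);
        } catch (IllegalArgumentException e) {
            System.out.println("3 instructions: " + e.getMessage());
        }
    }
}
```

Under these assumptions, padding the three-instruction Cortex-A53 iteration to four instructions is what keeps the shift-based computation valid, at the cost of the shift growing by one.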
> > This sounds a little scary. Is there a comprehensive list of available ARM erratas that require special workarounds in the native compiler and the HotSpot's own code generation? @stooart-mon? To not answer your question @simonis, errata are published for each processor, with details on how each revision of the CPU is affected. For example, the A53: https://developer.arm.com/documentation/epm048406/2100/?lang=en I'm not aware of a curated list of CPU errata and mitigations for compilers, so I'll get back to you. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24489#issuecomment-2794561452 From dlunden at openjdk.org Thu Apr 10 18:03:45 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Thu, 10 Apr 2025 18:03:45 GMT Subject: RFR: 8351833: Unexpected increase in live nodes when splitting Phis through MergeMems in PhiNode::Ideal [v3] In-Reply-To: References: <8cQweFEP9gLJuwgDDwUsAqS2GYg0bGnuCL39kQETrPQ=.7c315669-835d-4aa7-b946-5c417230814d@github.com> Message-ID: <838Z9RFGYjPV6XgCSwn9Rj4oPazREQWSxbdjDQ7UWAY=.a7a6f496-b45d-46dd-bb16-22faf2d49c76@github.com> On Thu, 10 Apr 2025 16:55:24 GMT, Vladimir Kozlov wrote: > What happens in your case? Did this happens in loop? Is it also infinite generation or just excessive number of nodes and it passed after this fix? It terminates, but generates an excessive number of nodes (but only locally, within IGVN runs). I added a check to ensure termination as part of https://github.com/openjdk/jdk/pull/23691, where we enabled the idealization in question in many more cases compared to before (needed for correctness). The problem in this issue is that we now add way too many nodes during a single IGVN iteration, and therefore fail to trigger the node check bailout in `PhaseIterGVN::optimize`. Instead, we trigger an assert during node creation when we surpass `MaxNodeLimit` nodes. After this fix, we cleanly bail out instead of hitting the assert.
------------- PR Comment: https://git.openjdk.org/jdk/pull/24325#issuecomment-2794713625 From kxu at openjdk.org Thu Apr 10 18:26:37 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Thu, 10 Apr 2025 18:26:37 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v11] In-Reply-To: References: Message-ID: <0E27yBuVFtANYo0L-5eHyrsDR4SFsT1Rg-UgRZsR24s=.f144219e-cf9b-4400-8fac-1a0eb890ee98@github.com> > [JDK-8347555](https://bugs.openjdk.org/browse/JDK-8347555) is a redo of [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) was [first merged](https://git.openjdk.org/jdk/pull/20754) then backed out due to a regression. This patch redos the feature and fixes the bit shift overflow problem. For more information please refer to the previous PR. > > When constanlizing multiplications (possibly in forms on `lshifts`), the multiplier is upgraded to long and then later narrowed to int if needed. However, when a `lshift` operand is exactly `32`, overflowing an int, using long has an unexpected result. (i.e., `(1 << 32) = 1` and `(int) (1L << 32) = 0`) > > The following was implemented to address this issue. > > if (UseNewCode2) { > *multiplier = bt == T_INT > ? (jlong) (1 << con->get_int()) // loss of precision is expected for int as it overflows > : ((jlong) 1) << con->get_int(); > } else { > *multiplier = ((jlong) 1 << con->get_int()); > } > > > Two new bitshift overflow tests were added. 
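As a side note for readers, the two results quoted in the description, `(1 << 32) = 1` and `(int) (1L << 32) = 0`, can be reproduced at the Java level. The sketch below only illustrates Java shift semantics (an `int` shift distance is masked to its low five bits); it is not the C2 code, and the class and method names are hypothetical.

```java
final class ShiftOverflowDemo {
    // Multiplier computed with int arithmetic: the shift distance is masked to 0..31.
    static long multiplierViaInt(int shift) {
        return (long) (1 << shift);
    }

    // Multiplier computed with long arithmetic: the shift distance is masked to 0..63.
    static long multiplierViaLong(int shift) {
        return 1L << shift;
    }

    public static void main(String[] args) {
        // x << 32 on an int behaves like x << 0, so the implied multiplier is 1.
        System.out.println(multiplierViaInt(32));
        // The same shift on a long yields 2^32, which narrows to 0 as an int.
        System.out.println(multiplierViaLong(32));
        System.out.println((int) multiplierViaLong(32));
    }
}
```

This is why computing the multiplier in `long` and narrowing afterwards gives a different answer from computing it in `int` once the shift distance reaches 32.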
Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: include simple addition as a case of power of two additions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23506/files - new: https://git.openjdk.org/jdk/pull/23506/files/c840e490..4ac08cb9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23506&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23506&range=09-10 Stats: 11 lines in 2 files changed: 3 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/23506.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23506/head:pull/23506 PR: https://git.openjdk.org/jdk/pull/23506 From kxu at openjdk.org Thu Apr 10 18:53:51 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Thu, 10 Apr 2025 18:53:51 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v12] In-Reply-To: References: Message-ID: > [JDK-8347555](https://bugs.openjdk.org/browse/JDK-8347555) is a redo of [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) was [first merged](https://git.openjdk.org/jdk/pull/20754) then backed out due to a regression. This patch redos the feature and fixes the bit shift overflow problem. For more information please refer to the previous PR. > > When constanlizing multiplications (possibly in forms on `lshifts`), the multiplier is upgraded to long and then later narrowed to int if needed. However, when a `lshift` operand is exactly `32`, overflowing an int, using long has an unexpected result. (i.e., `(1 << 32) = 1` and `(int) (1L << 32) = 0`) > > The following was implemented to address this issue. > > if (UseNewCode2) { > *multiplier = bt == T_INT > ? (jlong) (1 << con->get_int()) // loss of precision is expected for int as it overflows > : ((jlong) 1) << con->get_int(); > } else { > *multiplier = ((jlong) 1 << con->get_int()); > } > > > Two new bitshift overflow tests were added. 
Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: add assertion for MulLNode too ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23506/files - new: https://git.openjdk.org/jdk/pull/23506/files/4ac08cb9..eefe3a35 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23506&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23506&range=10-11 Stats: 5 lines in 1 file changed: 5 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23506.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23506/head:pull/23506 PR: https://git.openjdk.org/jdk/pull/23506 From kvn at openjdk.org Thu Apr 10 18:55:34 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 10 Apr 2025 18:55:34 GMT Subject: RFR: 8351833: Unexpected increase in live nodes when splitting Phis through MergeMems in PhiNode::Ideal [v3] In-Reply-To: <838Z9RFGYjPV6XgCSwn9Rj4oPazREQWSxbdjDQ7UWAY=.a7a6f496-b45d-46dd-bb16-22faf2d49c76@github.com> References: <8cQweFEP9gLJuwgDDwUsAqS2GYg0bGnuCL39kQETrPQ=.7c315669-835d-4aa7-b946-5c417230814d@github.com> <838Z9RFGYjPV6XgCSwn9Rj4oPazREQWSxbdjDQ7UWAY=.a7a6f496-b45d-46dd-bb16-22faf2d49c76@github.com> Message-ID: On Thu, 10 Apr 2025 18:00:50 GMT, Daniel Lund?n wrote: > After this fix, we cleanly bail out instead of hitting the assert. Is bailout due to infinite nodes generation (which is bug and can be fixed separately) or simply excessive numbers of node and by increasing bailout number limit the compilation will pass? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24325#issuecomment-2794850964 From kxu at openjdk.org Thu Apr 10 19:05:34 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Thu, 10 Apr 2025 19:05:34 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v7] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 10:08:11 GMT, Emanuel Peter wrote: >> Ah interesting. 
It could be worth adding a comment for that here then! > > That was in fact a large part of my initial hesitation with this PR. Added comments on `(a*b) + (a*c)` and `AddNode::IdealIL` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r2038113619 From kxu at openjdk.org Thu Apr 10 19:05:37 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Thu, 10 Apr 2025 19:05:37 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v9] In-Reply-To: <2YsQyyHJuiGrgnTRKYADjQhRa5qIaDvCCjLd4kjfdeI=.0134b90b-fa18-4c5b-afb3-f7f4e10d6411@github.com> References: <2YsQyyHJuiGrgnTRKYADjQhRa5qIaDvCCjLd4kjfdeI=.0134b90b-fa18-4c5b-afb3-f7f4e10d6411@github.com> Message-ID: <2jG14TNV1gWDGPCDov-X8UMRRqKRJ7uJ_FpmQB5FUEE=.326a22e4-74cd-4c0a-b63e-0c6cf46580ff@github.com> On Thu, 3 Apr 2025 05:06:51 GMT, Christian Hagedorn wrote: >> src/hotspot/share/opto/addnode.cpp line 509: >> >>> 507: if (rhs.valid && rhs.variable == n->in(1)) { >>> 508: return Multiplication{true, rhs.variable, rhs.multiplier + 1}; >>> 509: } >> >> Hmm, it seems these are patterns that you did not promise you would cover in the description above. >> It makes it a little difficult to keep the overview... > > Just a drive-by comment what might help: Name the cases you cover in the description with `(1)`, (2)` etc. and add the numbers as comments in the code where you cover the patterns. This would support the mapping from description to implementation. I updated and expanded the comment. They are clearer now. I also added a subcase for `a + a` so `assert()` in `MulNode::Ideal()` can call directly into `find_power_of_two_addition_pattern()` > Name the cases you cover in the description [..] That's a great suggestion! Thank you. I'll start using this technique from now on. 
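For orientation, the identities behind the patterns mentioned in this review (`a + a`, a shift plus the same variable, and `(a*b) + (a*c)`) can be checked at the Java level. This is a sketch of the arithmetic only, under the assumption that these are the shapes being collapsed; the exact patterns matched in `addnode.cpp` may differ, and the class and method names here are hypothetical.

```java
final class SerialAdditionIdentities {
    // A series of additions of one value.
    static int sumOfFour(int a) {
        return a + a + a + a;
    }

    // A power-of-two multiple (written as a left shift) plus the same variable.
    static int shiftPlusOne(int a, int k) {
        return (a << k) + a;
    }

    // Two multiples of the same variable added together.
    static int distributed(int a, int b, int c) {
        return a * b + a * c;
    }

    // Each identity holds for every int because Java int arithmetic wraps modulo 2^32.
    static boolean holdsFor(int a) {
        return a + a == 2 * a
            && sumOfFour(a) == 4 * a
            && shiftPlusOne(a, 2) == 5 * a
            && distributed(a, 3, 4) == a * 7;
    }

    public static void main(String[] args) {
        System.out.println(holdsFor(7));
        System.out.println(holdsFor(Integer.MAX_VALUE));
        System.out.println(holdsFor(Integer.MIN_VALUE));
    }
}
```

Because the identities hold under wraparound, the rewrite is value-preserving as long as the folded multiplier itself is computed with the same wrapping rules.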
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r2038113317 From kxu at openjdk.org Thu Apr 10 19:05:36 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Thu, 10 Apr 2025 19:05:36 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v9] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 10:00:14 GMT, Emanuel Peter wrote: >> Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: >> >> fix ((x< > src/hotspot/share/opto/addnode.cpp line 415: > >> 413: if (find_power_of_two_addition_pattern(this, bt).valid) { >> 414: return nullptr; >> 415: } > > Hmm. So somewhere we would have generated that pattern, probably in MulNode. Can you add a verification there, to check that we are only generating patterns that `find_power_of_two_addition_pattern` recognizes? That would make sure that we keep the code here and there in sync. Added `assert()` to `Mul[IL]Node::Ideal()` > src/hotspot/share/opto/addnode.cpp line 428: > >> 426: ((mul = find_simple_multiplication_pattern(in1, bt)).valid && mul.variable == in2) || >> 427: ((mul = find_power_of_two_addition_pattern(in1, bt)).valid && mul.variable == in2) >> 428: ) { > > I find this quite difficult to read. And it looks repetitive too. Maybe you can refactor it? > Also, it would be nice to have comments with the patterns here, to see which one covers what case, so that we have a nice overview. Refactored. Added comments with example. It's still some-what repetitive and definitely much clearer. 
https://github.com/openjdk/jdk/blob/eefe3a35effc469f3527d4825c9cc4f2a72742dc/src/hotspot/share/opto/addnode.cpp#L421-L444 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r2038113521 PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r2038113480 From kxu at openjdk.org Thu Apr 10 19:13:19 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Thu, 10 Apr 2025 19:13:19 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v13] In-Reply-To: References: Message-ID: > [JDK-8347555](https://bugs.openjdk.org/browse/JDK-8347555) is a redo of [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) was [first merged](https://git.openjdk.org/jdk/pull/20754) then backed out due to a regression. This patch redos the feature and fixes the bit shift overflow problem. For more information please refer to the previous PR. > > When constanlizing multiplications (possibly in forms on `lshifts`), the multiplier is upgraded to long and then later narrowed to int if needed. However, when a `lshift` operand is exactly `32`, overflowing an int, using long has an unexpected result. (i.e., `(1 << 32) = 1` and `(int) (1L << 32) = 0`) > > The following was implemented to address this issue. > > if (UseNewCode2) { > *multiplier = bt == T_INT > ? (jlong) (1 << con->get_int()) // loss of precision is expected for int as it overflows > : ((jlong) 1) << con->get_int(); > } else { > *multiplier = ((jlong) 1 << con->get_int()); > } > > > Two new bitshift overflow tests were added. 
Kangcheng Xu has updated the pull request incrementally with two additional commits since the last revision: - update comments - use java_add to avoid cpp overflow UB ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23506/files - new: https://git.openjdk.org/jdk/pull/23506/files/eefe3a35..c0172410 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23506&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23506&range=11-12 Stats: 5 lines in 1 file changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/23506.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23506/head:pull/23506 PR: https://git.openjdk.org/jdk/pull/23506 From kxu at openjdk.org Thu Apr 10 19:13:21 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Thu, 10 Apr 2025 19:13:21 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v9] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 10:15:31 GMT, Emanuel Peter wrote: >> Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: >> >> fix ((x< > src/hotspot/share/opto/addnode.cpp line 431: > >> 429: Node* con = (bt == T_INT) >> 430: ? (Node*) phase->intcon((jint) (mul.multiplier + 1)) // intentional type narrowing to allow overflow at max_jint >> 431: : (Node*) phase->longcon((mul.multiplier + 1)); > > I think just to be safe, you should use `java_add` to have correct overflow semantics. You are using `jlong` for `multiplier`, which is a signed integer type, and in C++ overflow is undefined behavior as far as I know, let's avoid that ;) > > Actually, do you have a test where the multiplier overflows here? Good point regarding UB. Updated to use `java_add`. There are several test cases named `.*Overflow()` in `TestSerialAdditions`. 
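To illustrate the overflow concern raised in this exchange: in Java, signed addition wraps on overflow, whereas signed overflow in C++ is undefined behavior, which is the reason given above for switching to `java_add`. The sketch below shows the wrapping behavior the compiled code has to preserve. It is plain Java, not HotSpot code; the names are hypothetical, and the assumption that `java_add` yields the same wrapped results is taken from the discussion rather than verified here.

```java
final class WrappingAddDemo {
    // "multiplier + 1" narrowed to int, wrapping at the int maximum.
    static int nextIntMultiplier(long multiplier) {
        return (int) (multiplier + 1);
    }

    // "multiplier + 1" as a long, wrapping at the long maximum.
    static long nextLongMultiplier(long multiplier) {
        return multiplier + 1;
    }

    public static void main(String[] args) {
        // Wraps from the largest int to the smallest.
        System.out.println(nextIntMultiplier(Integer.MAX_VALUE));
        // Wraps from the largest long to the smallest.
        System.out.println(nextLongMultiplier(Long.MAX_VALUE));
        // For contrast, the checked variant reports the overflow instead of wrapping.
        try {
            Math.addExact(Long.MAX_VALUE, 1L);
        } catch (ArithmeticException e) {
            System.out.println("addExact: " + e.getMessage());
        }
    }
}
```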
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r2038121769 From vlivanov at openjdk.org Thu Apr 10 19:39:32 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 10 Apr 2025 19:39:32 GMT Subject: RFR: 8353551: C2: Constant folding for ReverseBytes nodes [v3] In-Reply-To: References: Message-ID: <5oB7-inLfd5qJdCjXJLSXcmU5mjk3RUjPqEvUhCq2Ek=.0d0fe486-9711-4bf1-b9ba-f0a8d5047358@github.com> On Thu, 10 Apr 2025 09:04:39 GMT, Hannes Greule wrote: >> This change implements constant folding for ReverseBytes nodes. >> >> Currently, `byteswap` is included transitively by `reverse_bits.hpp`. I'm not sure if this is fine or if I need to add an explicit include there. >> >> I appreciate any reviews and comments. > > Hannes Greule has updated the pull request incrementally with one additional commit since the last revision: > > introduce common ReverseBytesNode Overall, looks good. Submitted it for testing. src/hotspot/share/opto/subnode.cpp line 2024: > 2022: } > 2023: > 2024: const Type* reverse_bytes(int opcode, const Type* con) { Please, declare it as static. 
------------- PR Review: https://git.openjdk.org/jdk/pull/24382#pullrequestreview-2758148334 PR Review Comment: https://git.openjdk.org/jdk/pull/24382#discussion_r2038178535 From dlunden at openjdk.org Thu Apr 10 19:40:28 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Thu, 10 Apr 2025 19:40:28 GMT Subject: RFR: 8351833: Unexpected increase in live nodes when splitting Phis through MergeMems in PhiNode::Ideal [v3] In-Reply-To: References: <8cQweFEP9gLJuwgDDwUsAqS2GYg0bGnuCL39kQETrPQ=.7c315669-835d-4aa7-b946-5c417230814d@github.com> <838Z9RFGYjPV6XgCSwn9Rj4oPazREQWSxbdjDQ7UWAY=.a7a6f496-b45d-46dd-bb16-22faf2d49c76@github.com> Message-ID: <0trqY7nAhnKoKXHS-sEqJr_kCO9rSLEkPDP0fmN5Ni4=.c38c35b8-406a-41eb-bd55-415ded0b1b18@github.com> On Thu, 10 Apr 2025 18:52:21 GMT, Vladimir Kozlov wrote: > Is bailout due to infinite nodes generation (which is bug and can be fixed separately) or simply excessive numbers of node and by increasing bailout number limit the compilation will pass? Compilation does complete with increased `MaxNodeLimit`. The node count is "just" (temporarily) excessive. Have a look at the plot from the PR description below, illustrating the live node count (orange) and IGVN worklist size (blue) throughout the specific problematic IGVN run (with `MaxNodeLimit` increased from the default 80 000 to allow the compilation to complete). The live node count peaks just beyond 80 000 nodes. After completing the IGVN run, the live node count looks normal (see the sharp decrease in the plots just at the end). 
- Before the fix for [JDK-8333393](https://bugs.openjdk.org/browse/JDK-8333393) ("baseline") - After the fix for [JDK-8333393](https://bugs.openjdk.org/browse/JDK-8333393) ("target-old") - After the fix in this PR ("target") ![igvn-worklist-nodes-count-plot](https://github.com/user-attachments/assets/cb56a85b-5612-4cb5-b49f-cf3a6a07a1f7) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24325#issuecomment-2794961261 From kxu at openjdk.org Thu Apr 10 20:18:20 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Thu, 10 Apr 2025 20:18:20 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v14] In-Reply-To: References: Message-ID: > [JDK-8347555](https://bugs.openjdk.org/browse/JDK-8347555) is a redo of [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) was [first merged](https://git.openjdk.org/jdk/pull/20754) then backed out due to a regression. This patch redos the feature and fixes the bit shift overflow problem. For more information please refer to the previous PR. > > When constanlizing multiplications (possibly in forms on `lshifts`), the multiplier is upgraded to long and then later narrowed to int if needed. However, when a `lshift` operand is exactly `32`, overflowing an int, using long has an unexpected result. (i.e., `(1 << 32) = 1` and `(int) (1L << 32) = 0`) > > The following was implemented to address this issue. > > if (UseNewCode2) { > *multiplier = bt == T_INT > ? (jlong) (1 << con->get_int()) // loss of precision is expected for int as it overflows > : ((jlong) 1) << con->get_int(); > } else { > *multiplier = ((jlong) 1 << con->get_int()); > } > > > Two new bitshift overflow tests were added. 
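The overflow described above can be reproduced outside the VM. The following is a sketch under stated assumptions, not the HotSpot implementation: `java_style_shift_left`, `multiplier_for_shift` and `narrowed_long_shift` are hypothetical names, and the helper simply emulates the Java rule that the shift distance is masked to the operand width (5 bits for `int`, 6 bits for `long`).

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical helper (not HotSpot's): left shift with Java semantics.
// The distance is masked to the bit width of T and the shift is done on the
// unsigned counterpart, so there is no undefined behaviour in C++.
template <typename T>
T java_style_shift_left(T value, int32_t distance) {
  static_assert(std::is_integral<T>::value && std::is_signed<T>::value,
                "sketch covers signed integer types only");
  static_assert(sizeof(T) >= 4, "sketch covers 32- and 64-bit types only");
  using U = typename std::make_unsigned<T>::type;
  const unsigned mask = sizeof(T) * 8 - 1;  // 31 for int32_t, 63 for int64_t
  return static_cast<T>(static_cast<U>(value)
                        << (static_cast<unsigned>(distance) & mask));
}

// Multiplier implied by "x << shift", computed in the width of the operation.
int64_t multiplier_for_shift(bool is_int, int32_t shift) {
  return is_int ? static_cast<int64_t>(java_style_shift_left<int32_t>(1, shift))
                : java_style_shift_left<int64_t>(1, shift);
}

// The problematic variant: shift as a 64-bit value first, narrow afterwards.
int32_t narrowed_long_shift(int32_t shift) {
  return static_cast<int32_t>(
      static_cast<uint32_t>(java_style_shift_left<int64_t>(1, shift)));
}
```

With a distance of 32, `multiplier_for_shift(true, 32)` yields 1 because the distance is masked to 0, while `narrowed_long_shift(32)` yields 0. That mismatch is the one the description attributes to the earlier version.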
Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: fix typo: lhs->rhs ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23506/files - new: https://git.openjdk.org/jdk/pull/23506/files/c0172410..e9b42dcc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23506&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23506&range=12-13 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23506.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23506/head:pull/23506 PR: https://git.openjdk.org/jdk/pull/23506 From dlong at openjdk.org Thu Apr 10 21:01:52 2025 From: dlong at openjdk.org (Dean Long) Date: Thu, 10 Apr 2025 21:01:52 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v14] In-Reply-To: References: Message-ID: On Thu, 10 Apr 2025 20:18:20 GMT, Kangcheng Xu wrote: >> [JDK-8347555](https://bugs.openjdk.org/browse/JDK-8347555) is a redo of [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) was [first merged](https://git.openjdk.org/jdk/pull/20754) then backed out due to a regression. This patch redos the feature and fixes the bit shift overflow problem. For more information please refer to the previous PR. >> >> When constanlizing multiplications (possibly in forms on `lshifts`), the multiplier is upgraded to long and then later narrowed to int if needed. However, when a `lshift` operand is exactly `32`, overflowing an int, using long has an unexpected result. (i.e., `(1 << 32) = 1` and `(int) (1L << 32) = 0`) >> >> The following was implemented to address this issue. >> >> if (UseNewCode2) { >> *multiplier = bt == T_INT >> ? (jlong) (1 << con->get_int()) // loss of precision is expected for int as it overflows >> : ((jlong) 1) << con->get_int(); >> } else { >> *multiplier = ((jlong) 1 << con->get_int()); >> } >> >> >> Two new bitshift overflow tests were added. 
> > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > fix typo: lhs->rhs src/hotspot/share/utilities/globalDefinitions.hpp line 1278: > 1276: JAVA_INTEGER_SHIFT_BASIC_TYPE(java_shift_right) > 1277: JAVA_INTEGER_SHIFT_BASIC_TYPE(java_shift_right_unsigned) > 1278: Couldn't this be written as a template, so we could use java_shift_left(x,y) or java_shift_left(x.y)? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r2038329474 From bulasevich at openjdk.org Thu Apr 10 21:00:58 2025 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Thu, 10 Apr 2025 21:00:58 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v16] In-Reply-To: References: Message-ID: <9ZH2aw7JBURE5LhQ5ldPYi_5DoeuU6q2sw2y_y2TuaI=.a9b056c6-e33a-4882-8556-7653e7c5ecf9@github.com> On Wed, 9 Apr 2025 11:32:16 GMT, Daniel Lundén wrote: > Can you please test these changes on your respective ports? Hi! Thanks for ping. No regressions on ARM32. TestMaxMethodArguments and TestNestedSynchronize pass fine. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20404#issuecomment-2795152449 From kvn at openjdk.org Thu Apr 10 21:42:35 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 10 Apr 2025 21:42:35 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v4] In-Reply-To: References: Message-ID: On Mon, 7 Apr 2025 23:32:05 GMT, Vladimir Ivanov wrote: >> Migrate Vector API math library (SVML and SLEEF) linkage from native code (in JVM) to Java FFM API. >> >> Since FFM API doesn't support vector calling conventions yet, migration affects only symbol lookup for now. But it still enables significant simplifications on JVM side.
>> >> The patch consists of the following parts: >> * on-demand symbol lookup in Java code replaces eager lookup from native code during JVM startup; >> * 2 new VM intrinsics for vector calls (support unary and binary shapes) (code separated from unary/binary vector operations); >> * new internal interface to query supported CPU ISA extensions (`jdk.incubator.vector.CPUFeatures`) used for CPU dispatching. >> >> `java.lang.foreign` API is used to perform symbol lookup in vector math library, then the address is cached and fed into corresponding JVM intrinsic, so C2 can turn it into a direct vector call in generated code. >> >> Once `java.lang.foreign` supports vectors & vector calling conventions, VM intrinsics can go away. >> >> Performance is on par with original implementation (tested with microbenchmarks on linux-x64 and macosx-aarch64). >> >> Testing: hs-tier1 - hs-tier6, microbenchmarks (on linux-x64 and macosx-aarch64) >> >> Thanks! > > Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: > > Fix windows-aarch64 build failure Marked as reviewed by kvn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24462#pullrequestreview-2758486176 From kvn at openjdk.org Thu Apr 10 21:42:37 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 10 Apr 2025 21:42:37 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v4] In-Reply-To: References: Message-ID: On Mon, 7 Apr 2025 23:20:05 GMT, Vladimir Ivanov wrote: >> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/CPUFeatures.java line 60: >> >>> 58: } >>> 59: >>> 60: public static class X64 { >> >> Should we create `src/jdk.incubator.vector/cpu/` for CPU specific information? As separate refactoring. > > To clarify: are you suggesting to move platform-specific classes into a separate package or platform-specific location? 
> > It does make sense to separate platform-specific parts into their own classes once amount of code grows over some limit. For now it doesn't look too attractive since amount of code is very small. ok >> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMathLibrary.java line 100: >> >>> 98: >>> 99: /** >>> 100: * Naming convention in SVML vector math library. >> >> Does this library has code for all AVX configurations? > > Yes, there are 4 configurations (`-XX:UseAVX=[0..3]`) in total covered by SVML library. Good. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2038379482 PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2038379716 From sparasa at openjdk.org Thu Apr 10 22:57:39 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Thu, 10 Apr 2025 22:57:39 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v2] In-Reply-To: References: Message-ID: > The current scheme for Intel APX NDD code generation favors the emission of NDD instruction on APX-enabled targets, even if destination and source registers are the same. To prevent this, this PR extends the assembler layer to demote EEVEX to REX encoding if dst matches with source operands. 
Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: Disable demotion for esetzucc and cleanup code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24431/files - new: https://git.openjdk.org/jdk/pull/24431/files/6a7ca6b2..bd846088 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=00-01 Stats: 25 lines in 2 files changed: 0 ins; 5 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/24431.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24431/head:pull/24431 PR: https://git.openjdk.org/jdk/pull/24431 From sparasa at openjdk.org Thu Apr 10 23:03:26 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Thu, 10 Apr 2025 23:03:26 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v2] In-Reply-To: <679T11zTC_n620-mbQBrge_YYoGCWhWu8E-8c-ABZxg=.1e347d21-7347-4114-82f7-ca76abacc71d@github.com> References: <679T11zTC_n620-mbQBrge_YYoGCWhWu8E-8c-ABZxg=.1e347d21-7347-4114-82f7-ca76abacc71d@github.com> Message-ID: On Wed, 9 Apr 2025 23:19:51 GMT, Sandhya Viswanathan wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: >> >> Disable demotion for esetzucc and cleanup code > > src/hotspot/cpu/x86/assembler_x86.cpp line 13770: > >> 13768: InstructionAttr *attributes, bool no_flags, bool use_prefixq) { >> 13769: // Demote RegRegReg instructions >> 13770: if (!no_flags && dst_enc == nds_enc) { > > This could be replaced by call to is_demotable(). Please see the updated code refactored to use is_demotable(). > src/hotspot/cpu/x86/assembler_x86.cpp line 13771: > >> 13769: // Demote RegRegReg instructions >> 13770: if (!no_flags && dst_enc == nds_enc) { >> 13771: return use_prefixq? 
prefixq_and_encode(dst_enc, src_enc) : prefix_and_encode(dst_enc, src_enc); > > Nit pick, need space before ? as below: > use_prefixq ? prefixq_and_encode Please see the space fixed in the updated code. > src/hotspot/cpu/x86/assembler_x86.cpp line 13818: > >> 13816: } >> 13817: >> 13818: bool Assembler::is_demotable(bool no_flags, int dst_enc, int nds_enc, int src_enc) { > > src_enc is not being used in this method so could be removed. Please see the src_enc removed in the updated code. > src/hotspot/cpu/x86/assembler_x86.cpp line 17185: > >> 17183: InstructionAttr attributes(AVX_128bit, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); >> 17184: // Encoding Format : eevex_prefix (4 bytes) | opcode_cc | modrm >> 17185: int encode = evex_prefix_and_encode_ndd(0, 0, dst->encoding(), VEX_SIMD_F2, /* MAP4 */VEX_OPCODE_0F_3C, &attributes); //TODO: check this > > This should not be demoted. Please see the demotion disabled for esetzucc instruction in the updated code. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2038525310 PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2038525039 PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2038524403 PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2038524767 From sparasa at openjdk.org Thu Apr 10 23:25:31 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Thu, 10 Apr 2025 23:25:31 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v2] In-Reply-To: References: Message-ID: On Fri, 4 Apr 2025 03:04:56 GMT, Jatin Bhateja wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: >> >> Disable demotion for esetzucc and cleanup code > > src/hotspot/cpu/x86/assembler_x86.cpp line 13825: > >> 13823: return (!no_flags && dst_enc == nds_enc); >> 13824: } >> 13825: > > @vamsi-parasa , We are missing a case where dst_enc can be equal to src_enc; in that case, we can still demote EVEX to REX/REX2 encoding, along with a change in primary opcode if needed. > > This will apply to all the commutative operations (ADD/ AND / OR / XOR) Thanks for the proposal. This feature will be enabled in another PR using the JBS entry https://bugs.openjdk.org/browse/JDK-8354348. 
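The two demotion conditions discussed in this thread can be modelled in a few lines. This is a toy model and not the HotSpot assembler: `Demotion` and `classify_demotion` are hypothetical names, and the `commutative` flag stands in for "the opcode is one of ADD / AND / OR / XOR". It assumes, as the quoted `!no_flags` check suggests, that the flag-suppressing form cannot be expressed with a legacy encoding.

```cpp
// Toy model of the demotion decision for an NDD instruction
//   dst = nds OP src
// Hypothetical names; the real logic lives in assembler_x86.cpp.
enum class Demotion {
  kNone,         // keep the extended EVEX (NDD) encoding
  kDirect,       // dst == nds: emit the two-operand form "dst OP= src"
  kSwapSources   // dst == src and OP commutes: emit "dst OP= nds"
};

static Demotion classify_demotion(bool no_flags, int dst_enc, int nds_enc,
                                  int src_enc, bool commutative) {
  if (no_flags) {
    return Demotion::kNone;         // assumed: no legacy form without flag writes
  }
  if (dst_enc == nds_enc) {
    return Demotion::kDirect;       // the case handled by is_demotable() here
  }
  if (commutative && dst_enc == src_enc) {
    return Demotion::kSwapSources;  // the follow-up case proposed for JDK-8354348
  }
  return Demotion::kNone;
}
```

The first branch mirrors the quoted `is_demotable()` body, `!no_flags && dst_enc == nds_enc`; the `kSwapSources` branch only illustrates the proposal and says nothing about how the follow-up PR will implement it.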
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2038539820 From duke at openjdk.org Fri Apr 11 06:53:39 2025 From: duke at openjdk.org (duke) Date: Fri, 11 Apr 2025 06:53:39 GMT Subject: RFR: 8354255: [jittester] Remove TempDir debug output In-Reply-To: References: Message-ID: <7WKPMCaG26xk43pNQxmWCl_0oVu03TAsCFh85nyFwfQ=.233b1610-6754-4951-92d7-1a3a3f3999c4@github.com> On Thu, 10 Apr 2025 13:17:26 GMT, Evgeny Nikitin wrote: > JITTester's TempDir prints debug information about creation and deletion of a temporary folder, like this: > > DBG: Temp folder created: '/tmp/java_tests8412639693749199985' > DBG: Temp folder deleted: '/tmp/java_tests8412639693749199985' > > As jittester is a library, TempDir can be used in other tools. Debug outputs mess up logs, confuse output comparison tools, etc. And do not give any valuable information (as temp folder with its contents is deleted after VM shutdown). > > This PR removes the debug outputs. @lepestock Your change (at version ed7140f514fd3d97ab1b9cdbbdda98b05a94e21e) is now ready to be sponsored by a Committer. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24573#issuecomment-2796009008 From duke at openjdk.org Fri Apr 11 07:07:23 2025 From: duke at openjdk.org (Anjian-Wen) Date: Fri, 11 Apr 2025 07:07:23 GMT Subject: RFR: 8329887: RISC-V: C2: Support Zvbb Vector And-Not instruction [v6] In-Reply-To: <1KHNbMIgOO7jSZ1Fm4HzxadYaNzE4Xbq4nTitlKy3Po=.17d7860b-10de-4f19-87d8-87fc17313ce2@github.com> References: <1KHNbMIgOO7jSZ1Fm4HzxadYaNzE4Xbq4nTitlKy3Po=.17d7860b-10de-4f19-87d8-87fc17313ce2@github.com> Message-ID: > support Zvbb Vector And-Not vandn.vv match rule and add test Anjian-Wen has updated the pull request incrementally with one additional commit since the last revision: fix zvbb mask match rule ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24129/files - new: https://git.openjdk.org/jdk/pull/24129/files/ec0413af..edb16683 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24129&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24129&range=04-05 Stats: 12 lines in 1 file changed: 2 ins; 2 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/24129.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24129/head:pull/24129 PR: https://git.openjdk.org/jdk/pull/24129 From thartmann at openjdk.org Fri Apr 11 08:17:54 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 11 Apr 2025 08:17:54 GMT Subject: RFR: 8354255: [jittester] Remove TempDir debug output In-Reply-To: References: Message-ID: On Thu, 10 Apr 2025 13:17:26 GMT, Evgeny Nikitin wrote: > JITTester's TempDir prints debug information about creation and deletion of a temporary folder, like this: > > DBG: Temp folder created: '/tmp/java_tests8412639693749199985' > DBG: Temp folder deleted: '/tmp/java_tests8412639693749199985' > > As jittester is a library, TempDir can be used in other tools. Debug outputs mess up logs, confuse output comparison tools, etc. 
And do not give any valuable information (as temp folder with its contents is deleted after VM shutdown). > > This PR removes the debug outputs. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24573#pullrequestreview-2759518698 From enikitin at openjdk.org Fri Apr 11 08:17:56 2025 From: enikitin at openjdk.org (Evgeny Nikitin) Date: Fri, 11 Apr 2025 08:17:56 GMT Subject: Integrated: 8354255: [jittester] Remove TempDir debug output In-Reply-To: References: Message-ID: On Thu, 10 Apr 2025 13:17:26 GMT, Evgeny Nikitin wrote: > JITTester's TempDir prints debug information about creation and deletion of a temporary folder, like this: > > DBG: Temp folder created: '/tmp/java_tests8412639693749199985' > DBG: Temp folder deleted: '/tmp/java_tests8412639693749199985' > > As jittester is a library, TempDir can be used in other tools. Debug outputs mess up logs, confuse output comparison tools, etc. And do not give any valuable information (as temp folder with its contents is deleted after VM shutdown). > > This PR removes the debug outputs. This pull request has now been integrated. Changeset: 1fc1cc5d Author: Evgeny Nikitin Committer: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/1fc1cc5da9a38cf936636a72f9b8a4c246ceaab4 Stats: 4 lines in 1 file changed: 0 ins; 3 del; 1 mod 8354255: [jittester] Remove TempDir debug output Reviewed-by: chagedorn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/24573 From chagedorn at openjdk.org Fri Apr 11 08:18:37 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 11 Apr 2025 08:18:37 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v2] In-Reply-To: References: <5hNAkBEy0dU1EchyMu-E1YYBrtli0I5ZTgUhkQl7qgo=.145177e1-65ed-4f4b-99a3-cb9e0c03c757@github.com> Message-ID: On Thu, 10 Apr 2025 15:26:02 GMT, Kangcheng Xu wrote: >> src/hotspot/share/opto/loopnode.cpp line 313: >> >>> 311: // clone the inner loop etc. 
No optimizations need to change the outer >>> 312: // strip mined loop as it is only a skeleton. >>> 313: IdealLoopTree* PhaseIdealLoop::create_outer_strip_mined_loop(Node *init_control, >> >> Generally, for the touched code, you can also fix the `*` placement to be at the type. >> Suggestion: >> >> IdealLoopTree* PhaseIdealLoop::create_outer_strip_mined_loop(Node* init_control, > > Good point. I tried to fix them as I go but missed this one. With the `x` -> `head` change, there are some more opportunities :-) >> src/hotspot/share/opto/loopnode.hpp line 1310: >> >>> 1308: Node*& entry_control, Node*& iffalse); >>> 1309: >>> 1310: class CountedLoopConverter { >> >> Probably quite a subjective matter, but what about just naming it `CountedLoop`? Then you have `counted_loop.convert()`. > > I disagree: I don't want to confuse it with `CountedLoopNode` or making it looks like extending from `IdealLoopTree`. Besides it wouldn't not necessarily guarantee a counted loop until confirmed by `is_counted_loop()` What I had in mind was something like that (I'm also fine with `CountedLoopConverter`: CountedLoop counted_loop(...); counted_loop.convert(); return counted_loop.is_valid(); But maybe you can explain in more detail what the follow-up work will be and how you use this class again later? >> src/hotspot/share/opto/loopnode.hpp line 1311: >> >>> 1309: >>> 1310: class CountedLoopConverter { >>> 1311: PhaseIdealLoop* _phase; >> >> I suggest to add `const` whenever you can. For example, here you will probably not change the `_phase` pointer anymore: >> Suggestion: >> >> PhaseIdealLoop* const _phase; > > That was my initial thought, too; however, `PhaseIdealLoop::insert_loop_limit_check_predicate()`, `::lazy_replace()`, `::set_subtree_ctrl()` and many other mutations make this impossible. You might be confusing it with having: const PhaseIdealLoop* _phase where we cannot call any non-const methods. That's indeed not possible. 
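The distinction drawn in this exchange, between a pointer to a const object and a const pointer to a mutable object, can be sketched in isolation. The types below (`Phase`, `Converter`, `Inspector`) are hypothetical stand-ins and have nothing to do with the real `PhaseIdealLoop` beyond the shape of the member declaration.

```cpp
// Hypothetical stand-in for a phase object with mutating and read-only methods.
struct Phase {
  int counter = 0;
  void mutate() { ++counter; }          // non-const method
  int read() const { return counter; }  // const method
};

// "Phase* const": the pointer cannot be reseated, the pointee can be mutated.
class Converter {
  Phase* const _phase;
 public:
  explicit Converter(Phase* phase) : _phase(phase) {}
  void convert() { _phase->mutate(); }  // fine: the pointee is not const
  // void reseat(Phase* p) { _phase = p; }  // would not compile
  int count() const { return _phase->read(); }
};

// "const Phase*": the pointer can be reseated, the pointee is read-only.
class Inspector {
  const Phase* _phase;
 public:
  explicit Inspector(const Phase* phase) : _phase(phase) {}
  void reseat(const Phase* p) { _phase = p; }  // fine: the pointer is not const
  // void convert() { _phase->mutate(); }      // would not compile
  int count() const { return _phase->read(); }
};
```

`Converter` corresponds to the suggested `PhaseIdealLoop* const _phase;` member: mutating calls through the pointer still compile, and only reassignment of the field is rejected.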
But what I mean was to make the pointer const: PhaseIdealLoop* const _phase; such that you cannot do `_phase = xyz` later. You would probably not do that anyway but it's an easy addition and safety for all fields not being reassigned again. It also helps to see which fields are going to be updated as part of the mutable state and which fields are not. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2039041842 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2039041754 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2039041966 From chagedorn at openjdk.org Fri Apr 11 08:24:26 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 11 Apr 2025 08:24:26 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v2] In-Reply-To: References: Message-ID: On Thu, 10 Apr 2025 15:30:24 GMT, Kangcheng Xu wrote: >> This PR refactors `PhaseIdealLoop::is_counted_loop()` into (mostly) `CountedLoopConverter::is_counted_loop()` and `CountedLoopConverter::convert()` to decouple the detection and conversion code. This enables us to try different loop configurations easily and finally convert once a counted loop is found. >> >> A nested `PhaseIdealLoop::CountedLoopConverter` class is created to handle the context, but I'm not if this is the best name or place for it. Please let me know what you think. >> >> Blocks [JDK-8336759](https://bugs.openjdk.org/browse/JDK-8336759). > > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > reviewer suggested changes src/hotspot/share/opto/loopnode.cpp line 1627: > 1625: } > 1626: > 1627: bool PhaseIdealLoop::CountedLoopConverter::is_counted_loop() { `is_counted_loop()` is still huge. When trying to refactor it anyway, it might be a good idea to further split the code up into separate methods for all the different checks. 
That would increase the readability since we have a lot of different checks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2039046541 From chagedorn at openjdk.org Fri Apr 11 08:24:27 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 11 Apr 2025 08:24:27 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v2] In-Reply-To: References: <5hNAkBEy0dU1EchyMu-E1YYBrtli0I5ZTgUhkQl7qgo=.145177e1-65ed-4f4b-99a3-cb9e0c03c757@github.com> Message-ID: <505xqF92lH1o1y-nxZ760fTJg0yeJEsxOV5jrbo9ArE=.776257e5-69fd-4c62-87b3-f3f88402c580@github.com> On Fri, 11 Apr 2025 08:15:20 GMT, Christian Hagedorn wrote: >> I disagree: I don't want to confuse it with `CountedLoopNode` or making it looks like extending from `IdealLoopTree`. Besides it wouldn't not necessarily guarantee a counted loop until confirmed by `is_counted_loop()` > > What I had in mind was something like that (I'm also fine with `CountedLoopConverter`: > > CountedLoop counted_loop(...); > counted_loop.convert(); > return counted_loop.is_valid(); > > But maybe you can explain in more detail what the follow-up work will be and how you use this class again later? About the placement of this class. You could probably also move it out of the already huge `PhaseIdealLoop` class to make it a non-inner class, if there is not some strong coupling that you absolutely need. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2039049623 From rcastanedalo at openjdk.org Fri Apr 11 08:29:11 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 11 Apr 2025 08:29:11 GMT Subject: RFR: 8352620: C2: rename MemNode::memory_type() to MemNode::value_basic_type() In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 21:06:17 GMT, Saranya Natarajan wrote: > Description: The current name MemNode::memory_type() is misleading because the returned type is a property of the value that is loaded/stored, not the memory that is accessed. Usually, the two of them match, but for mismatched memory accesses (arising e.g. from using Unsafe or memory segments) they might differ, e.g. one might store a value of type 'short' into an array of elements of type 'long'. The proposal was to rename MemNode::memory_type() to MemNode::value_basic_type() to clarify these cases. > > Solution: Replaced all occurrence of MemNode::memory_type() with MemNode::value_basic_type() src/hotspot/share/opto/memnode.hpp line 139: > 137: // The returned type is a property of the value that is loaded/stored and > 138: // not the memory that is accessed. For mismatched memory accesses > 139: // they might differ. For instance, a value of type 'short' may be stoted Suggestion: // they might differ. For instance, a value of type 'short' may be stored ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24427#discussion_r2028844502 From duke at openjdk.org Fri Apr 11 08:29:11 2025 From: duke at openjdk.org (Saranya Natarajan) Date: Fri, 11 Apr 2025 08:29:11 GMT Subject: RFR: 8352620: C2: rename MemNode::memory_type() to MemNode::value_basic_type() Message-ID: Description: The current name MemNode::memory_type() is misleading because the returned type is a property of the value that is loaded/stored, not the memory that is accessed.
Usually, the two of them match, but for mismatched memory accesses (arising e.g. from using Unsafe or memory segments) they might differ, e.g. one might store a value of type 'short' into an array of elements of type 'long'. The proposal was to rename MemNode::memory_type() to MemNode::value_basic_type() to clarify these cases. Solution: Replaced all occurrence of MemNode::memory_type() with MemNode::value_basic_type() ------------- Commit messages: - Merge branch 'master' into JDK-8352620 - removing comment - Merge branch 'master' into JDK-8352620 - 8352620: C2: rename MemNode::memory_type() to MemNode::value_basic_type() Changes: https://git.openjdk.org/jdk/pull/24427/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24427&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352620 Stats: 41 lines in 5 files changed: 4 ins; 1 del; 36 mod Patch: https://git.openjdk.org/jdk/pull/24427.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24427/head:pull/24427 PR: https://git.openjdk.org/jdk/pull/24427 From epeter at openjdk.org Fri Apr 11 09:56:40 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 11 Apr 2025 09:56:40 GMT Subject: RFR: 8349139: C2: Div looses dependency on condition that guarantees divisor not null in counted loop [v4] In-Reply-To: References: Message-ID: On Mon, 7 Apr 2025 08:27:04 GMT, Roland Westrelin wrote: >> The test crashes because of a division by zero. The `Div` node for >> that one is initially part of a counted loop. The control input of the >> node is cleared because the divisor is non zero. This is because the >> divisor depends on the loop phi and the type of the loop phi is >> narrowed down when the counted loop is created. pre/main/post loops >> are created, unrolling happens, the main loop looses its backedge. The >> `Div` node can then float above the zero trip guard for the main >> loop. 
When the zero trip guard is not taken, there's no guarantee the >> divisor is non zero so the `Div` node should be pinned below it. >> >> I propose we revert the change I made with 8334724 which removed >> `PhaseIdealLoop::cast_incr_before_loop()`. The `CastII` that this >> method inserted was there to handle exactly this problem. It was added >> initially for a similar issue but with array loads. That problem with >> loads is handled some other way now and that's why I thought it was >> safe to proceed with the removal. >> >> The code in this patch is somewhat different from the one we had >> before for a couple reasons: >> >> 1- assert predicate code evolved and so previous logic can't be >> resurrected as it was. >> >> 2- the previous logic has a bug. >> >> Regarding 1-: during pre/main/post loop creation, we used to add the >> `CastII` and then to add assertion predicates (so assertion predicates >> depended on the `CastII`). Then when unrolling, when assertion >> predicates are updated, we would skip over the `CastII`. What I >> propose here is to add the `CastII` after assertion predicates are >> added. As a result, they don't depend on the `CastII` and there's no >> need for any extra logic when unrolling happens. This, however, >> doesn't work when the assertion predicates are added by RCE. In that >> case, I had to add logic to skip over the `CastII` (similar to what >> existed before I removed it). >> >> Regarding 2-: previous implementation for >> `PhaseIdealLoop::cast_incr_before_loop()` would add the `CastII` at >> the first loop `Phi` it encounters that's a use of the loop increment: >> it's usually the iv but not always. I tweaked the test case to show, >> this bug can actually cause a crash and changed the logic for >> `PhaseIdealLoop::cast_incr_before_loop()` accordingly. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains seven commits: > > - merge > - Merge branch 'master' into JDK-8349139 > - other test + review comment > - Merge branch 'master' into JDK-8349139 > - Merge branch 'master' into JDK-8349139 > - Merge branch 'master' into JDK-8349139 > - fix & test I think this looks reasonable. `test/hotspot/jtreg/compiler/controldependency/TestDivDependentOnMainLoopGuard.java` In this test you could also randomize the `2` I commented on. Because in https://github.com/openjdk/jdk/pull/3190 you had the same test but with a constant `3`. A `public static final int` with a random value in a reasonable range could do the trick. test/hotspot/jtreg/compiler/controldependency/TestDivDependentOnMainLoopGuard.java line 31: > 29: * -XX:+UnlockDiagnosticVMOptions -XX:+StressGCM -XX:StressSeed=139899009 TestDivDependentOnMainLoopGuard > 30: * @run main/othervm -Xcomp -XX:-TieredCompilation -XX:CompileOnly=TestDivDependentOnMainLoopGuard::* > 31: * -XX:+UnlockDiagnosticVMOptions -XX:+StressGCM TestDivDependentOnMainLoopGuard Would it make sense to have a run without some of the flags here? ------------- PR Review: https://git.openjdk.org/jdk/pull/23617#pullrequestreview-2759777505 PR Review Comment: https://git.openjdk.org/jdk/pull/23617#discussion_r2039203732 From epeter at openjdk.org Fri Apr 11 10:02:36 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 11 Apr 2025 10:02:36 GMT Subject: RFR: 8349139: C2: Div looses dependency on condition that guarantees divisor not null in counted loop [v4] In-Reply-To: References: Message-ID: On Mon, 7 Apr 2025 08:27:04 GMT, Roland Westrelin wrote: >> The test crashes because of a division by zero. The `Div` node for >> that one is initially part of a counted loop. The control input of the >> node is cleared because the divisor is non zero. This is because the >> divisor depends on the loop phi and the type of the loop phi is >> narrowed down when the counted loop is created. 
pre/main/post loops >> are created, unrolling happens, the main loop looses its backedge. The >> `Div` node can then float above the zero trip guard for the main >> loop. When the zero trip guard is not taken, there's no guarantee the >> divisor is non zero so the `Div` node should be pinned below it. >> >> I propose we revert the change I made with 8334724 which removed >> `PhaseIdealLoop::cast_incr_before_loop()`. The `CastII` that this >> method inserted was there to handle exactly this problem. It was added >> initially for a similar issue but with array loads. That problem with >> loads is handled some other way now and that's why I thought it was >> safe to proceed with the removal. >> >> The code in this patch is somewhat different from the one we had >> before for a couple reasons: >> >> 1- assert predicate code evolved and so previous logic can't be >> resurrected as it was. >> >> 2- the previous logic has a bug. >> >> Regarding 1-: during pre/main/post loop creation, we used to add the >> `CastII` and then to add assertion predicates (so assertion predicates >> depended on the `CastII`). Then when unrolling, when assertion >> predicates are updated, we would skip over the `CastII`. What I >> propose here is to add the `CastII` after assertion predicates are >> added. As a result, they don't depend on the `CastII` and there's no >> need for any extra logic when unrolling happens. This, however, >> doesn't work when the assertion predicates are added by RCE. In that >> case, I had to add logic to skip over the `CastII` (similar to what >> existed before I removed it). >> >> Regarding 2-: previous implementation for >> `PhaseIdealLoop::cast_incr_before_loop()` would add the `CastII` at >> the first loop `Phi` it encounters that's a use of the loop increment: >> it's usually the iv but not always. I tweaked the test case to show, >> this bug can actually cause a crash and changed the logic for >> `PhaseIdealLoop::cast_incr_before_loop()` accordingly. 
> > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: > > - merge > - Merge branch 'master' into JDK-8349139 > - other test + review comment > - Merge branch 'master' into JDK-8349139 > - Merge branch 'master' into JDK-8349139 > - Merge branch 'master' into JDK-8349139 > - fix & test Just submitted some more testing :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/23617#issuecomment-2796440992 From duke at openjdk.org Fri Apr 11 11:24:30 2025 From: duke at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Fri, 11 Apr 2025 11:24:30 GMT Subject: RFR: 8353730: TestSubNodeFloatDoubleNegation.java fails with native Float16 support [v3] In-Reply-To: References: <1oQpmiZAIEkZeemgdGsFyYISkrexaQHFNz1vapLwWcQ=.a6408873-1ed3-4766-b7ee-f4f67b1880f4@github.com> Message-ID: <9TV20GCF97BdaWS0EB0NqVqJOIDiYomCwZdc-9YGDZM=.e0736ef6-b40a-45b7-bccb-4b759c48f554@github.com> On Thu, 10 Apr 2025 10:55:51 GMT, Manuel H?ssig wrote: >> Due to insufficient testing on machines supporting FP16 arithmetic in their ISA, I missed that these machines generate two `SUB_FH` nodes and, crucially, an additional `SUB_F` node. We suspect that this comes from some kind of fallback code path ([open issue](https://bugs.openjdk.org/browse/JDK-8353732); also see [discussion in RISC-V PR fixing the same issue](https://github.com/openjdk/jdk/pull/24421#issuecomment-2777755042)). >> >> This PR fixes this issue for all architectures that support FP16 instructions (that I know of), by only matching `SUB_HF` nodes when the CPU supports FP16. The tests for ARM are currently commented out, due to the support for Float16 still being a work in progress (see PR #23748). >> >> I tested the fix using software emulation of an x86_64 CPU with the `avx512_fp16` feature. 
I also ran the [sanity checks](https://github.com/mhaessig/jdk/actions/runs/14376762241) (the Alpine Linux build fails at `configure`, which is unrelated to this change) as well as tier1 through tier3 and Oracle internal testing. > > Manuel H?ssig has updated the pull request incrementally with one additional commit since the last revision: > > Add issue number to comment > > Co-authored-by: Christian Hagedorn Thank you for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24565#issuecomment-2796621340 From duke at openjdk.org Fri Apr 11 11:24:30 2025 From: duke at openjdk.org (duke) Date: Fri, 11 Apr 2025 11:24:30 GMT Subject: RFR: 8353730: TestSubNodeFloatDoubleNegation.java fails with native Float16 support [v3] In-Reply-To: References: <1oQpmiZAIEkZeemgdGsFyYISkrexaQHFNz1vapLwWcQ=.a6408873-1ed3-4766-b7ee-f4f67b1880f4@github.com> Message-ID: On Thu, 10 Apr 2025 10:55:51 GMT, Manuel H?ssig wrote: >> Due to insufficient testing on machines supporting FP16 arithmetic in their ISA, I missed that these machines generate two `SUB_FH` nodes and, crucially, an additional `SUB_F` node. We suspect that this comes from some kind of fallback code path ([open issue](https://bugs.openjdk.org/browse/JDK-8353732); also see [discussion in RISC-V PR fixing the same issue](https://github.com/openjdk/jdk/pull/24421#issuecomment-2777755042)). >> >> This PR fixes this issue for all architectures that support FP16 instructions (that I know of), by only matching `SUB_HF` nodes when the CPU supports FP16. The tests for ARM are currently commented out, due to the support for Float16 still being a work in progress (see PR #23748). >> >> I tested the fix using software emulation of an x86_64 CPU with the `avx512_fp16` feature. I also ran the [sanity checks](https://github.com/mhaessig/jdk/actions/runs/14376762241) (the Alpine Linux build fails at `configure`, which is unrelated to this change) as well as tier1 through tier3 and Oracle internal testing. 
> > Manuel H?ssig has updated the pull request incrementally with one additional commit since the last revision: > > Add issue number to comment > > Co-authored-by: Christian Hagedorn @mhaessig Your change (at version 176cb067bda10898bf2187d488964154a1cfacbc) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24565#issuecomment-2796622544 From duke at openjdk.org Fri Apr 11 11:32:35 2025 From: duke at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Fri, 11 Apr 2025 11:32:35 GMT Subject: Integrated: 8353730: TestSubNodeFloatDoubleNegation.java fails with native Float16 support In-Reply-To: <1oQpmiZAIEkZeemgdGsFyYISkrexaQHFNz1vapLwWcQ=.a6408873-1ed3-4766-b7ee-f4f67b1880f4@github.com> References: <1oQpmiZAIEkZeemgdGsFyYISkrexaQHFNz1vapLwWcQ=.a6408873-1ed3-4766-b7ee-f4f67b1880f4@github.com> Message-ID: On Thu, 10 Apr 2025 08:31:05 GMT, Manuel H?ssig wrote: > Due to insufficient testing on machines supporting FP16 arithmetic in their ISA, I missed that these machines generate two `SUB_FH` nodes and, crucially, an additional `SUB_F` node. We suspect that this comes from some kind of fallback code path ([open issue](https://bugs.openjdk.org/browse/JDK-8353732); also see [discussion in RISC-V PR fixing the same issue](https://github.com/openjdk/jdk/pull/24421#issuecomment-2777755042)). > > This PR fixes this issue for all architectures that support FP16 instructions (that I know of), by only matching `SUB_HF` nodes when the CPU supports FP16. The tests for ARM are currently commented out, due to the support for Float16 still being a work in progress (see PR #23748). > > I tested the fix using software emulation of an x86_64 CPU with the `avx512_fp16` feature. I also ran the [sanity checks](https://github.com/mhaessig/jdk/actions/runs/14376762241) (the Alpine Linux build fails at `configure`, which is unrelated to this change) as well as tier1 through tier3 and Oracle internal testing. 
This pull request has now been integrated. Changeset: efb5a80e Author: Manuel Hässig Committer: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/efb5a80e52c8314103e1ccec05af6ab480531df0 Stats: 8 lines in 1 file changed: 3 ins; 0 del; 5 mod 8353730: TestSubNodeFloatDoubleNegation.java fails with native Float16 support Reviewed-by: chagedorn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/24565 From rcastanedalo at openjdk.org Fri Apr 11 13:26:39 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 11 Apr 2025 13:26:39 GMT Subject: RFR: 8352620: C2: rename MemNode::memory_type() to MemNode::value_basic_type() In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 21:06:17 GMT, Saranya Natarajan wrote: > Description: The current name MemNode::memory_type() is misleading because the returned type is a property of the value that is loaded/stored, not the memory that is accessed. Usually, the two of them match, but for mismatched memory accesses (arising e.g. from using Unsafe or memory segments) they might differ, e.g. one might store a value of type 'short' into an array of elements of type 'long'. The proposal was to rename MemNode::memory_type() to MemNode::value_basic_type() to clarify these cases. > > Solution: Replaced all occurrences of MemNode::memory_type() with MemNode::value_basic_type() src/hotspot/share/opto/escape.cpp line 2822: > 2820: Node* value = nullptr; > 2821: if (ini != nullptr) { > 2822: // StoreP::memory_type() == T_ADDRESS The value of this comment is debatable, but I think it is best to not remove it in this RFE (just rename `memory_type` to `value_basic_type`), to preserve the original scope of the RFE.
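The mismatched-access case mentioned in the description (storing a value of type 'short' into an array of 'long' elements) can be illustrated outside HotSpot. The following is only a rough sketch in plain C++: `BasicKind`, `Access`, `is_mismatched`, `store_short_into_long` and `load_short_from_long` are hypothetical names made up for this illustration and are not HotSpot types or APIs. It shows that the type of the stored value and the element type of the accessed memory are two separate properties.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-ins for illustration only; these are not HotSpot types.
enum class BasicKind { Short, Long };

struct Access {
  BasicKind value_kind;   // type of the value that is loaded/stored
  BasicKind memory_kind;  // element type of the memory that is accessed
};

// An access is "mismatched" when the two kinds differ.
inline bool is_mismatched(const Access& a) {
  return a.value_kind != a.memory_kind;
}

// Store a 16-bit value into the first two bytes of a 64-bit array element.
inline void store_short_into_long(std::int64_t* array, std::size_t index,
                                  std::int16_t value) {
  std::memcpy(&array[index], &value, sizeof(value));
}

// Read the same two bytes back as a 16-bit value.
inline std::int16_t load_short_from_long(const std::int64_t* array,
                                         std::size_t index) {
  std::int16_t value;
  std::memcpy(&value, &array[index], sizeof(value));
  return value;
}
```

In this sketch the value kind is `Short` while the memory kind is `Long`, which is the distinction the new accessor name is meant to make visible.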
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24427#discussion_r2039560349 From rcastanedalo at openjdk.org Fri Apr 11 13:34:43 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 11 Apr 2025 13:34:43 GMT Subject: RFR: 8351833: Unexpected increase in live nodes when splitting Phis through MergeMems in PhiNode::Ideal [v3] In-Reply-To: References: Message-ID: On Thu, 10 Apr 2025 11:37:55 GMT, Daniel Lundén wrote: >> src/hotspot/share/opto/phaseX.cpp line 1058: >> >>> 1056: // Ensure we did not increase the live node count with more than >>> 1057: // max_live_nodes_increase_per_iteration during the call to transform_old >>> 1058: DEBUG_ONLY(int increase = live_nodes_after - live_nodes_before;) >> >> For consistency with the surrounding code, maybe you could define these as `NOT_PRODUCT`, and possibly group them under `#ifndef PRODUCT` blocks? > > As we discussed offline, there is a subtle difference between `NOT_PRODUCT` and `DEBUG_ONLY`. `NOT_PRODUCT` also runs in non-product "optimized" builds where asserts are disabled, and I really only want this code to run when asserts are enabled. Therefore, I believe `DEBUG_ONLY` (or alternatively `#ifdef ASSERT`) is most suitable here. Fair enough, thanks for the clarification!
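The difference between the two macros discussed above can be sketched as follows. This is a simplified stand-alone illustration, not the HotSpot definitions: `SKETCH_DEBUG_ONLY`, `SKETCH_NOT_PRODUCT`, `debug_only_active` and `not_product_active` are hypothetical names, and the sketch assumes (as the reply describes) that `DEBUG_ONLY` is keyed on `ASSERT` while `NOT_PRODUCT` is keyed on the absence of `PRODUCT`. The `#undef` lines simulate an "optimized" build, where neither symbol is defined.

```cpp
// Simulate an "optimized" build: non-product, but asserts disabled.
#undef ASSERT
#undef PRODUCT

// Assumed shape of the two macros (hypothetical names, for illustration).
#ifdef ASSERT
#define SKETCH_DEBUG_ONLY(code) code
#else
#define SKETCH_DEBUG_ONLY(code)
#endif

#ifdef PRODUCT
#define SKETCH_NOT_PRODUCT(code)
#else
#define SKETCH_NOT_PRODUCT(code) code
#endif

// Returns true only if code wrapped in SKETCH_DEBUG_ONLY was compiled in.
inline bool debug_only_active() {
  bool active = false;
  SKETCH_DEBUG_ONLY(active = true;)
  return active;
}

// Returns true only if code wrapped in SKETCH_NOT_PRODUCT was compiled in.
inline bool not_product_active() {
  bool active = false;
  SKETCH_NOT_PRODUCT(active = true;)
  return active;
}
```

Under the simulated configuration, only the `SKETCH_NOT_PRODUCT` body is compiled in, which matches the reason given for preferring `DEBUG_ONLY` for code that should run only when asserts are enabled.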
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24325#discussion_r2039573857 From fyang at openjdk.org Fri Apr 11 14:24:32 2025 From: fyang at openjdk.org (Fei Yang) Date: Fri, 11 Apr 2025 14:24:32 GMT Subject: RFR: 8329887: RISC-V: C2: Support Zvbb Vector And-Not instruction [v6] In-Reply-To: References: <1KHNbMIgOO7jSZ1Fm4HzxadYaNzE4Xbq4nTitlKy3Po=.17d7860b-10de-4f19-87d8-87fc17313ce2@github.com> Message-ID: On Fri, 11 Apr 2025 07:07:23 GMT, Anjian-Wen wrote: >> support Zvbb Vector And-Not vandn.vv (with and without masked) match rule and add new test in jtreg > > Anjian-Wen has updated the pull request incrementally with one additional commit since the last revision: > > fix zvbb mask match rule test/hotspot/jtreg/compiler/vectorapi/AllBitsSetVectorMatchRuleTest.java line 139: > 137: @Test > 138: @Warmup(10000) > 139: @IR(counts = { IRNode.VAND_NOT_I_MASKED, " >= 1" }) This is failing on my aarch64 machine where there is no SVE. test/hotspot/jtreg/compiler/vectorapi/AllBitsSetVectorMatchRuleTest.java line 156: > 154: @Test > 155: @Warmup(10000) > 156: @IR(counts = { IRNode.VAND_NOT_L_MASKED, " >= 1" }) Similar issue for this new test. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24129#discussion_r2039652386 PR Review Comment: https://git.openjdk.org/jdk/pull/24129#discussion_r2039654293 From eosterlund at openjdk.org Fri Apr 11 14:45:38 2025 From: eosterlund at openjdk.org (Erik Österlund) Date: Fri, 11 Apr 2025 14:45:38 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v7] In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 13:03:43 GMT, Erik Österlund wrote: >> Chad Rakoczy has updated the pull request incrementally with two additional commits since the last revision: >> >> - Relocate nmethod at safepoint >> - Fix windows build > > I have only skimmed through what you are doing but what I have read makes me worried from a GC point of view.
In general, I am not fond of "special nmethods" that work subtly different to normal nmethods and have their own special life cycles. > It might be that some of my concerns are false because this is more of a drive by review to sanity check if you thought about the GC implications. These are just random things on top of my head. > 1) You can't just copy oops. Propagating stale pointers to a new nmethod is not valid and will make the GC vomit. The GC assumes that it can traverse a snapshot of nmethods, and that new nmethods created after that snapshot, will have sane valid oops initially, and hence do not need fixing. Copying stale oops to a new nmethod would violate those invariants and inevitably blow up. > 2) Class redefinition tracks in an external data structure which nmethods contained metadata that we want to eventually throw away. This is done to avoid walking the entire code cache just to keep tabs on the one nmethod that still uses the old metadata. If we clone the nmethod without putting it in said data structure, we will blow up. > 3) I'm worried about the initial state of the nmethod entry barrier guard value being copied from the source nmethod, instead of having the initial value we expect for newly created nmethods. It means that the initial invocation will not get the nmethod entry barrier callback. The GC traverses the nmethods assuming that new nmethods created during the traversal will not start off with weird stale values. > 4) I'm worried about copying the nmethod epoch counters used by virtual threads to mark which nmethods have been found on-stack. Copying it implies that this nmethod has been found on-stack even though it never has. To me, the implications are unknown, but perhaps you thought about it? > 5) You don't check if the nmethod is_unloading() when cloning it. 
That means you can create a new nmethod that has dead oops from the get go - that cannot be allowed > 6) Have you checked what the JVMCI speculation data and JVMCI data contains and if your approach will break that? JVMCI has an nmethod mirror object that refers back to the nmethod - this is unlikely to work out of the box with cloning. > 7) By running the operation in a safepoint you a) introduce an obvious latency problem, b) create a new source for stale nmethod pointers that will become stale and burn. The _nm of the safepoint operation might not survive a safepoint. For ... > Hi @fisk, > > Thank you for the very valuable comment. It has point we have not thought about. > > > I am not fond of "special nmethods" that work subtly different to normal nmethods and have their own special life cycles. > > It's not clear to me what you mean "special nmethods". IMO we don't introduce any special nmethods. From my point of view, a normal nmethod is an nmethod for a ordinary Java method. Nmethods for non-ordinary Java methods are special, e.g. native nmethods or method handle linkers(JDK-8263377). I think normal nmethods should be relocatable within CodeCache. I mean nmethods with a subtly different life cycle where usual invariants/expectations don't hold. Like method handle intrinsics and enter special intrinsics for example. Used to have a different life cycle for OSR nmethods too. > > You can't just copy oops. > > Yes, this is the main issue at the moment. Can we do this at a safepoint? I don't think it solves much. You can't stash away a pointer to the nmethod, roll to a safepoint, and expect the nmethod to not be freed. Even if you did, you still can't copy the oops. If we are to do this, I think you want to apply nmethod entry barriers first. That stabilizes the oops. > > I'm worried about copying the nmethod epoch counters > > We should clear them. If not, it is a bug. 
I'd like to change copying from opt-out to opt-in instead; that would make me feel more comfortable. Then perhaps you can share initialization code that sets up the initial state of the nmethod exactly in the same way as normal nmethods. I didn't check but you need to take the Compile_lock and verify dependencies too if you didn't do that, I think, so you don't race with deoptimization. > > You don't check if the nmethod is_unloading() when cloning it. > > Should such nmethods be not entrant? We don't relocate not entrant nmethods. is_not_entrant doesn't imply is_unloading. > > What are the consequences of copying the deoptimization generation? > > What do you mean? I mean: is it safe to racingly copy the deoptimization generation when there is concurrent deoptimization? This is why I'd prefer copying to be opt-in rather than opt-out so we don't have to stare at every single field and wonder what will happen when a new nmethod "inherits" state from a different nmethod in interesting races. I want it to work as much as possible as normal nmethod installation, starting with a state as close as possible to when the original nmethod was created, as opposed to its eventually mutated state. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23573#issuecomment-2797120660 From vlivanov at openjdk.org Fri Apr 11 19:55:28 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 11 Apr 2025 19:55:28 GMT Subject: RFR: 8353551: C2: Constant folding for ReverseBytes nodes [v3] In-Reply-To: References: Message-ID: On Thu, 10 Apr 2025 09:04:39 GMT, Hannes Greule wrote: >> This change implements constant folding for ReverseBytes nodes. >> >> Currently, `byteswap` is included transitively by `reverse_bits.hpp`. I'm not sure if this is fine or if I need to add an explicit include there. >> >> I appreciate any reviews and comments.
> > Hannes Greule has updated the pull request incrementally with one additional commit since the last revision: > > introduce common ReverseBytesNode Testing results are good. ------------- PR Review: https://git.openjdk.org/jdk/pull/24382#pullrequestreview-2761455330 From sviswanathan at openjdk.org Fri Apr 11 20:26:26 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 11 Apr 2025 20:26:26 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v2] In-Reply-To: References: Message-ID: <-rOibwwb_ojJHquHR_U_JrCjh1jax-uI4Kgd0mEVTfo=.1a12b754-af35-481c-968b-4246da6af423@github.com> On Thu, 10 Apr 2025 22:57:39 GMT, Srinivas Vamsi Parasa wrote: >> The current scheme for Intel APX NDD code generation favors the emission of NDD instruction on APX-enabled targets, even if destination and source registers are the same. To prevent this, this PR extends the assembler layer to demote EEVEX to REX encoding if dst matches with source operands. 
> > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > Disable demotion for esetzucc and cleanup code src/hotspot/cpu/x86/assembler_x86.cpp line 17181: > 17179: InstructionAttr attributes(AVX_128bit, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); > 17180: // Encoding Format : eevex_prefix (4 bytes) | opcode_cc | modrm > 17181: int encode = evex_prefix_and_encode_ndd(0, 0, dst->encoding(), VEX_SIMD_F2, /* MAP4 */VEX_OPCODE_0F_3C, &attributes, false, false, false); // demotion disabled Instead of adding the demote flag, the better way to do this is by adding a single register evex_prefix_and encode_ndd: evex_prefix_and_encode_ndd(dst->encoding(), VEX_SIMD_F2, /* MAP4 */VEX_OPCODE_0F_3C, &attributes); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2040256325 From vlivanov at openjdk.org Fri Apr 11 21:23:52 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 11 Apr 2025 21:23:52 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v5] In-Reply-To: References: Message-ID: > Migrate Vector API math library (SVML and SLEEF) linkage from native code (in JVM) to Java FFM API. > > Since FFM API doesn't support vector calling conventions yet, migration affects only symbol lookup for now. But it still enables significant simplifications on JVM side. > > The patch consists of the following parts: > * on-demand symbol lookup in Java code replaces eager lookup from native code during JVM startup; > * 2 new VM intrinsics for vector calls (support unary and binary shapes) (code separated from unary/binary vector operations); > * new internal interface to query supported CPU ISA extensions (`jdk.incubator.vector.CPUFeatures`) used for CPU dispatching. 
> > `java.lang.foreign` API is used to perform symbol lookup in vector math library, then the address is cached and fed into corresponding JVM intrinsic, so C2 can turn it into a direct vector call in generated code. > > Once `java.lang.foreign` supports vectors & vector calling conventions, VM intrinsics can go away. > > Performance is on par with original implementation (tested with microbenchmarks on linux-x64 and macosx-aarch64). > > Testing: hs-tier1 - hs-tier6, microbenchmarks (on linux-x64 and macosx-aarch64) > > Thanks! Vladimir Ivanov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 19 additional commits since the last revision: - Merge branch 'master' into vector.math.01.java - RVV and SVE adjustments - Merge branch 'master' into vector.math.01.java - Fix windows-aarch64 build failure - features_string -> cpu_info_string - Reviews and Float64Vector-related fix - Misc fixes and cleanups - CPU features support - Cleanup - TODO list - ... 
and 9 more: https://git.openjdk.org/jdk/compare/4aed10ed...0ffed12f ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24462/files - new: https://git.openjdk.org/jdk/pull/24462/files/bb1a11db..0ffed12f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24462&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24462&range=03-04 Stats: 43013 lines in 720 files changed: 22287 ins; 17599 del; 3127 mod Patch: https://git.openjdk.org/jdk/pull/24462.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24462/head:pull/24462 PR: https://git.openjdk.org/jdk/pull/24462 From duke at openjdk.org Sat Apr 12 03:32:37 2025 From: duke at openjdk.org (Anjian-Wen) Date: Sat, 12 Apr 2025 03:32:37 GMT Subject: RFR: 8329887: RISC-V: C2: Support Zvbb Vector And-Not instruction [v6] In-Reply-To: References: <1KHNbMIgOO7jSZ1Fm4HzxadYaNzE4Xbq4nTitlKy3Po=.17d7860b-10de-4f19-87d8-87fc17313ce2@github.com> Message-ID: <_jsC41fFxy1oob7ojYjeCilrOmi9CXRrESMajWkYmpk=.dd94939d-a327-49be-8cfa-691f5a37de88@github.com> On Fri, 11 Apr 2025 14:19:46 GMT, Fei Yang wrote: >> Anjian-Wen has updated the pull request incrementally with one additional commit since the last revision: >> >> fix zvbb mask match rule > > test/hotspot/jtreg/compiler/vectorapi/AllBitsSetVectorMatchRuleTest.java line 156: > >> 154: @Test >> 155: @Warmup(10000) >> 156: @IR(counts = { IRNode.VAND_NOT_L_MASKED, " >= 1" }) > > Similar issue for this new test. Thanks to the reminder, it seems that sve_bci is used to match vand_notI_masked on aarch64. 
Turning on the UseSVE option on my aarch64 machine can pass the test, and I can add a judgment to the test that only uses it when UseSVE is supported ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24129#discussion_r2040532727 From duke at openjdk.org Sat Apr 12 03:40:46 2025 From: duke at openjdk.org (Anjian-Wen) Date: Sat, 12 Apr 2025 03:40:46 GMT Subject: RFR: 8329887: RISC-V: C2: Support Zvbb Vector And-Not instruction [v7] In-Reply-To: <1KHNbMIgOO7jSZ1Fm4HzxadYaNzE4Xbq4nTitlKy3Po=.17d7860b-10de-4f19-87d8-87fc17313ce2@github.com> References: <1KHNbMIgOO7jSZ1Fm4HzxadYaNzE4Xbq4nTitlKy3Po=.17d7860b-10de-4f19-87d8-87fc17313ce2@github.com> Message-ID: > support Zvbb Vector And-Not vandn.vv (with and without masked) match rule and add new test in jtreg Anjian-Wen has updated the pull request incrementally with one additional commit since the last revision: add aarch64 judgement for the test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24129/files - new: https://git.openjdk.org/jdk/pull/24129/files/edb16683..fd4e05c2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24129&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24129&range=05-06 Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24129.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24129/head:pull/24129 PR: https://git.openjdk.org/jdk/pull/24129 From cslucas at openjdk.org Sat Apr 12 04:15:29 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Sat, 12 Apr 2025 04:15:29 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v2] In-Reply-To: References: Message-ID: On Thu, 10 Apr 2025 15:30:24 GMT, Kangcheng Xu wrote: >> This PR refactors `PhaseIdealLoop::is_counted_loop()` into (mostly) `CountedLoopConverter::is_counted_loop()` and `CountedLoopConverter::convert()` to decouple the detection and conversion code. 
This enables us to try different loop configurations easily and finally convert once a counted loop is found. >> >> A nested `PhaseIdealLoop::CountedLoopConverter` class is created to handle the context, but I'm not sure if this is the best name or place for it. Please let me know what you think. >> >> Blocks [JDK-8336759](https://bugs.openjdk.org/browse/JDK-8336759). > > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > reviewer suggested changes src/hotspot/share/opto/loopnode.hpp line 267: > 265: const TypeInteger* trunc_type = nullptr; > 266: }; > 267: static TruncatedIncrement match_incr_with_optional_truncation(Node* expr, BasicType bt); Just a drive-by comment: this function and some others that you created are returning a value by copy; for performance reasons it may be better to return a reference or even a pointer, as is usually the case in HotSpot. src/hotspot/share/opto/loopnode.hpp line 1276: > 1274: > 1275: struct LoopExitTest { > 1276: Node* cmp = nullptr; I think it might be a good idea to use a more specific node type if you know beforehand what kind of node the field will point to - in this case `CmpNode*`. src/hotspot/share/opto/loopnode.hpp line 1343: > 1341: _loop(loop), > 1342: _iv_bt(iv_bt) { > 1343: assert(phase != nullptr, ""); It would be nice to have an explanation of why the expression should be true.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2040546046 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2040546249 PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2040546536 From duke at openjdk.org Sat Apr 12 08:02:21 2025 From: duke at openjdk.org (Anjian-Wen) Date: Sat, 12 Apr 2025 08:02:21 GMT Subject: RFR: 8329887: RISC-V: C2: Support Zvbb Vector And-Not instruction [v8] In-Reply-To: <1KHNbMIgOO7jSZ1Fm4HzxadYaNzE4Xbq4nTitlKy3Po=.17d7860b-10de-4f19-87d8-87fc17313ce2@github.com> References: <1KHNbMIgOO7jSZ1Fm4HzxadYaNzE4Xbq4nTitlKy3Po=.17d7860b-10de-4f19-87d8-87fc17313ce2@github.com> Message-ID: <2C0FwNeuWsdkLo4O3G8tdc87MEX1PlrQ163q1V2wpac=.bf351955-6908-444a-ad1b-72b3e837e19d@github.com> > support Zvbb Vector And-Not vandn.vv (with and without masked) match rule and add new test in jtreg Anjian-Wen has updated the pull request incrementally with one additional commit since the last revision: fix test bug ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24129/files - new: https://git.openjdk.org/jdk/pull/24129/files/fd4e05c2..dd649507 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24129&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24129&range=06-07 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24129.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24129/head:pull/24129 PR: https://git.openjdk.org/jdk/pull/24129 From qamai at openjdk.org Mon Apr 14 12:50:51 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 14 Apr 2025 12:50:51 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs In-Reply-To: References: Message-ID: <2uqd_nRO0UZWonQnFDqkWYvrYwTGQbDEDnWx3C4eoAo=.65472aeb-e9c2-4f99-8728-d4c7e1afaf57@github.com> On Thu, 10 Apr 2025 15:15:54 GMT, Roland Westrelin wrote: > This is a variant of 8332827. 
In 8332827, an array access becomes > dependent on a range check `CastII` for another array access. When, > after loop opts are over, that RC `CastII` was removed, the array > access could float and an out of bound access happened. With the fix > for 8332827, RC `CastII`s are no longer removed. > > With this one what happens is that some transformations applied after > loop opts are over widen the type of the RC `CastII`. As a result, the > type of the RC `CastII` is no longer narrower than that of its input, > the `CastII` is removed and the dependency is lost. > > There are 2 transformations that cause this to happen: > > - after loop opts are over, the type of the `CastII` nodes are widen > so nodes that have the same inputs but a slightly different type can > common. > > - When pushing a `CastII` through an `Add`, if of the type both inputs > of the `Add`s are non constant, then we end up widening the type > (the resulting `Add` has a type that's wider than that of the > initial `CastII`). > > There are already 3 types of `Cast` nodes depending on the > optimizations that are allowed. Either the `Cast` is floating > (`depends_only_test()` returns `true`) or pinned. Either the `Cast` > can be removed if it no longer narrows the type of its input or > not. We already have variants of the `CastII`: > > - if the Cast can float and be removed when it doesn't narrow the type > of its input. > > - if the Cast is pinned and be removed when it doesn't narrow the type > of its input. > > - if the Cast is pinned and can't be removed when it doesn't narrow > the type of its input. > > What we need here, I think, is the 4th combination: > > - if the Cast can float and can't be removed when it doesn't narrow > the type of its input. > > Anyway, things are becoming confusing with all these different > variants named in ways that don't always help figure out what > constraints one of them operate under. So I refactored this and that's > the biggest part of this change. 
The fix consists in marking `Cast` > nodes when their type is widen in a way that prevents them from being > optimized out. > > Tobias ran performance testing with a slightly different version of > this change and there was no regression. If a `CastII` that does not narrow its input has its type being a constant, do you think GVN should transform it into a constant, or such nodes should return the bottom type so that it is not folded into a floating `ConNode`? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24575#issuecomment-2801452694 From rcastanedalo at openjdk.org Mon Apr 14 12:53:11 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 14 Apr 2025 12:53:11 GMT Subject: RFR: 8351833: Unexpected increase in live nodes when splitting Phis through MergeMems in PhiNode::Ideal [v3] In-Reply-To: References: Message-ID: On Thu, 10 Apr 2025 11:39:43 GMT, Daniel Lund?n wrote: >> test/hotspot/jtreg/compiler/itergvn/TestSplitPhiThroughMergeMem.java line 67: >> >>> 65: new String("abcdef" + param2); >>> 66: new String("ghijklmn" + param1); >>> 67: new String("ghijklmn" + param1); >> >> This test illustrates an interesting behavior: C2 generates around 12 Kb of code for this rather infrequent code path (and the frequency can be further reduced without affecting C2's outcome). This seems to contradict C2's general philosophy of focusing the compilation effort (and code cache usage) on hot code. It would be interesting to investigate whether there is an opportunity to make some heuristic more execution-frequency aware here. > > Yes, for sure interesting. Let us create a separate RFE to investigate. I had a closer look at the example and found that the generation of large amounts of cold code is due to two factors: 1. (excessive?) use of `@ForceInline` in [StringConcatHelper](https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/StringConcatHelper.java) and 2. 
generation of fast/path idioms for constructs within the cold inlined code (allocations in particular). For 1. we should evaluate whether forced inlining in `StringConcatHelper` is warranted (in terms of cost/benefits). For 2., I filed [JDK-8354509](https://bugs.openjdk.org/browse/JDK-8354509). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24325#discussion_r2041728751 From rcastanedalo at openjdk.org Mon Apr 14 12:53:00 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 14 Apr 2025 12:53:00 GMT Subject: RFR: 8351833: Unexpected increase in live nodes when splitting Phis through MergeMems in PhiNode::Ideal [v3] In-Reply-To: <8cQweFEP9gLJuwgDDwUsAqS2GYg0bGnuCL39kQETrPQ=.7c315669-835d-4aa7-b946-5c417230814d@github.com> References: <8cQweFEP9gLJuwgDDwUsAqS2GYg0bGnuCL39kQETrPQ=.7c315669-835d-4aa7-b946-5c417230814d@github.com> Message-ID: On Thu, 10 Apr 2025 12:10:01 GMT, Daniel Lund?n wrote: >> After the changes for [JDK-8333393](https://bugs.openjdk.org/browse/JDK-8333393), we apply a Phi idealization, involving splitting Phis through MergeMems, a lot more frequently. This idealization internally applies further idealizations for new Phi nodes generated during the idealization. In certain cases, these internal idealizations result in a large increase of live nodes within a single iteration of the main IGVN loop in `PhaseIterGVN::optimize`. In particular, when we are close to the `MaxNodeLimit` (80 000 by default), it can happen that we go from below `MaxNodeLimit - NodeLimitFudgeFactor * 2` (= 76 000 by default) to more than 80 000 nodes in a single iteration. In such cases, the node count bailout at the top of the `PhaseIterGVN::optimize` loop does not trigger as expected and we instead crash at an assert in node creation as we surpass `MaxNodeLimit` nodes. >> >> ### Changeset >> >> Changes: >> - Do not immediately transform new Phi nodes after splitting Phis through MergeMems. 
The Phi nodes are put on the IGVN worklist and are transformed later on in any case. >> - Add an assert in the `PhaseIterGVN::optimize` loop that ensures we never increase the live node count by more than `NodeLimitFudgeFactor * 2` in a single loop iteration. This assert allows us to catch the issue earlier and much more frequently during IGVN. >> - Add a new regression test `TestSplitPhiThroughMergeMem.java`. The new assert above triggers the issue in a large number of existing tests already, but I added this new test as well for good measure. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/14124983489) >> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. >> - Performance testing >> - DaCapo 23, Renaissance, SPECjbb 2005, and SPECjvm 2008 on Windows x64, Linux x64, macOS x64, and macOS aarch64. No statistically significant improvements or regressions. >> - Compilation time benchmarking for DaCapo 23. No statistically significant improvements or regressions. >> >> ### Additional issue investigation >> >> For the particular failure reported as part of this issue, the additional Phi idealizations after [JDK-8333393](https://bugs.openjdk.org/browse/JDK-8333393) cause a dramatic local increase in the number of nodes during IGVN compared to before. Therefore, it is justified to further investigate if this increase in live nodes is, in general, an issue in its... > > Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: > > Apply suggestions from code review > > Co-authored-by: Christian Hagedorn Marked as reviewed by rcastanedalo (Reviewer). Looks good!
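The node-budget arithmetic in the quoted description can be made concrete with a small model. This is only a sketch and not HotSpot code: the class and method names are hypothetical, and the value 2 000 for `NodeLimitFudgeFactor` is an assumption inferred from the quoted figures (80 000 - 2 * 2 000 = 76 000).

```java
// Minimal model of the node budget described in the quoted PR text.
// Not HotSpot code: every name in this class is hypothetical.
class NodeBudgetSketch {
    static final int MAX_NODE_LIMIT = 80_000;         // default quoted in the description
    static final int NODE_LIMIT_FUDGE_FACTOR = 2_000; // assumption: inferred from 80 000 - 76 000
    static final int MARGIN = 2 * NODE_LIMIT_FUDGE_FACTOR;

    // Models the bailout check at the top of the optimize loop:
    // it only fires once the live node count is already above 76 000.
    static boolean bailsOut(int liveNodes) {
        return liveNodes > MAX_NODE_LIMIT - MARGIN;
    }

    // Models the new assert: one loop iteration may not add more than MARGIN nodes.
    static boolean growthWithinMargin(int liveBefore, int liveAfter) {
        return liveAfter - liveBefore <= MARGIN;
    }

    public static void main(String[] args) {
        int before = 75_999; // just below the bailout threshold, so no bailout
        int after = 80_001;  // a single iteration pushed the count past the hard limit
        System.out.println("bailout fires before iteration: " + bailsOut(before));
        System.out.println("hard limit exceeded afterwards: " + (after > MAX_NODE_LIMIT));
        System.out.println("growth within margin: " + growthWithinMargin(before, after));
    }
}
```

In this model the bailout cannot help, because the count is checked before the iteration that adds the nodes; a per-iteration growth check flags the same situation even when the hard limit is not reached.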
------------- PR Review: https://git.openjdk.org/jdk/pull/24325#pullrequestreview-2763963753 PR Comment: https://git.openjdk.org/jdk/pull/24325#issuecomment-2801335901 From hgreule at openjdk.org Mon Apr 14 12:55:15 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Mon, 14 Apr 2025 12:55:15 GMT Subject: RFR: 8353551: C2: Constant folding for ReverseBytes nodes [v4] In-Reply-To: References: Message-ID: > This change implements constant folding for ReverseBytes nodes. > > Currently, `byteswap` is included transitively by `reverse_bits.hpp`. I'm not sure if this is fine or if I need to add an explicit include there. > > I appreciate any reviews and comments. Hannes Greule has updated the pull request incrementally with one additional commit since the last revision: make function static ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24382/files - new: https://git.openjdk.org/jdk/pull/24382/files/90f2e9de..278a2a7c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24382&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24382&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24382.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24382/head:pull/24382 PR: https://git.openjdk.org/jdk/pull/24382 From hgreule at openjdk.org Mon Apr 14 12:55:29 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Mon, 14 Apr 2025 12:55:29 GMT Subject: RFR: 8353551: C2: Constant folding for ReverseBytes nodes [v3] In-Reply-To: References: Message-ID: On Fri, 11 Apr 2025 19:52:57 GMT, Vladimir Ivanov wrote: >> Hannes Greule has updated the pull request incrementally with one additional commit since the last revision: >> >> introduce common ReverseBytesNode > > Testing results are good. @iwanowww thanks, I addressed your last comment. 
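As a small illustration of what folding a byte reversal of a constant has to produce, the sketch below swaps the bytes by hand and compares the result with `Integer.reverseBytes` and `Long.reverseBytes`. The class and helper names are hypothetical, and this is not the C2 implementation from the PR.

```java
// Sketch: the value a constant-folded byte reversal must yield.
// Class and helper names are hypothetical; this is not C2 code.
class ReverseBytesSketch {
    // Swap the four bytes of an int by hand.
    static int swap(int x) {
        return (x << 24)
             | ((x & 0xFF00) << 8)
             | ((x >>> 8) & 0xFF00)
             | (x >>> 24);
    }

    // Swap the eight bytes of a long by swapping each half and exchanging the halves.
    static long swap(long x) {
        long low = swap((int) x);                          // becomes the new high half
        long high = swap((int) (x >>> 32)) & 0xFFFF_FFFFL; // becomes the new low half
        return (low << 32) | high;
    }

    public static void main(String[] args) {
        int i = 0x01020304;
        long l = 0x0102030405060708L;
        System.out.println(Integer.toHexString(swap(i)));
        System.out.println(swap(i) == Integer.reverseBytes(i));
        System.out.println(swap(l) == Long.reverseBytes(l));
    }
}
```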
------------- PR Comment: https://git.openjdk.org/jdk/pull/24382#issuecomment-2800628585 From dnsimon at openjdk.org Mon Apr 14 12:56:20 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 14 Apr 2025 12:56:20 GMT Subject: RFR: 8352724: Verify bounds for primitive array reads in JVMCI In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 14:31:54 GMT, Andrej Pečimúth wrote: > This PR adds a bounds check for primitive array reads in JVMCI. When a JVMCI compiler attempts to read after the last array element (from the padding of the allocated object), JVMCI should throw an exception instead of returning a garbage value. The check added in this PR handles both primitive and object reads. LGTM and trivial. Actually, can you please add some extra tests to `TestConstantReflectionProvider.java` for out-of-bounds reads? ------------- Marked as reviewed by dnsimon (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24200#pullrequestreview-2764015041 Changes requested by dnsimon (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24200#pullrequestreview-2764143284 From duke at openjdk.org Mon Apr 14 12:56:15 2025 From: duke at openjdk.org (Andrej Pečimúth) Date: Mon, 14 Apr 2025 12:56:15 GMT Subject: RFR: 8352724: Verify bounds for primitive array reads in JVMCI Message-ID: This PR adds a bounds check for primitive array reads in JVMCI. When a JVMCI compiler attempts to read after the last array element (from the padding of the allocated object), JVMCI should throw an exception instead of returning a garbage value. The check added in this PR handles both primitive and object reads. ------------- Commit messages: - Verify bounds for primitive array reads.
Changes: https://git.openjdk.org/jdk/pull/24200/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24200&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352724 Stats: 13 lines in 1 file changed: 10 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24200.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24200/head:pull/24200 PR: https://git.openjdk.org/jdk/pull/24200 From duke at openjdk.org Mon Apr 14 12:57:19 2025 From: duke at openjdk.org (Anjian-Wen) Date: Mon, 14 Apr 2025 12:57:19 GMT Subject: RFR: 8329887: RISC-V: C2: Support Zvbb Vector And-Not instruction [v8] In-Reply-To: <2C0FwNeuWsdkLo4O3G8tdc87MEX1PlrQ163q1V2wpac=.bf351955-6908-444a-ad1b-72b3e837e19d@github.com> References: <1KHNbMIgOO7jSZ1Fm4HzxadYaNzE4Xbq4nTitlKy3Po=.17d7860b-10de-4f19-87d8-87fc17313ce2@github.com> <2C0FwNeuWsdkLo4O3G8tdc87MEX1PlrQ163q1V2wpac=.bf351955-6908-444a-ad1b-72b3e837e19d@github.com> Message-ID: On Sat, 12 Apr 2025 08:02:21 GMT, Anjian-Wen wrote: >> support Zvbb Vector And-Not vandn.vv (with and without masked) match rule and add new test in jtreg > > Anjian-Wen has updated the pull request incrementally with one additional commit since the last revision: > > fix test bug Appreciate the careful review @RealFYang , I will push a quick modify ------------- PR Comment: https://git.openjdk.org/jdk/pull/24129#issuecomment-2800764206 From duke at openjdk.org Mon Apr 14 12:57:00 2025 From: duke at openjdk.org (Anjian-Wen) Date: Mon, 14 Apr 2025 12:57:00 GMT Subject: RFR: 8329887: RISC-V: C2: Support Zvbb Vector And-Not instruction [v9] In-Reply-To: <1KHNbMIgOO7jSZ1Fm4HzxadYaNzE4Xbq4nTitlKy3Po=.17d7860b-10de-4f19-87d8-87fc17313ce2@github.com> References: <1KHNbMIgOO7jSZ1Fm4HzxadYaNzE4Xbq4nTitlKy3Po=.17d7860b-10de-4f19-87d8-87fc17313ce2@github.com> Message-ID: <7u0hDh5_0JFOTDO5LxwUZdpebykF-sR6DQ4rcMpCrV0=.d264a91e-4551-42b9-a74a-0ab65cbd2003@github.com> > support Zvbb Vector And-Not vandn.vv (with and without masked) match rule and add new test in jtreg 
Anjian-Wen has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: - Merge branch 'openjdk:master' into JDK-8329887 - modify code and test format - fix test bug - add aarch64 judgement for the test - fix zvbb mask match rule - Merge branch 'openjdk:master' into JDK-8329887 - add vand_not_masked test - add vand_not_L test - Merge branch 'openjdk:master' into JDK-8329887 - RISC-V: C2: Support Zvbb Vector And-Not instruction fix match rule for format - ... and 1 more: https://git.openjdk.org/jdk/compare/242ae595...66c886c6 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24129/files - new: https://git.openjdk.org/jdk/pull/24129/files/dd649507..66c886c6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24129&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24129&range=07-08 Stats: 20302 lines in 338 files changed: 12900 ins; 5617 del; 1785 mod Patch: https://git.openjdk.org/jdk/pull/24129.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24129/head:pull/24129 PR: https://git.openjdk.org/jdk/pull/24129 From fyang at openjdk.org Mon Apr 14 12:57:42 2025 From: fyang at openjdk.org (Fei Yang) Date: Mon, 14 Apr 2025 12:57:42 GMT Subject: RFR: 8329887: RISC-V: C2: Support Zvbb Vector And-Not instruction [v8] In-Reply-To: <2C0FwNeuWsdkLo4O3G8tdc87MEX1PlrQ163q1V2wpac=.bf351955-6908-444a-ad1b-72b3e837e19d@github.com> References: <1KHNbMIgOO7jSZ1Fm4HzxadYaNzE4Xbq4nTitlKy3Po=.17d7860b-10de-4f19-87d8-87fc17313ce2@github.com> <2C0FwNeuWsdkLo4O3G8tdc87MEX1PlrQ163q1V2wpac=.bf351955-6908-444a-ad1b-72b3e837e19d@github.com> Message-ID: On Sat, 12 Apr 2025 08:02:21 GMT, Anjian-Wen wrote: >> support Zvbb Vector And-Not vandn.vv (with and without masked) match rule and add new test in jtreg > > Anjian-Wen has updated the pull request incrementally with one additional 
commit since the last revision: > > fix test bug src/hotspot/cpu/riscv/riscv_v.ad line 1124: > 1122: // vector and not > 1123: > 1124: instruct vand_notI(vReg dst, vReg src2, vReg src1, immI_M1 m1) %{ `src2` should come after `src1`. Can you reorder it? Similarly for `vand_notL`. test/hotspot/jtreg/compiler/vectorapi/AllBitsSetVectorMatchRuleTest.java line 46: > 44: * @requires vm.compiler2.enabled > 45: * @requires (os.simpleArch == "aarch64" & vm.cpu.features ~= ".*asimd.*") | (os.simpleArch == "riscv64" & vm.cpu.features ~= ".*zvbb.*") > 46: * @summary [vector] Make all bits set vector sharable for match rules Better to keep the original summary (`AArch64: [vector] Make all bits set vector sharable for match rules`) which maps to https://bugs.openjdk.org/browse/JDK-8287984 test/hotspot/jtreg/compiler/vectorapi/AllBitsSetVectorMatchRuleTest.java line 68: > 66: private static long[] la; > 67: private static long[] lb; > 68: private static long[] lr; Can you reorder it a bit so that int[] and long[] test cases are grouped together? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24129#discussion_r2041572000 PR Review Comment: https://git.openjdk.org/jdk/pull/24129#discussion_r2041542639 PR Review Comment: https://git.openjdk.org/jdk/pull/24129#discussion_r2041565117 From rcastanedalo at openjdk.org Mon Apr 14 13:02:33 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 14 Apr 2025 13:02:33 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v17] In-Reply-To: References: Message-ID: On Thu, 10 Apr 2025 09:44:14 GMT, Daniel Lundén wrote: >> If a method has a large number of parameters, we currently bail out from C2 compilation. >> >> ### Changeset >> >> Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2.
In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes. >> >> Changes: >> - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed. >> - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable. >> - Remove all `can_represent` checks and bailouts. >> - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`. >> - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450) >> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. >> - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement. >> - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). 
The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, no... > > Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: > > Refactor and improve TestNestedSynchronize.java Marked as reviewed by rcastanedalo (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20404#pullrequestreview-2763987264 From rcastanedalo at openjdk.org Mon Apr 14 13:03:19 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 14 Apr 2025 13:03:19 GMT Subject: Integrated: 8348645: IGV: visualize live ranges In-Reply-To: References: Message-ID: On Tue, 11 Feb 2025 09:59:48 GMT, Roberto Castañeda Lozano wrote: > This changeset extends IGV with live range visualization. It introduces live ranges as first-class IGV entities and displays them along with the control-flow graph in the CFG view. Visualizing liveness information should hopefully make C2's register allocator easier to understand, diagnose, debug, and enhance. > > Live ranges are visible in C2 phases where liveness information is available, that is, phases `Initial liveness` to `Fix up spills` at IGV print level 4 or greater.
For example, running a debug build of the JVM as follows: > > > java -Xbatch -XX:CompileCommand=IGVPrintLevel,java.util.HashMap::newNode,4 > > > produces the following visualization for the `Initial spilling` phase: > > ![initial-spilling](https://github.com/user-attachments/assets/1ecf74f5-92a8-4866-b1ec-2323bb0c428e) > > Live ranges are first-class IGV entities, meaning that the user can: > > - search, select, and extract them; > > ![search-extract](https://github.com/user-attachments/assets/8e0dfa59-457f-49cb-b2b5-1d202301c79d) > > - examine their properties in the `Properties` window or via tooltips; > > ![properties](https://github.com/user-attachments/assets/68d2d23b-b986-4d2e-835c-b661bce0de23) > > - navigate to related IGV entities via a pop-up menu; and > > ![popup](https://github.com/user-attachments/assets/21de2fef-d36a-42d5-b828-2696d87a18ea) > > - program filters that act on them according to their properties. > > ![filters](https://github.com/user-attachments/assets/e993b067-d0b8-452c-a885-c4e601e31e1c) > > Live ranges are connected to nodes by a use-def relation: a node can define zero or one live ranges, and use multiple live ranges; a live range can be defined and used by multiple nodes. Consequently, a live range in IGV is visible if and only if all its related nodes are visible (fully or semi-transparently). Generally, the start and end of a live range are vertically aligned with the nodes that first define and last use the live range. To reflect accurately the semantics of Phi nodes w.r.t. liveness, the visualization treats live ranges related by Phi nodes specially: live ranges used by a Phi node end at the bottom of the corresponding predecessor basic blocks, whereas live ranges defined by a Phi node start at the top of the node's basic block.
The following screenshot shows an example of a Phi node (`48 Phi`) joining live ranges `L8` and `L13` into `L15`: > > ![phi](https://github.com/user-attachments/assets/0ef8aa1d-523d-4391-982e-6b74c2016a3c) > > The changeset extends the IGV graph printing logic in HotSpot t... This pull request has now been integrated. Changeset: 51ce3120 Author: Roberto Castañeda Lozano URL: https://git.openjdk.org/jdk/commit/51ce312001f3974a7e6394e9c616b04d8fb811ec Stats: 2501 lines in 53 files changed: 2307 ins; 108 del; 86 mod 8348645: IGV: visualize live ranges Reviewed-by: thartmann, dfenacci ------------- PR: https://git.openjdk.org/jdk/pull/23558 From rcastanedalo at openjdk.org Mon Apr 14 13:03:18 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 14 Apr 2025 13:03:18 GMT Subject: RFR: 8348645: IGV: visualize live ranges [v5] In-Reply-To: References: Message-ID: On Fri, 7 Mar 2025 09:48:35 GMT, Roberto Castañeda Lozano wrote: >> This changeset extends IGV with live range visualization. It introduces live ranges as first-class IGV entities and displays them along with the control-flow graph in the CFG view. Visualizing liveness information should hopefully make C2's register allocator easier to understand, diagnose, debug, and enhance. >> >> Live ranges are visible in C2 phases where liveness information is available, that is, phases `Initial liveness` to `Fix up spills` at IGV print level 4 or greater.
For example, running a debug build of the JVM as follows: >> >> >> java -Xbatch -XX:CompileCommand=IGVPrintLevel,java.util.HashMap::newNode,4 >> >> >> produces the following visualization for the `Initial spilling` phase: >> >> ![initial-spilling](https://github.com/user-attachments/assets/1ecf74f5-92a8-4866-b1ec-2323bb0c428e) >> >> Live ranges are first-class IGV entities, meaning that the user can: >> >> - search, select, and extract them; >> >> ![search-extract](https://github.com/user-attachments/assets/8e0dfa59-457f-49cb-b2b5-1d202301c79d) >> >> - examine their properties in the `Properties` window or via tooltips; >> >> ![properties](https://github.com/user-attachments/assets/68d2d23b-b986-4d2e-835c-b661bce0de23) >> >> - navigate to related IGV entities via a pop-up menu; and >> >> ![popup](https://github.com/user-attachments/assets/21de2fef-d36a-42d5-b828-2696d87a18ea) >> >> - program filters that act on them according to their properties. >> >> ![filters](https://github.com/user-attachments/assets/e993b067-d0b8-452c-a885-c4e601e31e1c) >> >> Live ranges are connected to nodes by a use-def relation: a node can define zero or one live ranges, and use multiple live ranges; a live range can be defined and used by multiple nodes. Consequently, a live range in IGV is visible if and only if all its related nodes are visible (fully or semi-transparently). Generally, the start and end of a live range are vertically aligned with the nodes that first define and last use the live range. To reflect accurately the semantics of Phi nodes w.r.t. liveness, the visualization treats live ranges related by Phi nodes specially: live ranges used by a Phi node end at the bottom of the corresponding predecessor basic blocks, whereas live ranges defined by a Phi node start at the top of the node's basic block.
The following screenshot shows an example of a Phi node (`48 Phi`) joining live ranges `L8` and `L13` into `L15`: >> >> ![phi](https://github.com/user-attachments/assets/0ef8aa1d-523d-4391-982e-6b74c2016a3c... > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Show liveness info extra line only when liveness information is available Thanks again for reviewing, Damon and Tobias! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23558#issuecomment-2801491921 From jbhateja at openjdk.org Mon Apr 14 13:46:03 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 14 Apr 2025 13:46:03 GMT Subject: RFR: 8342676: Unsigned Vector Min / Max transforms [v5] In-Reply-To: <21riF_Q0FMyzOh_sakTclKfYa-nJm4klfkyHEYi4ctI=.76933a14-fb5e-447e-873a-59a2b870b842@github.com> References: <21riF_Q0FMyzOh_sakTclKfYa-nJm4klfkyHEYi4ctI=.76933a14-fb5e-447e-873a-59a2b870b842@github.com> Message-ID: > Adding following IR transforms for unsigned vector Min / Max nodes. > > => UMinV (UMinV(a, b), UMaxV(a, b)) => UMinV(a, b) > => UMinV (UMinV(a, b), UMaxV(b, a)) => UMinV(a, b) > => UMaxV (UMinV(a, b), UMaxV(a, b)) => UMaxV(a, b) > => UMaxV (UMinV(a, b), UMaxV(b, a)) => UMaxV(a, b) > => UMaxV (a, a) => a > => UMinV (a, a) => a > > New IR validation test accompanies the patch. > > This is a follow-up PR for https://github.com/openjdk/jdk/pull/20507 > > Best Regards, > Jatin Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: - Review comments resolutions - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8342676 - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8342676 - Review suggestions incorporated.
- Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8342676 - Updating copyright year of modified files - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8342676 - Update IR transforms and tests - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8342676 - 8342676: Unsigned Vector Min / Max transforms ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21604/files - new: https://git.openjdk.org/jdk/pull/21604/files/828c6c7a..eabebc74 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21604&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21604&range=03-04 Stats: 74060 lines in 1474 files changed: 43237 ins; 25403 del; 5420 mod Patch: https://git.openjdk.org/jdk/pull/21604.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21604/head:pull/21604 PR: https://git.openjdk.org/jdk/pull/21604 From jbhateja at openjdk.org Mon Apr 14 13:46:06 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 14 Apr 2025 13:46:06 GMT Subject: RFR: 8342676: Unsigned Vector Min / Max transforms [v2] In-Reply-To: References: <21riF_Q0FMyzOh_sakTclKfYa-nJm4klfkyHEYi4ctI=.76933a14-fb5e-447e-873a-59a2b870b842@github.com> Message-ID: On Tue, 25 Feb 2025 17:49:33 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: >> >> - Updating copyright year of modified files >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8342676 >> - Update IR transforms and tests >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8342676 >> - 8342676: Unsigned Vector Min / Max transforms > > @jatin-bhateja Just ping me here if this is ready for another review ;) Hi @eme64 , let us know if this version looks good to land. 
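The unsigned min/max identities quoted in this thread can be sanity-checked lane by lane on plain `int` values. The following is a minimal sketch under that assumption: it models a single lane with `Integer.compareUnsigned`, the class and method names are hypothetical, and it does not exercise the `UMinV`/`UMaxV` nodes or the Vector API.

```java
// Sketch: scalar (single-lane) check of the unsigned min/max identities.
// Names are hypothetical; this does not touch C2 or the Vector API.
class UnsignedMinMaxSketch {
    static int umin(int a, int b) {
        return Integer.compareUnsigned(a, b) <= 0 ? a : b;
    }

    static int umax(int a, int b) {
        return Integer.compareUnsigned(a, b) >= 0 ? a : b;
    }

    // True if all six identities from the PR description hold for this pair.
    static boolean identitiesHold(int a, int b) {
        return umin(umin(a, b), umax(a, b)) == umin(a, b)
            && umin(umin(a, b), umax(b, a)) == umin(a, b)
            && umax(umin(a, b), umax(a, b)) == umax(a, b)
            && umax(umin(a, b), umax(b, a)) == umax(a, b)
            && umax(a, a) == a
            && umin(a, a) == a;
    }

    public static void main(String[] args) {
        // Values chosen to cross the signed/unsigned boundary.
        int[] samples = {0, 1, -1, Integer.MIN_VALUE, Integer.MAX_VALUE, 42};
        boolean all = true;
        for (int a : samples) {
            for (int b : samples) {
                all &= identitiesHold(a, b);
            }
        }
        System.out.println("all identities hold: " + all);
    }
}
```

Note that `-1` is the largest value under unsigned comparison, which is why the samples include it next to small positive values.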
------------- PR Comment: https://git.openjdk.org/jdk/pull/21604#issuecomment-2801755134 From jbhateja at openjdk.org Mon Apr 14 13:46:09 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 14 Apr 2025 13:46:09 GMT Subject: RFR: 8342676: Unsigned Vector Min / Max transforms [v4] In-Reply-To: <8Swg7V3_i3NXqNiMUc5y0R6utQHFGci-fpMi1bicsdU=.a6abb303-14f0-4c8f-8a57-ec4f09ef4218@github.com> References: <21riF_Q0FMyzOh_sakTclKfYa-nJm4klfkyHEYi4ctI=.76933a14-fb5e-447e-873a-59a2b870b842@github.com> <8Swg7V3_i3NXqNiMUc5y0R6utQHFGci-fpMi1bicsdU=.a6abb303-14f0-4c8f-8a57-ec4f09ef4218@github.com> Message-ID: On Fri, 28 Mar 2025 11:00:09 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: >> >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8342676 >> - Review suggestions incorporated. >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8342676 >> - Updating copyright year of modified files >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8342676 >> - Update IR transforms and tests >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8342676 >> - 8342676: Unsigned Vector Min / Max transforms > > src/hotspot/share/opto/vectornode.cpp line 1045: > >> 1043: } >> 1044: >> 1045: bool VectorNode::is_commutative() { > > Why did you make this change? It seems unrelated, maybe it slipped in from another change set? Plus, it is not an accurate name, if `if (in(1)->_idx > in(2)->_idx) {` would fail, you would say that it is not commutative ... which is wrong ;) It was modified as per suggestion from Vladimir Ivanov https://github.com/openjdk/jdk/pull/22863#discussion_r1967092266 I am ok with skipping it for now. 
> src/hotspot/share/opto/vectornode.cpp line 2213: > >> 2211: umin = n->in(2); >> 2212: umax = n->in(1); >> 2213: } > > Suggestion: > > } else { > // Either both Min or Max. > return nullptr; > } > > That way, you don't need the check below: > `umin != nullptr && umax != nullptr`. We still want to execute VectorNode::Ideal if new transforms are not applicable. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21604#discussion_r2042168184 PR Review Comment: https://git.openjdk.org/jdk/pull/21604#discussion_r2042167983 From jbhateja at openjdk.org Mon Apr 14 13:46:09 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 14 Apr 2025 13:46:09 GMT Subject: RFR: 8342676: Unsigned Vector Min / Max transforms [v4] In-Reply-To: References: <21riF_Q0FMyzOh_sakTclKfYa-nJm4klfkyHEYi4ctI=.76933a14-fb5e-447e-873a-59a2b870b842@github.com> <8Swg7V3_i3NXqNiMUc5y0R6utQHFGci-fpMi1bicsdU=.a6abb303-14f0-4c8f-8a57-ec4f09ef4218@github.com> Message-ID: On Fri, 28 Mar 2025 11:17:03 GMT, Emanuel Peter wrote: >> Ah, and then `UMinMaxV_Ideal` also does not need the argument `can_reshape` any more! > > Suggestion: > > return nullptr; It was done in anticipation of receiving a comment about duplicate code :-) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21604#discussion_r2042167843 From jbhateja at openjdk.org Mon Apr 14 13:50:17 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 14 Apr 2025 13:50:17 GMT Subject: RFR: 8342676: Unsigned Vector Min / Max transforms [v6] In-Reply-To: <21riF_Q0FMyzOh_sakTclKfYa-nJm4klfkyHEYi4ctI=.76933a14-fb5e-447e-873a-59a2b870b842@github.com> References: <21riF_Q0FMyzOh_sakTclKfYa-nJm4klfkyHEYi4ctI=.76933a14-fb5e-447e-873a-59a2b870b842@github.com> Message-ID: <1mKv_sNaaGW-z57wW0xpHdWjd974WzEkuMKEh0su-hY=.24d0373e-5674-45a2-bc83-490e71a7dec0@github.com> > Adding following IR transforms for unsigned vector Min / Max nodes. 
> > => UMinV (UMinV(a, b), UMaxV(a, b)) => UMinV(a, b) > => UMinV (UMinV(a, b), UMaxV(b, a)) => UMinV(a, b) > => UMaxV (UMinV(a, b), UMaxV(a, b)) => UMaxV(a, b) > => UMaxV (UMinV(a, b), UMaxV(b, a)) => UMaxV(a, b) > => UMaxV (a, a) => a > => UMinV (a, a) => a > > New IR validation test accompanies the patch. > > This is a follow-up PR for https://github.com/openjdk/jdk/pull/20507 > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Comment refinement ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21604/files - new: https://git.openjdk.org/jdk/pull/21604/files/eabebc74..6a2cb635 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21604&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21604&range=04-05 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21604.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21604/head:pull/21604 PR: https://git.openjdk.org/jdk/pull/21604 From mbaesken at openjdk.org Mon Apr 14 13:51:01 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 14 Apr 2025 13:51:01 GMT Subject: RFR: JDK-8354507: subnode.cpp:406:36: runtime error: negation of -9223372036854775808 cannot be represented in type 'long int'; cast to an unsigned type to negate this value to itself Message-ID: When running with ubsan enabled binaries (e.g. on Linux x86_64 or macOS aarch) and e.g. 
executing test java/lang/Thread/virtual/CancelTimerWithContention we run into this issue:

/priv/jenkins/client-home/workspace/openjdk-jdk-weekly-linux_x86_64-opt/jdk/src/hotspot/share/opto/subnode.cpp:406:36: runtime error: negation of -9223372036854775808 cannot be represented in type 'long int'; cast to an unsigned type to negate this value to itself
#0 0x7facfee00605 in SubLNode::Ideal(PhaseGVN*, bool) src/hotspot/share/opto/subnode.cpp:406
#1 0x7facfea644bc in PhaseGVN::transform(Node*) src/hotspot/share/opto/phaseX.cpp:681
#2 0x7facfea1e4b9 in Parse::do_one_bytecode() src/hotspot/share/opto/parse2.cpp:2502
#3 0x7facfe9f1064 in Parse::do_one_block() src/hotspot/share/opto/parse1.cpp:1586
#4 0x7facfe9f35d6 in Parse::do_all_blocks() src/hotspot/share/opto/parse1.cpp:724
#5 0x7facfe9f71e0 in Parse::Parse(JVMState*, ciMethod*, float) src/hotspot/share/opto/parse1.cpp:628
#6 0x7facfd2c7925 in ParseGenerator::generate(JVMState*) src/hotspot/share/opto/callGenerator.cpp:97
#7 0x7facfd5e928f in Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*) src/hotspot/share/opto/compile.cpp:805
#8 0x7facfd2c4428 in C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*) src/hotspot/share/opto/c2compiler.cpp:141
#9 0x7facfd5fe1db in CompileBroker::invoke_compiler_on_method(CompileTask*) src/hotspot/share/compiler/compileBroker.cpp:2307
#10 0x7facfd600a06 in CompileBroker::compiler_thread_loop() src/hotspot/share/compiler/compileBroker.cpp:1951
#11 0x7facfde5a3c4 in JavaThread::thread_main_inner() src/hotspot/share/runtime/javaThread.cpp:773
#12 0x7facfde5a3c4 in JavaThread::thread_main_inner() src/hotspot/share/runtime/javaThread.cpp:761
#13 0x7facfeefc786 in Thread::call_run() src/hotspot/share/runtime/thread.cpp:231
#14 0x7facfe9492d4 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:877
#15 0x7fad023366e9 in start_thread (/lib64/libpthread.so.0+0xa6e9) (BuildId: 7bac999c115902a23312484de77ffcce60812e74)
#16 0x7fad0246f53e in clone (/lib64/libc.so.6+0x11853e) (BuildId: d9396455d6e682402e73ddda7af317d3c69e317c)

Seems the issue is rather new and came in with [JDK-8351927](https://bugs.openjdk.org/browse/JDK-8351927) recently. ------------- Commit messages: - JDK-8354507 Changes: https://git.openjdk.org/jdk/pull/24623/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24623&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8354507 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24623.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24623/head:pull/24623 PR: https://git.openjdk.org/jdk/pull/24623 From duke at openjdk.org Mon Apr 14 14:32:52 2025 From: duke at openjdk.org (Andrej Pečimúth) Date: Mon, 14 Apr 2025 14:32:52 GMT Subject: RFR: 8352724: Verify bounds for primitive array reads in JVMCI [v2] In-Reply-To: References: Message-ID: > This PR adds a bounds check for primitive array reads in JVMCI. When a JVMCI compiler attempts to read after the last array element (from the padding of the allocated object), JVMCI should throw an exception instead of returning a garbage value. The check added in this PR handles both primitive and object reads. Andrej Pečimúth has updated the pull request incrementally with one additional commit since the last revision: Test reads after last array element in JVMCI.
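The kind of bounds check described in this PR can be sketched in a few lines. This is a hedged illustration only, not the code from the patch: the class and method names are hypothetical, and the base offset (16) and index scale (4) used in `main` are example values rather than values queried from a running VM.

```java
// Sketch of a bounds check for raw reads inside an array's element area.
// Hypothetical names and example layout values; not the JVMCI implementation.
class ArrayReadBoundsSketch {
    // A read of accessBytes bytes at offset is in bounds only if it lies
    // entirely within [baseOffset, baseOffset + indexScale * length).
    static boolean isReadInBounds(long offset, int accessBytes,
                                  long baseOffset, int indexScale, int length) {
        long end = baseOffset + (long) indexScale * length;
        return accessBytes > 0
            && offset >= baseOffset
            && offset + accessBytes <= end;
    }

    public static void main(String[] args) {
        long base = 16; // assumed array base offset
        int scale = 4;  // assumed element size (int[])
        int length = 3; // elements occupy offsets 16..27

        // Last element: fully inside the array.
        System.out.println(isReadInBounds(base + (long) scale * (length - 1), 4, base, scale, length));
        // One past the end: would read padding, so it must be rejected.
        System.out.println(isReadInBounds(base + (long) scale * length, 4, base, scale, length));
        // Starts inside the last element but runs one byte past the end.
        System.out.println(isReadInBounds(1 + base + (long) scale * (length - 1), 4, base, scale, length));
    }
}
```

A caller would throw an exception when the predicate is false instead of returning whatever bytes happen to sit in the object's padding.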
------------- Changes: - all: https://git.openjdk.org/jdk/pull/24200/files - new: https://git.openjdk.org/jdk/pull/24200/files/1bfeb512..7475f468 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24200&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24200&range=00-01 Stats: 28 lines in 1 file changed: 22 ins; 4 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24200.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24200/head:pull/24200 PR: https://git.openjdk.org/jdk/pull/24200 From dnsimon at openjdk.org Mon Apr 14 14:43:04 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 14 Apr 2025 14:43:04 GMT Subject: RFR: 8352724: Verify bounds for primitive array reads in JVMCI [v2] In-Reply-To: References: Message-ID: On Mon, 14 Apr 2025 14:32:52 GMT, Andrej Pečimúth wrote: >> This PR adds a bounds check for primitive array reads in JVMCI. When a JVMCI compiler attempts to read after the last array element (from the padding of the allocated object), JVMCI should throw an exception instead of returning a garbage value. The check added in this PR handles both primitive and object reads. > > Andrej Pečimúth has updated the pull request incrementally with one additional commit since the last revision: > > Test reads after last array element in JVMCI. test/hotspot/jtreg/compiler/jvmci/jdk.vm.ci.runtime.test/src/jdk/vm/ci/runtime/test/TestConstantReflectionProvider.java line 148: > 146: if (cv.boxed != null && cv.boxed.getClass().isArray()) { > 147: JavaKind kind = metaAccess.lookupJavaType(cv.value).getComponentType().getJavaKind(); > 148: long offset = metaAccess.getArrayBaseOffset(kind) + (long) metaAccess.getArrayIndexScale(kind) * Array.getLength(cv.boxed); If I understand correctly, this tests a read of an element one past the end of the array. 
Can you please also add a test for a read that is partially out-of-bounds: long offset = 1 + metaAccess.getArrayBaseOffset(kind) + (long) metaAccess.getArrayIndexScale(kind) * (Array.getLength(cv.boxed) - 1); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24200#discussion_r2042298538 From pminborg at openjdk.org Mon Apr 14 15:31:12 2025 From: pminborg at openjdk.org (Per Minborg) Date: Mon, 14 Apr 2025 15:31:12 GMT Subject: RFR: 8348556: Inlining fails earlier for MemorySegment::reinterpret [v7] In-Reply-To: References: <3LuKBc-mbghi2A2-OnXrJD5zwvOm8URerns7Ud0Zz4c=.583514d1-d0fc-4005-b810-f4db92fcb60c@github.com> Message-ID: On Tue, 8 Apr 2025 12:24:41 GMT, Per Minborg wrote: >> This PR proposes to add some `@ForceInline` annotations in the `Module` class in order to assist inlining of FFM var/method handles. >> >> There are also some changes in other classes which, if implemented, can take us three additional levels of inlining. I drew a line there. There is a tradeoff with adding `@ForceInline` and just trying to get as deep as possible for a specific use case is probably not the best idea. >> >> Updating the `j.l.Object` constructor is crucial for the higher depths. >> >> Tested and passed tier1-3 > > Per Minborg has updated the pull request incrementally with one additional commit since the last revision: > > Reintroduce Object changes @iwanowww : Do you have any comments on this PR? Should we just close it? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23460#issuecomment-2802091492 From mdoerr at openjdk.org Mon Apr 14 15:36:02 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 14 Apr 2025 15:36:02 GMT Subject: RFR: JDK-8354507: subnode.cpp:406:36: runtime error: negation of -9223372036854775808 cannot be represented in type 'long int'; cast to an unsigned type to negate this value to itself In-Reply-To: References: Message-ID: On Mon, 14 Apr 2025 13:43:22 GMT, Matthias Baesken wrote: > When running with ubsan enabled binaries (e.g. on Linux x86_64 or macOS aarch) and e.g. executing test > java/lang/Thread/virtual/CancelTimerWithContention > > we run into this issue : > > > /priv/jenkins/client-home/workspace/openjdk-jdk-weekly-linux_x86_64-opt/jdk/src/hotspot/share/opto/subnode.cpp:406:36: runtime error: negation of -9223372036854775808 cannot be represented in type 'long int'; cast to an unsigned type to negate this value to itself > #0 0x7facfee00605 in SubLNode::Ideal(PhaseGVN*, bool) src/hotspot/share/opto/subnode.cpp:406 > #1 0x7facfea644bc in PhaseGVN::transform(Node*) src/hotspot/share/opto/phaseX.cpp:681 > #2 0x7facfea1e4b9 in Parse::do_one_bytecode() src/hotspot/share/opto/parse2.cpp:2502 > #3 0x7facfe9f1064 in Parse::do_one_block() src/hotspot/share/opto/parse1.cpp:1586 > #4 0x7facfe9f35d6 in Parse::do_all_blocks() src/hotspot/share/opto/parse1.cpp:724 > #5 0x7facfe9f71e0 in Parse::Parse(JVMState*, ciMethod*, float) src/hotspot/share/opto/parse1.cpp:628 > #6 0x7facfd2c7925 in ParseGenerator::generate(JVMState*) src/hotspot/share/opto/callGenerator.cpp:97 > #7 0x7facfd5e928f in Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*) src/hotspot/share/opto/compile.cpp:805 > #8 0x7facfd2c4428 in C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*) src/hotspot/share/opto/c2compiler.cpp:141 > #9 0x7facfd5fe1db in CompileBroker::invoke_compiler_on_method(CompileTask*) 
src/hotspot/share/compiler/compileBroker.cpp:2307 > #10 0x7facfd600a06 in CompileBroker::compiler_thread_loop() src/hotspot/share/compiler/compileBroker.cpp:1951 > #11 0x7facfde5a3c4 in JavaThread::thread_main_inner() src/hotspot/share/runtime/javaThread.cpp:773 > #12 0x7facfde5a3c4 in JavaThread::thread_main_inner() src/hotspot/share/runtime/javaThread.cpp:761 > #13 0x7facfeefc786 in Thread::call_run() src/hotspot/share/runtime/thread.cpp:231 > #14 0x7facfe9492d4 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:877 > #15 0x7fad023366e9 in start_thread (/lib64/libpthread.so.0+0xa6e9) (BuildId: 7bac999c115902a23312484de77ffcce60812e74) > #16 0x7fad0246f53e in clone (/lib64/libc.so.6+0x11853e) (BuildId: d9396455d6e682402e73ddda7af317d3c69e317c) > > > Seems the issue is rather new and came in with [JDK-8351927](https://bugs.openjdk.org/browse/JDK-8351927) recently. The fix looks good and trivial. It just avoids UB due to signed long overflow due to negation of min long. Please fix the title mismatch! ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24623#pullrequestreview-2764775686 From roland at openjdk.org Mon Apr 14 15:37:36 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 14 Apr 2025 15:37:36 GMT Subject: RFR: 8349139: C2: Div looses dependency on condition that guarantees divisor not null in counted loop [v5] In-Reply-To: References: Message-ID: > The test crashes because of a division by zero. The `Div` node for > that one is initially part of a counted loop. The control input of the > node is cleared because the divisor is non zero. This is because the > divisor depends on the loop phi and the type of the loop phi is > narrowed down when the counted loop is created. pre/main/post loops > are created, unrolling happens, the main loop looses its backedge. The > `Div` node can then float above the zero trip guard for the main > loop. 
When the zero trip guard is not taken, there's no guarantee the > divisor is non zero so the `Div` node should be pinned below it. > > I propose we revert the change I made with 8334724 which removed > `PhaseIdealLoop::cast_incr_before_loop()`. The `CastII` that this > method inserted was there to handle exactly this problem. It was added > initially for a similar issue but with array loads. That problem with > loads is handled some other way now and that's why I thought it was > safe to proceed with the removal. > > The code in this patch is somewhat different from the one we had > before for a couple reasons: > > 1- assert predicate code evolved and so previous logic can't be > resurrected as it was. > > 2- the previous logic has a bug. > > Regarding 1-: during pre/main/post loop creation, we used to add the > `CastII` and then to add assertion predicates (so assertion predicates > depended on the `CastII`). Then when unrolling, when assertion > predicates are updated, we would skip over the `CastII`. What I > propose here is to add the `CastII` after assertion predicates are > added. As a result, they don't depend on the `CastII` and there's no > need for any extra logic when unrolling happens. This, however, > doesn't work when the assertion predicates are added by RCE. In that > case, I had to add logic to skip over the `CastII` (similar to what > existed before I removed it). > > Regarding 2-: previous implementation for > `PhaseIdealLoop::cast_incr_before_loop()` would add the `CastII` at > the first loop `Phi` it encounters that's a use of the loop increment: > it's usually the iv but not always. I tweaked the test case to show, > this bug can actually cause a crash and changed the logic for > `PhaseIdealLoop::cast_incr_before_loop()` accordingly. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains nine commits: - review - Merge branch 'master' into JDK-8349139 - merge - Merge branch 'master' into JDK-8349139 - other test + review comment - Merge branch 'master' into JDK-8349139 - Merge branch 'master' into JDK-8349139 - Merge branch 'master' into JDK-8349139 - fix & test ------------- Changes: https://git.openjdk.org/jdk/pull/23617/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23617&range=04 Stats: 207 lines in 7 files changed: 175 ins; 25 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/23617.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23617/head:pull/23617 PR: https://git.openjdk.org/jdk/pull/23617 From roland at openjdk.org Mon Apr 14 15:37:41 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 14 Apr 2025 15:37:41 GMT Subject: RFR: 8327963: [Umbrella] Incorrect result of C2 compiled code since JDK-8237581 In-Reply-To: References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> Message-ID: On Thu, 10 Apr 2025 12:29:20 GMT, Quan Anh Mai wrote: > It would be great if we have union memory slices for this. Something like that would fix it, but it would be trickier to get right than this point fix, I think. Do you see any other use for it? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24570#issuecomment-2802114859 From roland at openjdk.org Mon Apr 14 15:37:39 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 14 Apr 2025 15:37:39 GMT Subject: RFR: 8349139: C2: Div looses dependency on condition that guarantees divisor not null in counted loop [v4] In-Reply-To: References: Message-ID: On Fri, 11 Apr 2025 10:00:08 GMT, Emanuel Peter wrote: > Just submitted some more testing :) Thanks. Any update on test results? 
@eme64 > test/hotspot/jtreg/compiler/controldependency/TestDivDependentOnMainLoopGuard.java line 31: > >> 29: * -XX:+UnlockDiagnosticVMOptions -XX:+StressGCM -XX:StressSeed=139899009 TestDivDependentOnMainLoopGuard >> 30: * @run main/othervm -Xcomp -XX:-TieredCompilation -XX:CompileOnly=TestDivDependentOnMainLoopGuard::* >> 31: * -XX:+UnlockDiagnosticVMOptions -XX:+StressGCM TestDivDependentOnMainLoopGuard > > Would it make sense to have a run without some of the flags here? Done. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23617#issuecomment-2802111369 PR Review Comment: https://git.openjdk.org/jdk/pull/23617#discussion_r2042407676 From roland at openjdk.org Mon Apr 14 15:37:37 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 14 Apr 2025 15:37:37 GMT Subject: RFR: 8349139: C2: Div looses dependency on condition that guarantees divisor not null in counted loop [v4] In-Reply-To: References: Message-ID: On Mon, 7 Apr 2025 08:27:04 GMT, Roland Westrelin wrote: >> The test crashes because of a division by zero. The `Div` node for >> that one is initially part of a counted loop. The control input of the >> node is cleared because the divisor is non zero. This is because the >> divisor depends on the loop phi and the type of the loop phi is >> narrowed down when the counted loop is created. pre/main/post loops >> are created, unrolling happens, the main loop looses its backedge. The >> `Div` node can then float above the zero trip guard for the main >> loop. When the zero trip guard is not taken, there's no guarantee the >> divisor is non zero so the `Div` node should be pinned below it. >> >> I propose we revert the change I made with 8334724 which removed >> `PhaseIdealLoop::cast_incr_before_loop()`. The `CastII` that this >> method inserted was there to handle exactly this problem. It was added >> initially for a similar issue but with array loads. 
That problem with >> loads is handled some other way now and that's why I thought it was >> safe to proceed with the removal. >> >> The code in this patch is somewhat different from the one we had >> before for a couple reasons: >> >> 1- assert predicate code evolved and so previous logic can't be >> resurrected as it was. >> >> 2- the previous logic has a bug. >> >> Regarding 1-: during pre/main/post loop creation, we used to add the >> `CastII` and then to add assertion predicates (so assertion predicates >> depended on the `CastII`). Then when unrolling, when assertion >> predicates are updated, we would skip over the `CastII`. What I >> propose here is to add the `CastII` after assertion predicates are >> added. As a result, they don't depend on the `CastII` and there's no >> need for any extra logic when unrolling happens. This, however, >> doesn't work when the assertion predicates are added by RCE. In that >> case, I had to add logic to skip over the `CastII` (similar to what >> existed before I removed it). >> >> Regarding 2-: previous implementation for >> `PhaseIdealLoop::cast_incr_before_loop()` would add the `CastII` at >> the first loop `Phi` it encounters that's a use of the loop increment: >> it's usually the iv but not always. I tweaked the test case to show, >> this bug can actually cause a crash and changed the logic for >> `PhaseIdealLoop::cast_incr_before_loop()` accordingly. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: > > - merge > - Merge branch 'master' into JDK-8349139 > - other test + review comment > - Merge branch 'master' into JDK-8349139 > - Merge branch 'master' into JDK-8349139 > - Merge branch 'master' into JDK-8349139 > - fix & test > I think this looks reasonable. > > `test/hotspot/jtreg/compiler/controldependency/TestDivDependentOnMainLoopGuard.java` In this test you could also randomize the `2` I commented on. 
Because in #3190 you had the same test but with a constant `3`. A `public static final int` with a random value in a reasonable range could do the trick. Updated the test. Does it look like what you had in mind? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23617#issuecomment-2802109937 From duke at openjdk.org Mon Apr 14 16:33:07 2025 From: duke at openjdk.org (Andrej Pečimúth) Date: Mon, 14 Apr 2025 16:33:07 GMT Subject: RFR: 8352724: Verify bounds for primitive array reads in JVMCI [v3] In-Reply-To: References: Message-ID: > This PR adds a bounds check for primitive array reads in JVMCI. When a JVMCI compiler attempts to read after the last array element (from the padding of the allocated object), JVMCI should throw an exception instead of returning a garbage value. The check added in this PR handles both primitive and object reads. Andrej Pečimúth has updated the pull request incrementally with one additional commit since the last revision: Test array reads that are partially out of bounds. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/24200/files - new: https://git.openjdk.org/jdk/pull/24200/files/7475f468..3661b212 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24200&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24200&range=01-02 Stats: 17 lines in 1 file changed: 16 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24200.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24200/head:pull/24200 PR: https://git.openjdk.org/jdk/pull/24200 From duke at openjdk.org Mon Apr 14 16:37:43 2025 From: duke at openjdk.org (Andrej Pečimúth) Date: Mon, 14 Apr 2025 16:37:43 GMT Subject: RFR: 8352724: Verify bounds for primitive array reads in JVMCI [v2] In-Reply-To: References: Message-ID: <4p7s9KQLUxBWfHroigJ58xbtbGVwyMEVUOX0lEjvgGk=.24971263-8748-4d6e-ac6f-805dd1612be1@github.com> On Mon, 14 Apr 2025 14:39:35 GMT, Doug Simon wrote: >> Andrej Pečimúth has updated the pull request incrementally with one additional commit since the last revision: >> >> Test reads after last array element in JVMCI. > > test/hotspot/jtreg/compiler/jvmci/jdk.vm.ci.runtime.test/src/jdk/vm/ci/runtime/test/TestConstantReflectionProvider.java line 148: > >> 146: if (cv.boxed != null && cv.boxed.getClass().isArray()) { >> 147: JavaKind kind = metaAccess.lookupJavaType(cv.value).getComponentType().getJavaKind(); >> 148: long offset = metaAccess.getArrayBaseOffset(kind) + (long) metaAccess.getArrayIndexScale(kind) * Array.getLength(cv.boxed); > > If I understand correctly, this tests a read of an element one past the end of the array. > Can you please also add a test for a read that is partially out-of-bounds: > > long offset = 1 + metaAccess.getArrayBaseOffset(kind) + (long) metaAccess.getArrayIndexScale(kind) * (Array.getLength(cv.boxed) - 1); I added a test for a `long` read from `array[array.index - 1]` because adding `+ 1` would make the read unaligned (which is also not allowed). Please check it out. 
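The offset arithmetic discussed in this exchange can be sketched in plain Java. This is only an illustration under stated assumptions, not the JVMCI implementation: `ArrayReadBounds`, `isReadInBounds`, and `isAligned` are hypothetical names, the base offset of 16 and element size of 4 are made-up layout numbers (the real ones come from `metaAccess.getArrayBaseOffset(kind)` and `metaAccess.getArrayIndexScale(kind)`), and alignment is judged on the offset alone for simplicity. It shows a read one element past the end, a wide read starting at the last element that is only partially in bounds, and how a `+ 1` offset becomes unaligned.

```java
/**
 * Hypothetical helper, not JVMCI code: models the offset arithmetic behind
 * an array-read bounds check using made-up layout numbers.
 */
public final class ArrayReadBounds {

    /** True if the byte range [offset, offset + readSize) lies inside the element area. */
    public static boolean isReadInBounds(long baseOffset, int indexScale, int length,
                                         long offset, int readSize) {
        long end = baseOffset + (long) indexScale * length; // first byte past the last element
        return offset >= baseOffset && offset + readSize <= end;
    }

    /** Simplifying assumption: alignment is judged on the offset alone. */
    public static boolean isAligned(long offset, int readSize) {
        return offset % readSize == 0;
    }

    public static void main(String[] args) {
        long base = 16;  // assumed array base offset
        int scale = 4;   // assumed element size of an int[]
        int length = 3;

        long lastElement = base + (long) scale * (length - 1);
        long onePastEnd = base + (long) scale * length;

        // 4-byte read of the last element: fully in bounds.
        System.out.println(isReadInBounds(base, scale, length, lastElement, 4)); // true
        // 4-byte read one element past the end: out of bounds.
        System.out.println(isReadInBounds(base, scale, length, onePastEnd, 4));  // false
        // 8-byte read starting at the last 4-byte element: partially out of bounds.
        System.out.println(isReadInBounds(base, scale, length, lastElement, 8)); // false
        // Shifting the offset by one byte makes a 4-byte read unaligned.
        System.out.println(isAligned(lastElement + 1, 4));                        // false
    }
}
```

With these numbers the wide read at the last element stays aligned on its offset while still running past the end, which is the kind of case a partially-out-of-bounds test can exercise without tripping an alignment check first.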
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24200#discussion_r2042506762 From dnsimon at openjdk.org Mon Apr 14 16:41:44 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 14 Apr 2025 16:41:44 GMT Subject: RFR: 8352724: Verify bounds for primitive array reads in JVMCI [v3] In-Reply-To: References: Message-ID: <6dfjF8s-i_CdERHJjGXUHGaMhI_LKgxFxULtPzc3Ufc=.30e51f39-1d5e-4499-968c-aca28094d220@github.com> On Mon, 14 Apr 2025 16:33:07 GMT, Andrej Pečimúth wrote: >> This PR adds a bounds check for primitive array reads in JVMCI. When a JVMCI compiler attempts to read after the last array element (from the padding of the allocated object), JVMCI should throw an exception instead of returning a garbage value. The check added in this PR handles both primitive and object reads. > > Andrej Pečimúth has updated the pull request incrementally with one additional commit since the last revision: > > Test array reads that are partially out of bounds. Thanks for the new tests. ------------- Marked as reviewed by dnsimon (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24200#pullrequestreview-2764946450 From duke at openjdk.org Mon Apr 14 16:47:43 2025 From: duke at openjdk.org (duke) Date: Mon, 14 Apr 2025 16:47:43 GMT Subject: RFR: 8352724: Verify bounds for primitive array reads in JVMCI [v3] In-Reply-To: References: Message-ID: <03LwU0TS57dra3K3ZINSSqMY5Sx0biflR9g18IIKFDk=.c528c0f8-0d26-4e43-94d2-ebab19d18eef@github.com> On Mon, 14 Apr 2025 16:33:07 GMT, Andrej Pečimúth wrote: >> This PR adds a bounds check for primitive array reads in JVMCI. When a JVMCI compiler attempts to read after the last array element (from the padding of the allocated object), JVMCI should throw an exception instead of returning a garbage value. The check added in this PR handles both primitive and object reads. 
> > Andrej Pečimúth has updated the pull request incrementally with one additional commit since the last revision: > > Test array reads that are partially out of bounds. @pecimuth Your change (at version 3661b212f01d2e1708add26fb964c41a9c2d087f) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24200#issuecomment-2802293068 From sparasa at openjdk.org Mon Apr 14 18:27:13 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Mon, 14 Apr 2025 18:27:13 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v3] In-Reply-To: References: Message-ID: > The current scheme for Intel APX NDD code generation favors the emission of NDD instruction on APX-enabled targets, even if destination and source registers are the same. To prevent this, this PR extends the assembler layer to demote EEVEX to REX encoding if dst matches with source operands. Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: refactor esetzucc to do encoding without a demote flag ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24431/files - new: https://git.openjdk.org/jdk/pull/24431/files/bd846088..09d3c3a8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=01-02 Stats: 16 lines in 2 files changed: 9 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/24431.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24431/head:pull/24431 PR: https://git.openjdk.org/jdk/pull/24431 From sparasa at openjdk.org Mon Apr 14 18:27:13 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Mon, 14 Apr 2025 18:27:13 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v2] In-Reply-To: <-rOibwwb_ojJHquHR_U_JrCjh1jax-uI4Kgd0mEVTfo=.1a12b754-af35-481c-968b-4246da6af423@github.com> References: 
<-rOibwwb_ojJHquHR_U_JrCjh1jax-uI4Kgd0mEVTfo=.1a12b754-af35-481c-968b-4246da6af423@github.com> Message-ID: On Fri, 11 Apr 2025 20:23:30 GMT, Sandhya Viswanathan wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: >> >> Disable demotion for esetzucc and cleanup code > > src/hotspot/cpu/x86/assembler_x86.cpp line 17181: > >> 17179: InstructionAttr attributes(AVX_128bit, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); >> 17180: // Encoding Format : eevex_prefix (4 bytes) | opcode_cc | modrm >> 17181: int encode = evex_prefix_and_encode_ndd(0, 0, dst->encoding(), VEX_SIMD_F2, /* MAP4 */VEX_OPCODE_0F_3C, &attributes, false, false, false); // demotion disabled > > Instead of adding the demote flag, the better way to do this is by adding a single register evex_prefix_and_encode_ndd: > evex_prefix_and_encode_ndd(dst->encoding(), VEX_SIMD_F2, /* MAP4 */VEX_OPCODE_0F_3C, &attributes); Please see the suggested change incorporated into the recently pushed update. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2042681146 From duke at openjdk.org Mon Apr 14 18:34:52 2025 From: duke at openjdk.org (Andrej Pečimúth) Date: Mon, 14 Apr 2025 18:34:52 GMT Subject: Integrated: 8352724: Verify bounds for primitive array reads in JVMCI In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 14:31:54 GMT, Andrej Pečimúth wrote: > This PR adds a bounds check for primitive array reads in JVMCI. When a JVMCI compiler attempts to read after the last array element (from the padding of the allocated object), JVMCI should throw an exception instead of returning a garbage value. The check added in this PR handles both primitive and object reads. This pull request has now been integrated. 
Changeset: de0e6488 Author: Andrej Pecimuth Committer: Doug Simon URL: https://git.openjdk.org/jdk/commit/de0e6488449303bd15d4590480a2e47b8026a9b1 Stats: 57 lines in 2 files changed: 48 ins; 7 del; 2 mod 8352724: Verify bounds for primitive array reads in JVMCI Reviewed-by: dnsimon ------------- PR: https://git.openjdk.org/jdk/pull/24200 From vlivanov at openjdk.org Mon Apr 14 19:31:42 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 14 Apr 2025 19:31:42 GMT Subject: RFR: 8353551: C2: Constant folding for ReverseBytes nodes [v4] In-Reply-To: References: Message-ID: On Mon, 14 Apr 2025 12:55:15 GMT, Hannes Greule wrote: >> This change implements constant folding for ReverseBytes nodes. >> >> Currently, `byteswap` is included transitively by `reverse_bits.hpp`. I'm not sure if this is fine or if I need to add an explicit include there. >> >> I appreciate any reviews and comments. > > Hannes Greule has updated the pull request incrementally with one additional commit since the last revision: > > make function static Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24382#pullrequestreview-2765458481 From kxu at openjdk.org Mon Apr 14 21:36:02 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Mon, 14 Apr 2025 21:36:02 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v9] In-Reply-To: References: Message-ID: On Thu, 10 Apr 2025 19:02:37 GMT, Kangcheng Xu wrote: >> src/hotspot/share/opto/addnode.cpp line 415: >> >>> 413: if (find_power_of_two_addition_pattern(this, bt).valid) { >>> 414: return nullptr; >>> 415: } >> >> Hmm. So somewhere we would have generated that pattern, probably in MulNode. Can you add a verification there, to check that we are only generating patterns that `find_power_of_two_addition_pattern` recognizes? That would make sure that we keep the code here and there in sync. 
> > Added `assert()` to `Mul[IL]Node::Ideal()` After experimenting for a while, I found it impractical to assert power-of-2 patterns in `Ideal()` (or anywhere else). https://github.com/openjdk/jdk/blob/e9b42dcce268f32bd2ec3c01ac6221073b888538/src/hotspot/share/opto/mulnode.cpp#L263-L266 First, `phase->transform()` could transform `LShiftINode`s into something very different. Second, there is no guarantee the `res = AddINode` will remain untransformed by the time it's passed into `find_power_of_two_addition_pattern()`. Asserting power-of-2 patterns before such transformation is not very helpful. > @eme64: [...] check that we are only generating patterns that find_power_of_two_addition_pattern recognizes In short, we can't guarantee we'll always pick up *transformed* power-of-2 patterns, at least not without significant effort, given the difficulty of predicting all possible transformations. However, I would argue that making this guarantee is outside the scope of this issue, as all serial addition patterns are correctly picked up and optimized right now. I'm not sure how to proceed with this. Please let me know what you think. Thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r2042974649 From kxu at openjdk.org Mon Apr 14 21:40:56 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Mon, 14 Apr 2025 21:40:56 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v14] In-Reply-To: References: Message-ID: On Thu, 10 Apr 2025 20:58:23 GMT, Dean Long wrote: >> Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: >> >> fix typo: lhs->rhs > > src/hotspot/share/utilities/globalDefinitions.hpp line 1278: > >> 1276: JAVA_INTEGER_SHIFT_BASIC_TYPE(java_shift_right) >> 1277: JAVA_INTEGER_SHIFT_BASIC_TYPE(java_shift_right_unsigned) >> 1278: > > Couldn't this be written as a template, so we could use java_shift_left(x,y) or java_shift_left(x.y)? 
Explicit `BasicType` as a function (instead of template) parameter allows more flexibility at runtime when `bt` is a variable. For example: https://github.com/openjdk/jdk/blob/e9b42dcce268f32bd2ec3c01ac6221073b888538/src/hotspot/share/opto/addnode.cpp#L471 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r2042981441 From duke at openjdk.org Mon Apr 14 21:46:26 2025 From: duke at openjdk.org (Saranya Natarajan) Date: Mon, 14 Apr 2025 21:46:26 GMT Subject: RFR: 8352620: C2: rename MemNode::memory_type() to MemNode::value_basic_type() [v2] In-Reply-To: References: Message-ID: > Description: The current name MemNode::memory_type() is misleading because the returned type is a property of the value that is loaded/stored, not the memory that is accessed. Usually, the two of them match, but for mismatched memory accesses (arising e.g. from using Unsafe or memory segments) they might differ, e.g. one might store a value of type 'short' into an array of elements of type 'long'. The proposal was to rename MemNode::memory_type() to MemNode::value_basic_type() to clarify these cases. 
> > Solution: Replaced all occurrences of MemNode::memory_type() with MemNode::value_basic_type() Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: addressing review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24427/files - new: https://git.openjdk.org/jdk/pull/24427/files/f2305491..b9dfee3b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24427&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24427&range=00-01 Stats: 2 lines in 2 files changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24427.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24427/head:pull/24427 PR: https://git.openjdk.org/jdk/pull/24427 From sparasa at openjdk.org Mon Apr 14 23:33:25 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Mon, 14 Apr 2025 23:33:25 GMT Subject: RFR: 8329030: Fix bugs in APX NDD code generation for OpenJDK PR Message-ID: This PR fixes the bugs discovered in the APX NDD code generation PR (#23501) that got integrated into OpenJDK. The following compiler tests uncovered these bugs: test/hotspot/jtreg/compiler/c2/cr6340864/TestLongVect.java test/hotspot/jtreg/compiler/c2/cr7192963/TestLongVect.java test/hotspot/jtreg/compiler/vectorization/runner/BasicLongOpTest.java test/hotspot/jtreg/compiler/vectorapi/TestMaskedMacroLogicVector.java After the bug fixes in this PR, using Intel Software Development Emulator (SDE) it was verified that the above tests are working correctly when using Intel APX. 
------------- Commit messages: - 8329030: Fix bugs in APX NDD code generation for OpenJDK PR Changes: https://git.openjdk.org/jdk/pull/24637/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24637&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8329030 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/24637.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24637/head:pull/24637 PR: https://git.openjdk.org/jdk/pull/24637 From sparasa at openjdk.org Mon Apr 14 23:56:53 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Mon, 14 Apr 2025 23:56:53 GMT Subject: RFR: 8354544: Fix bugs in APX NDD code generation for OpenJDK PR [v2] In-Reply-To: References: Message-ID: > This PR fixes the bugs discovered in the APX NDD code generation PR (#23501) that got integrated into OpenJDK. > > The following compiler tests uncovered these bugs: > > test/hotspot/jtreg/compiler/c2/cr6340864/TestLongVect.java > test/hotspot/jtreg/compiler/c2/cr7192963/TestLongVect.java > test/hotspot/jtreg/compiler/vectorization/runner/BasicLongOpTest.java > test/hotspot/jtreg/compiler/vectorapi/TestMaskedMacroLogicVector.java > > After the bug fixes in this PR, using Intel Software Development Emulator (SDE) it was verified that the above tests are working correctly when using Intel APX. 
Srinivas Vamsi Parasa has updated the pull request incrementally with two additional commits since the last revision: - Merge branch 'apxfix' of https://github.com/vamsi-parasa/jdk into apxfix - 8354544: Fix bugs in APX NDD code generation for OpenJDK PR ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24637/files - new: https://git.openjdk.org/jdk/pull/24637/files/83832b88..b1a47149 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24637&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24637&range=00-01 Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24637.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24637/head:pull/24637 PR: https://git.openjdk.org/jdk/pull/24637 From kvn at openjdk.org Tue Apr 15 00:38:42 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 15 Apr 2025 00:38:42 GMT Subject: RFR: 8351833: Unexpected increase in live nodes when splitting Phis through MergeMems in PhiNode::Ideal [v3] In-Reply-To: <8cQweFEP9gLJuwgDDwUsAqS2GYg0bGnuCL39kQETrPQ=.7c315669-835d-4aa7-b946-5c417230814d@github.com> References: <8cQweFEP9gLJuwgDDwUsAqS2GYg0bGnuCL39kQETrPQ=.7c315669-835d-4aa7-b946-5c417230814d@github.com> Message-ID: On Thu, 10 Apr 2025 12:10:01 GMT, Daniel Lundén wrote: >> After the changes for [JDK-8333393](https://bugs.openjdk.org/browse/JDK-8333393), we apply a Phi idealization, involving splitting Phis through MergeMems, a lot more frequently. This idealization internally applies further idealizations for new Phi nodes generated during the idealization. In certain cases, these internal idealizations result in a large increase of live nodes within a single iteration of the main IGVN loop in `PhaseIterGVN::optimize`. In particular, when we are close to the `MaxNodeLimit` (80 000 by default), it can happen that we go from below `MaxNodeLimit - NodeLimitFudgeFactor * 2` (= 76 000 by default) to more than 80 000 nodes in a single iteration. 
In such cases, the node count bailout at the top of the `PhaseIterGVN::optimize` loop does not trigger as expected and we instead crash at an assert in node creation as we surpass `MaxNodeLimit` nodes. >> >> ### Changeset >> >> Changes: >> - Do not immediately transform new Phi nodes after splitting Phis through MergeMems. The Phi nodes are put on the IGVN worklist and are transformed later on in any case. >> - Add an assert in the `PhaseIterGVN::optimize` loop that ensures we never increase the live node count with more than `NodeLimitFudgeFactor * 2` in a single loop iteration. This assert allows us to catch the issue earlier and much more frequently during IGVN. >> - Add a new regression test `TestSplitPhiThroughMergeMem.java`. The new assert above triggers the issue in a large number of existing tests already, but I added this new test as well for good measure. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/14124983489) >> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. >> - Performance testing >> - DaCapo 23, Renaissance, SPECjbb 2005, and SPECjvm 2008 on Windows x64, Linux x64, macOS x64, and macOS aarch64. No statistically significant improvements nor regressions. >> - Compilation time benchmarking for DaCapo 23. No statistically significant improvements nor regressions. >> >> ### Additional issue investigation >> >> For the particular failure reported as part of this issue, the additional Phi idealizations after [JDK-8333393](https://bugs.openjdk.org/browse/JDK-8333393) cause a dramatic local increase in the number of nodes during IGVN compared to before. Therefore, it is justified to further investigate if this increase in live nodes is, in general, an issue in its... 
> > Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: > > Apply suggestions from code review > > Co-authored-by: Christian Hagedorn Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24325#pullrequestreview-2766153601 From vpaprotski at openjdk.org Tue Apr 15 03:51:39 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Tue, 15 Apr 2025 03:51:39 GMT Subject: RFR: 8354471: Assertion failure with -XX:-EnableX86ECoreOpts Message-ID: The check to choose between AVX2 and AVX512 implementation was relying on `EnableX86ECoreOpts`. It should be relying on `supports_avxifma` and mirror the `UseIntPolyIntrinsics` check in `vm_version_x86.cpp`. Note, in `stubGenerator_x86_64.cpp`, entry to the patched function is protected already: if (UseIntPolyIntrinsics) { StubRoutines::_intpoly_montgomeryMult_P256 = generate_intpoly_montgomeryMult_P256(); StubRoutines::_intpoly_assign = generate_intpoly_assign(); } ------------- Commit messages: - Fix ECore check to be same as UseIntPolyIntrinsics Changes: https://git.openjdk.org/jdk/pull/24644/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24644&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8354471 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24644.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24644/head:pull/24644 PR: https://git.openjdk.org/jdk/pull/24644 From vpaprotski at openjdk.org Tue Apr 15 03:59:33 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Tue, 15 Apr 2025 03:59:33 GMT Subject: RFR: 8354473: Incorrect results for compress/expand tests with -XX:+EnableX86ECoreOpts Message-ID: It looks like the `permv` mask isn't always 'all-ones' or 'all-zeroes'.
(Which is OK for real blend, but needs to be enforced via the flag for blend emulation) Before the fix, `make test TEST="jdk/incubator/vector"` (on ECore machine) ============================== Test summary ============================== TEST TOTAL PASS FAIL ERROR SKIP >> jtreg:test/jdk/jdk/incubator/vector 83 71 10 0 2 << ============================== TEST FAILURE After the fix: ============================== Test summary ============================== TEST TOTAL PASS FAIL ERROR SKIP jtreg:test/jdk/jdk/incubator/vector 83 81 0 0 2 ============================== TEST SUCCESS And on an AVX512 machine: ============================== Test summary ============================== TEST TOTAL PASS FAIL ERROR SKIP jtreg:test/jdk/jdk/incubator/vector 83 81 0 0 2 ============================== TEST SUCCESS ------------- Commit messages: - always fixup mask in blend emulation in compress_expand Changes: https://git.openjdk.org/jdk/pull/24645/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24645&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8354473 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24645.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24645/head:pull/24645 PR: https://git.openjdk.org/jdk/pull/24645 From thartmann at openjdk.org Tue Apr 15 06:24:45 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 15 Apr 2025 06:24:45 GMT Subject: RFR: 8354544: Fix bugs in APX NDD code generation for OpenJDK PR [v2] In-Reply-To: References: Message-ID: On Mon, 14 Apr 2025 23:56:53 GMT, Srinivas Vamsi Parasa wrote: >> This PR fixes the bugs discovered in the APX NDD code generation PR (#23501) that got integrated into OpenJDK. 
>> >> The following compiler tests uncovered these bugs: >> >> test/hotspot/jtreg/compiler/c2/cr6340864/TestLongVect.java >> test/hotspot/jtreg/compiler/c2/cr7192963/TestLongVect.java >> test/hotspot/jtreg/compiler/vectorization/runner/BasicLongOpTest.java >> test/hotspot/jtreg/compiler/vectorapi/TestMaskedMacroLogicVector.java >> >> After the bug fixes in this PR, using Intel Software Development Emulator (SDE) it was verified that the above tests are working correctly when using Intel APX. > > Srinivas Vamsi Parasa has updated the pull request incrementally with two additional commits since the last revision: > > - Merge branch 'apxfix' of https://github.com/vamsi-parasa/jdk into apxfix > - 8354544: Fix bugs in APX NDD code generation for OpenJDK PR That looks good to me. I converted the sub-task to a bug. Please use a more descriptive title and add the failure modes to the description of the JBS issue. Thanks! ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24637#pullrequestreview-2766951472 From hgreule at openjdk.org Tue Apr 15 06:38:41 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Tue, 15 Apr 2025 06:38:41 GMT Subject: RFR: 8353551: C2: Constant folding for ReverseBytes nodes [v4] In-Reply-To: References: Message-ID: On Mon, 14 Apr 2025 19:28:54 GMT, Vladimir Ivanov wrote: >> Hannes Greule has updated the pull request incrementally with one additional commit since the last revision: >> >> make function static > > Looks good. Thank you @iwanowww. @eme64 could you have another look too? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24382#issuecomment-2803973445 From thartmann at openjdk.org Tue Apr 15 06:50:40 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 15 Apr 2025 06:50:40 GMT Subject: RFR: JDK-8354507: subnode.cpp:406:36: runtime error: negation of -9223372036854775808 cannot be represented in type 'long int'; cast to an unsigned type to negate this value to itself In-Reply-To: References: Message-ID: <0RcN-dPXU3u9bzj4knvTKt6meWetN6bL50nkYFYh1Ko=.43b6d29c-dd5c-49e7-a26b-485b0c1d82ed@github.com> On Mon, 14 Apr 2025 13:43:22 GMT, Matthias Baesken wrote: > When running with ubsan enabled binaries (e.g. on Linux x86_64 or macOS aarch) and e.g. executing test > java/lang/Thread/virtual/CancelTimerWithContention > > we run into this issue : > > > /priv/jenkins/client-home/workspace/openjdk-jdk-weekly-linux_x86_64-opt/jdk/src/hotspot/share/opto/subnode.cpp:406:36: runtime error: negation of -9223372036854775808 cannot be represented in type 'long int'; cast to an unsigned type to negate this value to itself > #0 0x7facfee00605 in SubLNode::Ideal(PhaseGVN*, bool) src/hotspot/share/opto/subnode.cpp:406 > #1 0x7facfea644bc in PhaseGVN::transform(Node*) src/hotspot/share/opto/phaseX.cpp:681 > #2 0x7facfea1e4b9 in Parse::do_one_bytecode() src/hotspot/share/opto/parse2.cpp:2502 > #3 0x7facfe9f1064 in Parse::do_one_block() src/hotspot/share/opto/parse1.cpp:1586 > #4 0x7facfe9f35d6 in Parse::do_all_blocks() src/hotspot/share/opto/parse1.cpp:724 > #5 0x7facfe9f71e0 in Parse::Parse(JVMState*, ciMethod*, float) src/hotspot/share/opto/parse1.cpp:628 > #6 0x7facfd2c7925 in ParseGenerator::generate(JVMState*) src/hotspot/share/opto/callGenerator.cpp:97 > #7 0x7facfd5e928f in Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*) src/hotspot/share/opto/compile.cpp:805 > #8 0x7facfd2c4428 in C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*) src/hotspot/share/opto/c2compiler.cpp:141 > #9 0x7facfd5fe1db 
in CompileBroker::invoke_compiler_on_method(CompileTask*) src/hotspot/share/compiler/compileBroker.cpp:2307 > #10 0x7facfd600a06 in CompileBroker::compiler_thread_loop() src/hotspot/share/compiler/compileBroker.cpp:1951 > #11 0x7facfde5a3c4 in JavaThread::thread_main_inner() src/hotspot/share/runtime/javaThread.cpp:773 > #12 0x7facfde5a3c4 in JavaThread::thread_main_inner() src/hotspot/share/runtime/javaThread.cpp:761 > #13 0x7facfeefc786 in Thread::call_run() src/hotspot/share/runtime/thread.cpp:231 > #14 0x7facfe9492d4 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:877 > #15 0x7fad023366e9 in start_thread (/lib64/libpthread.so.0+0xa6e9) (BuildId: 7bac999c115902a23312484de77ffcce60812e74) > #16 0x7fad0246f53e in clone (/lib64/libc.so.6+0x11853e) (BuildId: d9396455d6e682402e73ddda7af317d3c69e317c) > > > Seems the issue is rather new and came in with [JDK-8351927](https://bugs.openjdk.org/browse/JDK-8351927) recently. Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24623#pullrequestreview-2767024357 From mbaesken at openjdk.org Tue Apr 15 07:08:50 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 15 Apr 2025 07:08:50 GMT Subject: RFR: JDK-8354507: subnode.cpp:406:36: runtime error: negation of -9223372036854775808 cannot be represented in type 'long int'; cast to an unsigned type to negate this value to itself In-Reply-To: References: Message-ID: <_4CtIxE-PAHWhJmp4U-_aetnAtghTgn5va_VUfC31zQ=.77c40739-1448-44db-909b-da9879b977b3@github.com> On Mon, 14 Apr 2025 13:43:22 GMT, Matthias Baesken wrote: > When running with ubsan enabled binaries (e.g. on Linux x86_64 or macOS aarch) and e.g. 
executing test > java/lang/Thread/virtual/CancelTimerWithContention > > we run into this issue : > > > /priv/jenkins/client-home/workspace/openjdk-jdk-weekly-linux_x86_64-opt/jdk/src/hotspot/share/opto/subnode.cpp:406:36: runtime error: negation of -9223372036854775808 cannot be represented in type 'long int'; cast to an unsigned type to negate this value to itself > #0 0x7facfee00605 in SubLNode::Ideal(PhaseGVN*, bool) src/hotspot/share/opto/subnode.cpp:406 > #1 0x7facfea644bc in PhaseGVN::transform(Node*) src/hotspot/share/opto/phaseX.cpp:681 > #2 0x7facfea1e4b9 in Parse::do_one_bytecode() src/hotspot/share/opto/parse2.cpp:2502 > #3 0x7facfe9f1064 in Parse::do_one_block() src/hotspot/share/opto/parse1.cpp:1586 > #4 0x7facfe9f35d6 in Parse::do_all_blocks() src/hotspot/share/opto/parse1.cpp:724 > #5 0x7facfe9f71e0 in Parse::Parse(JVMState*, ciMethod*, float) src/hotspot/share/opto/parse1.cpp:628 > #6 0x7facfd2c7925 in ParseGenerator::generate(JVMState*) src/hotspot/share/opto/callGenerator.cpp:97 > #7 0x7facfd5e928f in Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*) src/hotspot/share/opto/compile.cpp:805 > #8 0x7facfd2c4428 in C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*) src/hotspot/share/opto/c2compiler.cpp:141 > #9 0x7facfd5fe1db in CompileBroker::invoke_compiler_on_method(CompileTask*) src/hotspot/share/compiler/compileBroker.cpp:2307 > #10 0x7facfd600a06 in CompileBroker::compiler_thread_loop() src/hotspot/share/compiler/compileBroker.cpp:1951 > #11 0x7facfde5a3c4 in JavaThread::thread_main_inner() src/hotspot/share/runtime/javaThread.cpp:773 > #12 0x7facfde5a3c4 in JavaThread::thread_main_inner() src/hotspot/share/runtime/javaThread.cpp:761 > #13 0x7facfeefc786 in Thread::call_run() src/hotspot/share/runtime/thread.cpp:231 > #14 0x7facfe9492d4 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:877 > #15 0x7fad023366e9 in start_thread (/lib64/libpthread.so.0+0xa6e9) (BuildId: 
7bac999c115902a23312484de77ffcce60812e74) > #16 0x7fad0246f53e in clone (/lib64/libc.so.6+0x11853e) (BuildId: d9396455d6e682402e73ddda7af317d3c69e317c) > > > Seems the issue is rather new and came in with [JDK-8351927](https://bugs.openjdk.org/browse/JDK-8351927) recently. Thanks for the reviews ! The title is just a short version of the compiler message so it tells us what the compiler tells. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24623#issuecomment-2804041545 From fyang at openjdk.org Tue Apr 15 07:45:54 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 15 Apr 2025 07:45:54 GMT Subject: RFR: 8329887: RISC-V: C2: Support Zvbb Vector And-Not instruction [v9] In-Reply-To: <7u0hDh5_0JFOTDO5LxwUZdpebykF-sR6DQ4rcMpCrV0=.d264a91e-4551-42b9-a74a-0ab65cbd2003@github.com> References: <1KHNbMIgOO7jSZ1Fm4HzxadYaNzE4Xbq4nTitlKy3Po=.17d7860b-10de-4f19-87d8-87fc17313ce2@github.com> <7u0hDh5_0JFOTDO5LxwUZdpebykF-sR6DQ4rcMpCrV0=.d264a91e-4551-42b9-a74a-0ab65cbd2003@github.com> Message-ID: On Mon, 14 Apr 2025 12:57:00 GMT, Anjian-Wen wrote: >> support Zvbb Vector And-Not vandn.vv (with and without masked) match rule and add new test in jtreg >> >> This patch add new test in >> test/hotspot/jtreg/compiler/vectorapi/AllBitsSetVectorMatchRuleTest.java >> >> Test Passed >> test/hotspot/jtreg/compiler/vectorapi/* >> in platform: >> aarch64 with sve >> aarch64 without sve >> riscv64 qemu with zvbb > > Anjian-Wen has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 11 additional commits since the last revision: > > - Merge branch 'openjdk:master' into JDK-8329887 > - modify code and test format > - fix test bug > - add aarch64 judgement for the test > - fix zvbb mask match rule > - Merge branch 'openjdk:master' into JDK-8329887 > - add vand_not_masked test > - add vand_not_L test > - Merge branch 'openjdk:master' into JDK-8329887 > - RISC-V: C2: Support Zvbb Vector And-Not instruction > > fix match rule for format > - ... and 1 more: https://git.openjdk.org/jdk/compare/f81bc4be...66c886c6 Updated change looks good. Thanks! ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24129#pullrequestreview-2767190895 From shade at openjdk.org Tue Apr 15 08:36:34 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 15 Apr 2025 08:36:34 GMT Subject: RFR: 8354542: Clean up x86 stubs after 32-bit x86 removal Message-ID: This cleanup target various x86-specific stubs and related generators. 
Additional testing: - [x] Linux x86_64 server fastdebug, `all` ------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/24633/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24633&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8354542 Stats: 176 lines in 6 files changed: 0 ins; 67 del; 109 mod Patch: https://git.openjdk.org/jdk/pull/24633.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24633/head:pull/24633 PR: https://git.openjdk.org/jdk/pull/24633 From dlunden at openjdk.org Tue Apr 15 08:52:44 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Tue, 15 Apr 2025 08:52:44 GMT Subject: RFR: 8351833: Unexpected increase in live nodes when splitting Phis through MergeMems in PhiNode::Ideal [v3] In-Reply-To: References: Message-ID: <8b9fZCxeR9Qk_lW7F0bNfL-qKysxmEX0JF3wcO5NYPs=.21f97cdc-5617-4f0e-beba-b6c1d1a6f91e@github.com> On Mon, 14 Apr 2025 09:08:04 GMT, Roberto Castañeda Lozano wrote: >> Yes, for sure interesting. Let us create a separate RFE to investigate. > > I had a closer look at the example and found that the generation of large amounts of cold code is due to two factors: > 1. (excessive?) use of `@ForceInline` in [StringConcatHelper](https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/StringConcatHelper.java) and > 2. generation of fast/path idioms for constructs within the cold inlined code (allocations in particular). > > For 1. we should evaluate whether forced inlining in `StringConcatHelper` is warranted (in terms of cost/benefits). For 2., I filed [JDK-8354509](https://bugs.openjdk.org/browse/JDK-8354509). Thanks for filing [JDK-8354509](https://bugs.openjdk.org/browse/JDK-8354509). Should we file an RFE for 1. as well?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24325#discussion_r2044013312 From dlunden at openjdk.org Tue Apr 15 09:00:48 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Tue, 15 Apr 2025 09:00:48 GMT Subject: RFR: 8351833: Unexpected increase in live nodes when splitting Phis through MergeMems in PhiNode::Ideal [v3] In-Reply-To: <8cQweFEP9gLJuwgDDwUsAqS2GYg0bGnuCL39kQETrPQ=.7c315669-835d-4aa7-b946-5c417230814d@github.com> References: <8cQweFEP9gLJuwgDDwUsAqS2GYg0bGnuCL39kQETrPQ=.7c315669-835d-4aa7-b946-5c417230814d@github.com> Message-ID: On Thu, 10 Apr 2025 12:10:01 GMT, Daniel Lundén wrote: >> After the changes for [JDK-8333393](https://bugs.openjdk.org/browse/JDK-8333393), we apply a Phi idealization, involving splitting Phis through MergeMems, a lot more frequently. This idealization internally applies further idealizations for new Phi nodes generated during the idealization. In certain cases, these internal idealizations result in a large increase of live nodes within a single iteration of the main IGVN loop in `PhaseIterGVN::optimize`. In particular, when we are close to the `MaxNodeLimit` (80 000 by default), it can happen that we go from below `MaxNodeLimit - NodeLimitFudgeFactor * 2` (= 76 000 by default) to more than 80 000 nodes in a single iteration. In such cases, the node count bailout at the top of the `PhaseIterGVN::optimize` loop does not trigger as expected and we instead crash at an assert in node creation as we surpass `MaxNodeLimit` nodes. >> >> ### Changeset >> >> Changes: >> - Do not immediately transform new Phi nodes after splitting Phis through MergeMems. The Phi nodes are put on the IGVN worklist and are transformed later on in any case. >> - Add an assert in the `PhaseIterGVN::optimize` loop that ensures we never increase the live node count with more than `NodeLimitFudgeFactor * 2` in a single loop iteration.
This assert allows us to catch the issue earlier and much more frequently during IGVN. >> - Add a new regression test `TestSplitPhiThroughMergeMem.java`. The new assert above triggers the issue in a large number of existing tests already, but I added this new test as well for good measure. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/14124983489) >> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. >> - Performance testing >> - DaCapo 23, Renaissance, SPECjbb 2005, and SPECjvm 2008 on Windows x64, Linux x64, macOS x64, and macOS aarch64. No statistically significant improvements nor regressions. >> - Compilation time benchmarking for DaCapo 23. No statistically significant improvements nor regressions. >> >> ### Additional issue investigation >> >> For the particular failure reported as part of this issue, the additional Phi idealizations after [JDK-8333393](https://bugs.openjdk.org/browse/JDK-8333393) cause a dramatic local increase in the number of nodes during IGVN compared to before. Therefore, it is justified to further investigate if this increase in live nodes is, in general, an issue in its... > > Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: > > Apply suggestions from code review > > Co-authored-by: Christian Hagedorn Thank you for the reviews!
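A minimal sketch of the per-iteration check described in the changeset quoted above, written as standalone C++ rather than actual HotSpot code. The names `kMaxNodeLimit`, `kNodeLimitFudgeFactor`, `WorkItem`, `Outcome`, and `optimize` are hypothetical stand-ins; only the default numbers (80 000 and 76 000) come from the description, so a fudge factor of 2 000 and the exact comparison used for the bailout are assumptions.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical stand-ins for the flags named in the description. The values
// are chosen so that kMaxNodeLimit - 2 * kNodeLimitFudgeFactor == 76 000.
constexpr std::size_t kMaxNodeLimit = 80000;
constexpr std::size_t kNodeLimitFudgeFactor = 2000;

enum class Outcome { Finished, BailedOut, BudgetExceeded };

// Simplification: a work item only reports how many live nodes it added.
using WorkItem = std::function<std::size_t()>;

Outcome optimize(std::deque<WorkItem>& worklist, std::size_t& live_nodes) {
  while (!worklist.empty()) {
    // Bailout at the top of the loop: stop once we are within the safety
    // margin of the node limit (assumed comparison).
    if (live_nodes >= kMaxNodeLimit - 2 * kNodeLimitFudgeFactor) {
      return Outcome::BailedOut;
    }
    const std::size_t before = live_nodes;
    WorkItem item = std::move(worklist.front());
    worklist.pop_front();
    live_nodes += item();
    // The added check: one iteration must never grow the graph by more than
    // the margin the bailout relies on. The description uses an assert here;
    // this sketch returns a status so the condition can be observed.
    if (live_nodes - before > 2 * kNodeLimitFudgeFactor) {
      return Outcome::BudgetExceeded;
    }
  }
  return Outcome::Finished;
}
```

With these assumed numbers, an iteration that starts at 75 000 live nodes and adds 6 000 passes the bailout check yet ends above 80 000, which is the situation the new check is meant to flag early.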
------------- PR Comment: https://git.openjdk.org/jdk/pull/24325#issuecomment-2804333668 From dlunden at openjdk.org Tue Apr 15 09:00:49 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Tue, 15 Apr 2025 09:00:49 GMT Subject: Integrated: 8351833: Unexpected increase in live nodes when splitting Phis through MergeMems in PhiNode::Ideal In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 12:25:48 GMT, Daniel Lundén wrote: > After the changes for [JDK-8333393](https://bugs.openjdk.org/browse/JDK-8333393), we apply a Phi idealization, involving splitting Phis through MergeMems, a lot more frequently. This idealization internally applies further idealizations for new Phi nodes generated during the idealization. In certain cases, these internal idealizations result in a large increase of live nodes within a single iteration of the main IGVN loop in `PhaseIterGVN::optimize`. In particular, when we are close to the `MaxNodeLimit` (80 000 by default), it can happen that we go from below `MaxNodeLimit - NodeLimitFudgeFactor * 2` (= 76 000 by default) to more than 80 000 nodes in a single iteration. In such cases, the node count bailout at the top of the `PhaseIterGVN::optimize` loop does not trigger as expected and we instead crash at an assert in node creation as we surpass `MaxNodeLimit` nodes. > > ### Changeset > > Changes: > - Do not immediately transform new Phi nodes after splitting Phis through MergeMems. The Phi nodes are put on the IGVN worklist and are transformed later on in any case. > - Add an assert in the `PhaseIterGVN::optimize` loop that ensures we never increase the live node count with more than `NodeLimitFudgeFactor * 2` in a single loop iteration. This assert allows us to catch the issue earlier and much more frequently during IGVN. > - Add a new regression test `TestSplitPhiThroughMergeMem.java`.
The new assert above triggers the issue in a large number of existing tests already, but I added this new test as well for good measure. > > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/14124983489) > - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. > - Performance testing > - DaCapo 23, Renaissance, SPECjbb 2005, and SPECjvm 2008 on Windows x64, Linux x64, macOS x64, and macOS aarch64. No statistically significant improvements nor regressions. > - Compilation time benchmarking for DaCapo 23. No statistically significant improvements nor regressions. > > ### Additional issue investigation > > For the particular failure reported as part of this issue, the additional Phi idealizations after [JDK-8333393](https://bugs.openjdk.org/browse/JDK-8333393) cause a dramatic local increase in the number of nodes during IGVN compared to before. Therefore, it is justified to further investigate if this increase in live nodes is, in general, an issue in itself. In the below, I consider and refer ... This pull request has now been integrated. Changeset: 24be888d Author: Daniel Lundén URL: https://git.openjdk.org/jdk/commit/24be888d655a5227cfb9fc22f36d6ba30d732b8d Stats: 105 lines in 4 files changed: 87 ins; 10 del; 8 mod 8351833: Unexpected increase in live nodes when splitting Phis through MergeMems in PhiNode::Ideal Reviewed-by: chagedorn, rcastanedalo, kvn ------------- PR: https://git.openjdk.org/jdk/pull/24325 From mbaesken at openjdk.org Tue Apr 15 09:06:00 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 15 Apr 2025 09:06:00 GMT Subject: Integrated: 8354507: [ubsan] subnode.cpp:406:36: runtime error: negation of -9223372036854775808 cannot be represented in type 'long int' In-Reply-To: References: Message-ID: On Mon, 14 Apr 2025 13:43:22 GMT, Matthias Baesken wrote: > When running with ubsan enabled binaries (e.g.
on Linux x86_64 or macOS aarch) and e.g. executing test > java/lang/Thread/virtual/CancelTimerWithContention > > we run into this issue : > > > /priv/jenkins/client-home/workspace/openjdk-jdk-weekly-linux_x86_64-opt/jdk/src/hotspot/share/opto/subnode.cpp:406:36: runtime error: negation of -9223372036854775808 cannot be represented in type 'long int'; cast to an unsigned type to negate this value to itself > #0 0x7facfee00605 in SubLNode::Ideal(PhaseGVN*, bool) src/hotspot/share/opto/subnode.cpp:406 > #1 0x7facfea644bc in PhaseGVN::transform(Node*) src/hotspot/share/opto/phaseX.cpp:681 > #2 0x7facfea1e4b9 in Parse::do_one_bytecode() src/hotspot/share/opto/parse2.cpp:2502 > #3 0x7facfe9f1064 in Parse::do_one_block() src/hotspot/share/opto/parse1.cpp:1586 > #4 0x7facfe9f35d6 in Parse::do_all_blocks() src/hotspot/share/opto/parse1.cpp:724 > #5 0x7facfe9f71e0 in Parse::Parse(JVMState*, ciMethod*, float) src/hotspot/share/opto/parse1.cpp:628 > #6 0x7facfd2c7925 in ParseGenerator::generate(JVMState*) src/hotspot/share/opto/callGenerator.cpp:97 > #7 0x7facfd5e928f in Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*) src/hotspot/share/opto/compile.cpp:805 > #8 0x7facfd2c4428 in C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*) src/hotspot/share/opto/c2compiler.cpp:141 > #9 0x7facfd5fe1db in CompileBroker::invoke_compiler_on_method(CompileTask*) src/hotspot/share/compiler/compileBroker.cpp:2307 > #10 0x7facfd600a06 in CompileBroker::compiler_thread_loop() src/hotspot/share/compiler/compileBroker.cpp:1951 > #11 0x7facfde5a3c4 in JavaThread::thread_main_inner() src/hotspot/share/runtime/javaThread.cpp:773 > #12 0x7facfde5a3c4 in JavaThread::thread_main_inner() src/hotspot/share/runtime/javaThread.cpp:761 > #13 0x7facfeefc786 in Thread::call_run() src/hotspot/share/runtime/thread.cpp:231 > #14 0x7facfe9492d4 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:877 > #15 0x7fad023366e9 in start_thread (/lib64/libpthread.so.0+0xa6e9) 
(BuildId: 7bac999c115902a23312484de77ffcce60812e74) > #16 0x7fad0246f53e in clone (/lib64/libc.so.6+0x11853e) (BuildId: d9396455d6e682402e73ddda7af317d3c69e317c) > > > Seems the issue is rather new and came in with [JDK-8351927](https://bugs.openjdk.org/browse/JDK-8351927) recently. This pull request has now been integrated. Changeset: 81d4c807 Author: Matthias Baesken URL: https://git.openjdk.org/jdk/commit/81d4c80742305b72c73a59cf6a596b49bc68bab9 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8354507: [ubsan] subnode.cpp:406:36: runtime error: negation of -9223372036854775808 cannot be represented in type 'long int' Reviewed-by: mdoerr, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/24623 From xgong at openjdk.org Tue Apr 15 09:32:49 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 15 Apr 2025 09:32:49 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v5] In-Reply-To: References: Message-ID: On Fri, 11 Apr 2025 21:23:52 GMT, Vladimir Ivanov wrote: >> Migrate Vector API math library (SVML and SLEEF) linkage from native code (in JVM) to Java FFM API. >> >> Since FFM API doesn't support vector calling conventions yet, migration affects only symbol lookup for now. But it still enables significant simplifications on JVM side. >> >> The patch consists of the following parts: >> * on-demand symbol lookup in Java code replaces eager lookup from native code during JVM startup; >> * 2 new VM intrinsics for vector calls (support unary and binary shapes) (code separated from unary/binary vector operations); >> * new internal interface to query supported CPU ISA extensions (`jdk.incubator.vector.CPUFeatures`) used for CPU dispatching. >> >> `java.lang.foreign` API is used to perform symbol lookup in vector math library, then the address is cached and fed into corresponding JVM intrinsic, so C2 can turn it into a direct vector call in generated code. 
>> >> Once `java.lang.foreign` supports vectors & vector calling conventions, VM intrinsics can go away. >> >> Performance is on par with original implementation (tested with microbenchmarks on linux-x64 and macosx-aarch64). >> >> Testing: hs-tier1 - hs-tier6, microbenchmarks (on linux-x64 and macosx-aarch64) >> >> Thanks! > > Vladimir Ivanov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 19 additional commits since the last revision: > > - Merge branch 'master' into vector.math.01.java > - RVV and SVE adjustments > - Merge branch 'master' into vector.math.01.java > - Fix windows-aarch64 build failure > - features_string -> cpu_info_string > - Reviews and Float64Vector-related fix > - Misc fixes and cleanups > - CPU features support > - Cleanup > - TODO list > - ... and 9 more: https://git.openjdk.org/jdk/compare/6e3fd4ab...0ffed12f src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMathLibrary.java line 198: > 196: if (vspecies.vectorBitSize() < 128) { > 197: return false; // 64-bit vectors are not supported > 198: } Thanks for your refactor. It's really a good job! It seems float type support 64-bit vector operations before (see https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp#L9835). Will this change the behavior of 64-bit float vector? Thanks! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2044095931 From adinn at openjdk.org Tue Apr 15 09:56:56 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Tue, 15 Apr 2025 09:56:56 GMT Subject: RFR: 8354542: Clean up x86 stubs after 32-bit x86 removal In-Reply-To: References: Message-ID: <0vtE4_-wvfK5WDMCa_IbXF4pnCn8oZvW6MBQaIJz9W0=.f5813de8-4d2d-4942-9b08-8ec67b3de04a@github.com> On Mon, 14 Apr 2025 18:58:24 GMT, Aleksey Shipilev wrote: > This cleanup target various x86-specific stubs and related generators. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `all` Looks good. ------------- Marked as reviewed by adinn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24633#pullrequestreview-2767628926 From jbhateja at openjdk.org Tue Apr 15 11:28:49 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 15 Apr 2025 11:28:49 GMT Subject: RFR: 8354544: Fix bugs in APX NDD code generation for OpenJDK PR [v2] In-Reply-To: References: Message-ID: On Mon, 14 Apr 2025 23:56:53 GMT, Srinivas Vamsi Parasa wrote: >> This PR fixes the bugs discovered in the APX NDD code generation PR (#23501) that got integrated into OpenJDK. >> >> The following compiler tests uncovered these bugs: >> >> test/hotspot/jtreg/compiler/c2/cr6340864/TestLongVect.java >> test/hotspot/jtreg/compiler/c2/cr7192963/TestLongVect.java >> test/hotspot/jtreg/compiler/vectorization/runner/BasicLongOpTest.java >> test/hotspot/jtreg/compiler/vectorapi/TestMaskedMacroLogicVector.java >> >> After the bug fixes in this PR, using Intel Software Development Emulator (SDE) it was verified that the above tests are working correctly when using Intel APX. > > Srinivas Vamsi Parasa has updated the pull request incrementally with two additional commits since the last revision: > > - Merge branch 'apxfix' of https://github.com/vamsi-parasa/jdk into apxfix > - 8354544: Fix bugs in APX NDD code generation for OpenJDK PR Looks good to me.
Best Regards, Jatin ------------- Marked as reviewed by jbhateja (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24637#pullrequestreview-2767894461 From jbhateja at openjdk.org Tue Apr 15 11:40:40 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 15 Apr 2025 11:40:40 GMT Subject: RFR: 8354471: Assertion failure with -XX:-EnableX86ECoreOpts In-Reply-To: References: Message-ID: On Tue, 15 Apr 2025 03:46:13 GMT, Volodymyr Paprotski wrote: > The check to choose between AVX2 and AVX512 implementation was relying on `EnableX86ECoreOpts`. It should be relying on `supports_avxifma` and mirror the `UseIntPolyIntrinsics` check in `vm_version_x86.cpp`. > > Note, in `stubGenerator_x86_64.cpp`, entry to the patched function is protected already: > > if (UseIntPolyIntrinsics) { > StubRoutines::_intpoly_montgomeryMult_P256 = generate_intpoly_montgomeryMult_P256(); > StubRoutines::_intpoly_assign = generate_intpoly_assign(); > } src/hotspot/cpu/x86/stubGenerator_x86_64_poly_mont.cpp line 567: > 565: __ enter(); > 566: > 567: if (VM_Version::supports_avxifma()) { As per the latest architecture-instruction-set-extensions-programming-reference document, the upcoming Diamond Rapids CPU has AVX-IFMA, but it is also a P-Core Xeon with the 512-bit flavour of IFMA. Do you think it is performant to generate montgomeryMultiplyAVX2 for it? ![image](https://github.com/user-attachments/assets/f1247484-180d-4a65-9b79-745a388c0552) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24644#discussion_r2044327464 From qamai at openjdk.org Tue Apr 15 12:13:45 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 15 Apr 2025 12:13:45 GMT Subject: RFR: 8349139: C2: Div looses dependency on condition that guarantees divisor not null in counted loop [v5] In-Reply-To: References: Message-ID: On Mon, 14 Apr 2025 15:37:36 GMT, Roland Westrelin wrote: >> The test crashes because of a division by zero. The `Div` node for >> that one is initially part of a counted loop.
The control input of the >> node is cleared because the divisor is non zero. This is because the >> divisor depends on the loop phi and the type of the loop phi is >> narrowed down when the counted loop is created. pre/main/post loops >> are created, unrolling happens, the main loop looses its backedge. The >> `Div` node can then float above the zero trip guard for the main >> loop. When the zero trip guard is not taken, there's no guarantee the >> divisor is non zero so the `Div` node should be pinned below it. >> >> I propose we revert the change I made with 8334724 which removed >> `PhaseIdealLoop::cast_incr_before_loop()`. The `CastII` that this >> method inserted was there to handle exactly this problem. It was added >> initially for a similar issue but with array loads. That problem with >> loads is handled some other way now and that's why I thought it was >> safe to proceed with the removal. >> >> The code in this patch is somewhat different from the one we had >> before for a couple reasons: >> >> 1- assert predicate code evolved and so previous logic can't be >> resurrected as it was. >> >> 2- the previous logic has a bug. >> >> Regarding 1-: during pre/main/post loop creation, we used to add the >> `CastII` and then to add assertion predicates (so assertion predicates >> depended on the `CastII`). Then when unrolling, when assertion >> predicates are updated, we would skip over the `CastII`. What I >> propose here is to add the `CastII` after assertion predicates are >> added. As a result, they don't depend on the `CastII` and there's no >> need for any extra logic when unrolling happens. This, however, >> doesn't work when the assertion predicates are added by RCE. In that >> case, I had to add logic to skip over the `CastII` (similar to what >> existed before I removed it). 
>> >> Regarding 2-: previous implementation for >> `PhaseIdealLoop::cast_incr_before_loop()` would add the `CastII` at >> the first loop `Phi` it encounters that's a use of the loop increment: >> it's usually the iv but not always. I tweaked the test case to show >> that this bug can actually cause a crash and changed the logic for >> `PhaseIdealLoop::cast_incr_before_loop()` accordingly. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains nine commits: > > - review > - Merge branch 'master' into JDK-8349139 > - merge > - Merge branch 'master' into JDK-8349139 > - other test + review comment > - Merge branch 'master' into JDK-8349139 > - Merge branch 'master' into JDK-8349139 > - Merge branch 'master' into JDK-8349139 > - fix & test Marked as reviewed by qamai (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/23617#pullrequestreview-2768008122 From galder at openjdk.org Tue Apr 15 12:55:55 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Tue, 15 Apr 2025 12:55:55 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs In-Reply-To: References: Message-ID: <3hC_ssvo-oNh-Dq8piegVmasZWUgoxydUBKM-ZkvRgg=.66119a54-6e54-4408-b22e-2d6981067dc5@github.com> On Thu, 10 Apr 2025 15:15:54 GMT, Roland Westrelin wrote: > This is a variant of 8332827. In 8332827, an array access becomes > dependent on a range check `CastII` for another array access. When, > after loop opts are over, that RC `CastII` was removed, the array > access could float and an out of bound access happened. With the fix > for 8332827, RC `CastII`s are no longer removed. > > With this one what happens is that some transformations applied after > loop opts are over widen the type of the RC `CastII`.
As a result, the > type of the RC `CastII` is no longer narrower than that of its input, > the `CastII` is removed and the dependency is lost. > > There are 2 transformations that cause this to happen: > > - after loop opts are over, the type of the `CastII` nodes are widen > so nodes that have the same inputs but a slightly different type can > common. > > - When pushing a `CastII` through an `Add`, if of the type both inputs > of the `Add`s are non constant, then we end up widening the type > (the resulting `Add` has a type that's wider than that of the > initial `CastII`). > > There are already 3 types of `Cast` nodes depending on the > optimizations that are allowed. Either the `Cast` is floating > (`depends_only_test()` returns `true`) or pinned. Either the `Cast` > can be removed if it no longer narrows the type of its input or > not. We already have variants of the `CastII`: > > - if the Cast can float and be removed when it doesn't narrow the type > of its input. > > - if the Cast is pinned and be removed when it doesn't narrow the type > of its input. > > - if the Cast is pinned and can't be removed when it doesn't narrow > the type of its input. > > What we need here, I think, is the 4th combination: > > - if the Cast can float and can't be removed when it doesn't narrow > the type of its input. > > Anyway, things are becoming confusing with all these different > variants named in ways that don't always help figure out what > constraints one of them operate under. So I refactored this and that's > the biggest part of this change. The fix consists in marking `Cast` > nodes when their type is widen in a way that prevents them from being > optimized out. > > Tobias ran performance testing with a slightly different version of > this change and there was no regression. Marked as reviewed by galder (Author). 
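The four combinations listed in the quoted description can be sketched as two independent properties. This is a minimal illustrative model only: `CastKind`, its fields, and `can_eliminate` are hypothetical names, not the types or methods used in HotSpot, and the model ignores everything else C2 considers.

```cpp
#include <cassert>

// Hypothetical model, not HotSpot code: the two independent properties
// that the quoted description combines.
struct CastKind {
  bool floating;                    // may float (not pinned at its control)
  bool removable_if_not_narrowing;  // may be dropped once it no longer narrows its input's type
};

// The three combinations the description says already exist ...
const CastKind kFloatingRemovable    = {true,  true};
const CastKind kPinnedRemovable      = {false, true};
const CastKind kPinnedNonRemovable   = {false, false};
// ... and the fourth one the description asks for.
const CastKind kFloatingNonRemovable = {true,  false};

// In this model a cast can be eliminated only when it has stopped narrowing
// the type of its input AND its kind permits removal.
inline bool can_eliminate(const CastKind& kind, bool narrows_input_type) {
  assert(kind.floating || !kind.floating);  // both placements are valid kinds
  return !narrows_input_type && kind.removable_if_not_narrowing;
}
```

In this model, a cast whose type has been widened until it no longer narrows its input is dropped if it is `kFloatingRemovable` and kept if it is `kFloatingNonRemovable`, so the dependency it carries is preserved while it can still float.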
------------- PR Review: https://git.openjdk.org/jdk/pull/24575#pullrequestreview-2768147644 From fjiang at openjdk.org Tue Apr 15 13:42:45 2025 From: fjiang at openjdk.org (Feilong Jiang) Date: Tue, 15 Apr 2025 13:42:45 GMT Subject: RFR: 8329887: RISC-V: C2: Support Zvbb Vector And-Not instruction [v9] In-Reply-To: <7u0hDh5_0JFOTDO5LxwUZdpebykF-sR6DQ4rcMpCrV0=.d264a91e-4551-42b9-a74a-0ab65cbd2003@github.com> References: <1KHNbMIgOO7jSZ1Fm4HzxadYaNzE4Xbq4nTitlKy3Po=.17d7860b-10de-4f19-87d8-87fc17313ce2@github.com> <7u0hDh5_0JFOTDO5LxwUZdpebykF-sR6DQ4rcMpCrV0=.d264a91e-4551-42b9-a74a-0ab65cbd2003@github.com> Message-ID: On Mon, 14 Apr 2025 12:57:00 GMT, Anjian-Wen wrote: >> support Zvbb Vector And-Not vandn.vv (with and without masked) match rule and add new test in jtreg >> >> This patch add new test in >> test/hotspot/jtreg/compiler/vectorapi/AllBitsSetVectorMatchRuleTest.java >> >> Test Passed >> test/hotspot/jtreg/compiler/vectorapi/* >> in platform: >> aarch64 with sve >> aarch64 without sve >> riscv64 qemu with zvbb > > Anjian-Wen has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: > > - Merge branch 'openjdk:master' into JDK-8329887 > - modify code and test format > - fix test bug > - add aarch64 judgement for the test > - fix zvbb mask match rule > - Merge branch 'openjdk:master' into JDK-8329887 > - add vand_not_masked test > - add vand_not_L test > - Merge branch 'openjdk:master' into JDK-8329887 > - RISC-V: C2: Support Zvbb Vector And-Not instruction > > fix match rule for format > - ... and 1 more: https://git.openjdk.org/jdk/compare/3f4aeaa5...66c886c6 Thanks! ------------- Marked as reviewed by fjiang (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/24129#pullrequestreview-2768322643 From jbhateja at openjdk.org Tue Apr 15 13:57:38 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 15 Apr 2025 13:57:38 GMT Subject: RFR: 8354668: Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding Message-ID: ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding. While most of the relocation records the patching offsets from the end of the instruction, SHL instruction, which is used for pointer coloring, computes the patching offset from the starting address of the instruction. Thus, in case the destination register operand of SHL instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte resulting into ILLEGAL instruction exception. This patch fixes reported failures by computing the relocation offset of SHL instruction from end of instruction, thereby making the patch offset agnostic to REX/REX2 prefix. Please review and share your feedback. Best Regards, Jatin PS: Validation were performed using latest Intel Software Development Emulator after modifying static register allocation order in x86_64.ad file giving preference to EGPRs. 
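The difference between start-relative and end-relative patch offsets can be sketched with a toy byte-vector model. This is only an illustration under stated assumptions: the helper names are hypothetical, the instruction is modelled as prefix bytes followed by opcode, ModRM and a trailing one-byte immediate, and the byte values are placeholders rather than verified encodings. It is not the ZGC relocation code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of an encoded `shl reg, imm8`.
using Encoding = std::vector<uint8_t>;

// Builds the illustrative encoding. `prefix` holds one byte for a REX-style
// prefix or two bytes for a REX2-style prefix (placeholder values).
inline Encoding encode_shl(const Encoding& prefix, uint8_t modrm, uint8_t imm8) {
  Encoding insn(prefix);
  insn.push_back(0xC1);   // opcode byte
  insn.push_back(modrm);  // register operand
  insn.push_back(imm8);   // the immediate is always the last byte
  return insn;
}

// Start-relative patching: only correct when the prefix has the length that
// `offset_from_start` was derived for.
inline void patch_from_start(Encoding& insn, std::size_t offset_from_start, uint8_t value) {
  assert(offset_from_start < insn.size());
  insn[offset_from_start] = value;
}

// End-relative patching: independent of how many prefix bytes come first.
inline void patch_from_end(Encoding& insn, std::size_t offset_from_end, uint8_t value) {
  assert(offset_from_end >= 1 && offset_from_end <= insn.size());
  insn[insn.size() - offset_from_end] = value;
}
```

With a one-byte prefix the immediate sits at start offset 3. With a two-byte prefix the same start offset lands on the ModRM byte in this model, while `patch_from_end(insn, 1, value)` reaches the immediate in both cases.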
------------- Commit messages: - 8354668: Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding Changes: https://git.openjdk.org/jdk/pull/24664/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24664&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8354668 Stats: 16 lines in 4 files changed: 5 ins; 5 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/24664.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24664/head:pull/24664 PR: https://git.openjdk.org/jdk/pull/24664 From avoitylov at openjdk.org Tue Apr 15 14:20:47 2025 From: avoitylov at openjdk.org (Aleksei Voitylov) Date: Tue, 15 Apr 2025 14:20:47 GMT Subject: RFR: 8353237: [AArch64] Incorrect result of VectorizedHashCode intrinsic on Cortex-A53 In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 15:46:41 GMT, Andrew Haley wrote: >> Maybe, but I'm not sure this is the right thing to enforce in the build system across the board either. For example, a cloud vendor with a big Arm fleet without A53s may find such a setting undesirable. > >> Maybe, but I'm not sure this is the right thing to enforce in the build system across the board either. For example, a cloud vendor with a big Arm fleet without A53s may find such a setting undesirable. > > I don't think there is any possibility that a big cloud vendor would notice. From what I see of GCC-generated code, added NOPs are very rare, because GCC tends to schedule loads long before uses. It's likely that there would be no difference to any GCC-generated code. @theRealAph I was under the impression that building with, say, `-mcpu=cortex-a75 -mfix-cortex-a53-835769` would fail, but surprisingly (to me), it does not. 
After checking Alma Linux 9, Red Hat 8, Debian 12, Ubuntu 24, Fedora 41 and our Alpaquita at random, I found that it's on by default in all of these distros: $ gcc -Q --help=target | grep -i fix-cortex-a53-835769 -mfix-cortex-a53-835769 [enabled] I find no compelling reason to enforce this erratum workaround in the OpenJDK native build system. Those who disable it probably know what they are doing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24489#issuecomment-2805347492 From smonteith at openjdk.org Tue Apr 15 14:25:59 2025 From: smonteith at openjdk.org (Stuart Monteith) Date: Tue, 15 Apr 2025 14:25:59 GMT Subject: RFR: 8353237: [AArch64] Incorrect result of VectorizedHashCode intrinsic on Cortex-A53 In-Reply-To: References: Message-ID: On Mon, 7 Apr 2025 12:39:40 GMT, Aleksei Voitylov wrote: > The root of the problem is that the VectorizedHashCode intrinsic introduced by JDK-8341194 is not aware of JDK-8079203. JDK-8079203 generates an additional nop with the madd instruction on Cortex-A53 as a workaround for Cortex-A53 erratum 835769 "AArch64 multiply-accumulate instruction might produce incorrect result". The current VectorizedHashCode intrinsic calculates a byte offset to jump inside the unrolled loop code. It assumes 2 instructions per unrolled iteration (load and madd). JDK-8079203 adds an additional nop for Cortex-A53, which breaks the offset calculation logic. > > The current offset calculation logic uses a shift instead of a multiplication, which relies on a power-of-2 number of instructions in each unrolled loop iteration. To keep it simple, this fix adds one more nop to each loop iteration on Cortex-A53 in order to have 4 instructions per iteration, which is also a power of 2. To account for that, the shift argument for the offset calculation logic is increased by 1, because each loop iteration has 2 times more instructions on Cortex-A53. >
> This fix is tested on Raspberry Pi 3 (based on Cortex-A53) by running initially reported application and by running hotspot jtreg tests (not a single test could be run on Cortex-A53 before the fix). After the fix, the specialized test hotspot/jtreg/compiler/intrinsics/TestArraysHashCode.java passes. > > The performance gain from the intrinsic is also observed on Cortex-A53 using the ArraysHashCode benchmark. You can check on the Arm website for specific CPU errata, but I will find out about any serious errata when they are discovered. The category of errata found on the A53 don't really occur or are mitigated with different means, so later generations haven't required mitigations in software. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24489#issuecomment-2805397492 From mchevalier at openjdk.org Tue Apr 15 14:41:20 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Tue, 15 Apr 2025 14:41:20 GMT Subject: RFR: 8344251: C2: remove blackholes with dead control input Message-ID: When a BlackholeNode's control input becomes dead, the node is not removed causing the crash assert(!in->is_CFG()) failed: CFG Node with no controlling input? In the case reported in the issue, after a round of peeling, a condition becomes constant, and the branch containing the blackhole becomes dead: I simply use `Node::remove_dead_region(PhaseGVN*, bool)` to remove the blackhole, as many other node types do. 
------------- Commit messages: - Format - Add test - remove_dead_region in Blackhole::Ideal Changes: https://git.openjdk.org/jdk/pull/24663/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24663&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8344251 Stats: 83 lines in 4 files changed: 83 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24663.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24663/head:pull/24663 PR: https://git.openjdk.org/jdk/pull/24663 From aboldtch at openjdk.org Tue Apr 15 14:52:58 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 15 Apr 2025 14:52:58 GMT Subject: RFR: 8354668: Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding In-Reply-To: References: Message-ID: On Tue, 15 Apr 2025 13:50:40 GMT, Jatin Bhateja wrote: > ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding. While most of the relocation records the patching offsets from the end of the instruction, SHL instruction, which is used for pointer coloring, computes the patching offset from the starting address of the instruction. > > Thus, in case the destination register operand of SHL instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte resulting into ILLEGAL instruction exception. > > This patch fixes reported failures by computing the relocation offset of SHL instruction from end of instruction, thereby making the patch offset agnostic to REX/REX2 prefix. > > Please review and share your feedback. > > Best Regards, > Jatin > > PS: Validation were performed using latest Intel Software Development Emulator after modifying static register allocation order in x86_64.ad file giving preference to EGPRs. Looks good but need to communicate with JVMCI implementors. 
Also pre-existing, but maybe `ZBarrierRelocationFormatLoadGoodAfterShl` should be called `ZBarrierRelocationFormatLoadGoodAfterShX` as we use it for both shr and shl. src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.hpp line 52: > 50: #endif // COMPILER2 > 51: > 52: const int ZBarrierRelocationFormatLoadGoodAfterShl = 0; Suggestion: const int ZBarrierRelocationFormatLoadGoodAfterShl = 0; src/hotspot/cpu/x86/jvmciCodeInstaller_x86.cpp line 223: > 221: return true; > 222: #if INCLUDE_ZGC > 223: case Z_BARRIER_RELOCATION_FORMAT_LOAD_GOOD_BEFORE_SHL: Should probably communicate with the JVMCI / Graal @dougxc so we can both update this exported symbol name to reflect the new behaviour, and give them the opportunity to adapt to the new relocation patching. ------------- Changes requested by aboldtch (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24664#pullrequestreview-2768666320 PR Review Comment: https://git.openjdk.org/jdk/pull/24664#discussion_r2044778342 PR Review Comment: https://git.openjdk.org/jdk/pull/24664#discussion_r2044814373 From avoitylov at openjdk.org Tue Apr 15 14:52:57 2025 From: avoitylov at openjdk.org (Aleksei Voitylov) Date: Tue, 15 Apr 2025 14:52:57 GMT Subject: RFR: 8353237: [AArch64] Incorrect result of VectorizedHashCode intrinsic on Cortex-A53 In-Reply-To: References: Message-ID: On Tue, 15 Apr 2025 14:23:08 GMT, Stuart Monteith wrote: >> The root of the problem is that the VectorizedHashCode intrinsic introduced by JDK-8341194 is not aware of JDK-8079203. JDK-8079203 generates an additional nop with the madd instruction on Cortex-A53 as a workaround for Cortex-A53 erratum 835769 "AArch64 multiply-accumulate instruction might produce incorrect result". The current VectorizedHashCode intrinsic calculates a byte offset to jump inside the unrolled loop code. It assumes 2 instructions per unrolled iteration (load and madd). JDK-8079203 adds an additional nop for Cortex-A53, which breaks the offset calculation logic. >>
>> The current offset calculation logic uses a shift instead of a multiplication, which relies on a power-of-2 number of instructions in each unrolled loop iteration. To keep it simple, this fix adds one more nop to each loop iteration on Cortex-A53 in order to have 4 instructions per iteration, which is also a power of 2. To account for that, the shift argument for the offset calculation logic is increased by 1, because each loop iteration has 2 times more instructions on Cortex-A53. >> >> This fix is tested on Raspberry Pi 3 (based on Cortex-A53) by running initially reported application and by running hotspot jtreg tests (not a single test could be run on Cortex-A53 before the fix). After the fix, the specialized test hotspot/jtreg/compiler/intrinsics/TestArraysHashCode.java passes. >> >> The performance gain from the intrinsic is also observed on Cortex-A53 using the ArraysHashCode benchmark. > > You can check on the Arm website for specific CPU errata, but I will find out about any serious errata when they are discovered. The category of errata found on the A53 don't really occur or are mitigated with different means, so later generations haven't required mitigations in software. @stooart-mon I did, and I must admit it was a painful experience. The conclusion I came to is that we need an Arm errata expert to look into OpenJDK codegen. I see that Arm carefully maintains the GCC code base for that purpose, and its engineers are probably much better experts in this area. Patching there happens only after checking whether the conditions for the given erratum are met, which is much narrower than what was done as part of, say, JDK-8079203, and much less clumsy. Specific CPU registers can be employed to check the need for patching. Developing good analogues to `-mfix-cortex-a57-aes-1742098, -mfix-cortex-a72-aes-1655431, -mfix-cortex-a53-835769, -mfix-cortex-a53-843419` would probably be a good start, but again, the full extent is much larger [1].
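The offset arithmetic from the quoted PR description can be sketched as follows. This is a minimal sketch assuming 4-byte AArch64 instructions; `shifted_offset` and `exact_offset` are hypothetical helper names, not functions from the HotSpot sources, and the sketch models only the size arithmetic, not the intrinsic's actual jump computation.

```cpp
#include <cassert>
#include <cstdint>

// AArch64 instructions are 4 bytes wide, i.e. 1 << 2.
const unsigned kInsnBytesLog2 = 2;

// Hypothetical helper: byte size of `iterations` unrolled iterations when the
// size is computed with a shift. `insns_per_iter_log2` is 1 for two
// instructions per iteration and 2 for four.
inline uint64_t shifted_offset(uint64_t iterations, unsigned insns_per_iter_log2) {
  assert(insns_per_iter_log2 + kInsnBytesLog2 < 64);  // keep the shift well-defined
  return iterations << (insns_per_iter_log2 + kInsnBytesLog2);
}

// Reference value computed with a multiplication; valid for any
// instruction count, including non-powers of 2.
inline uint64_t exact_offset(uint64_t iterations, unsigned insns_per_iter) {
  return iterations * insns_per_iter * (uint64_t(1) << kInsnBytesLog2);
}
```

With two instructions per iteration, three iterations span 3 << 3 = 24 bytes. One extra nop makes the real size 3 * 3 * 4 = 36 bytes, which no single shift of the iteration count can produce. Padding to four instructions restores the match at 3 << 4 = 48 bytes, which corresponds to increasing the shift argument by 1.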
[1] Some Arm errata: A53: [https://developer.arm.com/documentation/epm048406/2100/?lang=en](https://documentation-service.arm.com/static/5fa29fddb209f547eebd361d?token=) A55: https://developer.arm.com/documentation/SDEN859338/1500/?lang=en A72: https://developer.arm.com/documentation/epm012079/11/?lang=en A75: https://developer.arm.com/documentation/SDEN859515/i/?lang=en A76: https://developer.arm.com/documentation/SDEN-885749/3200/?lang=en N1: https://developer.arm.com/documentation/SDEN885747/latest/ V1: https://developer.arm.com/documentation/SDEN1401781/latest/ V2: https://developer.arm.com/documentation/SDEN2332927/latest/ N2: https://developer.arm.com/documentation/SDEN1982442/latest/ ------------- PR Comment: https://git.openjdk.org/jdk/pull/24489#issuecomment-2805697524 From shade at openjdk.org Tue Apr 15 15:15:51 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 15 Apr 2025 15:15:51 GMT Subject: RFR: 8344251: C2: remove blackholes with dead control input In-Reply-To: References: Message-ID: <82rcKkqePP7jac3TRbiYX0FrKxli8XNId8EV27KS_-c=.2fd29726-325d-406b-af19-a54817bc2b63@github.com> On Tue, 15 Apr 2025 13:39:52 GMT, Marc Chevalier wrote: > When a BlackholeNode's control input becomes dead, the node is not removed causing the crash > > assert(!in->is_CFG()) failed: CFG Node with no controlling input? > > In the case reported in the issue, after a round of peeling, a condition becomes constant, and the branch containing the blackhole becomes dead: > > > > > I simply use `Node::remove_dead_region(PhaseGVN*, bool)` to remove the blackhole, as many other node types do. Right. I should have done the similar thing from the day 1 :) src/hotspot/share/opto/cfgnode.cpp line 3114: > 3112: Node* BlackholeNode::Ideal(PhaseGVN* phase, bool can_reshape) { > 3113: return remove_dead_region(phase, can_reshape) ? this : nullptr; > 3114: } I think you need a newline after this definition. 
test/hotspot/jtreg/compiler/blackhole/DeadBhElimination.java line 39: > 37: public static void main(String[] args) { > 38: TestFramework.runWithFlags( > 39: "-Xcomp", `-Xcomp` is likely too heavy-weight here. Other tests use `-XX:CompileThreshold=100`. I think that is enough for IR tests to compile the method. ------------- PR Review: https://git.openjdk.org/jdk/pull/24663#pullrequestreview-2768806614 PR Review Comment: https://git.openjdk.org/jdk/pull/24663#discussion_r2044868535 PR Review Comment: https://git.openjdk.org/jdk/pull/24663#discussion_r2044876117 From vpaprotski at openjdk.org Tue Apr 15 15:17:45 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Tue, 15 Apr 2025 15:17:45 GMT Subject: RFR: 8354471: Assertion failure with -XX:-EnableX86ECoreOpts In-Reply-To: References: Message-ID: <7Lrz6RPTVMUzxHIvhQviysTP_W-Ux05QiBnEdyhu9uo=.fe6f0dac-87e1-41e1-bde9-6f28217e3da2@github.com> On Tue, 15 Apr 2025 11:37:43 GMT, Jatin Bhateja wrote: >> The check to choose between AVX2 and AVX512 implementation was relying on `EnableX86ECoreOpts`. It should be relying on `supports_avxifma` and mirror the `UseIntPolyIntrinsics` check in `vm_version_x86.cpp`. >> >> Note, in `stubGenerator_x86_64.cpp`, entry to the patched function is protected already: >> >> if (UseIntPolyIntrinsics) { >> StubRoutines::_intpoly_montgomeryMult_P256 = generate_intpoly_montgomeryMult_P256(); >> StubRoutines::_intpoly_assign = generate_intpoly_assign(); >> } > > src/hotspot/cpu/x86/stubGenerator_x86_64_poly_mont.cpp line 567: > >> 565: __ enter(); >> 566: >> 567: if (VM_Version::supports_avxifma()) { > > As per the latest architecture-instruction-set-extensions-programming-reference document, the upcoming Diamond Rapids CPU has AVX-IFMA, but it's also a P-Core Xeon with the 512-bit flavour of IFMA. Do you think it is performant to generate montgomeryMultiplyAVX2 for it? > > ![image](https://github.com/user-attachments/assets/f1247484-180d-4a65-9b79-745a388c0552) Thanks for spotting this!
I was wondering about this if statement being 'future-proof'.. I think its best to just flip the order of the if to prefer the AVX512 when available (it will make the diff look bigger, which is why I originally avoided it..) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24644#discussion_r2044881182 From vpaprotski at openjdk.org Tue Apr 15 15:48:02 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Tue, 15 Apr 2025 15:48:02 GMT Subject: RFR: 8354471: Assertion failure with -XX:-EnableX86ECoreOpts [v2] In-Reply-To: References: Message-ID: <8lSj6K73YW9ioeKmMALCtag-RjPzkvB9n5ms5wOu3GQ=.ef6436bc-c93a-4279-9674-0c873b8d9d09@github.com> > The check to choose between AVX2 and AVX512 implementation was relying on `EnableX86ECoreOpts`. It should be relying on `supports_avxifma` and mirror the `UseIntPolyIntrinsics` check in `vm_version_x86.cpp`. > > Note, in `stubGenerator_x86_64.cpp`, entry to the patched function is protected already: > > if (UseIntPolyIntrinsics) { > StubRoutines::_intpoly_montgomeryMult_P256 = generate_intpoly_montgomeryMult_P256(); > StubRoutines::_intpoly_assign = generate_intpoly_assign(); > } Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: flip the direction of if to prefer AVX512 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24644/files - new: https://git.openjdk.org/jdk/pull/24644/files/0ad246c6..ee1b099f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24644&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24644&range=00-01 Stats: 19 lines in 1 file changed: 9 ins; 9 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24644.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24644/head:pull/24644 PR: https://git.openjdk.org/jdk/pull/24644 From kvn at openjdk.org Tue Apr 15 16:10:46 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 15 Apr 2025 16:10:46 GMT Subject: RFR: 8354542: Clean up x86 stubs 
after 32-bit x86 removal In-Reply-To: References: Message-ID: <5jcS13OV4hEelw4daLlwCX8Wmd_yBD06DlYJrIJfJ3E=.4befbc7a-971c-4125-b268-7a85a4025531@github.com> On Mon, 14 Apr 2025 18:58:24 GMT, Aleksey Shipilev wrote: > This cleanup target various x86-specific stubs and related generators. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `all` Looks good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24633#pullrequestreview-2769010171 From kvn at openjdk.org Tue Apr 15 16:20:46 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 15 Apr 2025 16:20:46 GMT Subject: RFR: 8344251: C2: remove blackholes with dead control input In-Reply-To: References: Message-ID: On Tue, 15 Apr 2025 13:39:52 GMT, Marc Chevalier wrote: > When a BlackholeNode's control input becomes dead, the node is not removed causing the crash > > assert(!in->is_CFG()) failed: CFG Node with no controlling input? > > In the case reported in the issue, after a round of peeling, a condition becomes constant, and the branch containing the blackhole becomes dead: > > > > > I simply use `Node::remove_dead_region(PhaseGVN*, bool)` to remove the blackhole, as many other node types do. C2 fix code is good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24663#pullrequestreview-2769036860 From kvn at openjdk.org Tue Apr 15 16:20:47 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 15 Apr 2025 16:20:47 GMT Subject: RFR: 8344251: C2: remove blackholes with dead control input In-Reply-To: <82rcKkqePP7jac3TRbiYX0FrKxli8XNId8EV27KS_-c=.2fd29726-325d-406b-af19-a54817bc2b63@github.com> References: <82rcKkqePP7jac3TRbiYX0FrKxli8XNId8EV27KS_-c=.2fd29726-325d-406b-af19-a54817bc2b63@github.com> Message-ID: On Tue, 15 Apr 2025 15:12:39 GMT, Aleksey Shipilev wrote: >> When a BlackholeNode's control input becomes dead, the node is not removed causing the crash >> >> assert(!in->is_CFG()) failed: CFG Node with no controlling input? >> >> In the case reported in the issue, after a round of peeling, a condition becomes constant, and the branch containing the blackhole becomes dead: >> >> >> >> >> I simply use `Node::remove_dead_region(PhaseGVN*, bool)` to remove the blackhole, as many other node types do. > > test/hotspot/jtreg/compiler/blackhole/DeadBhElimination.java line 39: > >> 37: public static void main(String[] args) { >> 38: TestFramework.runWithFlags( >> 39: "-Xcomp", > > `-Xcomp` is likely too heavy-weight here. Other tests use `-XX:CompileThreshold=100`. I think that is enough for IR tests to compile the method. Or you can do separate runs with different flags, including default. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24663#discussion_r2045005337 From shade at openjdk.org Tue Apr 15 16:21:45 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 15 Apr 2025 16:21:45 GMT Subject: RFR: 8354542: Clean up x86 stubs after 32-bit x86 removal In-Reply-To: References: Message-ID: On Mon, 14 Apr 2025 18:58:24 GMT, Aleksey Shipilev wrote: > This cleanup target various x86-specific stubs and related generators. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `all` Thanks! Here goes.
------------- PR Comment: https://git.openjdk.org/jdk/pull/24633#issuecomment-2806712493 From shade at openjdk.org Tue Apr 15 16:21:45 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 15 Apr 2025 16:21:45 GMT Subject: Integrated: 8354542: Clean up x86 stubs after 32-bit x86 removal In-Reply-To: References: Message-ID: On Mon, 14 Apr 2025 18:58:24 GMT, Aleksey Shipilev wrote: > This cleanup target various x86-specific stubs and related generators. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `all` This pull request has now been integrated. Changeset: cec48ed2 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/cec48ed270d3bdf704c389a091b42a32c2ed6440 Stats: 176 lines in 6 files changed: 0 ins; 67 del; 109 mod 8354542: Clean up x86 stubs after 32-bit x86 removal Reviewed-by: adinn, kvn ------------- PR: https://git.openjdk.org/jdk/pull/24633 From mchevalier at openjdk.org Tue Apr 15 16:36:59 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Tue, 15 Apr 2025 16:36:59 GMT Subject: RFR: 8344251: C2: remove blackholes with dead control input In-Reply-To: References: <82rcKkqePP7jac3TRbiYX0FrKxli8XNId8EV27KS_-c=.2fd29726-325d-406b-af19-a54817bc2b63@github.com> Message-ID: On Tue, 15 Apr 2025 16:17:34 GMT, Vladimir Kozlov wrote: >> test/hotspot/jtreg/compiler/blackhole/DeadBhElimination.java line 39: >> >>> 37: public static void main(String[] args) { >>> 38: TestFramework.runWithFlags( >>> 39: "-Xcomp", >> >> `-Xcomp` is likely too heavy-weight here. Other tests use `-XX:CompileThreshold=100`. I think that is enough for IR tests to compile the method. > > Or you can di separate runs with different flags, including default. I've added the `-Xcomp` flag because the dead `if` branch was compiled into an uncommon trap (reason = unstable_if), and I had no `BlackholeNode`. 
Looking at the code emitting the trap, it seems to come from some profiling information, but that the generation of this uncommon trap is explicitly disabled if `-Xcomp` is given. I am not entirely sure why, but `-XX:CompileThreshold=100` does the job to make the compilation of the dead `if` branch happen as well. But if I give no such flag, the test fails for lack of a `BlackholeNode` after parsing (also meaning that the test doesn't exercise the deletion of the `BlackholeNode`). I'm currently testing further with `-XX:CompileThreshold=100` as suggested (it works on my laptop so far), but if you have a more canonical trick to inhibit the uncommon trap in the dead branch, I'd take happily! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24663#discussion_r2045039256 From shade at openjdk.org Tue Apr 15 16:41:45 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 15 Apr 2025 16:41:45 GMT Subject: RFR: 8344251: C2: remove blackholes with dead control input In-Reply-To: References: <82rcKkqePP7jac3TRbiYX0FrKxli8XNId8EV27KS_-c=.2fd29726-325d-406b-af19-a54817bc2b63@github.com> Message-ID: On Tue, 15 Apr 2025 16:34:10 GMT, Marc Chevalier wrote: >> Or you can di separate runs with different flags, including default. > > I've added the `-Xcomp` flag because the dead `if` branch was compiled into an uncommon trap (reason = unstable_if), and I had no `BlackholeNode`. Looking at the code emitting the trap, it seems to come from some profiling information, but that the generation of this uncommon trap is explicitly disabled if `-Xcomp` is given. I am not entirely sure why, but `-XX:CompileThreshold=100` does the job to make the compilation of the dead `if` branch happen as well. But if I give no such flag, the test fails for lack of a `BlackholeNode` after parsing (also meaning that the test doesn't exercise the deletion of the `BlackholeNode`). 
> > I'm currently testing further with `-XX:CompileThreshold=100` as suggested (it works on my laptop so far), but if you have a more canonical trick to inhibit the uncommon trap in the dead branch, I'd take happily! Last time I had this problem, I did this: https://github.com/openjdk/jdk/blob/cec48ed270d3bdf704c389a091b42a32c2ed6440/test/hotspot/jtreg/compiler/c2/irTests/gc/ReferenceClearTests.java#L48-L49 https://github.com/openjdk/jdk/blob/cec48ed270d3bdf704c389a091b42a32c2ed6440/test/hotspot/jtreg/testlibrary/ctw/src/sun/hotspot/tools/ctw/CtwRunner.java#L303-L307 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24663#discussion_r2045045847 From mchevalier at openjdk.org Tue Apr 15 17:32:46 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Tue, 15 Apr 2025 17:32:46 GMT Subject: RFR: 8344251: C2: remove blackholes with dead control input In-Reply-To: References: <82rcKkqePP7jac3TRbiYX0FrKxli8XNId8EV27KS_-c=.2fd29726-325d-406b-af19-a54817bc2b63@github.com> Message-ID: On Tue, 15 Apr 2025 16:38:29 GMT, Aleksey Shipilev wrote: >> I've added the `-Xcomp` flag because the dead `if` branch was compiled into an uncommon trap (reason = unstable_if), and I had no `BlackholeNode`. Looking at the code emitting the trap, it seems to come from some profiling information, but that the generation of this uncommon trap is explicitly disabled if `-Xcomp` is given. I am not entirely sure why, but `-XX:CompileThreshold=100` does the job to make the compilation of the dead `if` branch happen as well. But if I give no such flag, the test fails for lack of a `BlackholeNode` after parsing (also meaning that the test doesn't exercise the deletion of the `BlackholeNode`). >> >> I'm currently testing further with `-XX:CompileThreshold=100` as suggested (it works on my laptop so far), but if you have a more canonical trick to inhibit the uncommon trap in the dead branch, I'd take happily! 
> > Last time I had this problem, I did this: > https://github.com/openjdk/jdk/blob/cec48ed270d3bdf704c389a091b42a32c2ed6440/test/hotspot/jtreg/compiler/c2/irTests/gc/ReferenceClearTests.java#L48-L49 > > https://github.com/openjdk/jdk/blob/cec48ed270d3bdf704c389a091b42a32c2ed6440/test/hotspot/jtreg/testlibrary/ctw/src/sun/hotspot/tools/ctw/CtwRunner.java#L303-L307 That looks exactly like what I'm trying to achieve. I'll re-test with that. Thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24663#discussion_r2045135410 From vlivanov at openjdk.org Tue Apr 15 17:46:46 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 15 Apr 2025 17:46:46 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v5] In-Reply-To: References: Message-ID: On Tue, 15 Apr 2025 09:29:32 GMT, Xiaohong Gong wrote: >> Vladimir Ivanov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 19 additional commits since the last revision: >> >> - Merge branch 'master' into vector.math.01.java >> - RVV and SVE adjustments >> - Merge branch 'master' into vector.math.01.java >> - Fix windows-aarch64 build failure >> - features_string -> cpu_info_string >> - Reviews and Float64Vector-related fix >> - Misc fixes and cleanups >> - CPU features support >> - Cleanup >> - TODO list >> - ... and 9 more: https://git.openjdk.org/jdk/compare/3a706abc...0ffed12f > > src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMathLibrary.java line 198: > >> 196: if (vspecies.vectorBitSize() < 128) { >> 197: return false; // 64-bit vectors are not supported >> 198: } > > Thanks for your refactor. It's really a good job! > > It seems float type support 64-bit vector operations before (see https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp#L9835). 
Will this change the behavior of 64-bit float vector? Thanks! Thanks for taking a look. In the latest version, Float64Vector cases should be properly handled. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2045156970 From vlivanov at openjdk.org Tue Apr 15 17:53:15 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 15 Apr 2025 17:53:15 GMT Subject: RFR: 8344251: C2: remove blackholes with dead control input In-Reply-To: References: <82rcKkqePP7jac3TRbiYX0FrKxli8XNId8EV27KS_-c=.2fd29726-325d-406b-af19-a54817bc2b63@github.com> Message-ID: On Tue, 15 Apr 2025 17:30:09 GMT, Marc Chevalier wrote: >> Last time I had this problem, I did this: >> https://github.com/openjdk/jdk/blob/cec48ed270d3bdf704c389a091b42a32c2ed6440/test/hotspot/jtreg/compiler/c2/irTests/gc/ReferenceClearTests.java#L48-L49 >> >> https://github.com/openjdk/jdk/blob/cec48ed270d3bdf704c389a091b42a32c2ed6440/test/hotspot/jtreg/testlibrary/ctw/src/sun/hotspot/tools/ctw/CtwRunner.java#L303-L307 > > That looks exactly like what I'm trying to achieve. I'll re-test with that. Thanks! > -Xcomp is likely too heavy-weight here. FTR another alternative is to keep `-Xcomp`, but limit compilation to test methods (`-XX:CompileCommand=compileonly,`). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24663#discussion_r2045166262 From sparasa at openjdk.org Tue Apr 15 21:14:41 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Tue, 15 Apr 2025 21:14:41 GMT Subject: RFR: 8354544: Fix bugs in APX NDD code generation for OpenJDK PR [v2] In-Reply-To: References: Message-ID: On Tue, 15 Apr 2025 06:22:30 GMT, Tobias Hartmann wrote: > That looks good to me. > > I converted the sub-task to a bug. Please use a more descriptive title and add the failure modes to the description of the JBS issue. Thanks! Hi Tobias (@TobiHartmann), Thank you for the suggestions and approval! 
The description for the JBS entry and this PR were updated to show a representative failure mode. Thanks, Vamsi ------------- PR Comment: https://git.openjdk.org/jdk/pull/24637#issuecomment-2807524225 From sparasa at openjdk.org Tue Apr 15 21:14:42 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Tue, 15 Apr 2025 21:14:42 GMT Subject: RFR: 8354544: Fix bugs in APX NDD code generation for OpenJDK PR [v2] In-Reply-To: References: Message-ID: On Tue, 15 Apr 2025 11:26:25 GMT, Jatin Bhateja wrote: > Looks good to me. > > Best Regards, Jatin Thank You Jatin for approving the PR! Regards, Vamsi ------------- PR Comment: https://git.openjdk.org/jdk/pull/24637#issuecomment-2807526550 From duke at openjdk.org Tue Apr 15 21:14:43 2025 From: duke at openjdk.org (duke) Date: Tue, 15 Apr 2025 21:14:43 GMT Subject: RFR: 8354544: Fix bugs in APX NDD code generation for OpenJDK PR [v2] In-Reply-To: References: Message-ID: <8-pddyN6yfWzNY7zlywPiatQeCAd_yWlthcCM0lZqXc=.3e688338-57a1-4359-ac55-fe5f440b571e@github.com> On Mon, 14 Apr 2025 23:56:53 GMT, Srinivas Vamsi Parasa wrote: >> This PR fixes the bugs discovered in the APX NDD code generation PR (#23501) that got integrated into OpenJDK. 
>> >> The following compiler tests uncovered these bugs: >> >> test/hotspot/jtreg/compiler/c2/cr6340864/TestLongVect.java >> test/hotspot/jtreg/compiler/c2/cr7192963/TestLongVect.java >> test/hotspot/jtreg/compiler/vectorization/runner/BasicLongOpTest.java >> test/hotspot/jtreg/compiler/vectorapi/TestMaskedMacroLogicVector.java >> >> After the bug fixes in this PR, **using Intel Software Development Emulator (SDE) it was verified that the above tests are working correctly when using Intel APX.** >> >> Below is a representative error for the TEST: compiler/c2/cr6340864/TestLongVect.java >> STDERR: >> test_xorc: [0] = 9223372032566684971 != -7818520746908513990 >> test_xorc: [1] = 9223372032566684972 != -7818520746908513987 >> test_xorc: [994] = -9223372032566684979 != 7818520746908514012 >> test_xorc: [995] = -9223372032566684978 != 7818520746908514015 >> test_xorc: [996] = -9223372032566684977 != 7818520746908514014 >> test_xorv: [0] = 9223372032566684971 != -7818520746908513990 >> test_xorv: [1] = 9223372032566684972 != -7818520746908513987 >> test_xorv: [994] = -9223372032566684979 != 7818520746908514012 >> test_xorv: [995] = -9223372032566684978 != 7818520746908514015 >> test_xorv: [996] = -9223372032566684977 != 7818520746908514014 >> test_xora: [0] = 9223372032566684971 != -7818520746908513990 >> test_xora: [1] = 9223372032566684972 != -7818520746908513987 >> test_xora: [2] = 9223372032566684973 != -7818520746908513988 >> test_xora: [3] = 9223372032566684974 != -7818520746908513985 >> test_xora: [4] = 9223372032566684975 != -7818520746908513986 >> FAILED: 15 errors >> TEST RESULT: Failed. 
Execution failed: Execution failed > > Srinivas Vamsi Parasa has updated the pull request incrementally with two additional commits since the last revision: > > - Merge branch 'apxfix' of https://github.com/vamsi-parasa/jdk into apxfix > - 8354544: Fix bugs in APX NDD code generation for OpenJDK PR @vamsi-parasa Your change (at version b1a47149acb871bc2e89de9f716830cb422f8bd0) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24637#issuecomment-2807529848 From sparasa at openjdk.org Tue Apr 15 21:27:49 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Tue, 15 Apr 2025 21:27:49 GMT Subject: Integrated: 8354544: Fix bugs in increment and xor APX codegen In-Reply-To: References: Message-ID: On Mon, 14 Apr 2025 23:25:14 GMT, Srinivas Vamsi Parasa wrote: > This PR fixes the bugs discovered in the APX NDD code generation PR (#23501) that got integrated into OpenJDK. > > The following compiler tests uncovered these bugs: > > test/hotspot/jtreg/compiler/c2/cr6340864/TestLongVect.java > test/hotspot/jtreg/compiler/c2/cr7192963/TestLongVect.java > test/hotspot/jtreg/compiler/vectorization/runner/BasicLongOpTest.java > test/hotspot/jtreg/compiler/vectorapi/TestMaskedMacroLogicVector.java > > After the bug fixes in this PR, **using Intel Software Development Emulator (SDE) it was verified that the above tests are working correctly when using Intel APX.** > > Below is a representative error for the TEST: compiler/c2/cr6340864/TestLongVect.java > STDERR: > test_xorc: [0] = 9223372032566684971 != -7818520746908513990 > test_xorc: [1] = 9223372032566684972 != -7818520746908513987 > test_xorc: [994] = -9223372032566684979 != 7818520746908514012 > test_xorc: [995] = -9223372032566684978 != 7818520746908514015 > test_xorc: [996] = -9223372032566684977 != 7818520746908514014 > test_xorv: [0] = 9223372032566684971 != -7818520746908513990 > test_xorv: [1] = 9223372032566684972 != -7818520746908513987 > test_xorv: [994] 
= -9223372032566684979 != 7818520746908514012 > test_xorv: [995] = -9223372032566684978 != 7818520746908514015 > test_xorv: [996] = -9223372032566684977 != 7818520746908514014 > test_xora: [0] = 9223372032566684971 != -7818520746908513990 > test_xora: [1] = 9223372032566684972 != -7818520746908513987 > test_xora: [2] = 9223372032566684973 != -7818520746908513988 > test_xora: [3] = 9223372032566684974 != -7818520746908513985 > test_xora: [4] = 9223372032566684975 != -7818520746908513986 > FAILED: 15 errors > TEST RESULT: Failed. Execution failed: Execution failed This pull request has now been integrated. Changeset: 513c4650 Author: Srinivas Vamsi Parasa Committer: Sandhya Viswanathan URL: https://git.openjdk.org/jdk/commit/513c4650c51aa435f04fb0aaf495134259042118 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod 8354544: Fix bugs in increment and xor APX codegen Reviewed-by: thartmann, jbhateja ------------- PR: https://git.openjdk.org/jdk/pull/24637 From sviswanathan at openjdk.org Tue Apr 15 21:34:42 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 15 Apr 2025 21:34:42 GMT Subject: RFR: 8354471: Assertion failure with -XX:-EnableX86ECoreOpts [v2] In-Reply-To: <8lSj6K73YW9ioeKmMALCtag-RjPzkvB9n5ms5wOu3GQ=.ef6436bc-c93a-4279-9674-0c873b8d9d09@github.com> References: <8lSj6K73YW9ioeKmMALCtag-RjPzkvB9n5ms5wOu3GQ=.ef6436bc-c93a-4279-9674-0c873b8d9d09@github.com> Message-ID: On Tue, 15 Apr 2025 15:48:02 GMT, Volodymyr Paprotski wrote: >> The check to choose between AVX2 and AVX512 implementation was relying on `EnableX86ECoreOpts`. It should be relying on `supports_avxifma` and mirror the `UseIntPolyIntrinsics` check in `vm_version_x86.cpp`. 
>> >> Note, in `stubGenerator_x86_64.cpp`, entry to the patched function is protected already: >> >> if (UseIntPolyIntrinsics) { >> StubRoutines::_intpoly_montgomeryMult_P256 = generate_intpoly_montgomeryMult_P256(); >> StubRoutines::_intpoly_assign = generate_intpoly_assign(); >> } > > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > flip the direction of if to prefer AVX512 Looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24644#pullrequestreview-2769942677 From psandoz at openjdk.org Wed Apr 16 00:30:58 2025 From: psandoz at openjdk.org (Paul Sandoz) Date: Wed, 16 Apr 2025 00:30:58 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v5] In-Reply-To: References: Message-ID: On Fri, 11 Apr 2025 21:23:52 GMT, Vladimir Ivanov wrote: >> Migrate Vector API math library (SVML and SLEEF) linkage from native code (in JVM) to Java FFM API. >> >> Since FFM API doesn't support vector calling conventions yet, migration affects only symbol lookup for now. But it still enables significant simplifications on JVM side. >> >> The patch consists of the following parts: >> * on-demand symbol lookup in Java code replaces eager lookup from native code during JVM startup; >> * 2 new VM intrinsics for vector calls (support unary and binary shapes) (code separated from unary/binary vector operations); >> * new internal interface to query supported CPU ISA extensions (`jdk.incubator.vector.CPUFeatures`) used for CPU dispatching. >> >> `java.lang.foreign` API is used to perform symbol lookup in vector math library, then the address is cached and fed into corresponding JVM intrinsic, so C2 can turn it into a direct vector call in generated code. >> >> Once `java.lang.foreign` supports vectors & vector calling conventions, VM intrinsics can go away. 
>> >> Performance is on par with original implementation (tested with microbenchmarks on linux-x64 and macosx-aarch64). >> >> Testing: hs-tier1 - hs-tier6, microbenchmarks (on linux-x64 and macosx-aarch64) >> >> Thanks! > > Vladimir Ivanov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 19 additional commits since the last revision: > > - Merge branch 'master' into vector.math.01.java > - RVV and SVE adjustments > - Merge branch 'master' into vector.math.01.java > - Fix windows-aarch64 build failure > - features_string -> cpu_info_string > - Reviews and Float64Vector-related fix > - Misc fixes and cleanups > - CPU features support > - Cleanup > - TODO list > - ... and 9 more: https://git.openjdk.org/jdk/compare/cf1ff745...0ffed12f src/hotspot/share/opto/vectorIntrinsics.cpp line 488: > 486: // V binaryOp(long address, Class vClass, Class elementType, int length, > 487: // V v1, V v2, > 488: // BinaryOperation defaultImpl) `debugName` parameter is missing src/hotspot/share/opto/vectorIntrinsics.cpp line 555: > 553: > 554: const char* debug_name = ""; > 555: const TypeInstPtr* debug_name_oop = gvn().type(argument(8))->isa_instptr(); Should that be: const TypeInstPtr* debug_name_oop = gvn().type(argument(6 + arity))->isa_instptr(); ? Placing the `debugName` parameter before the vector parameters makes it easier to reason about IMO. 
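
The `argument(6 + arity)` suggestion above can be checked with a little slot arithmetic. The following is only a sketch, not code from the patch: it assumes that `argument(n)` counts argument slots, that the `long address` occupies two slots, and that `debugName` trails `defaultImpl` (which is what `argument(8)` for the binary shape implies). The class and method names are hypothetical.

```java
public class DebugNameSlot {
    // Hypothetical helper: slot index of a trailing debugName argument,
    // assuming the parameter order quoted in the review comment with
    // debugName appended after defaultImpl.
    static int debugNameSlot(int arity) {
        int slot = 0;
        slot += 2;      // long address (a long takes two slots)
        slot += 1;      // vClass
        slot += 1;      // elementType
        slot += 1;      // length
        slot += arity;  // v1 (unary) or v1, v2 (binary)
        slot += 1;      // defaultImpl
        return slot;    // equals 6 + arity
    }

    public static void main(String[] args) {
        System.out.println("unary:  " + debugNameSlot(1));
        System.out.println("binary: " + debugNameSlot(2));
    }
}
```

Under these assumptions the binary shape gives 8, matching the hard-coded index, while the unary shape gives 7, which is the case the hard-coded `argument(8)` would miss.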
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2045762528 PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2045767847 From xgong at openjdk.org Wed Apr 16 01:40:51 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 16 Apr 2025 01:40:51 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v5] In-Reply-To: References: Message-ID: On Tue, 15 Apr 2025 17:43:52 GMT, Vladimir Ivanov wrote: >> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMathLibrary.java line 198: >> >>> 196: if (vspecies.vectorBitSize() < 128) { >>> 197: return false; // 64-bit vectors are not supported >>> 198: } >> >> Thanks for your refactor. It's really a good job! >> >> It seems float type support 64-bit vector operations before (see https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp#L9835). Will this change the behavior of 64-bit float vector? Thanks! > > Thanks for taking a look. > > In the latest version, Float64Vector cases should be properly handled. Oh, yes. That's right! Thanks and sorry for my distraction. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2045837608 From xgong at openjdk.org Wed Apr 16 01:51:45 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 16 Apr 2025 01:51:45 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v5] In-Reply-To: References: Message-ID: On Wed, 16 Apr 2025 00:20:07 GMT, Paul Sandoz wrote: >> Vladimir Ivanov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 19 additional commits since the last revision: >> >> - Merge branch 'master' into vector.math.01.java >> - RVV and SVE adjustments >> - Merge branch 'master' into vector.math.01.java >> - Fix windows-aarch64 build failure >> - features_string -> cpu_info_string >> - Reviews and Float64Vector-related fix >> - Misc fixes and cleanups >> - CPU features support >> - Cleanup >> - TODO list >> - ... and 9 more: https://git.openjdk.org/jdk/compare/f025c30a...0ffed12f > > src/hotspot/share/opto/vectorIntrinsics.cpp line 488: > >> 486: // V binaryOp(long address, Class vClass, Class elementType, int length, >> 487: // V v1, V v2, >> 488: // BinaryOperation defaultImpl) > > `debugName` parameter is missing It seems the function should be updated to `libraryBinaryOp` and `libraryUnaryOp`? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2045846887 From dlong at openjdk.org Wed Apr 16 02:01:49 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 16 Apr 2025 02:01:49 GMT Subject: RFR: 8354668: Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding In-Reply-To: References: Message-ID: On Tue, 15 Apr 2025 13:50:40 GMT, Jatin Bhateja wrote: > ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding. While most of the relocation records the patching offsets from the end of the instruction, SHL instruction, which is used for pointer coloring, computes the patching offset from the starting address of the instruction. > > Thus, in case the destination register operand of SHL instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte resulting into ILLEGAL instruction exception. 
>
> This patch fixes the reported failures by computing the relocation offset of the SHL instruction from the end of the instruction, thereby making the patch offset agnostic to the REX/REX2 prefix.
>
> Please review and share your feedback.
>
> Best Regards,
> Jatin
>
> PS: Validation was performed using the latest Intel Software Development Emulator after modifying the static register allocation order in the x86_64.ad file to give preference to EGPRs.

This looks OK, but we could do better. Instead of making the relocation point to the end of the instruction and then looking up the offset with patch_barrier_relocation_offset(), why not make the offset always 0 and have the relocation point to the data offset inside the instruction?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24664#issuecomment-2807988702

From xgong at openjdk.org Wed Apr 16 02:16:55 2025
From: xgong at openjdk.org (Xiaohong Gong)
Date: Wed, 16 Apr 2025 02:16:55 GMT
Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v5]
In-Reply-To:
References:
Message-ID:

On Fri, 11 Apr 2025 21:23:52 GMT, Vladimir Ivanov wrote:

>> Migrate Vector API math library (SVML and SLEEF) linkage from native code (in JVM) to Java FFM API.
>>
>> Since FFM API doesn't support vector calling conventions yet, migration affects only symbol lookup for now. But it still enables significant simplifications on JVM side.
>>
>> The patch consists of the following parts:
>> * on-demand symbol lookup in Java code replaces eager lookup from native code during JVM startup;
>> * 2 new VM intrinsics for vector calls (support unary and binary shapes) (code separated from unary/binary vector operations);
>> * new internal interface to query supported CPU ISA extensions (`jdk.incubator.vector.CPUFeatures`) used for CPU dispatching.
>> >> `java.lang.foreign` API is used to perform symbol lookup in vector math library, then the address is cached and fed into corresponding JVM intrinsic, so C2 can turn it into a direct vector call in generated code. >> >> Once `java.lang.foreign` supports vectors & vector calling conventions, VM intrinsics can go away. >> >> Performance is on par with original implementation (tested with microbenchmarks on linux-x64 and macosx-aarch64). >> >> Testing: hs-tier1 - hs-tier6, microbenchmarks (on linux-x64 and macosx-aarch64) >> >> Thanks! > > Vladimir Ivanov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 19 additional commits since the last revision: > > - Merge branch 'master' into vector.math.01.java > - RVV and SVE adjustments > - Merge branch 'master' into vector.math.01.java > - Fix windows-aarch64 build failure > - features_string -> cpu_info_string > - Reviews and Float64Vector-related fix > - Misc fixes and cleanups > - CPU features support > - Cleanup > - TODO list > - ... and 9 more: https://git.openjdk.org/jdk/compare/d3c46607...0ffed12f src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMathLibrary.java line 240: > 238: if (isAARCH64() && vspecies.vectorBitSize() > 128) { > 239: return false; // FIXME: SVE support only for MAX shapes > 240: } SVE also supports operations for partial vector size, which means the vector size is smaller than MAX size. For example, if the max vector size of a SVE architecture is 512-bit, the FP vector operations with `VectorShape.S_256_BIT` are also supported. They are implemented with the same scalable math functions in SLEEF. Hence, I think this check and the assertion in line-198 can be removed. Thanks! 
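
To make the point about partial SVE vectors concrete, here is a minimal sketch contrasting the quoted upper-bound condition with a relaxed one. It is an illustration only: `quotedCheckAccepts` mirrors just the `vectorBitSize() > 128` test from the quoted snippet, `relaxedCheckAccepts` and the `maxVectorBitSize` parameter are hypothetical, and the separate lower-bound check at line 198 is not modelled.

```java
public class SveShapeCheck {
    // Mirrors the quoted AArch64 condition: any species wider than 128 bits is rejected.
    static boolean quotedCheckAccepts(int vectorBitSize) {
        return vectorBitSize <= 128;
    }

    // Hypothetical relaxed form: accept every shape up to the hardware maximum,
    // which is the behaviour the review comment describes for SVE.
    static boolean relaxedCheckAccepts(int vectorBitSize, int maxVectorBitSize) {
        return vectorBitSize <= maxVectorBitSize;
    }

    public static void main(String[] args) {
        int maxVectorBitSize = 512; // assumed SVE hardware vector length
        for (int bits : new int[] {128, 256, 512}) {
            System.out.println(bits + "-bit: quoted=" + quotedCheckAccepts(bits)
                    + ", relaxed=" + relaxedCheckAccepts(bits, maxVectorBitSize));
        }
    }
}
```

With an assumed 512-bit maximum, the quoted condition turns away the 256-bit shape from the reviewer's example, while the relaxed form accepts it.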
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2045871389 From mchevalier at openjdk.org Wed Apr 16 06:29:25 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 16 Apr 2025 06:29:25 GMT Subject: RFR: 8344251: C2: remove blackholes with dead control input [v2] In-Reply-To: References: Message-ID: > When a BlackholeNode's control input becomes dead, the node is not removed causing the crash > > assert(!in->is_CFG()) failed: CFG Node with no controlling input? > > In the case reported in the issue, after a round of peeling, a condition becomes constant, and the branch containing the blackhole becomes dead: > > > > > I simply use `Node::remove_dead_region(PhaseGVN*, bool)` to remove the blackhole, as many other node types do. Marc Chevalier has updated the pull request incrementally with two additional commits since the last revision: - Using PerMethod(Spec)?TrapLimit - Address review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24663/files - new: https://git.openjdk.org/jdk/pull/24663/files/a71abe7d..487b1a64 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24663&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24663&range=00-01 Stats: 6 lines in 2 files changed: 5 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24663.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24663/head:pull/24663 PR: https://git.openjdk.org/jdk/pull/24663 From mchevalier at openjdk.org Wed Apr 16 06:32:47 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 16 Apr 2025 06:32:47 GMT Subject: RFR: 8344251: C2: remove blackholes with dead control input [v2] In-Reply-To: <82rcKkqePP7jac3TRbiYX0FrKxli8XNId8EV27KS_-c=.2fd29726-325d-406b-af19-a54817bc2b63@github.com> References: <82rcKkqePP7jac3TRbiYX0FrKxli8XNId8EV27KS_-c=.2fd29726-325d-406b-af19-a54817bc2b63@github.com> Message-ID: On Tue, 15 Apr 2025 15:09:22 GMT, Aleksey Shipilev wrote: >> Marc Chevalier has updated the 
pull request incrementally with two additional commits since the last revision:
>>
>> - Using PerMethod(Spec)?TrapLimit
>> - Address review comments
>
> src/hotspot/share/opto/cfgnode.cpp line 3114:
>
>> 3112: Node* BlackholeNode::Ideal(PhaseGVN* phase, bool can_reshape) {
>> 3113: return remove_dead_region(phase, can_reshape) ? this : nullptr;
>> 3114: }
>
> I think you need a newline after this definition.

Done.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24663#discussion_r2046176692

From duke at openjdk.org Wed Apr 16 06:45:18 2025
From: duke at openjdk.org (erifan)
Date: Wed, 16 Apr 2025 06:45:18 GMT
Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare
Message-ID:

This patch optimizes the following patterns:

For integer types:

    (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1))
        => (VectorMaskCmp src1 src2 ncond)
    (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1))
        => (VectorMaskCmp src1 src2 ncond)

cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt; ncond is the negated comparison of cond.

For float and double types:

    (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1))
        => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
    (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1))
        => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))

cond can be eq or ne.
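
The rewrite above relies on every `cond` having an `ncond` that is true exactly when `cond` is false. As a scalar sketch of that property (not the C2 implementation; the `Cond` enum and helper names are hypothetical), the following checks it for ints and shows a float case where it breaks. The NaN explanation for restricting float and double to eq/ne is an assumption on my part; the description does not state the reason.

```java
public class NegatedCompare {
    /** Comparison conditions named after the list in the patch description. */
    enum Cond {
        EQ, NE, LT, GE, LE, GT, ULT, UGE, ULE, UGT;

        /** Returns the condition that holds exactly when this one does not. */
        Cond negate() {
            switch (this) {
                case EQ:  return NE;
                case NE:  return EQ;
                case LT:  return GE;
                case GE:  return LT;
                case LE:  return GT;
                case GT:  return LE;
                case ULT: return UGE;
                case UGE: return ULT;
                case ULE: return UGT;
                case UGT: return ULE;
                default:  throw new AssertionError(this);
            }
        }

        /** Evaluates the condition on two ints; unsigned variants reinterpret the bits. */
        boolean test(int a, int b) {
            switch (this) {
                case EQ:  return a == b;
                case NE:  return a != b;
                case LT:  return a < b;
                case GE:  return a >= b;
                case LE:  return a <= b;
                case GT:  return a > b;
                case ULT: return Integer.compareUnsigned(a, b) < 0;
                case UGE: return Integer.compareUnsigned(a, b) >= 0;
                case ULE: return Integer.compareUnsigned(a, b) <= 0;
                case UGT: return Integer.compareUnsigned(a, b) > 0;
                default:  throw new AssertionError(this);
            }
        }
    }

    /** Checks that NOT(a cond b) equals (a ncond b) for every condition. */
    static boolean negationHolds(int a, int b) {
        for (Cond c : Cond.values()) {
            if ((!c.test(a, b)) != c.negate().test(a, b)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] samples = {Integer.MIN_VALUE, -1, 0, 1, Integer.MAX_VALUE};
        boolean ok = true;
        for (int a : samples) {
            for (int b : samples) {
                ok &= negationHolds(a, b);
            }
        }
        System.out.println("integer negation holds: " + ok);

        // With NaN, NOT(a < b) and (a >= b) disagree, while NOT(a == b) and (a != b) agree.
        float nan = Float.NaN;
        System.out.println("!(NaN < 1) vs (NaN >= 1): " + !(nan < 1.0f) + " vs " + (nan >= 1.0f));
        System.out.println("!(NaN == 1) vs (NaN != 1): " + !(nan == 1.0f) + " vs " + (nan != 1.0f));
    }
}
```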

Benchmarks on Nvidia Grace machine with 128-bit SVE2:

With option `-XX:UseSVE=2`:

Benchmark                   Unit   Before Score  Error  After Score  Error  Uplift
testCompareEQMaskNotByte    ops/s  7912127.225  2677.289518  10266136.26  8955.008548  1.29
testCompareEQMaskNotDouble  ops/s  884737.6799  446.963779  1179760.772  448.031844  1.33
testCompareEQMaskNotFloat   ops/s  1765045.787  682.332214  2359520.803  896.305743  1.33
testCompareEQMaskNotInt     ops/s  1787221.411  977.743935  2353952.519  960.069976  1.31
testCompareEQMaskNotLong    ops/s  895297.1974  673.44808  1178449.02  323.804205  1.31
testCompareEQMaskNotShort   ops/s  3339987.002  3415.2226  4712761.965  2110.862053  1.41
testCompareGEMaskNotByte    ops/s  7907615.16  4094.243652  10251646.9  9486.699831  1.29
testCompareGEMaskNotInt     ops/s  1683738.958  4233.813092  2352855.205  1251.952546  1.39
testCompareGEMaskNotLong    ops/s  854496.1561  8594.598885  1177811.493  521.1229  1.37
testCompareGEMaskNotShort   ops/s  3341860.309  1578.975338  4714008.434  1681.10365  1.41
testCompareGTMaskNotByte    ops/s  7910823.674  2993.367032  10245063.58  9774.75138  1.29
testCompareGTMaskNotInt     ops/s  1673393.928  3153.099431  2353654.521  1190.848583  1.4
testCompareGTMaskNotLong    ops/s  849405.9159  2432.858159  1177952.041  359.96413  1.38
testCompareGTMaskNotShort   ops/s  3339509.141  3339.976585  4711442.496  2673.364893  1.41
testCompareLEMaskNotByte    ops/s  7911340.004  3114.69191  10231626.5  27134.20035  1.29
testCompareLEMaskNotInt     ops/s  1675812.113  1340.969885  2353255.341  1452.4522  1.4
testCompareLEMaskNotLong    ops/s  848862.8036  6564.841731  1177763.623  539.290106  1.38
testCompareLEMaskNotShort   ops/s  3324951.54  2380.29473  4712116.251  1544.559684  1.41
testCompareLTMaskNotByte    ops/s  7910390.844  2630.861436  10239567.69  6487.441672  1.29
testCompareLTMaskNotInt     ops/s  1672180.09  995.238142  2353757.863  853.774734  1.4
testCompareLTMaskNotLong    ops/s  856502.2695  12276.82851  1177671.815  496.723302  1.37
testCompareLTMaskNotShort   ops/s  3325798.025  2412.702501  4711554.181  1779.302112  1.41
testCompareNEMaskNotByte    ops/s  7910002.518  2771.82477  10245315.33  16321.93935  1.29
testCompareNEMaskNotDouble  ops/s  863754.6022  523.140788  1179133.982  476.572178  1.36
testCompareNEMaskNotFloat   ops/s  1723321.883  2598.484803  2358492.186  877.1401  1.36
testCompareNEMaskNotInt     ops/s  1670288.841  751.774826  2354158.125  835.720163  1.4
testCompareNEMaskNotLong    ops/s  836327.6835  410.525466  1178178.825  308.757932  1.4
testCompareNEMaskNotShort   ops/s  3327815.841  1511.978763  4711379.136  2336.505531  1.41
testCompareUGEMaskNotByte   ops/s  7906699.024  3200.936474  10253843.74  15067.59401  1.29
testCompareUGEMaskNotInt    ops/s  1674003.923  3287.191727  2353340.666  951.381021  1.4
testCompareUGEMaskNotLong   ops/s  852424.5562  8920.408939  1177943.609  389.6621  1.38
testCompareUGEMaskNotShort  ops/s  3327255.858  1584.885143  4711622.355  1247.215277  1.41
testCompareUGTMaskNotByte   ops/s  7909249.189  4435.283667  10245541.34  10993.34739  1.29
testCompareUGTMaskNotInt    ops/s  1693713.433  20650.00213  2353153.787  1055.343846  1.38
testCompareUGTMaskNotLong   ops/s  851022.3395  7079.065268  1177910.677  538.604598  1.38
testCompareUGTMaskNotShort  ops/s  3327236.988  1616.886789  4711209.865  3098.494145  1.41
testCompareULEMaskNotByte   ops/s  7909350.825  3251.262342  10261449.03  7273.831341  1.29
testCompareULEMaskNotInt    ops/s  1672350.925  1545.304304  2353231.755  914.231193  1.4
testCompareULEMaskNotLong   ops/s  853349.4765  9804.906913  1177967.254  435.044367  1.38
testCompareULEMaskNotShort  ops/s  3325757.891  1555.062257  4712873.187  1650.986905  1.41
testCompareULTMaskNotByte   ops/s  7912218.621  2633.477744  10242095.98  21921.39902  1.29
testCompareULTMaskNotInt    ops/s  1673994.849  2672.507666  2353449.22  946.105757  1.4
testCompareULTMaskNotLong   ops/s  849032.5868  10406.06689  1177586.047  506.541456  1.38
testCompareULTMaskNotShort  ops/s  3328062.026  1892.991844  4713247.216  1855.983724  1.41

With option `-XX:UseSVE=0`:

Benchmark                   Unit   Before Score  Error  After Score  Error  Uplift
testCompareEQMaskNotByte    ops/s  7895961.919  72712.90804  7746493.731  71481.92938  0.98
testCompareEQMaskNotDouble  ops/s  789811.0455  384.493088  766473.7994  2216.581793  0.97
testCompareEQMaskNotFloat   ops/s  1806305.818  638.010451  1819616.613  3295.38958  1
testCompareEQMaskNotInt     ops/s  1815820.144  1225.336135  1849538.401  766.29902  1.01
testCompareEQMaskNotLong    ops/s  807336.492  335.451807  792732.9483  277.954432  0.98
testCompareEQMaskNotShort   ops/s  4818266.38  1927.862665  4668903.001  1922.782715  0.96
testCompareGEMaskNotByte    ops/s  7818439.678  75374.97739  16498003.98  41440.49653  2.11
testCompareGEMaskNotInt     ops/s  1815159.05  1090.912209  2372095.779  1664.397112  1.3
testCompareGEMaskNotLong    ops/s  804324.5575  2301.686878  927919.8507  371.766719  1.15
testCompareGEMaskNotShort   ops/s  4818966.563  2443.643652  5385561.038  29558.37423  1.11
testCompareGTMaskNotByte    ops/s  7893406.157  82687.74264  16470663.2  22165.55812  2.08
testCompareGTMaskNotInt     ops/s  1815316.812  915.894106  2370447.198  655.016338  1.3
testCompareGTMaskNotLong    ops/s  807019.456  526.525482  928079.0541  330.582693  1.15
testCompareGTMaskNotShort   ops/s  4820552.881  1684.247747  5355902.93  5893.2915  1.11
testCompareLEMaskNotByte    ops/s  7816263.323  79560.0015  16473621.19  56688.99585  2.1
testCompareLEMaskNotInt     ops/s  1814915.724  926.998625  2368790.306  932.594778  1.3
testCompareLEMaskNotLong    ops/s  806483.9  935.718082  928110.9074  407.096695  1.15
testCompareLEMaskNotShort   ops/s  4813660.241  6817.870509  5357107.852  10061.47975  1.11
testCompareLTMaskNotByte    ops/s  7838948.962  69136.4504  16424405.96  24464.75469  2.09
testCompareLTMaskNotInt     ops/s  1815056.833  1187.6453  2369892.187  1103.819634  1.3
testCompareLTMaskNotLong    ops/s  806602.1804  287.923365  928346.4118  617.682824  1.15
testCompareLTMaskNotShort   ops/s  4817940.643  2767.1509  5372537.84  15397.47169  1.11
testCompareNEMaskNotByte    ops/s  9078493.798  4630.339307  16484348.42  18925.88346  1.81
testCompareNEMaskNotDouble  ops/s  661769.6272  398.712981  926763.5839  1808.843788  1.4
testCompareNEMaskNotFloat   ops/s  1570527.252  563.642144  2312425.678  1815.844846  1.47
testCompareNEMaskNotInt     ops/s  1619146.58  626.793854  2369711.543  942.330478  1.46
testCompareNEMaskNotLong    ops/s  680201.5381  2252.836482  927808.6147  414.917863  1.36
testCompareNEMaskNotShort   ops/s  3763508.054  3622.560798  5367808.015  8591.466599  1.42
testCompareUGEMaskNotByte   ops/s  7886373.129  75917.74675  16480928.93  27524.31005  2.08
testCompareUGEMaskNotInt    ops/s  1815636.832  750.036241  2369683.015  901.609404  1.3
testCompareUGEMaskNotLong   ops/s  806862.5826  287.819616  928001.4394  361.063837  1.15
testCompareUGEMaskNotShort  ops/s  4820581.361  2098.537435  5375854.248  25619.40165  1.11
testCompareUGTMaskNotByte   ops/s  7891591.465  96614.93542  16410405.93  15012.37096  2.07
testCompareUGTMaskNotInt    ops/s  1814871.179  662.825588  2371325.903  1170.491164  1.3
testCompareUGTMaskNotLong   ops/s  804013.7658  2240.534209  928062.2169  531.306897  1.15
testCompareUGTMaskNotShort  ops/s  4818150.337  3051.717685  5381449.337  21212.34187  1.11
testCompareULEMaskNotByte   ops/s  7831540.628  81306.67253  16495250.78  38682.19675  2.1
testCompareULEMaskNotInt    ops/s  1814484.14  687.860656  2369265.075  940.609586  1.3
testCompareULEMaskNotLong   ops/s  807780.5749  769.876816  927538.0732  1278.267724  1.14
testCompareULEMaskNotShort  ops/s  4817437.42  5141.336541  5356183.359  7015.608124  1.11
testCompareULTMaskNotByte   ops/s  7849078.225  56753.59764  16395975.27  34043.67295  2.08
testCompareULTMaskNotInt    ops/s  1814328.226  2697.219111  2370700.47  1991.841988  1.3
testCompareULTMaskNotLong   ops/s  807166.8197  253.061506  927926.2803  252.933462  1.14
testCompareULTMaskNotShort  ops/s  4821098.216  1625.959044  5348980.243  4100.768121  1.1

Benchmarks on AMD EPYC 9124 16-Core Processor:

With option `-XX:UseAVX=3`:

Benchmark                   Unit   Before Score  Error  After Score  Error  Uplift
testCompareEQMaskNotByte    ops/s  16607323.35  1233692.631  18381557.66  1163201.522  1.1
testCompareEQMaskNotDouble  ops/s  2114285.245  58782.2534  2959946.353  43016.0445  1.39
testCompareEQMaskNotFloat   ops/s  4480874.437  89975.29074  6960151.436  64799.143  1.55
testCompareEQMaskNotInt     ops/s  4370906.91  51784.80889  6856955.043  313858.5504  1.56
testCompareEQMaskNotLong    ops/s  2080065.895  26762.06732  2939142.143  67179.05314  1.41
testCompareEQMaskNotShort   ops/s  7968282.563  210437.2781  12701214.56  473152.6407  1.59
testCompareGEMaskNotByte    ops/s  18419141.89  473408.9451  19880059.68  321638.0397  1.07
testCompareGEMaskNotInt     ops/s  4419015.62  77352.98633  7037639.227  151066.0383  1.59
testCompareGEMaskNotLong    ops/s  2147982.48  49227.42782  3000275.928  39298.75344  1.39
testCompareGEMaskNotShort   ops/s  8469039.613  17833.19707  12288229.49  244317.8812  1.45
testCompareGTMaskNotByte    ops/s  18728997.5  468328.8358  20544730.05  392264.6466  1.09
testCompareGTMaskNotInt     ops/s  4510009.705  78812.57357  7364629.942  70970.78473  1.63
testCompareGTMaskNotLong    ops/s  2124104.969  40917.89257  2953536.279  35199.19687  1.39
testCompareGTMaskNotShort   ops/s  8690557.621  311534.1159  12344017.51  457931.8741  1.42
testCompareLEMaskNotByte    ops/s  17758400.53  478383.4945  19209183.26  1143297.241  1.08
testCompareLEMaskNotInt     ops/s  4363664.862  43443.18063  7054093.064  78141.11476  1.61
testCompareLEMaskNotLong    ops/s  2068632.213  29844.78023  2954766.412  50667.22502  1.42
testCompareLEMaskNotShort   ops/s  8637608.548  183538.5511  12719010.27  473568.8825  1.47
testCompareLTMaskNotByte    ops/s  14406138.95  423105.0163  17292417.96  371386.9689  1.2
testCompareLTMaskNotInt     ops/s  4546707.266  131977.3144  7040483.394  213590.4657  1.54
testCompareLTMaskNotLong    ops/s  2123277.356  47243.21499  2848720.442  58896.97045  1.34
testCompareLTMaskNotShort   ops/s  7570169.363  649873.6295  11945383.75  988276.5955  1.57
testCompareNEMaskNotByte    ops/s  18274529.55  683396.7384  19081938.8  1118739.778  1.04
testCompareNEMaskNotDouble  ops/s  2112533.61  43295.50012  2912115.441  78189.51083  1.37
testCompareNEMaskNotFloat   ops/s  4628683.814  93817.07362  6967208.729  145135.8544  1.5
testCompareNEMaskNotInt     ops/s  4470900.214  75974.50842  7286913.662  116328.5277  1.62
testCompareNEMaskNotLong    ops/s  2134091.061  46377.94061  2934667.477  81675.46021  1.37
testCompareNEMaskNotShort   ops/s  8790384.287  396161.8599
13076858.35 286272.1155 1.48 testCompareUGEMaskNotByte ops/s 18009150.9 660803.8886 17551258.33 1667014.843 0.97 testCompareUGEMaskNotInt ops/s 4442928.74 83190.81019 6854088.277 329008.8901 1.54 testCompareUGEMaskNotLong ops/s 2088357.736 71696.24791 2973202.26 63278.78974 1.42 testCompareUGEMaskNotShort ops/s 8348624.02 116562.7876 12832250.78 546869.3006 1.53 testCompareUGTMaskNotByte ops/s 17871101.25 800199.6321 19902619.81 214003.3262 1.11 testCompareUGTMaskNotInt ops/s 4088304.421 137797.9723 7135454.33 124553.651 1.74 testCompareUGTMaskNotLong ops/s 2070610.42 19881.82182 2991536.365 36260.60767 1.44 testCompareUGTMaskNotShort ops/s 8637099.341 155822.1608 12756579.77 186068.199 1.47 testCompareULEMaskNotByte ops/s 17940901.36 1258029.364 18932484.94 694554.6305 1.05 testCompareULEMaskNotInt ops/s 4369177.511 74982.31936 6392773.082 550171.2266 1.46 testCompareULEMaskNotLong ops/s 2135905.761 43693.63178 2877579.631 41651.56289 1.34 testCompareULEMaskNotShort ops/s 8607710.544 132655.1676 12446370.04 441718.3035 1.44 testCompareULTMaskNotByte ops/s 17409912.23 1033204.537 20607479.99 362000.5056 1.18 testCompareULTMaskNotInt ops/s 4386455.9 119192.1635 6920123.264 186158.2845 1.57 testCompareULTMaskNotLong ops/s 2064995.149 38622.2734 2988343.589 39037.90006 1.44 testCompareULTMaskNotShort ops/s 8642182.752 230919.2442 13029582.09 437101.4923 1.5 The small amount of performance degradation is due to test fluctuations. 
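A small sketch for reading the tables above. It assumes (this is not stated in the message) that the Uplift column is simply After Score divided by Before Score, and that the Error columns are the usual JMH plus/minus half-widths. The class and method names are hypothetical; the numbers are the `testCompareUGEMaskNotByte` row from the AMD EPYC 9124 table, the one row there with an uplift below 1.

```java
public class UpliftSketch {
    // Assumed definition of the "Uplift" column: after score over before score.
    static double uplift(double before, double after) {
        return after / before;
    }

    // True when [before - beforeErr, before + beforeErr] and
    // [after - afterErr, after + afterErr] share at least one point.
    static boolean intervalsOverlap(double before, double beforeErr,
                                    double after, double afterErr) {
        return before - beforeErr <= after + afterErr
            && after - afterErr <= before + beforeErr;
    }

    public static void main(String[] args) {
        // testCompareUGEMaskNotByte, AMD EPYC 9124, -XX:UseAVX=3
        double before = 18009150.9, beforeErr = 660803.8886;
        double after = 17551258.33, afterErr = 1667014.843;

        System.out.printf("uplift = %.2f%n", uplift(before, after));
        System.out.println("intervals overlap: "
            + intervalsOverlap(before, beforeErr, after, afterErr));
    }
}
```

Under those assumptions the two intervals overlap for this row, which is at least consistent with attributing the apparent regression to run-to-run fluctuation.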
------------- Commit messages: - 8354242: VectorAPI: combine vector not operation with compare Changes: https://git.openjdk.org/jdk/pull/24674/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24674&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8354242 Stats: 1051 lines in 5 files changed: 1048 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/24674.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24674/head:pull/24674 PR: https://git.openjdk.org/jdk/pull/24674 From jbhateja at openjdk.org Wed Apr 16 07:52:09 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 16 Apr 2025 07:52:09 GMT Subject: RFR: 8354668: Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding In-Reply-To: References: Message-ID: On Wed, 16 Apr 2025 01:58:53 GMT, Dean Long wrote: > This looks OK, but we could do better. Instead of making the relocation point to the end of the instruction and then looking up the offset with patch_barrier_relocation_offset(), why not make the offset always 0 and have the relocation point to the data offset inside the instruction? Hi @dean-long, as of now, barrier relocations are placed either before [1] or after [2] the instructions; an offset is then added to compute the effective address of the patch site. I think you are suggesting extending the barrier structure itself to cache the patch site address. For this bug-fix PR I intend to make the patch offset agnostic to the REX/REX2 prefix without disturbing the existing implementation.
[1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp#L394 [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp#L397 ------------- PR Comment: https://git.openjdk.org/jdk/pull/24664#issuecomment-2808697302 From jbhateja at openjdk.org Wed Apr 16 07:52:09 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 16 Apr 2025 07:52:09 GMT Subject: RFR: 8354668: Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding [v2] In-Reply-To: References: Message-ID: > ZGC keeps track of multiple placeholders in barrier code snippets through relocations; these are later used to patch the appropriate contents (mostly immediate values) in the instruction encoding. While most of the relocations record the patching offset from the end of the instruction, the SHL instruction, which is used for pointer coloring, computes the patching offset from the starting address of the instruction. > > Thus, when the destination register operand of the SHL instruction is an extended GPR, we fail to account for the additional REX2 prefix byte in the patch offset, thereby corrupting the encoding: the runtime patches the primary opcode byte, resulting in an ILLEGAL instruction exception. > > This patch fixes the reported failures by computing the relocation offset of the SHL instruction from the end of the instruction, thereby making the patch offset agnostic to the REX/REX2 prefix. > > Please review and share your feedback. > > Best Regards, > Jatin > > PS: Validation was performed using the latest Intel Software Development Emulator after modifying the static register allocation order in the x86_64.ad file to give preference to EGPRs.
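To make the start-relative versus end-relative point concrete, here is a minimal sketch. It is not HotSpot code: the class, the method names, and the simplified instruction layout (prefix bytes, one opcode byte, one ModRM byte, one trailing imm8) are assumptions made for illustration only. The one fact it leans on is that a legacy REX prefix is one byte while a REX2 prefix is two. Which byte actually gets clobbered in the real barrier code depends on the encoding HotSpot emits; the sketch only shows that a fixed start-relative offset stops pointing at the immediate once the prefix grows, while an end-relative offset does not care.

```java
public class PatchOffsetSketch {
    static final int REX_LEN = 1;   // legacy REX prefix: one byte
    static final int REX2_LEN = 2;  // REX2 prefix: two bytes

    // Hypothetical layout of "SHL reg, imm8": prefix, opcode, ModRM, imm8.
    static int instructionLength(int prefixLen) {
        return prefixLen + 3;
    }

    // Index of the imm8 byte, counted from the first byte of the instruction.
    static int immIndex(int prefixLen) {
        return prefixLen + 2;
    }

    // Start-relative patching with a constant that silently assumes a REX prefix.
    static int patchIndexFromStart() {
        return immIndex(REX_LEN);
    }

    // End-relative patching: the imm8 is the last byte, whatever the prefix is.
    static int patchIndexFromEnd(int prefixLen) {
        return instructionLength(prefixLen) - 1;
    }

    public static void main(String[] args) {
        for (int prefixLen : new int[] {REX_LEN, REX2_LEN}) {
            System.out.println("prefixLen=" + prefixLen
                + " imm8 at " + immIndex(prefixLen)
                + ", start-relative patch at " + patchIndexFromStart()
                + ", end-relative patch at " + patchIndexFromEnd(prefixLen));
        }
    }
}
```

With the two-byte prefix the start-relative index falls one byte short of the immediate, while the end-relative index still lands on it.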
Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: review comment resolutions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24664/files - new: https://git.openjdk.org/jdk/pull/24664/files/1a5a73c0..ffd92c37 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24664&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24664&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24664.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24664/head:pull/24664 PR: https://git.openjdk.org/jdk/pull/24664 From shade at openjdk.org Wed Apr 16 08:10:48 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 16 Apr 2025 08:10:48 GMT Subject: RFR: 8344251: C2: remove blackholes with dead control input [v2] In-Reply-To: References: Message-ID: On Wed, 16 Apr 2025 06:29:25 GMT, Marc Chevalier wrote: >> When a BlackholeNode's control input becomes dead, the node is not removed causing the crash >> >> assert(!in->is_CFG()) failed: CFG Node with no controlling input? >> >> In the case reported in the issue, after a round of peeling, a condition becomes constant, and the branch containing the blackhole becomes dead: >> >> >> >> >> I simply use `Node::remove_dead_region(PhaseGVN*, bool)` to remove the blackhole, as many other node types do. > > Marc Chevalier has updated the pull request incrementally with two additional commits since the last revision: > > - Using PerMethod(Spec)?TrapLimit > - Address review comments Marked as reviewed by shade (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/24663#pullrequestreview-2771577084 From aturbanov at openjdk.org Wed Apr 16 08:34:50 2025 From: aturbanov at openjdk.org (Andrey Turbanov) Date: Wed, 16 Apr 2025 08:34:50 GMT Subject: RFR: 8327963: [Umbrella] Incorrect result of C2 compiled code since JDK-8237581 [v2] In-Reply-To: References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> Message-ID: On Thu, 10 Apr 2025 15:56:25 GMT, Roland Westrelin wrote: >> An `Initialize` node for an `Allocate` node is created with a memory >> `Proj` of adr type raw memory. In order for stores to be captured, the >> memory state out of the allocation is a `MergeMem` with slices for the >> various object fields/array element set to the raw memory `Proj` of >> the `Initialize` node. If `Phi`s need to be created during later >> transformations from this memory state, the `Phi` for a particular >> slice gets its adr type from the type of the `Proj` which is raw >> memory. If during macro expansion, the `Allocate` is found to have no >> use and so can be removed, the `Proj` out of the `Initialize` is >> replaced by the memory state on input to the `Allocate`. A `Phi` for >> some slice for a field of an object will end up with the raw memory >> state on input to the `Allocate` node. As a result, memory state at >> the `Phi` is incorrect and incorrect execution can happen. >> >> The fix I propose is, rather than have a single `Proj` for the memory >> state out of the `Initialize` with adr type raw memory, to use one >> `Proj` per slice added to the memory state after the `Initialize`. Each >> of the `Proj` should return the right adr type for its slice. For that >> I propose having a new type of `Proj`: `NarrowMemProj` that captures >> the right adr type. >> >> Logic for the construction of the `Allocate`/`Initialize` subgraph is >> tweaked so the right adr type captured in its own `NarrowMemProj` is >> added to the memory subgraph.
Code that removes an allocation or moves >> it also has to be changed so it correctly takes the multiple memory >> projections out of the `Initialize` node into account. >> >> One tricky issue is that when EA split types for a scalar replaceable >> `Allocate` node: >> >> 1- the adr type captured in the `NarrowMemProj` becomes out of sync >> with the type of the slices for the allocation >> >> 2- before EA, the memory state for one particular field out of the >> `Initialize` node can be used for a `Store` to the just allocated >> object or some other. So we can have a chain of `Store`s, some to >> the newly allocated object, some to some other objects, all of them >> using the state of `NarrowMemProj` out of the `Initialize`. After >> split unique types, the `NarrowMemProj` is for the slice of a >> particular allocation. So `Store`s to some other objects shouldn't >> use that memory state but the memory state before the `Allocate`. >> >> For that, I added logic to update the adr type of `NarrowMemProj` >> during split uni... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > TestIterativeEA fix test/hotspot/jtreg/compiler/macronodes/TestEliminationOfAllocationWithoutUse.java line 69: > 67: > 68: double sum = 0; > 69: for(int i = 0; i < arr1.length; ++i) { Suggestion: for (int i = 0; i < arr1.length; ++i) { ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2046392320 From thartmann at openjdk.org Wed Apr 16 09:04:48 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 16 Apr 2025 09:04:48 GMT Subject: RFR: 8344251: C2: remove blackholes with dead control input [v2] In-Reply-To: References: Message-ID: On Wed, 16 Apr 2025 06:29:25 GMT, Marc Chevalier wrote: >> When a BlackholeNode's control input becomes dead, the node is not removed causing the crash >> >> assert(!in->is_CFG()) failed: CFG Node with no controlling input? 
>> >> In the case reported in the issue, after a round of peeling, a condition becomes constant, and the branch containing the blackhole becomes dead: >> >> >> >> >> I simply use `Node::remove_dead_region(PhaseGVN*, bool)` to remove the blackhole, as many other node types do. > > Marc Chevalier has updated the pull request incrementally with two additional commits since the last revision: > > - Using PerMethod(Spec)?TrapLimit > - Address review comments Great work extracting a standalone reproducer for this and narrowing it down! The fix looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24663#pullrequestreview-2771746027 From duke at openjdk.org Wed Apr 16 09:08:46 2025 From: duke at openjdk.org (Anjian-Wen) Date: Wed, 16 Apr 2025 09:08:46 GMT Subject: RFR: 8329887: RISC-V: C2: Support Zvbb Vector And-Not instruction [v9] In-Reply-To: References: <1KHNbMIgOO7jSZ1Fm4HzxadYaNzE4Xbq4nTitlKy3Po=.17d7860b-10de-4f19-87d8-87fc17313ce2@github.com> <7u0hDh5_0JFOTDO5LxwUZdpebykF-sR6DQ4rcMpCrV0=.d264a91e-4551-42b9-a74a-0ab65cbd2003@github.com> Message-ID: On Tue, 15 Apr 2025 07:43:31 GMT, Fei Yang wrote: >> Anjian-Wen has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: >> >> - Merge branch 'openjdk:master' into JDK-8329887 >> - modify code and test format >> - fix test bug >> - add aarch64 judgement for the test >> - fix zvbb mask match rule >> - Merge branch 'openjdk:master' into JDK-8329887 >> - add vand_not_masked test >> - add vand_not_L test >> - Merge branch 'openjdk:master' into JDK-8329887 >> - RISC-V: C2: Support Zvbb Vector And-Not instruction >> >> fix match rule for format >> - ... and 1 more: https://git.openjdk.org/jdk/compare/62c1559b...66c886c6 > > Updated change looks good. Thanks! 
@RealFYang @feilongjiang Thanks for the review and approve ------------- PR Comment: https://git.openjdk.org/jdk/pull/24129#issuecomment-2808904831 From thartmann at openjdk.org Wed Apr 16 09:43:39 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 16 Apr 2025 09:43:39 GMT Subject: RFR: 8352620: C2: rename MemNode::memory_type() to MemNode::value_basic_type() [v2] In-Reply-To: References: Message-ID: On Mon, 14 Apr 2025 21:46:26 GMT, Saranya Natarajan wrote: >> Description: The current name MemNode::memory_type() is misleading because the returned type is a property of the value that is loaded/stored, not the memory that is accessed. Usually, the two of them match, but for mismatched memory accesses (arising e.g. from using Unsafe or memory segments) they might differ, e.g. one might store a value of type 'short' into an array of elements of type 'long'. The proposal was to rename MemNode::memory_type() to MemNode::value_basic_type() to clarify these cases. >> >> Solution: Replaced all occurrence of MemNode::memory_type() with MemNode::value_basic_type() > > Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: > > addressing review comments Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24427#pullrequestreview-2771789253 From mchevalier at openjdk.org Wed Apr 16 09:45:13 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 16 Apr 2025 09:45:13 GMT Subject: RFR: 8354625: Compile::igv_print_graph_to_network doesn't use its second parameter Message-ID: Remove the second (unused) parameter of `Compile::igv_print_graph_to_network`. It also makes it more consistent with `Compile::igv_print_method_to_file` that always print from root. 
------------- Commit messages: - Remove unused parameter Changes: https://git.openjdk.org/jdk/pull/24675/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24675&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8354625 Stats: 4 lines in 3 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/24675.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24675/head:pull/24675 PR: https://git.openjdk.org/jdk/pull/24675 From rcastanedalo at openjdk.org Wed Apr 16 10:11:49 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 16 Apr 2025 10:11:49 GMT Subject: RFR: 8354625: Compile::igv_print_graph_to_network doesn't use its second parameter In-Reply-To: References: Message-ID: On Wed, 16 Apr 2025 08:00:59 GMT, Marc Chevalier wrote: > Remove the second (unused) parameter of `Compile::igv_print_graph_to_network`. It also makes it more consistent with `Compile::igv_print_method_to_file` that always print from root. Looks good, thanks! ------------- Marked as reviewed by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24675#pullrequestreview-2771917742 From swen at openjdk.org Wed Apr 16 10:11:56 2025 From: swen at openjdk.org (Shaojin Wen) Date: Wed, 16 Apr 2025 10:11:56 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v4] In-Reply-To: <_IhK2U23lIUOtBKOt-WMxQ3L7b2t26RzclJRdqbIgms=.3ef9a630-f99c-4de7-994a-bcabf912230b@github.com> References: <96Ny_BPjRCbNlD14DNDUOuQ0IX-F8hx21gxQKVfim9M=.d502019a-27ed-4a35-81ef-bc2aec5e7557@github.com> <_IhK2U23lIUOtBKOt-WMxQ3L7b2t26RzclJRdqbIgms=.3ef9a630-f99c-4de7-994a-bcabf912230b@github.com> Message-ID: On Mon, 24 Mar 2025 11:41:46 GMT, Emanuel Peter wrote: >>> @kuaiwei I have not yet had the time to read through the PR, but I would like to talk about `LoadNode::Ideal`. The idea with `Ideal` in general, is that you replace one node with another. 
After `Ideal` returns, all usages of the old node now take the new node instead. >>> >>> You copied the structure from my MergeStores implementation in `StoreNode::Idea`. There it made sense to replace `StoreB` nodes that have a memory output with `LoadI` nodes, which also have memory output. >>> >>> But it does not make sense to replace a `LoadB` that has a byte/int output with a `LoadL` that has a long output for example. >>> >>> I think your implementation should go into `OrINode`, and match the expression up from there. Because we want to replace the old `OrI` with the new `LoadL`. >>> >>> Another question: Do you have some tests where some of the nodes in the `load/shift/or` expression have other uses? Imagine this: >>> >>> ``` >>> l0 = a[0]; >>> l1 = a[1]; >>> l2 = a[2]; >>> l3 = a[3]; >>> l = ; >>> now use l1 for something else as well >>> ``` >>> >>> What happens now? Do you check that we only use the old `LoadB` in the expression we are replacing? >> >> Hi @eme64 , I understand your concern. In this patch , I check the usage of all `loadB` nodes and only allow they have only single usage into `OrNode`, I also check the `OrNode` as well. So I think it will not cause the trouble. >> >> >> l0 = a[0]; >> l1 = a[1]; >> l2 = a[2]; >> l3 = a[3]; >> l = ; >> now use l1 for something else as well >> >> For this case, because l1 has other usage, all these loads will not be merged. >> >> In my previous patch, I tried to extract value from merged `LoadNode` if origin `loadB` has other usage, such as used by uncommon trap. You can find them in https://github.com/openjdk/jdk/pull/24023/commits/b621db1cf0c17885516254a2af4b5df43e06c098 and search MergePrimitiveLoads::extract_value_for_uncommon_trap . But in my test with jtreg tier1, it never hit a case which replaced `LoadB` used by uncommon trap, I think range check smearing remove all the uncommon trap usages. So I revert it to make code simple. 
In my opinion, the extract_value function can be used as a general solution for other usages. But we may need a cost model to evaluate cost of new instructions which used for extracting and benefit of merged load. To simplify, I choose to check usage strictly. > > @kuaiwei Thanks for your response! > > What about these two things I brought up? > >> Do you have some tests where some of the nodes in the load/shift/or expression have other uses? > > It would be good to have these tests, even if we think your code is correct. It is good to verify it with tests. And someone in the future might break it. > >> I think your implementation should go into OrINode, and match the expression up from there. Because we want to replace the old OrI with the new LoadL. > > This is really the pattern we use in `Idea`. We replace the node at the bottom of an expression with a new node (or new expression). PR #22919 requires this PR's MergeLoad to eliminate the use of Unsafe. @eme64 Could you spare some time to help continue the review? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24023#issuecomment-2809055927 From rcastanedalo at openjdk.org Wed Apr 16 10:12:42 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 16 Apr 2025 10:12:42 GMT Subject: RFR: 8352620: C2: rename MemNode::memory_type() to MemNode::value_basic_type() [v2] In-Reply-To: References: Message-ID: <47vh9Osg1OXi4V1wcJUOI98XS_4KxNVq5d3SQUeniLI=.727bbcd0-76cf-4bb2-b0c1-77dcbb7978fe@github.com> On Mon, 14 Apr 2025 21:46:26 GMT, Saranya Natarajan wrote: >> Description: The current name MemNode::memory_type() is misleading because the returned type is a property of the value that is loaded/stored, not the memory that is accessed. Usually, the two of them match, but for mismatched memory accesses (arising e.g. from using Unsafe or memory segments) they might differ, e.g. one might store a value of type 'short' into an array of elements of type 'long'. 
The proposal was to rename MemNode::memory_type() to MemNode::value_basic_type() to clarify these cases. >> >> Solution: Replaced all occurrence of MemNode::memory_type() with MemNode::value_basic_type() > > Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: > > addressing review comments Marked as reviewed by rcastanedalo (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24427#pullrequestreview-2771922241 From thartmann at openjdk.org Wed Apr 16 10:35:47 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 16 Apr 2025 10:35:47 GMT Subject: RFR: 8354625: Compile::igv_print_graph_to_network doesn't use its second parameter In-Reply-To: References: Message-ID: On Wed, 16 Apr 2025 08:00:59 GMT, Marc Chevalier wrote: > Remove the second (unused) parameter of `Compile::igv_print_graph_to_network`. It also makes it more consistent with `Compile::igv_print_method_to_file` that always print from root. Looks good and trivial. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24675#pullrequestreview-2772035289 From dfuchs at openjdk.org Wed Apr 16 11:30:01 2025 From: dfuchs at openjdk.org (Daniel Fuchs) Date: Wed, 16 Apr 2025 11:30:01 GMT Subject: Integrated: Merge ed30fce6df57b1cbf7a6efebabc3558550f8ec16 In-Reply-To: References: Message-ID: On Wed, 16 Apr 2025 11:20:36 GMT, Jaikiran Pai wrote: > This brings in the CPU25_04 changes into the master branch. LGTM ------------- Marked as reviewed by dfuchs (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24683#pullrequestreview-2772164693 From jpai at openjdk.org Wed Apr 16 11:30:01 2025 From: jpai at openjdk.org (Jaikiran Pai) Date: Wed, 16 Apr 2025 11:30:01 GMT Subject: Integrated: Merge ed30fce6df57b1cbf7a6efebabc3558550f8ec16 Message-ID: This brings in the CPU25_04 changes into the master branch. 
------------- Commit messages: The merge commit only contains trivial merges, so no merge-specific webrevs have been generated. Changes: https://git.openjdk.org/jdk/pull/24683/files Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24683.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24683/head:pull/24683 PR: https://git.openjdk.org/jdk/pull/24683 From jpai at openjdk.org Wed Apr 16 11:30:02 2025 From: jpai at openjdk.org (Jaikiran Pai) Date: Wed, 16 Apr 2025 11:30:02 GMT Subject: Integrated: Merge ed30fce6df57b1cbf7a6efebabc3558550f8ec16 In-Reply-To: References: Message-ID: <5aoSWv3V3dVlk0FcmcAt3FpvLoE6CxWrTpYmPU5wyfw=.99fdb614-63cc-4cef-8442-b9d55f328d37@github.com> On Wed, 16 Apr 2025 11:20:36 GMT, Jaikiran Pai wrote: > This brings in the CPU25_04 changes into the master branch. Thank you Daniel for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24683#issuecomment-2809280611 From jpai at openjdk.org Wed Apr 16 11:30:02 2025 From: jpai at openjdk.org (Jaikiran Pai) Date: Wed, 16 Apr 2025 11:30:02 GMT Subject: Integrated: Merge ed30fce6df57b1cbf7a6efebabc3558550f8ec16 In-Reply-To: References: Message-ID: On Wed, 16 Apr 2025 11:20:36 GMT, Jaikiran Pai wrote: > This brings in the CPU25_04 changes into the master branch. This pull request has now been integrated. 
Changeset: c6243fc2 Author: Jaikiran Pai URL: https://git.openjdk.org/jdk/commit/c6243fc27fafb1ff89f8610ead3acd87030caf95 Stats: 301 lines in 11 files changed: 223 ins; 25 del; 53 mod Merge Reviewed-by: dfuchs ------------- PR: https://git.openjdk.org/jdk/pull/24683 From mchevalier at openjdk.org Wed Apr 16 11:49:46 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 16 Apr 2025 11:49:46 GMT Subject: RFR: 8344251: C2: remove blackholes with dead control input [v2] In-Reply-To: References: Message-ID: On Wed, 16 Apr 2025 06:29:25 GMT, Marc Chevalier wrote: >> When a BlackholeNode's control input becomes dead, the node is not removed causing the crash >> >> assert(!in->is_CFG()) failed: CFG Node with no controlling input? >> >> In the case reported in the issue, after a round of peeling, a condition becomes constant, and the branch containing the blackhole becomes dead: >> >> >> >> >> I simply use `Node::remove_dead_region(PhaseGVN*, bool)` to remove the blackhole, as many other node types do. > > Marc Chevalier has updated the pull request incrementally with two additional commits since the last revision: > > - Using PerMethod(Spec)?TrapLimit > - Address review comments Thanks @shipilev, @vnkozlov and @TobiHartmann for reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24663#issuecomment-2809324664 From duke at openjdk.org Wed Apr 16 11:49:46 2025 From: duke at openjdk.org (duke) Date: Wed, 16 Apr 2025 11:49:46 GMT Subject: RFR: 8344251: C2: remove blackholes with dead control input [v2] In-Reply-To: References: Message-ID: On Wed, 16 Apr 2025 06:29:25 GMT, Marc Chevalier wrote: >> When a BlackholeNode's control input becomes dead, the node is not removed causing the crash >> >> assert(!in->is_CFG()) failed: CFG Node with no controlling input? 
>> >> In the case reported in the issue, after a round of peeling, a condition becomes constant, and the branch containing the blackhole becomes dead: >> >> >> >> >> I simply use `Node::remove_dead_region(PhaseGVN*, bool)` to remove the blackhole, as many other node types do. > > Marc Chevalier has updated the pull request incrementally with two additional commits since the last revision: > > - Using PerMethod(Spec)?TrapLimit > - Address review comments @marc-chevalier Your change (at version 487b1a64cc4dc3fd1495a1e6ced53f974a8d0a1e) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24663#issuecomment-2809327493 From duke at openjdk.org Wed Apr 16 12:39:48 2025 From: duke at openjdk.org (Hendrik Schick) Date: Wed, 16 Apr 2025 12:39:48 GMT Subject: RFR: 8344251: C2: remove blackholes with dead control input [v2] In-Reply-To: References: Message-ID: On Wed, 16 Apr 2025 06:29:25 GMT, Marc Chevalier wrote: >> When a BlackholeNode's control input becomes dead, the node is not removed causing the crash >> >> assert(!in->is_CFG()) failed: CFG Node with no controlling input? >> >> In the case reported in the issue, after a round of peeling, a condition becomes constant, and the branch containing the blackhole becomes dead: >> >> >> >> >> I simply use `Node::remove_dead_region(PhaseGVN*, bool)` to remove the blackhole, as many other node types do. 
> > Marc Chevalier has updated the pull request incrementally with two additional commits since the last revision: > > - Using PerMethod(Spec)?TrapLimit > - Address review comments test/hotspot/jtreg/compiler/blackhole/DeadBhElimination.java line 64: > 62: // loop is detected as empty loop > 63: > 64: if(b == 78) { // dead Suggestion: if (b == 78) { // dead ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24663#discussion_r2046834769 From mchevalier at openjdk.org Wed Apr 16 13:01:12 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 16 Apr 2025 13:01:12 GMT Subject: RFR: 8344251: C2: remove blackholes with dead control input [v3] In-Reply-To: References: Message-ID: > When a BlackholeNode's control input becomes dead, the node is not removed causing the crash > > assert(!in->is_CFG()) failed: CFG Node with no controlling input? > > In the case reported in the issue, after a round of peeling, a condition becomes constant, and the branch containing the blackhole becomes dead: > > > > > I simply use `Node::remove_dead_region(PhaseGVN*, bool)` to remove the blackhole, as many other node types do. 
Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: cosmetic ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24663/files - new: https://git.openjdk.org/jdk/pull/24663/files/487b1a64..9f000990 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24663&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24663&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24663.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24663/head:pull/24663 PR: https://git.openjdk.org/jdk/pull/24663 From shade at openjdk.org Wed Apr 16 13:40:57 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 16 Apr 2025 13:40:57 GMT Subject: RFR: 8344251: C2: remove blackholes with dead control input [v3] In-Reply-To: References: Message-ID: On Wed, 16 Apr 2025 13:01:12 GMT, Marc Chevalier wrote: >> When a BlackholeNode's control input becomes dead, the node is not removed causing the crash >> >> assert(!in->is_CFG()) failed: CFG Node with no controlling input? >> >> In the case reported in the issue, after a round of peeling, a condition becomes constant, and the branch containing the blackhole becomes dead: >> >> >> >> >> I simply use `Node::remove_dead_region(PhaseGVN*, bool)` to remove the blackhole, as many other node types do. > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > cosmetic Still good. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24663#pullrequestreview-2772574984 From duke at openjdk.org Wed Apr 16 14:08:21 2025 From: duke at openjdk.org (Anjian-Wen) Date: Wed, 16 Apr 2025 14:08:21 GMT Subject: RFR: 8354815: RISC-V: Change type of bitwise rotation shift to iRegIorL2I Message-ID: There is no need to do a type conversion when the shift amount of bitwise rotation is an integer converted from long (ConvL2I). 
The reason is that these instructions perform a rotate right/left of the source by the amount in the least-significant 5/6 bits of the shift amount, depending on the width of the operation (32-bit/64-bit). For 32-bit operations, the resulting 32-bit value is sign-extended by copying bit 31 to all of the more-significant bits. This means that we could use the iRegIorL2I type for the source of these 32-bit operations as well.

Jtreg Testing in progress

-------------

Commit messages:
 - RISC-V: Optimize zbb rol/ror iRegI to iRegIorL2I

Changes: https://git.openjdk.org/jdk/pull/24618/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24618&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8354815
  Stats: 27 lines in 3 files changed: 14 ins; 2 del; 11 mod
  Patch: https://git.openjdk.org/jdk/pull/24618.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24618/head:pull/24618

PR: https://git.openjdk.org/jdk/pull/24618

From vpaprotski at openjdk.org Wed Apr 16 15:44:57 2025
From: vpaprotski at openjdk.org (Volodymyr Paprotski)
Date: Wed, 16 Apr 2025 15:44:57 GMT
Subject: RFR: 8354471: Assertion failure with -XX:-EnableX86ECoreOpts [v2]
In-Reply-To: <8lSj6K73YW9ioeKmMALCtag-RjPzkvB9n5ms5wOu3GQ=.ef6436bc-c93a-4279-9674-0c873b8d9d09@github.com>
References: <8lSj6K73YW9ioeKmMALCtag-RjPzkvB9n5ms5wOu3GQ=.ef6436bc-c93a-4279-9674-0c873b8d9d09@github.com>
Message-ID:

On Tue, 15 Apr 2025 15:48:02 GMT, Volodymyr Paprotski wrote:

>> The check to choose between AVX2 and AVX512 implementation was relying on `EnableX86ECoreOpts`. It should be relying on `supports_avxifma` and mirror the `UseIntPolyIntrinsics` check in `vm_version_x86.cpp`.
>> >> Note, in `stubGenerator_x86_64.cpp`, entry to the patched function is protected already: >> >> if (UseIntPolyIntrinsics) { >> StubRoutines::_intpoly_montgomeryMult_P256 = generate_intpoly_montgomeryMult_P256(); >> StubRoutines::_intpoly_assign = generate_intpoly_assign(); >> } > > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > flip the direction of if to prefer AVX512 If there are no objections, will integrate later today ------------- PR Comment: https://git.openjdk.org/jdk/pull/24644#issuecomment-2809988058 From jbhateja at openjdk.org Wed Apr 16 16:19:46 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 16 Apr 2025 16:19:46 GMT Subject: RFR: 8354471: Assertion failure with -XX:-EnableX86ECoreOpts [v2] In-Reply-To: <8lSj6K73YW9ioeKmMALCtag-RjPzkvB9n5ms5wOu3GQ=.ef6436bc-c93a-4279-9674-0c873b8d9d09@github.com> References: <8lSj6K73YW9ioeKmMALCtag-RjPzkvB9n5ms5wOu3GQ=.ef6436bc-c93a-4279-9674-0c873b8d9d09@github.com> Message-ID: <2t6kbtfQxPzj5NqaiC1JQ3Yk7P6-vZkIrwTjbBdksB4=.6e552990-e48e-4272-838f-6aa51dc79620@github.com> On Tue, 15 Apr 2025 15:48:02 GMT, Volodymyr Paprotski wrote: >> The check to choose between AVX2 and AVX512 implementation was relying on `EnableX86ECoreOpts`. It should be relying on `supports_avxifma` and mirror the `UseIntPolyIntrinsics` check in `vm_version_x86.cpp`. >> >> Note, in `stubGenerator_x86_64.cpp`, entry to the patched function is protected already: >> >> if (UseIntPolyIntrinsics) { >> StubRoutines::_intpoly_montgomeryMult_P256 = generate_intpoly_montgomeryMult_P256(); >> StubRoutines::_intpoly_assign = generate_intpoly_assign(); >> } > > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > flip the direction of if to prefer AVX512 Minor comment modification suggestion. Otherwise looks good. Marked as reviewed by jbhateja (Reviewer). 
src/hotspot/cpu/x86/stubGenerator_x86_64_poly_mont.cpp line 576: > 574: montgomeryMultiply(aLimbs, bLimbs, rLimbs, tmp, _masm); > 575: } else { > 576: assert(VM_Version::supports_avxifma(), "Require AVXIFMA support"); Suggestion: assert(VM_Version::supports_avxifma(), "Require AVX_IFMA support"); ------------- PR Review: https://git.openjdk.org/jdk/pull/24644#pullrequestreview-2773099125 PR Review: https://git.openjdk.org/jdk/pull/24644#pullrequestreview-2773107873 PR Review Comment: https://git.openjdk.org/jdk/pull/24644#discussion_r2047277287 From kxu at openjdk.org Wed Apr 16 16:20:47 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Wed, 16 Apr 2025 16:20:47 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v2] In-Reply-To: References: Message-ID: On Sat, 12 Apr 2025 04:07:42 GMT, Cesar Soares Lucas wrote: >> Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: >> >> reviewer suggested changes > > src/hotspot/share/opto/loopnode.hpp line 267: > >> 265: const TypeInteger* trunc_type = nullptr; >> 266: }; >> 267: static TruncatedIncrement match_incr_with_optional_truncation(Node* expr, BasicType bt); > > Just a drive-by comment: this function and some others that you created are returning a value by copy, for performance reasons it may be better to return a reference or even a pointer as usually is the case in HotSpot. I'm not sure if this is always the case but please correct me if I'm wrong: a cpp compiler already optimizes these kinds of patterns and structs are only allocated on the caller stack frame with a pointer to it passed into callee as an argument (or even leaving fields in registers if the struct is small enough). No copying is actually happening. I agree this *syntactically looks costly*, please let me know if you find it necessary to switch to passing in a struct reference as an argument. 
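To illustrate the point about returning a small struct by value, here is a minimal sketch. `TruncatedIncrementSketch` and `match_increment_sketch` are hypothetical stand-ins with made-up fields, not the definitions from the PR. Under C++17, returning a prvalue like this initializes the caller's object directly (guaranteed copy elision); earlier standards permit the elision but do not require it, so whether a copy is avoided there depends on the compiler.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a small result struct; the fields are
// illustrative only and do not mirror the HotSpot definition.
struct TruncatedIncrementSketch {
  bool    is_valid;
  int64_t stride;
  int     truncated_bits;
};

// Returns the result by value. Each return statement yields a prvalue,
// so no named temporary has to be copied into the caller's object.
TruncatedIncrementSketch match_increment_sketch(int64_t stride, int bits) {
  assert(bits >= 0 && bits <= 64);
  if (stride == 0) {
    return TruncatedIncrementSketch{false, 0, 0};  // "no match"
  }
  return TruncatedIncrementSketch{true, stride, bits};
}
```

A caller would write `TruncatedIncrementSketch r = match_increment_sketch(2, 16);` and check `r.is_valid`, with no out-parameter involved.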
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2047284718 From vpaprotski at openjdk.org Wed Apr 16 16:46:13 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Wed, 16 Apr 2025 16:46:13 GMT Subject: RFR: 8354471: Assertion failure with -XX:-EnableX86ECoreOpts [v3] In-Reply-To: References: Message-ID: <7dJi5Ocxadsis5Gvd5qrwM8ufBCIWS6LQ5RkV7FG7Ig=.3cac7c36-a47d-44fb-9d06-463e9333f8a5@github.com> > The check to choose between AVX2 and AVX512 implementation was relying on `EnableX86ECoreOpts`. It should be relying on `supports_avxifma` and mirror the `UseIntPolyIntrinsics` check in `vm_version_x86.cpp`. > > Note, in `stubGenerator_x86_64.cpp`, entry to the patched function is protected already: > > if (UseIntPolyIntrinsics) { > StubRoutines::_intpoly_montgomeryMult_P256 = generate_intpoly_montgomeryMult_P256(); > StubRoutines::_intpoly_assign = generate_intpoly_assign(); > } Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/cpu/x86/stubGenerator_x86_64_poly_mont.cpp Co-authored-by: Jatin Bhateja ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24644/files - new: https://git.openjdk.org/jdk/pull/24644/files/ee1b099f..60f00320 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24644&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24644&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24644.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24644/head:pull/24644 PR: https://git.openjdk.org/jdk/pull/24644 From sviswanathan at openjdk.org Wed Apr 16 17:52:52 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 16 Apr 2025 17:52:52 GMT Subject: RFR: 8354471: Assertion failure with -XX:-EnableX86ECoreOpts [v3] In-Reply-To: <7dJi5Ocxadsis5Gvd5qrwM8ufBCIWS6LQ5RkV7FG7Ig=.3cac7c36-a47d-44fb-9d06-463e9333f8a5@github.com> References: 
<7dJi5Ocxadsis5Gvd5qrwM8ufBCIWS6LQ5RkV7FG7Ig=.3cac7c36-a47d-44fb-9d06-463e9333f8a5@github.com> Message-ID: On Wed, 16 Apr 2025 16:46:13 GMT, Volodymyr Paprotski wrote: >> The check to choose between AVX2 and AVX512 implementation was relying on `EnableX86ECoreOpts`. It should be relying on `supports_avxifma` and mirror the `UseIntPolyIntrinsics` check in `vm_version_x86.cpp`. >> >> Note, in `stubGenerator_x86_64.cpp`, entry to the patched function is protected already: >> >> if (UseIntPolyIntrinsics) { >> StubRoutines::_intpoly_montgomeryMult_P256 = generate_intpoly_montgomeryMult_P256(); >> StubRoutines::_intpoly_assign = generate_intpoly_assign(); >> } > > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/cpu/x86/stubGenerator_x86_64_poly_mont.cpp > > Co-authored-by: Jatin Bhateja Marked as reviewed by sviswanathan (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24644#pullrequestreview-2773367498 From vlivanov at openjdk.org Wed Apr 16 18:28:52 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 16 Apr 2025 18:28:52 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v5] In-Reply-To: References: Message-ID: On Wed, 16 Apr 2025 02:11:09 GMT, Xiaohong Gong wrote: >> Vladimir Ivanov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 19 additional commits since the last revision: >> >> - Merge branch 'master' into vector.math.01.java >> - RVV and SVE adjustments >> - Merge branch 'master' into vector.math.01.java >> - Fix windows-aarch64 build failure >> - features_string -> cpu_info_string >> - Reviews and Float64Vector-related fix >> - Misc fixes and cleanups >> - CPU features support >> - Cleanup >> - TODO list >> - ... 
and 9 more: https://git.openjdk.org/jdk/compare/e269cc66...0ffed12f > > src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMathLibrary.java line 240: > >> 238: if (isAARCH64() && vspecies.vectorBitSize() > 128) { >> 239: return false; // FIXME: SVE support only for MAX shapes >> 240: } > > SVE also supports operations for partial vector size, which means the vector size is smaller than MAX size. For example, if the max vector size of a SVE architecture is 512-bit, the FP vector operations with `VectorShape.S_256_BIT` are also supported. They are implemented with the same scalable math functions in SLEEF. > > Hence, I think this check and the assertion in line-198 can be removed. Thanks! How does it work now? The code in `generate_vector_math_stubs()` in `stubGenerator_aarch64.cpp` only populates `VEC_SIZE_SCALABLE` shapes with SVE versions. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2047489059 From duke at openjdk.org Wed Apr 16 18:50:43 2025 From: duke at openjdk.org (duke) Date: Wed, 16 Apr 2025 18:50:43 GMT Subject: RFR: 8354471: Assertion failure with -XX:-EnableX86ECoreOpts [v3] In-Reply-To: <7dJi5Ocxadsis5Gvd5qrwM8ufBCIWS6LQ5RkV7FG7Ig=.3cac7c36-a47d-44fb-9d06-463e9333f8a5@github.com> References: <7dJi5Ocxadsis5Gvd5qrwM8ufBCIWS6LQ5RkV7FG7Ig=.3cac7c36-a47d-44fb-9d06-463e9333f8a5@github.com> Message-ID: <5LfGtVCltaTQtsrM4ke2G7eY4Nv_g26lDVJQ7lAt9TA=.f7105574-2b0f-45f9-816f-bba2cab753e4@github.com> On Wed, 16 Apr 2025 16:46:13 GMT, Volodymyr Paprotski wrote: >> The check to choose between AVX2 and AVX512 implementation was relying on `EnableX86ECoreOpts`. It should be relying on `supports_avxifma` and mirror the `UseIntPolyIntrinsics` check in `vm_version_x86.cpp`. 
>> >> Note, in `stubGenerator_x86_64.cpp`, entry to the patched function is protected already: >> >> if (UseIntPolyIntrinsics) { >> StubRoutines::_intpoly_montgomeryMult_P256 = generate_intpoly_montgomeryMult_P256(); >> StubRoutines::_intpoly_assign = generate_intpoly_assign(); >> } > > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/cpu/x86/stubGenerator_x86_64_poly_mont.cpp > > Co-authored-by: Jatin Bhateja @vpaprotsk Your change (at version 60f003208d4547640798aed991f6eb3fe49ad18e) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24644#issuecomment-2810435734 From vlivanov at openjdk.org Wed Apr 16 19:21:25 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 16 Apr 2025 19:21:25 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v6] In-Reply-To: References: Message-ID: > Migrate Vector API math library (SVML and SLEEF) linkage from native code (in JVM) to Java FFM API. > > Since FFM API doesn't support vector calling conventions yet, migration affects only symbol lookup for now. But it still enables significant simplifications on JVM side. > > The patch consists of the following parts: > * on-demand symbol lookup in Java code replaces eager lookup from native code during JVM startup; > * 2 new VM intrinsics for vector calls (support unary and binary shapes) (code separated from unary/binary vector operations); > * new internal interface to query supported CPU ISA extensions (`jdk.incubator.vector.CPUFeatures`) used for CPU dispatching. > > `java.lang.foreign` API is used to perform symbol lookup in vector math library, then the address is cached and fed into corresponding JVM intrinsic, so C2 can turn it into a direct vector call in generated code. > > Once `java.lang.foreign` supports vectors & vector calling conventions, VM intrinsics can go away. 
> > Performance is on par with original implementation (tested with microbenchmarks on linux-x64 and macosx-aarch64). > > Testing: hs-tier1 - hs-tier6, microbenchmarks (on linux-x64 and macosx-aarch64) > > Thanks! Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: Fix debugName handling ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24462/files - new: https://git.openjdk.org/jdk/pull/24462/files/0ffed12f..84d02cb3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24462&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24462&range=04-05 Stats: 45 lines in 4 files changed: 7 ins; 9 del; 29 mod Patch: https://git.openjdk.org/jdk/pull/24462.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24462/head:pull/24462 PR: https://git.openjdk.org/jdk/pull/24462 From vlivanov at openjdk.org Wed Apr 16 19:21:25 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 16 Apr 2025 19:21:25 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v5] In-Reply-To: References: Message-ID: <4q-SatFOWofWAIyxd58Q9O-mJ0C53Y43jKsXCVsdpiY=.a799347a-7399-4d15-9c51-01dc906b5912@github.com> On Wed, 16 Apr 2025 01:47:02 GMT, Xiaohong Gong wrote: >> src/hotspot/share/opto/vectorIntrinsics.cpp line 488: >> >>> 486: // V binaryOp(long address, Class vClass, Class elementType, int length, >>> 487: // V v1, V v2, >>> 488: // BinaryOperation defaultImpl) >> >> `debugName` parameter is missing > > It seems the function should be updated to `libraryBinaryOp` and `libraryUnaryOp`? Fixed. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2047597002 From vlivanov at openjdk.org Wed Apr 16 19:21:28 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 16 Apr 2025 19:21:28 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v5] In-Reply-To: References: Message-ID: On Wed, 16 Apr 2025 00:25:59 GMT, Paul Sandoz wrote: >> Vladimir Ivanov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 19 additional commits since the last revision: >> >> - Merge branch 'master' into vector.math.01.java >> - RVV and SVE adjustments >> - Merge branch 'master' into vector.math.01.java >> - Fix windows-aarch64 build failure >> - features_string -> cpu_info_string >> - Reviews and Float64Vector-related fix >> - Misc fixes and cleanups >> - CPU features support >> - Cleanup >> - TODO list >> - ... and 9 more: https://git.openjdk.org/jdk/compare/b0b76fc4...0ffed12f > > src/hotspot/share/opto/vectorIntrinsics.cpp line 555: > >> 553: >> 554: const char* debug_name = ""; >> 555: const TypeInstPtr* debug_name_oop = gvn().type(argument(8))->isa_instptr(); > > Should that be: > > const TypeInstPtr* debug_name_oop = gvn().type(argument(6 + arity))->isa_instptr(); > > ? > Placing the `debugName` parameter before the vector parameters makes it easier to reason about IMO. Good point. Initially, I intended to keep `debugName` optional, but I don't see why we can't require its presence as other constants. Fixed. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2047598327 From vlivanov at openjdk.org Wed Apr 16 19:29:07 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 16 Apr 2025 19:29:07 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v7] In-Reply-To: References: Message-ID: > Migrate Vector API math library (SVML and SLEEF) linkage from native code (in JVM) to Java FFM API. > > Since FFM API doesn't support vector calling conventions yet, migration affects only symbol lookup for now. But it still enables significant simplifications on JVM side. > > The patch consists of the following parts: > * on-demand symbol lookup in Java code replaces eager lookup from native code during JVM startup; > * 2 new VM intrinsics for vector calls (support unary and binary shapes) (code separated from unary/binary vector operations); > * new internal interface to query supported CPU ISA extensions (`jdk.incubator.vector.CPUFeatures`) used for CPU dispatching. > > `java.lang.foreign` API is used to perform symbol lookup in vector math library, then the address is cached and fed into corresponding JVM intrinsic, so C2 can turn it into a direct vector call in generated code. > > Once `java.lang.foreign` supports vectors & vector calling conventions, VM intrinsics can go away. > > Performance is on par with original implementation (tested with microbenchmarks on linux-x64 and macosx-aarch64). > > Testing: hs-tier1 - hs-tier6, microbenchmarks (on linux-x64 and macosx-aarch64) > > Thanks! Vladimir Ivanov has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 21 commits: - Merge branch 'master' into vector.math.01.java - Fix debugName handling - Merge branch 'master' into vector.math.01.java - RVV and SVE adjustments - Merge branch 'master' into vector.math.01.java - Fix windows-aarch64 build failure - features_string -> cpu_info_string - Reviews and Float64Vector-related fix - Misc fixes and cleanups - CPU features support - ... and 11 more: https://git.openjdk.org/jdk/compare/98dac46a...a288cbbf ------------- Changes: https://git.openjdk.org/jdk/pull/24462/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24462&range=06 Stats: 1319 lines in 50 files changed: 844 ins; 390 del; 85 mod Patch: https://git.openjdk.org/jdk/pull/24462.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24462/head:pull/24462 PR: https://git.openjdk.org/jdk/pull/24462 From vpaprotski at openjdk.org Wed Apr 16 20:01:05 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Wed, 16 Apr 2025 20:01:05 GMT Subject: Integrated: 8354471: Assertion failure with -XX:-EnableX86ECoreOpts In-Reply-To: References: Message-ID: On Tue, 15 Apr 2025 03:46:13 GMT, Volodymyr Paprotski wrote: > The check to choose between AVX2 and AVX512 implementation was relying on `EnableX86ECoreOpts`. It should be relying on `supports_avxifma` and mirror the `UseIntPolyIntrinsics` check in `vm_version_x86.cpp`. > > Note, in `stubGenerator_x86_64.cpp`, entry to the patched function is protected already: > > if (UseIntPolyIntrinsics) { > StubRoutines::_intpoly_montgomeryMult_P256 = generate_intpoly_montgomeryMult_P256(); > StubRoutines::_intpoly_assign = generate_intpoly_assign(); > } This pull request has now been integrated. 
Changeset: 0c34bf04 Author: Volodymyr Paprotski Committer: Sandhya Viswanathan URL: https://git.openjdk.org/jdk/commit/0c34bf047615ad57c91cd49844f9d34f9a8329a2 Stats: 18 lines in 1 file changed: 9 ins; 8 del; 1 mod 8354471: Assertion failure with -XX:-EnableX86ECoreOpts Reviewed-by: sviswanathan, jbhateja ------------- PR: https://git.openjdk.org/jdk/pull/24644 From kxu at openjdk.org Wed Apr 16 20:02:50 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Wed, 16 Apr 2025 20:02:50 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v2] In-Reply-To: <505xqF92lH1o1y-nxZ760fTJg0yeJEsxOV5jrbo9ArE=.776257e5-69fd-4c62-87b3-f3f88402c580@github.com> References: <5hNAkBEy0dU1EchyMu-E1YYBrtli0I5ZTgUhkQl7qgo=.145177e1-65ed-4f4b-99a3-cb9e0c03c757@github.com> <505xqF92lH1o1y-nxZ760fTJg0yeJEsxOV5jrbo9ArE=.776257e5-69fd-4c62-87b3-f3f88402c580@github.com> Message-ID: <_YBTE2JIo7CXW0_WY-i1zI-X19Dy77pKHTqxrQAX0Qo=.149fea18-e0c6-49be-9d7f-8ba6498c191b@github.com> On Fri, 11 Apr 2025 08:20:22 GMT, Christian Hagedorn wrote: >> What I had in mind was something like that (I'm also fine with `CountedLoopConverter`: >> >> CountedLoop counted_loop(...); >> counted_loop.convert(); >> return counted_loop.is_valid(); >> >> But maybe you can explain in more detail what the follow-up work will be and how you use this class again later? > > About the placement of this class. You could probably also move it out of the already huge `PhaseIdealLoop` class to make it a non-inner class, if there is not some strong coupling that you absolutely need. > [...] more detail what the follow-up work [...] I was referring to [JDK-8336759](https://bugs.openjdk.org/browse/JDK-8336759) and #22449. 
Essentially I'm trying to achieve:

    CountedLoopConverter c(...);
    if (!c.is_counted_loop() && iv_bt == T_INT && limit_t == T_LONG) {
      // transform `i < (long) limit` to `i < (int) (long) limit` if limit < INT_MAX
    }
    if (c.is_counted_loop()) {
      c.convert();
    }

I personally think it makes more sense to confirm a counted loop before attempting the conversion.

> [...] move it out of the already huge `PhaseIdealLoop` [...]

Yes, I agree! I found myself questioning this nested class, too.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2047658882

From kxu at openjdk.org Wed Apr 16 20:34:44 2025
From: kxu at openjdk.org (Kangcheng Xu)
Date: Wed, 16 Apr 2025 20:34:44 GMT
Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v2]
In-Reply-To:
References: <5hNAkBEy0dU1EchyMu-E1YYBrtli0I5ZTgUhkQl7qgo=.145177e1-65ed-4f4b-99a3-cb9e0c03c757@github.com>
Message-ID:

On Fri, 11 Apr 2025 08:15:28 GMT, Christian Hagedorn wrote:

>> That was my initial thought, too; however, `PhaseIdealLoop::insert_loop_limit_check_predicate()`, `::lazy_replace()`, `::set_subtree_ctrl()` and many other mutations make this impossible.
>
> You might be confusing it with having:
>
> const PhaseIdealLoop* _phase
>
> where we cannot call any non-const methods. That's indeed not possible. But what I mean was to make the pointer const:
>
> PhaseIdealLoop* const _phase;
>
> such that you cannot do `_phase = xyz` later. You would probably not do that anyway but it's an easy addition and safety for all fields not being reassigned again. It also helps to see which fields are going to be updated as part of the mutable state and which fields are not.

Sorry I neglected your diff. Yes, `const` on the pointers makes sense. I'm marking immutable fields `const`. Thanks for pointing this out!
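The distinction between a const pointer and a pointer to const discussed above can be sketched as follows. `PhaseSketch` and `ConverterSketch` are hypothetical stand-ins, not the HotSpot classes; the sketch only shows that a `T* const` member cannot be reseated while the object it points to stays mutable.

```cpp
#include <cassert>

// Hypothetical stand-in for a phase object that has mutating methods.
struct PhaseSketch {
  int mutations = 0;
  void mutate() { ++mutations; }            // non-const member function
  int  count() const { return mutations; }  // const member function
};

class ConverterSketch {
  // Const pointer: the pointer itself cannot be reassigned after
  // construction, but the pointee can still be mutated through it.
  // (`const PhaseSketch* _phase` would be the opposite: reseatable,
  // but calling mutate() through it would not compile.)
  PhaseSketch* const _phase;

 public:
  explicit ConverterSketch(PhaseSketch* phase) : _phase(phase) {
    assert(phase != nullptr);
  }

  void convert() { _phase->mutate(); }  // fine: the pointee is not const

  // void reseat(PhaseSketch* p) { _phase = p; }  // would not compile

  const PhaseSketch* phase() const { return _phase; }
};
```

Declaring the member this way documents at a glance that `_phase` is fixed for the lifetime of the converter, which is the readability benefit mentioned in the review.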
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2047701303 From psandoz at openjdk.org Wed Apr 16 21:02:58 2025 From: psandoz at openjdk.org (Paul Sandoz) Date: Wed, 16 Apr 2025 21:02:58 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v7] In-Reply-To: References: Message-ID: On Wed, 16 Apr 2025 19:29:07 GMT, Vladimir Ivanov wrote: >> Migrate Vector API math library (SVML and SLEEF) linkage from native code (in JVM) to Java FFM API. >> >> Since FFM API doesn't support vector calling conventions yet, migration affects only symbol lookup for now. But it still enables significant simplifications on JVM side. >> >> The patch consists of the following parts: >> * on-demand symbol lookup in Java code replaces eager lookup from native code during JVM startup; >> * 2 new VM intrinsics for vector calls (support unary and binary shapes) (code separated from unary/binary vector operations); >> * new internal interface to query supported CPU ISA extensions (`jdk.incubator.vector.CPUFeatures`) used for CPU dispatching. >> >> `java.lang.foreign` API is used to perform symbol lookup in vector math library, then the address is cached and fed into corresponding JVM intrinsic, so C2 can turn it into a direct vector call in generated code. >> >> Once `java.lang.foreign` supports vectors & vector calling conventions, VM intrinsics can go away. >> >> Performance is on par with original implementation (tested with microbenchmarks on linux-x64 and macosx-aarch64). >> >> Testing: hs-tier1 - hs-tier6, microbenchmarks (on linux-x64 and macosx-aarch64) >> >> Thanks! > > Vladimir Ivanov has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 21 commits: > > - Merge branch 'master' into vector.math.01.java > - Fix debugName handling > - Merge branch 'master' into vector.math.01.java > - RVV and SVE adjustments > - Merge branch 'master' into vector.math.01.java > - Fix windows-aarch64 build failure > - features_string -> cpu_info_string > - Reviews and Float64Vector-related fix > - Misc fixes and cleanups > - CPU features support > - ... and 11 more: https://git.openjdk.org/jdk/compare/98dac46a...a288cbbf Very nice work. I am not too familiar with the format of CPU features on various platforms. Is the regex a best guess effort or something known and/or used by HotSpot too? If the former perhaps we might get tripped up by the assert in `CPUFeatures.validateFeatures` potentially rendering the Vector API unusable. Would it be better instead to drop and debug log? ------------- Marked as reviewed by psandoz (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24462#pullrequestreview-2773847925 From dlong at openjdk.org Wed Apr 16 21:13:52 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 16 Apr 2025 21:13:52 GMT Subject: RFR: 8354668: Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding [v2] In-Reply-To: References: Message-ID: On Wed, 16 Apr 2025 07:52:09 GMT, Jatin Bhateja wrote: >> ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding. While most of the relocation records the patching offsets from the end of the instruction, SHL instruction, which is used for pointer coloring, computes the patching offset from the starting address of the instruction. 
>>
>> Thus, in case the destination register operand of the SHL instruction is an extended GPR register, we miss accounting for the additional REX2 prefix byte in the patch offset, thereby corrupting the encoding, since the runtime patches the primary opcode byte, resulting in an ILLEGAL instruction exception.
>>
>> This patch fixes reported failures by computing the relocation offset of SHL instruction from end of instruction, thereby making the patch offset agnostic to REX/REX2 prefix.
>>
>> Please review and share your feedback.
>>
>> Best Regards,
>> Jatin
>>
>> PS: Validation was performed using the latest Intel Software Development Emulator after modifying the static register allocation order in the x86_64.ad file to give preference to EGPRs.
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
> review comment resolutions

Yes, I am suggesting doing something like:

    __ relocate(__ pc() - 4, barrier_Relocation::spec(), ZBarrierRelocationFormatStoreGoodAfterOr);

which would be a bigger change to the implementation.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24664#issuecomment-2810802951

From vlivanov at openjdk.org Wed Apr 16 21:29:50 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Wed, 16 Apr 2025 21:29:50 GMT
Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v7]
In-Reply-To:
References:
Message-ID: <8pTnWDaK3Mu9rKPfB079ci42ZFHIQQwZNRMXExwJC_Y=.6199c6f2-505d-4b32-858a-5c7dbd273163@github.com>

On Wed, 16 Apr 2025 19:29:07 GMT, Vladimir Ivanov wrote:

>> Migrate Vector API math library (SVML and SLEEF) linkage from native code (in JVM) to Java FFM API.
>>
>> Since FFM API doesn't support vector calling conventions yet, migration affects only symbol lookup for now. But it still enables significant simplifications on JVM side.
>> >> The patch consists of the following parts: >> * on-demand symbol lookup in Java code replaces eager lookup from native code during JVM startup; >> * 2 new VM intrinsics for vector calls (support unary and binary shapes) (code separated from unary/binary vector operations); >> * new internal interface to query supported CPU ISA extensions (`jdk.incubator.vector.CPUFeatures`) used for CPU dispatching. >> >> `java.lang.foreign` API is used to perform symbol lookup in vector math library, then the address is cached and fed into corresponding JVM intrinsic, so C2 can turn it into a direct vector call in generated code. >> >> Once `java.lang.foreign` supports vectors & vector calling conventions, VM intrinsics can go away. >> >> Performance is on par with original implementation (tested with microbenchmarks on linux-x64 and macosx-aarch64). >> >> Testing: hs-tier1 - hs-tier6, microbenchmarks (on linux-x64 and macosx-aarch64) >> >> Thanks! > > Vladimir Ivanov has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 21 commits: > > - Merge branch 'master' into vector.math.01.java > - Fix debugName handling > - Merge branch 'master' into vector.math.01.java > - RVV and SVE adjustments > - Merge branch 'master' into vector.math.01.java > - Fix windows-aarch64 build failure > - features_string -> cpu_info_string > - Reviews and Float64Vector-related fix > - Misc fixes and cleanups > - CPU features support > - ... and 11 more: https://git.openjdk.org/jdk/compare/98dac46a...a288cbbf Both `CPUFeatures.validateFeatures` and the checks it performs are assertions, so it has to be explicitly enabled by `-esa`. The assert validates some assumptions the code has about the format of features string VM produces. And feature names are controlled by JVM (JVM assigns names itself). If there's a mismatch, something is really broken. Why do you think it makes sense to be defensive here and try to recover if the checks fail? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24462#issuecomment-2810842069 From psandoz at openjdk.org Wed Apr 16 21:46:46 2025 From: psandoz at openjdk.org (Paul Sandoz) Date: Wed, 16 Apr 2025 21:46:46 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v7] In-Reply-To: <8pTnWDaK3Mu9rKPfB079ci42ZFHIQQwZNRMXExwJC_Y=.6199c6f2-505d-4b32-858a-5c7dbd273163@github.com> References: <8pTnWDaK3Mu9rKPfB079ci42ZFHIQQwZNRMXExwJC_Y=.6199c6f2-505d-4b32-858a-5c7dbd273163@github.com> Message-ID: On Wed, 16 Apr 2025 21:27:32 GMT, Vladimir Ivanov wrote: > Both `CPUFeatures.validateFeatures` and the checks it performs are assertions, so it has to be explicitly enabled by `-esa`. The assert validates some assumptions the code has about the format of features string VM produces. And feature names are controlled by JVM (JVM assigns names itself). If there's a mismatch, something is really broken. Why do you think it makes sense to be defensive here and try to recover if the checks fail? I was uncertain how the names were produced and if some untested hardware with additional CPU features could trigger an assert. Since as you say they are controlled by the JVM then indeed things are very broken if there is a mismatch. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24462#issuecomment-2810882715 From sparasa at openjdk.org Wed Apr 16 22:55:01 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Wed, 16 Apr 2025 22:55:01 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v4] In-Reply-To: References: Message-ID: > The current scheme for Intel APX NDD code generation favors the emission of NDD instruction on APX-enabled targets, even if destination and source registers are the same. To prevent this, this PR extends the assembler layer to demote EEVEX to REX encoding if dst matches with source operands. 
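A rough sketch of the demotion decision described above, under a deliberately simplified model: a three-operand NDD form `dst = src1 op src2` can fall back to the two-operand legacy form when `dst` is the same register as `src1`, and registers 16..31 (the APX extended GPRs) need a REX2 prefix in that legacy form. The names `Encoding`, `is_egpr`, and `choose_encoding` are hypothetical and do not come from the HotSpot assembler; the real code has to consider further cases (memory operands, commutative operand swaps, no-flags variants) that this sketch leaves out.

```cpp
#include <cassert>

// Hypothetical encoding classes, for illustration only.
enum class Encoding {
  Legacy,       // plain or REX-prefixed two-operand form
  Rex2,         // two-operand form with a REX2 prefix
  ExtendedEvex  // three-operand NDD form
};

// Registers are numbered 0..31; 16..31 model the APX extended GPRs.
inline bool is_egpr(int reg) { return reg >= 16; }

// Chooses an encoding for `dst = src1 op src2` in this simplified model.
Encoding choose_encoding(int dst, int src1, int src2) {
  assert(dst >= 0 && dst < 32);
  assert(src1 >= 0 && src1 < 32);
  assert(src2 >= 0 && src2 < 32);
  if (dst != src1) {
    return Encoding::ExtendedEvex;  // a distinct destination needs NDD
  }
  // dst == src1: the destructive two-operand form computes the same result.
  return (is_egpr(dst) || is_egpr(src2)) ? Encoding::Rex2 : Encoding::Legacy;
}
```

In this model, `choose_encoding(3, 3, 5)` selects the legacy form, while `choose_encoding(3, 4, 5)` keeps the extended EVEX form because the destination differs from the first source.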
Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: refactor RRM instructions to avoid explicit demotion ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24431/files - new: https://git.openjdk.org/jdk/pull/24431/files/09d3c3a8..bc69f323 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=02-03 Stats: 132 lines in 2 files changed: 36 ins; 84 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/24431.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24431/head:pull/24431 PR: https://git.openjdk.org/jdk/pull/24431 From sparasa at openjdk.org Wed Apr 16 23:28:23 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Wed, 16 Apr 2025 23:28:23 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v5] In-Reply-To: References: Message-ID: > The current scheme for Intel APX NDD code generation favors the emission of NDD instruction on APX-enabled targets, even if destination and source registers are the same. To prevent this, this PR extends the assembler layer to demote EEVEX to REX encoding if dst matches with source operands. 
Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: RRM refactoring works for exorb and exorw ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24431/files - new: https://git.openjdk.org/jdk/pull/24431/files/bc69f323..1e666ba2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=03-04 Stats: 24 lines in 2 files changed: 3 ins; 16 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/24431.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24431/head:pull/24431 PR: https://git.openjdk.org/jdk/pull/24431 From fyang at openjdk.org Thu Apr 17 00:44:40 2025 From: fyang at openjdk.org (Fei Yang) Date: Thu, 17 Apr 2025 00:44:40 GMT Subject: RFR: 8354815: RISC-V: Change type of bitwise rotation shift to iRegIorL2I In-Reply-To: References: Message-ID: On Mon, 14 Apr 2025 09:46:34 GMT, Anjian-Wen wrote: > There is no need to do a type conversion when the shift amount of bitwise rotation is an integer converted from long (ConvL2I). > The reason is that these instructions perform a rotate right/left of the source by the amount in the least-significant 5/6 bits > of the shift amount depending on the width of the operation (32-bit/64-bit). For 32-bit operations, the resulting 32-bit > value is sign-extended by copying bit 31 to all of the more-significant bits. This means that we could use iRegIorL2I type for > source for these 32-bit operations as well. > > Jtreg Testing in progress Looks good. Thanks. ------------- Marked as reviewed by fyang (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24618#pullrequestreview-2774225680 From sparasa at openjdk.org Thu Apr 17 00:53:27 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Thu, 17 Apr 2025 00:53:27 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v6] In-Reply-To: References: Message-ID: <5_GkMf0czJzlDdCikEQdTgQZ_e1m9jcfFedrh-_dYOc=.51a34f5c-86c4-4c36-8bf6-a3a0e9177390@github.com> > The current scheme for Intel APX NDD code generation favors the emission of NDD instruction on APX-enabled targets, even if destination and source registers are the same. To prevent this, this PR extends the assembler layer to demote EEVEX to REX encoding if dst matches with source operands. Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: RRM refactorted to use a unified evex_ndd_and_int8 function ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24431/files - new: https://git.openjdk.org/jdk/pull/24431/files/1e666ba2..3b5d9856 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=04-05 Stats: 67 lines in 3 files changed: 8 ins; 34 del; 25 mod Patch: https://git.openjdk.org/jdk/pull/24431.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24431/head:pull/24431 PR: https://git.openjdk.org/jdk/pull/24431 From vlivanov at openjdk.org Thu Apr 17 01:07:39 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 17 Apr 2025 01:07:39 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v8] In-Reply-To: References: Message-ID: > Migrate Vector API math library (SVML and SLEEF) linkage from native code (in JVM) to Java FFM API. > > Since FFM API doesn't support vector calling conventions yet, migration affects only symbol lookup for now. But it still enables significant simplifications on JVM side. 
> > The patch consists of the following parts: > * on-demand symbol lookup in Java code replaces eager lookup from native code during JVM startup; > * 2 new VM intrinsics for vector calls (support unary and binary shapes) (code separated from unary/binary vector operations); > * new internal interface to query supported CPU ISA extensions (`jdk.incubator.vector.CPUFeatures`) used for CPU dispatching. > > `java.lang.foreign` API is used to perform symbol lookup in vector math library, then the address is cached and fed into corresponding JVM intrinsic, so C2 can turn it into a direct vector call in generated code. > > Once `java.lang.foreign` supports vectors & vector calling conventions, VM intrinsics can go away. > > Performance is on par with original implementation (tested with microbenchmarks on linux-x64 and macosx-aarch64). > > Testing: hs-tier1 - hs-tier6, microbenchmarks (on linux-x64 and macosx-aarch64) > > Thanks! Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: fix broken merge ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24462/files - new: https://git.openjdk.org/jdk/pull/24462/files/a288cbbf..1ade1ffd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24462&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24462&range=06-07 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24462.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24462/head:pull/24462 PR: https://git.openjdk.org/jdk/pull/24462 From xgong at openjdk.org Thu Apr 17 01:39:41 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 17 Apr 2025 01:39:41 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v5] In-Reply-To: References: Message-ID: On Wed, 16 Apr 2025 18:26:18 GMT, Vladimir Ivanov wrote: >> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMathLibrary.java line 240: >> >>> 238: if 
(isAARCH64() && vspecies.vectorBitSize() > 128) { >>> 239: return false; // FIXME: SVE support only for MAX shapes >>> 240: } >> >> SVE also supports operations for partial vector size, which means the vector size is smaller than MAX size. For example, if the max vector size of a SVE architecture is 512-bit, the FP vector operations with `VectorShape.S_256_BIT` are also supported. They are implemented with the same scalable math functions in SLEEF. >> >> Hence, I think this check and the assertion in line-198 can be removed. Thanks! > > How does it work now? The code in `generate_vector_math_stubs()` in `stubGenerator_aarch64.cpp` only populates `VEC_SIZE_SCALABLE` shapes with SVE versions. Please see the `addr` definition code in https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/vectorIntrinsics.cpp#L1877 . If queried `addr` returns `nullptr` for 256-bit vectors, and the arch supports scalable vector, then the `addr` will be assigned to the scalable ones. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2048052819 From xgong at openjdk.org Thu Apr 17 01:44:45 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 17 Apr 2025 01:44:45 GMT Subject: RFR: 8351623: VectorAPI: Refactor subword gather load and add SVE implementation In-Reply-To: References: Message-ID: On Wed, 16 Apr 2025 08:58:34 GMT, Xiaohong Gong wrote: > ### Summary: > [JDK-8318650](http://java-service.client.nvidia.com/?q=8318650) added the hotspot intrinsifying of subword gather load APIs for X86 platforms [1]. This patch aims at implementing the equivalent functionality for AArch64 SVE platform. In addition to the AArch64 backend support, this patch also refactors the API implementation in Java side and the compiler mid-end part to make the operations more efficient and maintainable across different architectures. 
>
> ### Background:
> Vector gather load APIs load values from memory addresses calculated by adding a base pointer to integer indices stored in an int array. SVE provides native vector gather load instructions for byte/short types using an int vector holding the indices (see [2][3]).
>
> The number of loaded elements must match the index vector's element count. Since int elements are 4/2 times larger than byte/short elements, and given `MaxVectorSize` constraints, the operation may need to be split into multiple parts.
>
> Using a 128-bit byte vector gather load as an example, there are four scenarios with different `MaxVectorSize`:
>
> 1. `MaxVectorSize = 16, byte_vector_size = 16`:
>    - Can load 4 indices per vector register
>    - So can finish 4 bytes per gather-load operation
>    - Requires 4 gather-loads and a final merge
>    Example:
>    ```
>    byte[] arr = [a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, ...]
>    int[] idx = [3, 2, 4, 1, 5, 7, 5, 2, 0, 6, 7, 1, 15, 10, 11, 9]
>
>    4 gather-load:
>    idx_v1 = [1 4 2 3]      gather_v1 = [0000 0000 0000 becd]
>    idx_v2 = [2 5 7 5]      gather_v2 = [0000 0000 0000 cfhf]
>    idx_v3 = [1 7 6 0]      gather_v3 = [0000 0000 0000 bhga]
>    idx_v4 = [9 11 10 15]   gather_v4 = [0000 0000 0000 jlkp]
>    merge: v = [jlkp bhga cfhf becd]
>    ```
>
> 2. `MaxVectorSize = 32, byte_vector_size = MaxVectorSize / 2`:
>    - Can load 8 indices per vector register
>    - So can finish 8 bytes per gather-load operation
>    - Requires 2 gather-loads and a merge
>    Example:
>    ```
>    byte[] arr = [a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, ...]
>    int[] index = [3, 2, 4, 1, 5, 7, 5, 2, 0, 6, 7, 1, 15, 10, 11, 9]
>
>    2 gather-load:
>    idx_v1 = [2 5 7 5 1 4 2 3]
>    idx_v2 = [9 11 10 15 1 7 6 0]
>    gather_v1 = [0000 0000 0000 0000 0000 0000 cfhf becd]
>    gather_v2 = [0000 0000 0000 0000 0000 0000 jlkp bhga]
>    merge: v = [0000 0000 0000 0000 jlkp bhga cfhf becd]
>    ```
>
> 3. 
`MaxVectorSize = 64, byte_vector_size = MaxVectorSize / 4`: > - Can load 16 indices per vector register > - So can ... Hi @jatin-bhateja , could you please help take a look at this PR especially the X86 part? Thanks a lot! Hi @RealFYang , could you please help review the RVV part? Thanks a lot! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24679#issuecomment-2811506961 From duke at openjdk.org Thu Apr 17 02:08:53 2025 From: duke at openjdk.org (duke) Date: Thu, 17 Apr 2025 02:08:53 GMT Subject: RFR: 8329887: RISC-V: C2: Support Zvbb Vector And-Not instruction [v9] In-Reply-To: <7u0hDh5_0JFOTDO5LxwUZdpebykF-sR6DQ4rcMpCrV0=.d264a91e-4551-42b9-a74a-0ab65cbd2003@github.com> References: <1KHNbMIgOO7jSZ1Fm4HzxadYaNzE4Xbq4nTitlKy3Po=.17d7860b-10de-4f19-87d8-87fc17313ce2@github.com> <7u0hDh5_0JFOTDO5LxwUZdpebykF-sR6DQ4rcMpCrV0=.d264a91e-4551-42b9-a74a-0ab65cbd2003@github.com> Message-ID: On Mon, 14 Apr 2025 12:57:00 GMT, Anjian-Wen wrote: >> support Zvbb Vector And-Not vandn.vv (with and without masked) match rule and add new test in jtreg >> >> This patch add new test in >> test/hotspot/jtreg/compiler/vectorapi/AllBitsSetVectorMatchRuleTest.java >> >> Test Passed >> test/hotspot/jtreg/compiler/vectorapi/* >> in platform: >> aarch64 with sve >> aarch64 without sve >> riscv64 qemu with zvbb > > Anjian-Wen has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: > > - Merge branch 'openjdk:master' into JDK-8329887 > - modify code and test format > - fix test bug > - add aarch64 judgement for the test > - fix zvbb mask match rule > - Merge branch 'openjdk:master' into JDK-8329887 > - add vand_not_masked test > - add vand_not_L test > - Merge branch 'openjdk:master' into JDK-8329887 > - RISC-V: C2: Support Zvbb Vector And-Not instruction > > fix match rule for format > - ... 
and 1 more: https://git.openjdk.org/jdk/compare/9215bce4...66c886c6 @Anjian-Wen Your change (at version 66c886c629a83feb7bf0edcc233f0a1bb630f498) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24129#issuecomment-2811537173 From duke at openjdk.org Thu Apr 17 02:18:49 2025 From: duke at openjdk.org (Anjian-Wen) Date: Thu, 17 Apr 2025 02:18:49 GMT Subject: Integrated: 8329887: RISC-V: C2: Support Zvbb Vector And-Not instruction In-Reply-To: <1KHNbMIgOO7jSZ1Fm4HzxadYaNzE4Xbq4nTitlKy3Po=.17d7860b-10de-4f19-87d8-87fc17313ce2@github.com> References: <1KHNbMIgOO7jSZ1Fm4HzxadYaNzE4Xbq4nTitlKy3Po=.17d7860b-10de-4f19-87d8-87fc17313ce2@github.com> Message-ID: On Thu, 20 Mar 2025 12:29:40 GMT, Anjian-Wen wrote: > support Zvbb Vector And-Not vandn.vv (with and without masked) match rule and add new test in jtreg > > This patch add new test in > test/hotspot/jtreg/compiler/vectorapi/AllBitsSetVectorMatchRuleTest.java > > Test Passed > test/hotspot/jtreg/compiler/vectorapi/* > in platform: > aarch64 with sve > aarch64 without sve > riscv64 qemu with zvbb This pull request has now been integrated. 
Changeset: 07aad68c Author: Anjian-Wen Committer: Fei Yang URL: https://git.openjdk.org/jdk/commit/07aad68c17ba8d95aee914f3bd9705301477acf6 Stats: 139 lines in 3 files changed: 136 ins; 0 del; 3 mod 8329887: RISC-V: C2: Support Zvbb Vector And-Not instruction Reviewed-by: fyang, fjiang ------------- PR: https://git.openjdk.org/jdk/pull/24129 From jbhateja at openjdk.org Thu Apr 17 02:22:42 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 17 Apr 2025 02:22:42 GMT Subject: RFR: 8354668: Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding [v2] In-Reply-To: References: Message-ID: <1iR9_nrbk0iFlgy28u4dO4-7OWjEkO__AoZ9zHqtm8I=.ae8b0a68-0f85-472d-a810-e9c8417097d9@github.com> On Wed, 16 Apr 2025 21:10:38 GMT, Dean Long wrote: > Yes, I am suggesting doing something like: > > ``` > __ relocate(__ pc() - 4, barrier_Relocation::spec(), ZBarrierRelocationFormatStoreGoodAfterOr); > ``` > > which would be a bigger change to the implementation. Yes, this is what I mean by address caching in my above comment. we already have an existing interface for it in place; the intent of this bug fix PR is not to improve upon the infrastructure but to align the fix with the current scheme. Do you suggest doing that in a follow up PR ? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24664#issuecomment-2811561649 From jbhateja at openjdk.org Thu Apr 17 03:21:08 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 17 Apr 2025 03:21:08 GMT Subject: RFR: 8354668: Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding [v3] In-Reply-To: References: Message-ID: > ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding. 
While most of the relocation records the patching offsets from the end of the instruction, SHL instruction, which is used for pointer coloring, computes the patching offset from the starting address of the instruction. > > Thus, in case the destination register operand of SHL instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte resulting into ILLEGAL instruction exception. > > This patch fixes reported failures by computing the relocation offset of SHL instruction from end of instruction, thereby making the patch offset agnostic to REX/REX2 prefix. > > Please review and share your feedback. > > Best Regards, > Jatin > > PS: Validation were performed using latest Intel Software Development Emulator after modifying static register allocation order in x86_64.ad file giving preference to EGPRs. Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolutions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24664/files - new: https://git.openjdk.org/jdk/pull/24664/files/ffd92c37..dc2b2b16 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24664&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24664&range=01-02 Stats: 10 lines in 4 files changed: 0 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/24664.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24664/head:pull/24664 PR: https://git.openjdk.org/jdk/pull/24664 From qamai at openjdk.org Thu Apr 17 06:18:46 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 17 Apr 2025 06:18:46 GMT Subject: RFR: 8327963: [Umbrella] Incorrect result of C2 compiled code since JDK-8237581 In-Reply-To: References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> Message-ID: On Mon, 14 Apr 2025 15:35:17 GMT, Roland Westrelin wrote: >> It 
would be great if we have union memory slices for this. > >> It would be great if we have union memory slices for this. > > Something like that would fix it but it would be trickier to get right than this point fix, I think. Do you see any other use for it? @rwestrel There are other places in C2 when the memory slices are not expressive enough, which leads us to wrapping those accesses in `CPUMemBar` because otherwise the memory edges would be incorrect. This would become more prominent in Valhalla as atomic flat accesses try to write multiple fields at once. https://github.com/openjdk/valhalla/blob/6b476fa7d67bf80591ed86d291b167d3d5ee5c28/src/hotspot/share/opto/inlinetypenode.cpp#L855 ------------- PR Comment: https://git.openjdk.org/jdk/pull/24570#issuecomment-2811880177 From duke at openjdk.org Thu Apr 17 06:23:33 2025 From: duke at openjdk.org (Manuel Hässig) Date: Thu, 17 Apr 2025 06:23:33 GMT Subject: RFR: 8346552: C2: Add IR tests to check that Predicate cloning in Loop Unswitching works as expected [v4] In-Reply-To: References: Message-ID: <3AsN87ob_1V_i0mIHo96_3IZ2p5QY6x_B2P_Ydsl_7A=.13a3ac28-bf3a-4941-be1c-c010dc48a251@github.com> > # Issue Summary > > When a loop is unswitched, all parse predicates from the original loop must be cloned to the second loop that is created. Forgetting to clone a parse predicate is a common error during development on loop unswitching code that we could not catch previously. Since we have the IR-framework now, this PR introduces a test to catch this error. > > # Changes > > The main contribution of this PR is a test to ensure that all predicates have been cloned into an unswitched loop. 
This also required some related changes: > - add `OPAQUE_TEMPLATE_ASSERTION_PREDICATE_NODE` to the IR-framework, > - add some missing parse predicate nodes to the IR-framework, > - change the output of the labels of parse predicate nodes in the ideal graph so they can be recognized reliably by the IR-framework (the main problem was that `Loop ` is a prefix of `Loop Limit Check` that is hard to distinguish with spaces instead of underlines), > - rework the regex for detecting parse predicates in the IR-framework, > - add a test to ensure parse predicates are cloned into unswitched loops. > > # Testing > > - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14266369099) > - tier1 through tier3 plus Oracle internal testing Manuel Hässig has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 13 commits: - Make test random - Add test for predicate cloning on uncounted loops - Add test case for predication before unswitching - Test predicate cloning only before loop predication Thus, we do not see the predicates in the loop selector that cloning actually killed. - Clone loop limit predicates for uncounted loops When unswitching uncounted loops we have to clone loop limit checks because we do not have information on the behavior of the loop index - Do not clone loop limit checks in loop unswitching - Add suggested comment Co-authored-by: Christian Hagedorn - Remove -Xcomp and replace with Warmup(0) - ir-framework: use new before/after loop opts phases - Add IR test for predicate cloning - ... 
and 3 more: https://git.openjdk.org/jdk/compare/465c8e65...17b32db4 ------------- Changes: https://git.openjdk.org/jdk/pull/24479/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24479&range=03 Stats: 223 lines in 5 files changed: 215 ins; 1 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/24479.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24479/head:pull/24479 PR: https://git.openjdk.org/jdk/pull/24479 From duke at openjdk.org Thu Apr 17 06:23:34 2025 From: duke at openjdk.org (Manuel Hässig) Date: Thu, 17 Apr 2025 06:23:34 GMT Subject: RFR: 8346552: C2: Add IR tests to check that Predicate cloning in Loop Unswitching works as expected [v3] In-Reply-To: References: Message-ID: On Thu, 10 Apr 2025 14:25:21 GMT, Manuel Hässig wrote: >> # Issue Summary >> >> When a loop is unswitched, all parse predicates from the original loop must be cloned to the second loop that is created. Forgetting to clone a parse predicate is a common error during development on loop unswitching code that we could not catch previously. Since we have the IR-framework now, this PR introduces a test to catch this error. >> >> # Changes >> >> The main contribution of this PR is a test to ensure that all predicates have been cloned into an unswitched loop. This also required some related changes: >> - add `OPAQUE_TEMPLATE_ASSERTION_PREDICATE_NODE` to the IR-framework, >> - add some missing parse predicate nodes to the IR-framework, >> - change the output of the labels of parse predicate nodes in the ideal graph so they can be recognized reliably by the IR-framework (the main problem was that `Loop ` is a prefix of `Loop Limit Check` that is hard to distinguish with spaces instead of underlines), >> - rework the regex for detecting parse predicates in the IR-framework, >> - add a test to ensure parse predicates are cloned into unswitched loops. 
>> >> # Testing >> >> - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14266369099) >> - tier1 through tier3 plus Oracle internal testing > > Manuel Hässig has refreshed the contents of this pull request, and previous commits have been removed. Incremental views are not available. Changes since v3: - Changed issue and PR title - Only clone Loop Limit Checks if an uncounted loop is unswitched - Add IR test for unswitching an uncounted loop - Add IR test where a counted loop is predicated before it is unswitched - Randomize the tests - Updated branch to master I also reran testing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24479#issuecomment-2810194364 From chagedorn at openjdk.org Thu Apr 17 06:23:35 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 17 Apr 2025 06:23:35 GMT Subject: RFR: 8346552: C2: Add IR tests to check that Predicate cloning in Loop Unswitching works as expected [v2] In-Reply-To: <2Ruy8sL0O4Pw73q06ciylmnZt510nGejeD2zK2Kn66k=.dda8b16b-6622-4c7a-8cdf-001763e96b6a@github.com> References: <2Ruy8sL0O4Pw73q06ciylmnZt510nGejeD2zK2Kn66k=.dda8b16b-6622-4c7a-8cdf-001763e96b6a@github.com> Message-ID: On Thu, 10 Apr 2025 14:54:43 GMT, Manuel Hässig wrote: >> test/hotspot/jtreg/compiler/loopopts/TestUnswitchPredicateCloning.java line 64: >> >>> 62: @Test >>> 63: // Check that loop unswitching the number of parse predicates inside the unswitched >>> 64: // loop have doubled. >> >> Suggestion: >> >> // Check that Loop Unswitching doubled the number of Parse Predicates: We have them at the true- and false-path-loop. Note that the Loop Limit Check Parse Predicate is not cloned when we already have a counted loop. >> >> >> While writing this suggestion, I think it would also be good to have the following tests: >> - A test where we unswitch a `LoopNode` (i.e. non-counted) to check that the Loop Limit Check Parse Predicate is also cloned to both unswitched loop versions. 
>> - Since you already added some verification for Assertion Predicates as well, I suggest to go a step further. We could add a test where we first apply Loop Predication and then Loop Unswitching to check if the number of Template Assertion Predicates also doubled. Here we need to be careful that we don't miscount. We only mark the old Template Assertion Predicates useless in Loop Unswitching and then clean them up in IGVN. So, we would need to check before loop opts phase `n`: `x` Template Assertion Predicates and then before loop opts phase `n + 1`: `2*x` Template Assertion Predicates. >> >> You could then update the issue/PR title to: >> C2: Add IR tests to check that predicate cloning in Loop Unswitching works as expected. > > I committed your suggestion in 0a4d89ff57c. > > However, in this test the unswitched is a counted loop and yet the loop limit checks are cloned. I think this test has already found a bug? Good catch! That's indeed a bug and must have been broken during my Assertion Predicates refactorings. Writing the test already paid off :-) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24479#discussion_r2037660275 From thartmann at openjdk.org Thu Apr 17 07:07:56 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 17 Apr 2025 07:07:56 GMT Subject: RFR: 8344251: C2: remove blackholes with dead control input [v3] In-Reply-To: References: Message-ID: <9H3_GDueq6C_pyHZlK6lu-OZXpkSCJCZ7bOd8Ls7hP0=.f6ad13cf-d9df-4edd-a5a3-4e3681ceded8@github.com> On Wed, 16 Apr 2025 13:01:12 GMT, Marc Chevalier wrote: >> When a BlackholeNode's control input becomes dead, the node is not removed causing the crash >> >> assert(!in->is_CFG()) failed: CFG Node with no controlling input? 
>> >> In the case reported in the issue, after a round of peeling, a condition becomes constant, and the branch containing the blackhole becomes dead: >> >> >> >> >> I simply use `Node::remove_dead_region(PhaseGVN*, bool)` to remove the blackhole, as many other node types do. > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > cosmetic Marked as reviewed by thartmann (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24663#pullrequestreview-2774791229 From duke at openjdk.org Thu Apr 17 07:12:53 2025 From: duke at openjdk.org (duke) Date: Thu, 17 Apr 2025 07:12:53 GMT Subject: RFR: 8344251: C2: remove blackholes with dead control input [v3] In-Reply-To: References: Message-ID: On Wed, 16 Apr 2025 13:01:12 GMT, Marc Chevalier wrote: >> When a BlackholeNode's control input becomes dead, the node is not removed causing the crash >> >> assert(!in->is_CFG()) failed: CFG Node with no controlling input? >> >> In the case reported in the issue, after a round of peeling, a condition becomes constant, and the branch containing the blackhole becomes dead: >> >> >> >> >> I simply use `Node::remove_dead_region(PhaseGVN*, bool)` to remove the blackhole, as many other node types do. > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > cosmetic @marc-chevalier Your change (at version 9f0009901f7b021f72d9f45676ef1df4a7d09ab3) is now ready to be sponsored by a Committer. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24663#issuecomment-2812006356 From mchevalier at openjdk.org Thu Apr 17 07:14:42 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 17 Apr 2025 07:14:42 GMT Subject: RFR: 8354625: Compile::igv_print_graph_to_network doesn't use its second parameter In-Reply-To: References: Message-ID: On Wed, 16 Apr 2025 08:00:59 GMT, Marc Chevalier wrote: > Remove the second (unused) parameter of `Compile::igv_print_graph_to_network`. It also makes it more consistent with `Compile::igv_print_method_to_file` that always print from root. Thanks for the swift reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24675#issuecomment-2812006162 From duke at openjdk.org Thu Apr 17 07:14:42 2025 From: duke at openjdk.org (duke) Date: Thu, 17 Apr 2025 07:14:42 GMT Subject: RFR: 8354625: Compile::igv_print_graph_to_network doesn't use its second parameter In-Reply-To: References: Message-ID: On Wed, 16 Apr 2025 08:00:59 GMT, Marc Chevalier wrote: > Remove the second (unused) parameter of `Compile::igv_print_graph_to_network`. It also makes it more consistent with `Compile::igv_print_method_to_file` that always print from root. @marc-chevalier Your change (at version fae036b7cc19fc2d0b31fb0d74c596970789f606) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24675#issuecomment-2812008586 From mchevalier at openjdk.org Thu Apr 17 07:24:52 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 17 Apr 2025 07:24:52 GMT Subject: Integrated: 8354625: Compile::igv_print_graph_to_network doesn't use its second parameter In-Reply-To: References: Message-ID: On Wed, 16 Apr 2025 08:00:59 GMT, Marc Chevalier wrote: > Remove the second (unused) parameter of `Compile::igv_print_graph_to_network`. It also makes it more consistent with `Compile::igv_print_method_to_file` that always print from root. This pull request has now been integrated. 
Changeset: fabf67c3 Author: Marc Chevalier Committer: Roberto Castañeda Lozano URL: https://git.openjdk.org/jdk/commit/fabf67c376708a3be80d2a4e67d30d226d6e6af8 Stats: 4 lines in 3 files changed: 0 ins; 0 del; 4 mod 8354625: Compile::igv_print_graph_to_network doesn't use its second parameter Reviewed-by: rcastanedalo, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/24675 From mchevalier at openjdk.org Thu Apr 17 07:26:54 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 17 Apr 2025 07:26:54 GMT Subject: Integrated: 8344251: C2: remove blackholes with dead control input In-Reply-To: References: Message-ID: <4_4YEr9nQMHS5GRXQiBXLQzf7Y0vlQufehgtbBnkqTQ=.04a6a6eb-d277-42ae-a7cc-d618e739acca@github.com> On Tue, 15 Apr 2025 13:39:52 GMT, Marc Chevalier wrote: > When a BlackholeNode's control input becomes dead, the node is not removed causing the crash > > assert(!in->is_CFG()) failed: CFG Node with no controlling input? > > In the case reported in the issue, after a round of peeling, a condition becomes constant, and the branch containing the blackhole becomes dead: > > > > > I simply use `Node::remove_dead_region(PhaseGVN*, bool)` to remove the blackhole, as many other node types do. This pull request has now been integrated. 
Changeset: 1138a186 Author: Marc Chevalier Committer: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/1138a186eb670e2c0662bda69c35680b41f4d66c Stats: 87 lines in 4 files changed: 87 ins; 0 del; 0 mod 8344251: C2: remove blackholes with dead control input Reviewed-by: shade, thartmann, kvn ------------- PR: https://git.openjdk.org/jdk/pull/24663 From epeter at openjdk.org Thu Apr 17 09:03:55 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 17 Apr 2025 09:03:55 GMT Subject: RFR: 8354477: C2 SuperWord: make use of memory edges more explicit In-Reply-To: <-srSeiLD0cMsEZMAjTccIavIlAu-RZ4emjJCg44-Zpk=.67af00d5-cf52-486b-9e14-ce2f8971c371@github.com> References: <-srSeiLD0cMsEZMAjTccIavIlAu-RZ4emjJCg44-Zpk=.67af00d5-cf52-486b-9e14-ce2f8971c371@github.com> Message-ID: On Sun, 13 Apr 2025 13:40:13 GMT, Emanuel Peter wrote: > This is a small refactoring, to make it more explicit that the "additional dependencies" are memory dependencies. > > This is also preparatory work for [JDK-8324751](https://bugs.openjdk.org/browse/JDK-8324751), where we have two memory edge categories: strong and weak. > > I was able to pass around `vtn_dependencies` in fewer occasions, and also renamed it to `vtn_memory_dependencies`. src/hotspot/share/opto/superwordVTransformBuilder.cpp line 102: > 100: for (uint k = 0; k < pack->size(); k++) { > 101: add_dependencies_of_node_to_vtnode(pack->at(k), vtn, vtn_dependencies); > 102: } Now only applies for load / store. src/hotspot/share/opto/superwordVTransformBuilder.cpp line 129: > 127: } > 128: > 129: add_dependencies_of_node_to_vtnode(n, vtn, vtn_dependencies); Now only applies for load / store. src/hotspot/share/opto/superwordVTransformBuilder.cpp line 181: > 179: VTransformNode* req = get_vtnode_or_wrap_as_input_scalar(n->in(index)); > 180: vtn->set_req(index, req); > 181: vtn_dependencies.set(req->_idx); Not useful any more. We used to track all dependencies this way, but we really only need to track the memory dependencies. 
src/hotspot/share/opto/superwordVTransformBuilder.cpp line 308: > 306: > 307: // Only add memory dependencies to memory nodes. All others are taken care of with the req. > 308: if (n->is_Mem() && !pred->is_Mem()) { continue; } Was exclusion for non-memory edges. src/hotspot/share/opto/superwordVTransformBuilder.cpp line 313: > 311: > 312: // Reduction self-cycle? > 313: if (vtn == dependency && _vloop_analyzer.reductions().is_marked_reduction(n)) { continue; } Was exclusion for non-memory edges. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24613#discussion_r2041131560 PR Review Comment: https://git.openjdk.org/jdk/pull/24613#discussion_r2041131610 PR Review Comment: https://git.openjdk.org/jdk/pull/24613#discussion_r2041131746 PR Review Comment: https://git.openjdk.org/jdk/pull/24613#discussion_r2041131820 PR Review Comment: https://git.openjdk.org/jdk/pull/24613#discussion_r2041131857 From epeter at openjdk.org Thu Apr 17 09:03:55 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 17 Apr 2025 09:03:55 GMT Subject: RFR: 8354477: C2 SuperWord: make use of memory edges more explicit Message-ID: <-srSeiLD0cMsEZMAjTccIavIlAu-RZ4emjJCg44-Zpk=.67af00d5-cf52-486b-9e14-ce2f8971c371@github.com> This is a small refactoring, to make it more explicit that the "additional dependencies" are memory dependencies. This is also preparatory work for [JDK-8324751](https://bugs.openjdk.org/browse/JDK-8324751), where we have two memory edge categories: strong and weak. I was able to pass around `vtn_dependencies` in fewer occasions, and also renamed it to `vtn_memory_dependencies`. 
------------- Commit messages: - simplify and rename - JDK-8354477 Changes: https://git.openjdk.org/jdk/pull/24613/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24613&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8354477 Stats: 75 lines in 5 files changed: 20 ins; 13 del; 42 mod Patch: https://git.openjdk.org/jdk/pull/24613.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24613/head:pull/24613 PR: https://git.openjdk.org/jdk/pull/24613 From epeter at openjdk.org Thu Apr 17 09:08:48 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 17 Apr 2025 09:08:48 GMT Subject: RFR: 8342676: Unsigned Vector Min / Max transforms [v6] In-Reply-To: <1mKv_sNaaGW-z57wW0xpHdWjd974WzEkuMKEh0su-hY=.24d0373e-5674-45a2-bc83-490e71a7dec0@github.com> References: <21riF_Q0FMyzOh_sakTclKfYa-nJm4klfkyHEYi4ctI=.76933a14-fb5e-447e-873a-59a2b870b842@github.com> <1mKv_sNaaGW-z57wW0xpHdWjd974WzEkuMKEh0su-hY=.24d0373e-5674-45a2-bc83-490e71a7dec0@github.com> Message-ID: <3HnjPR-9RASdzqdnOc9jbdu7s3JOGBhV_jSnIADaenk=.2eea0117-8468-4db7-8d0e-7b80ff19d4f1@github.com> On Mon, 14 Apr 2025 13:50:17 GMT, Jatin Bhateja wrote: >> Adding following IR transforms for unsigned vector Min / Max nodes. >> >> => UMinV (UMinV(a, b), UMaxV(a, b)) => UMinV(a, b) >> => UMinV (UMinV(a, b), UMaxV(b, a)) => UMinV(a, b) >> => UMaxV (UMinV(a, b), UMaxV(a, b)) => UMaxV(a, b) >> => UMaxV (UMinV(a, b), UMaxV(b, a)) => UMaxV(a, b) >> => UMaxV (a, a) => a >> => UMinV (a, a) => a >> >> New IR validation test accompanies the patch. >> >> This is a follow-up PR for https://github.com/openjdk/jdk/pull/20507 >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Comment refinement The code looks good to me. Let me run some more testing. 
Please ping me again in 24h+ and I can approve after the Easter weekend :) ------------- PR Review: https://git.openjdk.org/jdk/pull/21604#pullrequestreview-2775089008 From epeter at openjdk.org Thu Apr 17 09:15:50 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 17 Apr 2025 09:15:50 GMT Subject: RFR: 8353551: C2: Constant folding for ReverseBytes nodes [v4] In-Reply-To: References: Message-ID: On Mon, 14 Apr 2025 12:55:15 GMT, Hannes Greule wrote: >> This change implements constant folding for ReverseBytes nodes. >> >> Currently, `byteswap` is included transitively by `reverse_bits.hpp`. I'm not sure if this is fine or if I need to add an explicit include there. >> >> I appreciate any reviews and comments. > > Hannes Greule has updated the pull request incrementally with one additional commit since the last revision: > > make function static Nice, the code looks even better now :) test/hotspot/jtreg/compiler/c2/irTests/ReverseBytesConstantsTests.java line 23: > 21: * questions. > 22: */ > 23: package compiler.c2.irTests; Can you please move the test to a more specific directory? The `irTests` directory was a bit of a mistake. I think this would make more sense in `test/hotspot/jtreg/compiler/c2/gvn/`. ------------- PR Review: https://git.openjdk.org/jdk/pull/24382#pullrequestreview-2775104402 PR Review Comment: https://git.openjdk.org/jdk/pull/24382#discussion_r2048560285 From epeter at openjdk.org Thu Apr 17 09:22:55 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 17 Apr 2025 09:22:55 GMT Subject: RFR: 8349139: C2: Div looses dependency on condition that guarantees divisor not null in counted loop [v5] In-Reply-To: References: Message-ID: On Mon, 14 Apr 2025 15:37:36 GMT, Roland Westrelin wrote: >> The test crashes because of a division by zero. The `Div` node for >> that one is initially part of a counted loop. The control input of the >> node is cleared because the divisor is non zero. 
This is because the >> divisor depends on the loop phi and the type of the loop phi is >> narrowed down when the counted loop is created. pre/main/post loops >> are created, unrolling happens, the main loop looses its backedge. The >> `Div` node can then float above the zero trip guard for the main >> loop. When the zero trip guard is not taken, there's no guarantee the >> divisor is non zero so the `Div` node should be pinned below it. >> >> I propose we revert the change I made with 8334724 which removed >> `PhaseIdealLoop::cast_incr_before_loop()`. The `CastII` that this >> method inserted was there to handle exactly this problem. It was added >> initially for a similar issue but with array loads. That problem with >> loads is handled some other way now and that's why I thought it was >> safe to proceed with the removal. >> >> The code in this patch is somewhat different from the one we had >> before for a couple reasons: >> >> 1- assert predicate code evolved and so previous logic can't be >> resurrected as it was. >> >> 2- the previous logic has a bug. >> >> Regarding 1-: during pre/main/post loop creation, we used to add the >> `CastII` and then to add assertion predicates (so assertion predicates >> depended on the `CastII`). Then when unrolling, when assertion >> predicates are updated, we would skip over the `CastII`. What I >> propose here is to add the `CastII` after assertion predicates are >> added. As a result, they don't depend on the `CastII` and there's no >> need for any extra logic when unrolling happens. This, however, >> doesn't work when the assertion predicates are added by RCE. In that >> case, I had to add logic to skip over the `CastII` (similar to what >> existed before I removed it). >> >> Regarding 2-: previous implementation for >> `PhaseIdealLoop::cast_incr_before_loop()` would add the `CastII` at >> the first loop `Phi` it encounters that's a use of the loop increment: >> it's usually the iv but not always. 
I tweaked the test case to show, >> this bug can actually cause a crash and changed the logic for >> `PhaseIdealLoop::cast_incr_before_loop()` accordingly. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains nine commits: > > - review > - Merge branch 'master' into JDK-8349139 > - merge > - Merge branch 'master' into JDK-8349139 > - other test + review comment > - Merge branch 'master' into JDK-8349139 > - Merge branch 'master' into JDK-8349139 > - Merge branch 'master' into JDK-8349139 > - fix & test Looks good, except for the whitespace issue :) test/hotspot/jtreg/compiler/controldependency/TestDivDependentOnMainLoopGuard.java line 34: > 32: * -XX:+UnlockDiagnosticVMOptions -XX:+StressGCM TestDivDependentOnMainLoopGuard > 33: * @run main/othervm -Xcomp -XX:CompileOnly=TestDivDependentOnMainLoopGuard::* TestDivDependentOnMainLoopGuard > 34: * Suggestion: You got some whitespace issues here, we could just remove the line :) ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23617#pullrequestreview-2775123522 PR Review Comment: https://git.openjdk.org/jdk/pull/23617#discussion_r2048572661 From epeter at openjdk.org Thu Apr 17 09:25:53 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 17 Apr 2025 09:25:53 GMT Subject: RFR: 8353551: C2: Constant folding for ReverseBytes nodes [v4] In-Reply-To: References: Message-ID: On Mon, 14 Apr 2025 19:28:54 GMT, Vladimir Ivanov wrote: >> Hannes Greule has updated the pull request incrementally with one additional commit since the last revision: >> >> make function static > > Looks good. @iwanowww I see you did some internal testing, but not for what version. Should we re-run testing? 
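For context on the ReverseBytes change discussed above, here is a minimal Java sketch of the kind of expression that constant folding is about: a `reverseBytes` call whose argument is a compile-time constant, so the result is fully known ahead of time. The class and method names are hypothetical and are not taken from the PR or its test. Whether C2 actually folds a given call depends on the JIT, so the sketch only shows the values such folding would have to produce.

```java
public class ReverseBytesConstants {
    // Each method applies reverseBytes to a constant input; the comment
    // gives the value that a constant-folded version must be equal to.

    static int reversedInt() {
        return Integer.reverseBytes(0x11223344); // 0x44332211
    }

    static long reversedLong() {
        return Long.reverseBytes(0x0102030405060708L); // 0x0807060504030201L
    }

    static short reversedShort() {
        return Short.reverseBytes((short) 0x1234); // 0x3412
    }

    static char reversedChar() {
        return Character.reverseBytes((char) 0x1234); // 0x3412
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(reversedInt()));
        System.out.println(Long.toHexString(reversedLong()));
        System.out.println(Integer.toHexString(reversedShort()));
        System.out.println(Integer.toHexString(reversedChar()));
    }
}
```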
------------- PR Comment: https://git.openjdk.org/jdk/pull/24382#issuecomment-2812295136 From epeter at openjdk.org Thu Apr 17 09:44:04 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 17 Apr 2025 09:44:04 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v12] In-Reply-To: References: Message-ID: On Mon, 7 Apr 2025 02:24:03 GMT, kuaiwei wrote: >> In this patch, I extent the merge stores optimization to merge adjacents loads. Tier1 tests are passed in my machine. >> >> The benchmark result of MergeLoadBench.java >> AMD EPYC 9T24 96-Core Processor: >> >> |name | -MergeLoads | +MergeLoads |delta| >> |---|---|---|---| >> |MergeLoadBench.getCharB |4352.150 |4407.435 | 55.29 | >> |MergeLoadBench.getCharBU |4075.320 |4084.663 | 9.34 | >> |MergeLoadBench.getCharBV |3221.302 |3221.528 | 0.23 | >> |MergeLoadBench.getCharC |2235.433 |2238.796 | 3.36 | >> |MergeLoadBench.getCharL |4363.244 |4372.281 | 9.04 | >> |MergeLoadBench.getCharLU |4072.550 |4075.744 | 3.19 | >> |MergeLoadBench.getCharLV |2227.825 |2231.612 | 3.79 | >> |MergeLoadBench.getIntB |11199.935 |6869.030 | -4330.90 | >> |MergeLoadBench.getIntBU |6853.862 |2763.923 | -4089.94 | >> |MergeLoadBench.getIntBV |306.953 |309.911 | 2.96 | >> |MergeLoadBench.getIntL |10426.843 |6523.716 | -3903.13 | >> |MergeLoadBench.getIntLU |6740.847 |2602.701 | -4138.15 | >> |MergeLoadBench.getIntLV |2233.151 |2231.745 | -1.41 | >> |MergeLoadBench.getIntRB |11335.756 |8980.619 | -2355.14 | >> |MergeLoadBench.getIntRBU |7439.873 |3190.208 | -4249.66 | >> |MergeLoadBench.getIntRL |16323.040 |7786.842 | -8536.20 | >> |MergeLoadBench.getIntRLU |7457.745 |3364.140 | -4093.61 | >> |MergeLoadBench.getIntRU |2512.621 |2511.668 | -0.95 | >> |MergeLoadBench.getIntU |2501.064 |2500.629 | -0.43 | >> |MergeLoadBench.getLongB |21175.442 |21103.660 | -71.78 | >> |MergeLoadBench.getLongBU |14042.046 |2512.784 | -11529.26 | >> |MergeLoadBench.getLongBV |606.448 |606.171 | -0.28 | >> 
|MergeLoadBench.getLongL |23142.178 |23217.785 | 75.61 | >> |MergeLoadBench.getLongLU |14112.972 |2237.659 | -11875.31 | >> |MergeLoadBench.getLongLV |2230.416 |2231.224 | 0.81 | >> |MergeLoadBench.getLongRB |21152.558 |21140.583 | -11.98 | >> |MergeLoadBench.getLongRBU |14031.178 |2520.317 | -11510.86 | >> |MergeLoadBench.getLongRL |23248.506 |23136.410 | -112.10 | >> |MergeLoadBench.getLongRLU |14125.032 |2240.481 | -11884.55 | >> |Merg... > > kuaiwei has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits: > > - Merge remote-tracking branch 'origin/master' into dev/merge_loads > - Remove unused code > - Move code to addnode.cpp and add more tests > - Merge remote-tracking branch 'origin/master' into dev/merge_loads > - Fix test > - Add more tests > - Enable StressIGVN and riscv platform > - Change tests as review comments > - Fix test failure and change for review comments > - Revert extract value and add more tests > - ... and 4 more: https://git.openjdk.org/jdk/compare/660b17a6...f6518b26 First batch, need to change trains, I'll continue later :) src/hotspot/share/opto/addnode.cpp line 800: > 798: * > 799: * Because array value is logical and with constant 0xff/0xffff, LoadS/LoadB is converted to an unsigned load > 800: * and 'And' node is eliminated in previous IGVN phase. Please check AndINode::Ideal for reference What do you mean by `array value is logical`? Do you have tests where we do not have the `And` in the code, and there would be implicit sign extension? src/hotspot/share/opto/addnode.cpp line 828: > 826: }; > 827: > 828: typedef GrowableArray MergeLoadInfoList; Do you need to have this Resource allocated? Why not just have the elements in-place? And then make the `MergeLoadInfo` a `StackObj`. That would remove the extra pointer indirection. 
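To make the sign-extension question above concrete, here is a small Java sketch, assuming a little-endian combine of four array bytes. It is not code from the PR; the class and method names are hypothetical. It shows that dropping the `& 0xff` masks changes the Java-level result as soon as a byte is negative, which is why a pattern without masks cannot simply be replaced by one wider load.

```java
public class MergeLoadSignExtension {
    // Masked form: each byte is zero-extended before it is shifted and OR-ed,
    // so the result equals a single little-endian 4-byte load.
    static int withMask(byte[] a, int i) {
        return (a[i] & 0xff)
             | ((a[i + 1] & 0xff) << 8)
             | ((a[i + 2] & 0xff) << 16)
             | ((a[i + 3] & 0xff) << 24);
    }

    // Unmasked form: each byte is promoted to int with sign extension,
    // so a negative byte sets all higher bits of the OR result.
    static int withoutMask(byte[] a, int i) {
        return a[i]
             | (a[i + 1] << 8)
             | (a[i + 2] << 16)
             | (a[i + 3] << 24);
    }

    public static void main(String[] args) {
        byte[] negativeLow = {(byte) 0x80, 0x01, 0x02, 0x03};
        System.out.println(Integer.toHexString(withMask(negativeLow, 0)));
        System.out.println(Integer.toHexString(withoutMask(negativeLow, 0)));
    }
}
```

With the lowest byte set to `0x80`, the masked form yields `0x03020180` while the unmasked form yields `0xffffff80`, so the two only agree when no byte below the top one is negative.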
src/hotspot/share/opto/addnode.cpp line 843: > 841: Node* const _combine; > 842: MemoryAdjacentStatus _order; > 843: bool _require_reverse_bytes; // Do we need add a ReverseBytes for merged load Suggestion: bool _require_reverse_bytes; // Do we need to add a ReverseBytes for merged load src/hotspot/share/opto/addnode.cpp line 857: > 855: > 856: private: > 857: // Detect the embedding combine node is a candidate for merging loads Suggestion: // Detect if the embedding combine node is a candidate for merging loads ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24023#pullrequestreview-2775153032 PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2048591407 PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2048598807 PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2048600811 PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2048602890 From epeter at openjdk.org Thu Apr 17 10:16:58 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 17 Apr 2025 10:16:58 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v12] In-Reply-To: References: Message-ID: On Mon, 7 Apr 2025 02:24:03 GMT, kuaiwei wrote: >> In this patch, I extent the merge stores optimization to merge adjacents loads. Tier1 tests are passed in my machine. 
>> >> The benchmark result of MergeLoadBench.java >> AMD EPYC 9T24 96-Core Processor: >> >> |name | -MergeLoads | +MergeLoads |delta| >> |---|---|---|---| >> |MergeLoadBench.getCharB |4352.150 |4407.435 | 55.29 | >> |MergeLoadBench.getCharBU |4075.320 |4084.663 | 9.34 | >> |MergeLoadBench.getCharBV |3221.302 |3221.528 | 0.23 | >> |MergeLoadBench.getCharC |2235.433 |2238.796 | 3.36 | >> |MergeLoadBench.getCharL |4363.244 |4372.281 | 9.04 | >> |MergeLoadBench.getCharLU |4072.550 |4075.744 | 3.19 | >> |MergeLoadBench.getCharLV |2227.825 |2231.612 | 3.79 | >> |MergeLoadBench.getIntB |11199.935 |6869.030 | -4330.90 | >> |MergeLoadBench.getIntBU |6853.862 |2763.923 | -4089.94 | >> |MergeLoadBench.getIntBV |306.953 |309.911 | 2.96 | >> |MergeLoadBench.getIntL |10426.843 |6523.716 | -3903.13 | >> |MergeLoadBench.getIntLU |6740.847 |2602.701 | -4138.15 | >> |MergeLoadBench.getIntLV |2233.151 |2231.745 | -1.41 | >> |MergeLoadBench.getIntRB |11335.756 |8980.619 | -2355.14 | >> |MergeLoadBench.getIntRBU |7439.873 |3190.208 | -4249.66 | >> |MergeLoadBench.getIntRL |16323.040 |7786.842 | -8536.20 | >> |MergeLoadBench.getIntRLU |7457.745 |3364.140 | -4093.61 | >> |MergeLoadBench.getIntRU |2512.621 |2511.668 | -0.95 | >> |MergeLoadBench.getIntU |2501.064 |2500.629 | -0.43 | >> |MergeLoadBench.getLongB |21175.442 |21103.660 | -71.78 | >> |MergeLoadBench.getLongBU |14042.046 |2512.784 | -11529.26 | >> |MergeLoadBench.getLongBV |606.448 |606.171 | -0.28 | >> |MergeLoadBench.getLongL |23142.178 |23217.785 | 75.61 | >> |MergeLoadBench.getLongLU |14112.972 |2237.659 | -11875.31 | >> |MergeLoadBench.getLongLV |2230.416 |2231.224 | 0.81 | >> |MergeLoadBench.getLongRB |21152.558 |21140.583 | -11.98 | >> |MergeLoadBench.getLongRBU |14031.178 |2520.317 | -11510.86 | >> |MergeLoadBench.getLongRL |23248.506 |23136.410 | -112.10 | >> |MergeLoadBench.getLongRLU |14125.032 |2240.481 | -11884.55 | >> |Merg... 
> > kuaiwei has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits: > > - Merge remote-tracking branch 'origin/master' into dev/merge_loads > - Remove unused code > - Move code to addnode.cpp and add more tests > - Merge remote-tracking branch 'origin/master' into dev/merge_loads > - Fix test > - Add more tests > - Enable StressIGVN and riscv platform > - Change tests as review comments > - Fix test failure and change for review comments > - Revert extract value and add more tests > - ... and 4 more: https://git.openjdk.org/jdk/compare/660b17a6...f6518b26 Even more coming later... src/hotspot/share/opto/addnode.cpp line 816: > 814: LoadNode* const _load; > 815: Node* const _combine; > 816: int const _shift; Suggestion: jint const _shift; I prefer using `jint` etc when we are talking about values that correlate to java types. With the C/C++ numerical types, there could always be issues on different platforms with different bit sizes. src/hotspot/share/opto/addnode.cpp line 823: > 821: void dump() { > 822: tty->print_cr("MergeLoadInfo: load: %d, combine: %d, shift: %d", > 823: _load->_idx, _combine->_idx, _shift); It would be nice if you also printed the Node name. e.g. `_load->Name()`. For me that is often more helpful. src/hotspot/share/opto/addnode.cpp line 912: > 910: return false; > 911: } > 912: return true; This feels inverted. I would do a switch case here anyway, and return `false` in the default case. Hmm. LoadS is supposed to be implicitly sign extended. Can those upper sign bits not lead to issues when OR-ing without a mask? src/hotspot/share/opto/addnode.cpp line 916: > 914: > 915: bool MergePrimitiveLoads::is_supported_combine_opcode(int opc) { > 916: return opc == Op_OrI || opc == Op_OrL; Ah, and here you do it "positively". I would also recommend using a switch case, it is nicer to extend later for AND and XOR. 
src/hotspot/share/opto/addnode.cpp line 920: > 918: > 919: // Go through ConvI2L which is unique output of input node > 920: const Node* MergePrimitiveLoads::by_pass_i2l(const Node* n) { Suggestion: const Node* MergePrimitiveLoads::bypass_ConvI2L(const Node* n) { I think `bypass` is a single verb, not two words ;) src/hotspot/share/opto/addnode.cpp line 933: > 931: * > 932: * Load -> OrI/OrL > 933: * It has no LShift usage and the And node is optimized out in previous optimization This is a bit confusing... I thought what you wrote further up was much better, where you also had the ascii art. This here feels like a duplication of that up there. If there is anything here that is missing up there, then I would suggest that you just move it up there. src/hotspot/share/opto/addnode.cpp line 946: > 944: * | ((UNSAFE.getByte(array, address + 3) & 0xff) << 24); > 945: */ > 946: bool MergePrimitiveLoads::is_merged_load_candidate() const { What is your definition of candidate here? It seems to have something to do with not having a `LShift`, why? Maybe you can give a good definition here or somewhere else? src/hotspot/share/opto/addnode.cpp line 950: > 948: const Node* check = by_pass_i2l(_combine); > 949: if (check->outcnt() == 1 && check->unique_out()->Opcode() == _combine->Opcode()) { > 950: // It's in the middle of combine operators Ah, I see. This is about checking that there is nothing further down to merge with.... But are you sure this is a good idea to check that there is no `OR` below? I mean there could be valid uses of `OrI` below that do some OR-ing with something else... Now you are forbidding these cases. Look at this: int x = (... merge load pattern with OR ...); int y = ... some other value ... int z = x | y; I would expect that we could merge the loads here too, but your pattern matching seems to forbid it, right? ------------- Changes requested by epeter (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/24023#pullrequestreview-2775212830 PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2048628508 PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2048631063 PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2048634906 PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2048635926 PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2048656876 PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2048641180 PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2048647672 PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2048652111 From epeter at openjdk.org Thu Apr 17 10:16:58 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 17 Apr 2025 10:16:58 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v12] In-Reply-To: References: Message-ID: On Thu, 17 Apr 2025 10:10:20 GMT, Emanuel Peter wrote: >> kuaiwei has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits: >> >> - Merge remote-tracking branch 'origin/master' into dev/merge_loads >> - Remove unused code >> - Move code to addnode.cpp and add more tests >> - Merge remote-tracking branch 'origin/master' into dev/merge_loads >> - Fix test >> - Add more tests >> - Enable StressIGVN and riscv platform >> - Change tests as review comments >> - Fix test failure and change for review comments >> - Revert extract value and add more tests >> - ... and 4 more: https://git.openjdk.org/jdk/compare/660b17a6...f6518b26 > > src/hotspot/share/opto/addnode.cpp line 950: > >> 948: const Node* check = by_pass_i2l(_combine); >> 949: if (check->outcnt() == 1 && check->unique_out()->Opcode() == _combine->Opcode()) { >> 950: // It's in the middle of combine operators > > Ah, I see. 
This is about checking that there is nothing further down to merge with.... > > But are you sure this is a good idea to check that there is no `OR` below? I mean there could be valid uses of `OrI` below that do some OR-ing with something else... Now you are forbidding these cases. > > Look at this: > > int x = (... merge load pattern with OR ...); > int y = ... some other value ... > int z = x | y; > > > I would expect that we could merge the loads here too, but your pattern matching seems to forbid it, right? Also: I would change the name of the method if we keep it. It is really about checking down, i.e. that there is no other candidate below. Maybe `has_no_merge_load_combine_below`? Ah, another example: We could have two merged loads that we OR: int x = (... merge load pattern with OR ...); int y = (... merge load pattern with OR ...); int z = x | y; It would be nice if this could be optimized too, and we should have an IR test for it :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2048654975 From amitkumar at openjdk.org Thu Apr 17 10:28:39 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 17 Apr 2025 10:28:39 GMT Subject: RFR: 8353500: [s390x] Intrinsify Unsafe::setMemory [v3] In-Reply-To: References: Message-ID: > Unsafe::setMemory intrinsic implementation for s390x.
> > Stub Code: > > > StubRoutines::unsafe_setmemory [0x000003ffb04b63c0, 0x000003ffb04b64d0] (272 bytes) > -------------------------------------------------------------------------------- > 0x000003ffb04b63c0: ogrk %r1,%r2,%r3 > 0x000003ffb04b63c4: nill %r1,7 > 0x000003ffb04b63c8: je 0x000003ffb04b6410 > 0x000003ffb04b63cc: nill %r1,3 > 0x000003ffb04b63d0: je 0x000003ffb04b6460 > 0x000003ffb04b63d4: nill %r1,1 > 0x000003ffb04b63d8: jlh 0x000003ffb04b64a0 > 0x000003ffb04b63dc: risbg %r4,%r4,48,55,8 > 0x000003ffb04b63e2: risbgz %r1,%r3,32,63,62 > 0x000003ffb04b63e8: je 0x000003ffb04b6402 > 0x000003ffb04b63ec: nopr > 0x000003ffb04b63ee: nopr > 0x000003ffb04b63f0: sth %r4,0(%r2) > 0x000003ffb04b63f4: sth %r4,2(%r2) > 0x000003ffb04b63f8: agfi %r2,4 > 0x000003ffb04b63fe: brct %r1,0x000003ffb04b63f0 > 0x000003ffb04b6402: nilf %r3,2 > 0x000003ffb04b6408: ber %r14 > 0x000003ffb04b640a: sth %r4,0(%r2) > 0x000003ffb04b640e: br %r14 > 0x000003ffb04b6410: risbg %r4,%r4,48,55,8 > 0x000003ffb04b6416: risbg %r4,%r4,32,47,16 > 0x000003ffb04b641c: risbg %r4,%r4,0,31,32 > 0x000003ffb04b6422: risbgz %r1,%r3,32,63,60 > 0x000003ffb04b6428: je 0x000003ffb04b6446 > 0x000003ffb04b642c: nopr > 0x000003ffb04b642e: nopr > 0x000003ffb04b6430: stg %r4,0(%r2) > 0x000003ffb04b6436: stg %r4,8(%r2) > 0x000003ffb04b643c: agfi %r2,16 > 0x000003ffb04b6442: brct %r1,0x000003ffb04b6430 > 0x000003ffb04b6446: nilf %r3,8 > 0x000003ffb04b644c: ber %r14 > 0x000003ffb04b644e: stg %r4,0(%r2) > 0x000003ffb04b6454: br %r14 > 0x000003ffb04b6456: nopr > 0x000003ffb04b6458: nopr > 0x000003ffb04b645a: nopr > 0x000003ffb04b645c: nopr > 0x000003ffb04b645e: nopr > 0x000003ffb04b6460: risbg %r4,%r4,48,55,8 > 0x000003ffb04b6466: risbg %r4,%r4,32,47,16 > 0x000003ffb04b646c: risbgz %r1,%r3,32,63,61 > 0x000003ffb04b6472: je 0x000003ffb04b6492 > 0x000003ffb04b6476: nopr > 0x000003ffb04b6478: nopr > 0x000003ffb04b647a: nopr > 0x000003ffb04b647c: nopr > 0x000003ffb04b647e: nopr > 0x000003ffb04b6480: st %r4,0(%r2) > 
0x000003ffb04b6484: st %r4,4(%r2) > 0x000003ffb04b6488: agfi %r2,8 > 0x000003ffb04b648e: brct %r1,0x000003ffb04b6480 > 0x000003ffb04b6492: nilf %r3,4 > 0x000003ffb04b6498: ber %r14 > 0x000003ffb04b649a: st %r4,0(%r2) > 0x000003ffb04b649e: br %r14 > 0x000003ffb04b64a0: risbgz %r1,%r3,32,63,63 > 0x000003ffb04b64a6: je 0x000003ffb04b64c2 > 0x000003... Amit Kumar has updated the pull request incrementally with six additional commits since the last revision: - extra line - improvement - fix testcases - extra line - wip: fixed the regression - [wip] initial mvc template solution ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24480/files - new: https://git.openjdk.org/jdk/pull/24480/files/1b8ea8bb..cf709eec Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24480&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24480&range=01-02 Stats: 41 lines in 1 file changed: 36 ins; 3 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24480.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24480/head:pull/24480 PR: https://git.openjdk.org/jdk/pull/24480 From amitkumar at openjdk.org Thu Apr 17 10:28:40 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 17 Apr 2025 10:28:40 GMT Subject: RFR: 8353500: [s390x] Intrinsify Unsafe::setMemory [v2] In-Reply-To: References: Message-ID: <1Nkh4qHU642eSCJsPnBNCkayH6dV18E7vrbIJqws6dE=.3268efed-2897-4fc9-ae62-1b86cad99b3a@github.com> On Wed, 9 Apr 2025 08:57:40 GMT, Amit Kumar wrote: >> Unsafe::setMemory intrinsic implementation for s390x. 
>> >> Stub Code: >> >> >> StubRoutines::unsafe_setmemory [0x000003ffb04b63c0, 0x000003ffb04b64d0] (272 bytes) >> -------------------------------------------------------------------------------- >> 0x000003ffb04b63c0: ogrk %r1,%r2,%r3 >> 0x000003ffb04b63c4: nill %r1,7 >> 0x000003ffb04b63c8: je 0x000003ffb04b6410 >> 0x000003ffb04b63cc: nill %r1,3 >> 0x000003ffb04b63d0: je 0x000003ffb04b6460 >> 0x000003ffb04b63d4: nill %r1,1 >> 0x000003ffb04b63d8: jlh 0x000003ffb04b64a0 >> 0x000003ffb04b63dc: risbg %r4,%r4,48,55,8 >> 0x000003ffb04b63e2: risbgz %r1,%r3,32,63,62 >> 0x000003ffb04b63e8: je 0x000003ffb04b6402 >> 0x000003ffb04b63ec: nopr >> 0x000003ffb04b63ee: nopr >> 0x000003ffb04b63f0: sth %r4,0(%r2) >> 0x000003ffb04b63f4: sth %r4,2(%r2) >> 0x000003ffb04b63f8: agfi %r2,4 >> 0x000003ffb04b63fe: brct %r1,0x000003ffb04b63f0 >> 0x000003ffb04b6402: nilf %r3,2 >> 0x000003ffb04b6408: ber %r14 >> 0x000003ffb04b640a: sth %r4,0(%r2) >> 0x000003ffb04b640e: br %r14 >> 0x000003ffb04b6410: risbg %r4,%r4,48,55,8 >> 0x000003ffb04b6416: risbg %r4,%r4,32,47,16 >> 0x000003ffb04b641c: risbg %r4,%r4,0,31,32 >> 0x000003ffb04b6422: risbgz %r1,%r3,32,63,60 >> 0x000003ffb04b6428: je 0x000003ffb04b6446 >> 0x000003ffb04b642c: nopr >> 0x000003ffb04b642e: nopr >> 0x000003ffb04b6430: stg %r4,0(%r2) >> 0x000003ffb04b6436: stg %r4,8(%r2) >> 0x000003ffb04b643c: agfi %r2,16 >> 0x000003ffb04b6442: brct %r1,0x000003ffb04b6430 >> 0x000003ffb04b6446: nilf %r3,8 >> 0x000003ffb04b644c: ber %r14 >> 0x000003ffb04b644e: stg %r4,0(%r2) >> 0x000003ffb04b6454: br %r14 >> 0x000003ffb04b6456: nopr >> 0x000003ffb04b6458: nopr >> 0x000003ffb04b645a: nopr >> 0x000003ffb04b645c: nopr >> 0x000003ffb04b645e: nopr >> 0x000003ffb04b6460: risbg %r4,%r4,48,55,8 >> 0x000003ffb04b6466: risbg %r4,%r4,32,47,16 >> 0x000003ffb04b646c: risbgz %r1,%r3,32,63,61 >> 0x000003ffb04b6472: je 0x000003ffb04b6492 >> 0x000003ffb04b6476: nopr >> 0x000003ffb04b6478: nopr >> 0x000003ffb04b647a: nopr >> 0x000003ffb04b647c: nopr >> 0x000003ffb04b647e: 
nopr >> 0x000003ffb04b6480: st %r4,0(%r2) >> 0x000003ffb04b6484: st %r4,4(%r2) >> 0x000003ffb04b6488: agfi %r2,8 >> 0x000003ffb04b648e: brct %r1,0x000003ffb04b6480 >> 0x000003ffb04b6492: nilf %r3,4 >> 0x000003ffb04b6498: ber %r14 >> 0x000003ffb04b649a: st %r4,0(%r2) >> 0x0000... > > Amit Kumar has updated the pull request incrementally with four additional commits since the last revision: > > - reviews for Martin > - Revert "minor improvement" > > This reverts commit a6af6da26d1e0590dc24486131d1bc752e047f98. > - minor improvement > - reviews from Lutz and Martin

This result is from a shared machine, but it looks like the regression is fixed. The regression occurred because, in the unaligned case, only 1-byte store instructions (i.e. `stc`) were being emitted. Since alignment depends on two factors (`size` and the address where we are storing the value), we can't always tell whether a given benchmark case is aligned or unaligned. I will do further testing to see if more optimization can be done, and then mark this PR ready for review.

Benchmark (aligned) (size) Mode Cnt Score Error Units
MemorySegmentZeroUnsafe.panama true 1 avgt 30 2.893 ± 0.013 ns/op
MemorySegmentZeroUnsafe.panama true 2 avgt 30 3.122 ± 0.006 ns/op
MemorySegmentZeroUnsafe.panama true 3 avgt 30 3.286 ± 0.006 ns/op
MemorySegmentZeroUnsafe.panama true 4 avgt 30 3.401 ± 0.006 ns/op
MemorySegmentZeroUnsafe.panama true 5 avgt 30 3.291 ± 0.021 ns/op
MemorySegmentZeroUnsafe.panama true 6 avgt 30 3.455 ± 0.015 ns/op
MemorySegmentZeroUnsafe.panama true 7 avgt 30 3.471 ± 0.007 ns/op
MemorySegmentZeroUnsafe.panama true 8 avgt 30 3.215 ± 0.033 ns/op
MemorySegmentZeroUnsafe.panama true 15 avgt 30 4.632 ± 0.006 ns/op
MemorySegmentZeroUnsafe.panama true 16 avgt 30 3.815 ± 0.014 ns/op
MemorySegmentZeroUnsafe.panama true 63 avgt 30 9.695 ± 0.036 ns/op
MemorySegmentZeroUnsafe.panama true 64 avgt 30 5.296 ± 0.008 ns/op
MemorySegmentZeroUnsafe.panama true 255 avgt 30 9.682 ±
0.011 ns/op
MemorySegmentZeroUnsafe.panama true 256 avgt 30 9.508 ± 0.013 ns/op
MemorySegmentZeroUnsafe.panama false 1 avgt 30 2.887 ± 0.005 ns/op
MemorySegmentZeroUnsafe.panama false 2 avgt 30 3.134 ± 0.024 ns/op
MemorySegmentZeroUnsafe.panama false 3 avgt 30 3.285 ± 0.005 ns/op
MemorySegmentZeroUnsafe.panama false 4 avgt 30 3.397 ± 0.003 ns/op
MemorySegmentZeroUnsafe.panama false 5 avgt 30 3.297 ± 0.049 ns/op
MemorySegmentZeroUnsafe.panama false 6 avgt 30 3.445 ± 0.006 ns/op
MemorySegmentZeroUnsafe.panama false 7 avgt 30 3.471 ± 0.007 ns/op
MemorySegmentZeroUnsafe.panama false 8 avgt 30 3.204 ± 0.023 ns/op
MemorySegmentZeroUnsafe.panama false 15 avgt 30 4.630 ± 0.007 ns/op
MemorySegmentZeroUnsafe.panama false 16 avgt 30 3.811 ± 0.006 ns/op
MemorySegmentZeroUnsafe.panama false 63 avgt 30 9.676 ± 0.012 ns/op
MemorySegmentZeroUnsafe.panama false 64 avgt 30 9.690 ± 0.031 ns/op
MemorySegmentZeroUnsafe.panama false 255 avgt 30 9.678 ± 0.013 ns/op
MemorySegmentZeroUnsafe.panama false 256 avgt 30 4.180 ± 0.010 ns/op
MemorySegmentZeroUnsafe.unsafe true 1 avgt 30 2.636 ± 0.060 ns/op
MemorySegmentZeroUnsafe.unsafe true 2 avgt 30 2.379 ± 0.006 ns/op
MemorySegmentZeroUnsafe.unsafe true 3 avgt 30 7.743 ± 0.009 ns/op
MemorySegmentZeroUnsafe.unsafe true 4 avgt 30 2.531 ± 0.113 ns/op
MemorySegmentZeroUnsafe.unsafe true 5 avgt 30 7.746 ± 0.012 ns/op
MemorySegmentZeroUnsafe.unsafe true 6 avgt 30 3.183 ± 0.006 ns/op
MemorySegmentZeroUnsafe.unsafe true 7 avgt 30 7.742 ± 0.011 ns/op
MemorySegmentZeroUnsafe.unsafe true 8 avgt 30 2.580 ± 0.095 ns/op
MemorySegmentZeroUnsafe.unsafe true 15 avgt 30 7.870 ± 0.184 ns/op
MemorySegmentZeroUnsafe.unsafe true 16 avgt 30 2.523 ± 0.011 ns/op
MemorySegmentZeroUnsafe.unsafe true 63 avgt 30 7.757 ± 0.033 ns/op
MemorySegmentZeroUnsafe.unsafe true 64 avgt 30 3.580 ± 0.005 ns/op
MemorySegmentZeroUnsafe.unsafe true 255 avgt 30 7.744 ± 0.009 ns/op
MemorySegmentZeroUnsafe.unsafe true 256 avgt 30 8.090 ±
0.110 ns/op MemorySegmentZeroUnsafe.unsafe false 1 avgt 30 2.683 ? 0.025 ns/op MemorySegmentZeroUnsafe.unsafe false 2 avgt 30 7.747 ? 0.009 ns/op MemorySegmentZeroUnsafe.unsafe false 3 avgt 30 7.738 ? 0.009 ns/op MemorySegmentZeroUnsafe.unsafe false 4 avgt 30 7.745 ? 0.009 ns/op MemorySegmentZeroUnsafe.unsafe false 5 avgt 30 7.773 ? 0.064 ns/op MemorySegmentZeroUnsafe.unsafe false 6 avgt 30 7.736 ? 0.008 ns/op MemorySegmentZeroUnsafe.unsafe false 7 avgt 30 7.747 ? 0.010 ns/op MemorySegmentZeroUnsafe.unsafe false 8 avgt 30 7.748 ? 0.030 ns/op MemorySegmentZeroUnsafe.unsafe false 15 avgt 30 7.735 ? 0.008 ns/op MemorySegmentZeroUnsafe.unsafe false 16 avgt 30 7.747 ? 0.020 ns/op MemorySegmentZeroUnsafe.unsafe false 63 avgt 30 7.746 ? 0.013 ns/op MemorySegmentZeroUnsafe.unsafe false 64 avgt 30 7.743 ? 0.012 ns/op MemorySegmentZeroUnsafe.unsafe false 255 avgt 30 7.741 ? 0.011 ns/op MemorySegmentZeroUnsafe.unsafe false 256 avgt 30 2.739 ? 0.005 ns/op Finished running test 'micro:java.lang.foreign.MemorySegmentZeroUnsafe' Stub Code Generated with current code: StubRoutines::unsafe_setmemory [0x000003ff9c4b63c0, 0x000003ff9c4b64dc] (284 bytes) -------------------------------------------------------------------------------- BFD: unknown S/390 disassembler option: s390 .long 0x00000000 0x000003ff9c4b63c0: ogrk %r1,%r2,%r3 0x000003ff9c4b63c4: nill %r1,7 0x000003ff9c4b63c8: je 0x000003ff9c4b641e 0x000003ff9c4b63cc: nill %r1,3 0x000003ff9c4b63d0: je 0x000003ff9c4b6464 0x000003ff9c4b63d4: nill %r1,1 0x000003ff9c4b63d8: jne 0x000003ff9c4b649e 0x000003ff9c4b63dc: risbg %r4,%r4,48,55,8 0x000003ff9c4b63e2: risbgz %r1,%r3,32,63,62 0x000003ff9c4b63e8: je 0x000003ff9c4b6410 0x000003ff9c4b63ec: nopr 0x000003ff9c4b63ee: nopr 0x000003ff9c4b63f0: nopr 0x000003ff9c4b63f2: nopr 0x000003ff9c4b63f4: nopr 0x000003ff9c4b63f6: nopr 0x000003ff9c4b63f8: nopr 0x000003ff9c4b63fa: nopr 0x000003ff9c4b63fc: nopr 0x000003ff9c4b63fe: nopr 0x000003ff9c4b6400: sth %r4,0(%r2) 0x000003ff9c4b6404: sth 
%r4,2(%r2) 0x000003ff9c4b6408: aghi %r2,4 0x000003ff9c4b640c: brct %r1,0x000003ff9c4b6400 0x000003ff9c4b6410: nilf %r3,2 0x000003ff9c4b6416: ber %r14 0x000003ff9c4b6418: sth %r4,0(%r2) 0x000003ff9c4b641c: br %r14 0x000003ff9c4b641e: risbg %r4,%r4,48,55,8 0x000003ff9c4b6424: risbg %r4,%r4,32,47,16 0x000003ff9c4b642a: risbg %r4,%r4,0,31,32 0x000003ff9c4b6430: risbgz %r1,%r3,32,63,60 0x000003ff9c4b6436: je 0x000003ff9c4b6454 0x000003ff9c4b643a: nopr 0x000003ff9c4b643c: nopr 0x000003ff9c4b643e: nopr 0x000003ff9c4b6440: stg %r4,0(%r2) 0x000003ff9c4b6446: stg %r4,8(%r2) 0x000003ff9c4b644c: aghi %r2,16 0x000003ff9c4b6450: brct %r1,0x000003ff9c4b6440 0x000003ff9c4b6454: nilf %r3,8 0x000003ff9c4b645a: ber %r14 0x000003ff9c4b645c: stg %r4,0(%r2) 0x000003ff9c4b6462: br %r14 0x000003ff9c4b6464: risbg %r4,%r4,48,55,8 0x000003ff9c4b646a: risbg %r4,%r4,32,47,16 0x000003ff9c4b6470: risbgz %r1,%r3,32,63,61 0x000003ff9c4b6476: je 0x000003ff9c4b6490 0x000003ff9c4b647a: nopr 0x000003ff9c4b647c: nopr 0x000003ff9c4b647e: nopr 0x000003ff9c4b6480: st %r4,0(%r2) 0x000003ff9c4b6484: st %r4,4(%r2) 0x000003ff9c4b6488: aghi %r2,8 0x000003ff9c4b648c: brct %r1,0x000003ff9c4b6480 0x000003ff9c4b6490: nilf %r3,4 0x000003ff9c4b6496: ber %r14 0x000003ff9c4b6498: st %r4,0(%r2) 0x000003ff9c4b649c: br %r14 0x000003ff9c4b649e: cghi %r3,256 0x000003ff9c4b64a2: jl 0x000003ff9c4b64c4 0x000003ff9c4b64a6: stc %r4,0(%r2) 0x000003ff9c4b64aa: mvc 1(255,%r2),0(%r2) 0x000003ff9c4b64b0: aghi %r2,256 0x000003ff9c4b64b4: aghi %r3,-256 0x000003ff9c4b64b8: cghi %r3,256 0x000003ff9c4b64bc: jh 0x000003ff9c4b64a6 0x000003ff9c4b64c0: ltr %r3,%r3 0x000003ff9c4b64c2: ber %r14 0x000003ff9c4b64c4: stc %r4,0(%r2) 0x000003ff9c4b64c8: aghi %r3,-2 0x000003ff9c4b64cc: blr %r14 0x000003ff9c4b64ce: exrl %r3,0x000003ff9c4b64d6 0x000003ff9c4b64d4: br %r14 0x000003ff9c4b64d6: mvc 1(1,%r2),0(%r2) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24480#issuecomment-2809303487 PR Comment: 
https://git.openjdk.org/jdk/pull/24480#issuecomment-2812434376 From jbhateja at openjdk.org Thu Apr 17 11:58:46 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 17 Apr 2025 11:58:46 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v6] In-Reply-To: <5_GkMf0czJzlDdCikEQdTgQZ_e1m9jcfFedrh-_dYOc=.51a34f5c-86c4-4c36-8bf6-a3a0e9177390@github.com> References: <5_GkMf0czJzlDdCikEQdTgQZ_e1m9jcfFedrh-_dYOc=.51a34f5c-86c4-4c36-8bf6-a3a0e9177390@github.com> Message-ID: On Thu, 17 Apr 2025 00:53:27 GMT, Srinivas Vamsi Parasa wrote: >> The current scheme for Intel APX NDD code generation favors the emission of NDD instruction on APX-enabled targets, even if destination and source registers are the same. To prevent this, this PR extends the assembler layer to demote EEVEX to REX encoding if dst matches with source operands. > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > RRM refactorted to use a unified evex_ndd_and_int8 function Hi Vamsi, We should extend following peephole optimization rules for LEA detection to cover new NDD patterns. 
https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L14590 https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L14599 https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L14608 I have created a follow up JBS for this https://bugs.openjdk.org/browse/JDK-8354939 ------------- PR Comment: https://git.openjdk.org/jdk/pull/24431#issuecomment-2812635082 From bulasevich at openjdk.org Thu Apr 17 12:30:27 2025 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Thu, 17 Apr 2025 12:30:27 GMT Subject: RFR: 8332368: ubsan aarch64: immediate_aarch64.cpp:298:31: runtime error: shift exponent 32 is too large for 32-bit type 'int' Message-ID: <9mxXDDhuFS_jZmqSkYtOtsYfmpP9r-616HsIFGBDTUE=.e0b5b3cd-0fc3-434e-95aa-840344648beb@github.com> Running a linux-aarch64-server-fastdebug build with UBSAN (--enable-ubsan configure option) gives the following runtime error immediately on JVM start: ad_aarch64.hpp:7114:11: runtime error: shift exponent 100 is too large for 32-bit type 'unsigned int' # Pipeline_Use_Cycle_Mask::operator<<=(int) make/hotspot/ad_aarch64.hpp:7114 # Pipeline_Use_Element::step(unsigned int) make/hotspot/ad_aarch64.hpp:7168 # Pipeline_Use::step(unsigned int) make/hotspot/ad_aarch64.hpp:7216 # Scheduling::step(unsigned int) src/hotspot/share/opto/output.cpp:2116 # Scheduling::AddNodeToBundle(Node*, Block const*) src/hotspot/share/opto/output.cpp:2553 The value of 100 comes from fixed_latency, defined in AD files: // Pipeline class for call. pipe_class pipe_class_call() %{ single_instruction; fixed_latency(100); %} The fixed_latency value is used by the scheduler to model the occupancy of functional units over time. 
The occupancy is tracked using a uint mask value:

    fprintf(fp_hpp, "class Pipeline_Use_Element {\n");
    fprintf(fp_hpp, "protected:\n");
    fprintf(fp_hpp, "  // Mask of used functional units\n");
    fprintf(fp_hpp, "  uint _used;\n\n");

When the scheduler virtually steps over the instruction, it shifts the masks left by the instruction's latency. The problem is that 100 exceeds the bit width of uint (8 * sizeof(uint) = 32), and left-shifting by 100 effectively zeroes the _mask, but, according to the C++ standard, this is undefined behavior.

We can find a number of fixed_latency(100) expressions in the aarch64.ad, arm.ad, ppc.ad, riscv.ad and x86_64.ad files. Perhaps all of them deserve correction. I suggest leaving the AD files as they are, but limiting the shift value in the generated code in case it exceeds the allowed maximum:

     void step(uint cycles) {
       _used = 0;
    -  _mask <<= cycles;
    +  uint max_shift = 8 * sizeof(_mask) - 1;
    +  _mask <<= (cycles < max_shift) ? cycles : max_shift;
     }

In fact, this change does not affect the current behavior; we just eliminate the undefined behavior while preserving the intended semantics.
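As a hedged aside, the clamping idea described above can be sketched in isolation. This is not the ADLC-generated code: `kMaskBits`, `shift_clamped` and `shift_or_clear` are hypothetical names introduced only for illustration, and a plain `std::uint32_t` stands in for the real mask type. The sketch assumes a 32-bit mask and shows two ways to keep the shift count below the bit width.

```cpp
#include <climits>
#include <cstdint>

// Hypothetical stand-in: the bit width of the mask type (32 here).
constexpr unsigned kMaskBits = sizeof(std::uint32_t) * CHAR_BIT;

// Variant 1: clamp the shift count to (bit width - 1), as in the proposed patch.
// The shift is always well defined, but a set bit 0 ends up in bit 31.
constexpr std::uint32_t shift_clamped(std::uint32_t mask, unsigned cycles) {
  return mask << (cycles < kMaskBits - 1 ? cycles : kMaskBits - 1);
}

// Variant 2: treat any count >= bit width as "everything shifted out".
// Also well defined, and always yields 0 for oversized counts.
constexpr std::uint32_t shift_or_clear(std::uint32_t mask, unsigned cycles) {
  return cycles < kMaskBits ? mask << cycles : 0u;
}

// Both variants agree for in-range counts and differ only for oversized ones.
static_assert(shift_clamped(0x3u, 4) == shift_or_clear(0x3u, 4), "in-range shifts agree");
static_assert(shift_clamped(0x1u, 100) == 0x80000000u, "clamped shift keeps bit 31");
static_assert(shift_or_clear(0x1u, 100) == 0u, "oversized shift clears the mask");
```

With these definitions, `shift_clamped(1u, 100)` leaves bit 31 set while `shift_or_clear(1u, 100)` yields 0. Which of the two matches the intended semantics depends on whether a latency beyond the mask width should clear the mask completely; the sketch only illustrates the difference and does not settle that choice.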
-------------

Commit messages:
 - 8332368: ubsan aarch64: immediate_aarch64.cpp:298:31: runtime error: shift exponent 32 is too large for 32-bit type 'int'

Changes: https://git.openjdk.org/jdk/pull/24696/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24696&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8332368
  Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/24696.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24696/head:pull/24696

PR: https://git.openjdk.org/jdk/pull/24696

From fjiang at openjdk.org  Thu Apr 17 12:34:53 2025
From: fjiang at openjdk.org (Feilong Jiang)
Date: Thu, 17 Apr 2025 12:34:53 GMT
Subject: RFR: 8354815: RISC-V: Change type of bitwise rotation shift to iRegIorL2I
In-Reply-To:
References:
Message-ID:

On Mon, 14 Apr 2025 09:46:34 GMT, Anjian-Wen wrote:

> There is no need to do a type conversion when the shift amount of a bitwise rotation is an integer converted from long (ConvL2I).
> The reason is that these instructions perform a rotate right/left of the source by the amount in the least-significant 5/6 bits
> of the shift amount, depending on the width of the operation (32-bit/64-bit). For 32-bit operations, the resulting 32-bit
> value is sign-extended by copying bit 31 to all of the more-significant bits. This means that we could use the iRegIorL2I type for
> the source of these 32-bit operations as well.
>
> Jtreg testing in progress

Looks good!

-------------

Marked as reviewed by fjiang (Committer).
PR Review: https://git.openjdk.org/jdk/pull/24618#pullrequestreview-2775572348 From adinn at openjdk.org Thu Apr 17 14:13:51 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Thu, 17 Apr 2025 14:13:51 GMT Subject: RFR: 8332368: ubsan aarch64: immediate_aarch64.cpp:298:31: runtime error: shift exponent 32 is too large for 32-bit type 'int' In-Reply-To: <9mxXDDhuFS_jZmqSkYtOtsYfmpP9r-616HsIFGBDTUE=.e0b5b3cd-0fc3-434e-95aa-840344648beb@github.com> References: <9mxXDDhuFS_jZmqSkYtOtsYfmpP9r-616HsIFGBDTUE=.e0b5b3cd-0fc3-434e-95aa-840344648beb@github.com> Message-ID: On Wed, 16 Apr 2025 18:32:16 GMT, Boris Ulasevich wrote: > Running a linux-aarch64-server-fastdebug build with UBSAN (--enable-ubsan configure option) gives the following runtime error immediately on JVM start: > > > ad_aarch64.hpp:7114:11: runtime error: shift exponent 100 is too large for 32-bit type 'unsigned int' > # Pipeline_Use_Cycle_Mask::operator<<=(int) make/hotspot/ad_aarch64.hpp:7114 > # Pipeline_Use_Element::step(unsigned int) make/hotspot/ad_aarch64.hpp:7168 > # Pipeline_Use::step(unsigned int) make/hotspot/ad_aarch64.hpp:7216 > # Scheduling::step(unsigned int) src/hotspot/share/opto/output.cpp:2116 > # Scheduling::AddNodeToBundle(Node*, Block const*) src/hotspot/share/opto/output.cpp:2553 > > > The value of 100 comes from fixed_latency, defined in AD files: > > // Pipeline class for call. > pipe_class pipe_class_call() > %{ > single_instruction; > fixed_latency(100); > %} > > > The fixed_latency value is used by the scheduler to model the occupancy of functional units over time. The occupancy is tracked using a uint mask value: > > fprintf(fp_hpp, "class Pipeline_Use_Element {\n"); > fprintf(fp_hpp, "protected:\n"); > fprintf(fp_hpp, " // Mask of used functional units\n"); > fprintf(fp_hpp, " uint _used;\n\n"); > > When the scheduler virtually steps over the instruction, it shifts the masks left by the instruction's latency. 
> The problem is that 100 exceeds the bit width of uint (8 * sizeof(uint) = 32), and left-shifting by 100 effectively zeroes the _mask, but, according to the C++ standard, this is undefined behavior.
>
> We can find a number of fixed_latency(100) expressions in the aarch64.ad, arm.ad, ppc.ad, riscv.ad and x86_64.ad files. Perhaps all of them deserve correction. I suggest leaving the AD files as they are, but limiting the shift value in the generated code in case it exceeds the allowed maximum:
>
>  void step(uint cycles) {
>    _used = 0;
> -  _mask <<= cycles;
> +  uint max_shift = 8 * sizeof(_mask) - 1;
> +  _mask <<= (cycles < max_shift) ? cycles : max_shift;
>  }
>
> In fact, this change does not affect the current behavior; we just eliminate the undefined behavior while preserving the intended semantics.

I believe this is the right solution.

-------------

Marked as reviewed by adinn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/24696#pullrequestreview-2775910433

From rcastanedalo at openjdk.org  Thu Apr 17 14:14:30 2025
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Thu, 17 Apr 2025 14:14:30 GMT
Subject: RFR: 8354520: IGV: dump contextual information
Message-ID:

This changeset extends the IGV graph dumps with additional properties that ease tracing the dumps back to the context in which they were produced. The changeset dumps, for every compilation, the following additional properties:

- JVM arguments
- platform information
- JVM version information
- date and time
- process ID
- (compiler) thread ID

![compilation-properties](https://github.com/user-attachments/assets/8ddc8fb9-c348-4761-8e19-c70633a1b59f)

Additionally, the changeset produces and dumps the C2 stack trace from which each graph is dumped:

![c2-stack-trace](https://github.com/user-attachments/assets/085547ee-b0b3-4a38-86f1-9df79cf1cc01)

This should be particularly useful in an interactive context, where the user steps through C2 code using a debugger and dumps graphs at different points.
To produce a stack trace in this context, the usual debugger-entry C2 functions (`igv_print`, `igv_append`, `Node::dump_bfs`, ...) are extended with extra arguments to specify the stack handling registers (stack pointer, frame pointer, and program counter):

![c2-stack-trace-from-gdb](https://github.com/user-attachments/assets/29de2964-ee2d-4f5f-bcf7-d81e1bc6c8a6)

The inconvenience of manually specifying the stack handling registers can be addressed by hiding them in debugger user-defined commands, e.g.:

    define igv
      p igv_print(true, $sp, $fp, $pc)
    end

    define igv_node
      p find_node($arg0)->dump_bfs(0, 0, "!", $sp, $fp, $pc)
    end

Thanks to @TobiHartmann for providing useful feedback!

#### Testing

- tier1 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64; release and debug mode).
- Tested interactive usage manually via `gdb` and `rr` on linux-x64.
- Tested automatically that dumping thousands of graphs does not trigger any assertion failure.

-------------

Commit messages:
 - Instrument dump_bfs functionality to dump C2 stack traces
 - Dump also process and thread ids
 - Remove unused import
 - Document gdb usage
 - Remove default arguments for debugger functions
 - Dump contextual information

Changes: https://git.openjdk.org/jdk/pull/24724/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24724&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8354520
  Stats: 214 lines in 7 files changed: 178 ins; 0 del; 36 mod
  Patch: https://git.openjdk.org/jdk/pull/24724.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24724/head:pull/24724

PR: https://git.openjdk.org/jdk/pull/24724

From jbhateja at openjdk.org  Thu Apr 17 15:51:45 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Thu, 17 Apr 2025 15:51:45 GMT
Subject: RFR: 8348638: Performance regression in Math.tanh [v5]
In-Reply-To: <1bOfSMpeLQHGXAGf_vLHc3GVFz2TlEiPPbXzkeOU0N0=.f82dcfb1-055f-40d5-b043-1dab17a08d0a@github.com>
References:
<1bOfSMpeLQHGXAGf_vLHc3GVFz2TlEiPPbXzkeOU0N0=.f82dcfb1-055f-40d5-b043-1dab17a08d0a@github.com> Message-ID: On Thu, 10 Apr 2025 00:12:07 GMT, Mohamed Issa wrote: >> The changes described below are meant to resolve the performance regression introduced by the **x86_64 tanh** double precision floating point scalar intrinsic in #20657. Additionally, a new micro-benchmark is included to check the performance of specific input value ranges to help prevent regressions in the future. >> >> 1. Check and handle high magnitude input values before those in other ranges. If found, **+/- 1** is returned almost immediately without having to go through too many computations or branches. >> 2. Reduce the lower bound of the input range that triggers a quick **+/- 1** return from **|x| >= 32** to **|x| >= 22**. This new endpoint is the exact value required for correctness that's used by the original OpenJDK implementation. >> >> The results of all tests posted below were captured with an [Intel? Xeon 6761P](https://www.intel.com/content/www/us/en/products/sku/241842/intel-xeon-6761p-processor-336m-cache-2-50-ghz/specifications.html) using [OpenJDK v25-b15](https://github.com/openjdk/jdk/releases/tag/jdk-25%2B15) as the baseline version. >> >> For the first set of performance data collected with the new built-in **tanhRange** micro-benchmark, see the table below. Each result is the mean of 8 individual runs, and the input ranges used match those in the bug report with two additional ones included. In the high value scenarios (100, 1000, 10000, 100000), the changes significantly increase throughput values over the baseline. Also, there is almost no impact to the low value (1, 2, 10, 20) scenarios. 
>>
>> | Input range(s)  | Baseline (ops/s) | Change (ops/s) | Change vs baseline (%) |
>> | :-------------: | :--------------: | :------------: | :--------------------: |
>> | [-1, 1]         | 26.043           | 25.929         | -0.44                  |
>> | [-2, 2]         | 25.330           | 25.260         | -0.28                  |
>> | [-10, 10]       | 24.930           | 24.936         | +0.02                  |
>> | [-20, 20]       | 24.908           | 24.844         | -0.26                  |
>> | [-100, 100]     | 53.813           | 76.650         | +42.44                 |
>> | [-1000, 1000]   | 84.459           | 115.106        | +36.29                 |
>> | [-10000, 10000] | 93.980           | 123.320        | +31.22 ...
>
> Mohamed Issa has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision:
>
>   Add new tanh micro-benchmark that covers different ranges of input values

Do you think we should change the ulp threshold in test/jdk/java/lang/Math/HyperbolicTests.java from the existing 3.0 to 2.5 to match the specs?

test/micro/org/openjdk/bench/java/lang/MathBench.java line 70:

> 68:
> 69:     @Param("0")
> 70:     public double tanhBound1;

Suggestion:

    @Param({"0", "1", "2", "3"})
    public int tanhRangeIndex;

test/micro/org/openjdk/bench/java/lang/MathBench.java line 73:

> 71:
> 72:     @Param("2.7755575615628914E-17")
> 73:     public double tanhBound2;

We can declare tanhRangeIndex as a parameter and then select from [hard-coded value ranges](https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/FdLibm.java#L3258), which will allow us to exercise all the special ranges and the NaN value.

    double[][] tanhRangeArray = { {0.0, 0x1.0P-56}, {0x1.0P-56, 1.0}, {1.0, 22.0}, {22.0, Double.POSITIVE_INFINITY} };
    double tanhRangeLowerBound = tanhRangeArray[tanhRangeIndex][0];
    double tanhRangeUpperBound = tanhRangeArray[tanhRangeIndex][1];

test/micro/org/openjdk/bench/java/lang/MathBench.java line 549:

> 547:         for (int i = 0; i < tanhValueCount; i++) {
> 548:             sum += Math.tanh(tanhPosVector[i]) + Math.tanh(tanhNegVector[i]);
> 549:         }

You can remove the noise from the benchmark by assigning the array element to a double field in an Invocation-level setup and then passing that field directly as the argument to tanh. Refer to https://github.com/openjdk/jdk/blob/master/test/micro/org/openjdk/bench/java/util/ArraysSort.java#L109

test/micro/org/openjdk/bench/java/lang/MathBench.java line 551:

> 549:         }
> 550:         return sum;
> 551:     }

Please also add benchmark kernels receiving constant inputs, e.g. Math.tanh(1.0). The current handling of transcendental intrinsics creates a stub call node during parsing, which leaves no room for constant-folding Value transforms. Creating a macro IR node that runs through GVN optimization and lazily expands to a CallNode should fix it. We already have a similar JBS issue, https://bugs.openjdk.org/browse/JDK-8350831, but it's good to add a benchmark for now.
------------- PR Review: https://git.openjdk.org/jdk/pull/23889#pullrequestreview-2775553709 PR Review Comment: https://git.openjdk.org/jdk/pull/23889#discussion_r2048933110 PR Review Comment: https://git.openjdk.org/jdk/pull/23889#discussion_r2048942023 PR Review Comment: https://git.openjdk.org/jdk/pull/23889#discussion_r2048835497 PR Review Comment: https://git.openjdk.org/jdk/pull/23889#discussion_r2048864706 From jbhateja at openjdk.org Thu Apr 17 15:51:47 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 17 Apr 2025 15:51:47 GMT Subject: RFR: 8348638: Performance regression in Math.tanh [v5] In-Reply-To: References: <1bOfSMpeLQHGXAGf_vLHc3GVFz2TlEiPPbXzkeOU0N0=.f82dcfb1-055f-40d5-b043-1dab17a08d0a@github.com> Message-ID: On Thu, 17 Apr 2025 13:25:42 GMT, Jatin Bhateja wrote: >> Mohamed Issa has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: >> >> Add new tanh micro-benchmark that covers different ranges of input values > > test/micro/org/openjdk/bench/java/lang/MathBench.java line 73: > >> 71: >> 72: @Param("2.7755575615628914E-17") >> 73: public double tanhBound2; > > We can declare tanBoundIndex as a Parameter and then select from [hard-coded value ranges](https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/FdLibm.java#L3258), which will allow us to execute all the special ranges and NaN value. > double tanhRangeArray [][] = { 0.0 , 0x1.0P-56}, {0x1.0P-56, 1.0}, {1.0, 22.0}, {22.0, Double.POSITIVE_INFINITY}} > double tanhRangeLowerBound = tanhRangeArray[tanhRangeIndex][0]; > double tanhRangeLowerBound = tanhRangeArray[tanhRangeIndex][1]; I see in a standalone micro that NaN / Inf case performs better without intrinsics NaN:- GNR>java -XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_dtanh -cp . 
test [time] 174 ms [res] GNR>java -XX:+UnlockDiagnosticVMOptions -cp . test [time] 278 ms [res] Inf:- GNR>java -XX:+UnlockDiagnosticVMOptions -cp . test [time] 410 ms [res] GNR>java -XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_dtanh -cp . test [time] 165 ms [res] ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23889#discussion_r2048996659 From jbhateja at openjdk.org Thu Apr 17 15:51:48 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 17 Apr 2025 15:51:48 GMT Subject: RFR: 8348638: Performance regression in Math.tanh [v5] In-Reply-To: References: <1bOfSMpeLQHGXAGf_vLHc3GVFz2TlEiPPbXzkeOU0N0=.f82dcfb1-055f-40d5-b043-1dab17a08d0a@github.com> Message-ID: <-fk5qRPorpCSyBajJmRW-eWGZvIEyUnnjocGPPRi4nA=.cf079fd5-ddb7-43c3-afca-e2484b092ef9@github.com> On Thu, 17 Apr 2025 13:48:38 GMT, Jatin Bhateja wrote: >> test/micro/org/openjdk/bench/java/lang/MathBench.java line 73: >> >>> 71: >>> 72: @Param("2.7755575615628914E-17") >>> 73: public double tanhBound2; >> >> We can declare tanBoundIndex as a Parameter and then select from [hard-coded value ranges](https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/FdLibm.java#L3258), which will allow us to execute all the special ranges and NaN value. >> double tanhRangeArray [][] = { 0.0 , 0x1.0P-56}, {0x1.0P-56, 1.0}, {1.0, 22.0}, {22.0, Double.POSITIVE_INFINITY}} >> double tanhRangeLowerBound = tanhRangeArray[tanhRangeIndex][0]; >> double tanhRangeLowerBound = tanhRangeArray[tanhRangeIndex][1]; > > I see in a standalone micro that NaN / Inf case performs better without intrinsics > > NaN:- > GNR>java -XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_dtanh -cp . test > [time] 174 ms [res] > GNR>java -XX:+UnlockDiagnosticVMOptions -cp . test > [time] 278 ms [res] > > Inf:- > GNR>java -XX:+UnlockDiagnosticVMOptions -cp . test > [time] 410 ms [res] > GNR>java -XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_dtanh -cp . 
test > [time] 165 ms [res] It looks like after |x| > 22.0 we always check for NaN/Inf value and then take a different control path, if we flip this and check for NaN values upfront we can avoid going through newly added instruction sequence for |X| > 22.0 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23889#discussion_r2049003055 From kvn at openjdk.org Thu Apr 17 15:57:49 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 17 Apr 2025 15:57:49 GMT Subject: RFR: 8354477: C2 SuperWord: make use of memory edges more explicit In-Reply-To: <-srSeiLD0cMsEZMAjTccIavIlAu-RZ4emjJCg44-Zpk=.67af00d5-cf52-486b-9e14-ce2f8971c371@github.com> References: <-srSeiLD0cMsEZMAjTccIavIlAu-RZ4emjJCg44-Zpk=.67af00d5-cf52-486b-9e14-ce2f8971c371@github.com> Message-ID: On Sun, 13 Apr 2025 13:40:13 GMT, Emanuel Peter wrote: > This is a small refactoring, to make it more explicit that the "additional dependencies" are memory dependencies. > > This is also preparatory work for [JDK-8324751](https://bugs.openjdk.org/browse/JDK-8324751), where we have two memory edge categories: strong and weak. > > I was able to pass around `vtn_dependencies` in fewer occasions, and also renamed it to `vtn_memory_dependencies`. Looks good to me. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24613#pullrequestreview-2776251407 From vlivanov at openjdk.org Thu Apr 17 17:54:33 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 17 Apr 2025 17:54:33 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v9] In-Reply-To: References: Message-ID: <9P_rdGJAfXNZMa82oVtdWXdPO0krCVCRRUkve_Z-ZpU=.9f38ab8c-ce13-47c0-b8b3-a8fa0f9048c1@github.com> > Migrate Vector API math library (SVML and SLEEF) linkage from native code (in JVM) to Java FFM API. > > Since FFM API doesn't support vector calling conventions yet, migration affects only symbol lookup for now. 
But it still enables significant simplifications on JVM side. > > The patch consists of the following parts: > * on-demand symbol lookup in Java code replaces eager lookup from native code during JVM startup; > * 2 new VM intrinsics for vector calls (support unary and binary shapes) (code separated from unary/binary vector operations); > * new internal interface to query supported CPU ISA extensions (`jdk.incubator.vector.CPUFeatures`) used for CPU dispatching. > > `java.lang.foreign` API is used to perform symbol lookup in vector math library, then the address is cached and fed into corresponding JVM intrinsic, so C2 can turn it into a direct vector call in generated code. > > Once `java.lang.foreign` supports vectors & vector calling conventions, VM intrinsics can go away. > > Performance is on par with original implementation (tested with microbenchmarks on linux-x64 and macosx-aarch64). > > Testing: hs-tier1 - hs-tier6, microbenchmarks (on linux-x64 and macosx-aarch64) > > Thanks! Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: RVV and SVE adjustments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24462/files - new: https://git.openjdk.org/jdk/pull/24462/files/1ade1ffd..e2b762ec Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24462&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24462&range=07-08 Stats: 19 lines in 1 file changed: 2 ins; 11 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/24462.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24462/head:pull/24462 PR: https://git.openjdk.org/jdk/pull/24462 From vlivanov at openjdk.org Thu Apr 17 18:03:47 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 17 Apr 2025 18:03:47 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v10] In-Reply-To: References: Message-ID: > Migrate Vector API math library (SVML and SLEEF) linkage from native code (in JVM) to Java FFM 
API. > > Since FFM API doesn't support vector calling conventions yet, migration affects only symbol lookup for now. But it still enables significant simplifications on JVM side. > > The patch consists of the following parts: > * on-demand symbol lookup in Java code replaces eager lookup from native code during JVM startup; > * 2 new VM intrinsics for vector calls (support unary and binary shapes) (code separated from unary/binary vector operations); > * new internal interface to query supported CPU ISA extensions (`jdk.incubator.vector.CPUFeatures`) used for CPU dispatching. > > `java.lang.foreign` API is used to perform symbol lookup in vector math library, then the address is cached and fed into corresponding JVM intrinsic, so C2 can turn it into a direct vector call in generated code. > > Once `java.lang.foreign` supports vectors & vector calling conventions, VM intrinsics can go away. > > Performance is on par with original implementation (tested with microbenchmarks on linux-x64 and macosx-aarch64). > > Testing: hs-tier1 - hs-tier6, microbenchmarks (on linux-x64 and macosx-aarch64) > > Thanks! Vladimir Ivanov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 24 additional commits since the last revision: - Merge remote-tracking branch 'origin/master' into vector.math.01.java - RVV and SVE adjustments - fix broken merge - Merge branch 'master' into vector.math.01.java - Fix debugName handling - Merge branch 'master' into vector.math.01.java - RVV and SVE adjustments - Merge branch 'master' into vector.math.01.java - Fix windows-aarch64 build failure - features_string -> cpu_info_string - ... 
and 14 more: https://git.openjdk.org/jdk/compare/bba47808...88eacc48 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24462/files - new: https://git.openjdk.org/jdk/pull/24462/files/e2b762ec..88eacc48 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24462&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24462&range=08-09 Stats: 5947 lines in 148 files changed: 5156 ins; 446 del; 345 mod Patch: https://git.openjdk.org/jdk/pull/24462.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24462/head:pull/24462 PR: https://git.openjdk.org/jdk/pull/24462 From sviswanathan at openjdk.org Thu Apr 17 18:03:58 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 17 Apr 2025 18:03:58 GMT Subject: RFR: 8348638: Performance regression in Math.tanh [v5] In-Reply-To: <1bOfSMpeLQHGXAGf_vLHc3GVFz2TlEiPPbXzkeOU0N0=.f82dcfb1-055f-40d5-b043-1dab17a08d0a@github.com> References: <1bOfSMpeLQHGXAGf_vLHc3GVFz2TlEiPPbXzkeOU0N0=.f82dcfb1-055f-40d5-b043-1dab17a08d0a@github.com> Message-ID: On Thu, 10 Apr 2025 00:12:07 GMT, Mohamed Issa wrote: >> The changes described below are meant to resolve the performance regression introduced by the **x86_64 tanh** double precision floating point scalar intrinsic in #20657. Additionally, a new micro-benchmark is included to check the performance of specific input value ranges to help prevent regressions in the future. >> >> 1. Check and handle high magnitude input values before those in other ranges. If found, **+/- 1** is returned almost immediately without having to go through too many computations or branches. >> 2. Reduce the lower bound of the input range that triggers a quick **+/- 1** return from **|x| >= 32** to **|x| >= 22**. This new endpoint is the exact value required for correctness that's used by the original OpenJDK implementation. >> >> The results of all tests posted below were captured with an [Intel? 
Xeon 6761P](https://www.intel.com/content/www/us/en/products/sku/241842/intel-xeon-6761p-processor-336m-cache-2-50-ghz/specifications.html) using [OpenJDK v25-b15](https://github.com/openjdk/jdk/releases/tag/jdk-25%2B15) as the baseline version. >> >> For the first set of performance data collected with the new built-in **tanhRange** micro-benchmark, see the table below. Each result is the mean of 8 individual runs, and the input ranges used match those in the bug report with two additional ones included. In the high value scenarios (100, 1000, 10000, 100000), the changes significantly increase throughput values over the baseline. Also, there is almost no impact to the low value (1, 2, 10, 20) scenarios. >> >> | Input range(s) | Baseline (ops/s) | Change (ops/s) | Change vs baseline (%) | >> | :-------------------: | :----------------: | :----------------: | :------------------------: | >> | [-1, 1] | 26.043 | 25.929 | -0.44 | >> | [-2, 2] | 25.330 | 25.260 | -0.28 | >> | [-10, 10] | 24.930 | 24.936 | +0.02 | >> | [-20, 20] | 24.908 | 24.844 | -0.26 | >> | [-100, 100] | 53.813 | 76.650 | +42.44 | >> | [-1000, 1000] | 84.459 | 115.106 | +36.29 | >> | [-10000, 10000] | 93.980 | 123.320 | +31.22 ... > > Mohamed Issa has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > Add new tanh micro-benchmark that covers different ranges of input values Thanks for fixing this. Looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23889#pullrequestreview-2776551550 From sviswanathan at openjdk.org Thu Apr 17 18:07:53 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 17 Apr 2025 18:07:53 GMT Subject: RFR: 8348638: Performance regression in Math.tanh [v5] In-Reply-To: References: <1bOfSMpeLQHGXAGf_vLHc3GVFz2TlEiPPbXzkeOU0N0=.f82dcfb1-055f-40d5-b043-1dab17a08d0a@github.com> Message-ID: On Thu, 17 Apr 2025 15:49:01 GMT, Jatin Bhateja wrote: > Do you think we should modify the ulp threshold of test/jdk/java/lang/Math/HyperbolicTests.java to 2.5 from the existing 3.0 to match the specs? That part of the test is pre-existing and needs to work across architectures; changing it is out of scope for this PR, IMHO. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23889#issuecomment-2813688528 From vlivanov at openjdk.org Thu Apr 17 18:10:46 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 17 Apr 2025 18:10:46 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v5] In-Reply-To: References: Message-ID: On Thu, 17 Apr 2025 01:36:39 GMT, Xiaohong Gong wrote: >> How does it work now? The code in `generate_vector_math_stubs()` in `stubGenerator_aarch64.cpp` only populates `VEC_SIZE_SCALABLE` shapes with SVE versions. > > Please see the `addr` definition code in https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/vectorIntrinsics.cpp#L1877 . If the queried `addr` is `nullptr` for 256-bit vectors, and the arch supports scalable vectors, then the `addr` will be assigned to the scalable ones. Ah, ok. Thanks for the pointer. I aligned lookup logic with existing behavior. I'd like to double-check one thing: is it fine to use scalable vector variants for fixed-sized vector shapes of smaller size without any explicit masking/stripping of the upper vector part?
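The fallback described in the quoted reply (use the scalable entry when no fixed-size entry exists and the architecture supports scalable vectors) can be sketched roughly as follows. Everything in this block is hypothetical: the `MathStubLookupSketch` class, the string keys, and the placeholder values such as `"sin_sve"` are invented for illustration and do not correspond to names in `vectorIntrinsics.cpp` or in the Java-side lookup added by the PR.

```java
import java.util.Map;

public class MathStubLookupSketch {
    // Hypothetical stub table lookup: try the fixed-size entry first,
    // then fall back to the scalable entry if the platform allows it.
    static String lookup(Map<String, String> stubs, String op,
                         int vectorBits, boolean scalableSupported) {
        String fixed = stubs.get(op + "/" + vectorBits);
        if (fixed != null) {
            return fixed;
        }
        if (scalableSupported) {
            // May still be null when no scalable variant is registered.
            return stubs.get(op + "/scalable");
        }
        return null;
    }

    public static void main(String[] args) {
        // Placeholder entries only; real symbol names are not shown here.
        Map<String, String> stubs = Map.of(
                "sin/128", "sin_fixed128",
                "sin/scalable", "sin_sve");
        System.out.println(lookup(stubs, "sin", 128, true));
        System.out.println(lookup(stubs, "sin", 256, true));
        System.out.println(lookup(stubs, "sin", 256, false));
    }
}
```

The sketch only models which entry is selected; it says nothing about whether the upper vector part needs masking, which is the question raised above.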
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2049436133 From kxu at openjdk.org Thu Apr 17 18:23:20 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Thu, 17 Apr 2025 18:23:20 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v3] In-Reply-To: References: Message-ID: > This PR refactors `PhaseIdealLoop::is_counted_loop()` into (mostly) `CountedLoopConverter::is_counted_loop()` and `CountedLoopConverter::convert()` to decouple the detection and conversion code. This enables us to try different loop configurations easily and finally convert once a counted loop is found. > > A nested `PhaseIdealLoop::CountedLoopConverter` class is created to handle the context, but I'm not sure if this is the best name or place for it. Please let me know what you think. > > Blocks [JDK-8336759](https://bugs.openjdk.org/browse/JDK-8336759). Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: WIP: review followups ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24458/files - new: https://git.openjdk.org/jdk/pull/24458/files/b72e7714..4d7738c8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24458&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24458&range=01-02 Stats: 192 lines in 2 files changed: 80 ins; 60 del; 52 mod Patch: https://git.openjdk.org/jdk/pull/24458.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24458/head:pull/24458 PR: https://git.openjdk.org/jdk/pull/24458 From dlong at openjdk.org Thu Apr 17 19:40:57 2025 From: dlong at openjdk.org (Dean Long) Date: Thu, 17 Apr 2025 19:40:57 GMT Subject: RFR: 8354668: Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding [v3] In-Reply-To: References: Message-ID: On Thu, 17 Apr 2025 03:21:08 GMT, Jatin Bhateja wrote: >> ZGC keeps track of multiple placeholders in barrier code snippets through relocations; these are later used to patch appropriate contents
(mostly immediate values) in instruction encoding. While most relocations record the patching offset from the end of the instruction, the SHL instruction, which is used for pointer coloring, computes the patching offset from the starting address of the instruction. >> >> Thus, in case the destination register operand of the SHL instruction is an extended GPR register, we fail to account for the additional REX2 prefix byte in the patch offset, thereby corrupting the encoding, since the runtime then patches the primary opcode byte, resulting in an ILLEGAL instruction exception. >> >> This patch fixes the reported failures by computing the relocation offset of the SHL instruction from the end of the instruction, thereby making the patch offset agnostic to the REX/REX2 prefix. >> >> Please review and share your feedback. >> >> Best Regards, >> Jatin >> >> PS: Validation was performed using the latest Intel Software Development Emulator after modifying the static register allocation order in the x86_64.ad file to give preference to EGPRs. > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolutions When I made my suggestions, I didn't realize it would also require changes on the Graal side. So I would suggest a separate PR only if the Graal team agrees. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24664#issuecomment-2813856674 From sparasa at openjdk.org Thu Apr 17 22:01:02 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Thu, 17 Apr 2025 22:01:02 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v7] In-Reply-To: References: Message-ID: > The current scheme for Intel APX NDD code generation favors the emission of NDD instruction on APX-enabled targets, even if destination and source registers are the same. To prevent this, this PR extends the assembler layer to demote EEVEX to REX encoding if dst matches with source operands.
Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: refactor APX NDD shift instrcutions to do demotion internally ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24431/files - new: https://git.openjdk.org/jdk/pull/24431/files/3b5d9856..ca750e3e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=05-06 Stats: 68 lines in 2 files changed: 28 ins; 34 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/24431.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24431/head:pull/24431 PR: https://git.openjdk.org/jdk/pull/24431 From duke at openjdk.org Thu Apr 17 22:22:27 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Thu, 17 Apr 2025 22:22:27 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v8] In-Reply-To: References: Message-ID: > This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). > > When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. > > This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. 
Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: Fix call sites and change relocation to opt in ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23573/files - new: https://git.openjdk.org/jdk/pull/23573/files/a7f32409..e12e2c18 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=06-07 Stats: 205 lines in 8 files changed: 134 ins; 25 del; 46 mod Patch: https://git.openjdk.org/jdk/pull/23573.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23573/head:pull/23573 PR: https://git.openjdk.org/jdk/pull/23573 From duke at openjdk.org Thu Apr 17 22:27:01 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Thu, 17 Apr 2025 22:27:01 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v9] In-Reply-To: References: Message-ID: > This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). > > When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. > > This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. 
Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: Undo AsmRemarks and DbgStrings reuse ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23573/files - new: https://git.openjdk.org/jdk/pull/23573/files/e12e2c18..8e53ed1c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=07-08 Stats: 207 lines in 2 files changed: 99 ins; 108 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23573.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23573/head:pull/23573 PR: https://git.openjdk.org/jdk/pull/23573 From duke at openjdk.org Thu Apr 17 22:35:26 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Thu, 17 Apr 2025 22:35:26 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v10] In-Reply-To: References: Message-ID: <3tg6B-TOIKnReUlHQSAfh-ymf0nql5EYC7p3oRHB1UE=.09fbbbf5-2368-4c04-8c9e-0a5720dbd637@github.com> > This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). > > When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. > > This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. 
Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: Style ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23573/files - new: https://git.openjdk.org/jdk/pull/23573/files/8e53ed1c..41217686 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=08-09 Stats: 23 lines in 2 files changed: 0 ins; 3 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/23573.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23573/head:pull/23573 PR: https://git.openjdk.org/jdk/pull/23573 From duke at openjdk.org Thu Apr 17 22:41:02 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Thu, 17 Apr 2025 22:41:02 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v11] In-Reply-To: References: Message-ID: > This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). > > When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. > > This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. 
Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: Remove whitespace ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23573/files - new: https://git.openjdk.org/jdk/pull/23573/files/41217686..27e41510 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=09-10 Stats: 6 lines in 1 file changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/23573.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23573/head:pull/23573 PR: https://git.openjdk.org/jdk/pull/23573 From sparasa at openjdk.org Fri Apr 18 00:19:07 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Fri, 18 Apr 2025 00:19:07 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v8] In-Reply-To: References: Message-ID: > The current scheme for Intel APX NDD code generation favors the emission of NDD instruction on APX-enabled targets, even if destination and source registers are the same. To prevent this, this PR extends the assembler layer to demote EEVEX to REX encoding if dst matches with source operands. 
Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: refactor imul instructions to fold demotion logic inside ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24431/files - new: https://git.openjdk.org/jdk/pull/24431/files/ca750e3e..1fa0fbe4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=06-07 Stats: 79 lines in 2 files changed: 35 ins; 28 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/24431.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24431/head:pull/24431 PR: https://git.openjdk.org/jdk/pull/24431 From duke at openjdk.org Fri Apr 18 01:36:10 2025 From: duke at openjdk.org (erifan) Date: Fri, 18 Apr 2025 01:36:10 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v2] In-Reply-To: References: Message-ID: > This patch optimizes the following patterns: > For integer types: > > (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) > => (VectorMaskCmp src1 src2 ncond) > (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) > => (VectorMaskCmp src1 src2 ncond) > > cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the negative comparison of cond. > > For float and double types: > > (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) > => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) > (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) > => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) > > cond can be eq or ne. 
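A scalar illustration may help with the eq/ne restriction for float and double in the 8354242 description above. The likely reason (an inference, not stated in the thread) is NaN: for ordered comparisons, negating the result is not the same as using the opposite condition when an input is NaN, whereas eq and ne stay exact complements. The class and method names below (`NegatedCompare` and its helpers) are hypothetical and use plain Java operators rather than the Vector API.

```java
public class NegatedCompare {
    // Integer lanes: !(a < b) always agrees with (a >= b).
    static boolean intLtNegationHolds(int a, int b) {
        return !(a < b) == (a >= b);
    }

    // Floating-point lanes: the same rewrite can fail when an input is NaN,
    // because both (a < b) and (a >= b) are false in that case.
    static boolean doubleLtNegationHolds(double a, double b) {
        return !(a < b) == (a >= b);
    }

    // eq and ne remain exact complements even in the presence of NaN.
    static boolean doubleEqNegationHolds(double a, double b) {
        return !(a == b) == (a != b);
    }

    public static void main(String[] args) {
        System.out.println(intLtNegationHolds(3, 5));
        System.out.println(doubleLtNegationHolds(Double.NaN, 1.0));
        System.out.println(doubleEqNegationHolds(Double.NaN, 1.0));
    }
}
```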
> > Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option `-XX:UseSVE=2`: > > Benchmark Unit Before Score Error After Score Error Uplift > testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 10266136.26 8955.008548 1.29 > testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 1179760.772 448.031844 1.33 > testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 2359520.803 896.305743 1.33 > testCompareEQMaskNotInt ops/s 1787221.411 977.743935 2353952.519 960.069976 1.31 > testCompareEQMaskNotLong ops/s 895297.1974 673.44808 1178449.02 323.804205 1.31 > testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 4712761.965 2110.862053 1.41 > testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 10251646.9 9486.699831 1.29 > testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 2352855.205 1251.952546 1.39 > testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 1177811.493 521.1229 1.37 > testCompareGEMaskNotShort ops/s 3341860.309 1578.975338 4714008.434 1681.10365 1.41 > testCompareGTMaskNotByte ops/s 7910823.674 2993.367032 10245063.58 9774.75138 1.29 > testCompareGTMaskNotInt ops/s 1673393.928 3153.099431 2353654.521 1190.848583 1.4 > testCompareGTMaskNotLong ops/s 849405.9159 2432.858159 1177952.041 359.96413 1.38 > testCompareGTMaskNotShort ops/s 3339509.141 3339.976585 4711442.496 2673.364893 1.41 > testCompareLEMaskNotByte ops/s 7911340.004 3114.69191 10231626.5 27134.20035 1.29 > testCompareLEMaskNotInt ops/s 1675812.113 1340.969885 2353255.341 1452.4522 1.4 > testCompareLEMaskNotLong ops/s 848862.8036 6564.841731 1177763.623 539.290106 1.38 > testCompareLEMaskNotShort ops/s 3324951.54 2380.29473 4712116.251 1544.559684 1.41 > testCompareLTMaskNotByte ops/s 7910390.844 2630.861436 10239567.69 6487.441672 1.29 > testCompareLTMaskNotInt ops/s 1672180.09 995.238142 2353757.863 853.774734 1.4 > testCompareLTMaskNotLong ops/s 856502.26... erifan has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: - Merge branch 'master' into JDK-8354242 - 8354242: VectorAPI: combine vector not operation with compare This patch optimizes the following patterns: For integer types: ``` (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) => (VectorMaskCmp src1 src2 ncond) (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) => (VectorMaskCmp src1 src2 ncond) ``` cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the negative comparison of cond. For float and double types: ``` (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) ``` cond can be eq or ne. Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option `-XX:UseSVE=2`: ``` Benchmark Unit Before Score Error After Score Error Uplift testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 10266136.26 8955.008548 1.29 testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 1179760.772 448.031844 1.33 testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 2359520.803 896.305743 1.33 testCompareEQMaskNotInt ops/s 1787221.411 977.743935 2353952.519 960.069976 1.31 testCompareEQMaskNotLong ops/s 895297.1974 673.44808 1178449.02 323.804205 1.31 testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 4712761.965 2110.862053 1.41 testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 10251646.9 9486.699831 1.29 testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 2352855.205 1251.952546 1.39 testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 1177811.493 521.1229 1.37 testCompareGEMaskNotShort ops/s 3341860.309 1578.975338 4714008.434 1681.10365 1.41 testCompareGTMaskNotByte ops/s 7910823.674 2993.367032 10245063.58 9774.75138 1.29 testCompareGTMaskNotInt 
ops/s 1673393.928 3153.099431 2353654.521 1190.848583 1.4 testCompareGTMaskNotLong ops/s 849405.9159 2432.858159 1177952.041 359.96413 1.38 testCompareGTMaskNotShort ops/s 3339509.141 3339.976585 4711442.496 2673.364893 1.41 testCompareLEMaskNotByte ops/s 7911340.004 3114.69191 10231626.5 27134.20035 1.29 testCompareLEMaskNotInt ops/s 1675812.113 1340.969885 2353255.341 1452.4522 1.4 testCompareLEMaskNotLong ops/s 848862.8036 6564.841731 1177763.623 539.290106 1.38 testCompareLEMaskNotShort ops/s 3324951.54 2380.29473 4712116.251 1544.559684 1.41 testCompareLTMaskNotByte ops/s 7910390.844 2630.861436 10239567.69 6487.441672 1.29 testCompareLTMaskNotInt ops/s 1672180.09 995.238142 2353757.863 853.774734 1.4 testCompareLTMaskNotLong ops/s 856502.2695 12276.82851 1177671.815 496.723302 1.37 testCompareLTMaskNotShort ops/s 3325798.025 2412.702501 4711554.181 1779.302112 1.41 testCompareNEMaskNotByte ops/s 7910002.518 2771.82477 10245315.33 16321.93935 1.29 testCompareNEMaskNotDouble ops/s 863754.6022 523.140788 1179133.982 476.572178 1.36 testCompareNEMaskNotFloat ops/s 1723321.883 2598.484803 2358492.186 877.1401 1.36 testCompareNEMaskNotInt ops/s 1670288.841 751.774826 2354158.125 835.720163 1.4 testCompareNEMaskNotLong ops/s 836327.6835 410.525466 1178178.825 308.757932 1.4 testCompareNEMaskNotShort ops/s 3327815.841 1511.978763 4711379.136 2336.505531 1.41 testCompareUGEMaskNotByte ops/s 7906699.024 3200.936474 10253843.74 15067.59401 1.29 testCompareUGEMaskNotInt ops/s 1674003.923 3287.191727 2353340.666 951.381021 1.4 testCompareUGEMaskNotLong ops/s 852424.5562 8920.408939 1177943.609 389.6621 1.38 testCompareUGEMaskNotShort ops/s 3327255.858 1584.885143 4711622.355 1247.215277 1.41 testCompareUGTMaskNotByte ops/s 7909249.189 4435.283667 10245541.34 10993.34739 1.29 testCompareUGTMaskNotInt ops/s 1693713.433 20650.00213 2353153.787 1055.343846 1.38 testCompareUGTMaskNotLong ops/s 851022.3395 7079.065268 1177910.677 538.604598 1.38 testCompareUGTMaskNotShort ops/s 
3327236.988 1616.886789 4711209.865 3098.494145 1.41 testCompareULEMaskNotByte ops/s 7909350.825 3251.262342 10261449.03 7273.831341 1.29 testCompareULEMaskNotInt ops/s 1672350.925 1545.304304 2353231.755 914.231193 1.4 testCompareULEMaskNotLong ops/s 853349.4765 9804.906913 1177967.254 435.044367 1.38 testCompareULEMaskNotShort ops/s 3325757.891 1555.062257 4712873.187 1650.986905 1.41 testCompareULTMaskNotByte ops/s 7912218.621 2633.477744 10242095.98 21921.39902 1.29 testCompareULTMaskNotInt ops/s 1673994.849 2672.507666 2353449.22 946.105757 1.4 testCompareULTMaskNotLong ops/s 849032.5868 10406.06689 1177586.047 506.541456 1.38 testCompareULTMaskNotShort ops/s 3328062.026 1892.991844 4713247.216 1855.983724 1.41 ``` With option `-XX:UseSVE=0`: ``` Benchmark Unit Before Score Error After Score Error Uplift testCompareEQMaskNotByte ops/s 7895961.919 72712.90804 7746493.731 71481.92938 0.98 testCompareEQMaskNotDouble ops/s 789811.0455 384.493088 766473.7994 2216.581793 0.97 testCompareEQMaskNotFloat ops/s 1806305.818 638.010451 1819616.613 3295.38958 1 testCompareEQMaskNotInt ops/s 1815820.144 1225.336135 1849538.401 766.29902 1.01 testCompareEQMaskNotLong ops/s 807336.492 335.451807 792732.9483 277.954432 0.98 testCompareEQMaskNotShort ops/s 4818266.38 1927.862665 4668903.001 1922.782715 0.96 testCompareGEMaskNotByte ops/s 7818439.678 75374.97739 16498003.98 41440.49653 2.11 testCompareGEMaskNotInt ops/s 1815159.05 1090.912209 2372095.779 1664.397112 1.3 testCompareGEMaskNotLong ops/s 804324.5575 2301.686878 927919.8507 371.766719 1.15 testCompareGEMaskNotShort ops/s 4818966.563 2443.643652 5385561.038 29558.37423 1.11 testCompareGTMaskNotByte ops/s 7893406.157 82687.74264 16470663.2 22165.55812 2.08 testCompareGTMaskNotInt ops/s 1815316.812 915.894106 2370447.198 655.016338 1.3 testCompareGTMaskNotLong ops/s 807019.456 526.525482 928079.0541 330.582693 1.15 testCompareGTMaskNotShort ops/s 4820552.881 1684.247747 5355902.93 5893.2915 1.11 testCompareLEMaskNotByte 
ops/s 7816263.323 79560.0015 16473621.19 56688.99585 2.1 testCompareLEMaskNotInt ops/s 1814915.724 926.998625 2368790.306 932.594778 1.3 testCompareLEMaskNotLong ops/s 806483.9 935.718082 928110.9074 407.096695 1.15 testCompareLEMaskNotShort ops/s 4813660.241 6817.870509 5357107.852 10061.47975 1.11 testCompareLTMaskNotByte ops/s 7838948.962 69136.4504 16424405.96 24464.75469 2.09 testCompareLTMaskNotInt ops/s 1815056.833 1187.6453 2369892.187 1103.819634 1.3 testCompareLTMaskNotLong ops/s 806602.1804 287.923365 928346.4118 617.682824 1.15 testCompareLTMaskNotShort ops/s 4817940.643 2767.1509 5372537.84 15397.47169 1.11 testCompareNEMaskNotByte ops/s 9078493.798 4630.339307 16484348.42 18925.88346 1.81 testCompareNEMaskNotDouble ops/s 661769.6272 398.712981 926763.5839 1808.843788 1.4 testCompareNEMaskNotFloat ops/s 1570527.252 563.642144 2312425.678 1815.844846 1.47 testCompareNEMaskNotInt ops/s 1619146.58 626.793854 2369711.543 942.330478 1.46 testCompareNEMaskNotLong ops/s 680201.5381 2252.836482 927808.6147 414.917863 1.36 testCompareNEMaskNotShort ops/s 3763508.054 3622.560798 5367808.015 8591.466599 1.42 testCompareUGEMaskNotByte ops/s 7886373.129 75917.74675 16480928.93 27524.31005 2.08 testCompareUGEMaskNotInt ops/s 1815636.832 750.036241 2369683.015 901.609404 1.3 testCompareUGEMaskNotLong ops/s 806862.5826 287.819616 928001.4394 361.063837 1.15 testCompareUGEMaskNotShort ops/s 4820581.361 2098.537435 5375854.248 25619.40165 1.11 testCompareUGTMaskNotByte ops/s 7891591.465 96614.93542 16410405.93 15012.37096 2.07 testCompareUGTMaskNotInt ops/s 1814871.179 662.825588 2371325.903 1170.491164 1.3 testCompareUGTMaskNotLong ops/s 804013.7658 2240.534209 928062.2169 531.306897 1.15 testCompareUGTMaskNotShort ops/s 4818150.337 3051.717685 5381449.337 21212.34187 1.11 testCompareULEMaskNotByte ops/s 7831540.628 81306.67253 16495250.78 38682.19675 2.1 testCompareULEMaskNotInt ops/s 1814484.14 687.860656 2369265.075 940.609586 1.3 testCompareULEMaskNotLong ops/s 
807780.5749 769.876816 927538.0732 1278.267724 1.14 testCompareULEMaskNotShort ops/s 4817437.42 5141.336541 5356183.359 7015.608124 1.11 testCompareULTMaskNotByte ops/s 7849078.225 56753.59764 16395975.27 34043.67295 2.08 testCompareULTMaskNotInt ops/s 1814328.226 2697.219111 2370700.47 1991.841988 1.3 testCompareULTMaskNotLong ops/s 807166.8197 253.061506 927926.2803 252.933462 1.14 testCompareULTMaskNotShort ops/s 4821098.216 1625.959044 5348980.243 4100.768121 1.1 ``` Benchmarks on AMD EPYC 9124 16-Core Processor: With option `-XX:UseAVX=3`: ``` Benchmark Unit Before Score Error After Score Error Uplift testCompareEQMaskNotByte ops/s 16607323.35 1233692.631 18381557.66 1163201.522 1.1 testCompareEQMaskNotDouble ops/s 2114285.245 58782.2534 2959946.353 43016.0445 1.39 testCompareEQMaskNotFloat ops/s 4480874.437 89975.29074 6960151.436 64799.143 1.55 testCompareEQMaskNotInt ops/s 4370906.91 51784.80889 6856955.043 313858.5504 1.56 testCompareEQMaskNotLong ops/s 2080065.895 26762.06732 2939142.143 67179.05314 1.41 testCompareEQMaskNotShort ops/s 7968282.563 210437.2781 12701214.56 473152.6407 1.59 testCompareGEMaskNotByte ops/s 18419141.89 473408.9451 19880059.68 321638.0397 1.07 testCompareGEMaskNotInt ops/s 4419015.62 77352.98633 7037639.227 151066.0383 1.59 testCompareGEMaskNotLong ops/s 2147982.48 49227.42782 3000275.928 39298.75344 1.39 testCompareGEMaskNotShort ops/s 8469039.613 17833.19707 12288229.49 244317.8812 1.45 testCompareGTMaskNotByte ops/s 18728997.5 468328.8358 20544730.05 392264.6466 1.09 testCompareGTMaskNotInt ops/s 4510009.705 78812.57357 7364629.942 70970.78473 1.63 testCompareGTMaskNotLong ops/s 2124104.969 40917.89257 2953536.279 35199.19687 1.39 testCompareGTMaskNotShort ops/s 8690557.621 311534.1159 12344017.51 457931.8741 1.42 testCompareLEMaskNotByte ops/s 17758400.53 478383.4945 19209183.26 1143297.241 1.08 testCompareLEMaskNotInt ops/s 4363664.862 43443.18063 7054093.064 78141.11476 1.61 testCompareLEMaskNotLong ops/s 2068632.213 
29844.78023 2954766.412 50667.22502 1.42 testCompareLEMaskNotShort ops/s 8637608.548 183538.5511 12719010.27 473568.8825 1.47 testCompareLTMaskNotByte ops/s 14406138.95 423105.0163 17292417.96 371386.9689 1.2 testCompareLTMaskNotInt ops/s 4546707.266 131977.3144 7040483.394 213590.4657 1.54 testCompareLTMaskNotLong ops/s 2123277.356 47243.21499 2848720.442 58896.97045 1.34 testCompareLTMaskNotShort ops/s 7570169.363 649873.6295 11945383.75 988276.5955 1.57 testCompareNEMaskNotByte ops/s 18274529.55 683396.7384 19081938.8 1118739.778 1.04 testCompareNEMaskNotDouble ops/s 2112533.61 43295.50012 2912115.441 78189.51083 1.37 testCompareNEMaskNotFloat ops/s 4628683.814 93817.07362 6967208.729 145135.8544 1.5 testCompareNEMaskNotInt ops/s 4470900.214 75974.50842 7286913.662 116328.5277 1.62 testCompareNEMaskNotLong ops/s 2134091.061 46377.94061 2934667.477 81675.46021 1.37 testCompareNEMaskNotShort ops/s 8790384.287 396161.8599 13076858.35 286272.1155 1.48 testCompareUGEMaskNotByte ops/s 18009150.9 660803.8886 17551258.33 1667014.843 0.97 testCompareUGEMaskNotInt ops/s 4442928.74 83190.81019 6854088.277 329008.8901 1.54 testCompareUGEMaskNotLong ops/s 2088357.736 71696.24791 2973202.26 63278.78974 1.42 testCompareUGEMaskNotShort ops/s 8348624.02 116562.7876 12832250.78 546869.3006 1.53 testCompareUGTMaskNotByte ops/s 17871101.25 800199.6321 19902619.81 214003.3262 1.11 testCompareUGTMaskNotInt ops/s 4088304.421 137797.9723 7135454.33 124553.651 1.74 testCompareUGTMaskNotLong ops/s 2070610.42 19881.82182 2991536.365 36260.60767 1.44 testCompareUGTMaskNotShort ops/s 8637099.341 155822.1608 12756579.77 186068.199 1.47 testCompareULEMaskNotByte ops/s 17940901.36 1258029.364 18932484.94 694554.6305 1.05 testCompareULEMaskNotInt ops/s 4369177.511 74982.31936 6392773.082 550171.2266 1.46 testCompareULEMaskNotLong ops/s 2135905.761 43693.63178 2877579.631 41651.56289 1.34 testCompareULEMaskNotShort ops/s 8607710.544 132655.1676 12446370.04 441718.3035 1.44 
testCompareULTMaskNotByte ops/s 17409912.23 1033204.537 20607479.99 362000.5056 1.18 testCompareULTMaskNotInt ops/s 4386455.9 119192.1635 6920123.264 186158.2845 1.57 testCompareULTMaskNotLong ops/s 2064995.149 38622.2734 2988343.589 39037.90006 1.44 testCompareULTMaskNotShort ops/s 8642182.752 230919.2442 13029582.09 437101.4923 1.5 ``` The small amount of performance degradation is due to test fluctuations. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24674/files - new: https://git.openjdk.org/jdk/pull/24674/files/1f3c5899..1b9c3b36 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24674&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24674&range=00-01 Stats: 10943 lines in 221 files changed: 9732 ins; 611 del; 600 mod Patch: https://git.openjdk.org/jdk/pull/24674.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24674/head:pull/24674 PR: https://git.openjdk.org/jdk/pull/24674 From xgong at openjdk.org Fri Apr 18 01:48:51 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 18 Apr 2025 01:48:51 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v5] In-Reply-To: References: Message-ID: On Thu, 17 Apr 2025 18:08:21 GMT, Vladimir Ivanov wrote: >> Please see the `addr` definition code in https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/vectorIntrinsics.cpp#L1877 . If the queried `addr` is `nullptr` for 256-bit vectors, and the arch supports scalable vectors, then the `addr` will be assigned to the scalable ones. > > Ah, ok. Thanks for the pointer. I aligned lookup logic with existing behavior. > > I'd like to double-check one thing: is it fine to use scalable vector variants for fixed-sized vector shapes of smaller size without any explicit masking/stripping of the upper vector part? Yes, it's fine for lanewise operations. For others like cross-lanes, stores and other operations, we will generate a predicate to strip the upper vector part.
The ops that need the masking are listed here: https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/aarch64_vector.ad#L280 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2049872110 From xgong at openjdk.org Fri Apr 18 01:51:55 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 18 Apr 2025 01:51:55 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v10] In-Reply-To: References: Message-ID: On Thu, 17 Apr 2025 18:03:47 GMT, Vladimir Ivanov wrote: >> Migrate Vector API math library (SVML and SLEEF) linkage from native code (in JVM) to Java FFM API. >> >> Since FFM API doesn't support vector calling conventions yet, migration affects only symbol lookup for now. But it still enables significant simplifications on JVM side. >> >> The patch consists of the following parts: >> * on-demand symbol lookup in Java code replaces eager lookup from native code during JVM startup; >> * 2 new VM intrinsics for vector calls (support unary and binary shapes) (code separated from unary/binary vector operations); >> * new internal interface to query supported CPU ISA extensions (`jdk.incubator.vector.CPUFeatures`) used for CPU dispatching. >> >> `java.lang.foreign` API is used to perform symbol lookup in vector math library, then the address is cached and fed into corresponding JVM intrinsic, so C2 can turn it into a direct vector call in generated code. >> >> Once `java.lang.foreign` supports vectors & vector calling conventions, VM intrinsics can go away. >> >> Performance is on par with original implementation (tested with microbenchmarks on linux-x64 and macosx-aarch64). >> >> Testing: hs-tier1 - hs-tier6, microbenchmarks (on linux-x64 and macosx-aarch64) >> >> Thanks! > > Vladimir Ivanov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 24 additional commits since the last revision: > > - Merge remote-tracking branch 'origin/master' into vector.math.01.java > - RVV and SVE adjustments > - fix broken merge > - Merge branch 'master' into vector.math.01.java > - Fix debugName handling > - Merge branch 'master' into vector.math.01.java > - RVV and SVE adjustments > - Merge branch 'master' into vector.math.01.java > - Fix windows-aarch64 build failure > - features_string -> cpu_info_string > - ... and 14 more: https://git.openjdk.org/jdk/compare/c92b982b...88eacc48 Looks good to me. Thanks for the update! ------------- Marked as reviewed by xgong (Committer). PR Review: https://git.openjdk.org/jdk/pull/24462#pullrequestreview-2777322105 From duke at openjdk.org Fri Apr 18 02:11:45 2025 From: duke at openjdk.org (duke) Date: Fri, 18 Apr 2025 02:11:45 GMT Subject: RFR: 8354815: RISC-V: Change type of bitwise rotation shift to iRegIorL2I In-Reply-To: References: Message-ID: On Mon, 14 Apr 2025 09:46:34 GMT, Anjian-Wen wrote: > There is no need to do a type conversion when the shift amount of bitwise rotation is an integer converted from long (ConvL2I). > The reason is that these instructions perform a rotate right/left of the source by the amount in the least-significant 5/6 bits > of the shift amount depending on the width of the operation (32-bit/64-bit). For 32-bit operations, the resulting 32-bit > value is sign-extended by copying bit 31 to all of the more-significant bits. This means that we could use iRegIorL2I type for > source for these 32-bit operations as well. > > Jtreg > hs:tier1-hs:tier3 tested on linux-riscv64 platform equipped with Zbb. @Anjian-Wen Your change (at version 7fbdf445485f57e33bf6ec0c008769b66928d2fb) is now ready to be sponsored by a Committer.
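Editorial illustration of the 8354815 description quoted above: at the Java level, a 32-bit rotate only consumes the low 5 bits of the distance and a 64-bit rotate the low 6 bits, so narrowing a `long` distance to `int` first cannot change the result. The sketch below checks this with the standard `Integer.rotateLeft` and `Long.rotateRight` methods. The class and method names are illustrative assumptions, and the sketch says nothing about the instructions C2 actually selects on RISC-V.

```java
public class RotateShiftDemo {
    // Rotate an int by a distance that started life as a long (the ConvL2I case).
    static int rotl32(int value, long distance) {
        return Integer.rotateLeft(value, (int) distance);
    }

    // Rotate a long by an int distance; only the low 6 bits of the distance matter.
    static long rotr64(long value, int distance) {
        return Long.rotateRight(value, distance);
    }

    public static void main(String[] args) {
        int value = 0x12345678;
        long wideDistance = 0x1_0000_0025L; // low 5 bits are 5

        // The upper bits of the distance are irrelevant for a 32-bit rotate.
        System.out.println(rotl32(value, wideDistance) == Integer.rotateLeft(value, 5));

        // 70 and 6 agree in their low 6 bits, so the 64-bit rotates agree too.
        long wideValue = 0x0123456789ABCDEFL;
        System.out.println(rotr64(wideValue, 70) == Long.rotateRight(wideValue, 6));
    }
}
```

Both comparisons print `true`, since the Java shift operators used by the rotate methods mask the distance in the same way.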
------------- PR Comment: https://git.openjdk.org/jdk/pull/24618#issuecomment-2814355637 From duke at openjdk.org Fri Apr 18 02:23:46 2025 From: duke at openjdk.org (Anjian-Wen) Date: Fri, 18 Apr 2025 02:23:46 GMT Subject: Integrated: 8354815: RISC-V: Change type of bitwise rotation shift to iRegIorL2I In-Reply-To: References: Message-ID: On Mon, 14 Apr 2025 09:46:34 GMT, Anjian-Wen wrote: > There is no need to do a type conversion when the shift amount of bitwise rotation is an integer converted from long (ConvL2I). > The reason is that these instructions perform a rotate right/left of the source by the amount in the least-significant 5/6 bits > of the shift amount depending on the width of the operation (32-bit/64-bit). For 32-bit operations, the resulting 32-bit > value is sign-extended by copying bit 31 to all of the more-significant bits. This means that we could use iRegIorL2I type for > source for these 32-bit operations as well. > > Jtreg > hs:tier1-hs:tier3 tested on linux-riscv64 platform equipped with Zbb. This pull request has now been integrated. Changeset: 0995b940 Author: Anjian-Wen Committer: Fei Yang URL: https://git.openjdk.org/jdk/commit/0995b9409d910d816276673b5c06fdf7826bfac7 Stats: 27 lines in 3 files changed: 14 ins; 2 del; 11 mod 8354815: RISC-V: Change type of bitwise rotation shift to iRegIorL2I Reviewed-by: fyang, fjiang ------------- PR: https://git.openjdk.org/jdk/pull/24618 From duke at openjdk.org Fri Apr 18 08:39:42 2025 From: duke at openjdk.org (kuaiwei) Date: Fri, 18 Apr 2025 08:39:42 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v12] In-Reply-To: References: Message-ID: On Thu, 17 Apr 2025 09:41:16 GMT, Emanuel Peter wrote: > First batch, need to change trains, I'll continue later :) Thanks for your review. I will check them.
------------- PR Comment: https://git.openjdk.org/jdk/pull/24023#issuecomment-2814930349 From roland at openjdk.org Fri Apr 18 13:36:59 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 18 Apr 2025 13:36:59 GMT Subject: RFR: 8354477: C2 SuperWord: make use of memory edges more explicit In-Reply-To: <-srSeiLD0cMsEZMAjTccIavIlAu-RZ4emjJCg44-Zpk=.67af00d5-cf52-486b-9e14-ce2f8971c371@github.com> References: <-srSeiLD0cMsEZMAjTccIavIlAu-RZ4emjJCg44-Zpk=.67af00d5-cf52-486b-9e14-ce2f8971c371@github.com> Message-ID: On Sun, 13 Apr 2025 13:40:13 GMT, Emanuel Peter wrote: > This is a small refactoring, to make it more explicit that the "additional dependencies" are memory dependencies. > > This is also preparatory work for [JDK-8324751](https://bugs.openjdk.org/browse/JDK-8324751), where we have two memory edge categories: strong and weak. > > I was able to pass around `vtn_dependencies` in fewer occasions, and also renamed it to `vtn_memory_dependencies`. Looks reasonable to me. ------------- Marked as reviewed by roland (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24613#pullrequestreview-2778595641 From roland at openjdk.org Fri Apr 18 14:08:11 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 18 Apr 2025 14:08:11 GMT Subject: RFR: 8349139: C2: Div looses dependency on condition that guarantees divisor not null in counted loop [v6] In-Reply-To: References: Message-ID: > The test crashes because of a division by zero. The `Div` node for > that one is initially part of a counted loop. The control input of the > node is cleared because the divisor is non zero. This is because the > divisor depends on the loop phi and the type of the loop phi is > narrowed down when the counted loop is created. pre/main/post loops > are created, unrolling happens, the main loop looses its backedge. The > `Div` node can then float above the zero trip guard for the main > loop. 
When the zero trip guard is not taken, there's no guarantee the > divisor is non zero so the `Div` node should be pinned below it. > > I propose we revert the change I made with 8334724 which removed > `PhaseIdealLoop::cast_incr_before_loop()`. The `CastII` that this > method inserted was there to handle exactly this problem. It was added > initially for a similar issue but with array loads. That problem with > loads is handled some other way now and that's why I thought it was > safe to proceed with the removal. > > The code in this patch is somewhat different from the one we had > before for a couple reasons: > > 1- assert predicate code evolved and so previous logic can't be > resurrected as it was. > > 2- the previous logic has a bug. > > Regarding 1-: during pre/main/post loop creation, we used to add the > `CastII` and then to add assertion predicates (so assertion predicates > depended on the `CastII`). Then when unrolling, when assertion > predicates are updated, we would skip over the `CastII`. What I > propose here is to add the `CastII` after assertion predicates are > added. As a result, they don't depend on the `CastII` and there's no > need for any extra logic when unrolling happens. This, however, > doesn't work when the assertion predicates are added by RCE. In that > case, I had to add logic to skip over the `CastII` (similar to what > existed before I removed it). > > Regarding 2-: previous implementation for > `PhaseIdealLoop::cast_incr_before_loop()` would add the `CastII` at > the first loop `Phi` it encounters that's a use of the loop increment: > it's usually the iv but not always. I tweaked the test case to show, > this bug can actually cause a crash and changed the logic for > `PhaseIdealLoop::cast_incr_before_loop()` accordingly. 
Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: whitespace ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23617/files - new: https://git.openjdk.org/jdk/pull/23617/files/48956511..19590c1f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23617&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23617&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23617.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23617/head:pull/23617 PR: https://git.openjdk.org/jdk/pull/23617 From roland at openjdk.org Fri Apr 18 14:08:12 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 18 Apr 2025 14:08:12 GMT Subject: RFR: 8349139: C2: Div looses dependency on condition that guarantees divisor not null in counted loop [v2] In-Reply-To: References: <52OYoC5__FdcN8OLwVgdNlb6Fz_IFo8UyKy3GUp5DiM=.708f1ee8-dbbb-4abf-8de0-d94b3b1e2ef6@github.com> Message-ID: On Fri, 21 Mar 2025 17:40:51 GMT, Quan Anh Mai wrote: >> @merykitty see above my late reply to your comments if you missed it. > > @rwestrel I have thought about this issue for a while and come to the conclusion that we do depend on a loop phi being a pinned node when doing optimizations (e.g `init <= iv < limit`). As a result, it seems logical to insert a pinned cast here so that the `Phi` does not freely float away when the loop disappears. I agree with your patch then. @merykitty @eme64 thanks for the reviews. Can one of you approve the latest change (with the extra whitespaces removed)? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23617#issuecomment-2815511857 From roland at openjdk.org Fri Apr 18 14:08:13 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 18 Apr 2025 14:08:13 GMT Subject: RFR: 8349139: C2: Div looses dependency on condition that guarantees divisor not null in counted loop [v5] In-Reply-To: References: Message-ID: On Thu, 17 Apr 2025 09:19:28 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains nine commits: >> >> - review >> - Merge branch 'master' into JDK-8349139 >> - merge >> - Merge branch 'master' into JDK-8349139 >> - other test + review comment >> - Merge branch 'master' into JDK-8349139 >> - Merge branch 'master' into JDK-8349139 >> - Merge branch 'master' into JDK-8349139 >> - fix & test > > test/hotspot/jtreg/compiler/controldependency/TestDivDependentOnMainLoopGuard.java line 34: > >> 32: * -XX:+UnlockDiagnosticVMOptions -XX:+StressGCM TestDivDependentOnMainLoopGuard >> 33: * @run main/othervm -Xcomp -XX:CompileOnly=TestDivDependentOnMainLoopGuard::* TestDivDependentOnMainLoopGuard >> 34: * > > Suggestion: > > > You got some whitespace issues here, we could just remove the line :) Thanks. I missed that. Fixed now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23617#discussion_r2050678323 From dnsimon at openjdk.org Fri Apr 18 15:26:41 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 18 Apr 2025 15:26:41 GMT Subject: RFR: 8355034: [JVMCI] assert(static_cast(_jvmci_data_size) == align_up(compiler->is_jvmci() ? jvmci_data->size() : 0, oopSize)) failed: failed: 104 != 16777320 In-Reply-To: References: Message-ID: On Fri, 18 Apr 2025 13:27:02 GMT, Doug Simon wrote: > After [JDK-8343789](https://bugs.openjdk.org/browse/JDK-8343789), the size of a `JVMCINMethodData` object is limited to `uint16_t`. 
That object embeds the value of `InstalledCode.name` so effectively imposes a limit on the name length. This PR establishes an upper limit on the name value as this name should only be for informative purposes when inspecting compiled code. > > While debugging the problem that exposed the limit, it was confusing that `-XX:+PrintCompilation` did not show the name so this PR builds on [JDK-8336760](https://bugs.openjdk.org/browse/JDK-8336760) to add the name in `PrintCompilation` output for JVMCI "hosted" methods. src/hotspot/share/code/nmethod.cpp line 115: > 113: #define CHECKED_CAST(result, T, thing) \ > 114: result = static_cast(thing); \ > 115: guarantee(static_cast(result) == thing, "failed: %d != %d", static_cast(result), thing); This check is not on a hot path and so can afford to be a guarantee. That said, I've no strong objection to leaving it as an assert. src/hotspot/share/jvmci/jvmciCodeInstaller.cpp line 831: > 829: stringStream st; > 830: st.print_cr("(hosted JVMCI compilation: %s)", name); > 831: CompileTask::print(tty, nm, st.as_string()); @JohnTortugo this may be of interest to you. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24753#discussion_r2050763430 PR Review Comment: https://git.openjdk.org/jdk/pull/24753#discussion_r2050766285 From dnsimon at openjdk.org Fri Apr 18 15:26:40 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 18 Apr 2025 15:26:40 GMT Subject: RFR: 8355034: [JVMCI] assert(static_cast(_jvmci_data_size) == align_up(compiler->is_jvmci() ? jvmci_data->size() : 0, oopSize)) failed: failed: 104 != 16777320 Message-ID: After [JDK-8343789](https://bugs.openjdk.org/browse/JDK-8343789), the size of a `JVMCINMethodData` object is limited to `uint16_t`. That object embeds the value of `InstalledCode.name` so effectively imposes a limit on the name length. This PR establishes an upper limit on the name value as this name should only be for informative purposes when inspecting compiled code.
While debugging the problem that exposed the limit, it was confusing that `-XX:+PrintCompilation` did not show the name so this PR builds on [JDK-8336760](https://bugs.openjdk.org/browse/JDK-8336760) to add the name in `PrintCompilation` output for JVMCI "hosted" methods. ------------- Commit messages: - impose a length limit on InstalledCode.name - convert assert to guarantee - include the name if non-null when printing "hosted" JVMCI compilations Changes: https://git.openjdk.org/jdk/pull/24753/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24753&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8355034 Stats: 76 lines in 4 files changed: 73 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/24753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24753/head:pull/24753 PR: https://git.openjdk.org/jdk/pull/24753 From dnsimon at openjdk.org Fri Apr 18 15:35:41 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 18 Apr 2025 15:35:41 GMT Subject: RFR: 8355034: [JVMCI] assert(static_cast(_jvmci_data_size) == align_up(compiler->is_jvmci() ? jvmci_data->size() : 0, oopSize)) failed: failed: 104 != 16777320 [v2] In-Reply-To: References: Message-ID: > After [JDK-8343789](https://bugs.openjdk.org/browse/JDK-8343789), the size of a `JVMCINMethodData` object is limited to `uint16_t`. That object embeds the value of `InstalledCode.name` so effectively imposes a limit on the name length. This PR establishes an upper limit on the name value as this name should only be for informative purposes when inspecting compiled code. > > While debugging the problem that exposed the limit, it was confusing that `-XX:+PrintCompilation` did not show the name so this PR builds on [JDK-8336760](https://bugs.openjdk.org/browse/JDK-8336760) to add the name in `PrintCompilation` output for JVMCI "hosted" methods. Doug Simon has refreshed the contents of this pull request, and previous commits have been removed.
The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: impose a length limit on InstalledCode.name ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24753/files - new: https://git.openjdk.org/jdk/pull/24753/files/03c10e9c..be003f35 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24753&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24753&range=00-01 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24753/head:pull/24753 PR: https://git.openjdk.org/jdk/pull/24753 From never at openjdk.org Fri Apr 18 16:58:54 2025 From: never at openjdk.org (Tom Rodriguez) Date: Fri, 18 Apr 2025 16:58:54 GMT Subject: RFR: 8355034: [JVMCI] assert(static_cast(_jvmci_data_size) == align_up(compiler->is_jvmci() ? jvmci_data->size() : 0, oopSize)) failed: failed: 104 != 16777320 [v2] In-Reply-To: References: Message-ID: <1sdKHhrvskbtNgGENIhGFoHPFCThgnyyAdJXMhnr-3s=.285d63a5-e834-4fee-ae8a-ff1af8039be6@github.com> On Fri, 18 Apr 2025 15:35:41 GMT, Doug Simon wrote: >> After [JDK-8343789](https://bugs.openjdk.org/browse/JDK-8343789), the size of a `JVMCINMethodData` object is limited to `uint16_t`. That object embeds the value of `InstalledCode.name` so effectively imposes a limit on the name length. This PR establishes an upper limit on the name value as this name should only be for informative purposes when inspecting compiled code. >> >> While debugging the problem that exposed the limit, it was confusing that `-XX:+PrintCompilation` did not show the name so this PR builds on [JDK-8336760](https://bugs.openjdk.org/browse/JDK-8336760) to add the name in `PrintCompilation` output for JVMCI "hosted" methods. > > Doug Simon has refreshed the contents of this pull request, and previous commits have been removed.
The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > impose a length limit on InstalledCode.name Looks good to me. ------------- Marked as reviewed by never (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24753#pullrequestreview-2779016276 From jvernee at openjdk.org Fri Apr 18 17:09:51 2025 From: jvernee at openjdk.org (Jorn Vernee) Date: Fri, 18 Apr 2025 17:09:51 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v10] In-Reply-To: References: Message-ID: On Thu, 17 Apr 2025 18:03:47 GMT, Vladimir Ivanov wrote: >> Migrate Vector API math library (SVML and SLEEF) linkage from native code (in JVM) to Java FFM API. >> >> Since FFM API doesn't support vector calling conventions yet, migration affects only symbol lookup for now. But it still enables significant simplifications on JVM side. >> >> The patch consists of the following parts: >> * on-demand symbol lookup in Java code replaces eager lookup from native code during JVM startup; >> * 2 new VM intrinsics for vector calls (support unary and binary shapes) (code separated from unary/binary vector operations); >> * new internal interface to query supported CPU ISA extensions (`jdk.incubator.vector.CPUFeatures`) used for CPU dispatching. >> >> `java.lang.foreign` API is used to perform symbol lookup in vector math library, then the address is cached and fed into corresponding JVM intrinsic, so C2 can turn it into a direct vector call in generated code. >> >> Once `java.lang.foreign` supports vectors & vector calling conventions, VM intrinsics can go away. >> >> Performance is on par with original implementation (tested with microbenchmarks on linux-x64 and macosx-aarch64). >> >> Testing: hs-tier1 - hs-tier6, microbenchmarks (on linux-x64 and macosx-aarch64) >> >> Thanks! > > Vladimir Ivanov has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 24 additional commits since the last revision: > > - Merge remote-tracking branch 'origin/master' into vector.math.01.java > - RVV and SVE adjustments > - fix broken merge > - Merge branch 'master' into vector.math.01.java > - Fix debugName handling > - Merge branch 'master' into vector.math.01.java > - RVV and SVE adjustments > - Merge branch 'master' into vector.math.01.java > - Fix windows-aarch64 build failure > - features_string -> cpu_info_string > - ... and 14 more: https://git.openjdk.org/jdk/compare/7b477dbf...88eacc48 Very interesting! Looks mostly good to me. Left a few inline notes. src/hotspot/share/prims/vectorSupport.cpp line 622: > 620: > 621: ThreadToNativeFromVM ttn(thread); > 622: return env->NewStringUTF(features_string); Isn't there a way to do this without the extra transition? How about: oop result = java_lang_String::create_oop_from_str((char*) bytes, CHECK_NULL); return (jstring) JNIHandles::make_local(THREAD, result); src/java.base/share/classes/jdk/internal/vm/vector/VectorSupport.java line 379: > 377: V libraryBinaryOp(long addr, Class vClass, Class eClass, int length, String debugName, > 378: V v1, V v2, > 379: BinaryOperation defaultImpl) { I notice that the bound of `V` differs between `libraryUnaryOp`, which uses `Vector`, and this method, which uses `VectorPayload`. Not sure if this is intentional?
src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMathLibrary.java line 272: > 270: MemorySegment addr = LOOKUP.findOrThrow(symbol); > 271: debug("%s %s => 0x%016x\n", op, symbol, addr.address()); > 272: T impl = implSupplier.apply(opc); // TODO: should call the very same native implementation eventually (once FFM API supports vectors) FWIW, one current barrier I see to implementing the vector calling convention in the linker is that the FFM linker (currently) transmits register values to the downcall stub using Java primitive types. So, in order to support vector calling conventions, we would need to add some kind of 'primitive' that can hold the entire vector value, and preferably gets passed in the right register. However, I think in the case of these math libraries in particular, speed of the fallback implementation is not that much of an issue, since there is also an intrinsic. So alternatively, we could split a vector value up into smaller integral types (`int`, `long`) -> pass them to the downcall stub in that form -> and then reconstruct the full vector value in its target register. (I used the same trick when I was experimenting with FP80 support, which also requires splitting the 80-bit value up into 2 `long`s). src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMathLibrary.java line 305: > 303: @ForceInline > 304: /*package-private*/ static > 305: > Here you're using `Vector` instead of `VectorPayload` for the binary op, so there seems to be a discrepancy with `VectorSupport`. ------------- Marked as reviewed by jvernee (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/24462#pullrequestreview-2778903954 PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2050830705 PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2050818654 PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2050875854 PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2050869087 From dnsimon at openjdk.org Sat Apr 19 20:16:24 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Sat, 19 Apr 2025 20:16:24 GMT Subject: RFR: 8355034: [JVMCI] assert(static_cast(_jvmci_data_size) == align_up(compiler->is_jvmci() ? jvmci_data->size() : 0, oopSize)) failed: failed: 104 != 16777320 [v3] In-Reply-To: References: Message-ID: > After [JDK-8343789](https://bugs.openjdk.org/browse/JDK-8343789), the size of a `JVMCINMethodData` object is limited to `uint16_t`. That object embeds the value of `InstalledCode.name` so effectively imposes a limit on the name length. This PR establishes an upper limit on the name value as this name should only be for informative purposes when inspecting compiled code. > > While debugging the problem that exposed the limit, it was confusing that `-XX:+PrintCompilation` did not show the name so this PR builds on [JDK-8336760](https://bugs.openjdk.org/browse/JDK-8336760) to add the name in `PrintCompilation` output for JVMCI "hosted" methods.
Doug Simon has updated the pull request incrementally with one additional commit since the last revision: InstalledCode.name can be null ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24753/files - new: https://git.openjdk.org/jdk/pull/24753/files/be003f35..f7c01793 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24753&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24753&range=01-02 Stats: 6 lines in 2 files changed: 5 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24753/head:pull/24753 PR: https://git.openjdk.org/jdk/pull/24753 From syan at openjdk.org Sun Apr 20 03:31:40 2025 From: syan at openjdk.org (SendaoYan) Date: Sun, 20 Apr 2025 03:31:40 GMT Subject: RFR: 8351623: VectorAPI: Refactor subword gather load and add SVE implementation In-Reply-To: References: Message-ID: On Wed, 16 Apr 2025 08:58:34 GMT, Xiaohong Gong wrote: > ### Summary: > [JDK-8318650](http://java-service.client.nvidia.com/?q=8318650) added the hotspot intrinsifying of subword gather load APIs for X86 platforms [1]. This patch aims at implementing the equivalent functionality for AArch64 SVE platform. In addition to the AArch64 backend support, this patch also refactors the API implementation in Java side and the compiler mid-end part to make the operations more efficient and maintainable across different architectures. > > ### Background: > Vector gather load APIs load values from memory addresses calculated by adding a base pointer to integer indices stored in an int array. SVE provides native vector gather load instructions for byte/short types using an int vector holding the indices (see [2][3]). > > The number of loaded elements must match the index vector's element count. Since int elements are 4/2 times larger than byte/short elements, and given `MaxVectorSize` constraints, the operation may need to be split into multiple parts.
> > Using a 128-bit byte vector gather load as an example, there are four scenarios with different `MaxVectorSize`: > > 1. `MaxVectorSize = 16, byte_vector_size = 16`: > - Can load 4 indices per vector register > - So can finish 4 bytes per gather-load operation > - Requires 4 gather-loads and a final merge > Example: > ``` > byte[] arr = [a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, ...] > int[] idx = [3, 2, 4, 1, 5, 7, 5, 2, 0, 6, 7, 1, 15, 10, 11, 9] > > 4 gather-load: > idx_v1 = [1 4 2 3] gather_v1 = [0000 0000 0000 becd] > idx_v2 = [2 5 7 5] gather_v2 = [0000 0000 0000 cfhf] > idx_v3 = [1 7 6 0] gather_v3 = [0000 0000 0000 bhga] > idx_v4 = [9 11 10 15] gather_v4 = [0000 0000 0000 jlkp] > merge: v = [jlkp bhga cfhf becd] > ``` > > 2. `MaxVectorSize = 32, byte_vector_size = MaxVectorSize / 2`: > - Can load 8 indices per vector register > - So can finish 8 bytes per gather-load operation > - Requires 2 gather-loads and a merge > Example: > ``` > byte[] arr = [a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, ...] > int[] index = [3, 2, 4, 1, 5, 7, 5, 2, 0, 6, 7, 1, 15, 10, 11, 9] > > 2 gather-load: > idx_v1 = [2 5 7 5 1 4 2 3] > idx_v2 = [9 11 10 15 1 7 6 0] > gather_v1 = [0000 0000 0000 0000 0000 0000 cfhf becd] > gather_v2 = [0000 0000 0000 0000 0000 0000 jlkp bhga] > merge: v = [0000 0000 0000 0000 jlkp bhga cfhf becd] > ``` > > 3. `MaxVectorSize = 64, byte_vector_size = MaxVectorSize / 4`: > - Can load 16 indices per vector register > - So can ... Changes requested by syan (Committer).
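Editorial illustration of the first scenario in the quoted description (`MaxVectorSize = 16`): the split-and-merge idea can be modelled in scalar Java. This is only a sketch of the data flow, assuming four indices per partial gather; `SubwordGatherModel` and `gatherBytes` are hypothetical names, and the code does not use the Vector API or reflect the SVE instructions the patch generates.

```java
import java.nio.charset.StandardCharsets;

public class SubwordGatherModel {
    // Scalar model of a subword gather: indices are consumed idxPerStep at a time,
    // each step yields a partial byte result, and the parts are merged in order.
    static byte[] gatherBytes(byte[] src, int[] idx, int idxPerStep) {
        byte[] result = new byte[idx.length];
        for (int start = 0; start < idx.length; start += idxPerStep) {
            int end = Math.min(start + idxPerStep, idx.length);
            // One partial "gather-load": every int index selects one byte.
            byte[] part = new byte[end - start];
            for (int i = start; i < end; i++) {
                part[i - start] = src[idx[i]];
            }
            // "Merge": copy the partial result into its slot of the final vector.
            System.arraycopy(part, 0, result, start, part.length);
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] arr = "abcdefghijklmnop".getBytes(StandardCharsets.US_ASCII);
        int[] idx = {3, 2, 4, 1, 5, 7, 5, 2, 0, 6, 7, 1, 15, 10, 11, 9};
        // Four partial gathers of four bytes each, as in scenario 1.
        byte[] gathered = gatherBytes(arr, idx, 4);
        System.out.println(new String(gathered, StandardCharsets.US_ASCII)); // prints dcebfhfcaghbpklj
    }
}
```

The printed string lists lane 0 first, whereas the quoted diagrams show the highest lane on the left, so `dceb` here corresponds to `becd` in `gather_v1`.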
test/hotspot/jtreg/compiler/vectorapi/VectorGatherSubwordTest.java line 39: > 37: * @modules jdk.incubator.vector > 38: * > 39: * @run driver compiler.vectorapi.VectorGatherSubwordTest Should we use `@run main` instead of `@run driver` ------------- PR Review: https://git.openjdk.org/jdk/pull/24679#pullrequestreview-2780137136 PR Review Comment: https://git.openjdk.org/jdk/pull/24679#discussion_r2051625676 From yzheng at openjdk.org Sun Apr 20 16:02:41 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Sun, 20 Apr 2025 16:02:41 GMT Subject: RFR: 8355034: [JVMCI] assert(static_cast(_jvmci_data_size) == align_up(compiler->is_jvmci() ? jvmci_data->size() : 0, oopSize)) failed: failed: 104 != 16777320 [v3] In-Reply-To: References: Message-ID: <_Nzvk2Xfs0MNKg_fTex-9PHkl1AgNCFLfngK-8n9gOo=.3cc77031-427a-4efe-8a2a-62848b45ec93@github.com> On Sat, 19 Apr 2025 20:16:24 GMT, Doug Simon wrote: >> After [JDK-8343789](https://bugs.openjdk.org/browse/JDK-8343789), the size of a `JVMCINMethodData` object is limited to `uint16_t`. This object embeds the value of `InstalledCode.name` so effectively imposes a limit on the name length. This PR establishes an upper limit on the name value as this name should only be for informative purposes when inspecting compiled code. >> >> While debugging the problem that exposed the limit, it was confusing that `-XX:+PrintCompilation` did not show the name so this PR builds on [JDK-8336760](https://bugs.openjdk.org/browse/JDK-8336760) to add the name in `PrintCompilation` output for JVMCI "hosted" methods. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > InstalledCode.name can be null LGTM! ------------- Marked as reviewed by yzheng (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/24753#pullrequestreview-2780278854 From jbhateja at openjdk.org Mon Apr 21 06:30:46 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 21 Apr 2025 06:30:46 GMT Subject: RFR: 8342676: Unsigned Vector Min / Max transforms [v6] In-Reply-To: <3HnjPR-9RASdzqdnOc9jbdu7s3JOGBhV_jSnIADaenk=.2eea0117-8468-4db7-8d0e-7b80ff19d4f1@github.com> References: <21riF_Q0FMyzOh_sakTclKfYa-nJm4klfkyHEYi4ctI=.76933a14-fb5e-447e-873a-59a2b870b842@github.com> <1mKv_sNaaGW-z57wW0xpHdWjd974WzEkuMKEh0su-hY=.24d0373e-5674-45a2-bc83-490e71a7dec0@github.com> <3HnjPR-9RASdzqdnOc9jbdu7s3JOGBhV_jSnIADaenk=.2eea0117-8468-4db7-8d0e-7b80ff19d4f1@github.com> Message-ID: On Thu, 17 Apr 2025 09:05:50 GMT, Emanuel Peter wrote: > The code looks good to me. Let me run some more testing. Please ping me again in 24h+ and I can approve after the Easter weekend :) Hi @eme64, pinging for approval if validation is green. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21604#issuecomment-2817744378 From jbhateja at openjdk.org Mon Apr 21 06:40:42 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 21 Apr 2025 06:40:42 GMT Subject: RFR: 8350896: Integer/Long.compress gets wrong type from CompressBitsNode::Value [v4] In-Reply-To: References: Message-ID: <1kRZNcIzkhr_xU_x5mamZnGJ_6nFkH-5ylFEyT2H_AQ=.a0f578c2-a58b-4708-9f58-c897a62e2526@github.com> > Hi All, > > This bugfix patch fixes incorrect value computation for Integer/Long.compress APIs. > > Problems occur with a constant input and variable mask where the input's value is equal to the lower bound of the mask value. In this case, an erroneous value range estimation results in a constant value. Existing value routine first attempts to constant fold the compression operation if both input and compression mask are constant values; otherwise, it attempts to constrain the value range of result based on the upper and lower bounds of mask type.
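Editorial illustration of the operation discussed in the 8350896 description: `Integer.compress(x, mask)` (available since JDK 19) packs the bits of `x` selected by `mask` into the low-order bits of the result, so the result always fits in `Integer.bitCount(mask)` bits. The sketch below checks that property for a few sample masks. It is a statement about the Java API only, not about the exact bounds `CompressBitsNode::Value` computes, and `CompressBoundDemo` and `fitsInMaskBits` are hypothetical names.

```java
public class CompressBoundDemo {
    // True if Integer.compress(x, mask) fits in bitCount(mask) low-order bits.
    static boolean fitsInMaskBits(int x, int mask) {
        int bits = Integer.bitCount(mask);
        if (bits == 32) {
            return true; // mask == -1: every bit is selected, so any int can result
        }
        int result = Integer.compress(x, mask);
        return result >= 0 && result <= (1 << bits) - 1;
    }

    public static void main(String[] args) {
        // Mask 0xF0 selects bits 4..7 of 0xB6 (binary 1011_0110), giving binary 1011.
        System.out.println(Integer.compress(0xB6, 0xF0)); // prints 11

        int[] masks = {0, 1, 0xF0, 0x7FFFFFFF, 0x80000000, -1};
        int[] inputs = {0, 1, -1, 0xB6, Integer.MIN_VALUE, Integer.MAX_VALUE};
        boolean allWithinBound = true;
        for (int mask : masks) {
            for (int x : inputs) {
                allWithinBound &= fitsInMaskBits(x, mask);
            }
        }
        System.out.println(allWithinBound); // prints true
    }
}
```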
> > New IR test covers the issue reported in the bug report along with a case for value range based logic pruning. > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8350896 - Review comments resolutions - Review comments resolutions - 8350896: Integer/Long.compress gets wrong type from CompressBitsNode::Value ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23947/files - new: https://git.openjdk.org/jdk/pull/23947/files/73e1118e..a4ae0803 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23947&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23947&range=02-03 Stats: 380151 lines in 4333 files changed: 106351 ins; 250527 del; 23273 mod Patch: https://git.openjdk.org/jdk/pull/23947.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23947/head:pull/23947 PR: https://git.openjdk.org/jdk/pull/23947 From epeter at openjdk.org Mon Apr 21 10:21:46 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 21 Apr 2025 10:21:46 GMT Subject: RFR: 8349139: C2: Div looses dependency on condition that guarantees divisor not null in counted loop [v6] In-Reply-To: References: Message-ID: On Fri, 18 Apr 2025 14:08:11 GMT, Roland Westrelin wrote: >> The test crashes because of a division by zero. The `Div` node for >> that one is initially part of a counted loop. The control input of the >> node is cleared because the divisor is non zero. This is because the >> divisor depends on the loop phi and the type of the loop phi is >> narrowed down when the counted loop is created. pre/main/post loops >> are created, unrolling happens, the main loop looses its backedge. 
The >> `Div` node can then float above the zero trip guard for the main >> loop. When the zero trip guard is not taken, there's no guarantee the >> divisor is non zero so the `Div` node should be pinned below it. >> >> I propose we revert the change I made with 8334724 which removed >> `PhaseIdealLoop::cast_incr_before_loop()`. The `CastII` that this >> method inserted was there to handle exactly this problem. It was added >> initially for a similar issue but with array loads. That problem with >> loads is handled some other way now and that's why I thought it was >> safe to proceed with the removal. >> >> The code in this patch is somewhat different from the one we had >> before for a couple reasons: >> >> 1- assert predicate code evolved and so previous logic can't be >> resurrected as it was. >> >> 2- the previous logic has a bug. >> >> Regarding 1-: during pre/main/post loop creation, we used to add the >> `CastII` and then to add assertion predicates (so assertion predicates >> depended on the `CastII`). Then when unrolling, when assertion >> predicates are updated, we would skip over the `CastII`. What I >> propose here is to add the `CastII` after assertion predicates are >> added. As a result, they don't depend on the `CastII` and there's no >> need for any extra logic when unrolling happens. This, however, >> doesn't work when the assertion predicates are added by RCE. In that >> case, I had to add logic to skip over the `CastII` (similar to what >> existed before I removed it). >> >> Regarding 2-: previous implementation for >> `PhaseIdealLoop::cast_incr_before_loop()` would add the `CastII` at >> the first loop `Phi` it encounters that's a use of the loop increment: >> it's usually the iv but not always. I tweaked the test case to show, >> this bug can actually cause a crash and changed the logic for >> `PhaseIdealLoop::cast_incr_before_loop()` accordingly. 
>
> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision:
>
>   whitespace

Looks good, thanks for the work @rwestrel

-------------

Marked as reviewed by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/23617#pullrequestreview-2780979542

From epeter at openjdk.org  Mon Apr 21 10:24:42 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 21 Apr 2025 10:24:42 GMT
Subject: RFR: 8342676: Unsigned Vector Min / Max transforms [v6]
In-Reply-To: <1mKv_sNaaGW-z57wW0xpHdWjd974WzEkuMKEh0su-hY=.24d0373e-5674-45a2-bc83-490e71a7dec0@github.com>
References: <21riF_Q0FMyzOh_sakTclKfYa-nJm4klfkyHEYi4ctI=.76933a14-fb5e-447e-873a-59a2b870b842@github.com> <1mKv_sNaaGW-z57wW0xpHdWjd974WzEkuMKEh0su-hY=.24d0373e-5674-45a2-bc83-490e71a7dec0@github.com>
Message-ID:

On Mon, 14 Apr 2025 13:50:17 GMT, Jatin Bhateja wrote:

>> Adding following IR transforms for unsigned vector Min / Max nodes.
>>
>> => UMinV (UMinV(a, b), UMaxV(a, b)) => UMinV(a, b)
>> => UMinV (UMinV(a, b), UMaxV(b, a)) => UMinV(a, b)
>> => UMaxV (UMinV(a, b), UMaxV(a, b)) => UMaxV(a, b)
>> => UMaxV (UMinV(a, b), UMaxV(b, a)) => UMaxV(a, b)
>> => UMaxV (a, a) => a
>> => UMinV (a, a) => a
>>
>> New IR validation test accompanies the patch.
>>
>> This is a follow-up PR for https://github.com/openjdk/jdk/pull/20507
>>
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
>   Comment refinement

Testing looks good, code too. You'll need a second review though :)

-------------

Marked as reviewed by epeter (Reviewer).
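
The six unsigned Min / Max identities quoted in the message above can be spot-checked on scalars. The following is only a minimal sketch: it models a single vector lane as a Java `int`, and the class and helper names (`UnsignedMinMaxIdentities`, `umin`, `umax`, `identitiesHold`) are made up for illustration and do not come from the patch.

```java
final class UnsignedMinMaxIdentities {
    // Scalar stand-ins for one lane of UMinV / UMaxV (hypothetical helpers).
    static int umin(int a, int b) {
        return Integer.compareUnsigned(a, b) <= 0 ? a : b;
    }

    static int umax(int a, int b) {
        return Integer.compareUnsigned(a, b) >= 0 ? a : b;
    }

    // True if all six identities listed in the PR description hold for (a, b).
    static boolean identitiesHold(int a, int b) {
        return umin(umin(a, b), umax(a, b)) == umin(a, b)
            && umin(umin(a, b), umax(b, a)) == umin(a, b)
            && umax(umin(a, b), umax(a, b)) == umax(a, b)
            && umax(umin(a, b), umax(b, a)) == umax(a, b)
            && umax(a, a) == a
            && umin(a, a) == a;
    }

    public static void main(String[] args) {
        // Sample values chosen to cross the signed/unsigned boundary.
        int[] samples = {0, 1, -1, Integer.MIN_VALUE, Integer.MAX_VALUE, 42, -42};
        for (int a : samples) {
            for (int b : samples) {
                if (!identitiesHold(a, b)) {
                    throw new AssertionError("identity failed for " + a + ", " + b);
                }
            }
        }
        System.out.println("all identities hold on the sample set");
    }
}
```

This is a sanity check on a handful of values, not a proof, and it says nothing about how the C2 transforms themselves are implemented.
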
PR Review: https://git.openjdk.org/jdk/pull/21604#pullrequestreview-2780983729 From epeter at openjdk.org Mon Apr 21 10:31:52 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 21 Apr 2025 10:31:52 GMT Subject: RFR: 8350896: Integer/Long.compress gets wrong type from CompressBitsNode::Value [v4] In-Reply-To: <1kRZNcIzkhr_xU_x5mamZnGJ_6nFkH-5ylFEyT2H_AQ=.a0f578c2-a58b-4708-9f58-c897a62e2526@github.com> References: <1kRZNcIzkhr_xU_x5mamZnGJ_6nFkH-5ylFEyT2H_AQ=.a0f578c2-a58b-4708-9f58-c897a62e2526@github.com> Message-ID: <4ln_6m9TGKUbvd0mt8jryqkEEllMM6xwVg9BzTPBOdk=.42604688-6ff9-4e0d-97f6-d4413f94c710@github.com> On Mon, 21 Apr 2025 06:40:42 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This bugfix patch fixes incorrect value computation for Integer/Long. compress APIs. >> >> Problems occur with a constant input and variable mask where the input's value is equal to the lower bound of the mask value., In this case, an erroneous value range estimation results in a constant value. Existing value routine first attempts to constant fold the compression operation if both input and compression mask are constant values; otherwise, it attempts to constrain the value range of result based on the upper and lower bounds of mask type. >> >> New IR test covers the issue reported in the bug report along with a case for value range based logic pruning. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: > > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8350896 > - Review comments resolutions > - Review comments resolutions > - 8350896: Integer/Long.compress gets wrong type from CompressBitsNode::Value test/hotspot/jtreg/compiler/c2/TestBitCompressValueTransform.java line 29: > 27: * @library /test/lib / > 28: * @summary C2: wrong result: Integer/Long.compress gets wrong type from CompressBitsNode::Value. > 29: * @run main/othervm compiler.c2.TestBitCompressValueTransform Now that you have no flags any more, I think we can use the `driver`. Suggestion: * @run driver compiler.c2.TestBitCompressValueTransform ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r2052236081 From epeter at openjdk.org Mon Apr 21 10:34:49 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 21 Apr 2025 10:34:49 GMT Subject: RFR: 8350896: Integer/Long.compress gets wrong type from CompressBitsNode::Value [v2] In-Reply-To: References: Message-ID: On Thu, 10 Apr 2025 06:56:39 GMT, Jatin Bhateja wrote: >> test/hotspot/jtreg/compiler/c2/TestBitCompressValueTransform.java line 2: >> >>> 1: /* >>> 2: * Copyright (c) 2025, Oracle and/or its affiliates. All rights reserved. >> >> Ah, I just noticed the test directory. I think we can put it in a more specific location. > > There are operation specific tests under compiler/c2 let keep it this way. But this is about a `Value` optimization, which generally belong under the `gvn` directory. Just because we used to put all tests in one directory does not mean we should continue that practice ;) We should probably at some point clean up all the tests, and move them to better directories. Putting them already to better directories will reduce the backport issues later on, so please put it somewhere more specific. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r2052239584 From epeter at openjdk.org Mon Apr 21 11:02:04 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 21 Apr 2025 11:02:04 GMT Subject: RFR: 8350896: Integer/Long.compress gets wrong type from CompressBitsNode::Value [v4] In-Reply-To: <1kRZNcIzkhr_xU_x5mamZnGJ_6nFkH-5ylFEyT2H_AQ=.a0f578c2-a58b-4708-9f58-c897a62e2526@github.com> References: <1kRZNcIzkhr_xU_x5mamZnGJ_6nFkH-5ylFEyT2H_AQ=.a0f578c2-a58b-4708-9f58-c897a62e2526@github.com> Message-ID: On Mon, 21 Apr 2025 06:40:42 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This bugfix patch fixes incorrect value computation for Integer/Long. compress APIs. >> >> Problems occur with a constant input and variable mask where the input's value is equal to the lower bound of the mask value., In this case, an erroneous value range estimation results in a constant value. Existing value routine first attempts to constant fold the compression operation if both input and compression mask are constant values; otherwise, it attempts to constrain the value range of result based on the upper and lower bounds of mask type. >> >> New IR test covers the issue reported in the bug report along with a case for value range based logic pruning. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8350896 > - Review comments resolutions > - Review comments resolutions > - 8350896: Integer/Long.compress gets wrong type from CompressBitsNode::Value Here are my responses :) I'm still worried about updating `hi` and `lo` without some kind of assert or proof that it does not widen accidentally. 
I think we can prove with some effort that they do not widen currently. But then someone else comes along and modifies the code and then they might not realize the subtle proof we have in our heads currently. And I doubt that we have good tests that would catch such regressions, if the range was suddenly a little wider than before. src/hotspot/share/opto/intrinsicnode.cpp line 283: > 281: assert(mask_type->lo_as_long() >= 0, ""); > 282: // Here, result of clz is w.r.t to long argument, hence for integer argument > 283: // we explicitly subtract 32 from the result. The comments are not at the correct line. It does not seem to refer to the line above or below. I would say either move it up to the `Case 3` comments, or move it down to the specific line that actually subtracts `32`. src/hotspot/share/opto/intrinsicnode.cpp line 299: > 297: // in case input equals above estimated lower bound. > 298: hi = src_type->hi_as_long() == lo ? hi : src_type->hi_as_long(); > 299: hi = max_mask_bit_width < mask_bit_width ? (1L << max_mask_bit_width) - 1 : hi; Somehow the conversation disappeared with changes, so I'll screenshot it it here for context: ![image](https://github.com/user-attachments/assets/e4e5d05b-3fdd-4e67-ab0a-8164783648bc) > 'hi' is initialized to 'max_int' and nothing before line 297 is modifying 'hi'. That is not super apparent to me, at least not at first. I'd have to know the code really well. A quick scan gives me lines like `hi = (1UL << bitcount) - 1;` above. Sure, that is on an unrelated path, but I'd have to check like 20+ lines above. I don't have that attention span ;) So maybe some better naming here could help. Can you convert your explanations into comments in the code? Or even better asserts, so we are even more sure that it is correct now, and also once others make changes here that might invalidate your assumptions around `hi` and `lo`? 
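
The request above, that updates to `hi` and `lo` should be guarded so they can only tighten the range, could look roughly like the sketch below. It is an illustration in Java rather than the HotSpot C++ under review, and every name in it (`NarrowOnlyRange`, `narrowTo`) is hypothetical. Explicit `AssertionError`s are used instead of the `assert` keyword so the checks fire even when assertions are disabled.

```java
final class NarrowOnlyRange {
    final long lo;
    final long hi;

    NarrowOnlyRange(long lo, long hi) {
        if (lo > hi) {
            throw new IllegalArgumentException("empty range: " + lo + " > " + hi);
        }
        this.lo = lo;
        this.hi = hi;
    }

    // Returns the range [newLo, newHi], failing loudly if either bound
    // would be looser than the one we already have.
    NarrowOnlyRange narrowTo(long newLo, long newHi) {
        if (newLo < lo) {
            throw new AssertionError("lower bound would widen: " + newLo + " < " + lo);
        }
        if (newHi > hi) {
            throw new AssertionError("upper bound would widen: " + newHi + " > " + hi);
        }
        return new NarrowOnlyRange(newLo, newHi);
    }

    public static void main(String[] args) {
        NarrowOnlyRange full = new NarrowOnlyRange(Integer.MIN_VALUE, Integer.MAX_VALUE);
        NarrowOnlyRange small = full.narrowTo(0, Integer.MAX_VALUE).narrowTo(0, 255);
        System.out.println(small.lo + ".." + small.hi);
        try {
            // A later "optimization" that forgets the tighter bound is rejected.
            small.narrowTo(0, Integer.MAX_VALUE);
        } catch (AssertionError e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The design point is the one made in the review: the check documents the invariant for readers and catches a future change that would silently loosen a bound.
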
test/hotspot/jtreg/compiler/c2/TestBitCompressValueTransform.java line 3: > 1: /* > 2: * Copyright (c) 2025, Oracle and/or its affiliates. All rights reserved. > 3: * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. > Ah, I just noticed the test directory. I think we can put it in a more specific location. > There are operation specific tests under compiler/c2 let keep it this way. But this is about a Value optimization, which generally belong under the gvn directory. Just because we used to put all tests in one directory does not mean we should continue that practice ;) We should probably at some point clean up all the tests, and move them to better directories. Putting them already to better directories will reduce the backport issues later on, so please put it somewhere more specific. @jatin-bhateja I'm not very happy when people just "resolve" a conversation unilaterally. Can you please avoid that in the future? ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23947#pullrequestreview-2781004906 PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r2052244785 PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r2052259016 PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r2052248407 From epeter at openjdk.org Mon Apr 21 11:02:04 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 21 Apr 2025 11:02:04 GMT Subject: RFR: 8350896: Integer/Long.compress gets wrong type from CompressBitsNode::Value [v2] In-Reply-To: References: Message-ID: On Thu, 10 Apr 2025 06:52:04 GMT, Jatin Bhateja wrote: >> src/hotspot/share/opto/intrinsicnode.cpp line 266: >> >>> 264: if ( opc == Op_CompressBits) { >>> 265: // Pattern: Integer/Long.compress(src_type, mask_type) >>> 266: int max_mask_bit_width; >> >> Suggestion: >> >> int result_bit_width; >> >> Is this bit width not about the result? It is really not about the mask. 
>> >> Example: >> `mask_type->hi_as_long() < -1L` >> >> Here, the mask has the uppermost bit set, and so the bit width of it is the maximum 32 / 64 bits. >> >> But we still can deduce that the result has one leading zero bit, and so the bit width of the result is either 31 or 63. > > Your interpretation is correct. But this is indeed the max_mask_bit_width, which constrains the upper bound of the result value. What I am saying is that calling it `max_mask_bit_width` seems **incorrect**. Your `Case 2` checks if the sign bit is set. In that case, `max_mask_bit_width` has to be `32` or `64`. But then you set it `max_mask_bit_width = mask_bit_width - 1;`. So now it would be `31` or `63`. That would be incorrect for the mask. But it would be correct for the result. >> src/hotspot/share/opto/intrinsicnode.cpp line 292: >> >>> 290: // compression result will never be a -ve value and we can safely set the >>> 291: // lower bound of the result value range to zero. >>> 292: lo = max_mask_bit_width == mask_bit_width ? lo : 0L; >> >> Can you please add an assert that we are not making `lo` worse than what we already have? Someone may insert optimizations above that set `lo > 0`, and then you may lower it again here. >> Suggestion: >> >> assert(lo < 0, "we should not lower the value of lo"); >> lo = max_mask_bit_width == mask_bit_width ? lo : 0L; > > We are already initializing 'lo' to min_jint/jlong value upfront and nothing before this line is modifying its value. Since initialization and use both are part of this function hense adding an assertion looks redundant. We generally add assertions to set constraints under which a logic is implemented. Your response is saying that the code is currently correct. That is true as far as I can see now. But my worry is about future changes. And an assert can also help the reader be sure that things happen correctly without having to verify manually, which is a lot of effort. So for me it would not be redundant. 
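
The bit-width argument in this comment thread (a mask with at least one zero bit yields a result of at most 31 or 63 significant bits, hence a non-negative result) can be checked with a small Java sketch. It works on concrete `int` masks, whereas the C2 code reasons about the bounds of the mask's type, so it only illustrates the arithmetic. The class and method names are hypothetical, and `compress` is a plain reference loop written for this sketch rather than a call to `Integer.compress`, so that it does not depend on the JDK version.

```java
final class CompressWidthSketch {
    // Reference bit-compress: gathers the bits of x selected by mask
    // into the low-order bits of the result, preserving their order.
    static int compress(int x, int mask) {
        int result = 0;
        int outBit = 0;
        for (int i = 0; i < 32; i++) {
            if (((mask >>> i) & 1) != 0) {
                result |= ((x >>> i) & 1) << outBit;
                outBit++;
            }
        }
        return result;
    }

    // Largest possible result, read as an unsigned value, for a given mask:
    // the result has at most bitCount(mask) significant bits.
    static long maxUnsignedResult(int mask) {
        return (1L << Integer.bitCount(mask)) - 1;
    }

    public static void main(String[] args) {
        int[] masks = {0, 0xF0, 0x7FFFFFFF, 0xFFFFFFFE, 0x80000000, -1};
        int[] inputs = {0, 1, -1, Integer.MIN_VALUE, Integer.MAX_VALUE, 0xB6};
        for (int mask : masks) {
            for (int x : inputs) {
                int r = compress(x, mask);
                if (Integer.toUnsignedLong(r) > maxUnsignedResult(mask)) {
                    throw new AssertionError("bound violated for mask " + mask);
                }
                // Any mask other than -1 has a zero bit, so the sign bit
                // of the result can never be set.
                if (mask != -1 && r < 0) {
                    throw new AssertionError("negative result for mask " + mask);
                }
            }
        }
        System.out.println("bounds hold on the sample set");
    }
}
```

Note that `0xFFFFFFFE` has its sign bit set, so the mask itself is 32 bits wide, yet the result is still limited to 31 bits. That is the distinction the naming suggestion above is about.
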
>> src/hotspot/share/opto/intrinsicnode.cpp line 391: >> >>> 389: return TypeInteger::zero(bt); >>> 390: } >>> 391: >> >> Is this change related to the PR title? And do you have any tests for it? > > Zero value transform is related to fix added new test points. Nit: It looks really like an extension rather than bug fix... or did we fold these before as well? Would you backport these changes? Since you added tests, I'm willing to let it go in. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r2052242676 PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r2052251238 PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r2052246850 From duke at openjdk.org Mon Apr 21 11:18:52 2025 From: duke at openjdk.org (kuaiwei) Date: Mon, 21 Apr 2025 11:18:52 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v12] In-Reply-To: References: Message-ID: On Thu, 17 Apr 2025 09:30:56 GMT, Emanuel Peter wrote: >> kuaiwei has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits: >> >> - Merge remote-tracking branch 'origin/master' into dev/merge_loads >> - Remove unused code >> - Move code to addnode.cpp and add more tests >> - Merge remote-tracking branch 'origin/master' into dev/merge_loads >> - Fix test >> - Add more tests >> - Enable StressIGVN and riscv platform >> - Change tests as review comments >> - Fix test failure and change for review comments >> - Revert extract value and add more tests >> - ... and 4 more: https://git.openjdk.org/jdk/compare/660b17a6...f6518b26 > > src/hotspot/share/opto/addnode.cpp line 800: > >> 798: * >> 799: * Because array value is logical and with constant 0xff/0xffff, LoadS/LoadB is converted to an unsigned load >> 800: * and 'And' node is eliminated in previous IGVN phase. Please check AndINode::Ideal for reference > > What do you mean by `array value is logical`? 
> Do you have tests where we do not have the `And` in the code, and there would be implicit sign extension?

I mean `array value is masked with constant 0xff/0xffff`. I will change the comment.
Tests without `And operator` will be added. I think the sign extension is checked before merging.

> src/hotspot/share/opto/addnode.cpp line 828:
>
>> 826: };
>> 827:
>> 828: typedef GrowableArray<MergeLoadInfo*> MergeLoadInfoList;
>
> Do you need to have this Resource allocated? Why not just have the elements in-place? And then make the `MergeLoadInfo` a `StackObj`. That would remove the extra pointer indirection.

I choose `GrowableArray` for convenience. And it can be changed as a stack allocate data.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2052273358
PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2052274772

From epeter at openjdk.org  Mon Apr 21 11:23:44 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 21 Apr 2025 11:23:44 GMT
Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v12]
In-Reply-To:
References:
Message-ID: <3ZwBSGNwatEmf9zBAciFTFAMCingd1y6r6qQXnSBIw4=.fc72a7e0-a684-430c-8ee9-8c48001e0533@github.com>

On Mon, 21 Apr 2025 11:15:41 GMT, kuaiwei wrote:

>> src/hotspot/share/opto/addnode.cpp line 828:
>>
>>> 826: };
>>> 827:
>>> 828: typedef GrowableArray<MergeLoadInfo*> MergeLoadInfoList;
>>
>> Do you need to have this Resource allocated? Why not just have the elements in-place? And then make the `MergeLoadInfo` a `StackObj`. That would remove the extra pointer indirection.
>
> I choose `GrowableArray` for convenience. And it can be changed as a stack allocate data.

Ah, we have a misunderstanding. I was asking why not

Suggestion:

typedef GrowableArray<MergeLoadInfo> MergeLoadInfoList;

i.e. allocate the elements of the array directly in the array (in-place), rather than allocating separate elements.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2052279839 From duke at openjdk.org Mon Apr 21 11:23:48 2025 From: duke at openjdk.org (kuaiwei) Date: Mon, 21 Apr 2025 11:23:48 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v12] In-Reply-To: References: Message-ID: On Thu, 17 Apr 2025 09:58:24 GMT, Emanuel Peter wrote: >> kuaiwei has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits: >> >> - Merge remote-tracking branch 'origin/master' into dev/merge_loads >> - Remove unused code >> - Move code to addnode.cpp and add more tests >> - Merge remote-tracking branch 'origin/master' into dev/merge_loads >> - Fix test >> - Add more tests >> - Enable StressIGVN and riscv platform >> - Change tests as review comments >> - Fix test failure and change for review comments >> - Revert extract value and add more tests >> - ... and 4 more: https://git.openjdk.org/jdk/compare/660b17a6...f6518b26 > > src/hotspot/share/opto/addnode.cpp line 912: > >> 910: return false; >> 911: } >> 912: return true; > > This feels inverted. I would do a switch case here anyway, and return `false` in the default case. > > Hmm. LoadS is supposed to be implicitly sign extended. Can those upper sign bits not lead to issues when OR-ing without a mask? The sign bit is checked in `MergePrimitiveLoad::collect_merge_list`. 
```c++
// Check sign bit of load
// For shifted value based on memory load, if it does not reach the sign bit of merged load,
// the load must be an unsigned load
if ((info->_shift + load->memory_size() * BitsPerByte) != (collected * load->memory_size() * BitsPerByte)) {
  if (!info->_load->is_unsigned()) {
    // no unsigned Load of LoadI, so LoadI can not be merged
    // we may check value, if it's greater than 0, it can be merged
    return;
  }
}
```

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2052279994

From epeter at openjdk.org  Mon Apr 21 11:44:56 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 21 Apr 2025 11:44:56 GMT
Subject: RFR: 8354477: C2 SuperWord: make use of memory edges more explicit
In-Reply-To:
References: <-srSeiLD0cMsEZMAjTccIavIlAu-RZ4emjJCg44-Zpk=.67af00d5-cf52-486b-9e14-ce2f8971c371@github.com>
Message-ID:

On Fri, 18 Apr 2025 13:34:37 GMT, Roland Westrelin wrote:

>> This is a small refactoring, to make it more explicit that the "additional dependencies" are memory dependencies.
>>
>> This is also preparatory work for [JDK-8324751](https://bugs.openjdk.org/browse/JDK-8324751), where we have two memory edge categories: strong and weak.
>>
>> I was able to pass around `vtn_dependencies` in fewer occasions, and also renamed it to `vtn_memory_dependencies`.
>
> Looks reasonable to me.

@rwestrel @vnkozlov thanks for the reviews!
------------- PR Comment: https://git.openjdk.org/jdk/pull/24613#issuecomment-2818245537 From epeter at openjdk.org Mon Apr 21 11:44:56 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 21 Apr 2025 11:44:56 GMT Subject: Integrated: 8354477: C2 SuperWord: make use of memory edges more explicit In-Reply-To: <-srSeiLD0cMsEZMAjTccIavIlAu-RZ4emjJCg44-Zpk=.67af00d5-cf52-486b-9e14-ce2f8971c371@github.com> References: <-srSeiLD0cMsEZMAjTccIavIlAu-RZ4emjJCg44-Zpk=.67af00d5-cf52-486b-9e14-ce2f8971c371@github.com> Message-ID: On Sun, 13 Apr 2025 13:40:13 GMT, Emanuel Peter wrote: > This is a small refactoring, to make it more explicit that the "additional dependencies" are memory dependencies. > > This is also preparatory work for [JDK-8324751](https://bugs.openjdk.org/browse/JDK-8324751), where we have two memory edge categories: strong and weak. > > I was able to pass around `vtn_dependencies` in fewer occasions, and also renamed it to `vtn_memory_dependencies`. This pull request has now been integrated. Changeset: 4dd64b49 Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/4dd64b49716144cc697fb461ff88860e2cbcaaea Stats: 75 lines in 5 files changed: 20 ins; 13 del; 42 mod 8354477: C2 SuperWord: make use of memory edges more explicit Reviewed-by: kvn, roland ------------- PR: https://git.openjdk.org/jdk/pull/24613 From qamai at openjdk.org Mon Apr 21 12:05:27 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 21 Apr 2025 12:05:27 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v47] In-Reply-To: References: Message-ID: > Hi, > > This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. > > In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. 
These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. > > This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. > > Please kindly review, thanks a lot. > > Testing > > - [x] GHA > - [x] Linux x64, tier 1-4 Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 61 commits: - Merge branch 'master' into unsignedbounds - Merge branch 'master' into unsignedbounds - reviews - Merge branch 'master' into unsignedbounds - refine comments - Merge branch 'master' into unsignedbounds - Merge branch 'master' into unsignedbounds - harden SimpleCanonicalResult - number lemmas - include - ... and 51 more: https://git.openjdk.org/jdk/compare/0995b940...cdab1911 ------------- Changes: https://git.openjdk.org/jdk/pull/17508/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=46 Stats: 2353 lines in 13 files changed: 1789 ins; 328 del; 236 mod Patch: https://git.openjdk.org/jdk/pull/17508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17508/head:pull/17508 PR: https://git.openjdk.org/jdk/pull/17508 From qamai at openjdk.org Mon Apr 21 12:05:27 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 21 Apr 2025 12:05:27 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v7] In-Reply-To: References: Message-ID: <-8hODpwHoSEctN2Oo2SrCY1aRpkvZ_kcpnOntQLXgC4=.ecf517f4-9442-4812-81dc-1c177f9d70bf@github.com> On Thu, 13 Feb 2025 13:16:01 GMT, Emanuel Peter wrote: >> @eme64 Ping > > @merykitty Sorry, I've been sick for a week and only just catching up with things again slowly... 
@eme64 It would be great if you can come back to this ------------- PR Comment: https://git.openjdk.org/jdk/pull/17508#issuecomment-2818274924 From cslucas at openjdk.org Mon Apr 21 19:23:40 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Mon, 21 Apr 2025 19:23:40 GMT Subject: RFR: 8355034: [JVMCI] assert(static_cast(_jvmci_data_size) == align_up(compiler->is_jvmci() ? jvmci_data->size() : 0, oopSize)) failed: failed: 104 != 16777320 [v3] In-Reply-To: References: Message-ID: On Sat, 19 Apr 2025 20:16:24 GMT, Doug Simon wrote: >> After [JDK-8343789](https://bugs.openjdk.org/browse/JDK-8343789), the size of a `JVMCINMethodData` object is limited to `uint16_t`. This object embeds the value of `InstalledCode.name` so effectively imposes a limit on the name length. This PR establishes an upper limit on the name value as this name should only be for informative purposes when inspecting compiled code. >> >> While debugging the problem that exposed the limit, it was confusing that `-XX:+PrintCompilation` did not show the name so this PR builds on [JDK-8336760](https://bugs.openjdk.org/browse/JDK-8336760) to add the name in `PrintCompilation` output for JVMCI "hosted" methods. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > InstalledCode.name can be null Thank you . LGTM test/hotspot/jtreg/compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/InstalledCodeTest.java line 25: > 23: > 24: /** > 25: * @test NIT: missing JBS entry ID. ------------- Marked as reviewed by cslucas (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/24753#pullrequestreview-2782060782 PR Review Comment: https://git.openjdk.org/jdk/pull/24753#discussion_r2052888959 From liach at openjdk.org Mon Apr 21 20:12:19 2025 From: liach at openjdk.org (Chen Liang) Date: Mon, 21 Apr 2025 20:12:19 GMT Subject: RFR: 8355223: Improve documentation on @IntrinsicCandidate [v2] In-Reply-To: References: Message-ID: > In offline discussion, we noted that the documentation on this annotation does not recommend minimizing the intrinsified section and moving whatever can be done in Java to Java; thus I prepared this documentation update, to shrink a "TLDR" essay to something concise for readers, such as pointing to that list at `vmIntrinsics.hpp` instead of "a list". Chen Liang has updated the pull request incrementally with one additional commit since the last revision: Refine validation and defensive copying ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24777/files - new: https://git.openjdk.org/jdk/pull/24777/files/6e8b3254..e8adab3c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24777&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24777&range=00-01 Stats: 9 lines in 1 file changed: 1 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/24777.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24777/head:pull/24777 PR: https://git.openjdk.org/jdk/pull/24777 From dnsimon at openjdk.org Mon Apr 21 20:38:35 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 21 Apr 2025 20:38:35 GMT Subject: RFR: 8355034: [JVMCI] assert(static_cast(_jvmci_data_size) == align_up(compiler->is_jvmci() ? jvmci_data->size() : 0, oopSize)) failed: failed: 104 != 16777320 [v4] In-Reply-To: References: Message-ID: > After [JDK-8343789](https://bugs.openjdk.org/browse/JDK-8343789), the size of a `JVMCINMethodData` object is limited to `uint16_t`. This object embeds the value of `InstalledCode.name` so effectively imposes a limit on the name length. 
This PR establishes an upper limit on the name value as this name should only be for informative purposes when inspecting compiled code. > > While debugging the problem that exposed the limit, it was confusing that `-XX:+PrintCompilation` did not show the name so this PR builds on [JDK-8336760](https://bugs.openjdk.org/browse/JDK-8336760) to add the name in `PrintCompilation` output for JVMCI "hosted" methods. Doug Simon has updated the pull request incrementally with one additional commit since the last revision: added bug id to @test tag ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24753/files - new: https://git.openjdk.org/jdk/pull/24753/files/f7c01793..2b7a07de Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24753&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24753&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24753/head:pull/24753 PR: https://git.openjdk.org/jdk/pull/24753 From dnsimon at openjdk.org Mon Apr 21 20:38:36 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 21 Apr 2025 20:38:36 GMT Subject: RFR: 8355034: [JVMCI] assert(static_cast(_jvmci_data_size) == align_up(compiler->is_jvmci() ? jvmci_data->size() : 0, oopSize)) failed: failed: 104 != 16777320 [v3] In-Reply-To: References: Message-ID: On Mon, 21 Apr 2025 19:20:29 GMT, Cesar Soares Lucas wrote: >> Doug Simon has updated the pull request incrementally with one additional commit since the last revision: >> >> InstalledCode.name can be null > > test/hotspot/jtreg/compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/InstalledCodeTest.java line 25: > >> 23: >> 24: /** >> 25: * @test > > NIT: missing JBS entry ID. It's not clear to me how useful that is as a single test file can be for numerous bugs over time and `git log` will show you all of them. That said, I've added the id as you request. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24753#discussion_r2052966796 From ecaspole at openjdk.org Mon Apr 21 22:13:55 2025 From: ecaspole at openjdk.org (Eric Caspole) Date: Mon, 21 Apr 2025 22:13:55 GMT Subject: RFR: 8355233: Add a DMB related benchmark Message-ID: In addition to the details in the JBS, I changed the name from DoubleDMB to DMBCheck. ------------- Commit messages: - 8355233: Add a DMB related benchmark Changes: https://git.openjdk.org/jdk/pull/24783/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24783&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8355233 Stats: 102 lines in 1 file changed: 102 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24783.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24783/head:pull/24783 PR: https://git.openjdk.org/jdk/pull/24783 From ecaspole at openjdk.org Mon Apr 21 22:19:14 2025 From: ecaspole at openjdk.org (Eric Caspole) Date: Mon, 21 Apr 2025 22:19:14 GMT Subject: RFR: 8355233: Add a DMB related benchmark [v2] In-Reply-To: References: Message-ID: > In addition to the details in the JBS, I changed the name from DoubleDMB to DMBCheck. 
Eric Caspole has updated the pull request incrementally with one additional commit since the last revision: Fix the copyright header ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24783/files - new: https://git.openjdk.org/jdk/pull/24783/files/0a3427e0..312cb317 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24783&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24783&range=00-01 Stats: 5 lines in 1 file changed: 1 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24783.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24783/head:pull/24783 PR: https://git.openjdk.org/jdk/pull/24783 From sparasa at openjdk.org Mon Apr 21 23:41:20 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Mon, 21 Apr 2025 23:41:20 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v9] In-Reply-To: References: Message-ID: > The current scheme for Intel APX NDD code generation favors the emission of NDD instruction on APX-enabled targets, even if destination and source registers are the same. To prevent this, this PR extends the assembler layer to demote EEVEX to REX encoding if dst matches with source operands. 
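
As a reading aid for the demotion described in the quoted text, here is a minimal sketch of the encoding choice. It is not the HotSpot assembler code: the class, enum and method names are hypothetical, and it assumes only the general APX rule that registers 16..31 (the extended GPRs) need a REX2 prefix in the legacy two-operand form, while a true three-operand NDD form needs the extended EVEX encoding.

```java
public class NddDemotionSketch {
    // Hypothetical labels for the three encodings mentioned in the PR title.
    enum Encoding { REX, REX2, EEVEX }

    // Assumption: register numbers 0..15 are legacy GPRs, 16..31 are APX extended GPRs.
    static boolean isExtendedGpr(int reg) {
        return reg >= 16;
    }

    // dst = src1 op src2. If dst and src1 are the same register, the
    // two-operand legacy form is enough and the NDD encoding can be demoted.
    static Encoding choose(int dst, int src1, int src2) {
        if (dst != src1) {
            return Encoding.EEVEX; // a separate destination is really needed
        }
        if (isExtendedGpr(dst) || isExtendedGpr(src2)) {
            return Encoding.REX2;  // legacy form, but extended registers involved
        }
        return Encoding.REX;       // plain legacy form
    }

    public static void main(String[] args) {
        System.out.println(choose(1, 1, 2));  // dst == src1, legacy registers
        System.out.println(choose(1, 1, 20)); // dst == src1, one extended register
        System.out.println(choose(3, 1, 2));  // dst differs from src1
    }
}
```

The sketch ignores commutative operations (where dst matching the second source could also be demoted) and any flag-related constraints; whether and how the PR handles those is not stated in this message.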
Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: fix emit_arith_ndd discrepancy ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24431/files - new: https://git.openjdk.org/jdk/pull/24431/files/1fa0fbe4..c2cd9c70 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=07-08 Stats: 55 lines in 2 files changed: 16 ins; 26 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/24431.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24431/head:pull/24431 PR: https://git.openjdk.org/jdk/pull/24431 From never at openjdk.org Mon Apr 21 23:43:41 2025 From: never at openjdk.org (Tom Rodriguez) Date: Mon, 21 Apr 2025 23:43:41 GMT Subject: RFR: 8355034: [JVMCI] assert(static_cast(_jvmci_data_size) == align_up(compiler->is_jvmci() ? jvmci_data->size() : 0, oopSize)) failed: failed: 104 != 16777320 [v4] In-Reply-To: References: Message-ID: On Mon, 21 Apr 2025 20:38:35 GMT, Doug Simon wrote: >> After [JDK-8343789](https://bugs.openjdk.org/browse/JDK-8343789), the size of a `JVMCINMethodData` object is limited to `uint16_t`. This object embeds the value of `InstalledCode.name` so effectively imposes a limit on the name length. This PR establishes an upper limit on the name value as this name should only be for informative purposes when inspecting compiled code. >> >> While debugging the problem that exposed the limit, it was confusing that `-XX:+PrintCompilation` did not show the name so this PR builds on [JDK-8336760](https://bugs.openjdk.org/browse/JDK-8336760) to add the name in `PrintCompilation` output for JVMCI "hosted" methods. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > added bug id to @test tag Marked as reviewed by never (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/24753#pullrequestreview-2782424720 From duke at openjdk.org Mon Apr 21 23:55:02 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Mon, 21 Apr 2025 23:55:02 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v12] In-Reply-To: References: Message-ID: <0TXIxgeQppMmgi8csVPUjNtFHj5QNqv6ZewL3eYc2TU=.89631480-490b-4fb7-882f-bb7be7f8e60d@github.com> > This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). > > When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. > > This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. 
Chad Rakoczy has updated the pull request incrementally with three additional commits since the last revision: - Add null check in StressNMethodRelocation - Hold Compile_lock - Use methodHandle for VM_Operation so pointer is not stale ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23573/files - new: https://git.openjdk.org/jdk/pull/23573/files/27e41510..5552a860 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=10-11 Stats: 25 lines in 4 files changed: 11 ins; 5 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/23573.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23573/head:pull/23573 PR: https://git.openjdk.org/jdk/pull/23573 From sparasa at openjdk.org Tue Apr 22 00:14:02 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Tue, 22 Apr 2025 00:14:02 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v10] In-Reply-To: References: Message-ID: > The current scheme for Intel APX NDD code generation favors the emission of NDD instruction on APX-enabled targets, even if destination and source registers are the same. To prevent this, this PR extends the assembler layer to demote EEVEX to REX encoding if dst matches with source operands. 
Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: rename demote flag to optimize_rax_dst in emit_arith ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24431/files - new: https://git.openjdk.org/jdk/pull/24431/files/c2cd9c70..e0653330 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=08-09 Stats: 4 lines in 2 files changed: 0 ins; 1 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/24431.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24431/head:pull/24431 PR: https://git.openjdk.org/jdk/pull/24431 From xgong at openjdk.org Tue Apr 22 01:44:40 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 22 Apr 2025 01:44:40 GMT Subject: RFR: 8351623: VectorAPI: Refactor subword gather load and add SVE implementation In-Reply-To: References: Message-ID: On Sun, 20 Apr 2025 03:28:48 GMT, SendaoYan wrote: >> ### Summary: >> [JDK-8318650](http://java-service.client.nvidia.com/?q=8318650) added the hotspot intrinsifying of subword gather load APIs for X86 platforms [1]. This patch aims at implementing the equivalent functionality for AArch64 SVE platform. In addition to the AArch64 backend support, this patch also refactors the API implementation in Java side and the compiler mid-end part to make the operations more efficient and maintainable across different architectures. >> >> ### Background: >> Vector gather load APIs load values from memory addresses calculated by adding a base pointer to integer indices stored in an int array. SVE provides native vector gather load instructions for byte/short types using an int vector saving indices (see [2][3]). >> >> The number of loaded elements must match the index vector's element count. Since int elements are 4/2 times larger than byte/short elements, and given `MaxVectorSize` constraints, the operation may need to be splitted into multiple parts. 
>>
>> Using a 128-bit byte vector gather load as an example, there are four scenarios with different `MaxVectorSize`:
>>
>> 1. `MaxVectorSize = 16, byte_vector_size = 16`:
>> - Can load 4 indices per vector register
>> - So can finish 4 bytes per gather-load operation
>> - Requires 4 gather-loads and a final merge
>> Example:
>> ```
>> byte[] arr = [a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, ...]
>> int[] idx = [3, 2, 4, 1, 5, 7, 5, 2, 0, 6, 7, 1, 15, 10, 11, 9]
>>
>> 4 gather-load:
>> idx_v1 = [1 4 2 3] gather_v1 = [0000 0000 0000 becd]
>> idx_v2 = [2 5 7 5] gather_v2 = [0000 0000 0000 cfhf]
>> idx_v3 = [1 7 6 0] gather_v3 = [0000 0000 0000 bhga]
>> idx_v4 = [9 11 10 15] gather_v4 = [0000 0000 0000 jlkp]
>> merge: v = [jlkp bhga cfhf becd]
>> ```
>>
>> 2. `MaxVectorSize = 32, byte_vector_size = MaxVectorSize / 2`:
>> - Can load 8 indices per vector register
>> - So can finish 8 bytes per gather-load operation
>> - Requires 2 gather-loads and a merge
>> Example:
>> ```
>> byte[] arr = [a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, ...]
>> int[] index = [3, 2, 4, 1, 5, 7, 5, 2, 0, 6, 7, 1, 15, 10, 11, 9]
>>
>> 2 gather-load:
>> idx_v1 = [2 5 7 5 1 4 2 3]
>> idx_v2 = [9 11 10 15 1 7 6 0]
>> gather_v1 = [0000 0000 0000 0000 0000 0000 cfhf becd]
>> gather_v2 = [0000 0000 0000 0000 0000 0000 jlkp bhga]
>> merge: v = [0000 0000 0000 0000 jlkp bhga cfhf becd]
>> ```
>>
>> 3. `MaxVectorSize = 64, byte_v...
>
> test/hotspot/jtreg/compiler/vectorapi/VectorGatherSubwordTest.java line 39:
>
>> 37: * @modules jdk.incubator.vector
>> 38: *
>> 39: * @run driver compiler.vectorapi.VectorGatherSubwordTest
>
> Should we use `@run main` instead of `@run driver`

Thanks for taking a look at this PR! I think it's fine using `@run main` instead.
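
For readers following the quoted scenarios, the splitting can be modelled in plain scalar Java. This is only a sketch of the arithmetic, not the intrinsic or the SVE code: the class and method names are made up for illustration, and it assumes that one gather-load handles as many byte lanes as there are `int` indices in a `MaxVectorSize`-byte register.

```java
import java.nio.charset.StandardCharsets;

public class SubwordGatherSketch {
    // Number of gather-loads needed for `lanes` byte elements when the
    // int index vector is limited to maxVectorSizeBytes.
    static int parts(int lanes, int maxVectorSizeBytes) {
        int indicesPerRegister = maxVectorSizeBytes / Integer.BYTES;
        return (lanes + indicesPerRegister - 1) / indicesPerRegister;
    }

    // Scalar model: gather one part at a time, then "merge" by writing
    // each part into its slice of the result.
    static byte[] gather(byte[] arr, int[] idx, int maxVectorSizeBytes) {
        int indicesPerRegister = maxVectorSizeBytes / Integer.BYTES;
        byte[] result = new byte[idx.length];
        for (int start = 0; start < idx.length; start += indicesPerRegister) {
            int end = Math.min(start + indicesPerRegister, idx.length);
            for (int lane = start; lane < end; lane++) {
                result[lane] = arr[idx[lane]];
            }
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] arr = "abcdefghijklmnop".getBytes(StandardCharsets.US_ASCII);
        int[] idx = {3, 2, 4, 1, 5, 7, 5, 2, 0, 6, 7, 1, 15, 10, 11, 9};
        // Lane 0 is printed first here, so the string is the mirror image of
        // the merged vector in the quoted example, which shows lane 0 on the right.
        System.out.println(new String(gather(arr, idx, 16), StandardCharsets.US_ASCII));
        System.out.println(parts(16, 16) + " " + parts(16, 32) + " " + parts(16, 64));
    }
}
```

With the index array from the quoted example, a 16-lane byte gather needs 4 parts for `MaxVectorSize = 16` and 2 parts for `MaxVectorSize = 32`, matching scenarios 1 and 2.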
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24679#discussion_r2053187161 From aboldtch at openjdk.org Tue Apr 22 05:26:44 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 22 Apr 2025 05:26:44 GMT Subject: RFR: 8354668: Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding [v3] In-Reply-To: References: Message-ID: On Thu, 17 Apr 2025 03:21:08 GMT, Jatin Bhateja wrote: >> ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding. While most of the relocation records the patching offsets from the end of the instruction, SHL instruction, which is used for pointer coloring, computes the patching offset from the starting address of the instruction. >> >> Thus, in case the destination register operand of SHL instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte resulting into ILLEGAL instruction exception. >> >> This patch fixes reported failures by computing the relocation offset of SHL instruction from end of instruction, thereby making the patch offset agnostic to REX/REX2 prefix. >> >> Please review and share your feedback. >> >> Best Regards, >> Jatin >> >> PS: Validation were performed using latest Intel Software Development Emulator after modifying static register allocation order in x86_64.ad file giving preference to EGPRs. > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolutions Changes looks good. But coordinate with the Graal team before pushing anything. I think @dean-long's suggestion is good. But it should be done for all relocations in a separate PR. ------------- Marked as reviewed by aboldtch (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24664#pullrequestreview-2782793012 From dlunden at openjdk.org Tue Apr 22 05:44:50 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Tue, 22 Apr 2025 05:44:50 GMT Subject: RFR: 8341697: C2: Register allocation inefficiency in tight loop [v7] In-Reply-To: References: Message-ID: <7GXH8ZolWQgsN-p3-HxOH_qc180yQUaXwFB2Fl5TqEI=.7c4213e2-ee4c-44b6-9e78-8eb895b68a80@github.com> On Mon, 14 Oct 2024 14:17:09 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch improves the spill placement in the presence of loops. Currently, when trying to spill a live range, we will create a `Phi` at the loop head, this `Phi` will then be spilt inside the loop body, and as the `Phi` is `UP` (lives in register) at the loop head, we need to emit an additional reload at the loop back-edge block. This introduces loop-carried dependencies, greatly reduces loop throughput. >> >> My proposal is to be aware of loop heads and try to eagerly spill or reload live ranges at the loop entries. In general, if a live range is spilt in the loop common path, then we should spill it in the loop entries and reload it at its use sites, this may increase the number of loads but will eliminate loop-carried dependencies, making the load latency-free. On the otherhand, if a live range is only spilt in the uncommon path but is used in the common path, then we should reload it eagerly. I think it is appropriate to bias towards spilling, i.e. if a live range is both spilt and reloaded in the common path, we spill it. This eliminates loop-carried dependencies. >> >> A downfall of this algorithm is that we may overspill, which means that after spilling some live ranges, the others do not need to be spilt anymore but are unnecessarily spilt. >> >> - A possible approach is to split the live ranges one-by-one and try to colour them afterwards. This seems prohibitively expensive. 
>> - Another approach is to be aware of the number of registers that need spilling, sorting the live ones accordingly. >> - Finally, we can eagerly split a live range at uncommon branches and do conservative coalescing afterwards. I think this is the most elegant and efficient solution for that. >> >> Please take a look and leave your reviews, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > fix uncommon_freq Keep alive ------------- PR Comment: https://git.openjdk.org/jdk/pull/21472#issuecomment-2820130861 From fyang at openjdk.org Tue Apr 22 05:44:51 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 22 Apr 2025 05:44:51 GMT Subject: RFR: 8355239: RISC-V: Do not support subword scatter store Message-ID: <4k2oLOc-5JRW4__vf9tEfcIQEaTcs6XUY58lQbaxpLc=.535c54ef-2dd1-479c-a1ca-8757c35964bc@github.com> Hi, please consider this small enhancement change. Currently, only word and double-word gather load and scatter store are supported on riscv. Both subword gather load and scatter store are not supported due to constraint of riscv vector extension. [JDK-8331150](https://bugs.openjdk.org/browse/JDK-8331150) makes this constraint explicit for subword gather load. For parity and consistency, this also makes it explicit for subword scatter store as well. Testing: `jdk_vector` tested with QEMU vector extension. 
------------- Commit messages: - 8355239: RISC-V: Do not support subword scatter store Changes: https://git.openjdk.org/jdk/pull/24787/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24787&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8355239 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24787.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24787/head:pull/24787 PR: https://git.openjdk.org/jdk/pull/24787 From dnsimon at openjdk.org Tue Apr 22 07:10:57 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 22 Apr 2025 07:10:57 GMT Subject: Integrated: 8355034: [JVMCI] assert(static_cast(_jvmci_data_size) == align_up(compiler->is_jvmci() ? jvmci_data->size() : 0, oopSize)) failed: failed: 104 != 16777320 In-Reply-To: References: Message-ID: On Fri, 18 Apr 2025 13:27:02 GMT, Doug Simon wrote: > After [JDK-8343789](https://bugs.openjdk.org/browse/JDK-8343789), the size of a `JVMCINMethodData` object is limited to `uint16_t`. This object embeds the value of `InstalledCode.name` so effectively imposes a limit on the name length. This PR establishes an upper limit on the name value as this name should only be for informative purposes when inspecting compiled code. > > While debugging the problem that exposed the limit, it was confusing that `-XX:+PrintCompilation` did not show the name so this PR builds on [JDK-8336760](https://bugs.openjdk.org/browse/JDK-8336760) to add the name in `PrintCompilation` output for JVMCI "hosted" methods. This pull request has now been integrated. Changeset: 2f7806ff Author: Doug Simon URL: https://git.openjdk.org/jdk/commit/2f7806ffe5b5b4b2f7caa14d4559943968c34678 Stats: 83 lines in 4 files changed: 80 ins; 0 del; 3 mod 8355034: [JVMCI] assert(static_cast(_jvmci_data_size) == align_up(compiler->is_jvmci() ? 
jvmci_data->size() : 0, oopSize)) failed: failed: 104 != 16777320 Reviewed-by: never, yzheng, cslucas ------------- PR: https://git.openjdk.org/jdk/pull/24753 From dnsimon at openjdk.org Tue Apr 22 07:10:56 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 22 Apr 2025 07:10:56 GMT Subject: RFR: 8355034: [JVMCI] assert(static_cast(_jvmci_data_size) == align_up(compiler->is_jvmci() ? jvmci_data->size() : 0, oopSize)) failed: failed: 104 != 16777320 [v4] In-Reply-To: References: Message-ID: On Mon, 21 Apr 2025 20:38:35 GMT, Doug Simon wrote: >> After [JDK-8343789](https://bugs.openjdk.org/browse/JDK-8343789), the size of a `JVMCINMethodData` object is limited to `uint16_t`. This object embeds the value of `InstalledCode.name` so effectively imposes a limit on the name length. This PR establishes an upper limit on the name value as this name should only be for informative purposes when inspecting compiled code. >> >> While debugging the problem that exposed the limit, it was confusing that `-XX:+PrintCompilation` did not show the name so this PR builds on [JDK-8336760](https://bugs.openjdk.org/browse/JDK-8336760) to add the name in `PrintCompilation` output for JVMCI "hosted" methods. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > added bug id to @test tag Thanks for the reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24753#issuecomment-2820289421 From chagedorn at openjdk.org Tue Apr 22 07:45:58 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 22 Apr 2025 07:45:58 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v3] In-Reply-To: References: Message-ID: On Thu, 17 Apr 2025 18:23:20 GMT, Kangcheng Xu wrote: >> This PR refactors `PhaseIdealLoop::is_counted_loop()` into (mostly) `CountedLoopConverter::is_counted_loop()` and `CountedLoopConverter::convert()` to decouple the detection and conversion code. 
This enables us to try different loop configurations easily and finally convert once a counted loop is found. >> >> A nested `PhaseIdealLoop::CountedLoopConverter` class is created to handle the context, but I'm not if this is the best name or place for it. Please let me know what you think. >> >> Blocks [JDK-8336759](https://bugs.openjdk.org/browse/JDK-8336759). > > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > WIP: review followups Was out last week but I'm seeing your last commit mentions WIP. Let me know when it's ready to have another look again :-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24458#issuecomment-2820447750 From mchevalier at openjdk.org Tue Apr 22 08:14:05 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Tue, 22 Apr 2025 08:14:05 GMT Subject: RFR: 8320909: C2 compilation fails with "Missed optimization opportunity in PhaseIterGVN" Message-ID: The JBS issues has 3 reproducers. Two of them don't reproduce anymore. Let's enumerate: - Test, TestSimple: Disappeared with: [JDK-8319372: C2 compilation fails with "Bad immediate dominator info"](https://bugs.openjdk.org/browse/JDK-8319372) in #16844 which actually fixed it by removing the handling of the problematic pattern in `CastIINode::Value`. Reverting this fix makes the issue reappear. - Reduced2: I fix here - Test3, Reduced3: Disappeared with: [JDK-8347459: C2: missing transformation for chain of shifts/multiplications by constants](https://bugs.openjdk.org/browse/JDK-8347459) in #23728 which just shadowed it. The bug is essentially the same as Reduced2 otherwise. I also fix these here (even with reverted JDK-8347459) The issue comes from the fact that `And[IL]Node::Value` has a special handling when an operand is a left-shift: in the expression lhs & (X << s) if `lhs` fits in less than `s` bits, the result is sure to be 0. 
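
The arithmetic fact behind that special handling can be checked in plain Java. The sketch below is only a sanity check of the identity, not C2 code; the class and helper names are invented for illustration, and "fits in `s` bits" is read here as `0 <= lhs < 2^s`.

```java
public class AndShiftZeroSketch {
    // True if lhs is non-negative and has no bit set at position s or above.
    static boolean fitsBelowBit(int lhs, int s) {
        return lhs >= 0 && (lhs >>> s) == 0;
    }

    // int case: lhs & (x << s)
    static int andInt(int lhs, int x, int s) {
        return lhs & (x << s);
    }

    // long case with an int-to-long conversion between the shift and the and:
    // sign extension only touches the high bits, the low s bits stay zero.
    static long andLongThroughI2L(long lhs, int x, int s) {
        return lhs & (long) (x << s);
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 12345, 0x55555555, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int s = 1; s < 32; s++) {
            int lhs = (1 << s) - 1; // largest value with all bits below position s
            if (!fitsBelowBit(lhs, s)) {
                throw new AssertionError("unexpected: lhs does not fit for s=" + s);
            }
            for (int x : samples) {
                if (andInt(lhs, x, s) != 0 || andLongThroughI2L(lhs, x, s) != 0L) {
                    throw new AssertionError("non-zero result for s=" + s + ", x=" + x);
                }
            }
        }
        System.out.println("all checks passed");
    }
}
```

The shift amount is kept in 1..31 because Java masks `int` shift counts to five bits.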
This special handling also tolerates a `ConvI2LNode` between the `AndLNode` and `LShiftINode`. In this case, updating the Shift node during IGVN won't directly enqueue the And node, but only the Conv node. If this Conv node cannot be improved, the And node is not enqueued, and its type is not as good as it could be.

Such a case is illustrated in the following figures from Reduced2. Node `239 Phi` is a phi with a dead branch, so the node is about to be eliminated. On the second figure, we can see `152 LShiftI` taking its place. The node `243 ConvI2L` is enqueued, but no change happens during its idealization, so the node `244 AndL` is not enqueued, while it could receive an update.

The fix is pretty direct: we recognize this pattern and we enqueue the And node during IGVN to make sure it has a chance to be refined.

The case of Reduced3 is mostly the same, with a twist: the special handling of And nodes can also see through casts. See the next figure with a `CastIINode` between the `AndINode` and the `LShiftINode`. The fix has to take that into account.

Overall, the situation can be of the form:

LShift -> Cast+ -> ConvI2L -> Cast+ -> And

This second case was shadowed by [JDK-8347459](https://bugs.openjdk.org/browse/JDK-8347459) because the structure

And(constant, Cast(LShift(LShift(X, 16), 16)))

that appears in the example is simplified into

And(constant, Cast(0))

which enqueues the Cast for IGVN, but in this form it can improve its type: it's surely 0, so the And node is also processed to deduce the value 0 as well. I tried and failed to create a reproducer of this case that isn't shadowed by [JDK-8347459](https://bugs.openjdk.org/browse/JDK-8347459); in particular, changing the constant from 16 to 15, for instance, doesn't allow the double shift to be simplified into 0, but doesn't reproduce the issue.

One could imagine an alternate solution that would not update the notification system but rather the type system.
On top of intervals, we could have some modulo information and know that some value is 0 modulo 2^s, or alternatively, some bitwise information that would know the lower `s` bits to be 0 (and the top bits to be unknown). This would propagate through casts and convs and join through phis. This would allow the And node to have best type earlier on. If the information is already there (like on the first figure). Also, the And node would not need to look for deep structures, but only compare the modulo/bitwise information of operands to know if the value can be refined, which means the enqueuing code would not need to be changed, and more cases would naturally work without special pattern-matching from And node. Nevertheless, this solution feels way out-of-scope... Thanks, Marc ------------- Commit messages: - Add timeout in test - Looking through casts - Add -XX:+IgnoreUnrecognizedVMOptions - Add And to worklist through ConvI2L Changes: https://git.openjdk.org/jdk/pull/24792/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24792&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8320909 Stats: 162 lines in 3 files changed: 162 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24792.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24792/head:pull/24792 PR: https://git.openjdk.org/jdk/pull/24792 From chagedorn at openjdk.org Tue Apr 22 08:30:54 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 22 Apr 2025 08:30:54 GMT Subject: RFR: 8349139: C2: Div looses dependency on condition that guarantees divisor not null in counted loop [v6] In-Reply-To: References: Message-ID: On Fri, 18 Apr 2025 14:08:11 GMT, Roland Westrelin wrote: >> The test crashes because of a division by zero. The `Div` node for >> that one is initially part of a counted loop. The control input of the >> node is cleared because the divisor is non zero. 
This is because the >> divisor depends on the loop phi and the type of the loop phi is >> narrowed down when the counted loop is created. pre/main/post loops >> are created, unrolling happens, the main loop looses its backedge. The >> `Div` node can then float above the zero trip guard for the main >> loop. When the zero trip guard is not taken, there's no guarantee the >> divisor is non zero so the `Div` node should be pinned below it. >> >> I propose we revert the change I made with 8334724 which removed >> `PhaseIdealLoop::cast_incr_before_loop()`. The `CastII` that this >> method inserted was there to handle exactly this problem. It was added >> initially for a similar issue but with array loads. That problem with >> loads is handled some other way now and that's why I thought it was >> safe to proceed with the removal. >> >> The code in this patch is somewhat different from the one we had >> before for a couple reasons: >> >> 1- assert predicate code evolved and so previous logic can't be >> resurrected as it was. >> >> 2- the previous logic has a bug. >> >> Regarding 1-: during pre/main/post loop creation, we used to add the >> `CastII` and then to add assertion predicates (so assertion predicates >> depended on the `CastII`). Then when unrolling, when assertion >> predicates are updated, we would skip over the `CastII`. What I >> propose here is to add the `CastII` after assertion predicates are >> added. As a result, they don't depend on the `CastII` and there's no >> need for any extra logic when unrolling happens. This, however, >> doesn't work when the assertion predicates are added by RCE. In that >> case, I had to add logic to skip over the `CastII` (similar to what >> existed before I removed it). >> >> Regarding 2-: previous implementation for >> `PhaseIdealLoop::cast_incr_before_loop()` would add the `CastII` at >> the first loop `Phi` it encounters that's a use of the loop increment: >> it's usually the iv but not always. 
I tweaked the test case to show, >> this bug can actually cause a crash and changed the logic for >> `PhaseIdealLoop::cast_incr_before_loop()` accordingly. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > whitespace Looks good to me, too. Nit: You should probably update the title: "divisor not null" -> "divisor not zero". src/hotspot/share/opto/loopnode.cpp line 6072: > 6070: // which is be transformed to (3): > 6071: // (AddI pre_loop_iv -1) > 6072: // The compiler may be able to constant fold the assert predicate condition for (3) but not (1) Suggestion: // skip over the cast added by PhaseIdealLoop::cast_incr_before_loop() when pre/post/main loops are created because // it can get in the way of type propagation. For instance, the index tested by an Assertion Predicate, if the cast is // not skipped over, could be (1): // (AddI (CastII (AddI pre_loop_iv -2) int) 1) // while without the cast, it is (2): // (AddI (AddI pre_loop_iv -2) 1) // which is be transformed to (3): // (AddI pre_loop_iv -1) // The compiler may be able to constant fold the Assertion Predicate condition for (3) but not (1) ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23617#pullrequestreview-2783275896 PR Review Comment: https://git.openjdk.org/jdk/pull/23617#discussion_r2053606293 From epeter at openjdk.org Tue Apr 22 08:46:04 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 22 Apr 2025 08:46:04 GMT Subject: RFR: 8315916: assert(C->live_nodes() <= C->max_node_limit()) failed: Live Node limit exceeded [v7] In-Reply-To: References: Message-ID: On Tue, 8 Apr 2025 16:41:48 GMT, Dhamoder Nalla wrote: >> According to the latest comments on bug JDK-8315916, two more people have reported the issue. > >> @dhanalla Are you still making changes or is this ready to review? (if not ready just make it a draft ;) ) > > @eme64, This is ready for review. 
@dhanalla Do you want us to continue reviewing? It is usually good to ping people again after making changes. Otherwise, we don't know if you are still working on it and we should wait. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20504#issuecomment-2820601943 From mli at openjdk.org Tue Apr 22 08:47:52 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 22 Apr 2025 08:47:52 GMT Subject: RFR: 8355239: RISC-V: Do not support subword scatter store In-Reply-To: <4k2oLOc-5JRW4__vf9tEfcIQEaTcs6XUY58lQbaxpLc=.535c54ef-2dd1-479c-a1ca-8757c35964bc@github.com> References: <4k2oLOc-5JRW4__vf9tEfcIQEaTcs6XUY58lQbaxpLc=.535c54ef-2dd1-479c-a1ca-8757c35964bc@github.com> Message-ID: <_tsC73XPm2wBPccsOS37n_8Oz7P84WBLsQiiqJ2r3g4=.24416039-207f-46f8-9be1-c8036144e8d9@github.com> On Tue, 22 Apr 2025 02:47:02 GMT, Fei Yang wrote: > Hi, please consider this small enhancement change. > > Currently, only word and double-word gather load and scatter store are supported on riscv. > Both subword gather load and scatter store are not supported due to constraint of riscv vector extension. > [JDK-8331150](https://bugs.openjdk.org/browse/JDK-8331150) makes this constraint explicit for subword gather load. > For parity and consistency, this also makes it explicit for subword scatter store as well. > > Testing: `jdk_vector` tested with QEMU vector extension. Looks good. Thanks! ------------- Marked as reviewed by mli (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24787#pullrequestreview-2783329801 From duke at openjdk.org Tue Apr 22 08:55:43 2025 From: duke at openjdk.org (kuaiwei) Date: Tue, 22 Apr 2025 08:55:43 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v12] In-Reply-To: References: Message-ID: <56jHI-HsjxkhkEQ5Dciu-thfFFPCErUEdzuzHr5k7HA=.168c6cdc-52b5-4c96-9068-feb00f621e61@github.com> On Thu, 17 Apr 2025 10:12:25 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/addnode.cpp line 950: >> >>> 948: const Node* check = by_pass_i2l(_combine); >>> 949: if (check->outcnt() == 1 && check->unique_out()->Opcode() == _combine->Opcode()) { >>> 950: // It's in the middle of combine operators >> >> Ah, I see. This is about checking that there is nothing further down to merge with.... >> >> But are you sure this is a good idea to check that there is no `OR` below? I mean there could be valid uses of `OrI` below that do some OR-ing with something else... Now you are forbidding these cases. >> >> Look at this: >> >> int x = (... merge load pattern with OR ...); >> int y = ... some other value ... >> int z = x | y; >> >> >> I would expect that we could merge the loads here too, but your pattern matching seems to forbid it, right? > > Also: I would change the name of the method if we keep it. It is really about checking down, i.e. that there is not other candidate below. Maybe `has_no_merge_load_combine_below`? > > Ah, another example: We could have two merged loads that we OR: > > int x = (... merge load pattern with OR ...); > int y = (... merge load pattern with OR ...); > int z = x | y; > > It would be nice if this could be optimized too, and we should have an IR test for it :) Yes, it's the limit of this implementation. I need to find the last `combine` node which can be replaced with merged load. But if it's used by other `Or` operator. So far I can not find a good way to distinguish these two cases. 
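
To make the two cases under discussion concrete, here is a small Java sketch. It assumes the "merge load pattern with OR" has the usual shape of four adjacent byte loads combined with shifts and `|` (little-endian order here); the class and method names are illustrative only, and `ByteBuffer` is used just to show what a single merged load would return.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class MergeLoadPatternSketch {
    // Four adjacent byte loads combined with shifts and ORs:
    // the candidate for replacement by one 4-byte load.
    static int loadIntLE(byte[] a, int off) {
        return (a[off] & 0xff)
             | ((a[off + 1] & 0xff) << 8)
             | ((a[off + 2] & 0xff) << 16)
             | ((a[off + 3] & 0xff) << 24);
    }

    // What a single merged load would produce for the same bytes.
    static int mergedLoadLE(byte[] a, int off) {
        return ByteBuffer.wrap(a).order(ByteOrder.LITTLE_ENDIAN).getInt(off);
    }

    // The problematic shape from the review: the last OR of each pattern
    // feeds another OR that is not part of either pattern.
    static int orOfTwoPatterns(byte[] a) {
        int x = loadIntLE(a, 0);
        int y = loadIntLE(a, 4);
        return x | y;
    }

    public static void main(String[] args) {
        byte[] a = {1, 2, 3, 4, 5, 6, 7, (byte) 0x88};
        System.out.println(loadIntLE(a, 0) == mergedLoadLE(a, 0));
        System.out.println(loadIntLE(a, 4) == mergedLoadLE(a, 4));
        System.out.println(orOfTwoPatterns(a) == (mergedLoadLE(a, 0) | mergedLoadLE(a, 4)));
    }
}
```

In `orOfTwoPatterns`, the final `x | y` looks like one more combine step when walking down from either pattern, which is the ambiguity described above.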
I may add a new `checked` flag for the combine operator. For a case like:

    int x = (... merge load pattern with OR ...);
    int y = (... merge load pattern with OR ...);
    int z = x | y;

When IGVN checks the `Or` in `x | y`, it is the last of the combine nodes. But it will fail to merge because `collect_merge_list` cannot find a related `Load` for it, and I can mark it as `checked`. So when IGVN checks the `Or` nodes in lines 1 and 2, it will find that the next `Or` is checked and pick the right one. Do you think this is doable? Other suggestions are appreciated. Thanks.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2053662360

From epeter at openjdk.org Tue Apr 22 09:03:59 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 22 Apr 2025 09:03:59 GMT
Subject: RFR: 8320909: C2 compilation fails with "Missed optimization opportunity in PhaseIterGVN"
In-Reply-To: References: Message-ID:

On Tue, 22 Apr 2025 08:06:48 GMT, Marc Chevalier wrote:

> One could imagine an alternate solution that would not update the notification system but rather the type system. On top of intervals, we could have some modulo information and know that some value is 0 modulo 2^s, or alternatively, some bitwise information that would know the lower s bits to be 0 (and the top bits to be unknown).

@marc-chevalier FYI: https://github.com/openjdk/jdk/pull/17508 this could be an alternative solution, at least once we fully follow that road. It tracks the "known bits", which would give you the "modulo information".

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24792#issuecomment-2820648102

From epeter at openjdk.org Tue Apr 22 09:11:05 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 22 Apr 2025 09:11:05 GMT
Subject: RFR: 8320909: C2 compilation fails with "Missed optimization opportunity in PhaseIterGVN"
In-Reply-To: References: Message-ID:

On Tue, 22 Apr 2025 08:06:48 GMT, Marc Chevalier wrote:

> The JBS issues has 3 reproducers.
Two of them don't reproduce anymore. Let's enumerate: > > - Test, TestSimple: > Disappeared with: > [JDK-8319372: C2 compilation fails with "Bad immediate dominator info"](https://bugs.openjdk.org/browse/JDK-8319372) in #16844 > which actually fixed it by removing the handling of the problematic pattern in `CastIINode::Value`. > Reverting this fix makes the issue reappear. > - Reduced2: I fix here > - Test3, Reduced3: > Disappeared with: > [JDK-8347459: C2: missing transformation for chain of shifts/multiplications by constants](https://bugs.openjdk.org/browse/JDK-8347459) in #23728 > which just shadowed it. The bug is essentially the same as Reduced2 otherwise. I also fix these here (even with reverted JDK-8347459) > > The issue comes from the fact that `And[IL]Node::Value` has a special handling when > an operand is a left-shift: in the expression > > lhs & (X << s) > > if `lhs` fits in less than `s` bits, the result is sure to be 0. This special handling > also tolerate a `ConvI2LNode` between the `AndLNode` and `LShiftINode`. In this case, > updating the Shift node during IGVN won't enqueue directly the And node, but only the > Conv node. If this conv node cannot be improved, the And node is not enqueued, and its > type is not as good as it could be. > > Such a case is illustrated on the following figures from Reduced2. Node `239 Phi` is a phi with a > dead branch, so the node is about to be eleminated. On the second figure, we can see > `152 LShiftI` taking its place. The node `243 ConvI2L` is enqueued, but no change happens > during its idealization, so the node `244 AndL` is not enqueued, while it could receive an update. > > > > > > The fix is pretty direct: we recognize this pattern and we enqueue the And node during IGVN > to make sure it has a chance to be refined. > > The case of Reduced3 is mostly the same, with a twist: the special handling of And nodes > also can see through casts. 
See the next figure with a `CastIINode` between the `AndINode` and > the `LShiftINode`. The fix has to take that into account. > > > > > Overall, the situation can be of the form: > > LShift -> Cast+ -> ConvI2L -> Cast+ -> And > > This second case was shadowed by [JDK-8347459](https://bugs... @marc-chevalier Thanks for looking into this! I think the solution looks good :) Did you put in all 3 reproducers, even the ones that do not reproduce any more? It could be interesting if we ever backport, and maybe also more generally because it generates interesting patterns. @marc-chevalier I also wonder if we could not have a more descriptive title? BTW: **thank you** for the detailed PR description, very helpful test/hotspot/jtreg/compiler/c2/gvn/MissingOptWithShiftConvAnd.java line 40: > 38: * -XX:-TieredCompilation -Xbatch > 39: * -XX:+IgnoreUnrecognizedVMOptions -XX:VerifyIterativeGVN=10 > 40: * MissingOptWithShiftConvAnd It could be good to have a run without flags. Or at least with fewer flags. Just so we can run it with other flag combinations. Imagine we run it with `-XX:VerifyIterativeGVN=11` from the outside, that would just get downgraded to `-XX:VerifyIterativeGVN=10` immediately here. test/hotspot/jtreg/compiler/c2/gvn/MissingOptWithShiftConvCastAnd.java line 34: > 32: * -XX:-TieredCompilation -Xbatch > 33: * -XX:+IgnoreUnrecognizedVMOptions -XX:VerifyIterativeGVN=10 > 34: * MissingOptWithShiftConvCastAnd Same here :) ------------- Changes requested by epeter (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/24792#pullrequestreview-2783397725 PR Comment: https://git.openjdk.org/jdk/pull/24792#issuecomment-2820672882 PR Review Comment: https://git.openjdk.org/jdk/pull/24792#discussion_r2053680291 PR Review Comment: https://git.openjdk.org/jdk/pull/24792#discussion_r2053680921 From epeter at openjdk.org Tue Apr 22 09:15:10 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 22 Apr 2025 09:15:10 GMT Subject: RFR: 8352620: C2: rename MemNode::memory_type() to MemNode::value_basic_type() [v2] In-Reply-To: References: Message-ID: On Mon, 14 Apr 2025 21:46:26 GMT, Saranya Natarajan wrote: >> Description: The current name MemNode::memory_type() is misleading because the returned type is a property of the value that is loaded/stored, not the memory that is accessed. Usually, the two of them match, but for mismatched memory accesses (arising e.g. from using Unsafe or memory segments) they might differ, e.g. one might store a value of type 'short' into an array of elements of type 'long'. The proposal was to rename MemNode::memory_type() to MemNode::value_basic_type() to clarify these cases. >> >> Solution: Replaced all occurrence of MemNode::memory_type() with MemNode::value_basic_type() > > Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: > > addressing review comments src/hotspot/share/opto/escape.cpp line 2822: > 2820: Node* value = nullptr; > 2821: if (ini != nullptr) { > 2822: //StoreP::value_basic_type() == T_ADDRESS Suggestion: // StoreP::value_basic_type() == T_ADDRESS Nit: generally we have a space after the comment. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24427#discussion_r2053696696 From mchevalier at openjdk.org Tue Apr 22 09:22:45 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Tue, 22 Apr 2025 09:22:45 GMT Subject: RFR: 8320909: C2 compilation fails with "Missed optimization opportunity in PhaseIterGVN" In-Reply-To: References: Message-ID: On Tue, 22 Apr 2025 08:06:48 GMT, Marc Chevalier wrote: > The JBS issues has 3 reproducers. Two of them don't reproduce anymore. Let's enumerate: > > - Test, TestSimple: > Disappeared with: > [JDK-8319372: C2 compilation fails with "Bad immediate dominator info"](https://bugs.openjdk.org/browse/JDK-8319372) in #16844 > which actually fixed it by removing the handling of the problematic pattern in `CastIINode::Value`. > Reverting this fix makes the issue reappear. > - Reduced2: I fix here > - Test3, Reduced3: > Disappeared with: > [JDK-8347459: C2: missing transformation for chain of shifts/multiplications by constants](https://bugs.openjdk.org/browse/JDK-8347459) in #23728 > which just shadowed it. The bug is essentially the same as Reduced2 otherwise. I also fix these here (even with reverted JDK-8347459) > > The issue comes from the fact that `And[IL]Node::Value` has a special handling when > an operand is a left-shift: in the expression > > lhs & (X << s) > > if `lhs` fits in less than `s` bits, the result is sure to be 0. This special handling > also tolerate a `ConvI2LNode` between the `AndLNode` and `LShiftINode`. In this case, > updating the Shift node during IGVN won't enqueue directly the And node, but only the > Conv node. If this conv node cannot be improved, the And node is not enqueued, and its > type is not as good as it could be. > > Such a case is illustrated on the following figures from Reduced2. Node `239 Phi` is a phi with a > dead branch, so the node is about to be eleminated. On the second figure, we can see > `152 LShiftI` taking its place. 
The node `243 ConvI2L` is enqueued, but no change happens > during its idealization, so the node `244 AndL` is not enqueued, while it could receive an update. > > > > > > The fix is pretty direct: we recognize this pattern and we enqueue the And node during IGVN > to make sure it has a chance to be refined. > > The case of Reduced3 is mostly the same, with a twist: the special handling of And nodes > also can see through casts. See the next figure with a `CastIINode` between the `AndINode` and > the `LShiftINode`. The fix has to take that into account. > > > > > Overall, the situation can be of the form: > > LShift -> Cast+ -> ConvI2L -> Cast+ -> And > > This second case was shadowed by [JDK-8347459](https://bugs... I've put only the (reduced) reproducers that I fix here (so 2 cases), including one that doesn't work anymore. I've not put the `Test`/`TestSimple` one, the one that is properly fixed by another PR, and is quite a different issue actually, simply hitting the same assert. Sure, I can add it. I just feared it would be confusing when browsing the history, to find a test actually fixed by another change than the one we will find in git blame. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24792#issuecomment-2820705595 From thartmann at openjdk.org Tue Apr 22 10:51:56 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 22 Apr 2025 10:51:56 GMT Subject: RFR: 8320909: C2 compilation fails with "Missed optimization opportunity in PhaseIterGVN" In-Reply-To: References: Message-ID: On Tue, 22 Apr 2025 09:19:59 GMT, Marc Chevalier wrote: > I just feared it would be confusing when browsing the history, to find a test actually fixed by another change than the one we will find in git blame. I think that's fine - just make sure that you reference the right bug via the `@bug` tag in the test. 
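For illustration, the `lhs & (X << s)` observation quoted above can be checked with a minimal sketch. This is not the `And[IL]Node::Value` code; the function names are hypothetical, values are treated as unsigned 32-bit, and "fits" is taken to mean that `lhs` has no bit set at or above position `s`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: computes lhs & (x << s) for a shift count 0 <= s < 32.
// The left shift clears the low s bits of its result.
uint32_t and_with_shifted(uint32_t lhs, uint32_t x, unsigned s) {
  return lhs & (x << s);
}

// Hypothetical helper: true when lhs only has bits below position s.
// In that case lhs and (x << s) share no set bit, so the AND is 0 for every x.
bool and_is_provably_zero(uint32_t lhs, unsigned s) {
  return s < 32 && (lhs >> s) == 0;
}
```

When `and_is_provably_zero(lhs, s)` holds, `and_with_shifted(lhs, x, s)` is 0 whatever `x` is, which is why the And node deserves another look once the shift becomes visible behind the conversion.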
------------- PR Comment: https://git.openjdk.org/jdk/pull/24792#issuecomment-2820932781 From duke at openjdk.org Tue Apr 22 11:06:13 2025 From: duke at openjdk.org (kuaiwei) Date: Tue, 22 Apr 2025 11:06:13 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v13] In-Reply-To: References: Message-ID: > In this patch, I extent the merge stores optimization to merge adjacents loads. Tier1 tests are passed in my machine. > > The benchmark result of MergeLoadBench.java > AMD EPYC 9T24 96-Core Processor: > > |name | -MergeLoads | +MergeLoads |delta| > |---|---|---|---| > |MergeLoadBench.getCharB |4352.150 |4407.435 | 55.29 | > |MergeLoadBench.getCharBU |4075.320 |4084.663 | 9.34 | > |MergeLoadBench.getCharBV |3221.302 |3221.528 | 0.23 | > |MergeLoadBench.getCharC |2235.433 |2238.796 | 3.36 | > |MergeLoadBench.getCharL |4363.244 |4372.281 | 9.04 | > |MergeLoadBench.getCharLU |4072.550 |4075.744 | 3.19 | > |MergeLoadBench.getCharLV |2227.825 |2231.612 | 3.79 | > |MergeLoadBench.getIntB |11199.935 |6869.030 | -4330.90 | > |MergeLoadBench.getIntBU |6853.862 |2763.923 | -4089.94 | > |MergeLoadBench.getIntBV |306.953 |309.911 | 2.96 | > |MergeLoadBench.getIntL |10426.843 |6523.716 | -3903.13 | > |MergeLoadBench.getIntLU |6740.847 |2602.701 | -4138.15 | > |MergeLoadBench.getIntLV |2233.151 |2231.745 | -1.41 | > |MergeLoadBench.getIntRB |11335.756 |8980.619 | -2355.14 | > |MergeLoadBench.getIntRBU |7439.873 |3190.208 | -4249.66 | > |MergeLoadBench.getIntRL |16323.040 |7786.842 | -8536.20 | > |MergeLoadBench.getIntRLU |7457.745 |3364.140 | -4093.61 | > |MergeLoadBench.getIntRU |2512.621 |2511.668 | -0.95 | > |MergeLoadBench.getIntU |2501.064 |2500.629 | -0.43 | > |MergeLoadBench.getLongB |21175.442 |21103.660 | -71.78 | > |MergeLoadBench.getLongBU |14042.046 |2512.784 | -11529.26 | > |MergeLoadBench.getLongBV |606.448 |606.171 | -0.28 | > |MergeLoadBench.getLongL |23142.178 |23217.785 | 75.61 | > |MergeLoadBench.getLongLU 
|14112.972 |2237.659 | -11875.31 | > |MergeLoadBench.getLongLV |2230.416 |2231.224 | 0.81 | > |MergeLoadBench.getLongRB |21152.558 |21140.583 | -11.98 | > |MergeLoadBench.getLongRBU |14031.178 |2520.317 | -11510.86 | > |MergeLoadBench.getLongRL |23248.506 |23136.410 | -112.10 | > |MergeLoadBench.getLongRLU |14125.032 |2240.481 | -11884.55 | > |MergeLoadBench.getLongRU |3071.881 |3066.606 | -5.27 | > |Merg... kuaiwei has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits: - Fix for comments - Merge remote-tracking branch 'origin/master' into dev/merge_loads - Merge remote-tracking branch 'origin/master' into dev/merge_loads - Remove unused code - Move code to addnode.cpp and add more tests - Merge remote-tracking branch 'origin/master' into dev/merge_loads - Fix test - Add more tests - Enable StressIGVN and riscv platform - Change tests as review comments - ... and 6 more: https://git.openjdk.org/jdk/compare/128f2d1c...a35d96e5 ------------- Changes: https://git.openjdk.org/jdk/pull/24023/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24023&range=12 Stats: 2549 lines in 17 files changed: 2500 ins; 0 del; 49 mod Patch: https://git.openjdk.org/jdk/pull/24023.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24023/head:pull/24023 PR: https://git.openjdk.org/jdk/pull/24023 From chagedorn at openjdk.org Tue Apr 22 11:43:51 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 22 Apr 2025 11:43:51 GMT Subject: RFR: 8346552: C2: Add IR tests to check that Predicate cloning in Loop Unswitching works as expected [v4] In-Reply-To: <3AsN87ob_1V_i0mIHo96_3IZ2p5QY6x_B2P_Ydsl_7A=.13a3ac28-bf3a-4941-be1c-c010dc48a251@github.com> References: <3AsN87ob_1V_i0mIHo96_3IZ2p5QY6x_B2P_Ydsl_7A=.13a3ac28-bf3a-4941-be1c-c010dc48a251@github.com> Message-ID: <85m-gBWaSRRZpZiEZc9ajotIYZD3MAtInLkgQ0Icw2g=.c83cde95-1847-4a61-9bef-8d39b76ec0b3@github.com> On Thu, 17 Apr 2025 06:23:33 GMT, Manuel H?ssig 
wrote: >> # Issue Summary >> >> When a loop is unswitched, all parse predicates from the original loop must be cloned to the second loop that is created. Forgetting to clone a parse predicate is a common error during development on loop unswitching code that we could not catch previously. Since we have the IR-framework now, this PR introduces a test to catch this error. >> >> # Changes >> >> The main contribution of this PR is a test to ensure that all predicates have been cloned into an unswitched loop. >> Working on this PR revealed that Loop Limit Check Parse Predicates are erroneously cloned when unswitching counted loops. That is because we know that the loop variable increments monotonously in counted loops, so a loop limit check at the loop selector is sufficient for both unswitched loops. However, for uncounted loops we do not know anything about the behavior of the loop variables and they could behave differently in either of the unswitched loops. Hence, cloning the loop limit check is needed in that case. This PR also removes the superfluous cloning. >> >> All changes summarized: >> - add `OPAQUE_TEMPLATE_ASSERTION_PREDICATE_NODE` to the IR-framework, >> - add some missing parse predicate nodes to the IR-framework, >> - change the output of the labels of parse predicate nodes in the ideal graph so they can be recognized reliably by the IR-framework (the main problem was that `Loop ` is a prefix of `Loop Limit Check` that is hard to distinguish with spaces instead of underlines), >> - rework the regex for detecting parse predicates in the IR-framework, >> - add a test to ensure parse predicates are cloned into unswitched loops, >> - only clone loop limit checks when unswitching uncounted loops, >> - add a test which checks that loop limit checks are not cloned when unswitching counted loops, >> - add a test which checks that loop limit checks are cloned when unswitching uncounted loops. 
>> >> # Testing >> >> - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14266369099) >> - tier1 through tier3 plus Oracle internal testing > > Manuel Hässig has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 13 commits: > > - Make test random > - Add test for predicate cloning on uncounted loops > - Add test case for predication before unswitching > - Test predicate cloning only before loop predication > > Thus, we do not see the predicates in the loop selector that cloning > actually killed. > - Clone loop limit predicates for uncounted loops > > When unswitching uncounted loops we have to clone loop limit checks > because we do not have information on the behavior of the loop index > - Do not clone loop limit checks in loop unswitching > - Add suggested comment > > Co-authored-by: Christian Hagedorn > - Remove -Xcomp and replace with Warmup(0) > - ir-framework: use new before/after loop opts phases > - Add IR test for predicate cloning > - ... and 3 more: https://git.openjdk.org/jdk/compare/465c8e65...17b32db4 Thanks for the update and fixing the found cloning bug! A few mostly minor comments, otherwise, it looks good! src/hotspot/share/opto/predicates.cpp line 1103: > 1101: // Does not clone Loop Limit Check parse predicates iff a counted loop is unswitched, because a loop limit check before > 1102: // the unswitched loop selector covers both unswitched counted loops. Otherwise, we would need to hoist the loop limit > 1103: // checks from both loops back up to the loop selector. I suggest the following rephrasing to highlight why cloning the Parse Predicate is not useful (note that Loop Unswitching does not clone the actual Loop Limit Check but rather the Parse Predicate which is a placeholder for a potential Loop Limit Check later).
Suggestion: // Does not clone a Loop Limit Check Parse Predicate if a counted loop is unswitched, because it most likely will not be used anymore (it could only be used when both unswitched loop versions die and the Loop Limit Check Parse Predicate ends up at a LoopNode without Loop Limit Check Parse Predicate directly following the unswitched loop that can then be speculatively converted to a counted loop - this is rather rare). src/hotspot/share/opto/predicates.hpp line 1179: > 1177: > 1178: PhaseIdealLoop* const _phase; > 1179: bool const _is_counted_loop; We usually have them swapped for constants: Suggestion: const bool _is_counted_loop; test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 2826: > 2824: /** > 2825: * Apply {@code regex} on all ideal graph phases starting from {@link CompilePhase#BEFORE_LOOP_OPTS} > 2826: * up to and including {@link CompilePhase#AFTER_LOOP_OPTS} Suggestion: * up to and including {@link CompilePhase#AFTER_LOOP_OPTS}. test/hotspot/jtreg/compiler/loopopts/TestUnswitchPredicateCloning.java line 68: > 66: arr[i] = i; > 67: } > 68: Suggestion: test/hotspot/jtreg/compiler/loopopts/TestUnswitchPredicateCloning.java line 81: > 79: IRNode.AUTO_VECTORIZATION_CHECK_PARSE_PREDICATE, "3" }, > 80: phase = CompilePhase.BEFORE_LOOP_UNSWITCHING) > 81: // Since we know that predication happens after unswitching, we can test the I suggest to capitalized the optimization and predicate names: Loop Unswitching, Loop Predication, Template Assertion Predicate etc. to better highlight them. 
Suggestion: // Since we know that Loop Predication happens after Loop Unswitching, we can test the test/hotspot/jtreg/compiler/loopopts/TestUnswitchPredicateCloning.java line 82: > 80: phase = CompilePhase.BEFORE_LOOP_UNSWITCHING) > 81: // Since we know that predication happens after unswitching, we can test the > 82: // predicate cloning before predication, such that the useless, killed predicates Suggestion: // predicate cloning before Loop Predication, such that the useless, killed predicates test/hotspot/jtreg/compiler/loopopts/TestUnswitchPredicateCloning.java line 88: > 86: IRNode.LOOP_LIMIT_CHECK_PARSE_PREDICATE, "3", > 87: IRNode.AUTO_VECTORIZATION_CHECK_PARSE_PREDICATE, "4" }, > 88: phase = CompilePhase.BEFORE_LOOP_PREDICATION_IC) I suggest to match on RC to have matching before/after RC pairs Suggestion: phase = CompilePhase.BEFORE_LOOP_PREDICATION_RC) test/hotspot/jtreg/compiler/loopopts/TestUnswitchPredicateCloning.java line 89: > 87: IRNode.AUTO_VECTORIZATION_CHECK_PARSE_PREDICATE, "4" }, > 88: phase = CompilePhase.BEFORE_LOOP_PREDICATION_IC) > 89: // Check that opaque template assertion predicated are added in loop predication Suggestion: // Check that Template Assertion Predicates are added in Loop Predication test/hotspot/jtreg/compiler/loopopts/TestUnswitchPredicateCloning.java line 90: > 88: phase = CompilePhase.BEFORE_LOOP_PREDICATION_IC) > 89: // Check that opaque template assertion predicated are added in loop predication > 90: // even if loop predication only happens after loop unswitching. Suggestion: // even if Loop Predication only happens after Loop Unswitching. test/hotspot/jtreg/compiler/loopopts/TestUnswitchPredicateCloning.java line 123: > 121: @Test > 122: // Check that Loop Unswitching doubled the number of parse and tempalte > 123: // assertion predicates. Again, the Loop Limit Check Parse Predicate Suggestion: // Check that Loop Unswitching doubled the number of Parse and Template // Assertion Predicates. 
Again, the Loop Limit Check Parse Predicate test/hotspot/jtreg/compiler/loopopts/TestUnswitchPredicateCloning.java line 133: > 131: IRNode.AUTO_VECTORIZATION_CHECK_PARSE_PREDICATE, "1" }, > 132: phase = CompilePhase.BEFORE_LOOP_UNSWITCHING) > 133: // After loop unswitching and after removing the killed predicates. Suggestion: // After Loop Unswitching and after removing the killed predicates. test/hotspot/jtreg/compiler/loopopts/TestUnswitchPredicateCloning.java line 139: > 137: IRNode.LOOP_LIMIT_CHECK_PARSE_PREDICATE, "1", > 138: IRNode.AUTO_VECTORIZATION_CHECK_PARSE_PREDICATE, "2" }, > 139: phase = CompilePhase.BEFORE_LOOP_PREDICATION_IC) This suggests that we apply another round of Loop Predication later? It might be confused with the first application of Loop Predication. Could we also match on `PHASEIDEALLOOP_ITERATIONS` instead? Would that work? test/hotspot/jtreg/compiler/loopopts/TestUnswitchPredicateCloning.java line 154: > 152: @Test > 153: // Check that Loop Unswitching doubled the number of all parse predicates. > 154: // Since this is not counted loop, the Loop Limit Check parse predicate Suggestion: // Check that Loop Unswitching doubled the number of all Parse Predicates. // Since this is not counted loop, the Loop Limit Check Parse Predicate test/hotspot/jtreg/compiler/loopopts/TestUnswitchPredicateCloning.java line 162: > 160: phase = CompilePhase.BEFORE_LOOP_UNSWITCHING) > 161: // After loop unswitching and after removing the killed predicates all > 162: // parse predicates are doubled.. Suggestion: // After Loop Unswitching and after removing the killed predicates all // Parse Predicates are doubled. 
------------- PR Review: https://git.openjdk.org/jdk/pull/24479#pullrequestreview-2783752093 PR Review Comment: https://git.openjdk.org/jdk/pull/24479#discussion_r2053917190 PR Review Comment: https://git.openjdk.org/jdk/pull/24479#discussion_r2053899717 PR Review Comment: https://git.openjdk.org/jdk/pull/24479#discussion_r2053901193 PR Review Comment: https://git.openjdk.org/jdk/pull/24479#discussion_r2053921318 PR Review Comment: https://git.openjdk.org/jdk/pull/24479#discussion_r2053922091 PR Review Comment: https://git.openjdk.org/jdk/pull/24479#discussion_r2053922438 PR Review Comment: https://git.openjdk.org/jdk/pull/24479#discussion_r2053926155 PR Review Comment: https://git.openjdk.org/jdk/pull/24479#discussion_r2053924834 PR Review Comment: https://git.openjdk.org/jdk/pull/24479#discussion_r2053925405 PR Review Comment: https://git.openjdk.org/jdk/pull/24479#discussion_r2053927678 PR Review Comment: https://git.openjdk.org/jdk/pull/24479#discussion_r2053932438 PR Review Comment: https://git.openjdk.org/jdk/pull/24479#discussion_r2053935490 PR Review Comment: https://git.openjdk.org/jdk/pull/24479#discussion_r2053930780 PR Review Comment: https://git.openjdk.org/jdk/pull/24479#discussion_r2053936490 From mli at openjdk.org Tue Apr 22 11:59:55 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 22 Apr 2025 11:59:55 GMT Subject: RFR: 8355293: [TEST] RISC-V: enable more ir tests Message-ID: <35acCSxvX_QmszIDigxcJ2zN3RjIGCT2Fcb9enkM-Xk=.793165e0-c4a0-45e6-aa00-7b721d25ff57@github.com> Hi, Can you help to review this simple patch? It just enables some test to run on riscv. 
Thanks ------------- Commit messages: - initial commit Changes: https://git.openjdk.org/jdk/pull/24797/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24797&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8355293 Stats: 41 lines in 5 files changed: 12 ins; 6 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/24797.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24797/head:pull/24797 PR: https://git.openjdk.org/jdk/pull/24797 From dfenacci at openjdk.org Tue Apr 22 12:37:22 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 22 Apr 2025 12:37:22 GMT Subject: RFR: 8354119: Missing C2 proper allocation failure handling during initialization (during generate_uncommon_trap_blob) Message-ID: After [JDK-8347406](https://bugs.openjdk.org/browse/JDK-8347406), `OptoRuntime::generate_uncommon_trap_blob` and `OptoRuntime::generate_exception_blob` return an `UncommonTrapBlob`/`ExceptionBlob` if they succeed, `nullptr` if they don't. This is then used by the compiler to shut down gently if the code cache is full (instead of crashing). Unfortunately, if the code cache becomes full while the buffer is being created at the start of these 2 methods (when calling `CodeBuffer buffer(name, 2048, 1024);`), an empty buffer is created, which in turn prevents `masm` from being properly initialized, which then causes an access violation when writing into the blob's address when first adding `subptr` later in the method (as seen in the snippet below for `generate_uncommon_trap_blob`). https://github.com/openjdk/jdk/blob/3cc43b3224efdf1a3f35fff58b993027a9e1f4ad/src/hotspot/cpu/x86/runtime_x86_64.cpp#L55-L72 To fix this I suggest we return immediately from `OptoRuntime::generate_uncommon_trap_blob`/`OptoRuntime::generate_exception_blob` if the `buffer` creation failed. ### Testing Tier 1-3. No specific regression test is added (very hard, i.a. dependent on thread scheduling. On the other hand `StartupOutput.java` might occasionally catch it).
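The early-return idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the HotSpot change itself: `SketchBuffer`, `SketchBlob`, `generate_blob_sketch` and the `cache_full` switch are hypothetical stand-ins, and the only point shown is that the failure check happens before anything writes into the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a code buffer whose backing storage may fail to allocate.
class SketchBuffer {
 public:
  SketchBuffer(std::size_t size, bool cache_full) {
    if (!cache_full) {
      _storage = std::make_unique<std::vector<unsigned char>>(size);
    }
  }
  // Returns nullptr when the backing storage could not be allocated.
  std::vector<unsigned char>* storage() const { return _storage.get(); }

 private:
  std::unique_ptr<std::vector<unsigned char>> _storage;
};

// Hypothetical stand-in for the generated blob.
struct SketchBlob {
  std::size_t code_size;
};

// Returns a blob on success, nullptr if the buffer could not be created.
std::unique_ptr<SketchBlob> generate_blob_sketch(bool cache_full) {
  SketchBuffer buffer(2048, cache_full);
  if (buffer.storage() == nullptr) {
    return nullptr;  // bail out before any write into the buffer
  }
  (*buffer.storage())[0] = 0x90;  // stands in for emitting the first instruction
  return std::make_unique<SketchBlob>(SketchBlob{1});
}
```

A caller then treats a null result the same way as any other failed blob generation, so the full-code-cache case takes the existing graceful shutdown path instead of faulting on the first write.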
------------- Commit messages: - JDK-8354119: remove unused import - JDK-8354119: Missing C2 proper allocation failure handling during initialization (during generate_uncommon_trap_blob) Changes: https://git.openjdk.org/jdk/pull/24549/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24549&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8354119 Stats: 36 lines in 8 files changed: 36 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24549.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24549/head:pull/24549 PR: https://git.openjdk.org/jdk/pull/24549 From duke at openjdk.org Tue Apr 22 12:56:30 2025 From: duke at openjdk.org (Manuel Hässig) Date: Tue, 22 Apr 2025 12:56:30 GMT Subject: RFR: 8346552: C2: Add IR tests to check that Predicate cloning in Loop Unswitching works as expected [v5] In-Reply-To: References: Message-ID: > # Issue Summary > > When a loop is unswitched, all parse predicates from the original loop must be cloned to the second loop that is created. Forgetting to clone a parse predicate is a common error during development on loop unswitching code that we could not catch previously. Since we have the IR-framework now, this PR introduces a test to catch this error. > > # Changes > > The main contribution of this PR is a test to ensure that all predicates have been cloned into an unswitched loop. > Working on this PR revealed that Loop Limit Check Parse Predicates are erroneously cloned when unswitching counted loops. That is because we know that the loop variable increments monotonously in counted loops, so a loop limit check at the loop selector is sufficient for both unswitched loops. However, for uncounted loops we do not know anything about the behavior of the loop variables and they could behave differently in either of the unswitched loops. Hence, cloning the loop limit check is needed in that case. This PR also removes the superfluous cloning.
> > All changes summarized: > - add `OPAQUE_TEMPLATE_ASSERTION_PREDICATE_NODE` to the IR-framework, > - add some missing parse predicate nodes to the IR-framework, > - change the output of the labels of parse predicate nodes in the ideal graph so they can be recognized reliably by the IR-framework (the main problem was that `Loop ` is a prefix of `Loop Limit Check` that is hard to distinguish with spaces instead of underlines), > - rework the regex for detecting parse predicates in the IR-framework, > - add a test to ensure parse predicates are cloned into unswitched loops, > - only clone loop limit checks when unswitching uncounted loops, > - add a test which checks that loop limit checks are not cloned when unswitching counted loops, > - add a test which checks that loop limit checks are cloned when unswitching uncounted loops. > > # Testing > > - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14266369099) > - tier1 through tier3 plus Oracle internal testing Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: Apply review comments Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24479/files - new: https://git.openjdk.org/jdk/pull/24479/files/17b32db4..d50f5ac9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24479&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24479&range=03-04 Stats: 18 lines in 3 files changed: 1 ins; 2 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/24479.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24479/head:pull/24479 PR: https://git.openjdk.org/jdk/pull/24479 From duke at openjdk.org Tue Apr 22 12:59:30 2025 From: duke at openjdk.org (Manuel Hässig) Date: Tue, 22 Apr 2025 12:59:30 GMT Subject: RFR: 8346552: C2: Add IR tests to check that Predicate cloning in Loop Unswitching works as expected [v6] In-Reply-To: References: Message-ID: > # Issue Summary > > When a
loop is unswitched, all parse predicates from the original loop must be cloned to the second loop that is created. Forgetting to clone a parse predicate is a common error during development on loop unswitching code that we could not catch previously. Since we have the IR-framework now, this PR introduces a test to catch this error. > > # Changes > > The main contribution of this PR is a test to ensure that all predicates have been cloned into an unswitched loop. > Working on this PR revealed that Loop Limit Check Parse Predicates are erroneously cloned when unswitching counted loops. That is because we know that the loop variable increments monotonously in counted loops, so a loop limit check at the loop selector is sufficient for both unswitched loops. However, for uncounted loops we do not know anything about the behavior of the loop variables and they could behave differently in either of the unswitched loops. Hence, cloning the loop limit check is needed in that case. This PR also removes the superfluous cloning. > > All changes summarized: > - add `OPAQUE_TEMPLATE_ASSERTION_PREDICATE_NODE` to the IR-framework, > - add some missing parse predicate nodes to the IR-framework, > - change the output of the labels of parse predicate nodes in the ideal graph so they can be recognized reliably by the IR-framework (the main problem was that `Loop ` is a prefix of `Loop Limit Check` that is hard to distinguish with spaces instead of underlines), > - rework the regex for detecting parse predicates in the IR-framework, > - add a test to ensure parse predicates are cloned into unswitched loops, > - only clone loop limit checks when unswitching uncounted loops, > - add a test which checks that loop limit checks are not cloned when unswitching counted loops, > - add a test which checks that loop limit checks are cloned when unswitching uncounted loops. 
> > # Testing > > - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14266369099) > - tier1 through tier3 plus Oracle internal testing Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: Apply missing suggestion Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24479/files - new: https://git.openjdk.org/jdk/pull/24479/files/d50f5ac9..3e636de7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24479&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24479&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24479.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24479/head:pull/24479 PR: https://git.openjdk.org/jdk/pull/24479 From duke at openjdk.org Tue Apr 22 13:28:58 2025 From: duke at openjdk.org (Manuel Hässig) Date: Tue, 22 Apr 2025 13:28:58 GMT Subject: RFR: 8346552: C2: Add IR tests to check that Predicate cloning in Loop Unswitching works as expected [v6] In-Reply-To: References: Message-ID: On Tue, 22 Apr 2025 12:59:30 GMT, Manuel Hässig wrote: >> # Issue Summary >> >> When a loop is unswitched, all parse predicates from the original loop must be cloned to the second loop that is created. Forgetting to clone a parse predicate is a common error during development on loop unswitching code that we could not catch previously. Since we have the IR-framework now, this PR introduces a test to catch this error. >> >> # Changes >> >> The main contribution of this PR is a test to ensure that all predicates have been cloned into an unswitched loop. >> Working on this PR revealed that Loop Limit Check Parse Predicates are erroneously cloned when unswitching counted loops. That is because we know that the loop variable increments monotonously in counted loops, so a loop limit check at the loop selector is sufficient for both unswitched loops.
However, for uncounted loops we do not know anything about the behavior of the loop variables and they could behave differently in either of the unswitched loops. Hence, cloning the loop limit check is needed in that case. This PR also removes the superfluous cloning. >> >> All changes summarized: >> - add `OPAQUE_TEMPLATE_ASSERTION_PREDICATE_NODE` to the IR-framework, >> - add some missing parse predicate nodes to the IR-framework, >> - change the output of the labels of parse predicate nodes in the ideal graph so they can be recognized reliably by the IR-framework (the main problem was that `Loop ` is a prefix of `Loop Limit Check` that is hard to distinguish with spaces instead of underlines), >> - rework the regex for detecting parse predicates in the IR-framework, >> - add a test to ensure parse predicates are cloned into unswitched loops, >> - only clone loop limit checks when unswitching uncounted loops, >> - add a test which checks that loop limit checks are not cloned when unswitching counted loops, >> - add a test which checks that loop limit checks are cloned when unswitching uncounted loops. >> >> # Testing >> >> - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14266369099) >> - tier1 through tier3 plus Oracle internal testing > > Manuel H?ssig has updated the pull request incrementally with one additional commit since the last revision: > > Apply missing suggestion > > Co-authored-by: Christian Hagedorn All your comments and suggestions should be addressed. Thanks! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24479#issuecomment-2821330249 From duke at openjdk.org Tue Apr 22 13:28:58 2025 From: duke at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Tue, 22 Apr 2025 13:28:58 GMT Subject: RFR: 8346552: C2: Add IR tests to check that Predicate cloning in Loop Unswitching works as expected [v7] In-Reply-To: References: Message-ID: > # Issue Summary > > When a loop is unswitched, all parse predicates from the original loop must be cloned to the second loop that is created. Forgetting to clone a parse predicate is a common error during development on loop unswitching code that we could not catch previously. Since we have the IR-framework now, this PR introduces a test to catch this error. > > # Changes > > The main contribution of this PR is a test to ensure that all predicates have been cloned into an unswitched loop. > Working on this PR revealed that Loop Limit Check Parse Predicates are erroneously cloned when unswitching counted loops. That is because we know that the loop variable increments monotonously in counted loops, so a loop limit check at the loop selector is sufficient for both unswitched loops. However, for uncounted loops we do not know anything about the behavior of the loop variables and they could behave differently in either of the unswitched loops. Hence, cloning the loop limit check is needed in that case. This PR also removes the superfluous cloning. 
> > All changes summarized: > - add `OPAQUE_TEMPLATE_ASSERTION_PREDICATE_NODE` to the IR-framework, > - add some missing parse predicate nodes to the IR-framework, > - change the output of the labels of parse predicate nodes in the ideal graph so they can be recognized reliably by the IR-framework (the main problem was that `Loop ` is a prefix of `Loop Limit Check` that is hard to distinguish with spaces instead of underlines), > - rework the regex for detecting parse predicates in the IR-framework, > - add a test to ensure parse predicates are cloned into unswitched loops, > - only clone loop limit checks when unswitching uncounted loops, > - add a test which checks that loop limit checks are not cloned when unswitching counted loops, > - add a test which checks that loop limit checks are cloned when unswitching uncounted loops. > > # Testing > > - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14266369099) > - tier1 through tier3 plus Oracle internal testing Manuel H?ssig has updated the pull request incrementally with one additional commit since the last revision: Test with different phase ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24479/files - new: https://git.openjdk.org/jdk/pull/24479/files/3e636de7..8f224106 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24479&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24479&range=05-06 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24479.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24479/head:pull/24479 PR: https://git.openjdk.org/jdk/pull/24479 From duke at openjdk.org Tue Apr 22 13:28:58 2025 From: duke at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Tue, 22 Apr 2025 13:28:58 GMT Subject: RFR: 8346552: C2: Add IR tests to check that Predicate cloning in Loop Unswitching works as expected [v4] In-Reply-To: 
<85m-gBWaSRRZpZiEZc9ajotIYZD3MAtInLkgQ0Icw2g=.c83cde95-1847-4a61-9bef-8d39b76ec0b3@github.com> References: <3AsN87ob_1V_i0mIHo96_3IZ2p5QY6x_B2P_Ydsl_7A=.13a3ac28-bf3a-4941-be1c-c010dc48a251@github.com> <85m-gBWaSRRZpZiEZc9ajotIYZD3MAtInLkgQ0Icw2g=.c83cde95-1847-4a61-9bef-8d39b76ec0b3@github.com> Message-ID: On Tue, 22 Apr 2025 11:39:57 GMT, Christian Hagedorn wrote: >> Manuel H?ssig has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 13 commits: >> >> - Make test random >> - Add test for predicate cloning on uncounted loops >> - Add test case for predication before unswitching >> - Test predicate cloning only before loop predication >> >> Thus, we do not see the predicates in the loop selector that cloning >> actually killed. >> - Clone loop limit predicates for uncounted loops >> >> When unswitching uncounted loops we have to clone loop limit checks >> because we do not have information on the behavior of the loop index >> - Do not clone loop limit checks in loop unswitching >> - Add suggested comment >> >> Co-authored-by: Christian Hagedorn >> - Remove -Xcomp and replace with Warmup(0) >> - ir-framework: use new before/after loop opts phases >> - Add IR test for predicate cloning >> - ... and 3 more: https://git.openjdk.org/jdk/compare/465c8e65...17b32db4 > > test/hotspot/jtreg/compiler/loopopts/TestUnswitchPredicateCloning.java line 139: > >> 137: IRNode.LOOP_LIMIT_CHECK_PARSE_PREDICATE, "1", >> 138: IRNode.AUTO_VECTORIZATION_CHECK_PARSE_PREDICATE, "2" }, >> 139: phase = CompilePhase.BEFORE_LOOP_PREDICATION_IC) > > This suggests that we apply another round of Loop Predication later? It might be confused with the first application of Loop Predication. Could we also match on `PHASEIDEALLOOP_ITERATIONS` instead? Would that work? `PHASEIDEALLOOP_ITERATIONS` does not work reliably. I decided to go with `PHASEIDEALLOOP2`, which does work reliably. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24479#discussion_r2054109164 From fjiang at openjdk.org Tue Apr 22 13:54:43 2025 From: fjiang at openjdk.org (Feilong Jiang) Date: Tue, 22 Apr 2025 13:54:43 GMT Subject: RFR: 8355239: RISC-V: Do not support subword scatter store In-Reply-To: <4k2oLOc-5JRW4__vf9tEfcIQEaTcs6XUY58lQbaxpLc=.535c54ef-2dd1-479c-a1ca-8757c35964bc@github.com> References: <4k2oLOc-5JRW4__vf9tEfcIQEaTcs6XUY58lQbaxpLc=.535c54ef-2dd1-479c-a1ca-8757c35964bc@github.com> Message-ID: On Tue, 22 Apr 2025 02:47:02 GMT, Fei Yang wrote: > Hi, please consider this small enhancement change. > > Currently, only word and double-word gather load and scatter store are supported on riscv. > Both subword gather load and scatter store are not supported due to constraint of riscv vector extension. > [JDK-8331150](https://bugs.openjdk.org/browse/JDK-8331150) makes this constraint explicit for subword gather load. > For parity and consistency, this also makes it explicit for subword scatter store as well. > > Testing: `jdk_vector` tested with QEMU vector extension. Looks good! ------------- Marked as reviewed by fjiang (Committer). PR Review: https://git.openjdk.org/jdk/pull/24787#pullrequestreview-2784200964 From luhenry at openjdk.org Tue Apr 22 14:48:48 2025 From: luhenry at openjdk.org (Ludovic Henry) Date: Tue, 22 Apr 2025 14:48:48 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v10] In-Reply-To: References: Message-ID: On Thu, 17 Apr 2025 18:03:47 GMT, Vladimir Ivanov wrote: >> Migrate Vector API math library (SVML and SLEEF) linkage from native code (in JVM) to Java FFM API. >> >> Since FFM API doesn't support vector calling conventions yet, migration affects only symbol lookup for now. But it still enables significant simplifications on JVM side. 
>> >> The patch consists of the following parts: >> * on-demand symbol lookup in Java code replaces eager lookup from native code during JVM startup; >> * 2 new VM intrinsics for vector calls (support unary and binary shapes) (code separated from unary/binary vector operations); >> * new internal interface to query supported CPU ISA extensions (`jdk.incubator.vector.CPUFeatures`) used for CPU dispatching. >> >> `java.lang.foreign` API is used to perform symbol lookup in vector math library, then the address is cached and fed into corresponding JVM intrinsic, so C2 can turn it into a direct vector call in generated code. >> >> Once `java.lang.foreign` supports vectors & vector calling conventions, VM intrinsics can go away. >> >> Performance is on par with original implementation (tested with microbenchmarks on linux-x64 and macosx-aarch64). >> >> Testing: hs-tier1 - hs-tier6, microbenchmarks (on linux-x64 and macosx-aarch64) >> >> Thanks! > > Vladimir Ivanov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 24 additional commits since the last revision: > > - Merge remote-tracking branch 'origin/master' into vector.math.01.java > - RVV and SVE adjustments > - fix broken merge > - Merge branch 'master' into vector.math.01.java > - Fix debugName handling > - Merge branch 'master' into vector.math.01.java > - RVV and SVE adjustments > - Merge branch 'master' into vector.math.01.java > - Fix windows-aarch64 build failure > - features_string -> cpu_info_string > - ... and 14 more: https://git.openjdk.org/jdk/compare/63ffaec1...88eacc48 src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMathLibrary.java line 75: > 73: return switch (StaticProperty.osArch()) { > 74: case "amd64", "x86_64" -> SVML; > 75: case "aarch64" -> SLEEF; We should be supporting SLEEF on `riscv64`. 
Was there a specific motivation not to include it here? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2054281193 From dhanalla at openjdk.org Tue Apr 22 14:52:49 2025 From: dhanalla at openjdk.org (Dhamoder Nalla) Date: Tue, 22 Apr 2025 14:52:49 GMT Subject: RFR: 8315916: assert(C->live_nodes() <= C->max_node_limit()) failed: Live Node limit exceeded [v7] In-Reply-To: References: Message-ID: On Tue, 8 Apr 2025 16:41:48 GMT, Dhamoder Nalla wrote: >> According to the latest comments on bug JDK-8315916, two more people have reported the issue. > >> @dhanalla Are you still making changes or is this ready to review? (if not ready just make it a draft ;) ) > > @eme64, This is ready for review. > @dhanalla Do you want us to continue reviewing? It is usually good to ping people again after making changes. Otherwise, we don't know if you are still working on it and we should wait. @eme64 yes, please review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20504#issuecomment-2821589306 From luhenry at openjdk.org Tue Apr 22 14:57:46 2025 From: luhenry at openjdk.org (Ludovic Henry) Date: Tue, 22 Apr 2025 14:57:46 GMT Subject: RFR: 8355293: [TEST] RISC-V: enable more ir tests In-Reply-To: <35acCSxvX_QmszIDigxcJ2zN3RjIGCT2Fcb9enkM-Xk=.793165e0-c4a0-45e6-aa00-7b721d25ff57@github.com> References: <35acCSxvX_QmszIDigxcJ2zN3RjIGCT2Fcb9enkM-Xk=.793165e0-c4a0-45e6-aa00-7b721d25ff57@github.com> Message-ID: On Tue, 22 Apr 2025 11:54:58 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple patch? > It just enables some test to run on riscv. > > Thanks Marked as reviewed by luhenry (Committer). test/hotspot/jtreg/compiler/rangechecks/TestRangeCheckHoistingScaledIV.java line 2: > 1: /* > 2: * Copyright (c) 2022, 2025, Arm Limited. All rights reserved. Leave that copyright untouched but add a Rivos line instead. 
test/hotspot/jtreg/compiler/vectorization/runner/ArrayShiftOpTest.java line 3: > 1: /* > 2: * Copyright (c) 2022, 2023, Arm Limited. All rights reserved. > 3: * Copyright (c) 2024, 2025, Oracle and/or its affiliates. All rights reserved. Leave that copyright untouched but add a Rivos line instead. ------------- PR Review: https://git.openjdk.org/jdk/pull/24797#pullrequestreview-2784420305 PR Review Comment: https://git.openjdk.org/jdk/pull/24797#discussion_r2054301040 PR Review Comment: https://git.openjdk.org/jdk/pull/24797#discussion_r2054300845 From mli at openjdk.org Tue Apr 22 15:09:59 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 22 Apr 2025 15:09:59 GMT Subject: RFR: 8355293: [TEST] RISC-V: enable more ir tests [v2] In-Reply-To: <35acCSxvX_QmszIDigxcJ2zN3RjIGCT2Fcb9enkM-Xk=.793165e0-c4a0-45e6-aa00-7b721d25ff57@github.com> References: <35acCSxvX_QmszIDigxcJ2zN3RjIGCT2Fcb9enkM-Xk=.793165e0-c4a0-45e6-aa00-7b721d25ff57@github.com> Message-ID: > Hi, > Can you help to review this simple patch? > It just enables some test to run on riscv. 
> > Thanks Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: copyright ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24797/files - new: https://git.openjdk.org/jdk/pull/24797/files/632b9b1f..eac56a87 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24797&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24797&range=00-01 Stats: 4 lines in 2 files changed: 2 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24797.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24797/head:pull/24797 PR: https://git.openjdk.org/jdk/pull/24797 From mli at openjdk.org Tue Apr 22 15:10:00 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 22 Apr 2025 15:10:00 GMT Subject: RFR: 8355293: [TEST] RISC-V: enable more ir tests [v2] In-Reply-To: References: <35acCSxvX_QmszIDigxcJ2zN3RjIGCT2Fcb9enkM-Xk=.793165e0-c4a0-45e6-aa00-7b721d25ff57@github.com> Message-ID: On Tue, 22 Apr 2025 14:55:12 GMT, Ludovic Henry wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> copyright > > test/hotspot/jtreg/compiler/rangechecks/TestRangeCheckHoistingScaledIV.java line 2: > >> 1: /* >> 2: * Copyright (c) 2022, 2025, Arm Limited. All rights reserved. > > Leave that copyright untouched but add a Rivos line instead. Thanks! It's updated. > test/hotspot/jtreg/compiler/vectorization/runner/ArrayShiftOpTest.java line 3: > >> 1: /* >> 2: * Copyright (c) 2022, 2023, Arm Limited. All rights reserved. >> 3: * Copyright (c) 2024, 2025, Oracle and/or its affiliates. All rights reserved. > > Leave that copyright untouched but add a Rivos line instead. Thanks! It's updated. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24797#discussion_r2054322737 PR Review Comment: https://git.openjdk.org/jdk/pull/24797#discussion_r2054322529 From epeter at openjdk.org Tue Apr 22 15:11:47 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 22 Apr 2025 15:11:47 GMT Subject: RFR: 8315916: assert(C->live_nodes() <= C->max_node_limit()) failed: Live Node limit exceeded [v7] In-Reply-To: References: Message-ID: On Tue, 22 Apr 2025 14:49:39 GMT, Dhamoder Nalla wrote: >>> @dhanalla Are you still making changes or is this ready to review? (if not ready just make it a draft ;) ) >> >> @eme64, This is ready for review. > >> @dhanalla Do you want us to continue reviewing? It is usually good to ping people again after making changes. Otherwise, we don't know if you are still working on it and we should wait. > @eme64 yes, please review. @dhanalla I see that you have had a conversation with @chhagedorn here, where you explained more details about what exactly goes wrong. Can you please update the PR description with these details? Generally, that makes it much easier to review, then the reviewers don't need to read through the whole conversation and figure out what is now stale (things you already applied) and what is still an active conversation. While you are at it, you can also update the description on JIRA. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20504#issuecomment-2821639698 From epeter at openjdk.org Tue Apr 22 15:11:48 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 22 Apr 2025 15:11:48 GMT Subject: RFR: 8315916: assert(C->live_nodes() <= C->max_node_limit()) failed: Live Node limit exceeded [v10] In-Reply-To: References: Message-ID: <8rxtRaj5bDfP2SOjgjGzgjW_qz1Sz4Vyn39awG9yRfY=.02f20d32-633c-412b-8c9d-ff48805393cc@github.com> On Wed, 9 Apr 2025 18:17:57 GMT, Dhamoder Nalla wrote: >> In the debug build, the assert is triggered during the parsing (before Code_Gen). 
In the Release build, however, the compilation bails out at `Compile::check_node_count()` during the code generation phase and completes execution without any issues. >> >> When I commented out the assert(C->live_nodes() <= C->max_node_limit()), both the debug and release builds exhibited the same behavior: the compilation bails out during code_gen after building the ideal graph with more than 80K nodes. >> >> The proposed fix will check the live node count and bail out during compilation while building the graph for scalarization of the elements in the array when the live node count crosses the limit of 80K, instead of unnecessarily building the entire graph and bailing out in code_gen. > > Dhamoder Nalla has updated the pull request incrementally with two additional commits since the last revision: > > - reduce array/node size limts and remove the timeout > - reduce array/node size limts and remove the timeout FYI: github actions tells me that your added regression test is failing `TestScalarizeBailout.java`, are you aware of that? 
https://github.com/dhanalla/jdk/actions/runs/14364146583/job/40274556990 # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (/home/runner/work/jdk/jdk/src/hotspot/share/opto/node.cpp:78), pid=8112, tid=8129 # assert(C->live_nodes() <= C->max_node_limit()) failed: Live Node limit exceeded limit # # JRE version: OpenJDK Runtime Environment (24.0) (fastdebug build 24-internal-dhanalla-a8cb47d6f392524d5b96528e398ebfc15847b693) # Java VM: OpenJDK 64-Bit Server VM (fastdebug 24-internal-dhanalla-a8cb47d6f392524d5b96528e398ebfc15847b693, compiled mode, sharing, compressed oops, compressed class ptrs, g1 gc, linux-amd64) # Problematic frame: # V [libjvm.so+0x1490e77] Node::verify_construction()+0x1a7 # # CreateCoredumpOnCrash turned off, no core file dumped # # An error report file with more information is saved as: # /home/runner/work/jdk/jdk/build/run-test-prebuilt/test-support/jtreg_test_hotspot_jtreg_tier1_compiler_2/scratch/0/hs_err_pid8112.log # # Compiler replay data is saved as: # /home/runner/work/jdk/jdk/build/run-test-prebuilt/test-support/jtreg_test_hotspot_jtreg_tier1_compiler_2/scratch/0/replay_pid8112.log ------------- PR Comment: https://git.openjdk.org/jdk/pull/20504#issuecomment-2821646416 From epeter at openjdk.org Tue Apr 22 15:15:48 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 22 Apr 2025 15:15:48 GMT Subject: RFR: 8315916: assert(C->live_nodes() <= C->max_node_limit()) failed: Live Node limit exceeded [v10] In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 18:17:57 GMT, Dhamoder Nalla wrote: >> In the debug build, the assert is triggered during the parsing (before Code_Gen). In the Release build, however, the compilation bails out at `Compile::check_node_count()` during the code generation phase and completes execution without any issues. 
>> >> When I commented out the assert(C->live_nodes() <= C->max_node_limit()), both the debug and release builds exhibited the same behavior: the compilation bails out during code_gen after building the ideal graph with more than 80K nodes. >> >> The proposed fix will check the live node count and bail out during compilation while building the graph for scalarization of the elements in the array when the live node count crosses the limit of 80K, instead of unnecessarily building the entire graph and bailing out in code_gen. > > Dhamoder Nalla has updated the pull request incrementally with two additional commits since the last revision: > > - reduce array/node size limts and remove the timeout > - reduce array/node size limts and remove the timeout src/hotspot/share/opto/macro.cpp line 830: > 828: } > 829: return nullptr; > 830: }*/ Ah, it looks like you commented out your fix. That would explain why the GitHub Actions tests are failing ;) test/hotspot/jtreg/compiler/escapeAnalysis/TestScalarizeBailout.java line 34: > 32: * -XX:CompileCommand=dontinline,compiler.escapeAnalysis.TestScalarizeBailout::initializeArray > 33: * -XX:CompileCommand=compileonly,compiler.escapeAnalysis.TestScalarizeBailout::* > 34: * compiler.escapeAnalysis.TestScalarizeBailout Could you please add an additional run with fewer flags? That would allow us to run this test from the outside with for example `-XX:MaxNodeLimit=10000`, and it would not get instantly overwritten by your `-XX:MaxNodeLimit=20000` here. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20504#discussion_r2054332200 PR Review Comment: https://git.openjdk.org/jdk/pull/20504#discussion_r2054330655 From epeter at openjdk.org Tue Apr 22 15:22:57 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 22 Apr 2025 15:22:57 GMT Subject: RFR: 8315916: assert(C->live_nodes() <= C->max_node_limit()) failed: Live Node limit exceeded [v7] In-Reply-To: References: Message-ID: On Tue, 22 Apr 2025 14:49:39 GMT, Dhamoder Nalla wrote: >>> @dhanalla Are you still making changes or is this ready to review? (if not ready just make it a draft ;) ) >> >> @eme64, This is ready for review. > >> @dhanalla Do you want us to continue reviewing? It is usually good to ping people again after making changes. Otherwise, we don't know if you are still working on it and we should wait. > @eme64 yes, please review. @dhanalla You now first perform the transformation, and then fail because there are too many nodes. At that point, the compilation is basically in a bad state and cannot be finished. Is there no alternative where we could check first if we would exceed the node limit, and then just avoid scalarizing such very large arrays, but continue with the compilation? It seems to be a shame to give up on escape analysis entirely, if it maybe was just a single array. But maybe @vnkozlov should say more about this, I only have a very rudimentary understanding of escape analysis. 
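The "check first, then skip only the offending array" alternative asked about above can be pictured with a toy model. This is only a sketch under stated assumptions: `ArrayCandidate`, `nodes_per_element` and `scalarize_within_budget` are made-up names, the cost estimate is invented, and none of it reflects how `macro.cpp` or escape analysis actually account for nodes.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model, not HotSpot code: each candidate array is described by
// its length and an assumed number of nodes created per scalarized element.
struct ArrayCandidate {
  std::size_t length;
  std::size_t nodes_per_element;
};

// Estimate the cost of each candidate BEFORE transforming anything. A
// candidate that would push the live node count over the limit is skipped,
// and the loop simply continues with the next one instead of bailing out.
// Returns how many candidates were (notionally) scalarized.
static std::size_t scalarize_within_budget(const std::vector<ArrayCandidate>& candidates,
                                           std::size_t live_nodes,
                                           std::size_t max_node_limit) {
  std::size_t scalarized = 0;
  for (const ArrayCandidate& c : candidates) {
    std::size_t remaining = max_node_limit > live_nodes ? max_node_limit - live_nodes : 0;
    std::size_t estimate = c.length * c.nodes_per_element;
    if (estimate > remaining) {
      continue;  // too large: leave this array alone, keep compiling
    }
    live_nodes += estimate;  // the transformation would add this many nodes
    ++scalarized;
  }
  return scalarized;
}
```

With 1000 live nodes and a limit of 80000, candidates of length 100, 50000 and 10 at two nodes per element would give a count of 2: only the 50000-element array is left unscalarized. Whether such an estimate can be computed cheaply and soundly in C2 is exactly the open question in the comment above.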
------------- PR Comment: https://git.openjdk.org/jdk/pull/20504#issuecomment-2821679100 From epeter at openjdk.org Tue Apr 22 15:25:00 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 22 Apr 2025 15:25:00 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v7] In-Reply-To: <-8hODpwHoSEctN2Oo2SrCY1aRpkvZ_kcpnOntQLXgC4=.ecf517f4-9442-4812-81dc-1c177f9d70bf@github.com> References: <-8hODpwHoSEctN2Oo2SrCY1aRpkvZ_kcpnOntQLXgC4=.ecf517f4-9442-4812-81dc-1c177f9d70bf@github.com> Message-ID: On Mon, 21 Apr 2025 12:02:08 GMT, Quan Anh Mai wrote: >> @merykitty Sorry, I've been sick for a week and only just catching up with things again slowly... > > @eme64 It would be great if you can come back to this @merykitty Oh dear, I dropped it again. Thanks for the reminder! I actually just thought about this one over the easter weekend. And it seems to me we have had lots of "bit optimizations" that could be much more powerfully solved with "known bits". So let's continue working on this. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17508#issuecomment-2821685084 From kvn at openjdk.org Tue Apr 22 15:33:46 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 22 Apr 2025 15:33:46 GMT Subject: RFR: 8355233: Add a DMB related benchmark [v2] In-Reply-To: References: Message-ID: On Mon, 21 Apr 2025 22:19:14 GMT, Eric Caspole wrote: >> In addition to the details in the JBS, I changed the name from DoubleDMB to DMBCheck. > > Eric Caspole has updated the pull request incrementally with one additional commit since the last revision: > > Fix the copyright header The comment explaining what is measured would be nice. 
------------- PR Review: https://git.openjdk.org/jdk/pull/24783#pullrequestreview-2784527365 From epeter at openjdk.org Tue Apr 22 15:52:01 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 22 Apr 2025 15:52:01 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v47] In-Reply-To: References: Message-ID: On Mon, 21 Apr 2025 12:05:27 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. >> >> In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. >> >> This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. >> >> Please kindly review, thanks a lot. >> >> Testing >> >> - [x] GHA >> - [x] Linux x64, tier 1-4 > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 61 commits: > > - Merge branch 'master' into unsignedbounds > - Merge branch 'master' into unsignedbounds > - reviews > - Merge branch 'master' into unsignedbounds > - refine comments > - Merge branch 'master' into unsignedbounds > - Merge branch 'master' into unsignedbounds > - harden SimpleCanonicalResult > - number lemmas > - include > - ... and 51 more: https://git.openjdk.org/jdk/compare/0995b940...cdab1911 A few comments for the `if (zero_violation < one_violation) {` section. 
src/hotspot/share/opto/rangeinference.cpp line 257: > 255: // 0-vio = 0 0 0 0 0 0 0 0 > 256: // Since the result must have the 2nd bit set, it must be at least: > 257: // 1 1 0 0 0 0 0 0 It might be better to have the violation a little more close to the middle. Right now it is only a single bit off from the `highest_bit`. src/hotspot/share/opto/rangeinference.cpp line 263: > 261: > 262: // first_violation is the position of the violation counting from the > 263: // highest bit down (0-based), since i == 2, first_difference == 1 Suggestion: // highest bit down (0-based). For example: i == 2, first_violation == 1. If that is not what you wanted, then I'm not sure what `first_difference` refers to ;) src/hotspot/share/opto/rangeinference.cpp line 264: > 262: // first_violation is the position of the violation counting from the > 263: // highest bit down (0-based), since i == 2, first_difference == 1 > 264: juint first_violation = count_leading_zeros(one_violation); // 1 Suggestion: juint first_violation = count_leading_zeros(one_violation); What was the comment for? src/hotspot/share/opto/rangeinference.cpp line 268: > 266: constexpr U highest_bit = (std::numeric_limits<U>::max() >> 1) + U(1); > 267: // 0 1 0 0 0 0 0 0 > 268: U alignment = highest_bit >> first_violation; Could it also be called `violated_bit_mask`? Not sure if that is more helpful... why did you name it `alignment`? src/hotspot/share/opto/rangeinference.cpp line 272: > 270: // that the result should not be smaller than this > 271: // 1 1 0 0 0 0 0 0 > 272: U new_lo = (lo & -alignment) + alignment; Ouff, this is one of these one-liners that need some explanation... I'll try to decode it. Hmm, it is also not really the `new_lo` which we return, there is another operation below. Maybe we can give this intermediate result a descriptive name? It seems that already `-alignment` does something interesting... but I'll leave it to you to explain. 
I'm less familiar with all the bit tricks, and continually amazed what is possible :) ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17508#pullrequestreview-2784525953 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2054382718 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2054370630 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2054365480 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2054387217 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2054396626 From epeter at openjdk.org Tue Apr 22 15:52:02 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 22 Apr 2025 15:52:02 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v47] In-Reply-To: References: Message-ID: On Tue, 22 Apr 2025 15:30:35 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 61 commits: >> >> - Merge branch 'master' into unsignedbounds >> - Merge branch 'master' into unsignedbounds >> - reviews >> - Merge branch 'master' into unsignedbounds >> - refine comments >> - Merge branch 'master' into unsignedbounds >> - Merge branch 'master' into unsignedbounds >> - harden SimpleCanonicalResult >> - number lemmas >> - include >> - ... and 51 more: https://git.openjdk.org/jdk/compare/0995b940...cdab1911 > > src/hotspot/share/opto/rangeinference.cpp line 264: > >> 262: // first_violation is the position of the violation counting from the >> 263: // highest bit down (0-based), since i == 2, first_difference == 1 >> 264: juint first_violation = count_leading_zeros(one_violation); // 1 > > Suggestion: > > juint first_violation = count_leading_zeros(one_violation); > > What was the comment for? Ah, you want to say that `first_violation == 1`. Well you already say that above for the example... 
so not necessary I would say. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2054372188 From qamai at openjdk.org Tue Apr 22 16:15:14 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 22 Apr 2025 16:15:14 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v48] In-Reply-To: References: Message-ID: > Hi, > > This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. > > In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. > > This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. > > Please kindly review, thanks a lot. 
> > Testing > > - [x] GHA > - [x] Linux x64, tier 1-4 Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: Explain what alignment means ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17508/files - new: https://git.openjdk.org/jdk/pull/17508/files/cdab1911..0fbbe5cd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=47 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=46-47 Stats: 22 lines in 1 file changed: 7 ins; 0 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/17508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17508/head:pull/17508 PR: https://git.openjdk.org/jdk/pull/17508 From epeter at openjdk.org Tue Apr 22 16:15:17 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 22 Apr 2025 16:15:17 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v47] In-Reply-To: References: Message-ID: On Mon, 21 Apr 2025 12:05:27 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. >> >> In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. >> >> This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. >> >> Please kindly review, thanks a lot. 
>> >> Testing >> >> - [x] GHA >> - [x] Linux x64, tier 1-4 > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 61 commits: > > - Merge branch 'master' into unsignedbounds > - Merge branch 'master' into unsignedbounds > - reviews > - Merge branch 'master' into unsignedbounds > - refine comments > - Merge branch 'master' into unsignedbounds > - Merge branch 'master' into unsignedbounds > - harden SimpleCanonicalResult > - number lemmas > - include > - ... and 51 more: https://git.openjdk.org/jdk/compare/0995b940...cdab1911 Another small batch for the else case. src/hotspot/share/opto/rangeinference.cpp line 277: > 275: assert(lo < new_lo, "this case cannot overflow"); > 276: return new_lo; > 277: } else { Suggestion: } else { assert(zero_violation > one_violation, "remaining case"); Could be a help to the reader, and a defense against bad changes. src/hotspot/share/opto/rangeinference.cpp line 280: > 278: // This means that the first bit that does not satisfy the bit requirement > 279: // is a 1 that should be a 0. Trace backward to find i which is the last > 280: // bit that is 0 in both lo and zeros. Does this `i` have a name below? What value does it have in the example? src/hotspot/share/opto/rangeinference.cpp line 293: > 291: // different bit between the result and lo must be the 3rd bit. As a result, > 292: // the result must not be smaller than: > 293: // 1 0 1 0 0 0 0 0 Oh, I'm starting to get an intuition here. I think we can make it a bit more "intuitive". We start at the "first violation". We would like to flip the bit from 1 to 0, but since we are only allowed to increase the number, we also need to flip the 5th bit. But that is already 1 too, so we need to go to the 4th. That one we cannot flip from 0 to 1, because it is forced to be a 0 by zeros. So the first bit we can flip is the 3rd. 
Maybe this formulation with "which is the highest bit we can flip" could be helpful for the intuition? src/hotspot/share/opto/rangeinference.cpp line 299: > 297: > 298: juint first_violation = count_leading_zeros(zero_violation); > 299: // This mask out all bits from the first violation Suggestion: // This mask out all bits after the first violation. src/hotspot/share/opto/rangeinference.cpp line 307: > 305: // violation, which is the last set bit of tmp > 306: // 0 1 1 0 0 0 0 0 > 307: U tmp = ~either & find_mask; Did I understand that right: `tmp` is the bits that we cannot flip? Or is it the ones we can flip? A better name would be appreciated :) Same for `either`. worst case you call it `lo_or_zeros`... but that's not great either. I'm not fully seeing through the logic here yet, so I struggle to make good suggestions. src/hotspot/share/opto/rangeinference.cpp line 309: > 307: U tmp = ~either & find_mask; > 308: // i == 2 here, shortcut the calculation instead of explicitly spelling out > 309: // i Suggestion: // i == 2 here, shortcut the calculation instead of explicitly spelling out i. 
The single `i` looks a little funny :) ------------- PR Review: https://git.openjdk.org/jdk/pull/17508#pullrequestreview-2784584965 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2054401952 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2054406394 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2054418681 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2054422027 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2054429286 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2054430521 From qamai at openjdk.org Tue Apr 22 16:15:18 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 22 Apr 2025 16:15:18 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v47] In-Reply-To: References: Message-ID: On Tue, 22 Apr 2025 15:33:32 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 61 commits: >> >> - Merge branch 'master' into unsignedbounds >> - Merge branch 'master' into unsignedbounds >> - reviews >> - Merge branch 'master' into unsignedbounds >> - refine comments >> - Merge branch 'master' into unsignedbounds >> - Merge branch 'master' into unsignedbounds >> - harden SimpleCanonicalResult >> - number lemmas >> - include >> - ... and 51 more: https://git.openjdk.org/jdk/compare/0995b940...cdab1911 > > src/hotspot/share/opto/rangeinference.cpp line 263: > >> 261: >> 262: // first_violation is the position of the violation counting from the >> 263: // highest bit down (0-based), since i == 2, first_difference == 1 > > Suggestion: > > // highest bit down (0-based). For example: i == 2, first_violation == 1. > > If that is not what you wanted, then I'm not sure what `first_difference` refers to ;) Yes you are right, I have also changed the bit location to be more at the centre. 
> src/hotspot/share/opto/rangeinference.cpp line 272: > >> 270: // that the result should not be smaller than this >> 271: // 1 1 0 0 0 0 0 0 >> 272: U new_lo = (lo & -alignment) + alignment; > > Ouff, this is one of these one-liners that need some explanation... I'll try to decode it. > > Hmm, it is also not really the `new_lo` which we return, there is another operation below. Maybe we can give this intermediate result a descriptive name? > > It seems that already `-alignment` does something interesting... but I'll leave it to you to explain. I'm less familiar with all the bit tricks, and continually amazed what is possible :) We want to obtain a value that is larger than `lo`, has the bit at a certain position set and all bits after that unset. It is aligning `lo` up to `alignment`. This is the standard operation for alignment when we know that `lo` is unaligned. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2054430959 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2054433118 From epeter at openjdk.org Tue Apr 22 16:15:18 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 22 Apr 2025 16:15:18 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v47] In-Reply-To: References: Message-ID: On Tue, 22 Apr 2025 15:54:21 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 61 commits: >> >> - Merge branch 'master' into unsignedbounds >> - Merge branch 'master' into unsignedbounds >> - reviews >> - Merge branch 'master' into unsignedbounds >> - refine comments >> - Merge branch 'master' into unsignedbounds >> - Merge branch 'master' into unsignedbounds >> - harden SimpleCanonicalResult >> - number lemmas >> - include >> - ... 
and 51 more: https://git.openjdk.org/jdk/compare/0995b940...cdab1911 > > src/hotspot/share/opto/rangeinference.cpp line 280: > >> 278: // This means that the first bit that does not satisfy the bit requirement >> 279: // is a 1 that should be a 0. Trace backward to find i which is the last >> 280: // bit that is 0 in both lo and zeros. > > Does this `i` have a name below? What value does it have in the example? And why is this number relevant? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2054412237 From epeter at openjdk.org Tue Apr 22 16:15:18 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 22 Apr 2025 16:15:18 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v47] In-Reply-To: References: Message-ID: On Tue, 22 Apr 2025 15:57:55 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/rangeinference.cpp line 280: >> >>> 278: // This means that the first bit that does not satisfy the bit requirement >>> 279: // is a 1 that should be a 0. Trace backward to find i which is the last >>> 280: // bit that is 0 in both lo and zeros. >> >> Does this `i` have a name below? What value does it have in the example? > > And why is this number relevant? Ah yes, I see you name it again. But maybe it could be good to give some intuition / motivation why this `i` is relevant. Maybe something along the line of "which is the bit we need to flip", see comments below? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2054432423 From kvn at openjdk.org Tue Apr 22 16:29:47 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 22 Apr 2025 16:29:47 GMT Subject: RFR: 8354119: Missing C2 proper allocation failure handling during initialization (during generate_uncommon_trap_blob) In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 14:04:35 GMT, Damon Fenacci wrote: > After [JDK-8347406](https://bugs.openjdk.org/browse/JDK-8347406), `OptoRuntime::generate_uncommon_trap_blob` and `OptoRuntime::generate_exception_blob` return an `UncommonTrapBlob`/`ExceptionBlob` if they succeed, `nullptr` if they don't. This is then used by the compiler to shut down gently if the code cache is full (instead of crashing). > Unfortunately, if the code cache is found to be full when creating the buffer at the start of these 2 methods (when calling `CodeBuffer buffer(name, 2048, 1024);`), an empty buffer is created, which in turn prevents `masm` from being properly initialized, which then causes an access violation when writing into the blob's address when first adding `subptr` later in the method (as seen in the snippet below for `generate_uncommon_trap_blob`). > > https://github.com/openjdk/jdk/blob/3cc43b3224efdf1a3f35fff58b993027a9e1f4ad/src/hotspot/cpu/x86/runtime_x86_64.cpp#L55-L72 > > To fix this I suggest we return immediately from `OptoRuntime::generate_uncommon_trap_blob`/`OptoRuntime::generate_exception_blob` if the `buffer` creation failed. > > ### Testing > > Tier 1-3. > No specific regression test is added (very hard, among other things dependent on thread scheduling; on the other hand, `StartupOutput.java` might catch it rarely). Good.
I finally found why `Assembler()` did not throw an error when the code blob is not allocated and `_blob` is `NULL`: [assembler.cpp#L47](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/asm/assembler.cpp#L47) In the **debug** VM, `CodeBuffer::set_blob()` replaces `NULL`s with the `badAddress` value: [codeBuffer.cpp#L184](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/asm/codeBuffer.cpp#L184). It is done to "poison" pointers when `free_blob()` is called. So we could fix the issue by adding a check for `badAddress` in `Assembler()`. But that would cause a VM exit instead of disabling only C2, as implemented by #23630. On the other hand, it would match the behavior of the **product** VM. I think I prefer the currently suggested fix, which disables only C2. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24549#pullrequestreview-2784676228 PR Comment: https://git.openjdk.org/jdk/pull/24549#issuecomment-2821862353 From qamai at openjdk.org Tue Apr 22 16:41:20 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 22 Apr 2025 16:41:20 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v49] In-Reply-To: References: Message-ID: > Hi, > > This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. > > In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. > > This is extracted from #15440 , node value transformations are left for later PRs.
I have also added unit tests to verify the soundness of constraint normalization. > > Please kindly review, thanks a lot. > > Testing > > - [x] GHA > - [x] Linux x64, tier 1-4 Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: More for Emanuel ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17508/files - new: https://git.openjdk.org/jdk/pull/17508/files/0fbbe5cd..fd7a7fa1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=48 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=47-48 Stats: 23 lines in 1 file changed: 7 ins; 0 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/17508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17508/head:pull/17508 PR: https://git.openjdk.org/jdk/pull/17508 From epeter at openjdk.org Tue Apr 22 16:41:21 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 22 Apr 2025 16:41:21 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v48] In-Reply-To: References: Message-ID: <1JdbQdQdyLjm1VeDO6ESqsuu15kHSZs87duzD3BlhyE=.900ff431-6fe6-435b-9056-9a5d9301f3c4@github.com> On Tue, 22 Apr 2025 16:15:14 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. >> >> In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. 
>> >> This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. >> >> Please kindly review, thanks a lot. >> >> Testing >> >> - [x] GHA >> - [x] Linux x64, tier 1-4 > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > Explain what alignment means src/hotspot/share/opto/rangeinference.cpp line 84: > 82: // Find the minimum value that is not less than lo and satisfies bits. If there > 83: // does not exist one such number, the calculation will overflow and return a > 84: // value < lo. I'm wondering if we should say anything more specific for this case. Maybe at least an example? It should probably not go here at the beginning, but somewhere further down. src/hotspot/share/opto/rangeinference.cpp line 362: > 360: // not larger than hi that satisfies {bits._zeros, bits._ones}, then ~new_hi > 361: // is the smallest value not smaller than ~hi that satisfies > 362: // {bits._ones, bits._zeros} This is a really nice high level argument. How hard do you think it would be to make it a little more detailed? It is especially the "strictly decreasing function" argument that might not be very easy for everyone to understand... 
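The complement argument quoted above (the largest value not larger than `hi` that satisfies `{zeros, ones}` corresponds, under bitwise NOT, to the smallest value not smaller than `~hi` that satisfies `{ones, zeros}`) can be checked by brute force on small integers. The following is only a sketch under stated assumptions: it uses 8-bit values so that linear search is cheap, and the names `satisfies`, `smallest_at_least`, `largest_at_most` and `duality_holds` are invented for this illustration; none of them come from `rangeinference.cpp`.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// x respects the known-bits constraint {zeros, ones}.
static bool satisfies(uint8_t x, uint8_t zeros, uint8_t ones) {
  return (x & zeros) == 0 && (x & ones) == ones;
}

// Smallest value >= lo that satisfies {zeros, ones}, found by linear search.
static std::optional<uint8_t> smallest_at_least(uint8_t lo, uint8_t zeros, uint8_t ones) {
  for (unsigned x = lo; x <= 0xFF; x++) {
    if (satisfies(static_cast<uint8_t>(x), zeros, ones)) {
      return static_cast<uint8_t>(x);
    }
  }
  return std::nullopt;  // no such value exists
}

// Largest value <= hi that satisfies {zeros, ones}, found by linear search.
static std::optional<uint8_t> largest_at_most(uint8_t hi, uint8_t zeros, uint8_t ones) {
  for (int x = hi; x >= 0; x--) {
    if (satisfies(static_cast<uint8_t>(x), zeros, ones)) {
      return static_cast<uint8_t>(x);
    }
  }
  return std::nullopt;  // no such value exists
}

// Checks: largest_at_most(hi, zeros, ones) == ~smallest_at_least(~hi, ones, zeros),
// with the roles of zeros and ones swapped on the complemented side.
static bool duality_holds(uint8_t hi, uint8_t zeros, uint8_t ones) {
  const auto direct  = largest_at_most(hi, zeros, ones);
  const auto via_not = smallest_at_least(static_cast<uint8_t>(~hi), ones, zeros);
  if (!direct.has_value() || !via_not.has_value()) {
    return direct.has_value() == via_not.has_value();
  }
  return *direct == static_cast<uint8_t>(~*via_not);
}
```

The reason this works is that `x` satisfies `{zeros, ones}` exactly when `~x` satisfies `{ones, zeros}`, and `x <= hi` exactly when `~x >= ~hi` in the unsigned domain, so maximizing `x` is the same as minimizing `~x`.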
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2054446430 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2054458487 From qamai at openjdk.org Tue Apr 22 16:41:21 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 22 Apr 2025 16:41:21 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v48] In-Reply-To: <1JdbQdQdyLjm1VeDO6ESqsuu15kHSZs87duzD3BlhyE=.900ff431-6fe6-435b-9056-9a5d9301f3c4@github.com> References: <1JdbQdQdyLjm1VeDO6ESqsuu15kHSZs87duzD3BlhyE=.900ff431-6fe6-435b-9056-9a5d9301f3c4@github.com> Message-ID: On Tue, 22 Apr 2025 16:19:55 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> Explain what alignment means > > src/hotspot/share/opto/rangeinference.cpp line 84: > >> 82: // Find the minimum value that is not less than lo and satisfies bits. If there >> 83: // does not exist one such number, the calculation will overflow and return a >> 84: // value < lo. > > I'm wondering if we should say anything more specific for this case. Maybe at least an example? It should probably not go here at the beginning, but somewhere further down. What do you mean? It is the doc for `adjust_lo`, of course it needs to be here, I think it is logical to say what the function does before saying how it does it. For that purpose I think it is very clear already and an example is not needed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2054473778 From qamai at openjdk.org Tue Apr 22 16:41:21 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 22 Apr 2025 16:41:21 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v47] In-Reply-To: References: Message-ID: On Tue, 22 Apr 2025 16:11:00 GMT, Emanuel Peter wrote: >> And why is this number relevant? > > Ah yes, I see you name it again. 
But maybe it could be good to give some intuition / motivation why this `i` is relevant. > > Maybe something along the line of "which is the bit we need to flip", see comments below? Yeah I added some more to explain that we are referring to the `i` value in the formality section. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2054470249 From qamai at openjdk.org Tue Apr 22 16:41:23 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 22 Apr 2025 16:41:23 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v47] In-Reply-To: References: Message-ID: On Tue, 22 Apr 2025 16:01:59 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 61 commits: >> >> - Merge branch 'master' into unsignedbounds >> - Merge branch 'master' into unsignedbounds >> - reviews >> - Merge branch 'master' into unsignedbounds >> - refine comments >> - Merge branch 'master' into unsignedbounds >> - Merge branch 'master' into unsignedbounds >> - harden SimpleCanonicalResult >> - number lemmas >> - include >> - ... and 51 more: https://git.openjdk.org/jdk/compare/0995b940...cdab1911 > > src/hotspot/share/opto/rangeinference.cpp line 293: > >> 291: // different bit between the result and lo must be the 3rd bit. As a result, >> 292: // the result must not be smaller than: >> 293: // 1 0 1 0 0 0 0 0 > > Oh, I'm starting to get an intuition here. I think we can make it a bit more "intuitive". > > We start at the "first violation". We would like to flip the bit from 1 to 0, but since we are only allowed to increase the number, we also need to flip the 5th bit. But that is already 1 too, so we need to go to the 4th. That one we cannot flip from 0 to 1, because it is forced to be a 0 by zeros. So the first bit we can flip is the 3rd. > > Maybe this formulation with "which is the highest bit we can flip" could be helpful for the intuition? 
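The "which bit can we flip" intuition from the quoted comment can be sketched in code. This is a deliberately simplified illustration, not the implementation in `rangeinference.cpp`: it handles only the `zeros` constraint (as if `ones == 0`), it works on 8-bit values, it uses a plain loop instead of the mask tricks discussed in the review, and the name `adjust_lo_zeros_only` is invented for this example. The sample values `lo = 1000 1100` and `zeros = 0001 0100` are a reconstruction from the bit patterns quoted in the review and should be read as an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Smallest value >= lo with (value & zeros) == 0, or nullopt if none exists.
static std::optional<uint8_t> adjust_lo_zeros_only(uint8_t lo, uint8_t zeros) {
  const uint8_t violation = lo & zeros;
  if (violation == 0) {
    return lo;  // lo already satisfies the constraint
  }
  // v: position of the highest bit that is 1 in lo but must be 0.
  int v = 7;
  while (((violation >> v) & 1) == 0) {
    v--;
  }
  // Walk towards the most significant bit, looking for the nearest bit that
  // is 0 in lo (so setting it increases the value) and 0 in zeros (so it is
  // allowed to be 1).
  for (int i = v + 1; i < 8; i++) {
    const uint8_t bit = static_cast<uint8_t>(1u << i);
    if ((lo & bit) == 0 && (zeros & bit) == 0) {
      // Keep the bits of lo above i, set bit i, clear everything below.
      const uint8_t above = static_cast<uint8_t>(lo & ~((1u << (i + 1)) - 1));
      return static_cast<uint8_t>(above | bit);
    }
  }
  return std::nullopt;  // every candidate bit is taken: no value >= lo fits
}
```

With `lo = 0x8C` and `zeros = 0x14`, the violation is at bit 2; bit 3 is already 1 in `lo`, bit 4 is forced to 0 by `zeros`, so bit 5 is the one that gets set, giving `1010 0000`, the lower bound quoted in the review comment.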
That's a better guidance for sure, I have modified this part. > src/hotspot/share/opto/rangeinference.cpp line 307: > >> 305: // violation, which is the last set bit of tmp >> 306: // 0 1 1 0 0 0 0 0 >> 307: U tmp = ~either & find_mask; > > Did I understand that right: `tmp` is the bits that we cannot flip? Or is it the ones we can flip? A better name would be appreciated :) > Same for `either`. worst case you call it `lo_or_zeros`... but that's not great either. > > I'm not fully seeing through the logic here yet, so I struggle to make good suggestions. We can say that `tmp` is all the bits we can flip, although I think that is too ambiguous, what does "can" mean here. It is better to think of it as all the bits that are not lower than `first_violation` and are 0 in both `lo` and `zeros`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2054469385 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2054468470 From bulasevich at openjdk.org Tue Apr 22 16:50:04 2025 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Tue, 22 Apr 2025 16:50:04 GMT Subject: RFR: 8332368: ubsan aarch64: immediate_aarch64.cpp:298:31: runtime error: shift exponent 32 is too large for 32-bit type 'int' In-Reply-To: <9mxXDDhuFS_jZmqSkYtOtsYfmpP9r-616HsIFGBDTUE=.e0b5b3cd-0fc3-434e-95aa-840344648beb@github.com> References: <9mxXDDhuFS_jZmqSkYtOtsYfmpP9r-616HsIFGBDTUE=.e0b5b3cd-0fc3-434e-95aa-840344648beb@github.com> Message-ID: <8brwgNfTjZcAOeKknpJOuXyaNwfyi2mGiVO6okcBFVU=.58cf43b1-e464-40f7-8019-08b7b6f17d7c@github.com> On Wed, 16 Apr 2025 18:32:16 GMT, Boris Ulasevich wrote: > Running a linux-aarch64-server-fastdebug build with UBSAN (--enable-ubsan configure option) gives the following runtime error immediately on JVM start: > > > ad_aarch64.hpp:7114:11: runtime error: shift exponent 100 is too large for 32-bit type 'unsigned int' > # Pipeline_Use_Cycle_Mask::operator<<=(int) make/hotspot/ad_aarch64.hpp:7114 > # 
Pipeline_Use_Element::step(unsigned int) make/hotspot/ad_aarch64.hpp:7168 > # Pipeline_Use::step(unsigned int) make/hotspot/ad_aarch64.hpp:7216 > # Scheduling::step(unsigned int) src/hotspot/share/opto/output.cpp:2116 > # Scheduling::AddNodeToBundle(Node*, Block const*) src/hotspot/share/opto/output.cpp:2553 > > > The value of 100 comes from fixed_latency, defined in AD files: > > // Pipeline class for call. > pipe_class pipe_class_call() > %{ > single_instruction; > fixed_latency(100); > %} > > > The fixed_latency value is used by the scheduler to model the occupancy of functional units over time. The occupancy is tracked using a uint mask value: > > fprintf(fp_hpp, "class Pipeline_Use_Element {\n"); > fprintf(fp_hpp, "protected:\n"); > fprintf(fp_hpp, " // Mask of used functional units\n"); > fprintf(fp_hpp, " uint _used;\n\n"); > > When the scheduler virtually steps over the instruction, it shifts the masks left by the instruction's latency. The problem is that 100 is greater than the bit width of uint (8 * sizeof(uint), i.e. 32), and left-shifting by 100 effectively zeroes the _mask, but, according to the C++ standard, this is undefined behavior. > > We can find a number of fixed_latency(100) expressions in aarch64.ad, arm.ad, ppc.ad, riscv.ad, x86_64.ad files. Perhaps all of them deserve correction. I suggest leaving the AD files as they are, but limiting the shift value in case it exceeds the allowed maximum in the generated code: > > void step(uint cycles) { > _used = 0; > - _mask <<= cycles; > + uint max_shift = 8 * sizeof(_mask) - 1; > + _mask <<= (cycles < max_shift) ? cycles : max_shift; > } > > In fact, this change does not affect the current behavior; we just eliminate the undefined behavior while preserving the intended semantics.
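The clamping idea in the diff above can be tried in isolation. Below is a minimal standalone C++ sketch, assuming a 32-bit mask; the function name `shift_left_clamped` is hypothetical and the code is not the generated `ad_aarch64.hpp` source, it only mirrors the `max_shift` expression from the proposed change.

```cpp
#include <cassert>
#include <cstdint>

// Shift mask left by cycles, clamping the shift count to the largest value
// that is still well defined for the type (bit width - 1, i.e. 31 here).
static uint32_t shift_left_clamped(uint32_t mask, unsigned cycles) {
  const unsigned max_shift = 8 * sizeof(mask) - 1;
  return mask << (cycles < max_shift ? cycles : max_shift);
}
```

A shift count of 100 is clamped to 31, so the expression is always well defined. One thing this sketch makes visible: with a clamp of 31, a mask whose lowest bit is set ends up as `0x80000000` rather than 0, while any mask with bit 0 clear becomes 0.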
Thank you, Andrew ------------- PR Comment: https://git.openjdk.org/jdk/pull/24696#issuecomment-2821908324 From bulasevich at openjdk.org Tue Apr 22 16:50:05 2025 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Tue, 22 Apr 2025 16:50:05 GMT Subject: Integrated: 8332368: ubsan aarch64: immediate_aarch64.cpp:298:31: runtime error: shift exponent 32 is too large for 32-bit type 'int' In-Reply-To: <9mxXDDhuFS_jZmqSkYtOtsYfmpP9r-616HsIFGBDTUE=.e0b5b3cd-0fc3-434e-95aa-840344648beb@github.com> References: <9mxXDDhuFS_jZmqSkYtOtsYfmpP9r-616HsIFGBDTUE=.e0b5b3cd-0fc3-434e-95aa-840344648beb@github.com> Message-ID: On Wed, 16 Apr 2025 18:32:16 GMT, Boris Ulasevich wrote: > Running a linux-aarch64-server-fastdebug build with UBSAN (--enable-ubsan configure option) gives the following runtime error immediately on JVM start: > > > ad_aarch64.hpp:7114:11: runtime error: shift exponent 100 is too large for 32-bit type 'unsigned int' > # Pipeline_Use_Cycle_Mask::operator<<=(int) make/hotspot/ad_aarch64.hpp:7114 > # Pipeline_Use_Element::step(unsigned int) make/hotspot/ad_aarch64.hpp:7168 > # Pipeline_Use::step(unsigned int) make/hotspot/ad_aarch64.hpp:7216 > # Scheduling::step(unsigned int) src/hotspot/share/opto/output.cpp:2116 > # Scheduling::AddNodeToBundle(Node*, Block const*) src/hotspot/share/opto/output.cpp:2553 > > > The value of 100 comes from fixed_latency, defined in AD files: > > // Pipeline class for call. > pipe_class pipe_class_call() > %{ > single_instruction; > fixed_latency(100); > %} > > > The fixed_latency value is used by the scheduler to model the occupancy of functional units over time. The occupancy is tracked using a uint mask value: > > fprintf(fp_hpp, "class Pipeline_Use_Element {\n"); > fprintf(fp_hpp, "protected:\n"); > fprintf(fp_hpp, " // Mask of used functional units\n"); > fprintf(fp_hpp, " uint _used;\n\n"); > > When the scheduler virtually steps over the instruction, it shifts the masks left by the instruction's latency. 
The problem is that 100 is greater than the bit width of uint (8 * sizeof(uint), i.e. 32), and left-shifting by 100 effectively zeroes the _mask, but, according to the C++ standard, this is undefined behavior. > > We can find a number of fixed_latency(100) expressions in aarch64.ad, arm.ad, ppc.ad, riscv.ad, x86_64.ad files. Perhaps all of them deserve correction. I suggest leaving the AD files as they are, but limiting the shift value in case it exceeds the allowed maximum in the generated code: > > void step(uint cycles) { > _used = 0; > - _mask <<= cycles; > + uint max_shift = 8 * sizeof(_mask) - 1; > + _mask <<= (cycles < max_shift) ? cycles : max_shift; > } > > In fact, this change does not affect the current behavior; we just eliminate the undefined behavior while preserving the intended semantics. This pull request has now been integrated. Changeset: d783a940 Author: Boris Ulasevich URL: https://git.openjdk.org/jdk/commit/d783a940988677dc91975f884adeaf9f047f7e07 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod 8332368: ubsan aarch64: immediate_aarch64.cpp:298:31: runtime error: shift exponent 32 is too large for 32-bit type 'int' Reviewed-by: adinn ------------- PR: https://git.openjdk.org/jdk/pull/24696 From dhanalla at openjdk.org Tue Apr 22 16:59:33 2025 From: dhanalla at openjdk.org (Dhamoder Nalla) Date: Tue, 22 Apr 2025 16:59:33 GMT Subject: RFR: 8315916: assert(C->live_nodes() <= C->max_node_limit()) failed: Live Node limit exceeded [v11] In-Reply-To: References: Message-ID: > In the debug build, the assert is triggered during the parsing (before Code_Gen). In the Release build, however, the compilation bails out at `Compile::check_node_count()` during the code generation phase and completes execution without any issues. > > When I commented out the assert(C->live_nodes() <= C->max_node_limit()), both the debug and release builds exhibited the same behavior: the compilation bails out during code_gen after building the ideal graph with more than 80K nodes.
> > The proposed fix will check the live node count and bail out during compilation while building the graph for scalarization of the elements in the array when the live node count crosses the limit of 80K, instead of unnecessarily building the entire graph and bailing out in code_gen. Dhamoder Nalla has updated the pull request incrementally with one additional commit since the last revision: uncomment the fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20504/files - new: https://git.openjdk.org/jdk/pull/20504/files/a8cb47d6..8581a24f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20504&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20504&range=09-10 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20504.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20504/head:pull/20504 PR: https://git.openjdk.org/jdk/pull/20504 From qamai at openjdk.org Tue Apr 22 17:01:48 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 22 Apr 2025 17:01:48 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v50] In-Reply-To: References: Message-ID: > Hi, > > This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. > > In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. > > This is extracted from #15440 , node value transformations are left for later PRs. 
I have also added unit tests to verify the soundness of constraint normalization. > > Please kindly review, thanks a lot. > > Testing > > - [x] GHA > - [x] Linux x64, tier 1-4 Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: more rigour for ~hi ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17508/files - new: https://git.openjdk.org/jdk/pull/17508/files/fd7a7fa1..eb3d69a7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=49 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=48-49 Stats: 28 lines in 1 file changed: 24 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/17508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17508/head:pull/17508 PR: https://git.openjdk.org/jdk/pull/17508 From qamai at openjdk.org Tue Apr 22 17:01:53 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 22 Apr 2025 17:01:53 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v48] In-Reply-To: <1JdbQdQdyLjm1VeDO6ESqsuu15kHSZs87duzD3BlhyE=.900ff431-6fe6-435b-9056-9a5d9301f3c4@github.com> References: <1JdbQdQdyLjm1VeDO6ESqsuu15kHSZs87duzD3BlhyE=.900ff431-6fe6-435b-9056-9a5d9301f3c4@github.com> Message-ID: On Tue, 22 Apr 2025 16:27:46 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> Explain what alignment means > > src/hotspot/share/opto/rangeinference.cpp line 362: > >> 360: // not larger than hi that satisfies {bits._zeros, bits._ones}, then ~new_hi >> 361: // is the smallest value not smaller than ~hi that satisfies >> 362: // {bits._ones, bits._zeros} > > This is a really nice high level argument. How hard do you think it would be to make it a little more detailed? > It is especially the "strictly decreasing function" argument that might not be very easy for everyone to understand... 
Done, I have added a small (albeit IMO trivial) proof to this claim. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2054501894 From iklam at openjdk.org Tue Apr 22 17:03:45 2025 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 22 Apr 2025 17:03:45 GMT Subject: RFR: 8350209: Preserve adapters in AOT cache [v4] In-Reply-To: References: Message-ID: <6yy0-CK4wUWgLd23lWP6Z2crB4lMICXohP-NEw0YPok=.8b184803-d9d0-490d-9a27-0d9098f36888@github.com> On Fri, 18 Apr 2025 18:45:24 GMT, Vladimir Kozlov wrote: >> [JEP 483](https://bugs.openjdk.org/browse/JDK-8315737) preserves class information in AOT cache which helps Java startup performance. >> >> We should also preserve adapters (i2c, c2i) to further improve performance of class linking where adapters are generated. >> >> Short running Java application can see several percents improvement. I got 6% improvement when ran `HelloWorld.java` on Linux-x64 Ice Lake CPU (2.5Ghz): >> >> >> (perf stat -r 100 java -XX:AOTCache=app.aotcache -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed >> 0.0299401 +- 0.0000504 seconds time elapsed ( +- 0.17% ) >> >> (perf stat -r 100 java -XX:AOTCache=app.aotcache -XX:+UnlockDiagnosticVMOptions -XX:-AOTAdapterCaching -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed >> 0.0318654 +- 0.0000535 seconds time elapsed ( +- 0.17% ) >> >> >> New diagnostic flags are introduced (use `-XX:+UnlockDiagnosticVMOptions` to unlock them): >> >> >> -XX:+AOTAdapterCaching - Enable or disable saving and restoring i2c2i adapters >> -XX:AOTCodeMaxSize=10*M - buffer size in bytes for AOT code caching >> -XX:+AbortVMOnAOTCodeFailure - Abort VM on the first occurrence of AOT code caching failure >> >> By default `AOTAdapterCaching` is `false` and enabled ergonomically when `-XX:AOTCache` is specified. >> This flag is ignored when `AOTCache` is not specified. 
>> >> To use AOT adapters follow process described in JEP 483: >> >> >> java -XX:AOTMode=record -XX:AOTConfiguration=app.aotconf -cp app.jar App >> java -XX:AOTMode=create -XX:AOTConfiguration=app.aotconf -XX:AOTCache=app.aot -cp app.jar >> java -XX:AOTCache=app.aot -cp app.jar App >> >> >> There are several new UL flag combinations to trace the AOT code caching process: >> >> >> -Xlog:aot+codecache+init -Xlog:aot+codecache+exit -Xlog:aot+codecache+stubs >> >> >> @ashu-mehra is main author of changes. He implemented adapters caching. >> I did main framework (`AOTCodeCache` class) for saving and loading AOT code. >> >> Tested tier1-6,10, which includes tests with `AOTClassLinking` enabled. Also Xcomp,stress and JCK. > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Add sanity test for AOTAdapterCaching flag src/hotspot/share/cds/cdsConfig.cpp line 869: > 867: > 868: bool CDSConfig::is_dumping_aot_code_enabled() { > 869: return _is_dumping_aot_code_enabled; Other functions in CDSConfig don't have the `_enabled()` suffix. E.g., `is_dumping_method_handles()`. It doesn't mean we are doing method handle right now, but rather we have the ability to do so. Since we rarely ask "am I in the middle of dumping X right now", I think adding `_enabled()` will be redundant. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24740#discussion_r2054507687 From epeter at openjdk.org Tue Apr 22 17:09:02 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 22 Apr 2025 17:09:02 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v17] In-Reply-To: References: Message-ID: On Thu, 10 Apr 2025 09:44:14 GMT, Daniel Lundén wrote: >> If a method has a large number of parameters, we currently bail out from C2 compilation.
>> >> ### Changeset >> >> Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes. >> >> Changes: >> - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed. >> - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable. >> - Remove all `can_represent` checks and bailouts. >> - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`. >> - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450) >> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. >> - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement. >> - Specific benchmarking of C2 compilation time. 
The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, no... > > Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: > > Refactor and improve TestNestedSynchronize.java src/hotspot/share/opto/chaitin.cpp line 1614: > 1612: uint size = lrg->mask().Size(); > 1613: ResourceMark r(C->regmask_arena()); > 1614: RegMask rm(lrg->mask(), C->regmask_arena()); Absolute nit: above you already have a `rm`. There it refers to a `ResourceMark`, here to a `RegMask`. Feels a little confusing. A more expressive name could help ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2054514398 From qamai at openjdk.org Tue Apr 22 17:10:35 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 22 Apr 2025 17:10:35 GMT Subject: RFR: 8346836: C2: Verify CastII/CastLL bounds at runtime [v9] In-Reply-To: References: <7YA6L2iyqBhZPOUZ8Ps_DsawE9VofyXnEqmJfoxJ0Ng=.b3acc12e-c0fc-40c3-a0fe-f88a2cd382af@github.com> Message-ID: On Thu, 10 Apr 2025 00:37:56 GMT, Vladimir Ivanov wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> assert CastLL > > FTR here's AArch64 support: > https://github.com/openjdk/jdk/commit/7ed34d09560d9db79183a80df379f3003f79bb1b > > Feel free to incorporate it in this PR or I'll upstream it separately.. @iwanowww Thanks a lot for your help, I have incorporated your patches.
------------- PR Comment: https://git.openjdk.org/jdk/pull/22880#issuecomment-2821962515 From qamai at openjdk.org Tue Apr 22 17:10:34 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 22 Apr 2025 17:10:34 GMT Subject: RFR: 8346836: C2: Verify CastII/CastLL bounds at runtime [v10] In-Reply-To: References: Message-ID: > Hi, > > This patch adds a develop flag `VerifyConstraintCasts`, which will verify the correctness of `CastIINode`s and `CastLLNode`s at runtime and crash the VM if the dynamic value lies outside the type value range. > > Please take a look, thanks a lot. Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 16 additional commits since the last revision: - Reconstruct FP - aarch64 support - Merge branch 'master' into verifycast - assert CastLL - reviews - make the flag diagnostic - Merge branch 'master' into verifycast - draft - Merge branch 'master' into verifycast - Merge branch 'master' into verifycast - ... 
and 6 more: https://git.openjdk.org/jdk/compare/59344b6e...8d140fd9 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22880/files - new: https://git.openjdk.org/jdk/pull/22880/files/45b45495..8d140fd9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22880&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22880&range=08-09 Stats: 231655 lines in 1175 files changed: 40089 ins; 186908 del; 4658 mod Patch: https://git.openjdk.org/jdk/pull/22880.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22880/head:pull/22880 PR: https://git.openjdk.org/jdk/pull/22880 From kvn at openjdk.org Tue Apr 22 17:28:12 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 22 Apr 2025 17:28:12 GMT Subject: RFR: 8350209: Preserve adapters in AOT cache [v4] In-Reply-To: <6yy0-CK4wUWgLd23lWP6Z2crB4lMICXohP-NEw0YPok=.8b184803-d9d0-490d-9a27-0d9098f36888@github.com> References: <6yy0-CK4wUWgLd23lWP6Z2crB4lMICXohP-NEw0YPok=.8b184803-d9d0-490d-9a27-0d9098f36888@github.com> Message-ID: On Tue, 22 Apr 2025 17:01:20 GMT, Ioi Lam wrote: >> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: >> >> Add sanity test for AOTAdapterCaching flag > > src/hotspot/share/cds/cdsConfig.cpp line 869: > >> 867: >> 868: bool CDSConfig::is_dumping_aot_code_enabled() { >> 869: return _is_dumping_aot_code_enabled; > > Other functions in CDSConfig don't have the `_enabled()` suffix. E.g., `is_dumping_method_handles()`. It doesn't mean we are doing method handle right now, but rather we have the ability to do so. > > Since we rarely ask "am I in the middle of dumping X right now", I think adding `_enabled()` will be redundant. Okay. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24740#discussion_r2054541461 From kvn at openjdk.org Tue Apr 22 17:33:50 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 22 Apr 2025 17:33:50 GMT Subject: RFR: 8350209: Preserve adapters in AOT cache [v5] In-Reply-To: References: Message-ID: > [JEP 483](https://bugs.openjdk.org/browse/JDK-8315737) preserves class information in AOT cache which helps Java startup performance. > > We should also preserve adapters (i2c, c2i) to further improve performance of class linking where adapters are generated. > > Short running Java application can see several percents improvement. I got 6% improvement when ran `HelloWorld.java` on Linux-x64 Ice Lake CPU (2.5Ghz): > > > (perf stat -r 100 java -XX:AOTCache=app.aotcache -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed > 0.0299401 +- 0.0000504 seconds time elapsed ( +- 0.17% ) > > (perf stat -r 100 java -XX:AOTCache=app.aotcache -XX:+UnlockDiagnosticVMOptions -XX:-AOTAdapterCaching -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed > 0.0318654 +- 0.0000535 seconds time elapsed ( +- 0.17% ) > > > New diagnostic flags are introduced (use `-XX:+UnlockDiagnosticVMOptions` to unlock them): > > > -XX:+AOTAdapterCaching - Enable or disable saving and restoring i2c2i adapters > -XX:AOTCodeMaxSize=10*M - buffer size in bytes for AOT code caching > -XX:+AbortVMOnAOTCodeFailure - Abort VM on the first occurrence of AOT code caching failure > > By default `AOTAdapterCaching` is `false` and enabled ergonomically when `-XX:AOTCache` is specified. > This flag is ignored when `AOTCache` is not specified. 
> > To use AOT adapters follow process described in JEP 483: > > > java -XX:AOTMode=record -XX:AOTConfiguration=app.aotconf -cp app.jar App > java -XX:AOTMode=create -XX:AOTConfiguration=app.aotconf -XX:AOTCache=app.aot -cp app.jar > java -XX:AOTCache=app.aot -cp app.jar App > > > There are several new UL flag combinations to trace the AOT code caching process: > > > -Xlog:aot+codecache+init -Xlog:aot+codecache+exit -Xlog:aot+codecache+stubs > > > @ashu-mehra is main author of changes. He implemented adapters caching. > I did main framework (`AOTCodeCache` class) for saving and loading AOT code. > > Tested tier1-6,10, which includes tests with `AOTClassLinking` enabled. Also Xcomp,stress and JCK. Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: remove _enabled suffix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24740/files - new: https://git.openjdk.org/jdk/pull/24740/files/2863a964..ba08626b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24740&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24740&range=03-04 Stats: 6 lines in 2 files changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/24740.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24740/head:pull/24740 PR: https://git.openjdk.org/jdk/pull/24740 From epeter at openjdk.org Tue Apr 22 17:37:52 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 22 Apr 2025 17:37:52 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v17] In-Reply-To: References: Message-ID: On Thu, 10 Apr 2025 09:44:14 GMT, Daniel Lundén wrote: >> If a method has a large number of parameters, we currently bail out from C2 compilation. >> >> ### Changeset >> >> Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2.
In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes. >> >> Changes: >> - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed. >> - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable. >> - Remove all `can_represent` checks and bailouts. >> - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`. >> - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450) >> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. >> - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement. >> - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). 
The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, no... > > Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: > > Refactor and improve TestNestedSynchronize.java Wow, I'm very unfamiliar with this code. Thanks for working on this, it must have taken a while to dig into this, thanks for the work :) I have a few little comments below, need more time to dig deeper down. src/hotspot/share/opto/chaitin.cpp line 1648: > 1646: // If we fail to color and the AllStack flag is set, trigger > 1647: // a chunk-rollover event > 1648: if (!OptoReg::is_valid(reg) && is_allstack) { Control question: here we seem to have passed `-chunk`. And we used to check `chunk != 0` in lots of places. What was the significance to negative chunk? Did you think about that? src/hotspot/share/opto/compile.hpp line 524: > 522: PhaseRegAlloc* _regalloc; // Results of register allocation. > 523: RegMask _FIRST_STACK_mask; // All stack slots usable for spills (depends on frame layout) > 524: ResourceArea _regmask_arena; // Holds dynamically allocated extensions of short-lived register masks If they are short-lived ... then why not just resource allocate them? Is there a conflict? Would it be good to describe that somewhere? src/hotspot/share/opto/regmask.hpp line 147: > 145: // If the original version, of which we may be a clone, is read-only. In such > 146: // cases, we can allow read-only sharing. > 147: bool orig_const = false; Could we have a more descriptive name? I was wondering at a use site what this is about... and it was not directly clear. Oh, and should it not be `_orig_const`, with the extra underscore at the very least? Maybe `_read_only_sharing`?
Not sure about it, just an idea. But the underscore I'm more sure about. src/hotspot/share/opto/regmask.hpp line 420: > 418: : RegMask(arena) { > 419: Insert(reg); > 420: DEBUG_ONLY(this->orig_const = orig_const;) Could this not be part of the initializer list? Or does the `Insert` prevent that? ------------- PR Review: https://git.openjdk.org/jdk/pull/20404#pullrequestreview-2784774928 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2054519051 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2054532956 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2054552632 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2054553843 From jrose at openjdk.org Tue Apr 22 18:26:57 2025 From: jrose at openjdk.org (John R Rose) Date: Tue, 22 Apr 2025 18:26:57 GMT Subject: RFR: 8355223: Improve documentation on @IntrinsicCandidate [v2] In-Reply-To: References: Message-ID: On Mon, 21 Apr 2025 20:12:19 GMT, Chen Liang wrote: >> In offline discussion, we noted that the documentation on this annotation does not recommend minimizing the intrinsified section and moving whatever can be done in Java to Java; thus I prepared this documentation update, to shrink a "TLDR" essay to something concise for readers, such as pointing to that list at `vmIntrinsics.hpp` instead of "a list". > > Chen Liang has updated the pull request incrementally with one additional commit since the last revision: > > Refine validation and defensive copying src/java.base/share/classes/jdk/internal/vm/annotation/IntrinsicCandidate.java line 58: > 56: * The HotSpot VM checks, when loading a class, the consistency of recognized > 57: * methods and {@code @IntrinsicCandidate} annotations, unless the {@code > 58: * CheckIntrinsics} VM flag is disabled.

Even if an intrinsic is available the VM is not obligated to use it. For example, the bytecodes of an intrinsic method may be executed by lower tiers of VM execution, while higher tiers may replace the bytecodes with specialized assembly code and/or compiler IR. Therefore, intrinsic implementors must ensure that non-bytecode execution has (in an application-specific sense) the same results as execution of the actual Java code. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24777#discussion_r2054639422 From jrose at openjdk.org Tue Apr 22 18:41:41 2025 From: jrose at openjdk.org (John R Rose) Date: Tue, 22 Apr 2025 18:41:41 GMT Subject: RFR: 8355223: Improve documentation on @IntrinsicCandidate [v2] In-Reply-To: References: Message-ID: On Tue, 22 Apr 2025 18:23:52 GMT, John R Rose wrote: >> Chen Liang has updated the pull request incrementally with one additional commit since the last revision: >> >> Refine validation and defensive copying > > src/java.base/share/classes/jdk/internal/vm/annotation/IntrinsicCandidate.java line 58: > >> 56: * The HotSpot VM checks, when loading a class, the consistency of recognized >> 57: * methods and {@code @IntrinsicCandidate} annotations, unless the {@code >> 58: * CheckIntrinsics} VM flag is disabled. > >

> Even if an intrinsic is available the VM is not obligated to use it. For example, > the bytecodes of an intrinsic method may be executed by lower tiers of VM > execution, while higher tiers may replace the bytecodes with specialized > assembly code and/or compiler IR. Therefore, intrinsic implementors must > ensure that non-bytecode execution has (in an application-specific sense) > the same results as execution of the actual Java code. Also, persons not > directly involved with maintaining the Java libraries or the HotSpot VM can > usually ignore the {@code @IntrinsicCandidate} annotation, when reasoning > about the actions of the method. Such persons must not assume, however, > that checks for nulls, out-of-bounds indexes, and wrong types will be performed > by the intrinsic, even though normal Java execution always performs such checks. And for arrays, a footnote might be appropriate:

For some highly optimized algorithms, it may be impractical to ensure that array data is read or written only once by the intrinsic. If the caller of the intrinsic cannot guarantee that such array data is unshared, then the caller must also document the effects of race conditions on it. (Such a race occurs when another thread writes the array data during the execution of the intrinsic.) For example, the documentation can simply say that the result is undefined if a race happens. And maybe, after that, something more about type safety:

In no case may any intrinsic be allowed to perform an operation that fails to be type safe. It must not indirect a null pointer; it must not access a field or method on an object which does not possess that field or method; it must not access an element of an array not actually present in the array; and it must not manipulate managed references in a way that prevents the GC from managing them. The caller of the intrinsic is fully responsible for preventing every kind of type safety violation, in the case (the common case) that the intrinsic itself does not itself somehow prevent the violation. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24777#discussion_r2054665206 From jrose at openjdk.org Tue Apr 22 18:44:40 2025 From: jrose at openjdk.org (John R Rose) Date: Tue, 22 Apr 2025 18:44:40 GMT Subject: RFR: 8355223: Improve documentation on @IntrinsicCandidate [v2] In-Reply-To: References: Message-ID: On Tue, 22 Apr 2025 18:38:42 GMT, John R Rose wrote: >> src/java.base/share/classes/jdk/internal/vm/annotation/IntrinsicCandidate.java line 58: >> >>> 56: * The HotSpot VM checks, when loading a class, the consistency of recognized >>> 57: * methods and {@code @IntrinsicCandidate} annotations, unless the {@code >>> 58: * CheckIntrinsics} VM flag is disabled. >> >>

>> Even if an intrinsic is available the VM is not obligated to use it. For example, >> the bytecodes of an intrinsic method may be executed by lower tiers of VM >> execution, while higher tiers may replace the bytecodes with specialized >> assembly code and/or compiler IR. Therefore, intrinsic implementors must >> ensure that non-bytecode execution has (in an application-specific sense) >> the same results as execution of the actual Java code. Also, persons not >> directly involved with maintaining the Java libraries or the HotSpot VM can >> usually ignore the {@code @IntrinsicCandidate} annotation, when reasoning >> about the actions of the method. Such persons must not assume, however, >> that checks for nulls, out-of-bounds indexes, and wrong types will be performed >> by the intrinsic, even though normal Java execution always performs such checks. > > And for arrays, a footnote might be appropriate: > > >

> For some highly optimized algorithms, it may be impractical to ensure that array > data is read or written only once by the intrinsic. If the caller of the intrinsic > cannot guarantee that such array data is unshared, then the caller must also > document the effects of race conditions on it. (Such a race occurs when another > thread writes the array data during the execution of the intrinsic.) For example, > the documentation can simply say that the result is undefined if a race happens. > > > And maybe, after that, something more about type safety: > > >

> In no case may any intrinsic be allowed to perform an operation that fails to be > type safe. It must not indirect a null pointer; it must not access a field or method > on an object which does not possess that field or method; it must not access > an element of an array not actually present in the array; and it must not > manipulate managed references in a way that prevents the GC from managing > them. The caller of the intrinsic is fully responsible for preventing every kind > of type safety violation, in the case (the common case) that the intrinsic itself > does not itself somehow prevent the violation. In the new design, the above "footnotes" go at the bottom. They explain why the rules prescribed at the top are important. In effect, inform aggressive implementors how far they may bend those rules. Sometimes the rules do get bent, sometimes to allow unspecified behavior, but never so far as to allow a type safety violation. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24777#discussion_r2054672049 From ecaspole at openjdk.org Tue Apr 22 19:29:20 2025 From: ecaspole at openjdk.org (Eric Caspole) Date: Tue, 22 Apr 2025 19:29:20 GMT Subject: RFR: 8355233: Add a DMB related benchmark [v3] In-Reply-To: References: Message-ID: > In addition to the details in the JBS, I changed the name from DoubleDMB to DMBCheck. 
Eric Caspole has updated the pull request incrementally with two additional commits since the last revision: - Merge branch 'JDK-8355233' of github.com:ericcaspole/jdk into JDK-8355233 - Add comments explaining the purpose of this JMH ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24783/files - new: https://git.openjdk.org/jdk/pull/24783/files/312cb317..c0fdf758 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24783&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24783&range=01-02 Stats: 6 lines in 1 file changed: 6 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24783.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24783/head:pull/24783 PR: https://git.openjdk.org/jdk/pull/24783 From kvn at openjdk.org Tue Apr 22 19:34:43 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 22 Apr 2025 19:34:43 GMT Subject: RFR: 8355233: Add a DMB related benchmark [v3] In-Reply-To: References: Message-ID: On Tue, 22 Apr 2025 19:29:20 GMT, Eric Caspole wrote: >> In addition to the details in the JBS, I changed the name from DoubleDMB to DMBCheck. > > Eric Caspole has updated the pull request incrementally with two additional commits since the last revision: > > - Merge branch 'JDK-8355233' of github.com:ericcaspole/jdk into JDK-8355233 > - Add comments explaining the purpose of this JMH Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24783#pullrequestreview-2785134920 From mli at openjdk.org Tue Apr 22 19:37:51 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 22 Apr 2025 19:37:51 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v10] In-Reply-To: References: Message-ID: On Tue, 22 Apr 2025 16:34:21 GMT, Hamlin Li wrote: >> Vladimir Ivanov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 24 additional commits since the last revision: >> >> - Merge remote-tracking branch 'origin/master' into vector.math.01.java >> - RVV and SVE adjustments >> - fix broken merge >> - Merge branch 'master' into vector.math.01.java >> - Fix debugName handling >> - Merge branch 'master' into vector.math.01.java >> - RVV and SVE adjustments >> - Merge branch 'master' into vector.math.01.java >> - Fix windows-aarch64 build failure >> - features_string -> cpu_info_string >> - ... and 14 more: https://git.openjdk.org/jdk/compare/88f7b422...88eacc48 > > src/hotspot/share/runtime/abstract_vm_version.cpp line 349: > >> 347: assert(features_offset <= cpu_info_string_len, ""); >> 348: if (features_offset < cpu_info_string_len) { >> 349: assert(cpu_info_string[features_offset + 0] == ',', ""); > > This assert fails on riscv. A simple fix could be: diff --git a/src/hotspot/os_cpu/linux_riscv/vm_version_linux_riscv.cpp b/src/hotspot/os_cpu/linux_riscv/vm_version_linux_riscv.cpp index 484a2a645aa..a785dc65c9e 100644 --- a/src/hotspot/os_cpu/linux_riscv/vm_version_linux_riscv.cpp +++ b/src/hotspot/os_cpu/linux_riscv/vm_version_linux_riscv.cpp @@ -196,25 +196,12 @@ void VM_Version::setup_cpu_available_features() { _cpu_info_string = os::strdup(buf); - _features_string = extract_features_string(_cpu_info_string, - strnlen(_cpu_info_string, sizeof(buf)), - features_offset); + _features_string = _cpu_info_string; } > src/hotspot/share/runtime/abstract_vm_version.hpp line 61: > >> 59: static const char* _features_string; >> 60: >> 61: static const char* _cpu_info_string; > > Not quite sure the reason to introduce `_cpu_info_string`. > Seems to me you could just use _features_string, and remove _cpu_info_string and its related code, e.g. `extract_features_string`. Please check the code in `test/lib/jdk/test/whitebox/cpuinfo/CPUInfo.java` Maybe `CPUFeatures` could use code similar to `CPUInfo` to split the CPU string into CPU features?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2054485082 PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2054697247 From mli at openjdk.org Tue Apr 22 19:37:51 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 22 Apr 2025 19:37:51 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v10] In-Reply-To: References: Message-ID: On Thu, 17 Apr 2025 18:03:47 GMT, Vladimir Ivanov wrote: >> Migrate Vector API math library (SVML and SLEEF) linkage from native code (in JVM) to Java FFM API. >> >> Since FFM API doesn't support vector calling conventions yet, migration affects only symbol lookup for now. But it still enables significant simplifications on JVM side. >> >> The patch consists of the following parts: >> * on-demand symbol lookup in Java code replaces eager lookup from native code during JVM startup; >> * 2 new VM intrinsics for vector calls (support unary and binary shapes) (code separated from unary/binary vector operations); >> * new internal interface to query supported CPU ISA extensions (`jdk.incubator.vector.CPUFeatures`) used for CPU dispatching. >> >> `java.lang.foreign` API is used to perform symbol lookup in vector math library, then the address is cached and fed into corresponding JVM intrinsic, so C2 can turn it into a direct vector call in generated code. >> >> Once `java.lang.foreign` supports vectors & vector calling conventions, VM intrinsics can go away. >> >> Performance is on par with original implementation (tested with microbenchmarks on linux-x64 and macosx-aarch64). >> >> Testing: hs-tier1 - hs-tier6, microbenchmarks (on linux-x64 and macosx-aarch64) >> >> Thanks! > > Vladimir Ivanov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 24 additional commits since the last revision: > > - Merge remote-tracking branch 'origin/master' into vector.math.01.java > - RVV and SVE adjustments > - fix broken merge > - Merge branch 'master' into vector.math.01.java > - Fix debugName handling > - Merge branch 'master' into vector.math.01.java > - RVV and SVE adjustments > - Merge branch 'master' into vector.math.01.java > - Fix windows-aarch64 build failure > - features_string -> cpu_info_string > - ... and 14 more: https://git.openjdk.org/jdk/compare/88f7b422...88eacc48 I just ran some basic tests on riscv; there seem to be some issues, and I also have some comments. src/hotspot/share/runtime/abstract_vm_version.cpp line 349: > 347: assert(features_offset <= cpu_info_string_len, ""); > 348: if (features_offset < cpu_info_string_len) { > 349: assert(cpu_info_string[features_offset + 0] == ',', ""); This assert fails on riscv. src/hotspot/share/runtime/abstract_vm_version.hpp line 61: > 59: static const char* _features_string; > 60: > 61: static const char* _cpu_info_string; Not quite sure of the reason for introducing `_cpu_info_string`. Seems to me you could just use _features_string, and remove _cpu_info_string and its related code, e.g. `extract_features_string`. Please check the code in `test/lib/jdk/test/whitebox/cpuinfo/CPUInfo.java` src/jdk.incubator.vector/share/classes/jdk/incubator/vector/CPUFeatures.java line 45: > 43: String featuresString = VectorSupport.getCPUFeatures(); > 44: debug(featuresString); > 45: String[] features = featuresString.toLowerCase(Locale.ROOT).split(", "); // ", " is used as a delimiter On riscv, it's split by " "; for the fix, please refer to CPUInfo.java in test. src/jdk.incubator.vector/share/classes/jdk/incubator/vector/CPUFeatures.java line 82: > 80: return features; > 81: } > 82: } Maybe an extra line needed at the end of this file?
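To illustrate the delimiter concern raised in the review above, here is a minimal sketch of a split that tolerates both `", "` and `" "` separators. It is not the code in `CPUInfo.java` or `CPUFeatures.java`; the class name `FeatureSplitSketch`, the method `splitFeatures`, and the sample feature strings are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical helper, assuming features are separated by commas and/or whitespace.
class FeatureSplitSketch {
    static List<String> splitFeatures(String featuresString) {
        // One or more commas or whitespace characters count as a single delimiter.
        return Arrays.stream(featuresString.toLowerCase(Locale.ROOT).split("[,\\s]+"))
                     .filter(f -> !f.isEmpty())   // drop the empty token from a leading delimiter
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up inputs in the two styles discussed in the review.
        System.out.println(splitFeatures("AVX, AVX2, SSE"));
        System.out.println(splitFeatures("rv64 i m a"));
    }
}
```

Accepting both delimiters in one regular expression avoids a per-platform branch, at the cost of also accepting mixed forms that no platform is known to produce.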
------------- PR Review: https://git.openjdk.org/jdk/pull/24462#pullrequestreview-2784177749 PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2054468091 PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2054692791 PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2054472788 PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2054157269 From sviswanathan at openjdk.org Tue Apr 22 19:45:41 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 22 Apr 2025 19:45:41 GMT Subject: RFR: 8354668: Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding [v3] In-Reply-To: References: Message-ID: <-pd5SaWoYezSlW6bjQ6s-9URqhtrBioPrBkh9hxDuq8=.8096bdf6-af58-465b-86cb-558ec99415c5@github.com> On Thu, 17 Apr 2025 03:21:08 GMT, Jatin Bhateja wrote: >> ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding. While most of the relocation records the patching offsets from the end of the instruction, SHL instruction, which is used for pointer coloring, computes the patching offset from the starting address of the instruction. >> >> Thus, in case the destination register operand of SHL instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte resulting into ILLEGAL instruction exception. >> >> This patch fixes reported failures by computing the relocation offset of SHL instruction from end of instruction, thereby making the patch offset agnostic to REX/REX2 prefix. >> >> Please review and share your feedback. >> >> Best Regards, >> Jatin >> >> PS: Validation were performed using latest Intel Software Development Emulator after modifying static register allocation order in x86_64.ad file giving preference to EGPRs. 
> > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolutions Looks good to me as well. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24664#pullrequestreview-2785160419 From ecaspole at openjdk.org Tue Apr 22 20:02:49 2025 From: ecaspole at openjdk.org (Eric Caspole) Date: Tue, 22 Apr 2025 20:02:49 GMT Subject: Integrated: 8355233: Add a DMB related benchmark In-Reply-To: References: Message-ID: On Mon, 21 Apr 2025 22:09:17 GMT, Eric Caspole wrote: > In addition to the details in the JBS, I changed the name from DoubleDMB to DMBCheck. This pull request has now been integrated. Changeset: 239760ac Author: Eric Caspole URL: https://git.openjdk.org/jdk/commit/239760ac09c78a9c989df54f6526b67448540eda Stats: 107 lines in 1 file changed: 107 ins; 0 del; 0 mod 8355233: Add a DMB related benchmark Reviewed-by: kvn ------------- PR: https://git.openjdk.org/jdk/pull/24783 From duke at openjdk.org Tue Apr 22 20:31:24 2025 From: duke at openjdk.org (Mohamed Issa) Date: Tue, 22 Apr 2025 20:31:24 GMT Subject: RFR: 8348638: Performance regression in Math.tanh [v6] In-Reply-To: References: Message-ID: > The changes described below are meant to resolve the performance regression introduced by the **x86_64 tanh** double precision floating point scalar intrinsic in #20657. Additionally, a new micro-benchmark is included to check the performance of specific input value ranges to help prevent regressions in the future. > > 1. Check and handle high magnitude input values before those in other ranges. If found, **+/- 1** is returned almost immediately without having to go through too many computations or branches. > 2. Reduce the lower bound of the input range that triggers a quick **+/- 1** return from **|x| >= 32** to **|x| >= 22**. This new endpoint is the exact value required for correctness that's used by the original OpenJDK implementation. 
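The two points above can be pictured in plain Java. This is only an illustrative sketch, since the actual change is made in the x86_64 intrinsic stub; `TanhFastPath` and `SATURATION_BOUND` are hypothetical names, and `Math.tanh` merely stands in for the general computation path.

```java
public final class TanhFastPath {
    // Hypothetical constant: magnitude at or above which the result is +/- 1,
    // taken from the |x| >= 22 bound stated in the description.
    static final double SATURATION_BOUND = 22.0;

    static double tanh(double x) {
        // Check the high-magnitude case first so it returns almost immediately.
        // NaN fails this comparison and falls through to the general path.
        if (Math.abs(x) >= SATURATION_BOUND) {
            return Math.copySign(1.0, x);
        }
        // Stand-in for the remaining ranges handled by the real implementation.
        return Math.tanh(x);
    }
}
```

Ordering the saturation test first means large inputs skip every other range check, which is the effect the description attributes to the throughput gains for the high-value ranges.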
> > The results of all tests posted below were captured with an [Intel® Xeon 6761P](https://www.intel.com/content/www/us/en/products/sku/241842/intel-xeon-6761p-processor-336m-cache-2-50-ghz/specifications.html) using [OpenJDK v25-b15](https://github.com/openjdk/jdk/releases/tag/jdk-25%2B15) as the baseline version. > > For the first set of performance data collected with the new built-in **tanhRange** micro-benchmark, see the table below. Each result is the mean of 8 individual runs, and the input ranges used match those in the bug report with two additional ones included. In the high value scenarios (100, 1000, 10000, 100000), the changes significantly increase throughput values over the baseline. Also, there is almost no impact to the low value (1, 2, 10, 20) scenarios. > > | Input range(s) | Baseline (ops/s) | Change (ops/s) | Change vs baseline (%) | > | :-------------------: | :----------------: | :----------------: | :------------------------: | > | [-1, 1] | 26.043 | 25.929 | -0.44 | > | [-2, 2] | 25.330 | 25.260 | -0.28 | > | [-10, 10] | 24.930 | 24.936 | +0.02 | > | [-20, 20] | 24.908 | 24.844 | -0.26 | > | [-100, 100] | 53.813 | 76.650 | +42.44 | > | [-1000, 1000] | 84.459 | 115.106 | +36.29 | > | [-10000, 10000] | 93.980 | 123.320 | +31.22 | > | [-100000, 1000... 
Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision: Restructure tanh micro-benchmarks ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23889/files - new: https://git.openjdk.org/jdk/pull/23889/files/ced66426..cd6e1bab Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23889&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23889&range=04-05 Stats: 33 lines in 1 file changed: 14 ins; 2 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/23889.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23889/head:pull/23889 PR: https://git.openjdk.org/jdk/pull/23889 From liach at openjdk.org Tue Apr 22 20:58:21 2025 From: liach at openjdk.org (Chen Liang) Date: Tue, 22 Apr 2025 20:58:21 GMT Subject: RFR: 8355223: Improve documentation on @IntrinsicCandidate [v3] In-Reply-To: References: Message-ID: <_1ljTpPgVmfrKlQWZhnHViS10crap9saTmU-H3eBux4=.9d36c6c2-5015-45d9-9ffd-0560a4a7ed5b@github.com> > In offline discussion, we noted that the documentation on this annotation does not recommend minimizing the intrinsified section and moving whatever can be done in Java to Java; thus I prepared this documentation update, to shrink a "TLDR" essay to something concise for readers, such as pointing to that list at `vmIntrinsics.hpp` instead of "a list". 
Chen Liang has updated the pull request incrementally with one additional commit since the last revision: Updates, thanks to John ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24777/files - new: https://git.openjdk.org/jdk/pull/24777/files/e8adab3c..e2e1da5e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24777&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24777&range=01-02 Stats: 60 lines in 1 file changed: 40 ins; 10 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/24777.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24777/head:pull/24777 PR: https://git.openjdk.org/jdk/pull/24777 From liach at openjdk.org Tue Apr 22 21:00:48 2025 From: liach at openjdk.org (Chen Liang) Date: Tue, 22 Apr 2025 21:00:48 GMT Subject: RFR: 8355223: Improve documentation on @IntrinsicCandidate [v2] In-Reply-To: References: Message-ID: On Mon, 21 Apr 2025 20:12:19 GMT, Chen Liang wrote: >> In offline discussion, we noted that the documentation on this annotation does not recommend minimizing the intrinsified section and moving whatever can be done in Java to Java; thus I prepared this documentation update, to shrink a "TLDR" essay to something concise for readers, such as pointing to that list at `vmIntrinsics.hpp` instead of "a list". > > Chen Liang has updated the pull request incrementally with one additional commit since the last revision: > > Refine validation and defensive copying I have incorporated these "footnote" as "blockquote" notes in the middle of the docs, immediately after their relevant paragraphs. Intellij renders them with extra indentation, which IMO has nice appearance. Now, the docs have a few paragraphs: 1. Intro to intrinsics and intrinsification 2. Intrinsification can happen at any time, need consistency 3. Candidate methods and intrinsics - the arg assumptions are exported from intrinsics to candidate methods, callers fully responsible (footnote: list of type safety breaches) 4. 
Encapsulation of candidate with callers, and arg checks performed by callers (and array arg check perks) 5. VM also does consistency check for this annotation. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24777#issuecomment-2822469588 From liach at openjdk.org Tue Apr 22 21:03:48 2025 From: liach at openjdk.org (Chen Liang) Date: Tue, 22 Apr 2025 21:03:48 GMT Subject: RFR: 8355223: Improve documentation on @IntrinsicCandidate [v2] In-Reply-To: References: Message-ID: On Tue, 22 Apr 2025 18:42:01 GMT, John R Rose wrote: >> And for arrays, a footnote might be appropriate: >> >> >>

>> For some highly optimized algorithms, it may be impractical to ensure that array >> data is read or written only once by the intrinsic. If the caller of the intrinsic >> cannot guarantee that such array data is unshared, then the caller must also >> document the effects of race conditions on it. (Such a race occurs when another >> thread writes the array data during the execution of the intrinsic.) For example, >> the documentation can simply say that the result is undefined if a race happens. >> >> >> And maybe, after that, something more about type safety: >> >> >>

>> In no case may any intrinsic be allowed to perform an operation that fails to be >> type safe. It must not indirect a null pointer; it must not access a field or method >> on an object which does not possess that field or method; it must not access >> an element of an array not actually present in the array; and it must not >> manipulate managed references in a way that prevents the GC from managing >> them. The caller of the intrinsic is fully responsible for preventing every kind >> of type safety violation, in the case (the common case) that the intrinsic itself >> does not itself somehow prevent the violation. > > In the new design, the above "footnotes" go at the bottom. They explain why the rules prescribed at the top are important. In effect, inform aggressive implementors how far they may bend those rules. Sometimes the rules do get bent, sometimes to allow unspecified behavior, but never so far as to allow a type safety violation. I still believe this information is important and better kept inline; inlined them with blockquote tags. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24777#discussion_r2054866335 From jrose at openjdk.org Tue Apr 22 21:20:40 2025 From: jrose at openjdk.org (John R Rose) Date: Tue, 22 Apr 2025 21:20:40 GMT Subject: RFR: 8355223: Improve documentation on @IntrinsicCandidate [v2] In-Reply-To: References: Message-ID: <_HuuRIBzg-gElqzCelP9RwQqjMMDNUb-WJHHyq-NWFg=.ac3d8690-f860-468e-8d3e-a0d5ad81bae9@github.com> On Tue, 22 Apr 2025 21:01:13 GMT, Chen Liang wrote: >> In the new design, the above "footnotes" go at the bottom. They explain why the rules prescribed at the top are important. In effect, inform aggressive implementors how far they may bend those rules. Sometimes the rules do get bent, sometimes to allow unspecified behavior, but never so far as to allow a type safety violation. > > I still believe this information is important and better kept inline; inlined them with blockquote tags. 
+1 from me on your preference. In formulating footnotes I was trying to respect your attempt to put the most important bits at the top. I actually prefer the "sidebars" (what you have) to "footnotes". ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24777#discussion_r2054895659 From jrose at openjdk.org Tue Apr 22 21:28:46 2025 From: jrose at openjdk.org (John R Rose) Date: Tue, 22 Apr 2025 21:28:46 GMT Subject: RFR: 8355223: Improve documentation on @IntrinsicCandidate [v3] In-Reply-To: <_1ljTpPgVmfrKlQWZhnHViS10crap9saTmU-H3eBux4=.9d36c6c2-5015-45d9-9ffd-0560a4a7ed5b@github.com> References: <_1ljTpPgVmfrKlQWZhnHViS10crap9saTmU-H3eBux4=.9d36c6c2-5015-45d9-9ffd-0560a4a7ed5b@github.com> Message-ID: <2EPrvfeY1s839YjeMEjCiBCRk0xU5GgEyxmSZtmM47c=.7bedef1a-e1fc-4a98-9100-aed7c7472e10@github.com> On Tue, 22 Apr 2025 20:58:21 GMT, Chen Liang wrote: >> In offline discussion, we noted that the documentation on this annotation does not recommend minimizing the intrinsified section and moving whatever can be done in Java to Java; thus I prepared this documentation update, to shrink a "TLDR" essay to something concise for readers, such as pointing to that list at `vmIntrinsics.hpp` instead of "a list". > > Chen Liang has updated the pull request incrementally with one additional commit since the last revision: > > Updates, thanks to John src/java.base/share/classes/jdk/internal/vm/annotation/IntrinsicCandidate.java line 34: > 32: * recognized by {@code vmIntrinsics.hpp} and may be subject to intrinsification > 33: * by the HotSpot VM (see {@code LibraryCallKit::try_to_inline} in {@code > 34: * library_call.cpp}) if an intrinsic is available. Intrinsification replaces a This lead sentence is too long to read conveniently, partly because it is broken up by a parenthetical. Suggest: The `@IntrinsicCandidate` indicates that an annotated method is recognized by `vmIntrinsics.hpp` and may be subject to intrinsification by the HotSpot VM. 
(See `LibraryCallKit::try_to_inline` in `library_call.cpp` for logic that checks if an intrinsic is available and applicable at a given call site.) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24777#discussion_r2054909244 From jrose at openjdk.org Tue Apr 22 21:31:45 2025 From: jrose at openjdk.org (John R Rose) Date: Tue, 22 Apr 2025 21:31:45 GMT Subject: RFR: 8355223: Improve documentation on @IntrinsicCandidate [v3] In-Reply-To: <_1ljTpPgVmfrKlQWZhnHViS10crap9saTmU-H3eBux4=.9d36c6c2-5015-45d9-9ffd-0560a4a7ed5b@github.com> References: <_1ljTpPgVmfrKlQWZhnHViS10crap9saTmU-H3eBux4=.9d36c6c2-5015-45d9-9ffd-0560a4a7ed5b@github.com> Message-ID: On Tue, 22 Apr 2025 20:58:21 GMT, Chen Liang wrote: >> In offline discussion, we noted that the documentation on this annotation does not recommend minimizing the intrinsified section and moving whatever can be done in Java to Java; thus I prepared this documentation update, to shrink a "TLDR" essay to something concise for readers, such as pointing to that list at `vmIntrinsics.hpp` instead of "a list". > > Chen Liang has updated the pull request incrementally with one additional commit since the last revision: > > Updates, thanks to John FWIW I posted a formatted version here: https://gist.github.com/rose00/b6f7b2d18bae20b9cd277672c05075b5 It made it easier for me to review, although it is likely to go out of date quickly. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24777#issuecomment-2822524512 From dhanalla at openjdk.org Tue Apr 22 21:53:29 2025 From: dhanalla at openjdk.org (Dhamoder Nalla) Date: Tue, 22 Apr 2025 21:53:29 GMT Subject: RFR: 8315916: assert(C->live_nodes() <= C->max_node_limit()) failed: Live Node limit exceeded [v12] In-Reply-To: References: Message-ID: > **Problem:** > In the debug build, the assertion assert(C->live_nodes() <= C->max_node_limit()) is triggered during the parsing phase when the compiler creates more than 80K live nodes while scalarizing large arrays. 
In the release build, however, compilation proceeds until code generation and then bails out at Compile::check_node_count(), completing execution without crashing. > > This discrepancy occurs because two Phi nodes are added per array element during scalar replacement, leading to a rapid increase in node count, especially when the EliminateAllocationArraySizeLimit JVM option is set high. When the assert is commented out, both builds behave similarly, bailing out during code generation after fully building the ideal graph. > > **Proposed Solution:** > Introduce a bailout check during graph building in the scalar replacement phase. If the number of live nodes exceeds a defined threshold, the compiler will bail out and trigger a recompilation without Escape Analysis (EA). This prevents the construction of an excessively large graph and ensures consistency between debug and release builds. > > This approach is preferable because the graph may already be partially modified with scalarization-related nodes, and a clean recompilation path helps maintain compiler stability and performance. The bailout logic has been aligned with the existing mechanism in escape.cpp and refactored to reuse the same functionality where appropriate. 
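As a toy model of the proposed check (not the C2 code, which is C++ inside HotSpot), the sketch below counts two Phi nodes per array element and stops as soon as a node budget is exceeded. `NodeBudgetSketch`, `scalarize`, and the `-1` bailout convention are hypothetical and exist only to show why checking during graph building stops earlier than checking at code generation.

```java
public final class NodeBudgetSketch {
    // From the description: two Phi nodes are added per array element.
    static final int PHIS_PER_ELEMENT = 2;

    // Hypothetical model: returns the live-node count after scalarizing an
    // array of the given length, or -1 to signal a bailout (the caller would
    // then retry the compilation without escape analysis).
    static int scalarize(int liveNodes, int arrayLength, int nodeLimit) {
        for (int i = 0; i < arrayLength; i++) {
            liveNodes += PHIS_PER_ELEMENT;
            if (liveNodes > nodeLimit) {
                return -1; // bail out before the graph grows any further
            }
        }
        return liveNodes;
    }
}
```

In this model a graph that starts at 79,000 live nodes and scalarizes a 1,000-element array hits an 80,000-node budget about halfway through the array, instead of building all 2,000 extra nodes first.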
Dhamoder Nalla has updated the pull request incrementally with one additional commit since the last revision: add additional run with fewer flags ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20504/files - new: https://git.openjdk.org/jdk/pull/20504/files/8581a24f..ac393901 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20504&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20504&range=10-11 Stats: 5 lines in 1 file changed: 5 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20504.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20504/head:pull/20504 PR: https://git.openjdk.org/jdk/pull/20504 From liach at openjdk.org Tue Apr 22 21:57:19 2025 From: liach at openjdk.org (Chen Liang) Date: Tue, 22 Apr 2025 21:57:19 GMT Subject: RFR: 8355223: Improve documentation on @IntrinsicCandidate [v4] In-Reply-To: References: Message-ID: > In offline discussion, we noted that the documentation on this annotation does not recommend minimizing the intrinsified section and moving whatever can be done in Java to Java; thus I prepared this documentation update, to shrink a "TLDR" essay to something concise for readers, such as pointing to that list at `vmIntrinsics.hpp` instead of "a list". 
Chen Liang has updated the pull request incrementally with one additional commit since the last revision: Shorter first sentence ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24777/files - new: https://git.openjdk.org/jdk/pull/24777/files/e2e1da5e..d7b652e3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24777&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24777&range=02-03 Stats: 7 lines in 1 file changed: 0 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/24777.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24777/head:pull/24777 PR: https://git.openjdk.org/jdk/pull/24777 From duke at openjdk.org Tue Apr 22 22:12:45 2025 From: duke at openjdk.org (Mohamed Issa) Date: Tue, 22 Apr 2025 22:12:45 GMT Subject: RFR: 8348638: Performance regression in Math.tanh [v5] In-Reply-To: References: <1bOfSMpeLQHGXAGf_vLHc3GVFz2TlEiPPbXzkeOU0N0=.f82dcfb1-055f-40d5-b043-1dab17a08d0a@github.com> Message-ID: <1-jr39EAYx2wI4UAssH7qk-q__YGcf0MIsftiqrHKrU=.accec87d-20a0-4a38-a7ce-728b07b61a40@github.com> On Thu, 17 Apr 2025 13:21:59 GMT, Jatin Bhateja wrote: >> Mohamed Issa has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: >> >> Add new tanh micro-benchmark that covers different ranges of input values > > test/micro/org/openjdk/bench/java/lang/MathBench.java line 70: > >> 68: >> 69: @Param("0") >> 70: public double tanhBound1; > > Suggestion: > > @Param("0", "1", "2", "3") > public double tanhRangeIndex; The latest code update has these changes with the addition of extra curly braces to get everything compiling properly. 
> test/micro/org/openjdk/bench/java/lang/MathBench.java line 73: > >> 71: >> 72: @Param("2.7755575615628914E-17") >> 73: public double tanhBound2; > > We can declare tanhRangeIndex as a Parameter and then select from [hard-coded value ranges](https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/FdLibm.java#L3258), which will allow us to execute all the special ranges and the NaN value. > double tanhRangeArray[][] = {{0.0, 0x1.0P-56}, {0x1.0P-56, 1.0}, {1.0, 22.0}, {22.0, Double.POSITIVE_INFINITY}}; > double tanhRangeLowerBound = tanhRangeArray[tanhRangeIndex][0]; > double tanhRangeUpperBound = tanhRangeArray[tanhRangeIndex][1]; This is included in the latest code update. I used the largest double value just before +INF because the random number generator doesn't accept -INF or +INF as bounds. > test/micro/org/openjdk/bench/java/lang/MathBench.java line 549: > >> 547: for (int i = 0; i < tanhValueCount; i++) { >> 548: sum += Math.tanh(tanhPosVector[i]) + Math.tanh(tanhNegVector[i]); >> 549: } > > You can remove the noise from the benchmark by assigning the array element to a double field in Invocation level setup and then directly pass that as an argument to tanh. > Refer to https://github.com/openjdk/jdk/blob/master/test/micro/org/openjdk/bench/java/util/ArraysSort.java#L109 The latest code update fixes this. > test/micro/org/openjdk/bench/java/lang/MathBench.java line 551: > >> 549: } >> 550: return sum; >> 551: } > > Please also add benchmark kernels receiving constant inputs, e.g. Math.tanh(1.0). > > Current handling for transcendental intrinsics creates a stub call node during parsing, which leaves no room to perform constant folding Value transforms. Creating a macro IR which runs through GVN optimization and lazily expands to CallNode should fix it. We already have a similar JBS https://bugs.openjdk.org/browse/JDK-8350831 but it's good to add a benchmark for now. I added new constant inputs to the non-range _tanh_ benchmark. 
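The range-index idea discussed in this thread can be sketched outside JMH as follows. This is an assumption-laden illustration rather than the code in `MathBench.java`: `TanhRangeSketch`, `RANGES`, and `sample` are hypothetical names, and `SplittableRandom` is simply one standard generator that takes an origin and a bound.

```java
import java.util.SplittableRandom;

public final class TanhRangeSketch {
    // Ranges from the review comment. The open upper end uses Double.MAX_VALUE,
    // following the reply's note that infinite bounds were not accepted by the
    // random number generator used in the benchmark.
    static final double[][] RANGES = {
        {0.0, 0x1.0P-56},
        {0x1.0P-56, 1.0},
        {1.0, 22.0},
        {22.0, Double.MAX_VALUE},
    };

    // Draw one input from the range selected by rangeIndex: lower <= value < upper.
    static double sample(int rangeIndex, SplittableRandom rng) {
        double lower = RANGES[rangeIndex][0];
        double upper = RANGES[rangeIndex][1];
        return rng.nextDouble(lower, upper);
    }
}
```

A single integer parameter then selects which special range the benchmark exercises, which is what lets one benchmark method cover every range.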
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23889#discussion_r2054956595 PR Review Comment: https://git.openjdk.org/jdk/pull/23889#discussion_r2054958491 PR Review Comment: https://git.openjdk.org/jdk/pull/23889#discussion_r2054954019 PR Review Comment: https://git.openjdk.org/jdk/pull/23889#discussion_r2054954661 From kvn at openjdk.org Tue Apr 22 22:42:08 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 22 Apr 2025 22:42:08 GMT Subject: RFR: 8350209: Preserve adapters in AOT cache [v6] In-Reply-To: References: Message-ID: > [JEP 483](https://bugs.openjdk.org/browse/JDK-8315737) preserves class information in AOT cache which helps Java startup performance. > > We should also preserve adapters (i2c, c2i) to further improve performance of class linking where adapters are generated. > > Short running Java application can see several percents improvement. I got 6% improvement when ran `HelloWorld.java` on Linux-x64 Ice Lake CPU (2.5Ghz): > > > (perf stat -r 100 java -XX:AOTCache=app.aotcache -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed > 0.0299401 +- 0.0000504 seconds time elapsed ( +- 0.17% ) > > (perf stat -r 100 java -XX:AOTCache=app.aotcache -XX:+UnlockDiagnosticVMOptions -XX:-AOTAdapterCaching -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed > 0.0318654 +- 0.0000535 seconds time elapsed ( +- 0.17% ) > > > New diagnostic flags are introduced (use `-XX:+UnlockDiagnosticVMOptions` to unlock them): > > > -XX:+AOTAdapterCaching - Enable or disable saving and restoring i2c2i adapters > -XX:AOTCodeMaxSize=10*M - buffer size in bytes for AOT code caching > -XX:+AbortVMOnAOTCodeFailure - Abort VM on the first occurrence of AOT code caching failure > > By default `AOTAdapterCaching` is `false` and enabled ergonomically when `-XX:AOTCache` is specified. > This flag is ignored when `AOTCache` is not specified. 
> > To use AOT adapters follow process described in JEP 483: > > > java -XX:AOTMode=record -XX:AOTConfiguration=app.aotconf -cp app.jar App > java -XX:AOTMode=create -XX:AOTConfiguration=app.aotconf -XX:AOTCache=app.aot -cp app.jar > java -XX:AOTCache=app.aot -cp app.jar App > > > There are several new UL flag combinations to trace the AOT code caching process: > > > -Xlog:aot+codecache+init -Xlog:aot+codecache+exit -Xlog:aot+codecache+stubs > > > @ashu-mehra is main author of changes. He implemented adapters caching. > I did main framework (`AOTCodeCache` class) for saving and loading AOT code. > > Tested tier1-6,10, which includes tests with `AOTClassLinking` enabled. Also Xcomp,stress and JCK. Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: Generate far jumps for AOT code on AArch64 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24740/files - new: https://git.openjdk.org/jdk/pull/24740/files/ba08626b..32b997d7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24740&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24740&range=04-05 Stats: 13 lines in 2 files changed: 11 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24740.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24740/head:pull/24740 PR: https://git.openjdk.org/jdk/pull/24740 From kvn at openjdk.org Tue Apr 22 22:45:41 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 22 Apr 2025 22:45:41 GMT Subject: RFR: 8350209: Preserve adapters in AOT cache [v6] In-Reply-To: References: Message-ID: On Tue, 22 Apr 2025 22:42:08 GMT, Vladimir Kozlov wrote: >> [JEP 483](https://bugs.openjdk.org/browse/JDK-8315737) preserves class information in AOT cache which helps Java startup performance. >> >> We should also preserve adapters (i2c, c2i) to further improve performance of class linking where adapters are generated. >> >> Short running Java application can see several percents improvement. 
I got 6% improvement when ran `HelloWorld.java` on Linux-x64 Ice Lake CPU (2.5Ghz): >> >> >> (perf stat -r 100 java -XX:AOTCache=app.aotcache -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed >> 0.0299401 +- 0.0000504 seconds time elapsed ( +- 0.17% ) >> >> (perf stat -r 100 java -XX:AOTCache=app.aotcache -XX:+UnlockDiagnosticVMOptions -XX:-AOTAdapterCaching -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed >> 0.0318654 +- 0.0000535 seconds time elapsed ( +- 0.17% ) >> >> >> New diagnostic flags are introduced (use `-XX:+UnlockDiagnosticVMOptions` to unlock them): >> >> >> -XX:+AOTAdapterCaching - Enable or disable saving and restoring i2c2i adapters >> -XX:AOTCodeMaxSize=10*M - buffer size in bytes for AOT code caching >> -XX:+AbortVMOnAOTCodeFailure - Abort VM on the first occurrence of AOT code caching failure >> >> By default `AOTAdapterCaching` is `false` and enabled ergonomically when `-XX:AOTCache` is specified. >> This flag is ignored when `AOTCache` is not specified. >> >> To use AOT adapters follow process described in JEP 483: >> >> >> java -XX:AOTMode=record -XX:AOTConfiguration=app.aotconf -cp app.jar App >> java -XX:AOTMode=create -XX:AOTConfiguration=app.aotconf -XX:AOTCache=app.aot -cp app.jar >> java -XX:AOTCache=app.aot -cp app.jar App >> >> >> There are several new UL flag combinations to trace the AOT code caching process: >> >> >> -Xlog:aot+codecache+init -Xlog:aot+codecache+exit -Xlog:aot+codecache+stubs >> >> >> @ashu-mehra is main author of changes. He implemented adapters caching. >> I did main framework (`AOTCodeCache` class) for saving and loading AOT code. >> >> Tested tier1-6,10, which includes tests with `AOTClassLinking` enabled. Also Xcomp,stress and JCK. > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Generate far jumps for AOT code on AArch64 Added missing changes for Aarch64 to always generate far jumps in AOT code. 
`generate_i2c2i_adapters()` uses `far_jump()`. Note that for x64 the `ForceUnreachable` flag is set to `true` in `AOTCodeCache::initialize()`. Unfortunately, AArch64 does not use this flag, so I need to add a check to each query function. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24740#issuecomment-2822631594 From kvn at openjdk.org Tue Apr 22 23:05:13 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 22 Apr 2025 23:05:13 GMT Subject: RFR: 8350209: Preserve adapters in AOT cache [v7] In-Reply-To: References: Message-ID: > [JEP 483](https://bugs.openjdk.org/browse/JDK-8315737) preserves class information in AOT cache which helps Java startup performance. > > We should also preserve adapters (i2c, c2i) to further improve performance of class linking where adapters are generated. > > Short running Java application can see several percents improvement. I got 6% improvement when ran `HelloWorld.java` on Linux-x64 Ice Lake CPU (2.5Ghz): > > > (perf stat -r 100 java -XX:AOTCache=app.aotcache -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed > 0.0299401 +- 0.0000504 seconds time elapsed ( +- 0.17% ) > > (perf stat -r 100 java -XX:AOTCache=app.aotcache -XX:+UnlockDiagnosticVMOptions -XX:-AOTAdapterCaching -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed > 0.0318654 +- 0.0000535 seconds time elapsed ( +- 0.17% ) > > > New diagnostic flags are introduced (use `-XX:+UnlockDiagnosticVMOptions` to unlock them): > > > -XX:+AOTAdapterCaching - Enable or disable saving and restoring i2c2i adapters > -XX:AOTCodeMaxSize=10*M - buffer size in bytes for AOT code caching > -XX:+AbortVMOnAOTCodeFailure - Abort VM on the first occurrence of AOT code caching failure > > By default `AOTAdapterCaching` is `false` and enabled ergonomically when `-XX:AOTCache` is specified. > This flag is ignored when `AOTCache` is not specified. 
> > To use AOT adapters follow process described in JEP 483: > > > java -XX:AOTMode=record -XX:AOTConfiguration=app.aotconf -cp app.jar App > java -XX:AOTMode=create -XX:AOTConfiguration=app.aotconf -XX:AOTCache=app.aot -cp app.jar > java -XX:AOTCache=app.aot -cp app.jar App > > > There are several new UL flag combinations to trace the AOT code caching process: > > > -Xlog:aot+codecache+init -Xlog:aot+codecache+exit -Xlog:aot+codecache+stubs > > > @ashu-mehra is main author of changes. He implemented adapters caching. > I did main framework (`AOTCodeCache` class) for saving and loading AOT code. > > Tested tier1-6,10, which includes tests with `AOTClassLinking` enabled. Also Xcomp,stress and JCK. Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: Fix message ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24740/files - new: https://git.openjdk.org/jdk/pull/24740/files/32b997d7..91c37dad Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24740&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24740&range=05-06 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24740.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24740/head:pull/24740 PR: https://git.openjdk.org/jdk/pull/24740 From vlivanov at openjdk.org Wed Apr 23 00:40:04 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 23 Apr 2025 00:40:04 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v11] In-Reply-To: References: Message-ID: > Migrate Vector API math library (SVML and SLEEF) linkage from native code (in JVM) to Java FFM API. > > Since FFM API doesn't support vector calling conventions yet, migration affects only symbol lookup for now. But it still enables significant simplifications on JVM side. 
> > The patch consists of the following parts: > * on-demand symbol lookup in Java code replaces eager lookup from native code during JVM startup; > * 2 new VM intrinsics for vector calls (support unary and binary shapes) (code separated from unary/binary vector operations); > * new internal interface to query supported CPU ISA extensions (`jdk.incubator.vector.CPUFeatures`) used for CPU dispatching. > > `java.lang.foreign` API is used to perform symbol lookup in vector math library, then the address is cached and fed into corresponding JVM intrinsic, so C2 can turn it into a direct vector call in generated code. > > Once `java.lang.foreign` supports vectors & vector calling conventions, VM intrinsics can go away. > > Performance is on par with original implementation (tested with microbenchmarks on linux-x64 and macosx-aarch64). > > Testing: hs-tier1 - hs-tier6, microbenchmarks (on linux-x64 and macosx-aarch64) > > Thanks! Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: riscv fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24462/files - new: https://git.openjdk.org/jdk/pull/24462/files/88eacc48..3d1adff2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24462&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24462&range=09-10 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24462.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24462/head:pull/24462 PR: https://git.openjdk.org/jdk/pull/24462 From vlivanov at openjdk.org Wed Apr 23 00:40:04 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 23 Apr 2025 00:40:04 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v10] In-Reply-To: References: Message-ID: <4c9BdDJr1BWA92Uv7HHIa6_ons9yzW0U6q0uOzKLxPY=.102b7ec5-c4bb-4117-a092-0815d01aa74d@github.com> On Tue, 22 Apr 2025 16:46:44 GMT, Hamlin Li wrote: >> 
src/hotspot/share/runtime/abstract_vm_version.cpp line 349: >> >>> 347: assert(features_offset <= cpu_info_string_len, ""); >>> 348: if (features_offset < cpu_info_string_len) { >>> 349: assert(cpu_info_string[features_offset + 0] == ',', ""); >> >> This assert fails on riscv. > > A simple fix could be: > > diff --git a/src/hotspot/os_cpu/linux_riscv/vm_version_linux_riscv.cpp b/src/hotspot/os_cpu/linux_riscv/vm_version_linux_riscv.cpp > index 484a2a645aa..a785dc65c9e 100644 > --- a/src/hotspot/os_cpu/linux_riscv/vm_version_linux_riscv.cpp > +++ b/src/hotspot/os_cpu/linux_riscv/vm_version_linux_riscv.cpp > @@ -196,25 +196,12 @@ void VM_Version::setup_cpu_available_features() { > > _cpu_info_string = os::strdup(buf); > > - _features_string = extract_features_string(_cpu_info_string, > - strnlen(_cpu_info_string, sizeof(buf)), > - features_offset); > + _features_string = _cpu_info_string; > } Alternatively, it's fine for now to completely drop the `extract_features_string` call on linux-riscv (as on some other platforms) and fix it separately. Then `VectorSupport.getCPUFeatures()` returns an empty string. `VectorMathLibrary` doesn't rely on `CPUFeatures` on RISC-V. Let me know how you prefer to handle it. >> src/hotspot/share/runtime/abstract_vm_version.hpp line 61: >> >>> 59: static const char* _features_string; >>> 60: >>> 61: static const char* _cpu_info_string; >> >> Not quite sure about the reason for introducing `_cpu_info_string`. >> Seems to me you could just use _features_string, and remove _cpu_info_string and its related code, e.g. `extract_features_string`. Please check the code in `test/lib/jdk/test/whitebox/cpuinfo/CPUInfo.java` > > Maybe `CPUFeatures` could use code similar to `CPUInfo` to split the CPU string into CPU features? The intention is to align `_features_string` with `_features`, which enumerates the well-known CPU capabilities the JVM manages. As of now, `_features_string` contains more information, so I introduced `_cpu_info_string` to keep it.
Speaking of `test/lib/jdk/test/whitebox/cpuinfo/CPUInfo.java`, the approach chosen there may be fine for a test library, but we need a more stable API between JVM and JDK. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2055073587 PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2055071408 From vlivanov at openjdk.org Wed Apr 23 00:40:07 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 23 Apr 2025 00:40:07 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v10] In-Reply-To: References: Message-ID: On Tue, 22 Apr 2025 13:45:26 GMT, Hamlin Li wrote: >> Vladimir Ivanov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 24 additional commits since the last revision: >> >> - Merge remote-tracking branch 'origin/master' into vector.math.01.java >> - RVV and SVE adjustments >> - fix broken merge >> - Merge branch 'master' into vector.math.01.java >> - Fix debugName handling >> - Merge branch 'master' into vector.math.01.java >> - RVV and SVE adjustments >> - Merge branch 'master' into vector.math.01.java >> - Fix windows-aarch64 build failure >> - features_string -> cpu_info_string >> - ... and 14 more: https://git.openjdk.org/jdk/compare/41f2363b...88eacc48 > > src/jdk.incubator.vector/share/classes/jdk/incubator/vector/CPUFeatures.java line 82: > >> 80: return features; >> 81: } >> 82: } > > Maybe an extra line needed at the end of this file? Fixed. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2055072426 From vlivanov at openjdk.org Wed Apr 23 00:40:07 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 23 Apr 2025 00:40:07 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v10] In-Reply-To: References: Message-ID: On Tue, 22 Apr 2025 14:46:21 GMT, Ludovic Henry wrote: >> Vladimir Ivanov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 24 additional commits since the last revision: >> >> - Merge remote-tracking branch 'origin/master' into vector.math.01.java >> - RVV and SVE adjustments >> - fix broken merge >> - Merge branch 'master' into vector.math.01.java >> - Fix debugName handling >> - Merge branch 'master' into vector.math.01.java >> - RVV and SVE adjustments >> - Merge branch 'master' into vector.math.01.java >> - Fix windows-aarch64 build failure >> - features_string -> cpu_info_string >> - ... and 14 more: https://git.openjdk.org/jdk/compare/41f2363b...88eacc48 > > src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMathLibrary.java line 75: > >> 73: return switch (StaticProperty.osArch()) { >> 74: case "amd64", "x86_64" -> SVML; >> 75: case "aarch64" -> SLEEF; > > We should be supporting SLEEF on `riscv64`. Was there a specific motivation not to include it here? Good catch, fixed.
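The review exchange above concerns a `switch` over the OS architecture string that selects a vector math library. As a rough, self-contained illustration of that dispatch shape with `riscv64` routed to SLEEF, one might write something like the sketch below. The class `MathLibraryDispatch`, the `Library` enum, and the `NONE` fallback are hypothetical stand-ins introduced here; the real code lives in `VectorMathLibrary.java` and reads the architecture via `StaticProperty.osArch()`, which is internal to the JDK.

```java
// Hypothetical sketch: MathLibraryDispatch and Library are illustration-only
// names, not the classes used in the pull request.
class MathLibraryDispatch {
    enum Library { SVML, SLEEF, NONE }

    // Map an os.arch value to a vector math library, mirroring the shape of
    // the switch quoted in the review.
    static Library forArch(String osArch) {
        return switch (osArch) {
            case "amd64", "x86_64" -> Library.SVML;
            case "aarch64", "riscv64" -> Library.SLEEF;
            default -> Library.NONE; // assumption: the fallback is not shown in the thread
        };
    }

    public static void main(String[] args) {
        String arch = System.getProperty("os.arch");
        System.out.println(arch + " -> " + forArch(arch));
    }
}
```

Grouping `riscv64` with `aarch64` in one `case` label keeps the mapping in a single place, so adding a platform is a one-token change.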
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2055072456 From vlivanov at openjdk.org Wed Apr 23 00:49:50 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 23 Apr 2025 00:49:50 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v10] In-Reply-To: References: Message-ID: <5_VO9Man5zVU0MPXJK9lkxmh5w58zZ1k4DhLdZQEz9w=.16f2054a-ef90-4c8a-a3ef-c2ee758643a2@github.com> On Fri, 18 Apr 2025 16:16:28 GMT, Jorn Vernee wrote: >> Vladimir Ivanov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 24 additional commits since the last revision: >> >> - Merge remote-tracking branch 'origin/master' into vector.math.01.java >> - RVV and SVE adjustments >> - fix broken merge >> - Merge branch 'master' into vector.math.01.java >> - Fix debugName handling >> - Merge branch 'master' into vector.math.01.java >> - RVV and SVE adjustments >> - Merge branch 'master' into vector.math.01.java >> - Fix windows-aarch64 build failure >> - features_string -> cpu_info_string >> - ... and 14 more: https://git.openjdk.org/jdk/compare/685357bf...88eacc48 > > src/hotspot/share/prims/vectorSupport.cpp line 622: > >> 620: >> 621: ThreadToNativeFromVM ttn(thread); >> 622: return env->NewStringUTF(features_string); > > Isn't there a way to do this without the extra transition? > > How about: > > > oop result = java_lang_String::create_oop_from_str((char*) bytes, CHECK_NULL); > return (jstring) JNIHandles::make_local(THREAD, result); Fair enough. 
> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMathLibrary.java line 272: > >> 270: MemorySegment addr = LOOKUP.findOrThrow(symbol); >> 271: debug("%s %s => 0x%016x\n", op, symbol, addr.address()); >> 272: T impl = implSupplier.apply(opc); // TODO: should call the very same native implementation eventually (once FFM API supports vectors) > > FWIW, one current barrier I see to implementing the vector calling convention in the linker is that the FFM linker (currently) transmits register values to the downcall stub using Java primitive types. So, in order to support vector calling conventions, we would need to add some kind of 'primitive' that can hold the entire vector value, and preferably gets passed in the right register. > > However, I think in the case of these math libraries in particular, speed of the fallback implementation is not that much of an issue, since there is also an intrinsic. So alternatively, we could split a vector value up into smaller integral types (`int`, `long`) -> pass them to the downcall stub in that form -> and then reconstruct the full vector value in its target register. (I used the same trick when I was experimenting with FP80 support, which also requires splitting the 80-bit value into 2 `long`s). IMO an in-memory representation for vectors is preferred when it comes to FFM linker calling conventions. A 512-bit vector requires 8 longs, so some of them will end up passed on the stack for any non-trivial case. And with an in-memory representation, the VM can elide the vector store/load once the FFM linker stub is intrinsified. > src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMathLibrary.java line 305: > >> 303: @ForceInline >> 304: /*package-private*/ static >> 305: > > > Here you're using `Vector` instead of `VectorPayload` for the binary op, so there seems to be a discrepancy with `VectorSupport`. I don't have a strong preference, but I kept it aligned with `unaryOp`/`binaryOp` intrinsics.
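The size argument in the reply above (a 512-bit vector needs 8 `long`s) can be made concrete with a small sketch. It only illustrates the arithmetic of splitting a 64-byte payload into `long`-sized pieces and reassembling it; it does not show how the FFM linker passes arguments. The class name `VectorAsLongs` and the little-endian layout are assumptions made for this example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration: VectorAsLongs is not part of the JDK or the PR.
class VectorAsLongs {
    static final int VECTOR_BITS = 512;
    static final int PARTS = VECTOR_BITS / Long.SIZE; // 512 / 64 = 8 longs

    // Split a 64-byte payload into 8 longs (little-endian is an assumption).
    static long[] split(byte[] payload) {
        if (payload.length * Byte.SIZE != VECTOR_BITS) {
            throw new IllegalArgumentException("expected a 512-bit payload");
        }
        long[] parts = new long[PARTS];
        ByteBuffer.wrap(payload).order(ByteOrder.LITTLE_ENDIAN).asLongBuffer().get(parts);
        return parts;
    }

    // Reassemble the original byte payload from its long-sized pieces.
    static byte[] join(long[] parts) {
        ByteBuffer buf = ByteBuffer.allocate(parts.length * Long.BYTES)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        for (long part : parts) {
            buf.putLong(part);
        }
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] payload = new byte[VECTOR_BITS / Byte.SIZE];
        for (int i = 0; i < payload.length; i++) {
            payload[i] = (byte) i;
        }
        long[] parts = split(payload);
        System.out.println("longs needed: " + parts.length);
        System.out.println("round trip ok: " + java.util.Arrays.equals(payload, join(parts)));
    }
}
```

Eight integer arguments exceed the integer argument registers available in common calling conventions once other parameters are added, which is consistent with the remark that some pieces would end up on the stack.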
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2055086616 PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2055086145 PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2055087281 From vlivanov at openjdk.org Wed Apr 23 01:02:19 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 23 Apr 2025 01:02:19 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v12] In-Reply-To: References: Message-ID: > Migrate Vector API math library (SVML and SLEEF) linkage from native code (in JVM) to Java FFM API. > > Since FFM API doesn't support vector calling conventions yet, migration affects only symbol lookup for now. But it still enables significant simplifications on JVM side. > > The patch consists of the following parts: > * on-demand symbol lookup in Java code replaces eager lookup from native code during JVM startup; > * 2 new VM intrinsics for vector calls (support unary and binary shapes) (code separated from unary/binary vector operations); > * new internal interface to query supported CPU ISA extensions (`jdk.incubator.vector.CPUFeatures`) used for CPU dispatching. > > `java.lang.foreign` API is used to perform symbol lookup in vector math library, then the address is cached and fed into corresponding JVM intrinsic, so C2 can turn it into a direct vector call in generated code. > > Once `java.lang.foreign` supports vectors & vector calling conventions, VM intrinsics can go away. > > Performance is on par with original implementation (tested with microbenchmarks on linux-x64 and macosx-aarch64). > > Testing: hs-tier1 - hs-tier6, microbenchmarks (on linux-x64 and macosx-aarch64) > > Thanks! 
Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: Avoid thread state transition in VectorSupport_GetCPUFeatures ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24462/files - new: https://git.openjdk.org/jdk/pull/24462/files/3d1adff2..42ed9baa Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24462&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24462&range=10-11 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24462.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24462/head:pull/24462 PR: https://git.openjdk.org/jdk/pull/24462 From jbhateja at openjdk.org Wed Apr 23 02:07:47 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 23 Apr 2025 02:07:47 GMT Subject: RFR: 8354668: Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding [v3] In-Reply-To: References: Message-ID: On Thu, 17 Apr 2025 19:37:54 GMT, Dean Long wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments resolutions > > When I made my suggestions, I didn't realize it would also require changes on the Graal side. So I would suggest a separate PR only if the Graal team agrees. Hi @dean-long , I have created a follow up JBS (JDK-8355341) to capture your suggestion. 
Thanks for the reviews @xmas92 and @sviswa7 ------------- PR Comment: https://git.openjdk.org/jdk/pull/24664#issuecomment-2822875551 From jbhateja at openjdk.org Wed Apr 23 02:07:48 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 23 Apr 2025 02:07:48 GMT Subject: Integrated: 8354668: Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding In-Reply-To: References: Message-ID: On Tue, 15 Apr 2025 13:50:40 GMT, Jatin Bhateja wrote: > ZGC keeps track of multiple placeholders in barrier code snippets through relocations; these are later used to patch the appropriate contents (mostly immediate values) into the instruction encoding. While most relocations record the patching offset from the end of the instruction, the SHL instruction, which is used for pointer coloring, computes the patching offset from the starting address of the instruction. > > Thus, when the destination register operand of the SHL instruction is an extended GPR, we fail to account for the additional REX2 prefix byte in the patch offset, thereby corrupting the encoding: the runtime patches the primary opcode byte, resulting in an ILLEGAL instruction exception. > > This patch fixes the reported failures by computing the relocation offset of the SHL instruction from the end of the instruction, thereby making the patch offset agnostic to the REX/REX2 prefix. > > Please review and share your feedback. > > Best Regards, > Jatin > > PS: Validation was performed using the latest Intel Software Development Emulator after modifying the static register allocation order in the x86_64.ad file to give preference to EGPRs. This pull request has now been integrated.
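The idea in the description above, that an end-relative patch offset is insensitive to prefix length while a start-relative one is not, can be sketched on plain byte arrays. This is a simplified model and not the HotSpot relocation code: the class `ShlPatchOffset`, the example byte sequences, and the start-relative offset of 3 are assumptions chosen for illustration only.

```java
// Hypothetical model of immediate patching; not taken from the actual patch.
class ShlPatchOffset {
    // Illustrative encodings of "shl reg, imm8" with the immediate as the last byte.
    // Assumption: a 1-byte REX prefix versus a 2-byte REX2 prefix in front of the opcode.
    static final byte[] SHL_REX  = {(byte) 0x48, (byte) 0xC1, (byte) 0xE0, 0x00};
    static final byte[] SHL_REX2 = {(byte) 0xD5, (byte) 0x18, (byte) 0xC1, (byte) 0xE0, 0x00};

    static final int IMM8_FROM_START = 3; // only correct for the 1-byte prefix form
    static final int IMM8_FROM_END = 1;   // the immediate is always the final byte

    // Patch using an offset counted from the first byte of the instruction.
    static byte[] patchFromStart(byte[] insn, int imm8) {
        byte[] out = insn.clone();
        out[IMM8_FROM_START] = (byte) imm8;
        return out;
    }

    // Patch using an offset counted back from the end of the instruction.
    static byte[] patchFromEnd(byte[] insn, int imm8) {
        byte[] out = insn.clone();
        out[out.length - IMM8_FROM_END] = (byte) imm8;
        return out;
    }

    // True if every byte except the trailing immediate is unchanged.
    static boolean onlyImmediateChanged(byte[] before, byte[] after) {
        for (int i = 0; i < before.length - 1; i++) {
            if (before[i] != after[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("start-relative, REX:  "
            + onlyImmediateChanged(SHL_REX, patchFromStart(SHL_REX, 12)));
        System.out.println("start-relative, REX2: "
            + onlyImmediateChanged(SHL_REX2, patchFromStart(SHL_REX2, 12)));
        System.out.println("end-relative,   REX2: "
            + onlyImmediateChanged(SHL_REX2, patchFromEnd(SHL_REX2, 12)));
    }
}
```

In this model the start-relative patch overwrites a byte that is not the immediate as soon as the prefix grows by one byte, whereas the end-relative patch touches only the trailing immediate for both prefix forms.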
Changeset: 4c373703 Author: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/4c373703d9ed63dfc85df7cdcc04ecad5b02ade0 Stats: 16 lines in 4 files changed: 5 ins; 5 del; 6 mod 8354668: Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding Reviewed-by: aboldtch, sviswanathan ------------- PR: https://git.openjdk.org/jdk/pull/24664 From fyang at openjdk.org Wed Apr 23 02:13:48 2025 From: fyang at openjdk.org (Fei Yang) Date: Wed, 23 Apr 2025 02:13:48 GMT Subject: RFR: 8355239: RISC-V: Do not support subword scatter store In-Reply-To: <4k2oLOc-5JRW4__vf9tEfcIQEaTcs6XUY58lQbaxpLc=.535c54ef-2dd1-479c-a1ca-8757c35964bc@github.com> References: <4k2oLOc-5JRW4__vf9tEfcIQEaTcs6XUY58lQbaxpLc=.535c54ef-2dd1-479c-a1ca-8757c35964bc@github.com> Message-ID: On Tue, 22 Apr 2025 02:47:02 GMT, Fei Yang wrote: > Hi, please consider this small enhancement change. > > Currently, only word and double-word gather load and scatter store are supported on riscv. > Both subword gather load and scatter store are not supported due to constraint of riscv vector extension. > [JDK-8331150](https://bugs.openjdk.org/browse/JDK-8331150) makes this constraint explicit for subword gather load. > For parity and consistency, this also makes it explicit for subword scatter store as well. > > Testing: `jdk_vector` tested with QEMU vector extension. Thanks for the review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24787#issuecomment-2822882676 From fyang at openjdk.org Wed Apr 23 02:13:48 2025 From: fyang at openjdk.org (Fei Yang) Date: Wed, 23 Apr 2025 02:13:48 GMT Subject: Integrated: 8355239: RISC-V: Do not support subword scatter store In-Reply-To: <4k2oLOc-5JRW4__vf9tEfcIQEaTcs6XUY58lQbaxpLc=.535c54ef-2dd1-479c-a1ca-8757c35964bc@github.com> References: <4k2oLOc-5JRW4__vf9tEfcIQEaTcs6XUY58lQbaxpLc=.535c54ef-2dd1-479c-a1ca-8757c35964bc@github.com> Message-ID: On Tue, 22 Apr 2025 02:47:02 GMT, Fei Yang wrote: > Hi, please consider this small enhancement change. 
> > Currently, only word and double-word gather load and scatter store are supported on riscv. > Neither subword gather load nor subword scatter store is supported due to a constraint of the riscv vector extension. > [JDK-8331150](https://bugs.openjdk.org/browse/JDK-8331150) makes this constraint explicit for subword gather load. > For parity and consistency, this also makes it explicit for subword scatter store. > > Testing: `jdk_vector` tested with QEMU vector extension. This pull request has now been integrated. Changeset: a8c6ff16 Author: Fei Yang URL: https://git.openjdk.org/jdk/commit/a8c6ff161c2c4f1dcf0f8588c9d007994c84e703 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod 8355239: RISC-V: Do not support subword scatter store Reviewed-by: mli, fjiang ------------- PR: https://git.openjdk.org/jdk/pull/24787 From jbhateja at openjdk.org Wed Apr 23 05:47:21 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 23 Apr 2025 05:47:21 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v4] In-Reply-To: References: Message-ID: > - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel® products and will be the vector ISA of choice moving into the future. > - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. The feature set supported by an AVX10 ISA version will be supported by all the versions above it. > - The initial, fully-featured version of Intel® AVX10 will be enumerated as Version 2 (denoted as Intel® AVX10.2). This will include the new ISA extension over the existing AVX512 instructions. > - An early version of Intel® AVX10 (Version 1, or Intel® AVX10.1) that only enumerates the Intel® AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling. > > This patch adds the necessary CPUID feature detection for AVX10 ISA versions 1 and 2.
In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers. > > The patch has been regressed through tier1 and jvmci tests > > Please review and share your feedback. > > Best Regards, > Jatin > > [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Add dynamic sized feature vectors - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8352675 - dropping unneeded feature enabling/checks - 8352675: Support Intel AVX10 converged vector ISA feature detection ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24329/files - new: https://git.openjdk.org/jdk/pull/24329/files/b95ac21c..5d09adb3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24329&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24329&range=02-03 Stats: 254102 lines in 1547 files changed: 59450 ins; 188843 del; 5809 mod Patch: https://git.openjdk.org/jdk/pull/24329.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24329/head:pull/24329 PR: https://git.openjdk.org/jdk/pull/24329 From amitkumar at openjdk.org Wed Apr 23 06:09:25 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 23 Apr 2025 06:09:25 GMT Subject: RFR: 8353500: [s390x] Intrinsify Unsafe::setMemory [v4] In-Reply-To: References: Message-ID: > Unsafe::setMemory intrinsic implementation for s390x. 
> > Stub Code: > > > StubRoutines::unsafe_setmemory [0x000003ffb04b63c0, 0x000003ffb04b64d0] (272 bytes) > -------------------------------------------------------------------------------- > 0x000003ffb04b63c0: ogrk %r1,%r2,%r3 > 0x000003ffb04b63c4: nill %r1,7 > 0x000003ffb04b63c8: je 0x000003ffb04b6410 > 0x000003ffb04b63cc: nill %r1,3 > 0x000003ffb04b63d0: je 0x000003ffb04b6460 > 0x000003ffb04b63d4: nill %r1,1 > 0x000003ffb04b63d8: jlh 0x000003ffb04b64a0 > 0x000003ffb04b63dc: risbg %r4,%r4,48,55,8 > 0x000003ffb04b63e2: risbgz %r1,%r3,32,63,62 > 0x000003ffb04b63e8: je 0x000003ffb04b6402 > 0x000003ffb04b63ec: nopr > 0x000003ffb04b63ee: nopr > 0x000003ffb04b63f0: sth %r4,0(%r2) > 0x000003ffb04b63f4: sth %r4,2(%r2) > 0x000003ffb04b63f8: agfi %r2,4 > 0x000003ffb04b63fe: brct %r1,0x000003ffb04b63f0 > 0x000003ffb04b6402: nilf %r3,2 > 0x000003ffb04b6408: ber %r14 > 0x000003ffb04b640a: sth %r4,0(%r2) > 0x000003ffb04b640e: br %r14 > 0x000003ffb04b6410: risbg %r4,%r4,48,55,8 > 0x000003ffb04b6416: risbg %r4,%r4,32,47,16 > 0x000003ffb04b641c: risbg %r4,%r4,0,31,32 > 0x000003ffb04b6422: risbgz %r1,%r3,32,63,60 > 0x000003ffb04b6428: je 0x000003ffb04b6446 > 0x000003ffb04b642c: nopr > 0x000003ffb04b642e: nopr > 0x000003ffb04b6430: stg %r4,0(%r2) > 0x000003ffb04b6436: stg %r4,8(%r2) > 0x000003ffb04b643c: agfi %r2,16 > 0x000003ffb04b6442: brct %r1,0x000003ffb04b6430 > 0x000003ffb04b6446: nilf %r3,8 > 0x000003ffb04b644c: ber %r14 > 0x000003ffb04b644e: stg %r4,0(%r2) > 0x000003ffb04b6454: br %r14 > 0x000003ffb04b6456: nopr > 0x000003ffb04b6458: nopr > 0x000003ffb04b645a: nopr > 0x000003ffb04b645c: nopr > 0x000003ffb04b645e: nopr > 0x000003ffb04b6460: risbg %r4,%r4,48,55,8 > 0x000003ffb04b6466: risbg %r4,%r4,32,47,16 > 0x000003ffb04b646c: risbgz %r1,%r3,32,63,61 > 0x000003ffb04b6472: je 0x000003ffb04b6492 > 0x000003ffb04b6476: nopr > 0x000003ffb04b6478: nopr > 0x000003ffb04b647a: nopr > 0x000003ffb04b647c: nopr > 0x000003ffb04b647e: nopr > 0x000003ffb04b6480: st %r4,0(%r2) > 
0x000003ffb04b6484: st %r4,4(%r2) > 0x000003ffb04b6488: agfi %r2,8 > 0x000003ffb04b648e: brct %r1,0x000003ffb04b6480 > 0x000003ffb04b6492: nilf %r3,4 > 0x000003ffb04b6498: ber %r14 > 0x000003ffb04b649a: st %r4,0(%r2) > 0x000003ffb04b649e: br %r14 > 0x000003ffb04b64a0: risbgz %r1,%r3,32,63,63 > 0x000003ffb04b64a6: je 0x000003ffb04b64c2 > 0x000003... Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: improved mvc implementation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24480/files - new: https://git.openjdk.org/jdk/pull/24480/files/cf709eec..f75209f5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24480&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24480&range=02-03 Stats: 22 lines in 1 file changed: 3 ins; 6 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/24480.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24480/head:pull/24480 PR: https://git.openjdk.org/jdk/pull/24480 From chagedorn at openjdk.org Wed Apr 23 06:11:48 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 23 Apr 2025 06:11:48 GMT Subject: RFR: 8346552: C2: Add IR tests to check that Predicate cloning in Loop Unswitching works as expected [v7] In-Reply-To: References: Message-ID: <1wo1Xw9Jp2p0bGDqToKKYZPKe9Qf-tZCZmv0ukvEDsU=.0818fa69-7895-42ff-b609-9c7bae8bd060@github.com> On Tue, 22 Apr 2025 13:28:58 GMT, Manuel Hässig wrote: >> # Issue Summary >> >> When a loop is unswitched, all parse predicates from the original loop must be cloned to the second loop that is created. Forgetting to clone a parse predicate is a common error during development on loop unswitching code that we could not catch previously. Since we have the IR-framework now, this PR introduces a test to catch this error. >> >> # Changes >> >> The main contribution of this PR is a test to ensure that all predicates have been cloned into an unswitched loop.
>> Working on this PR revealed that Loop Limit Check Parse Predicates are erroneously cloned when unswitching counted loops. The cloning is superfluous there because we know that the loop variable increments monotonically in counted loops, so a loop limit check at the loop selector is sufficient for both unswitched loops. However, for uncounted loops we do not know anything about the behavior of the loop variables and they could behave differently in either of the unswitched loops. Hence, cloning the loop limit check is needed in that case. This PR also removes the superfluous cloning. >> >> All changes summarized: >> - add `OPAQUE_TEMPLATE_ASSERTION_PREDICATE_NODE` to the IR-framework, >> - add some missing parse predicate nodes to the IR-framework, >> - change the output of the labels of parse predicate nodes in the ideal graph so they can be recognized reliably by the IR-framework (the main problem was that `Loop ` is a prefix of `Loop Limit Check` that is hard to distinguish with spaces instead of underscores), >> - rework the regex for detecting parse predicates in the IR-framework, >> - add a test to ensure parse predicates are cloned into unswitched loops, >> - only clone loop limit checks when unswitching uncounted loops, >> - add a test which checks that loop limit checks are not cloned when unswitching counted loops, >> - add a test which checks that loop limit checks are cloned when unswitching uncounted loops. >> >> # Testing >> >> - [GitHub Actions](https://github.com/mhaessig/jdk/actions/runs/14266369099) >> - tier1 through tier3 plus Oracle internal testing > > Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: > > Test with different phase Thanks for the updates, that looks good to me! ------------- Marked as reviewed by chagedorn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/24479#pullrequestreview-2786040530 From epeter at openjdk.org Wed Apr 23 07:15:54 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 23 Apr 2025 07:15:54 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v17] In-Reply-To: References: Message-ID: On Thu, 10 Apr 2025 09:44:14 GMT, Daniel Lundén wrote: >> If a method has a large number of parameters, we currently bail out from C2 compilation. >> >> ### Changeset >> >> Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes. >> >> Changes: >> - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed. >> - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable. >> - Remove all `can_represent` checks and bailouts. >> - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`.
>> - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450) >> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. >> - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement. >> - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, no... > > Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: > > Refactor and improve TestNestedSynchronize.java Fought my way down to `src/hotspot/share/opto/regmask.hpp`. Need to hop off the train, so I'll push some comments already now. More later. src/hotspot/share/opto/matcher.cpp line 1340: > 1338: // Empty them all. > 1339: for (uint i = 0; i < cnt; i++) { > 1340: ::new (&(msfpt->_in_rms[i])) RegMask(C->comp_arena()); Here you do array indexing with `&`. Above you did `base + i`. Just an observation, I leave it to you if you want to do something about that. src/hotspot/share/opto/node.cpp line 558: > 556: MachProjNode* mach = n->as_MachProj(); > 557: MachProjNode* mthis = this->as_MachProj(); > 558: new (&mach->_rout) RegMask(mthis->_rout); Hmm. Do you have some defense against these kinds of shallow copies? Or are there cases where shallow copies of a RegMask are somehow valid?
You could for example `nullptr` out the pointers in the copy constructor. Just an idea, not sure if that even works. Or you could do a proper copy in the copy constructor. Maybe you have already thought about both of those options, and neither is good... src/hotspot/share/opto/regmask.cpp line 395: > 393: > 394: #ifndef PRODUCT > 395: bool RegMask::_dump_end_run(outputStream* st, OptoReg::Name start, In my understanding we only use underscore for fields, and not methods? src/hotspot/share/opto/regmask.hpp line 143: > 141: // constructors. If we get copied/cloned, &_RM_UP_EXT will no longer equal > 142: // orig_ext_adr. > 143: uintptr_t** orig_ext_adr = &_RM_UP_EXT; There should also be an underscore for this field, for consistency, right? src/hotspot/share/opto/regmask.hpp line 182: > 180: // r0 r1 r2 r3 r4 r5 r6 r7 r8 r9 s0 s1 s2 s3 s4 s5 s6 s7 s8 s9 s10 s11 ... > 181: // [0 0 0 0 |0 1 1 0 |0 0 1 0 ] [1 1 0 1 |0 0 0 0] as as as > 182: // [0] [1] [2] [0] [1] I kinda missed these two lines at first... it is not directly clear what they are for. The first one is the mask, aaah and it is extended with the `as` values. No idea yet what the second line is about... Maybe some annotation would help? For me it would be ok if that means the lines become a little longer. src/hotspot/share/opto/regmask.hpp line 193: > 191: // In this example, registers {r5, r6} and stack locations {s0, s2, s3, s5} > 192: // are included in the register mask. Depending on the value of _all_stack, > 193: // (s10, s11, ...) are all included (as = 1) or excluded (as = 0). Note that Maybe I missed it: what is `as`? src/hotspot/share/opto/regmask.hpp line 199: > 197: // > 198: // The only operation that may update the _offset attribute is > 199: // RegMask::rollover(). This operation requires the register mask to be clean What does `clean` mean? All zero? Or all ones? src/hotspot/share/opto/regmask.hpp line 219: > 217: > 218: // Access word i in the register mask.
> 219: const uintptr_t& _rm_up(unsigned int i) const { You have another set of underscore methods here... did you mean to make them private? I quickly checked, and did not find any underscore methods in the file before your change. This looks a little like Python style, where you cannot make a method private, and so the convention is to prefix it with an underscore :) ------------- PR Review: https://git.openjdk.org/jdk/pull/20404#pullrequestreview-2786035319 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2055312723 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2055320209 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2055330115 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2055354470 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2055368753 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2055360352 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2055369508 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2055378718 From epeter at openjdk.org Wed Apr 23 07:15:55 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 23 Apr 2025 07:15:55 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v17] In-Reply-To: References: Message-ID: <4YDlcsxJy-4NmTAijvFrfzEbyE7oNKDYqG2cuXP2xmI=.92aca6da-3b5d-4692-8aa6-9cfbbfcc7a41@github.com> On Wed, 23 Apr 2025 06:14:03 GMT, Emanuel Peter wrote: >> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: >> >> Refactor and improve TestNestedSynchronize.java > > src/hotspot/share/opto/node.cpp line 558: > >> 556: MachProjNode* mach = n->as_MachProj(); >> 557: MachProjNode* mthis = this->as_MachProj(); >> 558: new (&mach->_rout) RegMask(mthis->_rout); > > Hmm. Do you have some defense against these kinds of shallow copies?
Or are there cases where shallow copies of a RegMask are somehow valid? You could for example `nullptr` out the pointers in the copy constructor. Just an idea, not sure if that even works. Or you could do a proper copy in the copy constructor. Maybe you have already thought about both of those options, and neither is good... Ah, I see you have a `uintptr_t** orig_ext_adr = &_RM_UP_EXT;` trick. So shallow copies should not ever get used, right? > src/hotspot/share/opto/regmask.cpp line 395: > >> 393: >> 394: #ifndef PRODUCT >> 395: bool RegMask::_dump_end_run(outputStream* st, OptoReg::Name start, > > In my understanding we only use underscore for fields, and not methods? I think just making it private is sufficient, no? > src/hotspot/share/opto/regmask.hpp line 193: > >> 191: // In this example, registers {r5, r6} and stack locations {s0, s2, s3, s5} >> 192: // are included in the register mask. Depending on the value of _all_stack, >> 193: // (s10, s11, ...) are all included (as = 1) or excluded (as = 0). Note that > > Maybe I missed it: what is `as`? Is it an abbreviation for something? > src/hotspot/share/opto/regmask.hpp line 199: > >> 197: // >> 198: // The only operation that may update the _offset attribute is >> 199: // RegMask::rollover(). This operation requires the register mask to be clean > > What does `clean` mean? All zero? Or all ones? I think all ones. You could write `... 
mask to be clean (all ones) and ...` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2055352711 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2055332735 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2055366456 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2055371270 From epeter at openjdk.org Wed Apr 23 07:15:56 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 23 Apr 2025 07:15:56 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v15] In-Reply-To: References: <7UllX79lzodFBQgK59E_ywZWH-WVnQegcAsRbYjaYJQ=.a33bb997-1aab-4e01-bea7-982e831aba8a@github.com> Message-ID: <4UprmugxFZfWfKirSA4aJNwf2FenfjfergB-FNExod8=.89841ab7-2e07-4a6b-9f4e-3b069560be93@github.com> On Mon, 7 Apr 2025 15:15:18 GMT, Daniel Lundén wrote: >> src/hotspot/share/opto/regmask.hpp line 40: >> >>> 38: // stack slots used by BoxLockNodes. We reach this limit by, e.g., deeply >>> 39: // nesting synchronized statements in Java. >>> 40: const int BoxLockNode_slot_limit = 200; >> >> Where does this number come from? I've added arbitrary constants like this, and sometimes it is hard to give a good justification. But at least writing down what was your thinking might help someone else if they come across it later. Do you have a sense how large it should be at least or at most? > > It is indeed arbitrary (and should be very generous for all practical cases). We need a limit so that we can compute an upper bound for register mask sizes. I've updated the comment now, does it make more sense? Nice! Optional: make it upper case to emphasize that it is a constant at the use site.
Suggestion: const int BoxLockNode_SLOT_LIMIT = 200; ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2055342082 From epeter at openjdk.org Wed Apr 23 07:15:56 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 23 Apr 2025 07:15:56 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v17] In-Reply-To: <4YDlcsxJy-4NmTAijvFrfzEbyE7oNKDYqG2cuXP2xmI=.92aca6da-3b5d-4692-8aa6-9cfbbfcc7a41@github.com> References: <4YDlcsxJy-4NmTAijvFrfzEbyE7oNKDYqG2cuXP2xmI=.92aca6da-3b5d-4692-8aa6-9cfbbfcc7a41@github.com> Message-ID: On Wed, 23 Apr 2025 06:47:14 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/regmask.hpp line 193: >> >>> 191: // In this example, registers {r5, r6} and stack locations {s0, s2, s3, s5} >>> 192: // are included in the register mask. Depending on the value of _all_stack, >>> 193: // (s10, s11, ...) are all included (as = 1) or excluded (as = 0). Note that >> >> Maybe I missed it: what is `as`? > > Is it an abbreviation for something? Ooooh, it stands for `_all_stack == as`. Would be nice if that was stated explicitly! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2055368191 From thartmann at openjdk.org Wed Apr 23 08:28:50 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 23 Apr 2025 08:28:50 GMT Subject: RFR: 8354668: Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding [v3] In-Reply-To: References: Message-ID: On Thu, 17 Apr 2025 03:21:08 GMT, Jatin Bhateja wrote: >> ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding. While most of the relocation records the patching offsets from the end of the instruction, SHL instruction, which is used for pointer coloring, computes the patching offset from the starting address of the instruction. 
>> >> Thus, in case the destination register operand of SHL instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte resulting into ILLEGAL instruction exception. >> >> This patch fixes reported failures by computing the relocation offset of SHL instruction from end of instruction, thereby making the patch offset agnostic to REX/REX2 prefix. >> >> Please review and share your feedback. >> >> Best Regards, >> Jatin >> >> PS: Validation were performed using latest Intel Software Development Emulator after modifying static register allocation order in x86_64.ad file giving preference to EGPRs. > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolutions Backed out again with https://github.com/openjdk/jdk/pull/24815 due to failures in our testing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24664#issuecomment-2823473931 From epeter at openjdk.org Wed Apr 23 08:30:00 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 23 Apr 2025 08:30:00 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v17] In-Reply-To: References: Message-ID: On Thu, 10 Apr 2025 09:44:14 GMT, Daniel Lund?n wrote: >> If a method has a large number of parameters, we currently bail out from C2 compilation. >> >> ### Changeset >> >> Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes. 
>> >> Changes: >> - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed. >> - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable. >> - Remove all `can_represent` checks and bailouts. >> - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`. >> - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450) >> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. >> - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement. >> - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, no... 
> > Daniel Lund?n has updated the pull request incrementally with one additional commit since the last revision: > > Refactor and improve TestNestedSynchronize.java Now I only have the tests remaining. Here some more VM code comments. src/hotspot/share/opto/regmask.hpp line 429: > 427: } > 428: > 429: RegMask(const RegMask& rm) : RegMask(rm, nullptr) {} This is the copy constructor, right? Can you add a comment what kind of implementation you chose here, and why? src/hotspot/share/opto/regmask.hpp line 448: > 446: } > 447: > 448: // Empty mask check. Ignores registers included through the all-stack flag. Suggestion: // Empty mask check. Ignores registers included through the all_stack flag. For "greppability" src/hotspot/share/opto/regmask.hpp line 452: > 450: assert(valid_watermarks(), "sanity"); > 451: for (unsigned i = _lwm; i <= _hwm; i++) { > 452: if (_rm_up(i)) { Is this an implicit `nullptr` check? If so, you need to make it explicit, the style guide forbids implicit null checks. src/hotspot/share/opto/regmask.hpp line 464: > 462: for (unsigned i = _lwm; i <= _hwm; i++) { > 463: uintptr_t bits = _rm_up(i); > 464: if (bits) { This also looks like an implicit null check. src/hotspot/share/opto/regmask.hpp line 473: > 471: > 472: // Get highest-numbered register from mask, or BAD if mask is empty. Ignores > 473: // registers included through the all-stack flag. Suggestion: // registers included through the all_stack flag. Greppability src/hotspot/share/opto/regmask.hpp line 480: > 478: while (i > _lwm) { > 479: uintptr_t bits = _rm_up(--i); > 480: if (bits) { Implicit null check? src/hotspot/share/opto/regmask.hpp line 512: > 510: tmp |= _rm_up(i); > 511: } > 512: return !tmp && is_AllStack(); Implicit null checks? Could you not check `is_AllStack` early and return already? src/hotspot/share/opto/regmask.hpp line 551: > 549: static int num_registers(uint ireg, LRG &lrg); > 550: > 551: // Overlap test. Non-zero if any registers in common, including all-stack. 
Suggestion: // Overlap test. Non-zero if any registers in common, including all_stack. greppability src/hotspot/share/opto/regmask.hpp line 561: > 559: unsigned lwm = MAX2(_lwm, rm._lwm); > 560: for (unsigned i = lwm; i <= hwm; i++) { > 561: if (_rm_up(i) & rm._rm_up(i)) { Implicit null check? src/hotspot/share/opto/regmask.hpp line 589: > 587: } > 588: } > 589: } There is a bit of code duplication here. A helper method could help, no pun intended. src/hotspot/share/opto/regmask.hpp line 619: > 617: _hwm = _rm_max(); > 618: _set_range(0, 0xFF, _rm_size); > 619: set_AllStack(true); Suggestion: set_AllStack(); You already have a default `true` value, right? Check for other occurrences. src/hotspot/share/opto/regmask.hpp line 711: > 709: _hwm = rm._rm_max(); > 710: } > 711: // Narrow the watermarks if &rm spans a narrower range. Update after to Suggestion: // Narrow the watermarks if rm spans a narrower range. Update after to src/hotspot/share/opto/regmask.hpp line 747: > 745: > 746: // Subtract 'rm' from 'this', but ignore everything in 'rm' that does not > 747: // overlap with us and to not modify our all-stack flag. Supports masks of A little confused about the grammar in `and to not modify our all-stack flag`... src/hotspot/share/opto/regmask.hpp line 799: > 797: > 798: public: > 799: unsigned int static basic_rm_size() { return _RM_SIZE; } What makes it `basic`? Elsewhere, you call `RM_SIZE` the "base size". I don't understand this part, so I'm just asking if you think it is consistent?
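A minimal sketch of the explicit-comparison style asked for in the comments above, under stated assumptions: going by the quoted declaration `const uintptr_t& _rm_up(unsigned int i) const`, the value being tested is a mask word rather than a pointer, so the explicit form would compare against `0`. `ToyMask`, `kWords`, `is_empty` and `overlaps` are hypothetical stand-ins for illustration only and are not the actual `RegMask` code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a register mask: a fixed array of machine words
// plus low/high watermarks. This is NOT the real RegMask layout.
struct ToyMask {
  static constexpr std::size_t kWords = 4;  // assumed size, illustration only
  std::uintptr_t words[kWords] = {0, 0, 0, 0};
  std::size_t lwm = 0;           // lowest word index that may be non-zero
  std::size_t hwm = kWords - 1;  // highest word index that may be non-zero

  // Returns true if no bit is set between the watermarks.
  bool is_empty() const {
    assert(lwm <= hwm && hwm < kWords);  // sanity check on the watermarks
    for (std::size_t i = lwm; i <= hwm; i++) {
      if (words[i] != 0) {  // explicit comparison instead of `if (words[i])`
        return false;
      }
    }
    return true;
  }

  // Returns true if the two masks share at least one set bit in the word
  // range covered by both masks' watermarks.
  bool overlaps(const ToyMask& other) const {
    std::size_t lo = lwm > other.lwm ? lwm : other.lwm;
    std::size_t hi = hwm < other.hwm ? hwm : other.hwm;
    for (std::size_t i = lo; i <= hi; i++) {
      if ((words[i] & other.words[i]) != 0) {  // explicit, not implicit bool
        return true;
      }
    }
    return false;
  }
};
```

The only point of the sketch is the shape of the conditions: each integer test is spelled out as `!= 0`, which keeps the intent visible at the use site without changing behavior.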
------------- PR Review: https://git.openjdk.org/jdk/pull/20404#pullrequestreview-2786315683 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2055470536 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2055474336 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2055479300 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2055480721 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2055481600 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2055482640 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2055485184 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2055485667 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2055486853 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2055493829 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2055500757 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2055504351 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2055507927 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2055517750 From epeter at openjdk.org Wed Apr 23 08:30:01 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 23 Apr 2025 08:30:01 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v17] In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 07:57:15 GMT, Emanuel Peter wrote: >> Daniel Lund?n has updated the pull request incrementally with one additional commit since the last revision: >> >> Refactor and improve TestNestedSynchronize.java > > src/hotspot/share/opto/regmask.hpp line 448: > >> 446: } >> 447: >> 448: // Empty mask check. Ignores registers included through the all-stack flag. > > Suggestion: > > // Empty mask check. Ignores registers included through the all_stack flag. 
> > For "greppability" Why is that ok to ignore the `all_stack` registers? Would that be expected/clear at the use-site? > src/hotspot/share/opto/regmask.hpp line 551: > >> 549: static int num_registers(uint ireg, LRG &lrg); >> 550: >> 551: // Overlap test. Non-zero if any registers in common, including all-stack. > > Suggestion: > > // Overlap test. Non-zero if any registers in common, including all_stack. > > greppability There are more, I won't tag them all now ;) > src/hotspot/share/opto/regmask.hpp line 561: > >> 559: unsigned lwm = MAX2(_lwm, rm._lwm); >> 560: for (unsigned i = lwm; i <= hwm; i++) { >> 561: if (_rm_up(i) & rm._rm_up(i)) { > > Implicit null check? More cases below, won't tag them all now. > src/hotspot/share/opto/regmask.hpp line 589: > >> 587: } >> 588: } >> 589: } > > There is a bit of code duplication here. A helper method could help, no pun intended. Boah, but it is also not horrible to leave as is. Maybe a helper method does not make it more readable. Not sure. > src/hotspot/share/opto/regmask.hpp line 799: > >> 797: >> 798: public: >> 799: unsigned int static basic_rm_size() { return _RM_SIZE; } > > What makes it `basic`? Elsewhere, you call `RM_SIZE` the "base size". I don't understand this part, so I'm just asking if you think it is consistent? The name divergence between `basic_rm_size` and `_RM_SIZE` generally makes me a little suspicious about whether we chose the names right. Why do we even need the `RM` / `rm` prefix everywhere? Is that not already clear from the context? We are after all in a register mask. Not sure if it is worth changing everything now, or ever.
But we should at least look for consistency ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2055476393 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2055489553 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2055495759 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2055494825 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2055522590 From chagedorn at openjdk.org Wed Apr 23 08:36:46 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 23 Apr 2025 08:36:46 GMT Subject: RFR: 8354119: Missing C2 proper allocation failure handling during initialization (during generate_uncommon_trap_blob) In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 14:04:35 GMT, Damon Fenacci wrote: > After [JDK-8347406](https://bugs.openjdk.org/browse/JDK-8347406), `OptoRuntime::generate_uncommon_trap_blob` and `OptoRuntime::generate_exception_blob` return an `UncommonTrapBlob`/`ExceptionBlob` if they succeed, `nullptr` if they don't. This is then used by the compiler to shut down gently if the code cache is full (instead of crashing). > Unfortunately if the full code cache is reached when creating the buffer at the start of these 2 methods (when calling `CodeBuffer buffer(name, 2048, 1024);`) an empty buffer is created, which in turn prevents `masm` from being properly initialized, which then causes an access violation when writing into the blob's address when first adding `subptr` later in the method (as seen in the snippet below for `generate_uncommon_trap_blob`). > > https://github.com/openjdk/jdk/blob/3cc43b3224efdf1a3f35fff58b993027a9e1f4ad/src/hotspot/cpu/x86/runtime_x86_64.cpp#L55-L72 > > To fix this I suggest we return immediately from `OptoRuntime::generate_uncommon_trap_blob`/`OptoRuntime::generate_exception_blob` if the `buffer` creation failed. > > ### Testing > > Tier 1-3. > No specific regression test is added (very hard, i.a.
dependent on thread scheduling. On the other hand `StartupOutput.java` might catch it rarely). Looks good to me, too. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24549#pullrequestreview-2786425093 From jbhateja at openjdk.org Wed Apr 23 08:36:54 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 23 Apr 2025 08:36:54 GMT Subject: RFR: 8350896: Integer/Long.compress gets wrong type from CompressBitsNode::Value [v4] In-Reply-To: References: <1kRZNcIzkhr_xU_x5mamZnGJ_6nFkH-5ylFEyT2H_AQ=.a0f578c2-a58b-4708-9f58-c897a62e2526@github.com> Message-ID: On Mon, 21 Apr 2025 10:56:28 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8350896 >> - Review comments resolutions >> - Review comments resolutions >> - 8350896: Integer/Long.compress gets wrong type from CompressBitsNode::Value > > src/hotspot/share/opto/intrinsicnode.cpp line 299: > >> 297: // in case input equals above estimated lower bound. >> 298: hi = src_type->hi_as_long() == lo ? hi : src_type->hi_as_long(); >> 299: hi = max_mask_bit_width < mask_bit_width ? (1L << max_mask_bit_width) - 1 : hi; > > Somehow the conversation disappeared with changes, so I'll screenshot it here for context: > > ![image](https://github.com/user-attachments/assets/e4e5d05b-3fdd-4e67-ab0a-8164783648bc) > >> 'hi' is initialized to 'max_int' and nothing before line 297 is modifying 'hi'. > > That is not super apparent to me, at least not at first. I'd have to know the code really well. A quick scan gives me lines like `hi = (1UL << bitcount) - 1;` above. Sure, that is on an unrelated path, but I'd have to check like 20+ lines above.
I don't have that attention span ;) > > So maybe some better naming here could help. > > Can you convert your explanations into comments in the code? Or even better asserts, so we are even more sure that it is correct now, and also once others make changes here that might invalidate your assumptions around `hi` and `lo`? Added assertions for the hi and lo bounds to remove ambiguities. FYI, known zero bits can be used to estimate the upper bound by assuming all unknown bits are 1, while known one bits can be used to estimate the lower bound by assuming all unknown bits are 0. Computing the value range of the compression result from known zero and one bits is tricky, so we are being pessimistic here and only consider a few scenarios around known and unknown masks. With a known mask not equal to -1, the popcount of the mask bits can be used to estimate the upper bound of the result, while the lower bound is set to 0: the popcount will always be less than the maximum bit width of the integral type, so even if all the corresponding source bits are 1, the result of the compression will never be a negative value. For an unknown mask, we try estimating the result bit width based on the mask bounds. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r2055534735 From jbhateja at openjdk.org Wed Apr 23 08:36:50 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 23 Apr 2025 08:36:50 GMT Subject: RFR: 8350896: Integer/Long.compress gets wrong type from CompressBitsNode::Value [v5] In-Reply-To: References: Message-ID: > Hi All, > > This bugfix patch fixes incorrect value computation for the Integer/Long.compress APIs. > > Problems occur with a constant input and variable mask where the input's value is equal to the lower bound of the mask value. In this case, an erroneous value range estimation results in a constant value.
The existing value routine first attempts to constant fold the compression operation if both input and compression mask are constant values; otherwise, it attempts to constrain the value range of the result based on the upper and lower bounds of the mask type. > > New IR test covers the issue reported in the bug report along with a case for value range based logic pruning. > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resoultions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23947/files - new: https://git.openjdk.org/jdk/pull/23947/files/a4ae0803..18b5c239 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23947&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23947&range=03-04 Stats: 13 lines in 2 files changed: 3 ins; 2 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/23947.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23947/head:pull/23947 PR: https://git.openjdk.org/jdk/pull/23947 From epeter at openjdk.org Wed Apr 23 08:49:13 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 23 Apr 2025 08:49:13 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v17] In-Reply-To: References: Message-ID: On Thu, 10 Apr 2025 09:44:14 GMT, Daniel Lundén wrote: >> If a method has a large number of parameters, we currently bail out from C2 compilation. >> >> ### Changeset >> >> Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes.
>> >> Changes: >> - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed. >> - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable. >> - Remove all `can_represent` checks and bailouts. >> - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`. >> - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450) >> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. >> - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement. >> - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, no... 
> > Daniel Lund?n has updated the pull request incrementally with one additional commit since the last revision: > > Refactor and improve TestNestedSynchronize.java And a few comments for the tests :) test/hotspot/jtreg/compiler/arguments/TestMaxMethodArguments.java line 34: > 32: * -XX:+UnlockDiagnosticVMOptions > 33: * -XX:+AbortVMOnCompilationFailure > 34: * compiler.arguments.TestMaxMethodArguments Could there be a benefit to having a run without some of the flags here, and therefore also without the requires? test/hotspot/jtreg/compiler/arguments/TestMaxMethodArguments.java line 43: > 41: try { > 42: test(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217 , 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255); > 43: } catch (Exception e) { That makes me a little nervous. You catch all exceptions. What if we throw some unexpected null pointer exception? Is that fine too? 
Suggestion: define your own exception, and only catch that one. test/hotspot/jtreg/compiler/locks/TestNestedSynchronize.java line 37: > 35: * -XX:+UnlockDiagnosticVMOptions > 36: * -XX:+AbortVMOnCompilationFailure > 37: * compiler.locks.TestNestedSynchronize How about a run with fewer flags? test/hotspot/jtreg/compiler/locks/TestNestedSynchronize.java line 90: > 88: // The above is a massive program. Therefore, we do not directly inline the > 89: // program in TestNestedSynchronize and instead compile and run it via the > 90: // CompileFramework. Nice, I like it. Of course I have no bias here ;) test/hotspot/jtreg/compiler/locks/TestNestedSynchronize.java line 107: > 105: acc.addFirst(String.format("public class %s {", test_class_name)); > 106: acc.addLast("}"); > 107: return String.join("\n", acc); Nit: What prevented you from generating it with a `ArrayList`, and just only append at the end? In fact, that would allow you to use a StringBuilder directly. And you could format the `synchronized` string only once. That might be even more efficient. You can also leave it, this is really a nit :) test/jdk/java/lang/invoke/BigArityTest.java line 38: > 36: * -XX:CompileCommand=memlimit,*.*,0 > 37: * -esa -DBigArityTest.ITERATION_COUNT=1 > 38: * test.java.lang.invoke.BigArityTest Would it make sense to also have a run with fewer flags? test/jdk/java/lang/invoke/TestCatchExceptionWithVarargs.java line 32: > 30: * timeouts due to compilation of a large number of methods with a > 31: * large number of parameters. > 32: * @run main/othervm -XX:MaxNodeLimit=15000 TestCatchExceptionWithVarargs Why not have two runs here. One that requires that there is no `Xcomp`, where we have the normal node limit. And one where you lower the node limit, so that `Xcomp` is ok? test/jdk/java/lang/invoke/VarargsArrayTest.java line 46: > 44: * -DVarargsArrayTest.MAX_ARITY=255 > 45: * -DVarargsArrayTest.START_ARITY=250 > 46: * VarargsArrayTest Same here. 
But I mean are those timeouts ok with Xcomp? Are we sure that these timeouts are only a test issue and not a product issue? ------------- PR Review: https://git.openjdk.org/jdk/pull/20404#pullrequestreview-2786415489 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2055532228 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2055535065 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2055537644 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2055540066 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2055548814 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2055550590 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2055554337 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2055557452 From epeter at openjdk.org Wed Apr 23 08:49:14 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 23 Apr 2025 08:49:14 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v16] In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 11:32:16 GMT, Daniel Lundén wrote: >> Daniel Lundén has updated the pull request incrementally with two additional commits since the last revision: >> >> - Update comments >> - Revise TestNestedSynchronize to make use of CompileFramework > > This is a rather intrusive changeset that may also affect ports not supported by Oracle. @offamitkumar @TheRealMDoerr @snazarkin @bulasevich @RealFYang: Can you please test these changes on your respective ports? In particular, please make sure to run the tests `compiler/arguments/TestMaxMethodArguments.java` and `compiler/locks/TestNestedSynchronize.java`. @dlunde Again: thanks for working on this! It looks like a lot of work, and the existing code was not exactly in the best style already. So don't get discouraged by my many comments, a lot of them are small things anyway, and many are just nits.
------------- PR Comment: https://git.openjdk.org/jdk/pull/20404#issuecomment-2823527474 From epeter at openjdk.org Wed Apr 23 08:49:15 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 23 Apr 2025 08:49:15 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v17] In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 08:40:30 GMT, Emanuel Peter wrote: >> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: >> >> Refactor and improve TestNestedSynchronize.java > > test/jdk/java/lang/invoke/BigArityTest.java line 38: > >> 36: * -XX:CompileCommand=memlimit,*.*,0 >> 37: * -esa -DBigArityTest.ITERATION_COUNT=1 >> 38: * test.java.lang.invoke.BigArityTest > > Would it make sense to also have a run with fewer flags? Or why do we need to set all these flags? If we need some of them, then a comment could be helpful. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2055552355 From mli at openjdk.org Wed Apr 23 08:50:53 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 23 Apr 2025 08:50:53 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v12] In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 01:02:19 GMT, Vladimir Ivanov wrote: >> Migrate Vector API math library (SVML and SLEEF) linkage from native code (in JVM) to Java FFM API. >> >> Since FFM API doesn't support vector calling conventions yet, migration affects only symbol lookup for now. But it still enables significant simplifications on JVM side. >> >> The patch consists of the following parts: >> * on-demand symbol lookup in Java code replaces eager lookup from native code during JVM startup; >> * 2 new VM intrinsics for vector calls (support unary and binary shapes) (code separated from unary/binary vector operations); >> * new internal interface to query supported CPU ISA extensions (`jdk.incubator.vector.CPUFeatures`) used for CPU dispatching.
>> >> `java.lang.foreign` API is used to perform symbol lookup in vector math library, then the address is cached and fed into corresponding JVM intrinsic, so C2 can turn it into a direct vector call in generated code. >> >> Once `java.lang.foreign` supports vectors & vector calling conventions, VM intrinsics can go away. >> >> Performance is on par with original implementation (tested with microbenchmarks on linux-x64 and macosx-aarch64). >> >> Testing: hs-tier1 - hs-tier6, microbenchmarks (on linux-x64 and macosx-aarch64) >> >> Thanks! > > Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: > > Avoid thread state transition in VectorSupport_GetCPUFeatures Found another possible issue on riscv. src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMathLibrary.java line 288: > 286: IntFunction> implSupplier, > 287: V v) { > 288: var entry = lookup(op, opc, vspecies, implSupplier); Seems there is another issue for riscv here. If the rvv extension is not supported on the running machine, it will still generate the code using rvv, this should lead to a crash at runtime? 
------------- PR Review: https://git.openjdk.org/jdk/pull/24462#pullrequestreview-2786445269 PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2055551021 From mli at openjdk.org Wed Apr 23 08:50:54 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 23 Apr 2025 08:50:54 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v12] In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 08:40:44 GMT, Hamlin Li wrote: >> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: >> >> Avoid thread state transition in VectorSupport_GetCPUFeatures > > src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMathLibrary.java line 288: > >> 286: IntFunction> implSupplier, >> 287: V v) { >> 288: var entry = lookup(op, opc, vspecies, implSupplier); > > Seems there is another issue for riscv here. > If the rvv extension is not supported on the running machine, it will still generate the code using rvv, this should lead to a crash at runtime? In previous code, we use `UseRVV` to detect if rvv extension is supported. On the other hand, user can choose to disable UseRVV if they want even if rvv extension is supported on the running machine. In this sense, there could be similar issue on other platforms? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2055556734 From epeter at openjdk.org Wed Apr 23 08:51:55 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 23 Apr 2025 08:51:55 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v7] In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 16:18:20 GMT, Kangcheng Xu wrote: >> This looks really interesting! >> >> I see that you are doing some special pattern matching. 
I wonder if it might be worth generalizing the algorithm, to search through an arbitrary "tree" of additions, collect all "leaves" of (`variable * multiplier`), sort by `variable`, and compute new additions for each `variable`. What do you think? > > @eme64 Could you please take a look at this if you have some time? Thanks! @tabjy Do you want us to review again? We generally wait for a ping, otherwise we assume you are still working on it ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/23506#issuecomment-2823535462 From mli at openjdk.org Wed Apr 23 08:56:44 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 23 Apr 2025 08:56:44 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v10] In-Reply-To: <4c9BdDJr1BWA92Uv7HHIa6_ons9yzW0U6q0uOzKLxPY=.102b7ec5-c4bb-4117-a092-0815d01aa74d@github.com> References: <4c9BdDJr1BWA92Uv7HHIa6_ons9yzW0U6q0uOzKLxPY=.102b7ec5-c4bb-4117-a092-0815d01aa74d@github.com> Message-ID: <4LfzOzPmZZ42Aoub8uPvjvAvUjD29tr54PyNQMGSEEM=.fd73d966-b7cc-4695-a79f-fd3c62467ed0@github.com> On Wed, 23 Apr 2025 00:23:46 GMT, Vladimir Ivanov wrote: >> A simple fix could be: >> >> diff --git a/src/hotspot/os_cpu/linux_riscv/vm_version_linux_riscv.cpp b/src/hotspot/os_cpu/linux_riscv/vm_version_linux_riscv.cpp >> index 484a2a645aa..a785dc65c9e 100644 >> --- a/src/hotspot/os_cpu/linux_riscv/vm_version_linux_riscv.cpp >> +++ b/src/hotspot/os_cpu/linux_riscv/vm_version_linux_riscv.cpp >> @@ -196,25 +196,12 @@ void VM_Version::setup_cpu_available_features() { >> >> _cpu_info_string = os::strdup(buf); >> >> - _features_string = extract_features_string(_cpu_info_string, >> - strnlen(_cpu_info_string, sizeof(buf)), >> - features_offset); >> + _features_string = _cpu_info_string; >> } > > Alternatively, it's fine for now to completely drop `extract_features_string` call on linux-riscv (as on some other platforms) and fix it separately. Then `VectorSupport.getCPUFeatures()` returns empty string. 
`VectorMathLibrary` doesn't rely on `CPUFeatures` on RISC-V. > > Let me know how you prefer to handle it. I think we still need this or a similar thing on riscv. Please check my new comments below about rvv extension on riscv. On the other hand, it's also good to have it on riscv for consistency, and there is a log output of "cpu features" in VectorMathLibrary.java >> Maybe in `CPUFeatures`, could use the similar code as `CPUInfo` to split the cpu string into cpu features? > > The intention is to align `_features_string` with `_features` which enumerates well-known CPU capabilities JVM manages. As of now, `_features_string` contains more information, so I introduced `_cpu_info_string` to keep it. > > Speaking of `test/lib/jdk/test/whitebox/cpuinfo/CPUInfo.java`, the approach chosen there may be fine for a test library, but we need a more stable API between JVM and JDK. I'm fine with this. But it might be better to change the splitting regex of `_features_string` in CPUFeatures.java to support riscv cpu features format. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2055572192 PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2055576765 From dlunden at openjdk.org Wed Apr 23 08:59:52 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Wed, 23 Apr 2025 08:59:52 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v16] In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 08:45:56 GMT, Emanuel Peter wrote: >> This is a rather intrusive changeset that may also affect ports not supported by Oracle. @offamitkumar @TheRealMDoerr @snazarkin @bulasevich @RealFYang: Can you please test these changes on your respective ports? In particular, please make sure to run the tests `compiler/arguments/TestMaxMethodArguments.java` and `compiler/locks/TestNestedSynchronize.java`. > > @dlunde Again: thanks for working on this!
It looks like a lot of work, and the existing code was not exactly in the best style already. So don't get discouraged by my many comments, a lot of them are small things anyway, and many are just nits. @eme64 Thanks for the comments, I'll start addressing them soon! I'm certainly not discouraged (rather the opposite), keep the comments coming :slightly_smiling_face: ------------- PR Comment: https://git.openjdk.org/jdk/pull/20404#issuecomment-2823558623 From epeter at openjdk.org Wed Apr 23 09:03:54 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 23 Apr 2025 09:03:54 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v9] In-Reply-To: References: Message-ID: <_HyJNwyMWfrbBZ6tCavVUPFjEGCoVqIMCwyrN6ZwGSY=.5eb105d0-f6fd-475b-948c-3b97d175104b@github.com> On Mon, 31 Mar 2025 08:46:29 GMT, Roland Westrelin wrote: >> @rwestrel Is this ready for review? > > @eme64 yes, it's ready for review. @rwestrel Sorry it took so long for me to look at this. Did you do some benchmarking to prove that `ShortLoopIter = 1000` is reasonable? The Benchmark you published earlier does not seem to do that, right? https://github.com/openjdk/jdk/pull/21630#issuecomment-2587016221 I would also like to see that benchmark integrated. If you are using a benchmark to demonstrate the performance, it should be integrated so others can easily verify on their platform :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/21630#issuecomment-2823574678 From jbhateja at openjdk.org Wed Apr 23 09:11:49 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 23 Apr 2025 09:11:49 GMT Subject: RFR: 8348638: Performance regression in Math.tanh [v6] In-Reply-To: References: Message-ID: On Tue, 22 Apr 2025 20:31:24 GMT, Mohamed Issa wrote: >> The changes described below are meant to resolve the performance regression introduced by the **x86_64 tanh** double precision floating point scalar intrinsic in #20657.
Additionally, new input values are provided for the existing micro-benchmark and a new micro-benchmark is included to check the performance of specific input value ranges to help prevent regressions in the future. >> >> 1. Check and handle high magnitude input values before those in other ranges. If found, **+/- 1** is returned almost immediately without having to go through too many computations or branches. >> 2. Reduce the lower bound of the input range that triggers a quick **+/- 1** return from **|x| >= 32** to **|x| >= 22**. This new endpoint is the exact value required for correctness that's used by the original OpenJDK implementation. >> >> The results of all tests posted below were captured with an [Intel? Xeon 6761P](https://www.intel.com/content/www/us/en/products/sku/241842/intel-xeon-6761p-processor-336m-cache-2-50-ghz/specifications.html) using [OpenJDK v25-b15](https://github.com/openjdk/jdk/releases/tag/jdk-25%2B15) as the baseline version. >> >> For the first set of performance data collected with the new built-in **tanhRange** micro-benchmark, see the table below. Each result is the mean of 8 individual runs, and the input ranges used match those in the bug report with two additional ones included. In the high value scenarios (100, 1000, 10000, 100000), the changes increase throughput values over the baseline. Also, there is a small negative impact to the low value (1, 2, 10, 20) scenarios. >> >> | Input range(s) | Baseline (ops/s) | Change (ops/s) | Change vs baseline (%) | >> | :-------------------: | :----------------: | :----------------: | :------------------------: | >> | [-1, 1] | 22671 | 22190 | -2.12 | >> | [-2, 2] | 22680 | 22191 | -2.16 | >> | [-10, 10] | 22683 | 22149 | -2.35 | >> | [-20, 20] | 22694 | 22183 | -2.25 | >> | [-100, 100] | 29806 | 33675 | +12.98 | >> | [-1000, 1000] | 46747 | 49179 | +5.20 | >> | [-10000, 10000] ... 
> > Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision: > > Restructure tanh micro-benchmarks test/micro/org/openjdk/bench/java/lang/MathBench.java line 68: > 66: > 67: @Param({"-2.0", "-1.0", "-0.5", "-0.1", "0.0", "0.1", "0.5", "1.0", "2.0"}) > 68: public double tanhConstInput; The field is not a static final. So inputs to tanh will not be constant values. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23889#discussion_r2055604603 From dfenacci at openjdk.org Wed Apr 23 09:14:53 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 23 Apr 2025 09:14:53 GMT Subject: RFR: 8354119: Missing C2 proper allocation failure handling during initialization (during generate_uncommon_trap_blob) In-Reply-To: References: Message-ID: On Tue, 22 Apr 2025 16:27:25 GMT, Vladimir Kozlov wrote: >> After [JDK-8347406](https://bugs.openjdk.org/browse/JDK-8347406), `OptoRuntime::generate_uncommon_trap_blob` and `OptoRuntime::generate_exception_blob` return an `UncommonTrapBlob`/`ExceptionBlob` if they succeed, `nullptr` if they don't. This is then used by the compiler to shut down gently if the code cache is full (instead of crashing). >> Unfortunately if the the full code cache is reached when creating the buffer at the start of these 2 methods (when calling `CodeBuffer buffer(name, 2048, 1024);`) an empty buffer is created, which in turn prevents `masm` to be properly initialized, which then causes an access violation when writing into the blob's address when first adding `subptr` later in the method (as seen in the snippet below for `generate_uncommon_trap_blob`). >> >> https://github.com/openjdk/jdk/blob/3cc43b3224efdf1a3f35fff58b993027a9e1f4ad/src/hotspot/cpu/x86/runtime_x86_64.cpp#L55-L72 >> >> To fix this I suggest we return immediately from `OptoRuntime::generate_uncommon_trap_blob`/`OptoRuntime::generate_exception_blob` if the `buffer` creation failed. >> >> ### Testing >> >> Tier 1-3. 
>> No specific regression test is added (very hard, i.a. dependent on thread scheduling. On the other hand `StartupOutput.java` might catch it rarely). > > I finally found why `Assembler()` did not throw an error when code blob is not allocated and `_blob` is `NULL`: [assembler.cpp#L47](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/asm/assembler.cpp#L47) > > In **debug** VM `CodeBuffer::set_blob()` replaces `NULLs` with `badAddress` value: [codeBuffer.cpp#L184](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/asm/codeBuffer.cpp#L184). It is done to "poison" pointers when `free_blob()` is called. > > So we could fix the issue by adding a check for `badAddress` in `Assembler()`. But it will cause VM exit instead of disabling only C2 implemented by #23630. On the other hand it will match behavior of **product** VM. > > I think I prefer current suggested fix to disable only C2. Thanks @vnkozlov and @chhagedorn for your reviews! As I touched a few platform specific files I wouldn't mind having it tested on each one (even just low tiers, just to make sure I didn't involuntarily screw up something): @offamitkumar (S390), @TheRealMDoerr (PPC), @RealFYang (RISC-V), @bulasevich (ARM32) would you mind? Thanks a lot! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24549#issuecomment-2823610502 From epeter at openjdk.org Wed Apr 23 09:26:58 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 23 Apr 2025 09:26:58 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v15] In-Reply-To: <-rQb2ZR6hrzt-7Q0EwQqlxjvVuDQQOgYqzX3tZVPL38=.2577f4e0-c35f-434e-88d1-f0db41bb5364@github.com> References: <-rQb2ZR6hrzt-7Q0EwQqlxjvVuDQQOgYqzX3tZVPL38=.2577f4e0-c35f-434e-88d1-f0db41bb5364@github.com> Message-ID: On Fri, 28 Mar 2025 16:38:00 GMT, Roland Westrelin wrote: >> To optimize a long counted loop and long range checks in a long or int >> counted loop, the loop is turned into a loop nest.
When the loop has >> few iterations, the overhead of having an outer loop whose backedge is >> never taken, has a measurable cost. Furthermore, creating the loop >> nest usually causes one iteration of the loop to be peeled so >> predicates can be set up. If the loop is short running, then it's an >> extra iteration that's run with range checks (compared to an int >> counted loop with int range checks). >> >> This change doesn't create a loop nest when: >> >> 1- it can be determined statically at loop nest creation time that the >> loop runs for a short enough number of iterations >> >> 2- profiling reports that the loop runs for no more than ShortLoopIter >> iterations (1000 by default). >> >> For 2-, a guard is added which is implemented as yet another predicate. >> >> While this change is in principle simple, I ran into a few >> implementation issues: >> >> - while c2 has a way to compute the number of iterations of an int >> counted loop, it doesn't have that for long counted loop. The >> existing logic for int counted loops promotes values to long to >> avoid overflows. I reworked it so it now works for both long and int >> counted loops. >> >> - I added a new deoptimization reason (Reason_short_running_loop) for >> the new predicate. Given the number of iterations is narrowed down >> by the predicate, the limit of the loop after transformation is a >> cast node that's control dependent on the short running loop >> predicate. Because once the counted loop is transformed, it is >> likely that range check predicates will be inserted and they will >> depend on the limit, the short running loop predicate has to be the >> one that's further away from the loop entry. 
Now it is also possible >> that the limit before transformation depends on a predicate >> (TestShortRunningLongCountedLoopPredicatesClone is an example), we >> can have: new predicates inserted after the transformation that >> depend on the casted limit that itself depend on old predicates >> added before the transformation. To solve this cicular dependency, >> parse and assert predicates are cloned between the old predicates >> and the loop head. The cloned short running loop parse predicate is >> the one that's used to insert the short running loop predicate. >> >> - In the case of a long counted loop, the loop is transformed into a >> regular loop with a ... > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 41 commits: > > - merge fix > - Merge branch 'master' into JDK-8342692 > - merge fix > - Merge branch 'master' into JDK-8342692 > - merge > - Merge branch 'master' into JDK-8342692 > - Merge branch 'master' into JDK-8342692 > - whitespace > - Merge branch 'master' into JDK-8342692 > - TestMemorySegment test fix > - ... and 31 more: https://git.openjdk.org/jdk/compare/dc5c4148...065abb29 A first few comments from a quick code scan :) src/hotspot/share/jvmci/vmStructs_jvmci.cpp line 720: > 718: declare_constant(Deoptimization::Reason_div0_check) \ > 719: declare_constant(Deoptimization::Reason_loop_limit_check) \ > 720: declare_constant(Deoptimization::Reason_short_running_loop) \ Suggestion: declare_constant(Deoptimization::Reason_short_running_long_loop) \ Just for precision? src/hotspot/share/opto/c2_globals.hpp line 824: > 822: product(bool, ShortRunningLongLoop, true, DIAGNOSTIC, \ > 823: "long counted loop/long range checks: don't create loop nest if" \ > 824: "loop runs for small enough number of iterations") \ Could it make sense to have `ShortLoopIter` be a flag as well? That would allow you to write a nice JMH benchmark, where we can modify the threshold :) Wait... 
you mention `ShortLoopIter` in the PR description, but it only occurs once in a comment... what happened here? src/hotspot/share/opto/ifnode.cpp line 2232: > 2230: break; > 2231: case Deoptimization::DeoptReason::Reason_short_running_loop: > 2232: st->print("Short_Running_Loop "); Suggestion: case Deoptimization::DeoptReason::Reason_short_running_long_loop: st->print("Short_Running_Long_Loop "); Might be helpful that it is specifically about a long loop when debugging. test/hotspot/jtreg/compiler/longcountedloops/TestShortLoopLostLimit.java line 29: > 27: * @summary C2: long counted loop/long range checks: don't create loop-nest for short running loops > 28: * @requires vm.compiler2.enabled > 29: * @run main/othervm -XX:-TieredCompilation -XX:-UseOnStackReplacement -XX:-BackgroundCompilation TestShortLoopLostLimit What about a run that does not require C2, and has no flags set? test/hotspot/jtreg/compiler/longcountedloops/TestShortRunningIntLoopWithLongChecksPredicates.java line 30: > 28: * @requires vm.compiler2.enabled > 29: * @run main/othervm -XX:-TieredCompilation -XX:-UseOnStackReplacement -XX:-BackgroundCompilation -XX:LoopUnrollLimit=100 > 30: * TestShortRunningIntLoopWithLongChecksPredicates What about a run that does not require C2, and has no flags set? test/hotspot/jtreg/compiler/longcountedloops/TestShortRunningLongCountedLoop.java line 45: > 43: > 44: public static void main(String[] args) { > 45: TestFramework.runWithFlags("-XX:LoopMaxUnroll=0", "-XX:LoopStripMiningIter=1000", "-XX:+UseCountedLoopSafepoints", "-XX:-UseProfiledLoopPredicate"); What about a run without all these flags? Hmm maybe that would make your IR rules more complicated? It would at least be good if there was a comment why you need all the flags here. 
test/hotspot/jtreg/compiler/longcountedloops/TestShortRunningLongCountedLoopPredicatesClone.java line 29: > 27: * @summary C2: long counted loop/long range checks: don't create loop-nest for short running loops > 28: * @requires vm.compiler2.enabled > 29: * @run main/othervm -XX:-TieredCompilation -XX:-UseOnStackReplacement -XX:-BackgroundCompilation -XX:LoopMaxUnroll=0 Run without flags, and runnable without C2? test/hotspot/jtreg/compiler/longcountedloops/TestShortRunningLongCountedLoopScaleOverflow.java line 30: > 28: * @requires vm.compiler2.enabled > 29: * @run main/othervm -XX:-TieredCompilation -XX:-UseOnStackReplacement -XX:-BackgroundCompilation -XX:LoopMaxUnroll=0 > 30: * -XX:-UseLoopPredicate -XX:-RangeCheckElimination TestShortRunningLongCountedLoopScaleOverflow Run without flags, and runnable without C2? test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegment.java line 799: > 797: IRNode.ADD_VI, "> 0", > 798: IRNode.STORE_VECTOR, "> 0"}, > 799: applyIfAnd = { "ShortRunningLongLoop", "true", "AlignVector", "false" }, Can you just copy the IR rule, please, so that we still have a failing rule without `ShortRunningLongLoop`? The reason I have it here is so that I will catch these cases that are currently not properly vectorized... and it would be a shame if we lost these tests. Also: can we whitelist `ShortRunningLongLoop` for the IR framework? I think we should make sure that we run all these MemorySegment tests with `ShortRunningLongLoop` enabled and disabled, just to make sure everything is ok with and without. What do you think? FYI: I'm making changes to this test again in https://github.com/openjdk/jdk/pull/24278. But I don't want to hold you back here with that. Still: maybe you can take my approach with `NoSpeculativeAliasingCheck`, and add a run with `ShortRunningLongLoop` enabled or disabled. Just to make sure we have at least something running with both enabled and also with disabled. 
test/hotspot/jtreg/compiler/rangechecks/TestLongRangeCheck.java line 37: > 35: * @run main/othervm -ea -Xbootclasspath/a:. -XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI -XX:-BackgroundCompilation -XX:-UseOnStackReplacement > 36: * -XX:+UnlockExperimentalVMOptions -XX:PerMethodSpecTrapLimit=5000 -XX:PerMethodTrapLimit=100 -XX:+IgnoreUnrecognizedVMOptions > 37: * -XX:-StressShortRunningLongLoop Why was this necessary? A comment could be nice. ------------- PR Review: https://git.openjdk.org/jdk/pull/21630#pullrequestreview-2786515069 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2055628155 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2055593625 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2055630382 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2055597688 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2055598338 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2055604496 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2055606011 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2055606502 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2055621495 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2055623410 From epeter at openjdk.org Wed Apr 23 09:31:44 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 23 Apr 2025 09:31:44 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v9] In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 08:46:29 GMT, Roland Westrelin wrote: >> @rwestrel Is this ready for review? > > @eme64 yes, it's ready for review. @rwestrel I'll have a look at the predicate code later, I'm a little scared of the complexity there. But maybe we cannot do better to avoid circularity...? 
@chhagedorn You definitely need to eventually look at the predicate code, you're the expert here :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/21630#issuecomment-2823665430 From duke at openjdk.org Wed Apr 23 09:42:01 2025 From: duke at openjdk.org (Saranya Natarajan) Date: Wed, 23 Apr 2025 09:42:01 GMT Subject: RFR: 8352620: C2: rename MemNode::memory_type() to MemNode::value_basic_type() [v3] In-Reply-To: References: Message-ID: > Description: The current name MemNode::memory_type() is misleading because the returned type is a property of the value that is loaded/stored, not the memory that is accessed. Usually, the two of them match, but for mismatched memory accesses (arising e.g. from using Unsafe or memory segments) they might differ, e.g. one might store a value of type 'short' into an array of elements of type 'long'. The proposal was to rename MemNode::memory_type() to MemNode::value_basic_type() to clarify these cases. > > Solution: Replaced all occurrence of MemNode::memory_type() with MemNode::value_basic_type() Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: addressing review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24427/files - new: https://git.openjdk.org/jdk/pull/24427/files/b9dfee3b..f0ef7ec8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24427&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24427&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24427.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24427/head:pull/24427 PR: https://git.openjdk.org/jdk/pull/24427 From mchevalier at openjdk.org Wed Apr 23 09:53:14 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 23 Apr 2025 09:53:14 GMT Subject: RFR: 8320909: C2: Adapt IGVN's enqueuing logic to match idealization of AndNode with LShift operand [v2] In-Reply-To: References: Message-ID: > 
The JBS issues has 3 reproducers. Two of them don't reproduce anymore. Let's enumerate: > > - Test, TestSimple: > Disappeared with: > [JDK-8319372: C2 compilation fails with "Bad immediate dominator info"](https://bugs.openjdk.org/browse/JDK-8319372) in #16844 > which actually fixed it by removing the handling of the problematic pattern in `CastIINode::Value`. > Reverting this fix makes the issue reappear. > - Reduced2: I fix here > - Test3, Reduced3: > Disappeared with: > [JDK-8347459: C2: missing transformation for chain of shifts/multiplications by constants](https://bugs.openjdk.org/browse/JDK-8347459) in #23728 > which just shadowed it. The bug is essentially the same as Reduced2 otherwise. I also fix these here (even with reverted JDK-8347459) > > The issue comes from the fact that `And[IL]Node::Value` has a special handling when > an operand is a left-shift: in the expression > > lhs & (X << s) > > if `lhs` fits in less than `s` bits, the result is sure to be 0. This special handling > also tolerate a `ConvI2LNode` between the `AndLNode` and `LShiftINode`. In this case, > updating the Shift node during IGVN won't enqueue directly the And node, but only the > Conv node. If this conv node cannot be improved, the And node is not enqueued, and its > type is not as good as it could be. > > Such a case is illustrated on the following figures from Reduced2. Node `239 Phi` is a phi with a > dead branch, so the node is about to be eleminated. On the second figure, we can see > `152 LShiftI` taking its place. The node `243 ConvI2L` is enqueued, but no change happens > during its idealization, so the node `244 AndL` is not enqueued, while it could receive an update. > > > > > > The fix is pretty direct: we recognize this pattern and we enqueue the And node during IGVN > to make sure it has a chance to be refined. > > The case of Reduced3 is mostly the same, with a twist: the special handling of And nodes > also can see through casts. 
See the next figure with a `CastIINode` between the `AndINode` and > the `LShiftINode`. The fix has to take that into account. > > > > > Overall, the situation can be of the form: > > LShift -> Cast+ -> ConvI2L -> Cast+ -> And > > This second case was shadowed by [JDK-8347459](https://bugs... Marc Chevalier has updated the pull request incrementally with two additional commits since the last revision: - Add reproducer for other case - Add other run in tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24792/files - new: https://git.openjdk.org/jdk/pull/24792/files/d16518dc..fdfb4fc0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24792&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24792&range=00-01 Stats: 223 lines in 4 files changed: 151 ins; 69 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/24792.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24792/head:pull/24792 PR: https://git.openjdk.org/jdk/pull/24792 From mchevalier at openjdk.org Wed Apr 23 09:53:14 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 23 Apr 2025 09:53:14 GMT Subject: RFR: 8320909: C2: Adapt IGVN's enqueuing logic to match idealization of AndNode with LShift operand In-Reply-To: References: Message-ID: On Tue, 22 Apr 2025 08:06:48 GMT, Marc Chevalier wrote: > The JBS issues has 3 reproducers. Two of them don't reproduce anymore. Let's enumerate: > > - Test, TestSimple: > Disappeared with: > [JDK-8319372: C2 compilation fails with "Bad immediate dominator info"](https://bugs.openjdk.org/browse/JDK-8319372) in #16844 > which actually fixed it by removing the handling of the problematic pattern in `CastIINode::Value`. > Reverting this fix makes the issue reappear. > - Reduced2: I fix here > - Test3, Reduced3: > Disappeared with: > [JDK-8347459: C2: missing transformation for chain of shifts/multiplications by constants](https://bugs.openjdk.org/browse/JDK-8347459) in #23728 > which just shadowed it. 
The bug is essentially the same as Reduced2 otherwise. I also fix these here (even with reverted JDK-8347459) > > The issue comes from the fact that `And[IL]Node::Value` has a special handling when > an operand is a left-shift: in the expression > > lhs & (X << s) > > if `lhs` fits in less than `s` bits, the result is sure to be 0. This special handling > also tolerate a `ConvI2LNode` between the `AndLNode` and `LShiftINode`. In this case, > updating the Shift node during IGVN won't enqueue directly the And node, but only the > Conv node. If this conv node cannot be improved, the And node is not enqueued, and its > type is not as good as it could be. > > Such a case is illustrated on the following figures from Reduced2. Node `239 Phi` is a phi with a > dead branch, so the node is about to be eleminated. On the second figure, we can see > `152 LShiftI` taking its place. The node `243 ConvI2L` is enqueued, but no change happens > during its idealization, so the node `244 AndL` is not enqueued, while it could receive an update. > > > > > > The fix is pretty direct: we recognize this pattern and we enqueue the And node during IGVN > to make sure it has a chance to be refined. > > The case of Reduced3 is mostly the same, with a twist: the special handling of And nodes > also can see through casts. See the next figure with a `CastIINode` between the `AndINode` and > the `LShiftINode`. The fix has to take that into account. > > > > > Overall, the situation can be of the form: > > LShift -> Cast+ -> ConvI2L -> Cast+ -> And > > This second case was shadowed by [JDK-8347459](https://bugs... I've added the other (not-anymore-)reproducer, tried to make clear where it comes from but where it was fixed. I've renamed the issue, added flag-free runs (because actually no flags are really needed for these). I think the PR is ready for more review. 
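As a rough illustration of the pattern described in the quoted PR text (a standalone sketch, not code from the PR or its reproducers; the class and method names are made up for the example), the snippet below shows why `lhs & (X << s)` is always 0 when `lhs` fits in fewer than `s` bits, including the variant where an int-to-long conversion sits between the shift and the `and`:

```java
// Hypothetical demo class, not part of the JDK or of the PR under review.
class AndShiftZeroDemo {
    // 'small' is masked to its low 3 bits, so it fits in fewer than s = 4 bits.
    static long andOfShift(int x, long lhs) {
        long small = lhs & 0x7L;         // value in [0, 7]
        long shifted = (long) (x << 4);  // int left shift, then int-to-long conversion;
                                         // the low 4 bits are zero after the shift and
                                         // sign extension does not change them
        return small & shifted;          // no common set bits, so the result is 0
    }

    public static void main(String[] args) {
        long acc = 0;
        for (int i = -100_000; i < 100_000; i++) {
            acc |= andOfShift(i, i * 31L);
        }
        System.out.println(acc);  // prints 0
    }
}
```

In C2 terms this roughly corresponds to the `LShiftI -> ConvI2L -> AndL` shape mentioned above; whether the compiler actually folds the `and` to a constant depends on the `AndL` node being revisited after the shift's type is known, which is the enqueuing problem the PR addresses.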
------------- PR Comment: https://git.openjdk.org/jdk/pull/24792#issuecomment-2823732940 From mchevalier at openjdk.org Wed Apr 23 09:58:25 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 23 Apr 2025 09:58:25 GMT Subject: RFR: 8354284: Add more compiler test folders to tier1 runs Message-ID: Some folders in jtreg/compiler have been reported not to be run in any tier, while tier1 was probably intended, but the tier definition was mistakenly not updated. I've checked which folders are not referenced in `TEST.groups`. The unmentioned ones: - `ccp` - `ciReplay` - `ciTypeFlow` - `compilercontrol` - `debug` - `oracle` - `predicates` - `print` - `relocations` - `sharedstubs` - `splitif` - `tiered` - `whitebox` And those that are not test folders: - `lib` - `patches` - `testlibraries` I'm adding `ccp`, `ciTypeFlow`, `predicates`, `sharedstubs` and `splitif` to tier1. The other folders seem to have been around for very long (since at least mid-2021). It's not clear how meaningful it'd be to add them/what the intent behind them was. I've instead focused on the recently(-ish) added folders that were not put in a tier when they were added. Feel free to tell me if other folders should be included (and in which tier).
Thanks, Marc ------------- Commit messages: - Adding some folders Changes: https://git.openjdk.org/jdk/pull/24817/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24817&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8354284 Stats: 7 lines in 1 file changed: 6 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24817.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24817/head:pull/24817 PR: https://git.openjdk.org/jdk/pull/24817 From mdoerr at openjdk.org Wed Apr 23 10:23:56 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 23 Apr 2025 10:23:56 GMT Subject: RFR: 8354119: Missing C2 proper allocation failure handling during initialization (during generate_uncommon_trap_blob) In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 14:04:35 GMT, Damon Fenacci wrote: > After [JDK-8347406](https://bugs.openjdk.org/browse/JDK-8347406), `OptoRuntime::generate_uncommon_trap_blob` and `OptoRuntime::generate_exception_blob` return an `UncommonTrapBlob`/`ExceptionBlob` if they succeed, `nullptr` if they don't. This is then used by the compiler to shut down gently if the code cache is full (instead of crashing). > Unfortunately, if the full code cache is reached when creating the buffer at the start of these 2 methods (when calling `CodeBuffer buffer(name, 2048, 1024);`) an empty buffer is created, which in turn prevents `masm` from being properly initialized, which then causes an access violation when writing into the blob's address when first adding `subptr` later in the method (as seen in the snippet below for `generate_uncommon_trap_blob`). > > https://github.com/openjdk/jdk/blob/3cc43b3224efdf1a3f35fff58b993027a9e1f4ad/src/hotspot/cpu/x86/runtime_x86_64.cpp#L55-L72 > > To fix this I suggest we return immediately from `OptoRuntime::generate_uncommon_trap_blob`/`OptoRuntime::generate_exception_blob` if the `buffer` creation failed. > > ### Testing > > Tier 1-3. > No specific regression test is added (very hard, i.a.
dependent on thread scheduling. On the other hand `StartupOutput.java` might catch it rarely). PPC64 parts look good and a few quick tests have passed. ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24549#pullrequestreview-2786788500 From epeter at openjdk.org Wed Apr 23 10:39:43 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 23 Apr 2025 10:39:43 GMT Subject: RFR: 8354473: Incorrect results for compress/expand tests with -XX:+EnableX86ECoreOpts In-Reply-To: References: Message-ID: On Tue, 15 Apr 2025 03:54:09 GMT, Volodymyr Paprotski wrote: > It looks like the `permv` mask isnt always 'all-ones' or 'all-zeroes'. (Which is OK for real blend, but needs to be enforced via the flag for blend emulation) > > Before the fix, `make test TEST="jdk/incubator/vector"` (on ECore machine) > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR SKIP >>> jtreg:test/jdk/jdk/incubator/vector 83 71 10 0 2 << > ============================== > TEST FAILURE > > After the fix: > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR SKIP > jtreg:test/jdk/jdk/incubator/vector 83 81 0 0 2 > ============================== > TEST SUCCESS > > And on an AVX512 machine: > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR SKIP > jtreg:test/jdk/jdk/incubator/vector 83 81 0 0 2 > ============================== > TEST SUCCESS @vpaprotsk Can you please give a little more details about what exactly went wrong here, and why your change is correct? 
@jatin-bhateja You should probably review this code, since you wrote the code originally :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24645#issuecomment-2823854369 From epeter at openjdk.org Wed Apr 23 10:59:47 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 23 Apr 2025 10:59:47 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs In-Reply-To: References: Message-ID: <7Y1VflHDgnEChBAv9bwWH5ayU-K9ngRa3BfPjgzzHP0=.61111d18-a22e-4cb5-9492-e50f5524ac08@github.com> On Thu, 10 Apr 2025 15:15:54 GMT, Roland Westrelin wrote: > This is a variant of 8332827. In 8332827, an array access becomes > dependent on a range check `CastII` for another array access. When, > after loop opts are over, that RC `CastII` was removed, the array > access could float and an out-of-bounds access happened. With the fix > for 8332827, RC `CastII`s are no longer removed. > > With this one what happens is that some transformations applied after > loop opts are over widen the type of the RC `CastII`. As a result, the > type of the RC `CastII` is no longer narrower than that of its input, > the `CastII` is removed and the dependency is lost. > > There are 2 transformations that cause this to happen: > > - after loop opts are over, the type of the `CastII` nodes is widened > so nodes that have the same inputs but a slightly different type can > be commoned. > > - When pushing a `CastII` through an `Add`, if the types of both inputs > of the `Add` are non-constant, then we end up widening the type > (the resulting `Add` has a type that's wider than that of the > initial `CastII`). > > There are already 3 types of `Cast` nodes depending on the > optimizations that are allowed. Either the `Cast` is floating > (`depends_only_on_test()` returns `true`) or pinned. Either the `Cast` > can be removed if it no longer narrows the type of its input or > not.
We already have variants of the `CastII`: > > - if the Cast can float and be removed when it doesn't narrow the type > of its input. > > - if the Cast is pinned and can be removed when it doesn't narrow the type > of its input. > > - if the Cast is pinned and can't be removed when it doesn't narrow > the type of its input. > > What we need here, I think, is the 4th combination: > > - if the Cast can float and can't be removed when it doesn't narrow > the type of its input. > > Anyway, things are becoming confusing with all these different > variants named in ways that don't always help figure out what > constraints each of them operates under. So I refactored this and that's > the biggest part of this change. The fix consists in marking `Cast` > nodes when their type is widened in a way that prevents them from being > optimized out. > > Tobias ran performance testing with a slightly different version of > this change and there was no regression. @rwestrel thanks for looking into this one! I have not yet deeply studied the PR, but am feeling some confusion about the naming. I think the `DependencyType` is really a good step in the right direction, it helps clean things up. I'm wondering if we should pick either `depends_only_on_test` or `pinned`, and use it everywhere consistently. Having both around as near synonyms (antonyms?) is a bit confusing for me. I'll look into the code more later.
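
To keep the four combinations listed in the PR description apart, the two independent properties (can the Cast float, and can it be removed once it no longer narrows its input's type) can be laid out as a small table. The following Java sketch is purely illustrative: the enum and its constant names are hypothetical and do not correspond to the names used in `castnode.hpp`.

```java
public class CastVariants {
    // Hypothetical model of the 2x2 space described above; not HotSpot code.
    enum CastKind {
        FLOATING_REMOVABLE(true, true),       // can float, removed when it doesn't narrow
        PINNED_REMOVABLE(false, true),        // pinned, removed when it doesn't narrow
        PINNED_NOT_REMOVABLE(false, false),   // pinned, kept even when it doesn't narrow
        FLOATING_NOT_REMOVABLE(true, false);  // the 4th combination the description asks for

        final boolean canFloat;
        final boolean removableWhenNotNarrowing;

        CastKind(boolean canFloat, boolean removableWhenNotNarrowing) {
            this.canFloat = canFloat;
            this.removableWhenNotNarrowing = removableWhenNotNarrowing;
        }
    }

    // Look up the variant for a given pair of properties.
    static CastKind of(boolean canFloat, boolean removableWhenNotNarrowing) {
        for (CastKind k : CastKind.values()) {
            if (k.canFloat == canFloat && k.removableWhenNotNarrowing == removableWhenNotNarrowing) {
                return k;
            }
        }
        throw new IllegalStateException("all four combinations are covered above");
    }

    public static void main(String[] args) {
        for (CastKind k : CastKind.values()) {
            System.out.println(k + ": canFloat=" + k.canFloat
                + ", removableWhenNotNarrowing=" + k.removableWhenNotNarrowing);
        }
    }
}
```

Naming the constants after the two properties, rather than after how strong the dependency feels, is one possible way to address the naming concern raised in the review; it is offered here only as an illustration.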
src/hotspot/share/opto/castnode.cpp line 39: > 37: const ConstraintCastNode::DependencyType ConstraintCastNode::WidenTypeDependency(true, false, "widen type dependency"); // not pinned, doesn't narrow type > 38: const ConstraintCastNode::DependencyType ConstraintCastNode::StrongDependency(false, true, "strong dependency"); // pinned, narrows type > 39: const ConstraintCastNode::DependencyType ConstraintCastNode::UnconditionalDependency(false, false, "unconditional dependency"); // pinned, doesn't narrow type Is there really a good reason to have the names `Regular`, `WidenType`, `Strong` and `Unconditional`? Did we just get used to these names over time, or do they really have a good reason for existence? They just don't really mean that much to me. Calling them (non)pinned and (non)narrowing would make more sense to me. src/hotspot/share/opto/castnode.hpp line 58: > 56: bool depends_only_on_test() const { > 57: return _depends_only_on_test; > 58: } Is this synonymous with `non_pinning`? Might that be more descriptive? src/hotspot/share/opto/castnode.hpp line 91: > 89: private: > 90: const bool _depends_only_on_test; // Does this Cast depends on its control input or is it pinned? > 91: const bool _narrows_type; // Does this Cast narrows the type i.e. if input type is narrower can it be removed? I think it would be good to have a really strong definition of these two, because everything else depends on it. I would recommend either using `depends_only_on_test` as the "primary" word here, or else `pinned`. But then try to consistently use the chosen one everywhere. Just to avoid confusion with these near synonyms. It may also be helpful to have an example for each of the 4 combinations, just as an illustration of your definitions. ------------- Changes requested by epeter (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/24575#pullrequestreview-2786860650 PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2055783250 PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2055785871 PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2055790912 From lucy at openjdk.org Wed Apr 23 11:06:47 2025 From: lucy at openjdk.org (Lutz Schmidt) Date: Wed, 23 Apr 2025 11:06:47 GMT Subject: RFR: 8353500: [s390x] Intrinsify Unsafe::setMemory [v4] In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 06:09:25 GMT, Amit Kumar wrote: >> Unsafe::setMemory intrinsic implementation for s390x. >> >> Stub Code: >> >> >> StubRoutines::unsafe_setmemory [0x000003ffb04b63c0, 0x000003ffb04b64d0] (272 bytes) >> -------------------------------------------------------------------------------- >> 0x000003ffb04b63c0: ogrk %r1,%r2,%r3 >> 0x000003ffb04b63c4: nill %r1,7 >> 0x000003ffb04b63c8: je 0x000003ffb04b6410 >> 0x000003ffb04b63cc: nill %r1,3 >> 0x000003ffb04b63d0: je 0x000003ffb04b6460 >> 0x000003ffb04b63d4: nill %r1,1 >> 0x000003ffb04b63d8: jlh 0x000003ffb04b64a0 >> 0x000003ffb04b63dc: risbg %r4,%r4,48,55,8 >> 0x000003ffb04b63e2: risbgz %r1,%r3,32,63,62 >> 0x000003ffb04b63e8: je 0x000003ffb04b6402 >> 0x000003ffb04b63ec: nopr >> 0x000003ffb04b63ee: nopr >> 0x000003ffb04b63f0: sth %r4,0(%r2) >> 0x000003ffb04b63f4: sth %r4,2(%r2) >> 0x000003ffb04b63f8: agfi %r2,4 >> 0x000003ffb04b63fe: brct %r1,0x000003ffb04b63f0 >> 0x000003ffb04b6402: nilf %r3,2 >> 0x000003ffb04b6408: ber %r14 >> 0x000003ffb04b640a: sth %r4,0(%r2) >> 0x000003ffb04b640e: br %r14 >> 0x000003ffb04b6410: risbg %r4,%r4,48,55,8 >> 0x000003ffb04b6416: risbg %r4,%r4,32,47,16 >> 0x000003ffb04b641c: risbg %r4,%r4,0,31,32 >> 0x000003ffb04b6422: risbgz %r1,%r3,32,63,60 >> 0x000003ffb04b6428: je 0x000003ffb04b6446 >> 0x000003ffb04b642c: nopr >> 0x000003ffb04b642e: nopr >> 0x000003ffb04b6430: stg %r4,0(%r2) >> 0x000003ffb04b6436: stg %r4,8(%r2) >> 
0x000003ffb04b643c: agfi %r2,16 >> 0x000003ffb04b6442: brct %r1,0x000003ffb04b6430 >> 0x000003ffb04b6446: nilf %r3,8 >> 0x000003ffb04b644c: ber %r14 >> 0x000003ffb04b644e: stg %r4,0(%r2) >> 0x000003ffb04b6454: br %r14 >> 0x000003ffb04b6456: nopr >> 0x000003ffb04b6458: nopr >> 0x000003ffb04b645a: nopr >> 0x000003ffb04b645c: nopr >> 0x000003ffb04b645e: nopr >> 0x000003ffb04b6460: risbg %r4,%r4,48,55,8 >> 0x000003ffb04b6466: risbg %r4,%r4,32,47,16 >> 0x000003ffb04b646c: risbgz %r1,%r3,32,63,61 >> 0x000003ffb04b6472: je 0x000003ffb04b6492 >> 0x000003ffb04b6476: nopr >> 0x000003ffb04b6478: nopr >> 0x000003ffb04b647a: nopr >> 0x000003ffb04b647c: nopr >> 0x000003ffb04b647e: nopr >> 0x000003ffb04b6480: st %r4,0(%r2) >> 0x000003ffb04b6484: st %r4,4(%r2) >> 0x000003ffb04b6488: agfi %r2,8 >> 0x000003ffb04b648e: brct %r1,0x000003ffb04b6480 >> 0x000003ffb04b6492: nilf %r3,4 >> 0x000003ffb04b6498: ber %r14 >> 0x000003ffb04b649a: st %r4,0(%r2) >> 0x0000... > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > improved mvc implementation LGTM. Thank you for all the code tweaks. ------------- Marked as reviewed by lucy (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24480#pullrequestreview-2786899345 From epeter at openjdk.org Wed Apr 23 11:11:48 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 23 Apr 2025 11:11:48 GMT Subject: RFR: 8320909: C2: Adapt IGVN's enqueuing logic to match idealization of AndNode with LShift operand [v2] In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 09:53:14 GMT, Marc Chevalier wrote: >> The JBS issues has 3 reproducers. Two of them don't reproduce anymore. Let's enumerate: >> >> - Test, TestSimple: >> Disappeared with: >> [JDK-8319372: C2 compilation fails with "Bad immediate dominator info"](https://bugs.openjdk.org/browse/JDK-8319372) in #16844 >> which actually fixed it by removing the handling of the problematic pattern in `CastIINode::Value`. 
>> Reverting this fix makes the issue reappear. >> - Reduced2: I fix here >> - Test3, Reduced3: >> Disappeared with: >> [JDK-8347459: C2: missing transformation for chain of shifts/multiplications by constants](https://bugs.openjdk.org/browse/JDK-8347459) in #23728 >> which just shadowed it. The bug is essentially the same as Reduced2 otherwise. I also fix these here (even with reverted JDK-8347459) >> >> The issue comes from the fact that `And[IL]Node::Value` has a special handling when >> an operand is a left-shift: in the expression >> >> lhs & (X << s) >> >> if `lhs` fits in less than `s` bits, the result is sure to be 0. This special handling >> also tolerate a `ConvI2LNode` between the `AndLNode` and `LShiftINode`. In this case, >> updating the Shift node during IGVN won't enqueue directly the And node, but only the >> Conv node. If this conv node cannot be improved, the And node is not enqueued, and its >> type is not as good as it could be. >> >> Such a case is illustrated on the following figures from Reduced2. Node `239 Phi` is a phi with a >> dead branch, so the node is about to be eleminated. On the second figure, we can see >> `152 LShiftI` taking its place. The node `243 ConvI2L` is enqueued, but no change happens >> during its idealization, so the node `244 AndL` is not enqueued, while it could receive an update. >> >> >> >> >> >> The fix is pretty direct: we recognize this pattern and we enqueue the And node during IGVN >> to make sure it has a chance to be refined. >> >> The case of Reduced3 is mostly the same, with a twist: the special handling of And nodes >> also can see through casts. See the next figure with a `CastIINode` between the `AndINode` and >> the `LShiftINode`. The fix has to take that into account. >> >> >> >> >> Overall, the situation can be of the form: >> >> LShift -> C... 
> > Marc Chevalier has updated the pull request incrementally with two additional commits since the last revision: > > - Add reproducer for other case > - Add other run in tests @marc-chevalier thanks for the updates, looks good to me :) test/hotspot/jtreg/compiler/c2/gvn/MissedOptCastII.java line 26: > 24: /* > 25: * @test > 26: * @bug 8319372 I would have just put both bug ids here. Hope that is correct? Suggestion: * @bug 8319372 8320909 ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24792#pullrequestreview-2786908463 PR Review Comment: https://git.openjdk.org/jdk/pull/24792#discussion_r2055813474 From mchevalier at openjdk.org Wed Apr 23 11:20:48 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 23 Apr 2025 11:20:48 GMT Subject: RFR: 8320909: C2: Adapt IGVN's enqueuing logic to match idealization of AndNode with LShift operand [v3] In-Reply-To: References: Message-ID: > The JBS issue has 3 reproducers. Two of them don't reproduce anymore. Let's enumerate: > > - Test, TestSimple: > Disappeared with: > [JDK-8319372: C2 compilation fails with "Bad immediate dominator info"](https://bugs.openjdk.org/browse/JDK-8319372) in #16844 > which actually fixed it by removing the handling of the problematic pattern in `CastIINode::Value`. > Reverting this fix makes the issue reappear. > - Reduced2: I fix here > - Test3, Reduced3: > Disappeared with: > [JDK-8347459: C2: missing transformation for chain of shifts/multiplications by constants](https://bugs.openjdk.org/browse/JDK-8347459) in #23728 > which just shadowed it. The bug is essentially the same as Reduced2 otherwise. I also fix these here (even with reverted JDK-8347459) > > The issue comes from the fact that `And[IL]Node::Value` has a special handling when > an operand is a left-shift: in the expression > > lhs & (X << s) > > if `lhs` fits in less than `s` bits, the result is sure to be 0.
This special handling > also tolerates a `ConvI2LNode` between the `AndLNode` and `LShiftINode`. In this case, > updating the Shift node during IGVN won't directly enqueue the And node, but only the > Conv node. If this conv node cannot be improved, the And node is not enqueued, and its > type is not as good as it could be. > > Such a case is illustrated in the following figures from Reduced2. Node `239 Phi` is a phi with a > dead branch, so the node is about to be eliminated. In the second figure, we can see > `152 LShiftI` taking its place. The node `243 ConvI2L` is enqueued, but no change happens > during its idealization, so the node `244 AndL` is not enqueued, while it could receive an update. > > > > > > The fix is pretty direct: we recognize this pattern and we enqueue the And node during IGVN > to make sure it has a chance to be refined. > > The case of Reduced3 is mostly the same, with a twist: the special handling of And nodes > also can see through casts. See the next figure with a `CastIINode` between the `AndINode` and > the `LShiftINode`. The fix has to take that into account. > > > > > Overall, the situation can be of the form: > > LShift -> Cast+ -> ConvI2L -> Cast+ -> And > > This second case was shadowed by [JDK-8347459](https://bugs...
Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: Add the other id to @bug ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24792/files - new: https://git.openjdk.org/jdk/pull/24792/files/fdfb4fc0..d99bfd11 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24792&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24792&range=01-02 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/24792.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24792/head:pull/24792 PR: https://git.openjdk.org/jdk/pull/24792 From mchevalier at openjdk.org Wed Apr 23 11:20:50 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 23 Apr 2025 11:20:50 GMT Subject: RFR: 8320909: C2: Adapt IGVN's enqueuing logic to match idealization of AndNode with LShift operand [v2] In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 11:07:14 GMT, Emanuel Peter wrote: >> Marc Chevalier has updated the pull request incrementally with two additional commits since the last revision: >> >> - Add reproducer for other case >> - Add other run in tests > > test/hotspot/jtreg/compiler/c2/gvn/MissedOptCastII.java line 26: > >> 24: /* >> 25: * @test >> 26: * @bug 8319372 > > I would have just put both but ids here. Hope that is correct? > Suggestion: > > * @bug 8319372 8320909 Indeed, I found examples of that by grep. I had no idea that was possible! Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24792#discussion_r2055828020 From mchevalier at openjdk.org Wed Apr 23 11:22:45 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 23 Apr 2025 11:22:45 GMT Subject: RFR: 8320909: C2: Adapt IGVN's enqueuing logic to match idealization of AndNode with LShift operand In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 09:49:40 GMT, Marc Chevalier wrote: >> The JBS issues has 3 reproducers. Two of them don't reproduce anymore. 
Let's enumerate: >> >> - Test, TestSimple: >> Disappeared with: >> [JDK-8319372: C2 compilation fails with "Bad immediate dominator info"](https://bugs.openjdk.org/browse/JDK-8319372) in #16844 >> which actually fixed it by removing the handling of the problematic pattern in `CastIINode::Value`. >> Reverting this fix makes the issue reappear. >> - Reduced2: I fix here >> - Test3, Reduced3: >> Disappeared with: >> [JDK-8347459: C2: missing transformation for chain of shifts/multiplications by constants](https://bugs.openjdk.org/browse/JDK-8347459) in #23728 >> which just shadowed it. The bug is essentially the same as Reduced2 otherwise. I also fix these here (even with reverted JDK-8347459) >> >> The issue comes from the fact that `And[IL]Node::Value` has a special handling when >> an operand is a left-shift: in the expression >> >> lhs & (X << s) >> >> if `lhs` fits in less than `s` bits, the result is sure to be 0. This special handling >> also tolerate a `ConvI2LNode` between the `AndLNode` and `LShiftINode`. In this case, >> updating the Shift node during IGVN won't enqueue directly the And node, but only the >> Conv node. If this conv node cannot be improved, the And node is not enqueued, and its >> type is not as good as it could be. >> >> Such a case is illustrated on the following figures from Reduced2. Node `239 Phi` is a phi with a >> dead branch, so the node is about to be eleminated. On the second figure, we can see >> `152 LShiftI` taking its place. The node `243 ConvI2L` is enqueued, but no change happens >> during its idealization, so the node `244 AndL` is not enqueued, while it could receive an update. >> >> >> >> >> >> The fix is pretty direct: we recognize this pattern and we enqueue the And node during IGVN >> to make sure it has a chance to be refined. >> >> The case of Reduced3 is mostly the same, with a twist: the special handling of And nodes >> also can see through casts. 
See the next figure with a `CastIINode` between the `AndINode` and >> the `LShiftINode`. The fix has to take that into account. >> >> >> >> >> Overall, the situation can be of the form: >> >> LShift -> C... > > I've added the other (not-anymore-)reproducer, tried to make clear where it comes from but where it was fixed. I've renamed the issue, added flag-free runs (because actually no flags are really needed for these). I think the PR is ready for more review. > @marc-chevalier FYI: https://github.com/openjdk/jdk/pull/17508 this could be an alternative solution, at least once we fully follow that road. It tracks the "known bits", which would give you the "modulo information". Yes, exactly! A bitwise domain is exactly what we could use here. It'd give limited modulo/range information, but with bitwise operations, we care about bits. Godspeed to that! It's nice. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24792#issuecomment-2823956171 From epeter at openjdk.org Wed Apr 23 11:25:43 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 23 Apr 2025 11:25:43 GMT Subject: RFR: 8320909: C2: Adapt IGVN's enqueuing logic to match idealization of AndNode with LShift operand [v3] In-Reply-To: References: Message-ID: <038y2_z0Dgc5aZv8ekaAh70rEfeDBW1cxEOKEBt1Ob4=.28bca279-f619-4626-8b2d-0dce0b28c425@github.com> On Wed, 23 Apr 2025 11:20:48 GMT, Marc Chevalier wrote: >> The JBS issues has 3 reproducers. Two of them don't reproduce anymore. Let's enumerate: >> >> - Test, TestSimple: >> Disappeared with: >> [JDK-8319372: C2 compilation fails with "Bad immediate dominator info"](https://bugs.openjdk.org/browse/JDK-8319372) in #16844 >> which actually fixed it by removing the handling of the problematic pattern in `CastIINode::Value`. >> Reverting this fix makes the issue reappear. 
>> - Reduced2: I fix here >> - Test3, Reduced3: >> Disappeared with: >> [JDK-8347459: C2: missing transformation for chain of shifts/multiplications by constants](https://bugs.openjdk.org/browse/JDK-8347459) in #23728 >> which just shadowed it. The bug is essentially the same as Reduced2 otherwise. I also fix these here (even with reverted JDK-8347459) >> >> The issue comes from the fact that `And[IL]Node::Value` has a special handling when >> an operand is a left-shift: in the expression >> >> lhs & (X << s) >> >> if `lhs` fits in less than `s` bits, the result is sure to be 0. This special handling >> also tolerate a `ConvI2LNode` between the `AndLNode` and `LShiftINode`. In this case, >> updating the Shift node during IGVN won't enqueue directly the And node, but only the >> Conv node. If this conv node cannot be improved, the And node is not enqueued, and its >> type is not as good as it could be. >> >> Such a case is illustrated on the following figures from Reduced2. Node `239 Phi` is a phi with a >> dead branch, so the node is about to be eleminated. On the second figure, we can see >> `152 LShiftI` taking its place. The node `243 ConvI2L` is enqueued, but no change happens >> during its idealization, so the node `244 AndL` is not enqueued, while it could receive an update. >> >> >> >> >> >> The fix is pretty direct: we recognize this pattern and we enqueue the And node during IGVN >> to make sure it has a chance to be refined. >> >> The case of Reduced3 is mostly the same, with a twist: the special handling of And nodes >> also can see through casts. See the next figure with a `CastIINode` between the `AndINode` and >> the `LShiftINode`. The fix has to take that into account. >> >> >> >> >> Overall, the situation can be of the form: >> >> LShift -> C... > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Add the other id to @bug Marked as reviewed by epeter (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/24792#pullrequestreview-2786947800 From epeter at openjdk.org Wed Apr 23 11:30:42 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 23 Apr 2025 11:30:42 GMT Subject: RFR: 8350896: Integer/Long.compress gets wrong type from CompressBitsNode::Value [v5] In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 08:36:50 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This bugfix patch fixes incorrect value computation for Integer/Long. compress APIs. >> >> Problems occur with a constant input and variable mask where the input's value is equal to the lower bound of the mask value., In this case, an erroneous value range estimation results in a constant value. Existing value routine first attempts to constant fold the compression operation if both input and compression mask are constant values; otherwise, it attempts to constrain the value range of result based on the upper and lower bounds of mask type. >> >> New IR test covers the issue reported in the bug report along with a case for value range based logic pruning. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resoultions @jatin-bhateja Thanks for the updates! I think I now understand everything except this, so we are making good progress ? // For upper bound estimation of result value range with a constant input we // pessimistically pick max_int value to prevent incorrect constant folding // in case input equals above estimated lower bound. hi = src_type->hi_as_long() == lo ? hi : src_type->hi_as_long(); Can you please explain it with an example, and walk me through the steps to the incorrect constant folding? 
src/hotspot/share/opto/intrinsicnode.cpp line 299: > 297: // pessimistically pick max_int value to prevent incorrect constant folding > 298: // in case input equals above estimated lower bound. > 299: hi = src_type->hi_as_long() == lo ? hi : src_type->hi_as_long(); I still don't understand this case. Where would the constant folding happen exactly if this was not here? ------------- PR Review: https://git.openjdk.org/jdk/pull/23947#pullrequestreview-2786954830 PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r2055841706 From shade at openjdk.org Wed Apr 23 11:32:07 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 23 Apr 2025 11:32:07 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v3] In-Reply-To: References: Message-ID: > [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. > > The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. > > This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. > > It is possible to mitigate this issue by splitting the related fields into weak and strong ones.
But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: - Touchups - Renames - Fully encapsulate Method* - Merge branch 'master' into JDK-8231269-compile-task-weaks - Shared utility class for method unload blocking - Merge branch 'master' into JDK-8231269-compile-task-weaks - JNIHandles -> VM(Weak) ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24018/files - new: https://git.openjdk.org/jdk/pull/24018/files/d965fef3..7f32b31b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=01-02 Stats: 272647 lines in 2272 files changed: 72121 ins; 193069 del; 7457 mod Patch: https://git.openjdk.org/jdk/pull/24018.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24018/head:pull/24018 PR: https://git.openjdk.org/jdk/pull/24018 From shade at openjdk.org Wed Apr 23 11:32:10 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 23 Apr 2025 11:32:10 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v2] In-Reply-To: <2HQ4RI4tsr1vs81DbkYw7J7omhy1EnEoatZENNTttpg=.243a25be-71eb-486b-8c04-29295bcea9b9@github.com> References: <2HQ4RI4tsr1vs81DbkYw7J7omhy1EnEoatZENNTttpg=.243a25be-71eb-486b-8c04-29295bcea9b9@github.com> Message-ID: On Mon, 31 Mar 2025 18:46:53 
GMT, Aleksey Shipilev wrote: >> [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. >> >> The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. >> >> This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. >> >> It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains three additional commits since the last revision: > > - Shared utility class for method unload blocking > - Merge branch 'master' into JDK-8231269-compile-task-weaks > - JNIHandles -> VM(Weak) Pushed the `Method*` encapsulation. SA needs fixes now, but I'll test how well this works on other tests. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24018#issuecomment-2823977158 From shade at openjdk.org Wed Apr 23 11:32:10 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 23 Apr 2025 11:32:10 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v2] In-Reply-To: References: <2HQ4RI4tsr1vs81DbkYw7J7omhy1EnEoatZENNTttpg=.243a25be-71eb-486b-8c04-29295bcea9b9@github.com> Message-ID: On Mon, 31 Mar 2025 23:40:09 GMT, Vladimir Ivanov wrote: >> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Shared utility class for method unload blocking >> - Merge branch 'master' into JDK-8231269-compile-task-weaks >> - JNIHandles -> VM(Weak) > > src/hotspot/share/runtime/methodUnloadBlocker.inline.hpp line 72: > >> 70: assert(!is_unloaded(), "Pre-condition: should not be unloaded"); >> 71: >> 72: if (!_weak_handle.is_empty()) { > > Does the precondition imply that `!_weak_handle.is_empty()` always hold? Not really. The precondition is: !is_unloaded() -> !(!_weak_handle.is_empty() && _weak_handle.peek() == nullptr) -> (_weak_handle.is_empty() || _weak_handle.peek() != nullptr) So you see, there is a case when weak handle is empty. It is when `_method` is `nullptr` (default initialized, no method is set), or its unload blocker is `nullptr` (method would be unloaded). Then we bypass the majority of weak->strong dance as unnecessary. 
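The De Morgan step in the reply above can be checked exhaustively. The following is only a minimal sketch: `BlockerState` and the free functions are hypothetical stand-ins that model `_weak_handle.is_empty()` and `_weak_handle.peek() == nullptr` as two booleans, and they are not the actual `MethodUnloadBlocker` code from the PR.

```cpp
#include <cassert>

// Hypothetical model of the blocker state; the real class is not shown in this thread.
struct BlockerState {
  bool weak_is_empty;   // models _weak_handle.is_empty()
  bool weak_peek_null;  // models _weak_handle.peek() == nullptr
};

// is_unloaded() as described in the reply: a non-empty weak handle
// whose referent has already been cleared.
bool is_unloaded(const BlockerState& s) {
  return !s.weak_is_empty && s.weak_peek_null;
}

// De Morgan expansion of !is_unloaded().
bool precondition_expanded(const BlockerState& s) {
  return s.weak_is_empty || !s.weak_peek_null;
}

// Enumerates all four states, checks that both formulations agree, and
// counts the states where the precondition holds although the handle is empty.
int count_states_with_empty_handle_allowed() {
  int n = 0;
  for (int e = 0; e < 2; e++) {
    for (int p = 0; p < 2; p++) {
      BlockerState s{e != 0, p != 0};
      assert(!is_unloaded(s) == precondition_expanded(s));
      if (!is_unloaded(s) && s.weak_is_empty) {
        n++;
      }
    }
  }
  return n;
}
```

In this model the precondition is satisfied in every state with an empty weak handle, which is why the `!_weak_handle.is_empty()` check is still needed after the assert.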
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2055846204 From duke at openjdk.org Wed Apr 23 11:43:44 2025 From: duke at openjdk.org (Manuel Hässig) Date: Wed, 23 Apr 2025 11:43:44 GMT Subject: RFR: 8354520: IGV: dump contextual information In-Reply-To: References: Message-ID: <81rrgzCaRmmpzDY4KCVmlcKS8-a4x3jtxbrsuFtKJNo=.ddaf541a-e9a5-4bee-8712-38fde6d3d886@github.com> On Thu, 17 Apr 2025 13:05:03 GMT, Roberto Castañeda Lozano wrote: > This changeset extends the IGV graph dumps with additional properties that ease tracing the dumps back to the context in which they were produced. The changeset dumps, for every compilation, the following additional properties: > > - JVM arguments > - platform information > - JVM version information > - date and time > - process ID > - (compiler) thread ID > > ![compilation-properties](https://github.com/user-attachments/assets/8ddc8fb9-c348-4761-8e19-c70633a1b59f) > > Additionally, the changeset produces and dumps the C2 stack trace from which each graph is dumped: > > ![c2-stack-trace](https://github.com/user-attachments/assets/085547ee-b0b3-4a38-86f1-9df79cf1cc01) > > This should be particularly useful in an interactive context, where the user steps through C2 code using a debugger and dumps graphs at different points. To produce a stack trace in this context, the usual debugger-entry C2 functions (`igv_print`, `igv_append`, `Node::dump_bfs`, ...) 
are extended with extra arguments to specify the stack handling registers (stack pointer, frame pointer, and program counter): > > ![c2-stack-trace-from-gdb](https://github.com/user-attachments/assets/29de2964-ee2d-4f5f-bcf7-d81e1bc6c8a6) > > The inconvenience of manually specifying the stack handling registers can be addressed by hiding them in debugger user-defined commands, e.g.: > > > define igv > p igv_print(true, $sp, $fp, $pc) > end > > define igv_node > p find_node($arg0)->dump_bfs(0, 0, "!", $sp, $fp, $pc) > end > > > Thanks to @TobiHartmann for providing useful feedback! > > #### Testing > > - tier1 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64; release and debug mode). > - Tested interactive usage manually via `gdb` and `rr` on linux-x64. > - Tested automatically that dumping thousands of graphs does not trigger any assertion failure. Thank you for working on this and continually improving IGV! I found two typos, but otherwise it looks good. I also tested printing to IGV from `rr` and the new features work as advertised. Perhaps you could update the `dump_bfs` help with a comment about the new arguments? src/utils/IdealGraphVisualizer/README.md line 57: > 55: The JVM provides some entry functions to dump graphs from a debugger such as > 56: `gdb` or `rr`, see the different variants of `igv_print` and `igv_append` in > 57: `compile.cpp`. In combination with the IGV network interface, these functions Suggestion: [`compile.cpp`](src/hotspot/share/opto/compile.cpp). In combination with the IGV network interface, these functions src/utils/IdealGraphVisualizer/README.md line 79: > 77: ``` > 78: > 79: Another way to dump graphs interactively is through the `Node::dump_dfs` Suggestion: Another way to dump graphs interactively is through the `Node::dump_bfs` ------------- Marked as reviewed by mhaessig at github.com (no known OpenJDK username). 
PR Review: https://git.openjdk.org/jdk/pull/24724#pullrequestreview-2786865298 PR Review Comment: https://git.openjdk.org/jdk/pull/24724#discussion_r2055786135 PR Review Comment: https://git.openjdk.org/jdk/pull/24724#discussion_r2055789155 From shade at openjdk.org Wed Apr 23 11:48:59 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 23 Apr 2025 11:48:59 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v4] In-Reply-To: References: Message-ID: > [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. > > The code tries to switch a weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. > > This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. > > It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. 
> > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Purge extra fluff ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24018/files - new: https://git.openjdk.org/jdk/pull/24018/files/7f32b31b..2ec579ca Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=02-03 Stats: 7 lines in 2 files changed: 0 ins; 6 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24018.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24018/head:pull/24018 PR: https://git.openjdk.org/jdk/pull/24018 From jbhateja at openjdk.org Wed Apr 23 11:59:57 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 23 Apr 2025 11:59:57 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v2] In-Reply-To: References: Message-ID: <-ICBFkGuSGK0-ZPneJvBE-GdrN4B8RJnAlA1_iQ707M=.24b1c5bc-1cbd-49f4-abbc-a0eddc86469d@github.com> On Fri, 18 Apr 2025 01:36:10 GMT, erifan wrote: >> This patch optimizes the following patterns: >> For integer types: >> >> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) >> => (VectorMaskCmp src1 src2 ncond) >> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) >> => (VectorMaskCmp src1 src2 ncond) >> >> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the negative comparison of cond. >> >> For float and double types: >> >> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> >> cond can be eq or ne. 
>> >> Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option `-XX:UseSVE=2`: >> >> Benchmark Unit Before Score Error After Score Error Uplift >> testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 10266136.26 8955.008548 1.29 >> testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 1179760.772 448.031844 1.33 >> testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 2359520.803 896.305743 1.33 >> testCompareEQMaskNotInt ops/s 1787221.411 977.743935 2353952.519 960.069976 1.31 >> testCompareEQMaskNotLong ops/s 895297.1974 673.44808 1178449.02 323.804205 1.31 >> testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 4712761.965 2110.862053 1.41 >> testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 10251646.9 9486.699831 1.29 >> testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 2352855.205 1251.952546 1.39 >> testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 1177811.493 521.1229 1.37 >> testCompareGEMaskNotShort ops/s 3341860.309 1578.975338 4714008.434 1681.10365 1.41 >> testCompareGTMaskNotByte ops/s 7910823.674 2993.367032 10245063.58 9774.75138 1.29 >> testCompareGTMaskNotInt ops/s 1673393.928 3153.099431 2353654.521 1190.848583 1.4 >> testCompareGTMaskNotLong ops/s 849405.9159 2432.858159 1177952.041 359.96413 1.38 >> testCompareGTMaskNotShort ops/s 3339509.141 3339.976585 4711442.496 2673.364893 1.41 >> testCompareLEMaskNotByte ops/s 7911340.004 3114.69191 10231626.5 27134.20035 1.29 >> testCompareLEMaskNotInt ops/s 1675812.113 1340.969885 2353255.341 1452.4522 1.4 >> testCompareLEMaskNotLong ops/s 848862.8036 6564.841731 1177763.623 539.290106 1.38 >> testCompareLEMaskNotShort ops/s 3324951.54 2380.29473 4712116.251 1544.559684 1.41 >> testCompareLTMaskNotByte ops/s 7910390.844 2630.861436 10239567.69 6487.441672 1.29 >> testCompareLTMaskNotInt ops/s 16721... > > erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains two additional commits since the last revision: > > - Merge branch 'master' into JDK-8354242 > - 8354242: VectorAPI: combine vector not operation with compare > > This patch optimizes the following patterns: > For integer types: > ``` > (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) > => (VectorMaskCmp src1 src2 ncond) > (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) > => (VectorMaskCmp src1 src2 ncond) > ``` > cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the > negative comparison of cond. > > For float and double types: > ``` > (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) > => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) > (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) > => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) > ``` > cond can be eq or ne. > > Benchmarks on Nvidia Grace machine with 128-bit SVE2: > With option `-XX:UseSVE=2`: > ``` > Benchmark Unit Before Score Error After Score Error Uplift > testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 10266136.26 8955.008548 1.29 > testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 1179760.772 448.031844 1.33 > testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 2359520.803 896.305743 1.33 > testCompareEQMaskNotInt ops/s 1787221.411 977.743935 2353952.519 960.069976 1.31 > testCompareEQMaskNotLong ops/s 895297.1974 673.44808 1178449.02 323.804205 1.31 > testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 4712761.965 2110.862053 1.41 > testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 10251646.9 9486.699831 1.29 > testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 2352855.205 1251.952546 1.39 > testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 1177811.493 521.1229 1.37 > testCompareGEMaskNotShort ops/s 3341860.309 1578.975338 4714008.434 1681.10365 1.41 > testCompareGTMaskNotByte ops/s 7910823.674 2993.367032 10245063.58 9774.75138 1.29 > testCompareGTMaskNotInt ops/s 
1673393.928 3153.099431 2353654.521 1190.848583 1.4 > testCompareGTMaskNotLong ops/s 849405.9159 2432.858159 1177952.041 359.96413 1.38 > testCompareGTMaskNotShort ops/s 3339509.141 3339.976585 4711442.496 2673.364893 1.41 > testCompareLEMaskNotByte ops/s 7911340.004... src/hotspot/share/opto/vectornode.cpp line 2243: > 2241: in1 = in1->in(1); > 2242: } > 2243: if (in1->Opcode() != Op_VectorMaskCmp || in1->outcnt() > 1 || Checks on outcnt on line 2243 and 2238 can be removed. Idealization looks for a specific graph palette and replaces it with a new node whose inputs are the same as the inputs of the palette. GVN will do the retention job if any intermediate node has users beyond the pattern being replaced. test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java line 38: > 36: * @summary test combining vector not operation with compare > 37: * @modules jdk.incubator.vector > 38: * @requires ((os.arch!="x86" & os.arch!="i386" & os.arch!="amd64" & os.arch!="x86_64") | vm.cpu.features ~= ".*avx.*") You can remove this platform limitation and forward the constraints to @IR rules using applyIfCPUFeatureOr ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2055741909 PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2055787414 From mchevalier at openjdk.org Wed Apr 23 12:02:50 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 23 Apr 2025 12:02:50 GMT Subject: RFR: 8352422: [ubsan] Out-of-range reported in ciMethod.cpp:917:20: runtime error: 2.68435e+09 is outside the range of representable values of type 'int' Message-ID: The double `(double)count * prof_factor * method_life / counter_life + 0.5` can overflow a 32-bit int, causing UB on casting, but in practice computing a wrong scale, probably. We just need to compare that the cast is not going to overflow. This is possible because `INT_MAX` is exactly representable in a `double`. 
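As a minimal sketch of such a check (a hypothetical helper, not the code from this PR; the name `saturating_double_to_int` and the choice to map NaN to 0 are assumptions made for illustration), a clamping conversion could look like this:

```cpp
#include <climits>
#include <cmath>

// Hypothetical helper: converts a double to int without ever performing an
// out-of-range cast, which would be undefined behavior in C++.
int saturating_double_to_int(double d) {
  if (std::isnan(d)) {
    return 0;  // assumption: NaN has no meaningful int value, so pick 0
  }
  // INT_MAX and INT_MIN are exactly representable in a double, so these
  // comparisons are exact and no rounding can push a value past the bound.
  if (d >= (double)INT_MAX) {
    return INT_MAX;
  }
  if (d <= (double)INT_MIN) {
    return INT_MIN;
  }
  return (int)d;  // in range: the cast truncates toward zero
}
```

With the value from the ubsan report, `saturating_double_to_int(2.68435e+09)` yields `INT_MAX` instead of invoking undefined behavior.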
It is also good to notice that the expression `(double)count * prof_factor * method_life / counter_life + 0.5` cannot overflow a `double`: - `count` is an int, max value = 2^31-1 < 2.2e9 - `method_life` is an int, max value < 2.2e9 - `prof_factor` is a float, max value < 3.5e38 - `counter_life` is an int, positive at this point, so min value = 1 So, the whole expression is bounded by 16.94e56 + 0.5, which is much smaller than the max value of a double (about 1.8e308). We probably would have precision issues, but it probably doesn't matter a lot. The semantics I picked here is basically `min(INT_MAX, count_d)`, so it'd always fit. Thanks, Marc ------------- Commit messages: - Check for overflow Changes: https://git.openjdk.org/jdk/pull/24824/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24824&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352422 Stats: 7 lines in 1 file changed: 5 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24824.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24824/head:pull/24824 PR: https://git.openjdk.org/jdk/pull/24824 From jbhateja at openjdk.org Wed Apr 23 12:16:48 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 23 Apr 2025 12:16:48 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v2] In-Reply-To: References: Message-ID: On Fri, 18 Apr 2025 01:36:10 GMT, erifan wrote: >> This patch optimizes the following patterns: >> For integer types: >> >> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) >> => (VectorMaskCmp src1 src2 ncond) >> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) >> => (VectorMaskCmp src1 src2 ncond) >> >> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the negative comparison of cond. 
>> >> For float and double types: >> >> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> >> cond can be eq or ne. >> >> Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option `-XX:UseSVE=2`: >> >> Benchmark Unit Before Score Error After Score Error Uplift >> testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 10266136.26 8955.008548 1.29 >> testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 1179760.772 448.031844 1.33 >> testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 2359520.803 896.305743 1.33 >> testCompareEQMaskNotInt ops/s 1787221.411 977.743935 2353952.519 960.069976 1.31 >> testCompareEQMaskNotLong ops/s 895297.1974 673.44808 1178449.02 323.804205 1.31 >> testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 4712761.965 2110.862053 1.41 >> testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 10251646.9 9486.699831 1.29 >> testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 2352855.205 1251.952546 1.39 >> testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 1177811.493 521.1229 1.37 >> testCompareGEMaskNotShort ops/s 3341860.309 1578.975338 4714008.434 1681.10365 1.41 >> testCompareGTMaskNotByte ops/s 7910823.674 2993.367032 10245063.58 9774.75138 1.29 >> testCompareGTMaskNotInt ops/s 1673393.928 3153.099431 2353654.521 1190.848583 1.4 >> testCompareGTMaskNotLong ops/s 849405.9159 2432.858159 1177952.041 359.96413 1.38 >> testCompareGTMaskNotShort ops/s 3339509.141 3339.976585 4711442.496 2673.364893 1.41 >> testCompareLEMaskNotByte ops/s 7911340.004 3114.69191 10231626.5 27134.20035 1.29 >> testCompareLEMaskNotInt ops/s 1675812.113 1340.969885 2353255.341 1452.4522 1.4 >> testCompareLEMaskNotLong ops/s 848862.8036 6564.841731 1177763.623 539.290106 1.38 >> testCompareLEMaskNotShort ops/s 3324951.54 2380.29473 4712116.251 
1544.559684 1.41 >> testCompareLTMaskNotByte ops/s 7910390.844 2630.861436 10239567.69 6487.441672 1.29 >> testCompareLTMaskNotInt ops/s 16721... > > erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge branch 'master' into JDK-8354242 > - 8354242: VectorAPI: combine vector not operation with compare > > This patch optimizes the following patterns: > For integer types: > ``` > (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) > => (VectorMaskCmp src1 src2 ncond) > (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) > => (VectorMaskCmp src1 src2 ncond) > ``` > cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the > negative comparison of cond. > > For float and double types: > ``` > (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) > => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) > (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) > => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) > ``` > cond can be eq or ne. 
> > Benchmarks on Nvidia Grace machine with 128-bit SVE2: > With option `-XX:UseSVE=2`: > ``` > Benchmark Unit Before Score Error After Score Error Uplift > testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 10266136.26 8955.008548 1.29 > testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 1179760.772 448.031844 1.33 > testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 2359520.803 896.305743 1.33 > testCompareEQMaskNotInt ops/s 1787221.411 977.743935 2353952.519 960.069976 1.31 > testCompareEQMaskNotLong ops/s 895297.1974 673.44808 1178449.02 323.804205 1.31 > testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 4712761.965 2110.862053 1.41 > testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 10251646.9 9486.699831 1.29 > testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 2352855.205 1251.952546 1.39 > testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 1177811.493 521.1229 1.37 > testCompareGEMaskNotShort ops/s 3341860.309 1578.975338 4714008.434 1681.10365 1.41 > testCompareGTMaskNotByte ops/s 7910823.674 2993.367032 10245063.58 9774.75138 1.29 > testCompareGTMaskNotInt ops/s 1673393.928 3153.099431 2353654.521 1190.848583 1.4 > testCompareGTMaskNotLong ops/s 849405.9159 2432.858159 1177952.041 359.96413 1.38 > testCompareGTMaskNotShort ops/s 3339509.141 3339.976585 4711442.496 2673.364893 1.41 > testCompareLEMaskNotByte ops/s 7911340.004... src/hotspot/share/opto/vectornode.cpp line 2234: > 2232: // XorV/XorVMask is commutative, swap VectorMaskCmp/Op_VectorMaskCast to in1. > 2233: if (in1->Opcode() != Op_VectorMaskCmp && in1->Opcode() != Op_VectorMaskCast) { > 2234: swap(in1, in2); Swapping inputs like this without refreshing GVN bookkeeping is not safe. I guess you wanted to use Node::swap_edges. 
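On the restriction quoted above that `cond` can only be eq or ne for float and double: the PR description does not spell out the reason, but it is presumably the IEEE 754 NaN behavior, where every ordered comparison involving NaN is false. A scalar sketch of one lane (the function names are hypothetical and only model the comparison, not the C2 nodes) shows that negating `lt` is not the same as `ge`, while negating `eq` is the same as `ne`:

```cpp
#include <limits>

// One lane of NOT(VectorMaskCmp lt): the mask bit flipped after the compare.
bool not_of_lt(double a, double b) { return !(a < b); }

// One lane of VectorMaskCmp ge: the candidate "negated" condition.
bool cmp_ge(double a, double b) { return a >= b; }

// One lane of NOT(VectorMaskCmp eq).
bool not_of_eq(double a, double b) { return !(a == b); }

// One lane of VectorMaskCmp ne.
bool cmp_ne(double a, double b) { return a != b; }

// Returns true if flipping lt and using ge disagree for the given inputs.
bool lt_negation_differs(double a, double b) {
  return not_of_lt(a, b) != cmp_ge(a, b);
}
```

For ordinary inputs such as `(1.0, 2.0)` both forms agree; with a NaN operand `not_of_lt` is true while `cmp_ge` is false, so only the eq/ne pair can be rewritten safely.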
src/hotspot/share/opto/vectornode.cpp line 2265: > 2263: vmcmp = new VectorMaskCastNode(phase->transform(vmcmp), vmcast_vt); > 2264: } > 2265: return vmcmp; It would be preferable if you could kindly refactor the code such that we only call VectorNode::Ideal once at return to comply with aesthetics of other idealization routines. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2055908691 PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2055911839 From amitkumar at openjdk.org Wed Apr 23 12:34:56 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 23 Apr 2025 12:34:56 GMT Subject: RFR: 8354119: Missing C2 proper allocation failure handling during initialization (during generate_uncommon_trap_blob) In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 14:04:35 GMT, Damon Fenacci wrote: > After [JDK-8347406](https://bugs.openjdk.org/browse/JDK-8347406), `OptoRuntime::generate_uncommon_trap_blob` and `OptoRuntime::generate_exception_blob` return an `UncommonTrapBlob`/`ExceptionBlob` if they succeed, `nullptr` if they don't. This is then used by the compiler to shut down gently if the code cache is full (instead of crashing). > Unfortunately, if the full code cache is reached when creating the buffer at the start of these 2 methods (when calling `CodeBuffer buffer(name, 2048, 1024);`) an empty buffer is created, which in turn prevents `masm` from being properly initialized, which then causes an access violation when writing into the blob's address when first adding `subptr` later in the method (as seen in the snippet below for `generate_uncommon_trap_blob`). > > https://github.com/openjdk/jdk/blob/3cc43b3224efdf1a3f35fff58b993027a9e1f4ad/src/hotspot/cpu/x86/runtime_x86_64.cpp#L55-L72 > > To fix this I suggest we return immediately from `OptoRuntime::generate_uncommon_trap_blob`/`OptoRuntime::generate_exception_blob` if the `buffer` creation failed. > > ### Testing > > Tier 1-3. 
> No specific regression test is added (very hard, i.a. dependent on thread scheduling. On the other hand `StartupOutput.java` might catch it rarely). s390x looks good and tier1 passed with fastdebug vm ------------- Marked as reviewed by amitkumar (Committer). PR Review: https://git.openjdk.org/jdk/pull/24549#pullrequestreview-2787126642 From dlunden at openjdk.org Wed Apr 23 12:36:59 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Wed, 23 Apr 2025 12:36:59 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v17] In-Reply-To: References: Message-ID: On Tue, 22 Apr 2025 17:09:13 GMT, Emanuel Peter wrote: >> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: >> >> Refactor and improve TestNestedSynchronize.java > > src/hotspot/share/opto/chaitin.cpp line 1648: > >> 1646: // If we fail to color and the AllStack flag is set, trigger >> 1647: // a chunk-rollover event >> 1648: if (!OptoReg::is_valid(reg) && is_allstack) { > > Control question: here we seem to have passed `-chunk`. And we used to check `chunk != 0` in lots of places. What was the significance to negative chunk? Did you think about that? Yes, I've considered this at length! Keeping track of all these `chunk` additions and subtractions as we did previously would have been infeasible with my changes, as the chunk size becomes dynamic (rather than statically known). I therefore opted to move all the "chunk" logic inside the register mask data structure. That's all the offset stuff that you've likely seen already in `regmask.hpp`. Originally for `OptoReg`s in `Select`, we had to keep track of which "space" the register was in: the space offset by `chunk` or the non-offset space. The non-offset space is what we needed if we actually wanted to index the register mask, but it is not always the "real" space (as the register mask may represent a `chunk`-offset view of the register space). 
If you look in `find_first_set` and `find_first_elem`, you see that they return `OptoReg::Bad` to indicate that a chunk change is needed. However, in `bias_color`, we returned these values added together with `chunk`. That's why we needed to apply `-chunk` at the location you refer to, to figure out if the actual value returned was `OptoReg::Bad`. After my changes, all references to `OptoReg`s in `Select` (and methods called from there) are to the real, offset space (i.e., the actual registers). Indexing register masks with these registers is now handled internally in register masks, so we do not require any external chunk arithmetic. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2055950302 From epeter at openjdk.org Wed Apr 23 12:38:59 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 23 Apr 2025 12:38:59 GMT Subject: RFR: 8327963: [Umbrella] Incorrect result of C2 compiled code since JDK-8237581 [v2] In-Reply-To: References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> Message-ID: On Thu, 10 Apr 2025 15:56:25 GMT, Roland Westrelin wrote: >> An `Initialize` node for an `Allocate` node is created with a memory >> `Proj` of adr type raw memory. In order for stores to be captured, the >> memory state out of the allocation is a `MergeMem` with slices for the >> various object fields/array element set to the raw memory `Proj` of >> the `Initialize` node. If `Phi`s need to be created during later >> transformations from this memory state, the `Phi` for a particular >> slice gets its adr type from the type of the `Proj` which is raw >> memory. If during macro expansion, the `Allocate` is found to have no >> use and so can be removed, the `Proj` out of the `Initialize` is >> replaced by the memory state on input to the `Allocate`. A `Phi` for >> some slice for a field of an object will end up with the raw memory >> state on input to the `Allocate` node. 
As a result, memory state at >> the `Phi` is incorrect and incorrect execution can happen. >> >> The fix I propose is, rather than have a single `Proj` for the memory >> state out of the `Initialize` with adr type raw memory, to use one >> `Proj` per slice added to the memory state after the `Initialize`. Each >> of the `Proj` should return the right adr type for its slice. For that >> I propose having a new type of `Proj`: `NarrowMemProj` that captures >> the right adr type. >> >> Logic for the construction of the `Allocate`/`Initialize` subgraph is >> tweaked so the right adr type captured in its own `NarrowMemProj` is >> added to the memory subgraph. Code that removes an allocation or moves >> it also has to be changed so it correctly takes the multiple memory >> projections out of the `Initialize` node into account. >> >> One tricky issue is that when EA splits types for a scalar replaceable >> `Allocate` node: >> >> 1- the adr type captured in the `NarrowMemProj` becomes out of sync >> with the type of the slices for the allocation >> >> 2- before EA, the memory state for one particular field out of the >> `Initialize` node can be used for a `Store` to the just allocated >> object or some other. So we can have a chain of `Store`s, some to >> the newly allocated object, some to some other objects, all of them >> using the state of `NarrowMemProj` out of the `Initialize`. After >> split unique types, the `NarrowMemProj` is for the slice of a >> particular allocation. So `Store`s to some other objects shouldn't >> use that memory state but the memory state before the `Allocate`. >> >> For that, I added logic to update the adr type of `NarrowMemProj` >> during split uni... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > TestIterativeEA fix @rwestrel Thanks for looking into that. 
I vaguely remember working on https://github.com/openjdk/jdk/pull/18265 and being quite confused about the way we treat memory slices. I think this seems like a good step in the right direction. Though I am a little unsure about the effect on `proj_out_or_null`, as you can see below. Maybe we can have a simple query that just checks `has_proj(TypeFunc::Memory)`? But there are also some cases where you actually use the projection, and I'm not sure if that means you might be missing some if there are multiple? Like @merykitty I wonder if there are other cases where we have similar issues with memory slices. But I guess we should tackle those separately. src/hotspot/share/opto/escape.cpp line 4123: > 4121: result = result->in(MemNode::Memory); > 4122: } > 4123: if (!is_instance && result->Opcode() == Op_NarrowMemProj) { Seems you are checking for `NarrowMemProj` and casting to it in multiple places. Why not enable the macro to do `is_...` and `as_...`? src/hotspot/share/opto/escape.cpp line 4126: > 4124: // Memory for non known instance can safely skip over a known instance allocation (that memory state doesn't access > 4125: // the result of an allocation for a known instance). > 4126: assert(result->as_Proj()->_con == TypeFunc::Memory, "a NarrowMemProj can only be a memory projection"); Can we verify that already in the `NarrowMemProj`, i.e. its constructor? 
src/hotspot/share/opto/escape.cpp line 4128: > 4126: assert(result->as_Proj()->_con == TypeFunc::Memory, "a NarrowMemProj can only be a memory projection"); > 4127: assert(toop != nullptr, ""); > 4128: Node *proj_in = result->in(0); Suggestion: Node* proj_in = result->in(0); src/hotspot/share/opto/escape.cpp line 4460: > 4458: for (DUIterator_Fast imax, i = init->fast_outs(imax); i < imax; i++) { > 4459: ProjNode* proj = init->fast_out(i)->as_Proj(); > 4460: if (proj->Opcode() == Op_NarrowMemProj) { Suggestion: ProjNode* proj = init->fast_out(i)->as_NarrowMemProj(); if (proj != nullptr) { That way you don't need to do the hacky cast below `((NarrowMemProjNode*)proj)`. src/hotspot/share/opto/escape.cpp line 4465: > 4463: if (adr_type != new_adr_type) { > 4464: uint alias_ix = _compile->get_alias_index(new_adr_type); > 4465: assert(_compile->get_general_index(alias_ix) == _compile->get_alias_index(adr_type), "new adr type should be narrowed down from existing adr type"); Suggestion: DEBUG_ONLY( uint alias_idx = _compile->get_alias_index(new_adr_type); ) assert(_compile->get_general_index(alias_idx) == _compile->get_alias_index(adr_type), "new adr type should be narrowed down from existing adr type"); src/hotspot/share/opto/escape.cpp line 4469: > 4467: ((NarrowMemProjNode*)proj)->set_adr_type(new_adr_type); > 4468: igvn->hash_insert(proj); > 4469: record_for_optimizer(proj); That seems to be the only reason why we need the `_adr_type` non constant. How bad would it be if we just created a new node? Hmm maybe not worth it. Just wanted to know if you had considered it? src/hotspot/share/opto/escape.cpp line 4742: > 4740: if (n == nullptr) { > 4741: continue; > 4742: } Could we now have multiple `NarrowMemProj`? If so, what would happen here? 
src/hotspot/share/opto/escape.cpp line 4878: > 4876: if (mem->Opcode() == Op_NarrowMemProj) { > 4877: const TypePtr* at = mem->adr_type(); > 4878: uint idx = (uint) _compile->get_alias_index(at->is_ptr()); Suggestion: uint alias_idx = (uint) _compile->get_alias_index(at->is_ptr()); for consistency with the other code src/hotspot/share/opto/escape.cpp line 4885: > 4883: } > 4884: } else { > 4885: // projection for a known allocation on a non known allocation slice: skip over the allocation The thing about "non known" sounds like we should not be able to skip it... But I guess we still know they are from unrelated slices somehow? src/hotspot/share/opto/graphKit.cpp line 3639: > 3637: set_memory(_gvn.transform(new NarrowMemProjNode(init, TypeFunc::Memory, C->get_adr_type(mark_idx))), mark_idx); > 3638: int klass_idx = C->get_alias_index(oop_type->add_offset(oopDesc::klass_offset_in_bytes())); > 3639: set_memory(_gvn.transform(new NarrowMemProjNode(init, TypeFunc::Memory, C->get_adr_type(klass_idx))), klass_idx); Hmm, so we now do have multiple projections for `TypeFunc::Memory`. That makes me a little nervous about `proj_out_or_null(TypeFunc::Memory)` elsewhere in your changes. src/hotspot/share/opto/macro.cpp line 1022: > 1020: #ifdef ASSERT > 1021: Node* mem_proj = init->proj_out_or_null(TypeFunc::Memory); > 1022: if (mem_proj != nullptr) { What happens if there are multiple? src/hotspot/share/opto/memnode.cpp line 5471: > 5469: } > 5470: } > 5471: } Can you please add a comment to the code why we need both variants? 
src/hotspot/share/opto/multnode.hpp line 117: > 115: return ProjNode::hash() + _adr_type->hash(); > 116: } > 117: virtual bool cmp(const Node &n) const { Suggestion: virtual bool cmp(const Node& n) const { src/hotspot/share/opto/multnode.hpp line 124: > 122: } > 123: public: > 124: NarrowMemProjNode(Node *src, uint con, const TypePtr* adr_type) Suggestion: NarrowMemProjNode(Node* src, uint con, const TypePtr* adr_type) src/hotspot/share/opto/multnode.hpp line 135: > 133: } > 134: virtual int Opcode() const; > 135: }; Would it make sense to have these overridden? Just so you can print the `_adr_type` :) #ifndef PRODUCT virtual void dump_spec(outputStream *st) const; virtual void dump_compact_spec(outputStream *st) const; #endif test/hotspot/jtreg/compiler/macronodes/TestEliminationOfAllocationWithoutUse.java line 26: > 24: /* > 25: * @test > 26: * @bug 8327012 Suggestion: * @bug 8327012 8327963 I would add the current bug id as well. test/hotspot/jtreg/compiler/macronodes/TestEliminationOfAllocationWithoutUse.java line 39: > 37: * @run main/othervm -Xcomp > 38: * -XX:CompileCommand=compileonly,compiler.macronodes.TestEliminationOfAllocationWithoutUse::test* > 39: * compiler.macronodes.TestEliminationOfAllocationWithoutUse Would a run without Xcomp make sense? test/hotspot/jtreg/compiler/macronodes/TestInitializingStoreCapturing.java line 26: > 24: /* > 25: * @test > 26: * @bug 8327012 Suggestion: * @bug 8327012 8327963 ------------- Changes requested by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24570#pullrequestreview-2787011657 PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2055891189 PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2055892604 PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2055893670 PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2055900151 PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2055906047 PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2055909093 PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2055911598 PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2055914288 PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2055919288 PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2055923553 PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2055926893 PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2055933278 PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2055934726 PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2055875634 PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2055878130 PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2055939717 PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2055940393 PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2055941586 From epeter at openjdk.org Wed Apr 23 12:38:59 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 23 Apr 2025 12:38:59 GMT Subject: RFR: 8327963: [Umbrella] Incorrect result of C2 compiled code since JDK-8237581 In-Reply-To: References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> Message-ID: On Mon, 14 Apr 2025 15:35:17 GMT, Roland Westrelin wrote: >> It would be great if we have union memory slices 
for this. > >> It would be great if we have union memory slices for this. > > Something like that would fix it but it would be trickier to get right that this point fix, I think. Do you see any other use for it? @rwestrel You should update the bug title, it just sounds too generic. It is really helpful when the "blame" history gives you a helpful comment rather than `Umbrella` or `incorrect result` or some bug numbers. I suggest `Use NarrowMemProj to project / split memory slice after Initialize`, but you probably have an even better idea ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24570#issuecomment-2824145457 From epeter at openjdk.org Wed Apr 23 12:38:59 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 23 Apr 2025 12:38:59 GMT Subject: RFR: 8327963: [Umbrella] Incorrect result of C2 compiled code since JDK-8237581 [v2] In-Reply-To: References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> Message-ID: On Wed, 23 Apr 2025 12:13:08 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> TestIterativeEA fix > > src/hotspot/share/opto/escape.cpp line 4878: > >> 4876: if (mem->Opcode() == Op_NarrowMemProj) { >> 4877: const TypePtr* at = mem->adr_type(); >> 4878: uint idx = (uint) _compile->get_alias_index(at->is_ptr()); > > Suggestion: > > uint alias_idx = (uint) _compile->get_alias_index(at->is_ptr()); > > for consistency with the other code Well, actually it is already being used inconsistently... hmm. More expressive may still be better. Up to you. 
> src/hotspot/share/opto/multnode.hpp line 117: > >> 115: return ProjNode::hash() + _adr_type->hash(); >> 116: } >> 117: virtual bool cmp(const Node &n) const { > > Suggestion: > > virtual bool cmp(const Node& n) const { Subjective, up to you > src/hotspot/share/opto/multnode.hpp line 135: > >> 133: } >> 134: virtual int Opcode() const; >> 135: }; > > Would it make sense to have these overridden? Just so you can print the `_adr_type` :) > > #ifndef PRODUCT > virtual void dump_spec(outputStream *st) const; > virtual void dump_compact_spec(outputStream *st) const; > #endif Ah, or does it already get printed from the `adr_type()` i.e. the virtual method? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2055915472 PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2055935253 PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2055936793 From epeter at openjdk.org Wed Apr 23 12:38:59 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 23 Apr 2025 12:38:59 GMT Subject: RFR: 8327963: [Umbrella] Incorrect result of C2 compiled code since JDK-8237581 [v2] In-Reply-To: References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> Message-ID: On Wed, 23 Apr 2025 12:26:43 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/multnode.hpp line 135: >> >>> 133: } >>> 134: virtual int Opcode() const; >>> 135: }; >> >> Would it make sense to have these overridden? Just so you can print the `_adr_type` :) >> >> #ifndef PRODUCT >> virtual void dump_spec(outputStream *st) const; >> virtual void dump_compact_spec(outputStream *st) const; >> #endif > > Ah, or does it already get printed from the `adr_type()` i.e. the virtual method? Hmm, no I don't think so... right? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2055938496 From epeter at openjdk.org Wed Apr 23 13:04:48 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 23 Apr 2025 13:04:48 GMT Subject: RFR: 8351623: VectorAPI: Refactor subword gather load and add SVE implementation In-Reply-To: References: Message-ID: On Thu, 17 Apr 2025 01:42:22 GMT, Xiaohong Gong wrote: >> ### Summary: >> [JDK-8318650](http://java-service.client.nvidia.com/?q=8318650) added the hotspot intrinsifying of subword gather load APIs for X86 platforms [1]. This patch aims at implementing the equivalent functionality for AArch64 SVE platform. In addition to the AArch64 backend support, this patch also refactors the API implementation in Java side and the compiler mid-end part to make the operations more efficient and maintainable across different architectures. >> >> ### Background: >> Vector gather load APIs load values from memory addresses calculated by adding a base pointer to integer indices stored in an int array. SVE provides native vector gather load instructions for byte/short types using an int vector saving indices (see [2][3]). >> >> The number of loaded elements must match the index vector's element count. Since int elements are 4/2 times larger than byte/short elements, and given `MaxVectorSize` constraints, the operation may need to be split into multiple parts. >> >> Using a 128-bit byte vector gather load as an example, there are four scenarios with different `MaxVectorSize`: >> >> 1. `MaxVectorSize = 16, byte_vector_size = 16`: >> - Can load 4 indices per vector register >> - So can finish 4 bytes per gather-load operation >> - Requires 4 times of gather-loads and final merge >> Example: >> ``` >> byte[] arr = [a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, ...] 
>> int[] idx = [3, 2, 4, 1, 5, 7, 5, 2, 0, 6, 7, 1, 15, 10, 11, 9] >> >> 4 gather-load: >> idx_v1 = [1 4 2 3] gather_v1 = [0000 0000 0000 becd] >> idx_v2 = [2 5 7 5] gather_v2 = [0000 0000 0000 cfhf] >> idx_v3 = [1 7 6 0] gather_v3 = [0000 0000 0000 bhga] >> idx_v4 = [9 11 10 15] gather_v4 = [0000 0000 0000 jlkp] >> merge: v = [jlkp bhga cfhf becd] >> ``` >> >> 2. `MaxVectorSize = 32, byte_vector_size = MaxVectorSize / 2`: >> - Can load 8 indices per vector register >> - So can finish 8 bytes per gather-load operation >> - Requires 2 times of gather-loads and merge >> Example: >> ``` >> byte[] arr = [a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, ...] >> int[] index = [3, 2, 4, 1, 5, 7, 5, 2, 0, 6, 7, 1, 15, 10, 11, 9] >> >> 2 gather-load: >> idx_v1 = [2 5 7 5 1 4 2 3] >> idx_v2 = [9 11 10 15 1 7 6 0] >> gather_v1 = [0000 0000 0000 0000 0000 0000 cfhf becd] >> gather_v2 = [0000 0000 0000 0000 0000 0000 jlkp bhga] >> merge: v = [0000 0000 0000 0000 jlkp bhga cfhf becd] >> ``` >> >> 3. `MaxVectorSize = 64, byte_v... > > Hi @jatin-bhateja , could you please help take a look at this PR especially the X86 part? Thanks a lot! > Hi @RealFYang , could you please help review the RVV part? Thanks a lot! @XiaohongGong I had a quick look at your changes and PR description. I wonder if you could split some of the refactoring into a separate PR? That would make it easier to review. Currently, you basically have x64 changes, aarch64 changes, Java library changes, and C2 changes. That's a lot at once. And it would basically require the review from a lot of different people at once. Splitting would make it easier to review, less work for the reviewer. It would ensure everybody can look at a smaller change set, and that would also increase the quality of the code after review, I think. What do you think? 
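The splitting scheme in the quoted PR description can be modelled in plain scalar Java. This is only an illustrative sketch and not the code from the patch: the class `SubwordGatherSketch` and its methods are hypothetical names, and the only assumption taken from the description is that one gather step can consume at most `MaxVectorSize / 4` int indices.

```java
final class SubwordGatherSketch {
    // Scalar model of a single gather step: loads arr[idx[from + k]] for k in [0, count).
    static byte[] gatherPart(byte[] arr, int[] idx, int from, int count) {
        byte[] part = new byte[count];
        for (int k = 0; k < count; k++) {
            part[k] = arr[idx[from + k]];
        }
        return part;
    }

    // Assumption from the description: an index vector holds maxVectorSizeBytes / 4 ints,
    // so a gather of `lanes` bytes is split into steps of that many elements.
    static int indicesPerStep(int lanes, int maxVectorSizeBytes) {
        return Math.min(lanes, maxVectorSizeBytes / Integer.BYTES);
    }

    // Number of gather steps needed before the final merge.
    static int steps(int lanes, int maxVectorSizeBytes) {
        int per = indicesPerStep(lanes, maxVectorSizeBytes);
        return (lanes + per - 1) / per;
    }

    // Runs every partial gather and merges the parts into one result, lowest lanes first.
    static byte[] gatherBytes(byte[] arr, int[] idx, int lanes, int maxVectorSizeBytes) {
        int per = indicesPerStep(lanes, maxVectorSizeBytes);
        byte[] result = new byte[lanes];
        for (int from = 0; from < lanes; from += per) {
            int count = Math.min(per, lanes - from);
            byte[] part = gatherPart(arr, idx, from, count);
            System.arraycopy(part, 0, result, from, count);
        }
        return result;
    }

    // Replays the example from the description: arr = [a..p] and the 16 indices shown there.
    static String demo(int maxVectorSizeBytes) {
        byte[] arr = new byte[16];
        for (int i = 0; i < arr.length; i++) {
            arr[i] = (byte) ('a' + i);
        }
        int[] idx = {3, 2, 4, 1, 5, 7, 5, 2, 0, 6, 7, 1, 15, 10, 11, 9};
        byte[] v = gatherBytes(arr, idx, 16, maxVectorSizeBytes);
        return new String(v, java.nio.charset.StandardCharsets.US_ASCII);
    }
}
```

Note that the description prints vectors with the highest lane on the left, so the first part `[becd]` there corresponds to the first four result bytes `d, c, e, b` here. The number of steps changes with `MaxVectorSize`, but the merged result does not.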
------------- PR Comment: https://git.openjdk.org/jdk/pull/24679#issuecomment-2824229233 From rgiulietti at openjdk.org Wed Apr 23 13:13:48 2025 From: rgiulietti at openjdk.org (Raffaello Giulietti) Date: Wed, 23 Apr 2025 13:13:48 GMT Subject: RFR: 8355223: Improve documentation on @IntrinsicCandidate [v4] In-Reply-To: References: Message-ID: On Tue, 22 Apr 2025 21:57:19 GMT, Chen Liang wrote: >> In offline discussion, we noted that the documentation on this annotation does not recommend minimizing the intrinsified section and moving whatever can be done in Java to Java; thus I prepared this documentation update, to shrink a "TLDR" essay to something concise for readers, such as pointing to that list at `vmIntrinsics.hpp` instead of "a list". > > Chen Liang has updated the pull request incrementally with one additional commit since the last revision: > > Shorter first sentence src/java.base/share/classes/jdk/internal/vm/annotation/IntrinsicCandidate.java line 70: > 68: * accessed by the current thread: shared fields must be read into local > 69: * variables, and shared arrays must be copied to an exclusive copy, to ensure > 70: * each shared location (a field or an array component) is accessed exactly once. Suggestion: * each shared location (a field or an array component) is accessed at most once. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24777#discussion_r2056018522 From epeter at openjdk.org Wed Apr 23 13:20:51 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 23 Apr 2025 13:20:51 GMT Subject: RFR: 8346552: C2: Add IR tests to check that Predicate cloning in Loop Unswitching works as expected [v7] In-Reply-To: References: Message-ID: <2C7Wv2ImcnJRliaT6QfA1ijlT65W9eYh9t5_lSdBUBw=.ca592318-86ca-40fe-bd67-3426e88a0286@github.com> On Tue, 22 Apr 2025 13:28:58 GMT, Manuel Hässig wrote: >> # Issue Summary >> >> When a loop is unswitched, all parse predicates from the original loop must be cloned to the second loop that is created. 
Forgetting to clone a parse predicate is a common error during development on loop unswitching code that we could not catch previously. Since we have the IR-framework now, this PR introduces a test to catch this error. >> >> # Changes >> >> The main contribution of this PR is a test to ensure that all predicates have been cloned into an unswitched loop. >> Working on this PR revealed that Loop Limit Check Parse Predicates are erroneously cloned when unswitching counted loops. That is because we know that the loop variable increments monotonically in counted loops, so a loop limit check at the loop selector is sufficient for both unswitched loops. However, for uncounted loops we do not know anything about the behavior of the loop variables and they could behave differently in either of the unswitched loops. Hence, cloning the loop limit check is needed in that case. This PR also removes the superfluous cloning. >> >> All changes summarized: >> - add `OPAQUE_TEMPLATE_ASSERTION_PREDICATE_NODE` to the IR-framework, >> - add some missing parse predicate nodes to the IR-framework, >> - change the output of the labels of parse predicate nodes in the ideal graph so they can be recognized reliably by the IR-framework (the main problem was that `Loop ` is a prefix of `Loop Limit Check` that is hard to distinguish with spaces instead of underlines), >> - rework the regex for detecting parse predicates in the IR-framework, >> - add a test to ensure parse predicates are cloned into unswitched loops, >> - only clone loop limit checks when unswitching uncounted loops, >> - add a test which checks that loop limit checks are not cloned when unswitching counted loops, >> - add a test which checks that loop limit checks are cloned when unswitching uncounted loops. 
>> >> # Testing >> >> - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14266369099) >> - tier1 through tier3 plus Oracle internal testing > > Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: > > Test with different phase Looks reasonable to me too :) ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24479#pullrequestreview-2787281758 From liach at openjdk.org Wed Apr 23 14:12:29 2025 From: liach at openjdk.org (Chen Liang) Date: Wed, 23 Apr 2025 14:12:29 GMT Subject: RFR: 8355223: Improve documentation on @IntrinsicCandidate [v5] In-Reply-To: References: Message-ID: <4cfCydBqbJcxVuRmEf7W4ehFuz_MMf_zc4k5dxpQoCU=.cc4aaf74-5a6a-4213-857b-6b5f69fb63d1@github.com> > In offline discussion, we noted that the documentation on this annotation does not recommend minimizing the intrinsified section and moving whatever can be done in Java to Java; thus I prepared this documentation update, to shrink a "TLDR" essay to something concise for readers, such as pointing to that list at `vmIntrinsics.hpp` instead of "a list". 
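The guidance discussed in this thread (keep the intrinsified part small, do the rest in Java, and read each shared location at most once) might look roughly like the sketch below. It is an illustration only: `SharedReadSketch` and `sum0` are hypothetical names, and the real `@IntrinsicCandidate` annotation is JDK-internal, so it appears here only as a comment.

```java
final class SharedReadSketch {
    // Shared, mutable state: another thread may replace either field at any time.
    private volatile byte[] data;
    private volatile int length;

    SharedReadSketch(byte[] data, int length) {
        this.data = data;
        this.length = length;
    }

    int sum() {
        // Read each shared field once into a local, so the check below and the
        // worker method are guaranteed to see the same values.
        byte[] d = this.data;
        int len = this.length;
        // Validation stays in Java, outside the part that could be intrinsified.
        java.util.Objects.checkFromIndexSize(0, len, d.length);
        return sum0(d, len);
    }

    // Hypothetical stand-in for the small method that would carry @IntrinsicCandidate
    // inside the JDK. It only touches its arguments, never the shared fields.
    private static int sum0(byte[] d, int len) {
        int s = 0;
        for (int i = 0; i < len; i++) {
            s += d[i];
        }
        return s;
    }
}
```

Because `sum0` receives already-validated locals, a replacement of that method by a hand-written intrinsic would not need to repeat the bounds check or re-read the fields.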
Chen Liang has updated the pull request incrementally with one additional commit since the last revision: Update src/java.base/share/classes/jdk/internal/vm/annotation/IntrinsicCandidate.java Co-authored-by: Raffaello Giulietti ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24777/files - new: https://git.openjdk.org/jdk/pull/24777/files/d7b652e3..24ed1cc1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24777&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24777&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24777.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24777/head:pull/24777 PR: https://git.openjdk.org/jdk/pull/24777 From erikj at openjdk.org Wed Apr 23 14:26:57 2025 From: erikj at openjdk.org (Erik Joelsson) Date: Wed, 23 Apr 2025 14:26:57 GMT Subject: RFR: 8350209: Preserve adapters in AOT cache [v7] In-Reply-To: References: Message-ID: On Tue, 22 Apr 2025 23:05:13 GMT, Vladimir Kozlov wrote: >> [JEP 483](https://bugs.openjdk.org/browse/JDK-8315737) preserves class information in AOT cache which helps Java startup performance. >> >> We should also preserve adapters (i2c, c2i) to further improve performance of class linking where adapters are generated. >> >> Short running Java application can see several percents improvement. 
I got 6% improvement when ran `HelloWorld.java` on Linux-x64 Ice Lake CPU (2.5Ghz): >> >> >> (perf stat -r 100 java -XX:AOTCache=app.aotcache -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed >> 0.0299401 +- 0.0000504 seconds time elapsed ( +- 0.17% ) >> >> (perf stat -r 100 java -XX:AOTCache=app.aotcache -XX:+UnlockDiagnosticVMOptions -XX:-AOTAdapterCaching -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed >> 0.0318654 +- 0.0000535 seconds time elapsed ( +- 0.17% ) >> >> >> New diagnostic flags are introduced (use `-XX:+UnlockDiagnosticVMOptions` to unlock them): >> >> >> -XX:+AOTAdapterCaching - Enable or disable saving and restoring i2c2i adapters >> -XX:AOTCodeMaxSize=10*M - buffer size in bytes for AOT code caching >> -XX:+AbortVMOnAOTCodeFailure - Abort VM on the first occurrence of AOT code caching failure >> >> By default `AOTAdapterCaching` is `false` and enabled ergonomically when `-XX:AOTCache` is specified. >> This flag is ignored when `AOTCache` is not specified. >> >> To use AOT adapters follow process described in JEP 483: >> >> >> java -XX:AOTMode=record -XX:AOTConfiguration=app.aotconf -cp app.jar App >> java -XX:AOTMode=create -XX:AOTConfiguration=app.aotconf -XX:AOTCache=app.aot -cp app.jar >> java -XX:AOTCache=app.aot -cp app.jar App >> >> >> There are several new UL flag combinations to trace the AOT code caching process: >> >> >> -Xlog:aot+codecache+init -Xlog:aot+codecache+exit -Xlog:aot+codecache+stubs >> >> >> @ashu-mehra is main author of changes. He implemented adapters caching. >> I did main framework (`AOTCodeCache` class) for saving and loading AOT code. >> >> Tested tier1-6,10, which includes tests with `AOTClassLinking` enabled. Also Xcomp,stress and JCK. > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Fix message Build change looks trivially good. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24740#issuecomment-2824495947 From vpaprotski at openjdk.org Wed Apr 23 16:24:40 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Wed, 23 Apr 2025 16:24:40 GMT Subject: RFR: 8354473: Incorrect results for compress/expand tests with -XX:+EnableX86ECoreOpts In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 10:37:17 GMT, Emanuel Peter wrote: > @vpaprotsk Can you please give a little more details about what exactly went wrong here, and why your change is correct? @eme64 Thanks for looking. Point form in attempt to be concise: - Jatin brought this to my attention, we weren't sure whose code was at fault (i.e. I wrote the blend emulation, he wrote the compress_expand) and I got to the investigation first (i.e. see https://bugs.openjdk.org/browse/JDK-8354473) - The mask for vblendvps instruction; actual instruction only cares about the MSB but for emulation we must have the mask to be either `FFF..FF` or `000..00`. In many places blend is used, this is already the case, so no need to recompute the mask. That's why the flag is provided (i.e. optimization). - (Without fully understanding the entirety of compress_expand), it appears to me that in this function the mask in `permv` _must_ be computed explicitly. That's why the flag is changed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24645#issuecomment-2824849699 From duke at openjdk.org Wed Apr 23 16:30:49 2025 From: duke at openjdk.org (Mohamed Issa) Date: Wed, 23 Apr 2025 16:30:49 GMT Subject: RFR: 8348638: Performance regression in Math.tanh [v7] In-Reply-To: References: Message-ID: > The changes described below are meant to resolve the performance regression introduced by the **x86_64 tanh** double precision floating point scalar intrinsic in #20657. 
Additionally, new input values are provided for the existing micro-benchmark and a new micro-benchmark is included to check the performance of specific input value ranges to help prevent regressions in the future. > > 1. Check and handle high magnitude input values before those in other ranges. If found, **+/- 1** is returned almost immediately without having to go through too many computations or branches. > 2. Reduce the lower bound of the input range that triggers a quick **+/- 1** return from **|x| >= 32** to **|x| >= 22**. This new endpoint is the exact value required for correctness that's used by the original OpenJDK implementation. > > The results of all tests posted below were captured with an [Intel® Xeon 6761P](https://www.intel.com/content/www/us/en/products/sku/241842/intel-xeon-6761p-processor-336m-cache-2-50-ghz/specifications.html) using [OpenJDK v25-b15](https://github.com/openjdk/jdk/releases/tag/jdk-25%2B15) as the baseline version. > > For the first set of performance data collected with the new built-in **tanhRange** micro-benchmark, see the table below. Each result is the mean of 8 individual runs, and the input ranges used match those in the bug report with two additional ones included. In the high value scenarios (100, 1000, 10000, 100000), the changes increase throughput values over the baseline. Also, there is a small negative impact to the low value (1, 2, 10, 20) scenarios. > > | Input range(s) | Baseline (ops/s) | Change (ops/s) | Change vs baseline (%) | > | :-------------------: | :----------------: | :----------------: | :------------------------: | > | [-1, 1] | 22671 | 22190 | -2.12 | > | [-2, 2] | 22680 | 22191 | -2.16 | > | [-10, 10] | 22683 | 22149 | -2.35 | > | [-20, 20] | 22694 | 22183 | -2.25 | > | [-100, 100] | 29806 | 33675 | +12.98 | > | [-1000, 1000] | 46747 | 49179 | +5.20 | > | [-10000, 10000] | 50579 | 52070 ... 
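The two points in the quoted description (test the saturated range first, and use 22 as the bound) can be expressed as a small Java sketch. This is not the x86_64 intrinsic from the PR: `TanhSaturationSketch` is a hypothetical name, and `StrictMath.tanh` merely stands in for the slower general path.

```java
final class TanhSaturationSketch {
    // Bound taken from the description above: |x| >= 22 returns +/- 1 directly.
    static final double SATURATION_BOUND = 22.0;

    static double tanh(double x) {
        // Checked first so large inputs skip all other branches. NaN fails this
        // comparison and therefore falls through to the general path.
        if (Math.abs(x) >= SATURATION_BOUND) {
            return Math.copySign(1.0, x);
        }
        return generalPath(x);
    }

    // Stand-in for the remaining range handling; an assumption for illustration only.
    static double generalPath(double x) {
        return StrictMath.tanh(x);
    }
}
```

The ordering is the point of the sketch: inputs drawn from wide ranges such as [-1000, 1000] mostly hit the first branch, which is consistent with the throughput gains reported for the high value scenarios.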
Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision: Make regular tanh benchmark inputs constant values ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23889/files - new: https://git.openjdk.org/jdk/pull/23889/files/cd6e1bab..64abe5f7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23889&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23889&range=05-06 Stats: 6 lines in 1 file changed: 2 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/23889.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23889/head:pull/23889 PR: https://git.openjdk.org/jdk/pull/23889 From duke at openjdk.org Wed Apr 23 16:30:51 2025 From: duke at openjdk.org (Mohamed Issa) Date: Wed, 23 Apr 2025 16:30:51 GMT Subject: RFR: 8348638: Performance regression in Math.tanh [v6] In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 09:08:54 GMT, Jatin Bhateja wrote: >> Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision: >> >> Restructure tanh micro-benchmarks > > test/micro/org/openjdk/bench/java/lang/MathBench.java line 68: > >> 66: >> 67: @Param({"-2.0", "-1.0", "-0.5", "-0.1", "0.0", "0.1", "0.5", "1.0", "2.0"}) >> 68: public double tanhConstInput; > > The field is not a static final. So inputs to tanh will not be constant values. Thanks for pointing this out. They should now be constant values. If there are constants required, please let me know. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23889#discussion_r2056436474 From kxu at openjdk.org Wed Apr 23 16:57:54 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Wed, 23 Apr 2025 16:57:54 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v7] In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 08:48:45 GMT, Emanuel Peter wrote: >> @eme64 Could you please take a look at this if you have some time? Thanks! 
> > @tabjy Do you want us to review again? We generally wait for a ping, otherwise we assume you are still working on it ;) Hello @eme64. I pinged you in [an in-line review](https://github.com/openjdk/jdk/pull/23506#discussion_r2042974649). Could you please provide some comments on this assertion? This is currently blocking my progress and breaking the build. Thank you very much! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23506#issuecomment-2824943696 From dhanalla at openjdk.org Wed Apr 23 17:03:57 2025 From: dhanalla at openjdk.org (Dhamoder Nalla) Date: Wed, 23 Apr 2025 17:03:57 GMT Subject: RFR: 8315916: assert(C->live_nodes() <= C->max_node_limit()) failed: Live Node limit exceeded [v7] In-Reply-To: References: Message-ID: <5JpWWMlRP-o60KZI9bU5bMq-dJePHvnKdUgigCfwbfo=.c5545951-3f36-43da-b082-79a3a00ac6c0@github.com> On Tue, 22 Apr 2025 14:49:39 GMT, Dhamoder Nalla wrote: >>> @dhanalla Are you still making changes or is this ready to review? (if not ready just make it a draft ;) ) >> >> @eme64, This is ready for review. > >> @dhanalla Do you want us to continue reviewing? It is usually good to ping people again after making changes. Otherwise, we don't know if you are still working on it and we should wait. > @eme64 yes, please review. > @dhanalla I see that you have had a conversation with @chhagedorn here, where you explained more details about what exactly goes wrong. Can you please update the PR description with these details? Generally, that makes it much easier to review, then the reviewers don't need to read through the whole conversation and figure out what is now stale (things you already applied) and what is still an active conversation. While you are at it, you can also update the description on JIRA. You're right that in the current implementation, we begin the scalarization process and only bail out once the live node count has already exceeded the limit. 
At that point, the graph is indeed partially transformed, which is why we fall back to recompilation without EA to ensure a safe and consistent compilation state. Accurately predicting the number of nodes before transformation is difficult due to the variety of types and structures involved: each element can lead to multiple nodes (e.g., phi nodes, loads/stores, etc.), and the graph can grow non-linearly depending on how the array is used. However, I agree that giving up entirely on EA just because of one large array seems like an overly conservative fallback, especially if the rest of the method would still benefit from EA. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20504#issuecomment-2824957042 From dhanalla at openjdk.org Wed Apr 23 17:03:57 2025 From: dhanalla at openjdk.org (Dhamoder Nalla) Date: Wed, 23 Apr 2025 17:03:57 GMT Subject: RFR: 8315916: assert(C->live_nodes() <= C->max_node_limit()) failed: Live Node limit exceeded [v7] In-Reply-To: <5JpWWMlRP-o60KZI9bU5bMq-dJePHvnKdUgigCfwbfo=.c5545951-3f36-43da-b082-79a3a00ac6c0@github.com> References: <5JpWWMlRP-o60KZI9bU5bMq-dJePHvnKdUgigCfwbfo=.c5545951-3f36-43da-b082-79a3a00ac6c0@github.com> Message-ID: On Wed, 23 Apr 2025 17:00:03 GMT, Dhamoder Nalla wrote: > > @dhanalla I see that you have had a conversation with @chhagedorn here, where you explained more details about what exactly goes wrong. Can you please update the PR description with these details? Generally, that makes it much easier to review, then the reviewers don't need to read through the whole conversation and figure out what is now stale (things you already applied) and what is still an active conversation. While you are at it, you can also update the description on JIRA. > > You're right that in the current implementation, we begin the scalarization process and only bail out once the live node count has already exceeded the limit. 
At that point, the graph is indeed partially transformed, which is why we fall back to recompilation without EA to ensure a safe and consistent compilation state. Accurately predicting the number of nodes before transformation is difficult due to the variety of types and structures involved: each element can lead to multiple nodes (e.g., phi nodes, loads/stores, etc.), and the graph can grow non-linearly depending on how the array is used. However, I agree that giving up entirely on EA just because of one large array seems like an overly conservative fallback, especially if the rest of the method would still benefit from EA. @eme64 If this answers your question, this PR is ready for review ------------- PR Comment: https://git.openjdk.org/jdk/pull/20504#issuecomment-2824960375 From kvn at openjdk.org Wed Apr 23 17:17:43 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 23 Apr 2025 17:17:43 GMT Subject: RFR: 8350209: Preserve adapters in AOT cache [v7] In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 14:24:20 GMT, Erik Joelsson wrote: >> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix message > > Build change looks trivially good. Thank you, @erikj79 ------------- PR Comment: https://git.openjdk.org/jdk/pull/24740#issuecomment-2824992535 From shade at openjdk.org Wed Apr 23 17:26:36 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 23 Apr 2025 17:26:36 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v5] In-Reply-To: References: Message-ID: <9ZFEqmXrFwO-bYV3AC8JAg_B8f0HGDzzKLoMH2z9CAI=.1f4de885-d63a-41e8-a02e-2779007777ca@github.com> > [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the `method*` in `CompileTask` are still valid before using them. 
Still a noble goal, so we keep trying to do this. > > The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. > > This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. > > It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. 
> > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Fix VMStructs ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24018/files - new: https://git.openjdk.org/jdk/pull/24018/files/2ec579ca..63650fab Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=03-04 Stats: 3 lines in 2 files changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24018.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24018/head:pull/24018 PR: https://git.openjdk.org/jdk/pull/24018 From shade at openjdk.org Wed Apr 23 17:26:37 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 23 Apr 2025 17:26:37 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v2] In-Reply-To: References: <2HQ4RI4tsr1vs81DbkYw7J7omhy1EnEoatZENNTttpg=.243a25be-71eb-486b-8c04-29295bcea9b9@github.com> Message-ID: On Wed, 23 Apr 2025 11:29:34 GMT, Aleksey Shipilev wrote: > SA needs fixes now, but I'll test how well this works on other tests. Actually, I can just purge `CompileTask.java`: https://github.com/openjdk/jdk/pull/24832 I see that async-profiler uses the `CompileTask::_method` field directly, I think to see what compiler threads are up to. So it needs to be fixed after this PR lands, and it would dereference through the newly added handle. Luckily, I think that access only happens when compilation is already running, and `method*` is guaranteed to be alive. Paging @apangin for visibility. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24018#issuecomment-2825010041 From shade at openjdk.org Wed Apr 23 17:31:07 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 23 Apr 2025 17:31:07 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v6] In-Reply-To: References: Message-ID: > [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. > > The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. > > This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. > > It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. 
> > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Allow UMH::_method access from VMStructs ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24018/files - new: https://git.openjdk.org/jdk/pull/24018/files/63650fab..91d38ff1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=04-05 Stats: 4 lines in 2 files changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24018.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24018/head:pull/24018 PR: https://git.openjdk.org/jdk/pull/24018 From shade at openjdk.org Wed Apr 23 17:33:48 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 23 Apr 2025 17:33:48 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v6] In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 17:31:07 GMT, Aleksey Shipilev wrote: >> [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. >> >> The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. 
This takes internal `OopStorage` locks, and thus is slow. >> >> This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. >> >> It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Allow UMH::_method access from VMStructs I re-ran testing, and it looks green. So we can start polishing this thing for eventual integration. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24018#issuecomment-2825030285 From cjplummer at openjdk.org Wed Apr 23 18:03:43 2025 From: cjplummer at openjdk.org (Chris Plummer) Date: Wed, 23 Apr 2025 18:03:43 GMT Subject: RFR: 8355432: Remove CompileTask from SA In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 17:14:17 GMT, Aleksey Shipilev wrote: > A lot of SA infrastructure was added to support SA compiler replay with [JDK-7088955](https://bugs.openjdk.org/browse/JDK-7088955). With [JDK-8315488](https://bugs.openjdk.org/browse/JDK-8315488), we got rid from the most of it. `CompileTask` seems to be left behind. Nothing uses it in SA now. 
> > Now, for Leyden, we want to massage `CompileTask` for better performance and reliability ([JDK-8231269](https://bugs.openjdk.org/browse/JDK-8231269)), and keeping `CompileTask` in SA would require us to implement a whole bunch of complicated, but unnecessary code. > > So, it would be good to purge `CompileTask` from SA. > > Note that I left the related `vmStructs` definitions, because async-profiler uses those; I think to see which methods current compiler is compiling. That use looks safe, as it polls the task from the already set up ciEnv. async-profiler would need to re-adjust after [JDK-8231269](https://bugs.openjdk.org/browse/JDK-8231269) makes relevant changes in `vmStructs`. This PR frees us from doing the same thing in SA. Looks good. ------------- Marked as reviewed by cjplummer (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24832#pullrequestreview-2788288929 From lmesnik at openjdk.org Wed Apr 23 18:07:55 2025 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Wed, 23 Apr 2025 18:07:55 GMT Subject: RFR: 8354284: Add more compiler test folders to tier1 runs In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 08:44:04 GMT, Marc Chevalier wrote: > Some folders in jtreg/compiler have been reported not to be run in any tier, while tier1 was probably intended, but the tier definition was mistakenly not updated. I've checked which folders are not referenced into `TEST.groups`. > > The unmentioned ones: > - `ccp` > - `ciReplay` > - `ciTypeFlow` > - `compilercontrol` > - `debug` > - `oracle` > - `predicates` > - `print` > - `relocations` > - `sharedstubs` > - `splitif` > - `tiered` > - `whitebox` > > And those, that are not test folders: > - `lib` > - `patches` > - `testlibraries` > > I'm adding `ccp`, `ciTypeFlow`, `predicates`, `sharedstubs` and `splitif` to tier1. > > The other folders seems to have been around for very long (since at least mid-2021). It's not clear how meaningful it'd be to add them/what the intent from them was. 
I've rather focused on the recently(-ish) added folders, that one forgot to put in a tier when adding it. > > Feel free to tell if other folders should be included (and in which tier). > > Thanks, > Marc The tier3_compiler includes all tests that are not in tier1 and tier2. The goal is to keep tier1 minimal. Tier1 shouldn't include long tests. Can you please confirm that newly added test groups don't have tests with custom timeouts and overall time hasn't increased significantly. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24817#issuecomment-2825109539 From lmesnik at openjdk.org Wed Apr 23 18:13:46 2025 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Wed, 23 Apr 2025 18:13:46 GMT Subject: RFR: 8355432: Remove CompileTask from SA In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 17:14:17 GMT, Aleksey Shipilev wrote: > A lot of SA infrastructure was added to support SA compiler replay with [JDK-7088955](https://bugs.openjdk.org/browse/JDK-7088955). With [JDK-8315488](https://bugs.openjdk.org/browse/JDK-8315488), we got rid from the most of it. `CompileTask` seems to be left behind. Nothing uses it in SA now. > > Now, for Leyden, we want to massage `CompileTask` for better performance and reliability ([JDK-8231269](https://bugs.openjdk.org/browse/JDK-8231269)), and keeping `CompileTask` in SA would require us to implement a whole bunch of complicated, but unnecessary code. > > So, it would be good to purge `CompileTask` from SA. > > Note that I left the related `vmStructs` definitions, because async-profiler uses those; I think to see which methods current compiler is compiling. That use looks safe, as it polls the task from the already set up ciEnv. async-profiler would need to re-adjust after [JDK-8231269](https://bugs.openjdk.org/browse/JDK-8231269) makes relevant changes in `vmStructs`. This PR frees us from doing the same thing in SA. Marked as reviewed by lmesnik (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/24832#pullrequestreview-2788313616 From vlivanov at openjdk.org Wed Apr 23 18:15:10 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 23 Apr 2025 18:15:10 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v12] In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 08:43:47 GMT, Hamlin Li wrote: >> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMathLibrary.java line 288: >> >>> 286: IntFunction> implSupplier, >>> 287: V v) { >>> 288: var entry = lookup(op, opc, vspecies, implSupplier); >> >> Seems there is another issue for riscv here. >> If the rvv extension is not supported on the running machine, it will still generate the code using rvv, this should lead to a crash at runtime? > > In previous code, we use `UseRVV` to detect if rvv extension is supported. > > On the other hand, user can choose to disable UseRVV if they want even if rvv extension is supported on the running machine. In this sense, there could be similar issue on other platforms? Does the following check catch `UseRVV == false` case on RISC-V? public boolean isSupported(Operator op, VectorSpecies vspecies) { ... int maxLaneCount = VectorSupport.getMaxLaneCount(vspecies.elementType()); if (vspecies.length() > maxLaneCount) { return false; // lacking vector support } ... ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2056642914 From sparasa at openjdk.org Wed Apr 23 19:29:27 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Wed, 23 Apr 2025 19:29:27 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v11] In-Reply-To: References: Message-ID: > The current scheme for Intel APX NDD code generation favors the emission of NDD instruction on APX-enabled targets, even if destination and source registers are the same. 
To prevent this, this PR extends the assembler layer to demote EEVEX to REX encoding if dst matches with source operands. Srinivas Vamsi Parasa has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: - merge master - rename demote flag to optimize_rax_dst in emit_arith - fix emit_arith_ndd discrepancy - refactor imul instructions to fold demotion logic inside - refactor APX NDD shift instrcutions to do demotion internally - RRM refactorted to use a unified evex_ndd_and_int8 function - RRM refactoring works for exorb and exorw - refactor RRM instructions to avoid explicit demotion - refactor esetzucc to do encoding without a demote flag - Disable demotion for esetzucc and cleanup code - ... and 1 more: https://git.openjdk.org/jdk/compare/2ec61f0f...a494e423 ------------- Changes: https://git.openjdk.org/jdk/pull/24431/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=10 Stats: 3753 lines in 5 files changed: 1415 ins; 430 del; 1908 mod Patch: https://git.openjdk.org/jdk/pull/24431.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24431/head:pull/24431 PR: https://git.openjdk.org/jdk/pull/24431 From vlivanov at openjdk.org Wed Apr 23 21:41:53 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 23 Apr 2025 21:41:53 GMT Subject: RFR: 8346836: C2: Verify CastII/CastLL bounds at runtime [v10] In-Reply-To: References: Message-ID: On Tue, 22 Apr 2025 17:10:34 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds a develop flag `VerifyConstraintCasts`, which will verify the correctness of `CastIINode`s and `CastLLNode`s at runtime and crash the VM if the dynamic value lies outside the type value range. >> >> Please take a look, thanks a lot. > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 16 additional commits since the last revision: > > - Reconstruct FP > - aarch64 support > - Merge branch 'master' into verifycast > - assert CastLL > - reviews > - make the flag diagnostic > - Merge branch 'master' into verifycast > - draft > - Merge branch 'master' into verifycast > - Merge branch 'master' into verifycast > - ... and 6 more: https://git.openjdk.org/jdk/compare/66aec1b1...8d140fd9 Looks good. Thanks. src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2763: > 2761: > 2762: if (lo != min_jint && hi != max_jint) { > 2763: subsw(rtmp, rval, lo); It turns out it's equivalent to `cmpw(rval, lo)` which is clearer IMO. ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22880#pullrequestreview-2788341212 PR Review Comment: https://git.openjdk.org/jdk/pull/22880#discussion_r2056658683 From vlivanov at openjdk.org Wed Apr 23 23:54:01 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 23 Apr 2025 23:54:01 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v13] In-Reply-To: References: Message-ID: > Migrate Vector API math library (SVML and SLEEF) linkage from native code (in JVM) to Java FFM API. > > Since FFM API doesn't support vector calling conventions yet, migration affects only symbol lookup for now. But it still enables significant simplifications on JVM side. > > The patch consists of the following parts: > * on-demand symbol lookup in Java code replaces eager lookup from native code during JVM startup; > * 2 new VM intrinsics for vector calls (support unary and binary shapes) (code separated from unary/binary vector operations); > * new internal interface to query supported CPU ISA extensions (`jdk.incubator.vector.CPUFeatures`) used for CPU dispatching. 
> > `java.lang.foreign` API is used to perform symbol lookup in vector math library, then the address is cached and fed into corresponding JVM intrinsic, so C2 can turn it into a direct vector call in generated code. > > Once `java.lang.foreign` supports vectors & vector calling conventions, VM intrinsics can go away. > > Performance is on par with original implementation (tested with microbenchmarks on linux-x64 and macosx-aarch64). > > Testing: hs-tier1 - hs-tier6, microbenchmarks (on linux-x64 and macosx-aarch64) > > Thanks! Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: CPUFeatures: RISC-V support ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24462/files - new: https://git.openjdk.org/jdk/pull/24462/files/42ed9baa..585312ae Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24462&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24462&range=11-12 Stats: 5 lines in 2 files changed: 1 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24462.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24462/head:pull/24462 PR: https://git.openjdk.org/jdk/pull/24462 From vlivanov at openjdk.org Wed Apr 23 23:54:02 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 23 Apr 2025 23:54:02 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v10] In-Reply-To: <4LfzOzPmZZ42Aoub8uPvjvAvUjD29tr54PyNQMGSEEM=.fd73d966-b7cc-4695-a79f-fd3c62467ed0@github.com> References: <4c9BdDJr1BWA92Uv7HHIa6_ons9yzW0U6q0uOzKLxPY=.102b7ec5-c4bb-4117-a092-0815d01aa74d@github.com> <4LfzOzPmZZ42Aoub8uPvjvAvUjD29tr54PyNQMGSEEM=.fd73d966-b7cc-4695-a79f-fd3c62467ed0@github.com> Message-ID: <2DFsGhESxoFOqtDvZhoA24gyQkTu1xU_MgKCxdkCd2w=.49767c67-7ca4-40e1-8c0c-f7a4ce44f2e0@github.com> On Wed, 23 Apr 2025 08:54:23 GMT, Hamlin Li wrote: >> The intention is to align `_features_string` with `_features` which enumerates well-known CPU capabilities JVM manages. 
As of now, `_features_string` contains more information, so I introduced `_cpu_info_string` to keep it. >> >> Speaking of `test/lib/jdk/test/whitebox/cpuinfo/CPUInfo.java`, the approach chosen there may be fine for a test library, but we need a more stable API between JVM and JDK. > > I'm fine with this. > But it might be better to change the splitting regex of `_features_string` in CPUFeatures.java to support riscv cpu features format. Ok, I pushed an update. Let me know what you think about it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2057061162 From vlivanov at openjdk.org Wed Apr 23 23:58:43 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 23 Apr 2025 23:58:43 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v12] In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 18:11:42 GMT, Vladimir Ivanov wrote: >> In previous code, we use `UseRVV` to detect if rvv extension is supported. >> >> On the other hand, user can choose to disable UseRVV if they want even if rvv extension is supported on the running machine. In this sense, there could be similar issue on other platforms? > > Does the following check catch `UseRVV == false` case on RISC-V? > > public boolean isSupported(Operator op, VectorSpecies vspecies) { > ... > int maxLaneCount = VectorSupport.getMaxLaneCount(vspecies.elementType()); > if (vspecies.length() > maxLaneCount) { > return false; // lacking vector support > } > ... FTR both `VectorSupport.getMaxLaneCount()` and `CPUFeatures` don't rely on the raw list of ISA extensions the CPU supports, but only on those reported by the JVM. So, if support for some feature is disabled on the JVM side, it won't be reported by `VM_Version` and, hence, by `CPUFeatures`. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2057065571 From vlivanov at openjdk.org Thu Apr 24 00:39:49 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 24 Apr 2025 00:39:49 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v6] In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 17:31:07 GMT, Aleksey Shipilev wrote: >> [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. >> >> The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. >> >> This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. >> >> It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. 
>> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Allow UMH::_method access from VMStructs Nice work! src/hotspot/share/runtime/unloadableMethodHandle.hpp line 43: > 41: // 3. Final released state. Relevant Method* is in unknown state, and cannot be > 42: // accessed. > 43: // Please, elaborate what state transitions are supported. Currently, my understanding is there are 3 transitions and 4 states: * 1 -> 2 * 2 -> 3 (terminal) * 1 -> 3 (terminal) * 0 (empty, terminal) src/hotspot/share/runtime/unloadableMethodHandle.inline.hpp line 26: > 24: > 25: #ifndef SHARE_RUNTIME_METHOD_UNLOAD_BLOCKER_HANDLE_INLINE_HPP > 26: #define SHARE_RUNTIME_METHOD_UNLOAD_BLOCKER_HANDLE_INLINE_HPP Stale header file name used? src/hotspot/share/runtime/unloadableMethodHandle.inline.hpp line 37: > 35: inline UnloadableMethodHandle::UnloadableMethodHandle(Method* method) { > 36: _method = method; > 37: if (method != nullptr) { Is it possible to require `method` (and hence `_method`) to always be non-null? src/hotspot/share/runtime/unloadableMethodHandle.inline.hpp line 57: > 55: > 56: // Null holder, the relevant class would not be unloaded. > 57: return nullptr; Is this the case of bootstrap classloader? As an optimization opportunity, it can be extended for other system loaders. src/hotspot/share/runtime/unloadableMethodHandle.inline.hpp line 93: > 91: > 92: inline Method* UnloadableMethodHandle::method() const { > 93: assert(!is_unloaded(), "Should not be unloaded"); Assert that `block_unloading()` was called before? 
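To make the state-transition question in the first review comment concrete, here is a small sketch that encodes exactly the reviewer's stated understanding: states 0 to 3, with transitions 1 -> 2, 2 -> 3 and 1 -> 3, and states 0 and 3 terminal. The enum and function names are hypothetical. Only state 3 ("final released state") is described in the quoted header comment, so states 1 and 2 are deliberately left unnamed rather than guessed.

```cpp
// Hypothetical encoding of the reviewer's reading; not the actual header.
enum class State {
  Empty    = 0,  // no method at all; terminal
  One      = 1,  // meaning not shown in the quoted excerpt
  Two      = 2,  // meaning not shown in the quoted excerpt
  Released = 3   // final released state; terminal
};

// The three transitions listed in the review comment, and nothing else.
constexpr bool can_transition(State from, State to) {
  return (from == State::One && to == State::Two) ||
         (from == State::Two && to == State::Released) ||
         (from == State::One && to == State::Released);
}

// A state is terminal when no transition leaves it.
constexpr bool is_terminal(State s) {
  return s == State::Empty || s == State::Released;
}
```

A table like this could back a debug-only assert in each mutating method, so that an unsupported transition (for example 2 -> 1) fails fast; whether the PR author's intended state machine matches this reading is exactly what the comment asks.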
------------- PR Review: https://git.openjdk.org/jdk/pull/24018#pullrequestreview-2788983703 PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2057101817 PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2057087135 PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2057089091 PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2057091706 PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2057084091 From sparasa at openjdk.org Thu Apr 24 01:12:17 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Thu, 24 Apr 2025 01:12:17 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v12] In-Reply-To: References: Message-ID: <8zQAOTBGdC6jKM_NDJqXexntgO1R7EyPhmLQbMhO57E=.729b9a6b-1937-43fd-9116-7fe17e446ae7@github.com> > The current scheme for Intel APX NDD code generation favors the emission of NDD instruction on APX-enabled targets, even if destination and source registers are the same. To prevent this, this PR extends the assembler layer to demote EEVEX to REX encoding if dst matches with source operands. 
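The demotion rule described above can be sketched as a small decision function. This is an illustration under stated assumptions, not the assembler code from the PR: the names `Encoding`, `Choice` and `choose_encoding` are hypothetical, registers are modelled as plain integers with 16..31 standing for the APX extended GPRs, and the handling of commutative operations (demoting when `dst` equals the second source by swapping operands) is an assumption of this sketch.

```cpp
// Hypothetical encodings for this sketch; the names are not HotSpot's.
enum class Encoding { Rex, Rex2, ExtendedEvexNdd };

struct Choice {
  Encoding encoding;
  int dst;   // destination register
  int src1;  // first source (equals dst in the demoted two-operand forms)
  int src2;  // second source
};

// Assumption: register numbers 0..15 are legacy GPRs, 16..31 are extended GPRs.
inline bool is_extended(int reg) { return reg >= 16; }

// Decide how to encode "dst = src1 OP src2".
inline Choice choose_encoding(int dst, int src1, int src2, bool commutative) {
  int other = -1;                       // the source that is not dst, if any
  if (dst == src1) {
    other = src2;                       // already in "dst = dst OP src2" shape
  } else if (commutative && dst == src2) {
    other = src1;                       // swap operands (sketch assumption)
  }
  if (other < 0) {
    // dst differs from the usable sources: the three-operand NDD form is needed.
    return {Encoding::ExtendedEvexNdd, dst, src1, src2};
  }
  // Demote: extended registers need REX2, otherwise plain REX is enough.
  Encoding e = (is_extended(dst) || is_extended(other)) ? Encoding::Rex2
                                                        : Encoding::Rex;
  return {e, dst, dst, other};
}
```

For instance, `choose_encoding(1, 1, 2, false)` picks the REX form in this model, while `choose_encoding(1, 2, 3, true)` keeps the NDD form because the destination matches neither source.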
Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: WIP: cleanup refactoring ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24431/files - new: https://git.openjdk.org/jdk/pull/24431/files/a494e423..ac53cd13 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=10-11 Stats: 68 lines in 2 files changed: 25 ins; 22 del; 21 mod Patch: https://git.openjdk.org/jdk/pull/24431.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24431/head:pull/24431 PR: https://git.openjdk.org/jdk/pull/24431 From vlivanov at openjdk.org Thu Apr 24 01:34:49 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 24 Apr 2025 01:34:49 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v4] In-Reply-To: References: Message-ID: <3DRTEheyn6n6OYx38sL8-tqQbycO-QIjfwqwlErr5TI=.6cf2043b-f7f8-4a9d-9b59-3a844e74eaf2@github.com> On Wed, 23 Apr 2025 05:47:21 GMT, Jatin Bhateja wrote: >> - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel® products and will be the vector ISA of choice moving into the future. >> - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. Feature set supported by an AVX10 ISA version will be supported by all the versions above it. >> - The initial, fully-featured version of Intel® AVX10 will be enumerated as Version 2 (denoted as Intel® AVX10.2). This will include the new ISA extension over the existing AVX512 instructions. >> - An early version of Intel® AVX10 (Version 1, or Intel® AVX10.1) that only enumerates the Intel® AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling. >> >> This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. 
In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers. >> >> The patch has been regressed through tier1 and jvmci tests >> >> Please review and share your feedback. >> >> Best Regards, >> Jatin >> >> [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Add dynamic sized feature vectors > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8352675 > - dropping unneeded feature enabling/checks > - 8352675: Support Intel AVX10 converged vector ISA feature detection It looks much better! Thanks, Jatin. I'm curious why don't you represent feature bitmap as a POD (with all the accessors on it) and pass it around by value when needed? (It's size will vary across platforms, but will be fixed at runtime.) It should significantly simplify the implementation. As an example, take a look at `RegMask` in C2. It accommodates significantly more bits than needed for `VM_Version`. 
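The suggestion above, a feature bitmap represented as plain data with accessors and passed around by value, might look roughly like the following. This is a hedged sketch only: `FeatureSet` and its member names are hypothetical, it is not modelled on the actual `RegMask` or `VM_Version` code, and the feature count is a template parameter here purely to show a size that is fixed once chosen.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical fixed-size feature bitmap: an aggregate of 64-bit words
// with small accessors, cheap to copy and pass by value.
template <std::size_t NumFeatures>
struct FeatureSet {
  static constexpr std::size_t kWords = (NumFeatures + 63) / 64;
  std::uint64_t words[kWords];

  bool test(std::size_t f) const {
    return ((words[f / 64] >> (f % 64)) & 1u) != 0;
  }
  void set(std::size_t f)   { words[f / 64] |=  (std::uint64_t{1} << (f % 64)); }
  void clear(std::size_t f) { words[f / 64] &= ~(std::uint64_t{1} << (f % 64)); }

  // True when every feature present in 'other' is also present here.
  bool contains_all(const FeatureSet& other) const {
    for (std::size_t i = 0; i < kWords; i++) {
      if ((words[i] & other.words[i]) != other.words[i]) return false;
    }
    return true;
  }
};

// The type stays plain data: it can be copied with memcpy and passed by value.
static_assert(std::is_trivially_copyable<FeatureSet<130>>::value,
              "feature set should remain trivially copyable");
```

With this shape a caller can take a copy, clear the bits it wants to mask out, and compare the result against a required set without touching the original.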
------------- PR Review: https://git.openjdk.org/jdk/pull/24329#pullrequestreview-2789109933 From fyang at openjdk.org Thu Apr 24 01:39:40 2025 From: fyang at openjdk.org (Fei Yang) Date: Thu, 24 Apr 2025 01:39:40 GMT Subject: RFR: 8354119: Missing C2 proper allocation failure handling during initialization (during generate_uncommon_trap_blob) In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 14:04:35 GMT, Damon Fenacci wrote: > After [JDK-8347406](https://bugs.openjdk.org/browse/JDK-8347406), `OptoRuntime::generate_uncommon_trap_blob` and `OptoRuntime::generate_exception_blob` return an `UncommonTrapBlob`/`ExceptionBlob` if they succeed, `nullptr` if they don't. This is then used by the compiler to shut down gently if the code cache is full (instead of crashing). > Unfortunately, if the full code cache is reached when creating the buffer at the start of these 2 methods (when calling `CodeBuffer buffer(name, 2048, 1024);`), an empty buffer is created, which in turn prevents `masm` from being properly initialized, which then causes an access violation when writing into the blob's address when first adding `subptr` later in the method (as seen in the snippet below for `generate_uncommon_trap_blob`). > > https://github.com/openjdk/jdk/blob/3cc43b3224efdf1a3f35fff58b993027a9e1f4ad/src/hotspot/cpu/x86/runtime_x86_64.cpp#L55-L72 > > To fix this I suggest we return immediately from `OptoRuntime::generate_uncommon_trap_blob`/`OptoRuntime::generate_exception_blob` if the `buffer` creation failed. > > ### Testing > > Tier 1-3. > No specific regression test is added (very hard; among other things, it depends on thread scheduling. On the other hand `StartupOutput.java` might catch it rarely). RISC-V part of the change looks fine. Passed hs:tier1 test on linux-riscv64 platform. Thanks. ------------- Marked as reviewed by fyang (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24549#pullrequestreview-2789116914 From sparasa at openjdk.org Thu Apr 24 01:46:13 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Thu, 24 Apr 2025 01:46:13 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v13] In-Reply-To: References: Message-ID: > The current scheme for Intel APX NDD code generation favors the emission of NDD instruction on APX-enabled targets, even if destination and source registers are the same. To prevent this, this PR extends the assembler layer to demote EEVEX to REX encoding if dst matches with source operands. Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: refactoring done for ecmov RRR and others ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24431/files - new: https://git.openjdk.org/jdk/pull/24431/files/ac53cd13..2768cc52 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=11-12 Stats: 22 lines in 2 files changed: 3 ins; 4 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/24431.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24431/head:pull/24431 PR: https://git.openjdk.org/jdk/pull/24431 From xgong at openjdk.org Thu Apr 24 01:46:42 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 24 Apr 2025 01:46:42 GMT Subject: RFR: 8351623: VectorAPI: Refactor subword gather load and add SVE implementation In-Reply-To: References: Message-ID: On Thu, 17 Apr 2025 01:42:22 GMT, Xiaohong Gong wrote: >> ### Summary: >> [JDK-8318650](http://java-service.client.nvidia.com/?q=8318650) added the hotspot intrinsifying of subword gather load APIs for X86 platforms [1]. This patch aims at implementing the equivalent functionality for AArch64 SVE platform. 
In addition to the AArch64 backend support, this patch also refactors the API implementation on the Java side and the compiler mid-end part to make the operations more efficient and maintainable across different architectures. >> >> ### Background: >> Vector gather load APIs load values from memory addresses calculated by adding a base pointer to integer indices stored in an int array. SVE provides native vector gather load instructions for byte/short types using an int vector saving indices (see [2][3]). >> >> The number of loaded elements must match the index vector's element count. Since int elements are 4/2 times larger than byte/short elements, and given `MaxVectorSize` constraints, the operation may need to be split into multiple parts. >> >> Using a 128-bit byte vector gather load as an example, there are four scenarios with different `MaxVectorSize`: >> >> 1. `MaxVectorSize = 16, byte_vector_size = 16`: >> - Can load 4 indices per vector register >> - So can finish 4 bytes per gather-load operation >> - Requires 4 gather-loads and a final merge >> Example: >> ``` >> byte[] arr = [a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, ...] >> int[] idx = [3, 2, 4, 1, 5, 7, 5, 2, 0, 6, 7, 1, 15, 10, 11, 9] >> >> 4 gather-load: >> idx_v1 = [1 4 2 3] gather_v1 = [0000 0000 0000 becd] >> idx_v2 = [2 5 7 5] gather_v2 = [0000 0000 0000 cfhf] >> idx_v3 = [1 7 6 0] gather_v3 = [0000 0000 0000 bhga] >> idx_v4 = [9 11 10 15] gather_v4 = [0000 0000 0000 jlkp] >> merge: v = [jlkp bhga cfhf becd] >> ``` >> >> 2. `MaxVectorSize = 32, byte_vector_size = MaxVectorSize / 2`: >> - Can load 8 indices per vector register >> - So can finish 8 bytes per gather-load operation >> - Requires 2 gather-loads and a merge >> Example: >> ``` >> byte[] arr = [a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, ...]
>> int[] index = [3, 2, 4, 1, 5, 7, 5, 2, 0, 6, 7, 1, 15, 10, 11, 9] >> >> 2 gather-load: >> idx_v1 = [2 5 7 5 1 4 2 3] >> idx_v2 = [9 11 10 15 1 7 6 0] >> gather_v1 = [0000 0000 0000 0000 0000 0000 cfhf becd] >> gather_v2 = [0000 0000 0000 0000 0000 0000 jlkp bhga] >> merge: v = [0000 0000 0000 0000 jlkp bhga cfhf becd] >> ``` >> >> 3. `MaxVectorSize = 64, byte_v... > > Hi @jatin-bhateja , could you please help take a look at this PR especially the X86 part? Thanks a lot! > Hi @RealFYang , could you please help review the RVV part? Thanks a lot! > @XiaohongGong I had a quick look at your changes and PR description. I wonder if you could split some of the refactoring into a separate PR? That would make it easier to review. Currently, you basically have x64 changes, aarch64 changes, Java library changes, and C2 changes. That's a lot at once. And it would basically require the review from a lot of different people at once. > > Splitting would make it easier to review, less work for the reviewer. It would ensure everybody can look at a smaller change set, and that would also increase the quality of the code after review, I think. > > What do you think? Thanks for looking at this PR @eme64 ! It's a good idea splitting this PR as smaller ones. I will consider about this. Maybe I can do a refactoring first, and then implement the compiler support for AArch64 as a followed-up PR. WDYT? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24679#issuecomment-2825949426 From vlivanov at openjdk.org Thu Apr 24 01:56:43 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 24 Apr 2025 01:56:43 GMT Subject: RFR: 8350209: Preserve adapters in AOT cache [v7] In-Reply-To: References: Message-ID: On Tue, 22 Apr 2025 23:05:13 GMT, Vladimir Kozlov wrote: >> [JEP 483](https://bugs.openjdk.org/browse/JDK-8315737) preserves class information in AOT cache which helps Java startup performance. 
>> >> We should also preserve adapters (i2c, c2i) to further improve performance of class linking where adapters are generated. >> >> Short running Java application can see several percents improvement. I got 6% improvement when ran `HelloWorld.java` on Linux-x64 Ice Lake CPU (2.5Ghz): >> >> >> (perf stat -r 100 java -XX:AOTCache=app.aotcache -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed >> 0.0299401 +- 0.0000504 seconds time elapsed ( +- 0.17% ) >> >> (perf stat -r 100 java -XX:AOTCache=app.aotcache -XX:+UnlockDiagnosticVMOptions -XX:-AOTAdapterCaching -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed >> 0.0318654 +- 0.0000535 seconds time elapsed ( +- 0.17% ) >> >> >> New diagnostic flags are introduced (use `-XX:+UnlockDiagnosticVMOptions` to unlock them): >> >> >> -XX:+AOTAdapterCaching - Enable or disable saving and restoring i2c2i adapters >> -XX:AOTCodeMaxSize=10*M - buffer size in bytes for AOT code caching >> -XX:+AbortVMOnAOTCodeFailure - Abort VM on the first occurrence of AOT code caching failure >> >> By default `AOTAdapterCaching` is `false` and enabled ergonomically when `-XX:AOTCache` is specified. >> This flag is ignored when `AOTCache` is not specified. >> >> To use AOT adapters follow process described in JEP 483: >> >> >> java -XX:AOTMode=record -XX:AOTConfiguration=app.aotconf -cp app.jar App >> java -XX:AOTMode=create -XX:AOTConfiguration=app.aotconf -XX:AOTCache=app.aot -cp app.jar >> java -XX:AOTCache=app.aot -cp app.jar App >> >> >> There are several new UL flag combinations to trace the AOT code caching process: >> >> >> -Xlog:aot+codecache+init -Xlog:aot+codecache+exit -Xlog:aot+codecache+stubs >> >> >> @ashu-mehra is main author of changes. He implemented adapters caching. >> I did main framework (`AOTCodeCache` class) for saving and loading AOT code. >> >> Tested tier1-6,10, which includes tests with `AOTClassLinking` enabled. Also Xcomp,stress and JCK. 
> > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Fix message Finished the first pass over the code. Overall, looks good. Some feedback follows. src/hotspot/share/cds/aotCacheAccess.hpp line 38: > 36: // AOT Cache API for AOT compiler > 37: > 38: class AOTCacheAccess : AllStatic { It looks related to `AOTCodeCache`? Maybe `AOTCodeCacheAccess` then? src/hotspot/share/cds/aotCacheAccess.hpp line 40: > 38: class AOTCacheAccess : AllStatic { > 39: public: > 40: static void* allocate_aot_code(size_t size) NOT_CDS_RETURN_(nullptr); "allocate_aot_code_region", "get_aot_code_region_size", and "map_aot_code_region" would be clearer. src/hotspot/share/code/aotCodeCache.cpp line 62: > 60: } > 61: > 62: static void exit_vm_on_store_failure() { It's a bit confusing to see `exit_vm_on_load_failure()` and `exit_vm_on_store_failure()` silently proceed unless a flag is explicitly specified. Moreover, how reliable is setting `AOTAdapterCaching = false` as a way to fail fast and avoid repeated load/store attempts? At least, I see that the `AOTCodeCache` ctor caches `AOTAdapterCaching`, so it won't see the update. How does it affect adapter code generation during the assembly phase? src/hotspot/share/code/aotCodeCache.cpp line 645: > 643: return false; > 644: } > 645: log_info(aot, codecache, stubs)("Writing blob '%s' to AOT Code Cache", name); I'd revisit logging code in AOTCodeCache and downgrade info->debug and debug->trace where appropriate. It feels too low-level most of the time. src/hotspot/share/runtime/sharedRuntime.cpp line 2966: > 2964: adapter_blob = AdapterHandlerLibrary::link_aot_adapter_handler(this); > 2965: if (adapter_blob == nullptr) { > 2966: log_warning(cds)("Failed to link AdapterHandlerEntry (fp=%s) to its code in the AOT code cache", _fingerprint->as_basic_args_string()); Doesn't it add noise in the output for not yet seen adapter shapes? It's a warning.
------------- PR Review: https://git.openjdk.org/jdk/pull/24740#pullrequestreview-2789025188 PR Review Comment: https://git.openjdk.org/jdk/pull/24740#discussion_r2057119115 PR Review Comment: https://git.openjdk.org/jdk/pull/24740#discussion_r2057115219 PR Review Comment: https://git.openjdk.org/jdk/pull/24740#discussion_r2057207417 PR Review Comment: https://git.openjdk.org/jdk/pull/24740#discussion_r2057187511 PR Review Comment: https://git.openjdk.org/jdk/pull/24740#discussion_r2057189917 From iklam at openjdk.org Thu Apr 24 02:20:42 2025 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 24 Apr 2025 02:20:42 GMT Subject: RFR: 8350209: Preserve adapters in AOT cache [v7] In-Reply-To: References: Message-ID: <0hoQ2BKezGCepBeBlxwfXUHw82Vk4Oia58NHyyD3KiM=.d6bcdd70-cc42-4dca-adff-0d881e60f979@github.com> On Thu, 24 Apr 2025 00:51:31 GMT, Vladimir Ivanov wrote: >> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix message > > src/hotspot/share/cds/aotCacheAccess.hpp line 38: > >> 36: // AOT Cache API for AOT compiler >> 37: >> 38: class AOTCacheAccess : AllStatic { > > It looks related to `AOTCodeCache`? Maybe `AOTCodeCacheAccess` then? This file is called https://github.com/openjdk/leyden/blob/premain/src/hotspot/share/cds/cdsAccess.hpp in the Leyden repo and provides an abstract API for accessing contents of the AOT cache. In Leyden, we have APIs for accessing cached oops: static int get_archived_object_permanent_index(oop obj) NOT_CDS_JAVA_HEAP_RETURN_(-1); static oop get_archived_object(int permanent_index) NOT_CDS_JAVA_HEAP_RETURN_(nullptr); and various pointer operations static uint delta_from_shared_address_base(address addr); template static void set_pointer(T** ptr, T* value) { set_pointer((address*)ptr, (address)value); } static void set_pointer(address* ptr, address value); Let's keep the AOTCacheAccess name for now and wait until we merge this PR down to the Leyden repo. 
There's some overlap between AOTCacheAccess and CDSConfig. Maybe we should do a refactor/rename later. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24740#discussion_r2057251128 From dlong at openjdk.org Thu Apr 24 02:23:41 2025 From: dlong at openjdk.org (Dean Long) Date: Thu, 24 Apr 2025 02:23:41 GMT Subject: RFR: 8352422: [ubsan] Out-of-range reported in ciMethod.cpp:917:20: runtime error: 2.68435e+09 is outside the range of representable values of type 'int' In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 10:58:54 GMT, Marc Chevalier wrote: > The double `(double)count * prof_factor * method_life / counter_life + 0.5` > can overflow a 32-bit int, causing UB on casting, but in practice computing > a wrong scale, probably. > > We just need to check that the cast is not going to overflow. This is possible > because `INT_MAX` is exactly representable in a `double`. It is also good to > notice that the expression `(double)count * prof_factor * method_life / counter_life + 0.5` > cannot overflow a `double`: > - `count` is an int, max value = 2^31-1 < 2.2e9 > - `method_life` is an int, max value < 2.2e9 > - `prof_factor` is a float, max value < 3.5e38 > - `counter_life` is an int, positive at this point, so min value = 1 > So, the whole expression is bounded by 16.94e56 + 0.5, which is much smaller than the > max value of a double (about 1.8e308). We probably would have precision issues, but > it probably doesn't matter a lot. > > The semantics I picked here is basically `min(INT_MAX, count_d)`, so it'd always fit. > > Thanks, > Marc src/hotspot/share/ci/ciMethod.cpp line 919: > 917: double count_d = (double)count * prof_factor * method_life / counter_life + 0.5; > 918: if (count_d >= static_cast<double>(INT_MAX)) { > 919: count = INT_MAX; INT_MAX is probably the best choice, but could cause a change in behavior if the compiler previously returned a negative number here on overflow. It's not clear to me if we want to preserve existing behavior or not.
In various places we use different limits for saturated counters and may use a negative number to represent overflow. I don't know if this is by design or accident, but it can happen when, for example, the taken() or not_taken() value of a BranchData overflows and gets clamped to (uint)-1. In Parse::dynamic_branch_prediction() we assign those uint values to int, resulting in a negative number, which will be rejected by counters_are_meaningful(). By clamping an overflow here to INT_MAX instead of a UB negative number, we could allow taken == INT_MAX, not_taken == 0 to sneak past, which might be harmless. It's not clear to me the best way to handle saturated values, but you might find it useful looking at the history in JDK-8306331 and JDK-8306481. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24824#discussion_r2057258013 From fyang at openjdk.org Thu Apr 24 02:53:50 2025 From: fyang at openjdk.org (Fei Yang) Date: Thu, 24 Apr 2025 02:53:50 GMT Subject: RFR: 8355293: [TEST] RISC-V: enable more ir tests [v2] In-Reply-To: References: <35acCSxvX_QmszIDigxcJ2zN3RjIGCT2Fcb9enkM-Xk=.793165e0-c4a0-45e6-aa00-7b721d25ff57@github.com> Message-ID: On Tue, 22 Apr 2025 15:09:59 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this simple patch? >> It just enables some test to run on riscv. >> >> Thanks > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > copyright Thanks for doing that. I only have one minor comment. test/hotspot/jtreg/compiler/loopopts/superword/ProdRed_Double.java line 94: > 92: @IR(applyIfPlatform = {"riscv64", "true"}, > 93: applyIf = {"SuperWordReductions", "true"}, > 94: failOn = {IRNode.MUL_REDUCTION_VF}) I think we can just leave this two test files (`ProdRed_Float.java`/`ProdRed_Double.java`) untouched? The IR check is only enabled for x86 sse2 target for now. It does not apply to other CPUs. So I don't see why RISC-V is special here. 
------------- PR Review: https://git.openjdk.org/jdk/pull/24797#pullrequestreview-2789275875 PR Review Comment: https://git.openjdk.org/jdk/pull/24797#discussion_r2057301732 From duke at openjdk.org Thu Apr 24 05:13:58 2025 From: duke at openjdk.org (duke) Date: Thu, 24 Apr 2025 05:13:58 GMT Subject: Withdrawn: 8350097: Make Compilation::current() and Compile::current() safer In-Reply-To: <4ELV07PUQEFeOLgzqbV3OoGjHVny5paw0Gk0awuJ3h0=.99faedbd-4909-4d8e-93eb-75d5697e797f@github.com> References: <4ELV07PUQEFeOLgzqbV3OoGjHVny5paw0Gk0awuJ3h0=.99faedbd-4909-4d8e-93eb-75d5697e797f@github.com> Message-ID: On Fri, 14 Feb 2025 14:48:34 GMT, Thomas Stuefe wrote: > Somewhat trivial. > > I recently hunted a bug for an hour until I realized that I had accessed ciEnv::compiler_data() as C2 `Compile` when, in fact, it was C1 `Compilation`. Stupid mistake, but an assert is easy to do and saves time. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/23635 From duke at openjdk.org Thu Apr 24 05:58:25 2025 From: duke at openjdk.org (Anjian-Wen) Date: Thu, 24 Apr 2025 05:58:25 GMT Subject: RFR: 8355074: RISC-V: C2: Support Vector-Scalar version of Zvbb Vector And-Not instruction Message-ID: support zvbb vand-not vector-scalar version, which Op1 is the sign-extended or truncated value in scalar register rs1 add C2 match rule add related Tests in IRNode structure passed jtreg test test/hotspot/jtreg/compiler/vectorapi/* ------------- Commit messages: - modify format - modify v0.t to $v0 - add format fix - RISC-V: Support Zvbb Vector And-Not vx and add its tests Changes: https://git.openjdk.org/jdk/pull/24709/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24709&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8355074 Stats: 293 lines in 4 files changed: 147 ins; 0 del; 146 mod Patch: https://git.openjdk.org/jdk/pull/24709.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24709/head:pull/24709 PR: 
https://git.openjdk.org/jdk/pull/24709 From epeter at openjdk.org Thu Apr 24 06:18:02 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 24 Apr 2025 06:18:02 GMT Subject: RFR: 8351623: VectorAPI: Refactor subword gather load and add SVE implementation In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 13:02:31 GMT, Emanuel Peter wrote: >> Hi @jatin-bhateja , could you please help take a look at this PR especially the X86 part? Thanks a lot! >> Hi @RealFYang , could you please help review the RVV part? Thanks a lot! > > @XiaohongGong I had a quick look at your changes and PR description. I wonder if you could split some of the refactoring into a separate PR? That would make it easier to review. Currently, you basically have x64 changes, aarch64 changes, Java library changes, and C2 changes. That's a lot at once. And it would basically require the review from a lot of different people at once. > > Splitting would make it easier to review, less work for the reviewer. It would ensure everybody can look at a smaller change set, and that would also increase the quality of the code after review, I think. > > What do you think? > Thanks for looking at this PR @eme64 ! It's a good idea splitting this PR as smaller ones. I will consider about this. Maybe I can do a refactoring first, and then implement the compiler support for AArch64 as a followed-up PR. WDYT? That sounds excellent :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24679#issuecomment-2826496942 From enikitin at openjdk.org Thu Apr 24 06:27:52 2025 From: enikitin at openjdk.org (Evgeny Nikitin) Date: Thu, 24 Apr 2025 06:27:52 GMT Subject: RFR: 8355387: [jittester] Disable downcasts by default Message-ID: Currently, JITTester's love to downcast often produces something like this: ArrayList someVar = (TreeSet)(Object)(List)(new ArrayList()); ... which is possible because it goes up to Object and then starts downcasting to some totally unrelated class / type. 
Given the JITTester's fondness for casts (they are more-or-less 'safe' expressions), this means a high probability (30-50%) that a generated test fails compilation. The situation is even worse for ByteCode tests: their faultiness is only recognized during the run phase. I suggest disabling downcasts for now. Testing: 50-100 generated tests in different combinations (default, with the flag set to 'false' or 'true') with an artificially increased chance of casts. ------------- Commit messages: - 8355387: [jittester] Disable downcasts by default Changes: https://git.openjdk.org/jdk/pull/24840/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24840&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8355387 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24840.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24840/head:pull/24840 PR: https://git.openjdk.org/jdk/pull/24840 From epeter at openjdk.org Thu Apr 24 06:29:53 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 24 Apr 2025 06:29:53 GMT Subject: RFR: 8346836: C2: Verify CastII/CastLL bounds at runtime [v10] In-Reply-To: References: Message-ID: On Tue, 22 Apr 2025 17:10:34 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds a develop flag `VerifyConstraintCasts`, which will verify the correctness of `CastIINode`s and `CastLLNode`s at runtime and crash the VM if the dynamic value lies outside the type value range. >> >> Please take a look, thanks a lot. > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 16 additional commits since the last revision:
The pull request contains 16 additional commits since the last revision: > > - Reconstruct FP > - aarch64 support > - Merge branch 'master' into verifycast > - assert CastLL > - reviews > - make the flag diagnostic > - Merge branch 'master' into verifycast > - draft > - Merge branch 'master' into verifycast > - Merge branch 'master' into verifycast > - ... and 6 more: https://git.openjdk.org/jdk/compare/a581f95f...8d140fd9 Wow, this looks even much better with the improved printing on failure now! Just out of curiosity: Is the whole `reconstruct_frame_pointer` mechanism general enough so that we could use it in other places as well? It is not super important to me any more, but I've wanted to have something like this for `VerifyAlignVector` already :) test/hotspot/jtreg/compiler/c2/TestVerifyConstraintCasts.java line 28: > 26: * @bug 8346836 > 27: * @requires vm.debug == true & vm.flavor == "server" > 28: * @summary Run with -Xcomp to test -XX:+StressGCM -XX:VerifyConstraintCasts=1 in debug builds. Nit: you are also running it with `-XX:VerifyConstraintCasts=1`. I would just keep the summary more generic. Suggestion: * @summary Empty main program to run with flag VerifyConstraintCasts. ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22880#pullrequestreview-2789859060 PR Review Comment: https://git.openjdk.org/jdk/pull/22880#discussion_r2057632123 From mchevalier at openjdk.org Thu Apr 24 06:50:54 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 24 Apr 2025 06:50:54 GMT Subject: RFR: 8352422: [ubsan] Out-of-range reported in ciMethod.cpp:917:20: runtime error: 2.68435e+09 is outside the range of representable values of type 'int' In-Reply-To: References: Message-ID: On Thu, 24 Apr 2025 02:21:15 GMT, Dean Long wrote: >> The double `(double)count * prof_factor * method_life / counter_life + 0.5` >> can overflow a 32-bit int, causing UB on casting, but in practice computing >> a wrong scale, probably. 
>> >> We just need to check that the cast is not going to overflow. This is possible >> because `INT_MAX` is exactly representable in a `double`. It is also good to >> notice that the expression `(double)count * prof_factor * method_life / counter_life + 0.5` >> cannot overflow a `double`: >> - `count` is an int, max value = 2^31-1 < 2.2e9 >> - `method_life` is an int, max value < 2.2e9 >> - `prof_factor` is a float, max value < 3.5e38 >> - `counter_life` is an int, positive at this point, so min value = 1 >> So, the whole expression is bounded by 16.94e56 + 0.5, which is much smaller than the >> max value of a double (about 1.8e308). We probably would have precision issues, but >> it probably doesn't matter a lot. >> >> The semantics I picked here is basically `min(INT_MAX, count_d)`, so it'd always fit. >> >> Thanks, >> Marc > > src/hotspot/share/ci/ciMethod.cpp line 919: > >> 917: double count_d = (double)count * prof_factor * method_life / counter_life + 0.5; >> 918: if (count_d >= static_cast<double>(INT_MAX)) { >> 919: count = INT_MAX; > > INT_MAX is probably the best choice, but could cause a change in behavior if the compiler previously returned a negative number here on overflow. It's not clear to me if we want to preserve existing behavior or not. In various places we use different limits for saturated counters and may use a negative number to represent overflow. I don't know if this is by design or accident, but it can happen when, for example, the taken() or not_taken() value of a BranchData overflows and gets clamped to (uint)-1. In Parse::dynamic_branch_prediction() we assign those uint values to int, resulting in a negative number, which will be rejected by counters_are_meaningful(). > By clamping an overflow here to INT_MAX instead of a UB negative number, we could allow taken == INT_MAX, not_taken == 0 to sneak past, which might be harmless.
It's not clear to me the best way to handle saturated values, but you might find it useful looking at the history in JDK-8306331 and JDK-8306481. There was still count = (count > 0) ? count : 1; so actually, values that became negative by overflow are actually pushed up to 1 (which seems to be a legitimate value to me): it was already not possible to distinguish overflows from good results. But even if I return a (new) special value for overflows, there is the question of what to do with that from the caller perspective. In my understanding, the result is used for profiling purpose: to decide how often is a branch taken, and thus, whether it's worth [doing something]. If a branch is so much taken it causes an overflow here, I think the reasonable thing to do is to make it succeed all the comparisons to know if it's taken often enough. If they are all of the form `scale > threshold`, then returning `INT_MAX` will guarantee that we satisfy all such tests. Maybe another way would be to indeed return a special value that would be handled as "result was very high: between INT_MAX and infinity" and handle it as such a value would be, but I suspect it's how `INT_MAX` is handled. Also, I don't think having an overflow here signifies a programming error: we have a product of multiple `int` and a `float`. It's easy to overflow an int: `46341 * 46341` is already more than `INT_MAX`. Indeed, it can change the behavior in some overflow cases, but then I'd argue that the former behavior was basically random, so it's probably not worth trying to preserve. 
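
For readers following the thread, the clamping discussed above can be sketched as a standalone function. This is only an illustration under stated assumptions, not the actual `ciMethod` code: the helper name `scale_count_saturating` is hypothetical, inputs are assumed to be plain values rather than profile data, and `counter_life` is assumed positive as the PR description states.

```cpp
#include <cassert>
#include <climits>

// Hypothetical helper (not the HotSpot implementation): scales a profile
// count and saturates at INT_MAX instead of casting an out-of-range double
// to int, which would be undefined behavior.
int scale_count_saturating(int count, float prof_factor,
                           int method_life, int counter_life) {
  assert(counter_life > 0);  // assumption taken from the PR description
  // All operands are promoted to double, so the product cannot overflow.
  double count_d = (double)count * prof_factor * method_life / counter_life + 0.5;
  if (count_d >= static_cast<double>(INT_MAX)) {
    return INT_MAX;  // saturate: min(INT_MAX, count_d)
  }
  if (!(count_d >= 1.0)) {
    return 1;  // mirrors `count = (count > 0) ? count : 1`, also catches NaN
  }
  return (int)count_d;  // in range [1, INT_MAX), so the cast is well defined
}
```

With this sketch, the `46341 * 46341` case mentioned above saturates to `INT_MAX`, while small inputs keep the usual round-to-nearest result with a floor of 1.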
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24824#discussion_r2057666319 From shade at openjdk.org Thu Apr 24 07:08:00 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 24 Apr 2025 07:08:00 GMT Subject: RFR: 8258229: Crash in nmethod::reloc_string_for In-Reply-To: <6wxhOTq8-vRcBjfw6HdHD9nZzwdT7SgvXfgnQseFF7w=.05dc5242-cad0-4a6a-a96b-9754b2edc927@github.com> References: <6wxhOTq8-vRcBjfw6HdHD9nZzwdT7SgvXfgnQseFF7w=.05dc5242-cad0-4a6a-a96b-9754b2edc927@github.com> Message-ID: <6wY9seD50o7vKaPLgjiGxmPc2nsxDAvUafl9KG9DLZg=.6bc3d41f-9ee0-42d5-a4f3-52304e481224@github.com> On Wed, 23 Apr 2025 15:12:54 GMT, Manuel Hässig wrote: > ## Issue Summary > > The issue manifests in intermittent failures of test cases with `-XX:+PrintAssembly`. The reason for these intermittent failures is a deoptimization of the method before or during printing its assembly. If that deoptimization makes the method not entrant, the entry of that method is patched, but the relocation information is not updated. If the instruction at the method entry before patching had relocation info that prints a comment during assembly printing, printing that comment for the patched entry fails if the operands of the original and patched instructions do not match. > > ## Change Summary > > To fix this issue, this PR updates the relocation info when patching the method entry. To avoid any races between printing and deoptimizing, this PR acquires the `NMethodState_lock` for printing an `nmethod`. > > All changes of this PR summarized: > - add a regression test, > - update the relocation information after patching the method entry for making it not entrant, > - acquire the `NMethodState_lock` in `print_nmethod()` to avoid changing the relocation information during printing. > > ## Testing > > I ran tiers 1 through 3 and Oracle internal testing.
src/hotspot/cpu/x86/nativeInst_x86.cpp line 389: > 387: void NativeJump::patch_verified_entry(address entry, address verified_entry, address dest) { > 388: // complete jump instruction (to be inserted) is in code_buffer; > 389: union { Do you need this change? Meaning, does it add substantially to this fix? Looks like it does not? I'd omit it here, so the patch is cleanly backportable. We will cleanup the remnants of x86_32 port in due course. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24831#discussion_r2056316385 From duke at openjdk.org Thu Apr 24 07:08:00 2025 From: duke at openjdk.org (Manuel Hässig) Date: Thu, 24 Apr 2025 07:08:00 GMT Subject: RFR: 8258229: Crash in nmethod::reloc_string_for Message-ID: <6wxhOTq8-vRcBjfw6HdHD9nZzwdT7SgvXfgnQseFF7w=.05dc5242-cad0-4a6a-a96b-9754b2edc927@github.com> ## Issue Summary The issue manifests in intermittent failures of test cases with `-XX:+PrintAssembly`. The reason for these intermittent failures is a deoptimization of the method before or during printing its assembly. If that deoptimization makes the method not entrant, the entry of that method is patched, but the relocation information is not updated. If the instruction at the method entry before patching had relocation info that prints a comment during assembly printing, printing that comment for the patched entry fails if the operands of the original and patched instructions do not match. ## Change Summary To fix this issue, this PR updates the relocation info when patching the method entry. To avoid any races between printing and deoptimizing, this PR acquires the `NMethodState_lock` for printing an `nmethod`. All changes of this PR summarized: - add a regression test, - update the relocation information after patching the method entry for making it not entrant, - acquire the `NMethodState_lock` in `print_nmethod()` to avoid changing the relocation information during printing.
## Testing I ran tiers 1 through 3 and Oracle internal testing. ------------- Commit messages: - Hold NMethodState_lock while printing an nmethod - Update relocation info when making method not entrant - Add regression test Changes: https://git.openjdk.org/jdk/pull/24831/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24831&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8258229 Stats: 93 lines in 2 files changed: 93 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24831.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24831/head:pull/24831 PR: https://git.openjdk.org/jdk/pull/24831 From duke at openjdk.org Thu Apr 24 07:08:00 2025 From: duke at openjdk.org (Manuel Hässig) Date: Thu, 24 Apr 2025 07:08:00 GMT Subject: RFR: 8258229: Crash in nmethod::reloc_string_for In-Reply-To: References: <6wxhOTq8-vRcBjfw6HdHD9nZzwdT7SgvXfgnQseFF7w=.05dc5242-cad0-4a6a-a96b-9754b2edc927@github.com> <6wY9seD50o7vKaPLgjiGxmPc2nsxDAvUafl9KG9DLZg=.6bc3d41f-9ee0-42d5-a4f3-52304e481224@github.com> Message-ID: <-uHbqJeg7U97r0I0rPvOeicC9kZorBpX1ppksXJXrvw=.11fb5a3b-a54f-49b3-8b53-c619fd0c5b11@github.com> On Wed, 23 Apr 2025 20:15:30 GMT, Manuel Hässig wrote: >> src/hotspot/cpu/x86/nativeInst_x86.cpp line 389: >> >>> 387: void NativeJump::patch_verified_entry(address entry, address verified_entry, address dest) { >>> 388: // complete jump instruction (to be inserted) is in code_buffer; >>> 389: union { >> >> Do you need this change? Meaning, does it add substantially to this fix? Looks like it does not? >> >> I'd omit it here, so the patch is cleanly backportable. We will cleanup the remnants of x86_32 port in due course. > > I don't need it. I'll take it out before the RFR. Thanks for pointing it out! Should I file an RFE or do you already have everything tracked?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24831#discussion_r2056814685 From duke at openjdk.org Thu Apr 24 07:08:00 2025 From: duke at openjdk.org (Manuel Hässig) Date: Thu, 24 Apr 2025 07:08:00 GMT Subject: RFR: 8258229: Crash in nmethod::reloc_string_for In-Reply-To: <6wY9seD50o7vKaPLgjiGxmPc2nsxDAvUafl9KG9DLZg=.6bc3d41f-9ee0-42d5-a4f3-52304e481224@github.com> References: <6wxhOTq8-vRcBjfw6HdHD9nZzwdT7SgvXfgnQseFF7w=.05dc5242-cad0-4a6a-a96b-9754b2edc927@github.com> <6wY9seD50o7vKaPLgjiGxmPc2nsxDAvUafl9KG9DLZg=.6bc3d41f-9ee0-42d5-a4f3-52304e481224@github.com> Message-ID: On Wed, 23 Apr 2025 15:28:12 GMT, Aleksey Shipilev wrote: >> ## Issue Summary >> >> The issue manifests in intermittent failures of test cases with `-XX:+PrintAssembly`. The reason for these intermittent failures is a deoptimization of the method before or during printing its assembly. If that deoptimization makes the method not entrant, the entry of that method is patched, but the relocation information is not updated. If the instruction at the method entry before patching had relocation info that prints a comment during assembly printing, printing that comment for the patched entry fails if the operands of the original and patched instructions do not match. >> >> ## Change Summary >> >> To fix this issue, this PR updates the relocation info when patching the method entry. To avoid any races between printing and deoptimizing, this PR acquires the `NMethodState_lock` for printing an `nmethod`. >> >> All changes of this PR summarized: >> - add a regression test, >> - update the relocation information after patching the method entry for making it not entrant, >> - acquire the `NMethodState_lock` in `print_nmethod()` to avoid changing the relocation information during printing. >> >> ## Testing >> >> I ran tiers 1 through 3 and Oracle internal testing.
> > src/hotspot/cpu/x86/nativeInst_x86.cpp line 389: > >> 387: void NativeJump::patch_verified_entry(address entry, address verified_entry, address dest) { >> 388: // complete jump instruction (to be inserted) is in code_buffer; >> 389: union { > > Do you need this change? Meaning, does it add substantially to this fix? Looks like it does not? > > I'd omit it here, so the patch is cleanly backportable. We will cleanup the remnants of x86_32 port in due course. I don't need it. I'll take it out before the RFR. Thanks for pointing it out! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24831#discussion_r2056812730 From chagedorn at openjdk.org Thu Apr 24 07:18:52 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 24 Apr 2025 07:18:52 GMT Subject: RFR: 8355387: [jittester] Disable downcasts by default In-Reply-To: References: Message-ID: On Thu, 24 Apr 2025 06:23:11 GMT, Evgeny Nikitin wrote: > Currently, JITTester's love to downcast often produces something like this: > > ArrayList someVar = (TreeSet)(Object)(List)(new ArrayList()); > > ... which is possible because it goes up to Object and then starts downcasting to some totally unrelated class / type. > > Considering the JITTester's love to casts (they are more-or-less 'safe' expressions), it means a high probability (30-50%) of a gentest to fail compilation. Even worse is the situation for ByteCode tests - that they're faulty is only recognized during the run phase. > > I suggest to disable the downcasts for now. > Testing: 50-100 generated tests in different combinations (default, with the flag set to 'false' or 'true') with artificially increased chance to casts. Marked as reviewed by chagedorn (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/24840#pullrequestreview-2790006979 From thartmann at openjdk.org Thu Apr 24 07:18:52 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 24 Apr 2025 07:18:52 GMT Subject: RFR: 8355387: [jittester] Disable downcasts by default In-Reply-To: References: Message-ID: On Thu, 24 Apr 2025 06:23:11 GMT, Evgeny Nikitin wrote: > Currently, JITTester's love to downcast often produces something like this: > > ArrayList someVar = (TreeSet)(Object)(List)(new ArrayList()); > > ... which is possible because it goes up to Object and then starts downcasting to some totally unrelated class / type. > > Considering the JITTester's love to casts (they are more-or-less 'safe' expressions), it means a high probability (30-50%) of a gentest to fail compilation. Even worse is the situation for ByteCode tests - that they're faulty is only recognized during the run phase. > > I suggest to disable the downcasts for now. > Testing: 50-100 generated tests in different combinations (default, with the flag set to 'false' or 'true') with artificially increased chance to casts. Looks good and trivial. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24840#pullrequestreview-2790003862 From dlong at openjdk.org Thu Apr 24 07:31:59 2025 From: dlong at openjdk.org (Dean Long) Date: Thu, 24 Apr 2025 07:31:59 GMT Subject: RFR: 8352422: [ubsan] Out-of-range reported in ciMethod.cpp:917:20: runtime error: 2.68435e+09 is outside the range of representable values of type 'int' In-Reply-To: References: Message-ID: On Thu, 24 Apr 2025 06:48:06 GMT, Marc Chevalier wrote: >> src/hotspot/share/ci/ciMethod.cpp line 919: >> >>> 917: double count_d = (double)count * prof_factor * method_life / counter_life + 0.5; >>> 918: if (count_d >= static_cast<double>(INT_MAX)) { >>> 919: count = INT_MAX; >> >> INT_MAX is probably the best choice, but could cause a change in behavior if the compiler previously returned a negative number here on overflow. It's not clear to me if we want to preserve existing behavior or not. In various places we use different limits for saturated counters and may use a negative number to represent overflow. I don't know if this is by design or accident, but it can happen when, for example, the taken() or not_taken() value of a BranchData overflows and gets clamped to (uint)-1. In Parse::dynamic_branch_prediction() we assign those uint values to int, resulting in a negative number, which will be rejected by counters_are_meaningful(). >> By clamping an overflow here to INT_MAX instead of a UB negative number, we could allow taken == INT_MAX, not_taken == 0 to sneak past, which might be harmless. It's not clear to me the best way to handle saturated values, but you might find it useful looking at the history in JDK-8306331 and JDK-8306481. > > There was still > > count = (count > 0) ? count : 1; > > so actually, values that became negative by overflow are actually pushed up to 1 (which seems to be a legitimate value to me): it was already not possible to distinguish overflows from good results.
> > But even if I return a (new) special value for overflows, there is the question of what to do with that from the caller perspective. In my understanding, the result is used for profiling purpose: to decide how often is a branch taken, and thus, whether it's worth [doing something]. If a branch is so much taken it causes an overflow here, I think the reasonable thing to do is to make it succeed all the comparisons to know if it's taken often enough. If they are all of the form `scale > threshold`, then returning `INT_MAX` will guarantee that we satisfy all such tests. Maybe another way would be to indeed return a special value that would be handled as "result was very high: between INT_MAX and infinity" and handle it as such a value would be, but I suspect it's how `INT_MAX` is handled. > > Also, I don't think having an overflow here signifies a programming error: we have a product of multiple `int` and a `float`. It's easy to overflow an int: `46341 * 46341` is already more than `INT_MAX`. > > Indeed, it can change the behavior in some overflow cases, but then I'd argue that the former behavior was basically random, so it's probably not worth trying to preserve. You're right, I missed that a negative value would have been changed to 1 in the old code. Returning INT_MAX seems better than 1 here. It means the inputs did not overflow by themselves (or they would have been negative already), but only overflowed when scaled. So the true value is probably closer to INT_MAX than infinity or "unknown". 
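The saturation discussed in this review thread can be sketched in isolation. The following is a minimal illustration only, not the code in `ciMethod.cpp`: the function name `scale_count_saturating` and its parameter list are hypothetical, chosen to mirror the names visible in the quoted hunk. It checks the `double` result against `INT_MAX` before converting, so the out-of-range conversion that UBSan reported never happens, and it keeps the "at least 1" rule quoted above.

```cpp
#include <cassert>
#include <climits>

// Hypothetical helper (not the HotSpot function): scale a profile count and
// saturate at INT_MAX instead of converting an out-of-range double to int,
// which is undefined behavior in C++.
int scale_count_saturating(int count, double prof_factor,
                           int method_life, int counter_life) {
  assert(counter_life > 0 && "counter_life must be positive");
  double count_d = (double)count * prof_factor * method_life / counter_life + 0.5;
  if (count_d >= static_cast<double>(INT_MAX)) {
    return INT_MAX;  // too large to represent: saturate
  }
  if (!(count_d >= 1.0)) {
    return 1;        // zero, negative, or NaN: fall back to the minimum of 1
  }
  return static_cast<int>(count_d);  // now known to lie in [1, INT_MAX)
}
```

With this shape, any caller test of the form `scale > threshold` succeeds for a saturated result, which is the property argued for in the thread.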
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24824#discussion_r2057734700 From mchevalier at openjdk.org Thu Apr 24 07:36:44 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 24 Apr 2025 07:36:44 GMT Subject: RFR: 8258229: Crash in nmethod::reloc_string_for In-Reply-To: <6wxhOTq8-vRcBjfw6HdHD9nZzwdT7SgvXfgnQseFF7w=.05dc5242-cad0-4a6a-a96b-9754b2edc927@github.com> References: <6wxhOTq8-vRcBjfw6HdHD9nZzwdT7SgvXfgnQseFF7w=.05dc5242-cad0-4a6a-a96b-9754b2edc927@github.com> Message-ID: <1r2HgWn04pn1fnlp712cwlJlE1L3yiGCGYk_zAMw44U=.53dca0f4-8675-4eab-8990-bc9b232acb68@github.com> On Wed, 23 Apr 2025 15:12:54 GMT, Manuel Hässig wrote: > ## Issue Summary > > The issue manifests in intermittent failures of test cases with `-XX:+PrintAssembly`. The reason for these intermittent failures is a deoptimization of the method before or during printing its assembly. If that deoptimization makes the method not entrant, the entry of that method is patched, but the relocation information is not updated. If the instruction at the method entry before patching had relocation info that prints a comment during assembly printing, printing that comment for the patched entry fails if the operands of the original and patched instructions do not match. > > ## Change Summary > > To fix this issue, this PR updates the relocation info when patching the method entry. To avoid any races between printing and deoptimizing, this PR acquires the `NMethodState_lock` for printing an `nmethod`. > > All changes of this PR summarized: > - add a regression test, > - update the relocation information after patching the method entry for making it not entrant, > - acquire the `NMethodState_lock` in `print_nmethod()` to avoid changing the relocation information during printing. > > ## Testing > > I ran tiers 1 through 3 and Oracle internal testing.
src/hotspot/share/code/nmethod.cpp line 1650: > 1648: > 1649: void nmethod::print_nmethod(bool printmethod) { > 1650: // Enter a critical section to prevent a race with deopts that patch code and updates the relocation info. I came exactly to check for race conditions, suspecting we need locks or atomicity given the comment on `patch_verified_entry`. Nice. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24831#discussion_r2057743555 From duke at openjdk.org Thu Apr 24 08:22:50 2025 From: duke at openjdk.org (duke) Date: Thu, 24 Apr 2025 08:22:50 GMT Subject: RFR: 8346552: C2: Add IR tests to check that Predicate cloning in Loop Unswitching works as expected [v7] In-Reply-To: References: Message-ID: <7kl2UvMwTtuYfKooC5F4HktAjhrm3IbhvYgHA5Ab5hg=.f62497fe-168e-4bf7-baa1-fd25b92dbf23@github.com> On Tue, 22 Apr 2025 13:28:58 GMT, Manuel H?ssig wrote: >> # Issue Summary >> >> When a loop is unswitched, all parse predicates from the original loop must be cloned to the second loop that is created. Forgetting to clone a parse predicate is a common error during development on loop unswitching code that we could not catch previously. Since we have the IR-framework now, this PR introduces a test to catch this error. >> >> # Changes >> >> The main contribution of this PR is a test to ensure that all predicates have been cloned into an unswitched loop. >> Working on this PR revealed that Loop Limit Check Parse Predicates are erroneously cloned when unswitching counted loops. That is because we know that the loop variable increments monotonously in counted loops, so a loop limit check at the loop selector is sufficient for both unswitched loops. However, for uncounted loops we do not know anything about the behavior of the loop variables and they could behave differently in either of the unswitched loops. Hence, cloning the loop limit check is needed in that case. This PR also removes the superfluous cloning. 
>> >> All changes summarized: >> - add `OPAQUE_TEMPLATE_ASSERTION_PREDICATE_NODE` to the IR-framework, >> - add some missing parse predicate nodes to the IR-framework, >> - change the output of the labels of parse predicate nodes in the ideal graph so they can be recognized reliably by the IR-framework (the main problem was that `Loop ` is a prefix of `Loop Limit Check` that is hard to distinguish with spaces instead of underlines), >> - rework the regex for detecting parse predicates in the IR-framework, >> - add a test to ensure parse predicates are cloned into unswitched loops, >> - only clone loop limit checks when unswitching uncounted loops, >> - add a test which checks that loop limit checks are not cloned when unswitching counted loops, >> - add a test which checks that loop limit checks are cloned when unswitching uncounted loops. >> >> # Testing >> >> - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14266369099) >> - tier1 through tier3 plus Oracle internal testing > > Manuel H?ssig has updated the pull request incrementally with one additional commit since the last revision: > > Test with different phase @mhaessig Your change (at version 8f224106a1e8e2f8348349d091eb6bdd49bf402d) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24479#issuecomment-2826766957 From duke at openjdk.org Thu Apr 24 08:22:49 2025 From: duke at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Thu, 24 Apr 2025 08:22:49 GMT Subject: RFR: 8346552: C2: Add IR tests to check that Predicate cloning in Loop Unswitching works as expected [v7] In-Reply-To: References: Message-ID: On Tue, 22 Apr 2025 13:28:58 GMT, Manuel H?ssig wrote: >> # Issue Summary >> >> When a loop is unswitched, all parse predicates from the original loop must be cloned to the second loop that is created. Forgetting to clone a parse predicate is a common error during development on loop unswitching code that we could not catch previously. 
Since we have the IR-framework now, this PR introduces a test to catch this error. >> >> # Changes >> >> The main contribution of this PR is a test to ensure that all predicates have been cloned into an unswitched loop. >> Working on this PR revealed that Loop Limit Check Parse Predicates are erroneously cloned when unswitching counted loops. That is because we know that the loop variable increments monotonously in counted loops, so a loop limit check at the loop selector is sufficient for both unswitched loops. However, for uncounted loops we do not know anything about the behavior of the loop variables and they could behave differently in either of the unswitched loops. Hence, cloning the loop limit check is needed in that case. This PR also removes the superfluous cloning. >> >> All changes summarized: >> - add `OPAQUE_TEMPLATE_ASSERTION_PREDICATE_NODE` to the IR-framework, >> - add some missing parse predicate nodes to the IR-framework, >> - change the output of the labels of parse predicate nodes in the ideal graph so they can be recognized reliably by the IR-framework (the main problem was that `Loop ` is a prefix of `Loop Limit Check` that is hard to distinguish with spaces instead of underlines), >> - rework the regex for detecting parse predicates in the IR-framework, >> - add a test to ensure parse predicates are cloned into unswitched loops, >> - only clone loop limit checks when unswitching uncounted loops, >> - add a test which checks that loop limit checks are not cloned when unswitching counted loops, >> - add a test which checks that loop limit checks are cloned when unswitching uncounted loops. >> >> # Testing >> >> - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14266369099) >> - tier1 through tier3 plus Oracle internal testing > > Manuel H?ssig has updated the pull request incrementally with one additional commit since the last revision: > > Test with different phase Thank you for your reviews and improvements. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24479#issuecomment-2826761363 From mchevalier at openjdk.org Thu Apr 24 08:23:47 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 24 Apr 2025 08:23:47 GMT Subject: RFR: 8354284: Add more compiler test folders to tier1 runs In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 08:44:04 GMT, Marc Chevalier wrote: > Some folders in jtreg/compiler have been reported not to be run in any tier, while tier1 was probably intended, but the tier definition was mistakenly not updated. I've checked which folders are not referenced into `TEST.groups`. > > The unmentioned ones: > - `ccp` > - `ciReplay` > - `ciTypeFlow` > - `compilercontrol` > - `debug` > - `oracle` > - `predicates` > - `print` > - `relocations` > - `sharedstubs` > - `splitif` > - `tiered` > - `whitebox` > > And those, that are not test folders: > - `lib` > - `patches` > - `testlibraries` > > I'm adding `ccp`, `ciTypeFlow`, `predicates`, `sharedstubs` and `splitif` to tier1. > > The other folders seems to have been around for very long (since at least mid-2021). It's not clear how meaningful it'd be to add them/what the intent from them was. I've rather focused on the recently(-ish) added folders, that one forgot to put in a tier when adding it. > > Feel free to tell if other folders should be included (and in which tier). > > Thanks, > Marc Before: duration: 1h 06m 20s; machine time: 22h 21m 08s After: duration: 59m 36s; machine time: 21h 27m 03s It seems not to increase by more than the existing fluctuations. Also, I think it's not quite as simple as this. It's rather a tradeoff between tier1 should be minimal in duration, while being maximal in coverage. A tier1 that doesn't cover some topics is not as useful. Some of these folders should have always been in tier1: they are just a more meaningful classification than stuffing everything in a generically named folder, but the group definition was mistakenly not updated back then. 
This PR is adding only 39 files. There is one with custom timeout: `predicates/TestCloningWithManyDiamondsInExpression.java` with a timeout of 30. To compare with some tests in `c1` or `c2`, not excluded from tier1, with timeout up to 600. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24817#issuecomment-2826768154 From mchevalier at openjdk.org Thu Apr 24 08:25:56 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 24 Apr 2025 08:25:56 GMT Subject: RFR: 8338194: ubsan: mulnode.cpp:862:59: runtime error: shift exponent 64 is too large for 64-bit type 'long unsigned int' Message-ID: We have a UB when the shift is equal to or bigger than the number of bits in the type. Our expression is (julong)CONST64(1) << (julong)(BitsPerJavaLong - shift) so we have a UB when the RHS is `>= 64`, that is when `shift` is `<= 0`. Since shift is masked to be in `[0, BitPerJavaLong - 1]`, we actually have a UB when `shift == 0`. The code doesn't forbid it, indeed, and it doesn't seem to be enforced by more global invariants. This UB doesn't reproduce anymore with the provided cases. I've replaced the UB with an explicit assert to try to find another failing case. No hit when run with tier1, tier2, tier3, hs-precheckin-comp and hs-comp-stress. Nevertheless, the assert indeed hit on the master of when the issue was filed. More precisely, I've bisect for the two tests `java/foreign/StdLibTest.java` and `java/lang/invoke/PermuteArgsTest.java` and the assert hits until [8339196: Optimize BufWriterImpl#writeU1/U2/Int/Long](https://bugs.openjdk.org/browse/JDK-8339196). It is not clear to me why the issue stopped reproducing after this commit, but given the lack of reproducer, I went with a semi-blind fix: it fixes the issue back then, and still removes a chance of UB. It simply makes sure the RHS of this shift cannot be 64 by making sure `shift` cannot be 0. 
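To make the hazard concrete, here is a small standalone sketch. It is not the `mulnode.cpp` change itself: `low_bits_mask` is a hypothetical helper, and it shows only one way to guard the computation (special-casing `shift == 0`), whereas the PR described here avoids the problem by making sure the transformation never sees `shift == 0`. In C++, shifting a 64-bit value by 64 or more is undefined behavior, so the mask for the full-width case has to be produced without a shift.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not HotSpot code): returns a mask with the low
// (64 - shift) bits set, for a shift already masked into [0, 63].
uint64_t low_bits_mask(int shift) {
  assert(shift >= 0 && shift < 64 && "shift must already be masked");
  if (shift == 0) {
    // (1 << 64) would be undefined behavior; all 64 bits are wanted here.
    return ~uint64_t(0);
  }
  return (uint64_t(1) << (64 - shift)) - 1;  // shift amount is in [1, 63]
}
```

For example, `low_bits_mask(60)` keeps the low four bits, while `low_bits_mask(0)` yields the all-ones mask without ever evaluating a 64-bit shift by 64.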
If `shift` is indeed 0, since it is the RHS of a `RShiftLNode`, `RShiftLNode::Identity` should simply returns the LHS of the shift, and entirely eliminate the RShiftLNode. The implementation of `AndINode::Ideal` is, on the other hand, safe. Indeed, it uses `right_n_bits(BitsPerJavaInteger - shift)` instead of doing manually `(1 << (BitsPerJavaInteger - shift)) - 1`. This macro is safe as it would return `-1` if the shift is too large rather than causing a UB. Yet, I didn't use this way since it would cause the replacement of `AndI(X, RShiftI(Y, 0))` by `AndI(X, URShiftI(Y, 0))` before simplifying the `URShiftI` into `Y`. In between, it also implies that all users of the And node will be enqueued for IGVN for a not-very-interesting change. Simply skipping the replacement of RShiftL into URShiftL allows to directly come to `AndL(X, Y)` without useless steps. Thanks, Marc ------------- Commit messages: - Avoiding the UB Changes: https://git.openjdk.org/jdk/pull/24841/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24841&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8338194 Stats: 9 lines in 1 file changed: 2 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/24841.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24841/head:pull/24841 PR: https://git.openjdk.org/jdk/pull/24841 From rcastanedalo at openjdk.org Thu Apr 24 08:30:31 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 24 Apr 2025 08:30:31 GMT Subject: RFR: 8354520: IGV: dump contextual information [v2] In-Reply-To: References: Message-ID: > This changeset extends the IGV graph dumps with additional properties that ease tracing the dumps back to the context in which they were produced. 
The changeset dumps, for every compilation, the following additional properties: > > - JVM arguments > - platform information > - JVM version information > - date and time > - process ID > - (compiler) thread ID > > ![compilation-properties](https://github.com/user-attachments/assets/8ddc8fb9-c348-4761-8e19-c70633a1b59f) > > Additionally, the changeset produces and dumps the C2 stack trace from which each graph is dumped: > > ![c2-stack-trace](https://github.com/user-attachments/assets/085547ee-b0b3-4a38-86f1-9df79cf1cc01) > > This should be particularly useful in an interactive context, where the user steps through C2 code using a debugger and dumps graphs at different points. To produce a stack trace in this context, the usual debugger-entry C2 functions (`igv_print`, `igv_append`, `Node::dump_bfs`, ...) are extended with extra arguments to specify the stack handling registers (stack pointer, frame pointer, and program counter): > > ![c2-stack-trace-from-gdb](https://github.com/user-attachments/assets/29de2964-ee2d-4f5f-bcf7-d81e1bc6c8a6) > > The inconvenience of manually specifying the stack handling registers can be addressed by hiding them in debugger user-defined commands, e.g.: > > > define igv > p igv_print(true, $sp, $fp, $pc) > end > > define igv_node > p find_node($arg0)->dump_bfs(0, 0, "!", $sp, $fp, $pc) > end > > > Thanks to @TobiHartmann for providing useful feedback! > > #### Testing > > - tier1 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64; release and debug mode). > - Tested interactive usage manually via `gdb` and `rr` on linux-x64. > - Tested automatically that dumping thousands of graphs does not trigger any assertion failure. 
Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: Update src/utils/IdealGraphVisualizer/README.md Co-authored-by: Manuel Hässig ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24724/files - new: https://git.openjdk.org/jdk/pull/24724/files/4d1e6f86..3bc6d448 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24724&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24724&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24724.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24724/head:pull/24724 PR: https://git.openjdk.org/jdk/pull/24724 From duke at openjdk.org Thu Apr 24 08:31:54 2025 From: duke at openjdk.org (Manuel Hässig) Date: Thu, 24 Apr 2025 08:31:54 GMT Subject: Integrated: 8346552: C2: Add IR tests to check that Predicate cloning in Loop Unswitching works as expected In-Reply-To: References: Message-ID: <32dZU5LujMC_S8QPlcXQ5Odq9XCVOkAtLJ2ETExUWvM=.1ac3d6d5-1ccf-42c7-a307-a232b44caf96@github.com> On Mon, 7 Apr 2025 07:49:08 GMT, Manuel Hässig wrote: > # Issue Summary > > When a loop is unswitched, all parse predicates from the original loop must be cloned to the second loop that is created. Forgetting to clone a parse predicate is a common error during development on loop unswitching code that we could not catch previously. Since we have the IR-framework now, this PR introduces a test to catch this error. > > # Changes > > The main contribution of this PR is a test to ensure that all predicates have been cloned into an unswitched loop. > Working on this PR revealed that Loop Limit Check Parse Predicates are erroneously cloned when unswitching counted loops. That is because we know that the loop variable increments monotonically in counted loops, so a loop limit check at the loop selector is sufficient for both unswitched loops.
However, for uncounted loops we do not know anything about the behavior of the loop variables and they could behave differently in either of the unswitched loops. Hence, cloning the loop limit check is needed in that case. This PR also removes the superfluous cloning. > > All changes summarized: > - add `OPAQUE_TEMPLATE_ASSERTION_PREDICATE_NODE` to the IR-framework, > - add some missing parse predicate nodes to the IR-framework, > - change the output of the labels of parse predicate nodes in the ideal graph so they can be recognized reliably by the IR-framework (the main problem was that `Loop ` is a prefix of `Loop Limit Check` that is hard to distinguish with spaces instead of underlines), > - rework the regex for detecting parse predicates in the IR-framework, > - add a test to ensure parse predicates are cloned into unswitched loops, > - only clone loop limit checks when unswitching uncounted loops, > - add a test which checks that loop limit checks are not cloned when unswitching counted loops, > - add a test which checks that loop limit checks are cloned when unswitching uncounted loops. > > # Testing > > - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14266369099) > - tier1 through tier3 plus Oracle internal testing This pull request has now been integrated. 
Changeset: 84e9264e Author: Manuel Hässig Committer: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/84e9264e76ca6e5d984c8eecbf5c5d11128fc174 Stats: 222 lines in 5 files changed: 214 ins; 1 del; 7 mod 8346552: C2: Add IR tests to check that Predicate cloning in Loop Unswitching works as expected Co-authored-by: Christian Hagedorn Reviewed-by: chagedorn, epeter ------------- PR: https://git.openjdk.org/jdk/pull/24479 From chagedorn at openjdk.org Thu Apr 24 08:31:53 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 24 Apr 2025 08:31:53 GMT Subject: RFR: 8346552: C2: Add IR tests to check that Predicate cloning in Loop Unswitching works as expected [v7] In-Reply-To: References: Message-ID: On Tue, 22 Apr 2025 13:28:58 GMT, Manuel Hässig wrote: >> # Issue Summary >> >> When a loop is unswitched, all parse predicates from the original loop must be cloned to the second loop that is created. Forgetting to clone a parse predicate is a common error during development on loop unswitching code that we could not catch previously. Since we have the IR-framework now, this PR introduces a test to catch this error. >> >> # Changes >> >> The main contribution of this PR is a test to ensure that all predicates have been cloned into an unswitched loop. >> Working on this PR revealed that Loop Limit Check Parse Predicates are erroneously cloned when unswitching counted loops. That is because we know that the loop variable increments monotonically in counted loops, so a loop limit check at the loop selector is sufficient for both unswitched loops. However, for uncounted loops we do not know anything about the behavior of the loop variables and they could behave differently in either of the unswitched loops. Hence, cloning the loop limit check is needed in that case. This PR also removes the superfluous cloning.
>> >> All changes summarized: >> - add `OPAQUE_TEMPLATE_ASSERTION_PREDICATE_NODE` to the IR-framework, >> - add some missing parse predicate nodes to the IR-framework, >> - change the output of the labels of parse predicate nodes in the ideal graph so they can be recognized reliably by the IR-framework (the main problem was that `Loop ` is a prefix of `Loop Limit Check` that is hard to distinguish with spaces instead of underlines), >> - rework the regex for detecting parse predicates in the IR-framework, >> - add a test to ensure parse predicates are cloned into unswitched loops, >> - only clone loop limit checks when unswitching uncounted loops, >> - add a test which checks that loop limit checks are not cloned when unswitching counted loops, >> - add a test which checks that loop limit checks are cloned when unswitching uncounted loops. >> >> # Testing >> >> - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14266369099) >> - tier1 through tier3 plus Oracle internal testing > > Manuel H?ssig has updated the pull request incrementally with one additional commit since the last revision: > > Test with different phase Thanks Manuel for the credit! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24479#issuecomment-2826785883 From rcastanedalo at openjdk.org Thu Apr 24 08:33:50 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 24 Apr 2025 08:33:50 GMT Subject: RFR: 8354520: IGV: dump contextual information [v2] In-Reply-To: <81rrgzCaRmmpzDY4KCVmlcKS8-a4x3jtxbrsuFtKJNo=.ddaf541a-e9a5-4bee-8712-38fde6d3d886@github.com> References: <81rrgzCaRmmpzDY4KCVmlcKS8-a4x3jtxbrsuFtKJNo=.ddaf541a-e9a5-4bee-8712-38fde6d3d886@github.com> Message-ID: On Wed, 23 Apr 2025 10:51:27 GMT, Manuel Hässig wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> Update src/utils/IdealGraphVisualizer/README.md >> >> Co-authored-by: Manuel Hässig > > src/utils/IdealGraphVisualizer/README.md line 57: > >> 55: The JVM provides some entry functions to dump graphs from a debugger such as >> 56: `gdb` or `rr`, see the different variants of `igv_print` and `igv_append` in >> 57: `compile.cpp`. In combination with the IGV network interface, these functions > > Suggestion: > > [`compile.cpp`](src/hotspot/share/opto/compile.cpp). In combination with the > IGV network interface, these functions Thanks for the suggestion, but what is the value of adding such a link here?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24724#discussion_r2057848401 From epeter at openjdk.org Thu Apr 24 08:35:54 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 24 Apr 2025 08:35:54 GMT Subject: RFR: 8315916: assert(C->live_nodes() <= C->max_node_limit()) failed: Live Node limit exceeded [v7] In-Reply-To: References: <5JpWWMlRP-o60KZI9bU5bMq-dJePHvnKdUgigCfwbfo=.c5545951-3f36-43da-b082-79a3a00ac6c0@github.com> Message-ID: On Wed, 23 Apr 2025 17:01:33 GMT, Dhamoder Nalla wrote: >>> @dhanalla I see that you have had a conversation with @chhagedorn here, where you explained more details about what exactly goes wrong. Can you please update the PR description with these details? Generally, that makes it much easier to review, then the reviewers don't need to read through the whole conversation and figure out what is now stale (things you already applied) and what is still an active conversation. While you are at it, you can also update the description on JIRA. >> >> You're right that in the current implementation, we begin the scalarization process and only bail out once the live node count has already exceeded the limit. At that point, the graph is indeed partially transformed, which is why we fall back to recompilation without EA to ensure a safe and consistent compilation state. >> Accurately predicting the number of nodes before transformation is difficult due to the variety of types and structures involved: each element can lead to multiple nodes (e.g., phi nodes, loads/stores, etc.), and the graph can grow non-linearly depending on how the array is used. >> However, I agree that giving up entirely on EA just because of one large array seems like an overly conservative fallback, especially if the rest of the method would still benefit from EA. > >> > @dhanalla I see that you have had a conversation with @chhagedorn here, where you explained more details about what exactly goes wrong.
Can you please update the PR description with these details? Generally, that makes it much easier to review, then the reviewers don't need to read through the whole conversation and figure out what is now stale (things you already applied) and what is still an active conversation. While you are at it, you can also update the description on JIRA. >> >> You're right that in the current implementation, we begin the scalarization process and only bail out once the live node count has already exceeded the limit. At that point, the graph is indeed partially transformed, which is why we fall back to recompilation without EA to ensure a safe and consistent compilation state. Accurately predicting the number of nodes before transformation is difficult due to the variety of types and structures involved: each element can lead to multiple nodes (e.g., phi nodes, loads/stores, etc.), and the graph can grow non-linearly depending on how the array is used. However, I agree that giving up entirely on EA just because of one large array seems like an overly conservative fallback, especially if the rest of the method would still benefit from EA. > > @eme64 If this answers your question, this PR is ready for review @dhanalla I see. @chhagedorn and I quickly looked through the code, and it seems there are other bailouts that use the FudgeFactor. It also seems that you need an unreasonably high `EliminateAllocationArraySizeLimit`, and so this failure should never actually happen normally, right? Or is it possible to reproduce the same bug with a lower `EliminateAllocationArraySizeLimit` but just more allocations? If so, it would be good if you added such test cases. Is it possible to exceed the node limit with the default `EliminateAllocationArraySizeLimit`, i.e. so that we would hit the assert before your changes, and bailout after your changes?
I have two worries, and maybe @vnkozlov can say something here: - By the time we check the condition and bail out, we may have allocated a lot of nodes, and possibly be far over the node limit. That means we already used a lot of memory and time. How bad can this get? - And as discussed above: we could have done EA partially, until getting close to the node limit, and then not do allocation elimination on the remaining allocations. That would be a partial benefit, which we do not have if we recompile without EA. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20504#issuecomment-2826798704 From duke at openjdk.org Thu Apr 24 08:40:47 2025 From: duke at openjdk.org (duke) Date: Thu, 24 Apr 2025 08:40:47 GMT Subject: RFR: 8355387: [jittester] Disable downcasts by default In-Reply-To: References: Message-ID: On Thu, 24 Apr 2025 06:23:11 GMT, Evgeny Nikitin wrote: > Currently, JITTester's love to downcast often produces something like this: > > ArrayList someVar = (TreeSet)(Object)(List)(new ArrayList()); > > ... which is possible because it goes up to Object and then starts downcasting to some totally unrelated class / type. > > Considering the JITTester's love to casts (they are more-or-less 'safe' expressions), it means a high probability (30-50%) of a gentest to fail compilation. Even worse is the situation for ByteCode tests - that they're faulty is only recognized during the run phase. > > I suggest to disable the downcasts for now. > Testing: 50-100 generated tests in different combinations (default, with the flag set to 'false' or 'true') with artificially increased chance to casts. @lepestock Your change (at version 5f1e46c1d4e47ef9b2a53c503a5b74674792b87d) is now ready to be sponsored by a Committer. 
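The downcast problem described in the quoted JITTester description above can be sketched in plain Java. This is only an illustration, not JITTester code: the class and method names (`DowncastDemo`, `downcastFails`) are hypothetical. It shows why a cast chain that goes up to `Object` and then down to an unrelated class is accepted by the compiler but fails at run time; assigning the result to an `ArrayList` variable, as in the quoted one-liner, would additionally be rejected by `javac` as an incompatible type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical demo class; not part of JITTester.
final class DowncastDemo {
    // Returns true if casting 'value' down to TreeSet throws ClassCastException.
    static boolean downcastFails(Object value) {
        try {
            // Object -> TreeSet is a legal downcast for the compiler,
            // so this line compiles regardless of the run-time class.
            TreeSet<?> set = (TreeSet<?>) value;
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>();
        // Up to Object, then down to an unrelated class: fails only at run time.
        System.out.println(downcastFails((Object) list));
        // A genuine TreeSet passes the same cast.
        System.out.println(downcastFails(new TreeSet<String>()));
    }
}
```

Under these assumptions, the first call reports a failing cast and the second does not, which matches the observation that bytecode tests only reveal the problem during the run phase.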
------------- PR Comment: https://git.openjdk.org/jdk/pull/24840#issuecomment-2826811445 From shade at openjdk.org Thu Apr 24 08:50:00 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 24 Apr 2025 08:50:00 GMT Subject: RFR: 8258229: Crash in nmethod::reloc_string_for In-Reply-To: <-uHbqJeg7U97r0I0rPvOeicC9kZorBpX1ppksXJXrvw=.11fb5a3b-a54f-49b3-8b53-c619fd0c5b11@github.com> References: <6wxhOTq8-vRcBjfw6HdHD9nZzwdT7SgvXfgnQseFF7w=.05dc5242-cad0-4a6a-a96b-9754b2edc927@github.com> <6wY9seD50o7vKaPLgjiGxmPc2nsxDAvUafl9KG9DLZg=.6bc3d41f-9ee0-42d5-a4f3-52304e481224@github.com> <-uHbqJeg7U97r0I0rPvOeicC9kZorBpX1ppksXJXrvw=.11fb5a3b-a54f-49b3-8b53-c619fd0c5b11@github.com> Message-ID: On Wed, 23 Apr 2025 20:17:18 GMT, Manuel Hässig wrote: >> I don't need it. I'll take it out before the RFR. Thanks for pointing it out! > > Should I file an RFE or do you already have everything tracked? There is an umbrella RFE, I created the task for `nativeInst` here: https://bugs.openjdk.org/browse/JDK-8355472 -- feel free to take it and put this hunk and other cleanups in `nativeInst_x86*` there :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24831#discussion_r2057882431 From roland at openjdk.org Thu Apr 24 08:50:46 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 24 Apr 2025 08:50:46 GMT Subject: RFR: 8349139: C2: Div looses dependency on condition that guarantees divisor not zero in counted loop [v7] In-Reply-To: References: Message-ID: <9et4RPbKkI0_VlqkWSULJg6pXrEJh0fBhFJ3uaNdNmQ=.58a180bf-01ce-43d1-813b-6f6956ba1403@github.com> > The test crashes because of a division by zero. The `Div` node for > that one is initially part of a counted loop. The control input of the > node is cleared because the divisor is non zero. This is because the > divisor depends on the loop phi and the type of the loop phi is > narrowed down when the counted loop is created. pre/main/post loops > are created, unrolling happens, the main loop looses its backedge.
The > `Div` node can then float above the zero trip guard for the main > loop. When the zero trip guard is not taken, there's no guarantee the > divisor is non zero so the `Div` node should be pinned below it. > > I propose we revert the change I made with 8334724 which removed > `PhaseIdealLoop::cast_incr_before_loop()`. The `CastII` that this > method inserted was there to handle exactly this problem. It was added > initially for a similar issue but with array loads. That problem with > loads is handled some other way now and that's why I thought it was > safe to proceed with the removal. > > The code in this patch is somewhat different from the one we had > before for a couple reasons: > > 1- assert predicate code evolved and so previous logic can't be > resurrected as it was. > > 2- the previous logic has a bug. > > Regarding 1-: during pre/main/post loop creation, we used to add the > `CastII` and then to add assertion predicates (so assertion predicates > depended on the `CastII`). Then when unrolling, when assertion > predicates are updated, we would skip over the `CastII`. What I > propose here is to add the `CastII` after assertion predicates are > added. As a result, they don't depend on the `CastII` and there's no > need for any extra logic when unrolling happens. This, however, > doesn't work when the assertion predicates are added by RCE. In that > case, I had to add logic to skip over the `CastII` (similar to what > existed before I removed it). > > Regarding 2-: previous implementation for > `PhaseIdealLoop::cast_incr_before_loop()` would add the `CastII` at > the first loop `Phi` it encounters that's a use of the loop increment: > it's usually the iv but not always. I tweaked the test case to show, > this bug can actually cause a crash and changed the logic for > `PhaseIdealLoop::cast_incr_before_loop()` accordingly. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 12 commits: - review - Merge branch 'master' into JDK-8349139 - whitespace - review - Merge branch 'master' into JDK-8349139 - merge - Merge branch 'master' into JDK-8349139 - other test + review comment - Merge branch 'master' into JDK-8349139 - Merge branch 'master' into JDK-8349139 - ... and 2 more: https://git.openjdk.org/jdk/compare/84e9264e...4ff531af ------------- Changes: https://git.openjdk.org/jdk/pull/23617/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23617&range=06 Stats: 206 lines in 7 files changed: 174 ins; 25 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/23617.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23617/head:pull/23617 PR: https://git.openjdk.org/jdk/pull/23617 From roland at openjdk.org Thu Apr 24 08:50:47 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 24 Apr 2025 08:50:47 GMT Subject: RFR: 8349139: C2: Div looses dependency on condition that guarantees divisor not zero in counted loop [v6] In-Reply-To: References: Message-ID: On Tue, 22 Apr 2025 08:28:18 GMT, Christian Hagedorn wrote: > Looks good to me, too. > > Nit: You should probably update the title: "divisor not null" -> "divisor not zero". Thanks for reviewing this. I made the changes you suggested. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23617#issuecomment-2826836537 From duke at openjdk.org Thu Apr 24 09:05:46 2025 From: duke at openjdk.org (Manuel Hässig) Date: Thu, 24 Apr 2025 09:05:46 GMT Subject: RFR: 8354520: IGV: dump contextual information [v2] In-Reply-To: References: <81rrgzCaRmmpzDY4KCVmlcKS8-a4x3jtxbrsuFtKJNo=.ddaf541a-e9a5-4bee-8712-38fde6d3d886@github.com> Message-ID: On Thu, 24 Apr 2025 08:30:46 GMT, Roberto Castañeda Lozano wrote: >> src/utils/IdealGraphVisualizer/README.md line 57: >> >>> 55: The JVM provides some entry functions to dump graphs from a debugger such as >>> 56: `gdb` or `rr`, see the different variants of `igv_print` and `igv_append` in >>> 57: `compile.cpp`. In combination with the IGV network interface, these functions >> >> Suggestion: >> >> [`compile.cpp`](src/hotspot/share/opto/compile.cpp). In combination with the >> IGV network interface, these functions > > Thanks for the suggestion, but what is the value of adding such a link here? I often find them helpful to find the files a readme is referring to, so I tend to add them. But it's up to you whether you like them as well :slightly_smiling_face: ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24724#discussion_r2057913177 From chagedorn at openjdk.org Thu Apr 24 09:08:57 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 24 Apr 2025 09:08:57 GMT Subject: RFR: 8349139: C2: Div looses dependency on condition that guarantees divisor not zero in counted loop [v7] In-Reply-To: <9et4RPbKkI0_VlqkWSULJg6pXrEJh0fBhFJ3uaNdNmQ=.58a180bf-01ce-43d1-813b-6f6956ba1403@github.com> References: <9et4RPbKkI0_VlqkWSULJg6pXrEJh0fBhFJ3uaNdNmQ=.58a180bf-01ce-43d1-813b-6f6956ba1403@github.com> Message-ID: On Thu, 24 Apr 2025 08:50:46 GMT, Roland Westrelin wrote: >> The test crashes because of a division by zero. The `Div` node for >> that one is initially part of a counted loop.
The control input of the >> node is cleared because the divisor is non zero. This is because the >> divisor depends on the loop phi and the type of the loop phi is >> narrowed down when the counted loop is created. pre/main/post loops >> are created, unrolling happens, the main loop looses its backedge. The >> `Div` node can then float above the zero trip guard for the main >> loop. When the zero trip guard is not taken, there's no guarantee the >> divisor is non zero so the `Div` node should be pinned below it. >> >> I propose we revert the change I made with 8334724 which removed >> `PhaseIdealLoop::cast_incr_before_loop()`. The `CastII` that this >> method inserted was there to handle exactly this problem. It was added >> initially for a similar issue but with array loads. That problem with >> loads is handled some other way now and that's why I thought it was >> safe to proceed with the removal. >> >> The code in this patch is somewhat different from the one we had >> before for a couple reasons: >> >> 1- assert predicate code evolved and so previous logic can't be >> resurrected as it was. >> >> 2- the previous logic has a bug. >> >> Regarding 1-: during pre/main/post loop creation, we used to add the >> `CastII` and then to add assertion predicates (so assertion predicates >> depended on the `CastII`). Then when unrolling, when assertion >> predicates are updated, we would skip over the `CastII`. What I >> propose here is to add the `CastII` after assertion predicates are >> added. As a result, they don't depend on the `CastII` and there's no >> need for any extra logic when unrolling happens. This, however, >> doesn't work when the assertion predicates are added by RCE. In that >> case, I had to add logic to skip over the `CastII` (similar to what >> existed before I removed it). 
>> >> Regarding 2-: previous implementation for >> `PhaseIdealLoop::cast_incr_before_loop()` would add the `CastII` at >> the first loop `Phi` it encounters that's a use of the loop increment: >> it's usually the iv but not always. I tweaked the test case to show, >> this bug can actually cause a crash and changed the logic for >> `PhaseIdealLoop::cast_incr_before_loop()` accordingly. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: > > - review > - Merge branch 'master' into JDK-8349139 > - whitespace > - review > - Merge branch 'master' into JDK-8349139 > - merge > - Merge branch 'master' into JDK-8349139 > - other test + review comment > - Merge branch 'master' into JDK-8349139 > - Merge branch 'master' into JDK-8349139 > - ... and 2 more: https://git.openjdk.org/jdk/compare/84e9264e...4ff531af Thanks for the update! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23617#pullrequestreview-2790345344 From roland at openjdk.org Thu Apr 24 09:13:19 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 24 Apr 2025 09:13:19 GMT Subject: Integrated: 8349139: C2: Div looses dependency on condition that guarantees divisor not zero in counted loop In-Reply-To: References: Message-ID: On Thu, 13 Feb 2025 16:30:20 GMT, Roland Westrelin wrote: > The test crashes because of a division by zero. The `Div` node for > that one is initially part of a counted loop. The control input of the > node is cleared because the divisor is non zero. This is because the > divisor depends on the loop phi and the type of the loop phi is > narrowed down when the counted loop is created. pre/main/post loops > are created, unrolling happens, the main loop looses its backedge. The > `Div` node can then float above the zero trip guard for the main > loop. 
When the zero trip guard is not taken, there's no guarantee the > divisor is non zero so the `Div` node should be pinned below it. > > I propose we revert the change I made with 8334724 which removed > `PhaseIdealLoop::cast_incr_before_loop()`. The `CastII` that this > method inserted was there to handle exactly this problem. It was added > initially for a similar issue but with array loads. That problem with > loads is handled some other way now and that's why I thought it was > safe to proceed with the removal. > > The code in this patch is somewhat different from the one we had > before for a couple reasons: > > 1- assert predicate code evolved and so previous logic can't be > resurrected as it was. > > 2- the previous logic has a bug. > > Regarding 1-: during pre/main/post loop creation, we used to add the > `CastII` and then to add assertion predicates (so assertion predicates > depended on the `CastII`). Then when unrolling, when assertion > predicates are updated, we would skip over the `CastII`. What I > propose here is to add the `CastII` after assertion predicates are > added. As a result, they don't depend on the `CastII` and there's no > need for any extra logic when unrolling happens. This, however, > doesn't work when the assertion predicates are added by RCE. In that > case, I had to add logic to skip over the `CastII` (similar to what > existed before I removed it). > > Regarding 2-: previous implementation for > `PhaseIdealLoop::cast_incr_before_loop()` would add the `CastII` at > the first loop `Phi` it encounters that's a use of the loop increment: > it's usually the iv but not always. I tweaked the test case to show, > this bug can actually cause a crash and changed the logic for > `PhaseIdealLoop::cast_incr_before_loop()` accordingly. This pull request has now been integrated. 
Changeset: be6e4406 Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/be6e4406d8c9024bb368ed9dc22d4a6df2a0846a Stats: 206 lines in 7 files changed: 174 ins; 25 del; 7 mod 8349139: C2: Div looses dependency on condition that guarantees divisor not zero in counted loop Reviewed-by: chagedorn, epeter, qamai ------------- PR: https://git.openjdk.org/jdk/pull/23617 From roland at openjdk.org Thu Apr 24 09:13:17 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 24 Apr 2025 09:13:17 GMT Subject: RFR: 8349139: C2: Div looses dependency on condition that guarantees divisor not zero in counted loop [v7] In-Reply-To: References: <9et4RPbKkI0_VlqkWSULJg6pXrEJh0fBhFJ3uaNdNmQ=.58a180bf-01ce-43d1-813b-6f6956ba1403@github.com> Message-ID: On Thu, 24 Apr 2025 09:06:04 GMT, Christian Hagedorn wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: >> >> - review >> - Merge branch 'master' into JDK-8349139 >> - whitespace >> - review >> - Merge branch 'master' into JDK-8349139 >> - merge >> - Merge branch 'master' into JDK-8349139 >> - other test + review comment >> - Merge branch 'master' into JDK-8349139 >> - Merge branch 'master' into JDK-8349139 >> - ... and 2 more: https://git.openjdk.org/jdk/compare/84e9264e...4ff531af > > Thanks for the update! @chhagedorn thanks for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23617#issuecomment-2826913539 From rcastanedalo at openjdk.org Thu Apr 24 09:13:54 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 24 Apr 2025 09:13:54 GMT Subject: RFR: 8354520: IGV: dump contextual information [v2] In-Reply-To: References: <81rrgzCaRmmpzDY4KCVmlcKS8-a4x3jtxbrsuFtKJNo=.ddaf541a-e9a5-4bee-8712-38fde6d3d886@github.com> Message-ID: On Thu, 24 Apr 2025 09:03:21 GMT, Manuel Hässig wrote: >> Thanks for the suggestion, but what is the value of adding such a link here?
> > I often find them helpful to find the files a readme is referring to, so I tend to add them. But it's up to you whether you like them as well :slightly_smiling_face: Fair enough, so the relative link will resolve correctly and point to the actual `compile.cpp` file on GitHub? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24724#discussion_r2057925838 From roland at openjdk.org Thu Apr 24 09:15:00 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 24 Apr 2025 09:15:00 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs In-Reply-To: <2uqd_nRO0UZWonQnFDqkWYvrYwTGQbDEDnWx3C4eoAo=.65472aeb-e9c2-4f99-8728-d4c7e1afaf57@github.com> References: <2uqd_nRO0UZWonQnFDqkWYvrYwTGQbDEDnWx3C4eoAo=.65472aeb-e9c2-4f99-8728-d4c7e1afaf57@github.com> Message-ID: On Mon, 14 Apr 2025 11:50:27 GMT, Quan Anh Mai wrote: > If a `CastII` that does not narrow its input has its type being a constant, do you think GVN should transform it into a constant, or such nodes should return the bottom type so that it is not folded into a floating `ConNode`? The current patch constant folds the `CastII` in that case. I could write a test case where that's an issue (it causes an out of bound load to float above the range check it depends on). I'm working on an update to the patch to address this. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24575#issuecomment-2826920388 From roland at openjdk.org Thu Apr 24 09:17:50 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 24 Apr 2025 09:17:50 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs In-Reply-To: References: Message-ID: On Thu, 10 Apr 2025 15:15:54 GMT, Roland Westrelin wrote: > This is a variant of 8332827. In 8332827, an array access becomes > dependent on a range check `CastII` for another array access. 
When, > after loop opts are over, that RC `CastII` was removed, the array > access could float and an out of bound access happened. With the fix > for 8332827, RC `CastII`s are no longer removed. > > With this one what happens is that some transformations applied after > loop opts are over widen the type of the RC `CastII`. As a result, the > type of the RC `CastII` is no longer narrower than that of its input, > the `CastII` is removed and the dependency is lost. > > There are 2 transformations that cause this to happen: > > - after loop opts are over, the types of the `CastII` nodes are widened > so nodes that have the same inputs but a slightly different type can > be commoned. > > - When pushing a `CastII` through an `Add`, if the types of both inputs > of the `Add` are non constant, then we end up widening the type > (the resulting `Add` has a type that's wider than that of the > initial `CastII`). > > There are already 3 types of `Cast` nodes depending on the > optimizations that are allowed. Either the `Cast` is floating > (`depends_only_on_test()` returns `true`) or pinned. Either the `Cast` > can be removed if it no longer narrows the type of its input or > not. We already have variants of the `CastII`: > > - if the Cast can float and be removed when it doesn't narrow the type > of its input. > > - if the Cast is pinned and can be removed when it doesn't narrow the type > of its input. > > - if the Cast is pinned and can't be removed when it doesn't narrow > the type of its input. > > What we need here, I think, is the 4th combination: > > - if the Cast can float and can't be removed when it doesn't narrow > the type of its input. > > Anyway, things are becoming confusing with all these different > variants named in ways that don't always help figure out what > constraints each of them operates under. So I refactored this and that's > the biggest part of this change.
The fix consists in marking `Cast` > nodes when their type is widened in a way that prevents them from being > optimized out. > > Tobias ran performance testing with a slightly different version of > this change and there was no regression. @emea thanks for the comments. As mentioned in another comment, I'm in the process of reworking the patch. > I'm wondering if we should pick either `depends_only_on_test` or `pinned`, and use it everywhere consistently. Having both around as near synonyms (antonyms?) is a bit confusing for me. `depends_only_on_test` comes from `Node::depends_only_on_test`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24575#issuecomment-2826927034 From rcastanedalo at openjdk.org Thu Apr 24 09:18:58 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 24 Apr 2025 09:18:58 GMT Subject: RFR: 8352620: C2: rename MemNode::memory_type() to MemNode::value_basic_type() [v3] In-Reply-To: References: Message-ID: <8k5zQHd7jsyvpf5SREhqvnloZdjZMuVjaOBRxy6gchw=.ff0924f9-2d93-4c1e-a135-f3886b71660a@github.com> On Wed, 23 Apr 2025 09:42:01 GMT, Saranya Natarajan wrote: >> Description: The current name MemNode::memory_type() is misleading because the returned type is a property of the value that is loaded/stored, not the memory that is accessed. Usually, the two of them match, but for mismatched memory accesses (arising e.g. from using Unsafe or memory segments) they might differ, e.g. one might store a value of type 'short' into an array of elements of type 'long'. The proposal was to rename MemNode::memory_type() to MemNode::value_basic_type() to clarify these cases. >> >> Solution: Replaced all occurrences of MemNode::memory_type() with MemNode::value_basic_type() > > Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: > > addressing review comments Marked as reviewed by rcastanedalo (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/24427#pullrequestreview-2790374766 From roland at openjdk.org Thu Apr 24 09:19:28 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 24 Apr 2025 09:19:28 GMT Subject: RFR: 8327963: [Umbrella] Incorrect result of C2 compiled code since JDK-8237581 [v3] In-Reply-To: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> Message-ID: <-QlVA5lDYUfV7cfeXDjB5MMdqTCmZnGlB3lNHeLwN2w=.59146193-5cde-42af-a19a-148b1b2308bd@github.com> > An `Initialize` node for an `Allocate` node is created with a memory > `Proj` of adr type raw memory. In order for stores to be captured, the > memory state out of the allocation is a `MergeMem` with slices for the > various object fields/array element set to the raw memory `Proj` of > the `Initialize` node. If `Phi`s need to be created during later > transformations from this memory state, The `Phi` for a particular > slice gets its adr type from the type of the `Proj` which is raw > memory. If during macro expansion, the `Allocate` is found to have no > use and so can be removed, the `Proj` out of the `Initialize` is > replaced by the memory state on input to the `Allocate`. A `Phi` for > some slice for a field of an object will end up with the raw memory > state on input to the `Allocate` node. As a result, memory state at > the `Phi` is incorrect and incorrect execution can happen. > > The fix I propose is, rather than have a single `Proj` for the memory > state out of the `Initialize` with adr type raw memory, to use one > `Proj` per slice added to the memory state after the `Initalize`. Each > of the `Proj` should return the right adr type for its slice. For that > I propose having a new type of `Proj`: `NarrowMemProj` that captures > the right adr type. 
> > Logic for the construction of the `Allocate`/`Initialize` subgraph is > tweaked so the right adr type captured in is own `NarrowMemProj` is > added to the memory sugraph. Code that removes an allocation or moves > it also has to be changed so it correctly takes the multiple memory > projections out of the `Initialize` node into account. > > One tricky issue is that when EA split types for a scalar replaceable > `Allocate` node: > > 1- the adr type captured in the `NarrowMemProj` becomes out of sync > with the type of the slices for the allocation > > 2- before EA, the memory state for one particular field out of the > `Initialize` node can be used for a `Store` to the just allocated > object or some other. So we can have a chain of `Store`s, some to > the newly allocated object, some to some other objects, all of them > using the state of `NarrowMemProj` out of the `Initialize`. After > split unique types, the `NarrowMemProj` is for the slice of a > particular allocation. So `Store`s to some other objects shouldn't > use that memory state but the memory state before the `Allocate`. > > For that, I added logic to update the adr type of `NarrowMemProj` > during split unique types and update the memory input of `Store`s that > don't depend on the memory state ... 
Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: Update test/hotspot/jtreg/compiler/macronodes/TestEliminationOfAllocationWithoutUse.java Co-authored-by: Andrey Turbanov ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24570/files - new: https://git.openjdk.org/jdk/pull/24570/files/a4031f3c..c32e453e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24570&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24570&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24570.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24570/head:pull/24570 PR: https://git.openjdk.org/jdk/pull/24570 From roland at openjdk.org Thu Apr 24 09:27:45 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 24 Apr 2025 09:27:45 GMT Subject: RFR: 8327963: [Umbrella] Incorrect result of C2 compiled code since JDK-8237581 [v4] In-Reply-To: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> Message-ID: > An `Initialize` node for an `Allocate` node is created with a memory > `Proj` of adr type raw memory. In order for stores to be captured, the > memory state out of the allocation is a `MergeMem` with slices for the > various object fields/array element set to the raw memory `Proj` of > the `Initialize` node. If `Phi`s need to be created during later > transformations from this memory state, The `Phi` for a particular > slice gets its adr type from the type of the `Proj` which is raw > memory. If during macro expansion, the `Allocate` is found to have no > use and so can be removed, the `Proj` out of the `Initialize` is > replaced by the memory state on input to the `Allocate`. 
A `Phi` for > some slice for a field of an object will end up with the raw memory > state on input to the `Allocate` node. As a result, memory state at > the `Phi` is incorrect and incorrect execution can happen. > > The fix I propose is, rather than have a single `Proj` for the memory > state out of the `Initialize` with adr type raw memory, to use one > `Proj` per slice added to the memory state after the `Initalize`. Each > of the `Proj` should return the right adr type for its slice. For that > I propose having a new type of `Proj`: `NarrowMemProj` that captures > the right adr type. > > Logic for the construction of the `Allocate`/`Initialize` subgraph is > tweaked so the right adr type captured in is own `NarrowMemProj` is > added to the memory sugraph. Code that removes an allocation or moves > it also has to be changed so it correctly takes the multiple memory > projections out of the `Initialize` node into account. > > One tricky issue is that when EA split types for a scalar replaceable > `Allocate` node: > > 1- the adr type captured in the `NarrowMemProj` becomes out of sync > with the type of the slices for the allocation > > 2- before EA, the memory state for one particular field out of the > `Initialize` node can be used for a `Store` to the just allocated > object or some other. So we can have a chain of `Store`s, some to > the newly allocated object, some to some other objects, all of them > using the state of `NarrowMemProj` out of the `Initialize`. After > split unique types, the `NarrowMemProj` is for the slice of a > particular allocation. So `Store`s to some other objects shouldn't > use that memory state but the memory state before the `Allocate`. > > For that, I added logic to update the adr type of `NarrowMemProj` > during split unique types and update the memory input of `Store`s that > don't depend on the memory state ... 
Roland Westrelin has updated the pull request incrementally with five additional commits since the last revision: - Update test/hotspot/jtreg/compiler/macronodes/TestEliminationOfAllocationWithoutUse.java Co-authored-by: Emanuel Peter - Update test/hotspot/jtreg/compiler/macronodes/TestInitializingStoreCapturing.java Co-authored-by: Emanuel Peter - Update src/hotspot/share/opto/multnode.hpp Co-authored-by: Emanuel Peter - Update src/hotspot/share/opto/escape.cpp Co-authored-by: Emanuel Peter - Update src/hotspot/share/opto/multnode.hpp Co-authored-by: Emanuel Peter ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24570/files - new: https://git.openjdk.org/jdk/pull/24570/files/c32e453e..377a8d7b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24570&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24570&range=02-03 Stats: 5 lines in 4 files changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/24570.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24570/head:pull/24570 PR: https://git.openjdk.org/jdk/pull/24570 From thartmann at openjdk.org Thu Apr 24 09:32:48 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 24 Apr 2025 09:32:48 GMT Subject: RFR: 8320909: C2: Adapt IGVN's enqueuing logic to match idealization of AndNode with LShift operand [v3] In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 11:20:48 GMT, Marc Chevalier wrote: >> The JBS issues has 3 reproducers. Two of them don't reproduce anymore. Let's enumerate: >> >> - Test, TestSimple: >> Disappeared with: >> [JDK-8319372: C2 compilation fails with "Bad immediate dominator info"](https://bugs.openjdk.org/browse/JDK-8319372) in #16844 >> which actually fixed it by removing the handling of the problematic pattern in `CastIINode::Value`. >> Reverting this fix makes the issue reappear. 
>> - Reduced2: I fix here >> - Test3, Reduced3: >> Disappeared with: >> [JDK-8347459: C2: missing transformation for chain of shifts/multiplications by constants](https://bugs.openjdk.org/browse/JDK-8347459) in #23728 >> which just shadowed it. The bug is essentially the same as Reduced2 otherwise. I also fix these here (even with reverted JDK-8347459) >> >> The issue comes from the fact that `And[IL]Node::Value` has a special handling when >> an operand is a left-shift: in the expression >> >> lhs & (X << s) >> >> if `lhs` fits in less than `s` bits, the result is sure to be 0. This special handling >> also tolerate a `ConvI2LNode` between the `AndLNode` and `LShiftINode`. In this case, >> updating the Shift node during IGVN won't enqueue directly the And node, but only the >> Conv node. If this conv node cannot be improved, the And node is not enqueued, and its >> type is not as good as it could be. >> >> Such a case is illustrated on the following figures from Reduced2. Node `239 Phi` is a phi with a >> dead branch, so the node is about to be eleminated. On the second figure, we can see >> `152 LShiftI` taking its place. The node `243 ConvI2L` is enqueued, but no change happens >> during its idealization, so the node `244 AndL` is not enqueued, while it could receive an update. >> >> >> >> >> >> The fix is pretty direct: we recognize this pattern and we enqueue the And node during IGVN >> to make sure it has a chance to be refined. >> >> The case of Reduced3 is mostly the same, with a twist: the special handling of And nodes >> also can see through casts. See the next figure with a `CastIINode` between the `AndINode` and >> the `LShiftINode`. The fix has to take that into account. >> >> >> >> >> Overall, the situation can be of the form: >> >> LShift -> C... > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Add the other id to @bug Thanks for the thorough investigation. Looks good to me! 
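The arithmetic fact behind the special handling quoted above (`lhs & (X << s)` is 0 when `lhs` is small enough) can be checked at the Java level. This is a minimal sketch, not HotSpot code: the class and method names (`AndShiftDemo`, `andWithShift`, `andWithConvertedShift`) are hypothetical, and it reads "fits in less than `s` bits" as "`lhs` has no bit set at position `s` or above", with `0 <= s <= 31`. The second method mirrors the `AndL` / `ConvI2L` / `LShiftI` shape mentioned in the description.

```java
// Hypothetical demo class; it only illustrates the bit-level argument.
final class AndShiftDemo {
    // True when 'lhs' has no bit set at position 's' or above (0 <= s <= 31).
    static boolean fitsInLowBits(long lhs, int s) {
        return (lhs >>> s) == 0;
    }

    // int shape: AndI(lhs, LShiftI(x, s)).
    static int andWithShift(int lhs, int x, int s) {
        return lhs & (x << s);
    }

    // long shape with a conversion in between: AndL(lhs, ConvI2L(LShiftI(x, s))).
    static long andWithConvertedShift(long lhs, int x, int s) {
        return lhs & (long) (x << s);
    }

    public static void main(String[] args) {
        int s = 4;
        int[] samples = {0, 1, -1, 12345, Integer.MIN_VALUE, Integer.MAX_VALUE};
        // Every lhs that fits in the low s bits yields 0, whatever x is,
        // because (x << s) has its low s bits cleared.
        for (int lhs = 0; lhs < (1 << s); lhs++) {
            for (int x : samples) {
                if (!fitsInLowBits(lhs, s)
                        || andWithShift(lhs, x, s) != 0
                        || andWithConvertedShift(lhs, x, s) != 0L) {
                    throw new AssertionError("lhs=" + lhs + " x=" + x);
                }
            }
        }
        System.out.println("all results are zero");
    }
}
```

The sign extension performed by the `int` to `long` conversion leaves the low 32 bits untouched, which is why the property survives the conversion and why the `And` node's type can still be refined when only the shift below the conversion changes.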
> Nevertheless, this solution feels way out-of-scope... Right, this goes in the direction of https://github.com/openjdk/jdk/pull/17508. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24792#pullrequestreview-2790418360 From shade at openjdk.org Thu Apr 24 09:32:51 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 24 Apr 2025 09:32:51 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v6] In-Reply-To: References: Message-ID: On Thu, 24 Apr 2025 00:20:03 GMT, Vladimir Ivanov wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Allow UMH::_method access from VMStructs > > src/hotspot/share/runtime/unloadableMethodHandle.inline.hpp line 26: > >> 24: >> 25: #ifndef SHARE_RUNTIME_METHOD_UNLOAD_BLOCKER_HANDLE_INLINE_HPP >> 26: #define SHARE_RUNTIME_METHOD_UNLOAD_BLOCKER_HANDLE_INLINE_HPP > > Stale header file name used? Right. Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2057962282 From roland at openjdk.org Thu Apr 24 09:33:54 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 24 Apr 2025 09:33:54 GMT Subject: RFR: 8327963: [Umbrella] Incorrect result of C2 compiled code since JDK-8237581 [v5] In-Reply-To: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> Message-ID: > An `Initialize` node for an `Allocate` node is created with a memory > `Proj` of adr type raw memory. In order for stores to be captured, the > memory state out of the allocation is a `MergeMem` with slices for the > various object fields/array element set to the raw memory `Proj` of > the `Initialize` node. 
If `Phi`s need to be created during later
> transformations from this memory state, the `Phi` for a particular
> slice gets its adr type from the type of the `Proj` which is raw
> memory. If during macro expansion, the `Allocate` is found to have no
> use and so can be removed, the `Proj` out of the `Initialize` is
> replaced by the memory state on input to the `Allocate`. A `Phi` for
> some slice for a field of an object will end up with the raw memory
> state on input to the `Allocate` node. As a result, memory state at
> the `Phi` is incorrect and incorrect execution can happen.
>
> The fix I propose is, rather than have a single `Proj` for the memory
> state out of the `Initialize` with adr type raw memory, to use one
> `Proj` per slice added to the memory state after the `Initialize`. Each
> of the `Proj` should return the right adr type for its slice. For that
> I propose having a new type of `Proj`: `NarrowMemProj` that captures
> the right adr type.
>
> Logic for the construction of the `Allocate`/`Initialize` subgraph is
> tweaked so the right adr type captured in its own `NarrowMemProj` is
> added to the memory subgraph. Code that removes an allocation or moves
> it also has to be changed so it correctly takes the multiple memory
> projections out of the `Initialize` node into account.
>
> One tricky issue is that when EA splits types for a scalar replaceable
> `Allocate` node:
>
> 1- the adr type captured in the `NarrowMemProj` becomes out of sync
> with the type of the slices for the allocation
>
> 2- before EA, the memory state for one particular field out of the
> `Initialize` node can be used for a `Store` to the just allocated
> object or some other. So we can have a chain of `Store`s, some to
> the newly allocated object, some to some other objects, all of them
> using the state of `NarrowMemProj` out of the `Initialize`. After
> split unique types, the `NarrowMemProj` is for the slice of a
> particular allocation.
So `Store`s to some other objects shouldn't > use that memory state but the memory state before the `Allocate`. > > For that, I added logic to update the adr type of `NarrowMemProj` > during split unique types and update the memory input of `Store`s that > don't depend on the memory state ... Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: - Update src/hotspot/share/opto/escape.cpp Co-authored-by: Emanuel Peter - Update src/hotspot/share/opto/escape.cpp Co-authored-by: Emanuel Peter ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24570/files - new: https://git.openjdk.org/jdk/pull/24570/files/377a8d7b..7afc47e4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24570&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24570&range=03-04 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/24570.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24570/head:pull/24570 PR: https://git.openjdk.org/jdk/pull/24570 From duke at openjdk.org Thu Apr 24 09:36:04 2025 From: duke at openjdk.org (Saranya Natarajan) Date: Thu, 24 Apr 2025 09:36:04 GMT Subject: Integrated: 8352620: C2: rename MemNode::memory_type() to MemNode::value_basic_type() In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 21:06:17 GMT, Saranya Natarajan wrote: > Description: The current name MemNode::memory_type() is misleading because the returned type is a property of the value that is loaded/stored, not the memory that is accessed. Usually, the two of them match, but for mismatched memory accesses (arising e.g. from using Unsafe or memory segments) they might differ, e.g. one might store a value of type 'short' into an array of elements of type 'long'. The proposal was to rename MemNode::memory_type() to MemNode::value_basic_type() to clarify these cases. 
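As a small illustration of the mismatched access mentioned in the description above, here is a sketch in plain Java. It is an analogous case, not the one from the description: it stores a `short` into a `byte[]` through a `VarHandle` view instead of storing into a `long[]` through Unsafe or a memory segment. The class `MismatchedAccess` and its methods are hypothetical names used only for this example. The value being stored is a `short` while the memory being accessed is an array of `byte`, which is the distinction the new name `value_basic_type()` is meant to make visible.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

public class MismatchedAccess {
    // View a byte[] as a sequence of little-endian shorts.
    private static final VarHandle SHORT_VIEW =
            MethodHandles.byteArrayViewVarHandle(short[].class, ByteOrder.LITTLE_ENDIAN);

    // The stored value is a short; the accessed memory is a byte[].
    public static void storeShort(byte[] memory, int byteOffset, short value) {
        SHORT_VIEW.set(memory, byteOffset, value);
    }

    public static short loadShort(byte[] memory, int byteOffset) {
        return (short) SHORT_VIEW.get(memory, byteOffset);
    }

    public static void main(String[] args) {
        byte[] memory = new byte[8];
        storeShort(memory, 0, (short) 0x1234);
        // Little-endian: low byte 0x34 first, then 0x12.
        System.out.println(memory[0] + " " + memory[1]); // 52 18
        System.out.println(loadShort(memory, 0) == (short) 0x1234); // true
    }
}
```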
>
> Solution: Replaced all occurrences of MemNode::memory_type() with MemNode::value_basic_type()

This pull request has now been integrated.

Changeset: 74a2c831
Author:    Saranya Natarajan
Committer: Roberto Castañeda Lozano
URL:       https://git.openjdk.org/jdk/commit/74a2c831a2af55c66317ca8aead53fde2a2a6900
Stats:     41 lines in 5 files changed: 4 ins; 0 del; 37 mod

8352620: C2: rename MemNode::memory_type() to MemNode::value_basic_type()

Reviewed-by: rcastanedalo, thartmann

-------------

PR: https://git.openjdk.org/jdk/pull/24427

From duke at openjdk.org  Thu Apr 24 09:51:42 2025
From: duke at openjdk.org (erifan)
Date: Thu, 24 Apr 2025 09:51:42 GMT
Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v2]
In-Reply-To: 
References: 
Message-ID: 

On Fri, 18 Apr 2025 01:36:10 GMT, erifan wrote:

>> This patch optimizes the following patterns:
>> For integer types:
>>
>> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1))
>> => (VectorMaskCmp src1 src2 ncond)
>> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1))
>> => (VectorMaskCmp src1 src2 ncond)
>>
>> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt; ncond is the negated comparison of cond.
>>
>> For float and double types:
>>
>> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1))
>> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
>> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1))
>> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
>>
>> cond can be eq or ne.
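A scalar sketch of the rule quoted above, in plain Java rather than the Vector API. The restriction to eq/ne for float and double is stated in the patch description; the NaN-based explanation below is my own reading, not something the description spells out. The class `NegatedCompare` and its methods are hypothetical names for illustration.

```java
public class NegatedCompare {
    // For ints, not(a < b) is always the same as (a >= b).
    public static boolean intLtNegationHolds(int a, int b) {
        return !(a < b) == (a >= b);
    }

    // For floats, not(a < b) differs from (a >= b) when either operand is NaN,
    // because every ordered comparison involving NaN is false.
    public static boolean floatLtNegationHolds(float a, float b) {
        return !(a < b) == (a >= b);
    }

    // eq/ne stay exact negations of each other even with NaN.
    public static boolean floatEqNegationHolds(float a, float b) {
        return !(a == b) == (a != b);
    }

    public static void main(String[] args) {
        System.out.println(intLtNegationHolds(1, 2));                   // true
        System.out.println(floatLtNegationHolds(1.0f, 2.0f));           // true
        System.out.println(floatLtNegationHolds(Float.NaN, 2.0f));      // false
        System.out.println(floatEqNegationHolds(Float.NaN, 2.0f));      // true
        System.out.println(floatEqNegationHolds(Float.NaN, Float.NaN)); // true
    }
}
```

So folding the not into the compare by flipping the condition is safe for all integer conditions, but for floating point only for eq and ne.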
>> >> Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option `-XX:UseSVE=2`: >> >> Benchmark Unit Before Score Error After Score Error Uplift >> testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 10266136.26 8955.008548 1.29 >> testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 1179760.772 448.031844 1.33 >> testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 2359520.803 896.305743 1.33 >> testCompareEQMaskNotInt ops/s 1787221.411 977.743935 2353952.519 960.069976 1.31 >> testCompareEQMaskNotLong ops/s 895297.1974 673.44808 1178449.02 323.804205 1.31 >> testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 4712761.965 2110.862053 1.41 >> testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 10251646.9 9486.699831 1.29 >> testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 2352855.205 1251.952546 1.39 >> testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 1177811.493 521.1229 1.37 >> testCompareGEMaskNotShort ops/s 3341860.309 1578.975338 4714008.434 1681.10365 1.41 >> testCompareGTMaskNotByte ops/s 7910823.674 2993.367032 10245063.58 9774.75138 1.29 >> testCompareGTMaskNotInt ops/s 1673393.928 3153.099431 2353654.521 1190.848583 1.4 >> testCompareGTMaskNotLong ops/s 849405.9159 2432.858159 1177952.041 359.96413 1.38 >> testCompareGTMaskNotShort ops/s 3339509.141 3339.976585 4711442.496 2673.364893 1.41 >> testCompareLEMaskNotByte ops/s 7911340.004 3114.69191 10231626.5 27134.20035 1.29 >> testCompareLEMaskNotInt ops/s 1675812.113 1340.969885 2353255.341 1452.4522 1.4 >> testCompareLEMaskNotLong ops/s 848862.8036 6564.841731 1177763.623 539.290106 1.38 >> testCompareLEMaskNotShort ops/s 3324951.54 2380.29473 4712116.251 1544.559684 1.41 >> testCompareLTMaskNotByte ops/s 7910390.844 2630.861436 10239567.69 6487.441672 1.29 >> testCompareLTMaskNotInt ops/s 16721... > > erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains two additional commits since the last revision: > > - Merge branch 'master' into JDK-8354242 > - 8354242: VectorAPI: combine vector not operation with compare > > This patch optimizes the following patterns: > For integer types: > ``` > (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) > => (VectorMaskCmp src1 src2 ncond) > (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) > => (VectorMaskCmp src1 src2 ncond) > ``` > cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the > negative comparison of cond. > > For float and double types: > ``` > (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) > => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) > (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) > => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) > ``` > cond can be eq or ne. > > Benchmarks on Nvidia Grace machine with 128-bit SVE2: > With option `-XX:UseSVE=2`: > ``` > Benchmark Unit Before Score Error After Score Error Uplift > testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 10266136.26 8955.008548 1.29 > testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 1179760.772 448.031844 1.33 > testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 2359520.803 896.305743 1.33 > testCompareEQMaskNotInt ops/s 1787221.411 977.743935 2353952.519 960.069976 1.31 > testCompareEQMaskNotLong ops/s 895297.1974 673.44808 1178449.02 323.804205 1.31 > testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 4712761.965 2110.862053 1.41 > testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 10251646.9 9486.699831 1.29 > testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 2352855.205 1251.952546 1.39 > testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 1177811.493 521.1229 1.37 > testCompareGEMaskNotShort ops/s 3341860.309 1578.975338 4714008.434 1681.10365 1.41 > testCompareGTMaskNotByte ops/s 7910823.674 2993.367032 10245063.58 9774.75138 1.29 > testCompareGTMaskNotInt ops/s 
1673393.928 3153.099431 2353654.521 1190.848583 1.4 > testCompareGTMaskNotLong ops/s 849405.9159 2432.858159 1177952.041 359.96413 1.38 > testCompareGTMaskNotShort ops/s 3339509.141 3339.976585 4711442.496 2673.364893 1.41 > testCompareLEMaskNotByte ops/s 7911340.004... @jatin-bhateja Thanks for your review! ------------- PR Review: https://git.openjdk.org/jdk/pull/24674#pullrequestreview-2790317981 From duke at openjdk.org Thu Apr 24 09:51:45 2025 From: duke at openjdk.org (erifan) Date: Thu, 24 Apr 2025 09:51:45 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v2] In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 12:09:51 GMT, Jatin Bhateja wrote: >> erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8354242 >> - 8354242: VectorAPI: combine vector not operation with compare >> >> This patch optimizes the following patterns: >> For integer types: >> ``` >> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) >> => (VectorMaskCmp src1 src2 ncond) >> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) >> => (VectorMaskCmp src1 src2 ncond) >> ``` >> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the >> negative comparison of cond. >> >> For float and double types: >> ``` >> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> ``` >> cond can be eq or ne. 
>> >> Benchmarks on Nvidia Grace machine with 128-bit SVE2: >> With option `-XX:UseSVE=2`: >> ``` >> Benchmark Unit Before Score Error After Score Error Uplift >> testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 10266136.26 8955.008548 1.29 >> testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 1179760.772 448.031844 1.33 >> testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 2359520.803 896.305743 1.33 >> testCompareEQMaskNotInt ops/s 1787221.411 977.743935 2353952.519 960.069976 1.31 >> testCompareEQMaskNotLong ops/s 895297.1974 673.44808 1178449.02 323.804205 1.31 >> testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 4712761.965 2110.862053 1.41 >> testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 10251646.9 9486.699831 1.29 >> testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 2352855.205 1251.952546 1.39 >> testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 1177811.493 521.1229 1.37 >> testCompareGEMaskNotShort ops/s 3341860.309 1578.975338 4714008.434 1681.10365 1.41 >> testCompareGTMaskNotByte ops/s 7910823.674 2993.367032 10245063.58 9774.75138 1.29 >> testCompareGTMaskNotInt ops/s 1673393.928 3153.099431 2353654.521 1190.848583 1.4 >> testCompareGTMaskNotLong ops/s 849405.9159 2432.858159 1177952.041 359.96413 1.38 >> testCompareGTMaskNotShort ops/s 3339509.141 ... > > src/hotspot/share/opto/vectornode.cpp line 2234: > >> 2232: // XorV/XorVMask is commutative, swap VectorMaskCmp/Op_VectorMaskCast to in1. >> 2233: if (in1->Opcode() != Op_VectorMaskCmp && in1->Opcode() != Op_VectorMaskCast) { >> 2234: swap(in1, in2); > > Swapping inputs like this without refreshing GVN bookkeeping is not safe. I guess you wanted to use Node::swap_edges. The edges are not swapped, but two variables in1 and in2 > src/hotspot/share/opto/vectornode.cpp line 2243: > >> 2241: in1 = in1->in(1); >> 2242: } >> 2243: if (in1->Opcode() != Op_VectorMaskCmp || in1->outcnt() > 1 || > > Checks on outcnt on line 2243 and 2238 can be removed. 
Idealization looks for a specific graph pattern and replaces it with a new node whose inputs are the same as the inputs of the pattern. GVN will do the retention job if any intermediate node has users beyond the pattern being replaced.

Thanks for the explanation. Another, more important reason to check outcnt here is to prevent this optimization when VectorMaskCmp has more than one use, because the optimization may not be worthwhile in that case. For example:

    public static void testVectorMaskCmp() {
        IntVector bv = IntVector.fromArray(I_SPECIES, ib, 0);
        IntVector av = IntVector.fromArray(I_SPECIES, ia, 0);
        VectorMask<Integer> m1 = av.compare(VectorOperators.NE, bv); // two uses
        VectorMask<Integer> m2 = m1.not();
        m1.intoArray(m, 0);
        av.lanewise(VectorOperators.ABS, m2).intoArray(ia, 0);
    }

If we do not check outcnt and still do this optimization, two VectorMaskCmp nodes will be generated, and finally two VectorMaskCmp instructions will be emitted. This is undesirable because VectorMaskCmp has much higher latency than the xor instruction on aarch64.

> src/hotspot/share/opto/vectornode.cpp line 2265:
>
>> 2263: vmcmp = new VectorMaskCastNode(phase->transform(vmcmp), vmcast_vt);
>> 2264: }
>> 2265: return vmcmp;
>
> It would be preferable if you could kindly re-factor the code such that we only call VectorNode::Ideal once at return to comply with aesthetics of other idealization routines.

Ok, I'll change it in the next commit.

> test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java line 38:
>
>> 36: * @summary test combining vector not operation with compare
>> 37: * @modules jdk.incubator.vector
>> 38: * @requires ((os.arch!="x86" & os.arch!="i386" & os.arch!="amd64" & os.arch!="x86_64") | vm.cpu.features ~= ".*avx.*")
>
> You can remove this platform limitation and forward the constraints to @IR rules using applyIfCPUFeatureOr

Since this is a platform independent optimization, I tend to use this `@requires` because it's simpler.
If we use `applyIfCPUFeatureOr`, we need to add the same restriction before each test. In addition, if a new architecture supports the vector node, this test may not cover it. What do you think?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2057901771
PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2057976512
PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2057982231
PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2057993569

From mli at openjdk.org  Thu Apr 24 09:58:51 2025
From: mli at openjdk.org (Hamlin Li)
Date: Thu, 24 Apr 2025 09:58:51 GMT
Subject: RFR: 8355293: [TEST] RISC-V: enable more ir tests [v2]
In-Reply-To: 
References: <35acCSxvX_QmszIDigxcJ2zN3RjIGCT2Fcb9enkM-Xk=.793165e0-c4a0-45e6-aa00-7b721d25ff57@github.com>
Message-ID: <5PvVtYOh3PZVyX7cN9GR3jeVJ-9LKZ7fHefIeKQVjlc=.926a830b-f312-4a67-9684-84546afad26b@github.com>

On Thu, 24 Apr 2025 02:49:55 GMT, Fei Yang wrote:

>> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
>>
>> copyright
>
> test/hotspot/jtreg/compiler/loopopts/superword/ProdRed_Double.java line 94:
>
>> 92: @IR(applyIfPlatform = {"riscv64", "true"},
>> 93: applyIf = {"SuperWordReductions", "true"},
>> 94: failOn = {IRNode.MUL_REDUCTION_VF})
>
> I think we can just leave these two test files (`ProdRed_Float.java`/`ProdRed_Double.java`) untouched?
> The IR check is only enabled for x86 sse2 target for now. It does not apply to other CPUs. So I don't see why RISC-V is special here.

I added these `failOn` checks for several reasons:
1. It's clearer for users reading the code in the future why we have Mul Reduction for int/long but not for float/double.
2. catch any *wrong* attempt to implement these instructions in the future, unless we have an efficient way to implement the ordered computation.
For this one, we could catch it manually by reviewing code, but it's always better to do it automatically. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24797#discussion_r2058006403 From rcastanedalo at openjdk.org Thu Apr 24 10:04:12 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 24 Apr 2025 10:04:12 GMT Subject: RFR: 8354520: IGV: dump contextual information [v3] In-Reply-To: References: Message-ID: > This changeset extends the IGV graph dumps with additional properties that ease tracing the dumps back to the context in which they were produced. The changeset dumps, for every compilation, the following additional properties: > > - JVM arguments > - platform information > - JVM version information > - date and time > - process ID > - (compiler) thread ID > > ![compilation-properties](https://github.com/user-attachments/assets/8ddc8fb9-c348-4761-8e19-c70633a1b59f) > > Additionally, the changeset produces and dumps the C2 stack trace from which each graph is dumped: > > ![c2-stack-trace](https://github.com/user-attachments/assets/085547ee-b0b3-4a38-86f1-9df79cf1cc01) > > This should be particularly useful in an interactive context, where the user steps through C2 code using a debugger and dumps graphs at different points. To produce a stack trace in this context, the usual debugger-entry C2 functions (`igv_print`, `igv_append`, `Node::dump_bfs`, ...) 
are extended with extra arguments to specify the stack handling registers (stack pointer, frame pointer, and program counter):
>
> ![c2-stack-trace-from-gdb](https://github.com/user-attachments/assets/29de2964-ee2d-4f5f-bcf7-d81e1bc6c8a6)
>
> The inconvenience of manually specifying the stack handling registers can be addressed by hiding them in debugger user-defined commands, e.g.:
>
>
> define igv
>   p igv_print(true, $sp, $fp, $pc)
> end
>
> define igv_node
>   p find_node($arg0)->dump_bfs(0, 0, "!", $sp, $fp, $pc)
> end
>
>
> Thanks to @TobiHartmann for providing useful feedback!
>
> #### Testing
>
> - tier1 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64; release and debug mode).
> - Tested interactive usage manually via `gdb` and `rr` on linux-x64.
> - Tested automatically that dumping thousands of graphs does not trigger any assertion failure.

Roberto Castañeda Lozano has updated the pull request incrementally with two additional commits since the last revision:

 - Replace 'C2 call stack' by more precise 'C2 stack trace' in IGV documentation
 - Extend dump_bfs help message

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/24724/files
  - new: https://git.openjdk.org/jdk/pull/24724/files/3bc6d448..86fdc673

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=24724&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24724&range=01-02

  Stats: 7 lines in 2 files changed: 5 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/24724.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24724/head:pull/24724

PR: https://git.openjdk.org/jdk/pull/24724

From rcastanedalo at openjdk.org  Thu Apr 24 10:07:45 2025
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Thu, 24 Apr 2025 10:07:45 GMT
Subject: RFR: 8354520: IGV: dump contextual information [v3]
In-Reply-To: <81rrgzCaRmmpzDY4KCVmlcKS8-a4x3jtxbrsuFtKJNo=.ddaf541a-e9a5-4bee-8712-38fde6d3d886@github.com>
References:
 <81rrgzCaRmmpzDY4KCVmlcKS8-a4x3jtxbrsuFtKJNo=.ddaf541a-e9a5-4bee-8712-38fde6d3d886@github.com>
Message-ID: 

On Wed, 23 Apr 2025 11:41:07 GMT, Manuel Hässig wrote:

> I found two typos, but otherwise it looks good. I also tested printing to IGV from `rr` and the new features work as advertised.

Thanks for trying out and reviewing, Manuel!

> Perhaps you could update the `dump_bfs` help with a comment about the new arguments?

Done (commit d65ecdb5), let me know if that works for you. I also made a minor correction to the IGV documentation (commit 86fdc673).

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24724#issuecomment-2827057402

From fyang at openjdk.org  Thu Apr 24 10:24:04 2025
From: fyang at openjdk.org (Fei Yang)
Date: Thu, 24 Apr 2025 10:24:04 GMT
Subject: RFR: 8355293: [TEST] RISC-V: enable more ir tests [v2]
In-Reply-To: <5PvVtYOh3PZVyX7cN9GR3jeVJ-9LKZ7fHefIeKQVjlc=.926a830b-f312-4a67-9684-84546afad26b@github.com>
References: <35acCSxvX_QmszIDigxcJ2zN3RjIGCT2Fcb9enkM-Xk=.793165e0-c4a0-45e6-aa00-7b721d25ff57@github.com>
 <5PvVtYOh3PZVyX7cN9GR3jeVJ-9LKZ7fHefIeKQVjlc=.926a830b-f312-4a67-9684-84546afad26b@github.com>
Message-ID: <1GAnt3Sk1bXadYA_3Q0dVLV8e8ddTZULFgSVWZYRZko=.1db15ac4-ae6e-409d-96c2-f11c44f3297d@github.com>

On Thu, 24 Apr 2025 09:53:10 GMT, Hamlin Li wrote:

>> test/hotspot/jtreg/compiler/loopopts/superword/ProdRed_Double.java line 94:
>>
>>> 92: @IR(applyIfPlatform = {"riscv64", "true"},
>>> 93: applyIf = {"SuperWordReductions", "true"},
>>> 94: failOn = {IRNode.MUL_REDUCTION_VF})
>>
>> I think we can just leave these two test files (`ProdRed_Float.java`/`ProdRed_Double.java`) untouched?
>> The IR check is only enabled for x86 sse2 target for now. It does not apply to other CPUs. So I don't see why RISC-V is special here.
>
> I added these `failOn` checks for several reasons:
> 1. It's clearer for users reading the code in the future why we have Mul Reduction for int/long but not for float/double.
> 2.
catch any *wrong* attempt to implement these instructions in the future, unless we have an efficient way to implement the ordered computation. For this one, we could catch it manually by reviewing code, but it's always better to do it automatically.

Ah, I see. There is a `ProdRed_Int.java` test there in the same directory. Then should we check for `MUL_REDUCTION_VD` instead of `MUL_REDUCTION_VF`? We are operating on doubles here.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24797#discussion_r2058049295

From mchevalier at openjdk.org  Thu Apr 24 10:27:52 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Thu, 24 Apr 2025 10:27:52 GMT
Subject: RFR: 8320909: C2: Adapt IGVN's enqueuing logic to match idealization of AndNode with LShift operand [v3]
In-Reply-To: 
References: 
Message-ID: <3KqajIStp2pMz0daPVhmFIgn9BCTU2NtXfVjRLUghRM=.98e609e1-ec6d-4a3b-b946-51c1276a2eb3@github.com>

On Wed, 23 Apr 2025 11:20:48 GMT, Marc Chevalier wrote:

>> The JBS issue has 3 reproducers. Two of them don't reproduce anymore. Let's enumerate:
>>
>> - Test, TestSimple:
>> Disappeared with:
>> [JDK-8319372: C2 compilation fails with "Bad immediate dominator info"](https://bugs.openjdk.org/browse/JDK-8319372) in #16844
>> which actually fixed it by removing the handling of the problematic pattern in `CastIINode::Value`.
>> Reverting this fix makes the issue reappear.
>> - Reduced2: I fix here
>> - Test3, Reduced3:
>> Disappeared with:
>> [JDK-8347459: C2: missing transformation for chain of shifts/multiplications by constants](https://bugs.openjdk.org/browse/JDK-8347459) in #23728
>> which just shadowed it. The bug is essentially the same as Reduced2 otherwise. I also fix these here (even with reverted JDK-8347459)
>>
>> The issue comes from the fact that `And[IL]Node::Value` has a special handling when
>> an operand is a left-shift: in the expression
>>
>> lhs & (X << s)
>>
>> if `lhs` fits in less than `s` bits, the result is sure to be 0.
This special handling
>> also tolerates a `ConvI2LNode` between the `AndLNode` and `LShiftINode`. In this case,
>> updating the Shift node during IGVN won't directly enqueue the And node, but only the
>> Conv node. If this conv node cannot be improved, the And node is not enqueued, and its
>> type is not as good as it could be.
>>
>> Such a case is illustrated on the following figures from Reduced2. Node `239 Phi` is a phi with a
>> dead branch, so the node is about to be eliminated. On the second figure, we can see
>> `152 LShiftI` taking its place. The node `243 ConvI2L` is enqueued, but no change happens
>> during its idealization, so the node `244 AndL` is not enqueued, while it could receive an update.
>>
>> The fix is pretty direct: we recognize this pattern and we enqueue the And node during IGVN
>> to make sure it has a chance to be refined.
>>
>> The case of Reduced3 is mostly the same, with a twist: the special handling of And nodes
>> also can see through casts. See the next figure with a `CastIINode` between the `AndINode` and
>> the `LShiftINode`. The fix has to take that into account.
>>
>> Overall, the situation can be of the form:
>>
>> LShift -> C...
>
> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision:
>
> Add the other id to @bug

Thanks @eme64 and @TobiHartmann for reviews!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24792#issuecomment-2827109512

From duke at openjdk.org  Thu Apr 24 10:27:52 2025
From: duke at openjdk.org (duke)
Date: Thu, 24 Apr 2025 10:27:52 GMT
Subject: RFR: 8320909: C2: Adapt IGVN's enqueuing logic to match idealization of AndNode with LShift operand [v3]
In-Reply-To: 
References: 
Message-ID: <8Toytc93P7oYQL4LB3oqCpteRwYSyvQckawKJk7w0zg=.8faef25e-c573-4e4f-9c40-805bc3359a7b@github.com>

On Wed, 23 Apr 2025 11:20:48 GMT, Marc Chevalier wrote:

>> The JBS issue has 3 reproducers. Two of them don't reproduce anymore.
Let's enumerate:
>>
>> - Test, TestSimple:
>> Disappeared with:
>> [JDK-8319372: C2 compilation fails with "Bad immediate dominator info"](https://bugs.openjdk.org/browse/JDK-8319372) in #16844
>> which actually fixed it by removing the handling of the problematic pattern in `CastIINode::Value`.
>> Reverting this fix makes the issue reappear.
>> - Reduced2: I fix here
>> - Test3, Reduced3:
>> Disappeared with:
>> [JDK-8347459: C2: missing transformation for chain of shifts/multiplications by constants](https://bugs.openjdk.org/browse/JDK-8347459) in #23728
>> which just shadowed it. The bug is essentially the same as Reduced2 otherwise. I also fix these here (even with reverted JDK-8347459)
>>
>> The issue comes from the fact that `And[IL]Node::Value` has a special handling when
>> an operand is a left-shift: in the expression
>>
>> lhs & (X << s)
>>
>> if `lhs` fits in less than `s` bits, the result is sure to be 0. This special handling
>> also tolerates a `ConvI2LNode` between the `AndLNode` and `LShiftINode`. In this case,
>> updating the Shift node during IGVN won't directly enqueue the And node, but only the
>> Conv node. If this conv node cannot be improved, the And node is not enqueued, and its
>> type is not as good as it could be.
>>
>> Such a case is illustrated on the following figures from Reduced2. Node `239 Phi` is a phi with a
>> dead branch, so the node is about to be eliminated. On the second figure, we can see
>> `152 LShiftI` taking its place. The node `243 ConvI2L` is enqueued, but no change happens
>> during its idealization, so the node `244 AndL` is not enqueued, while it could receive an update.
>>
>> The fix is pretty direct: we recognize this pattern and we enqueue the And node during IGVN
>> to make sure it has a chance to be refined.
>>
>> The case of Reduced3 is mostly the same, with a twist: the special handling of And nodes
>> also can see through casts.
See the next figure with a `CastIINode` between the `AndINode` and >> the `LShiftINode`. The fix has to take that into account. >> >> >> >> >> Overall, the situation can be of the form: >> >> LShift -> C... > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Add the other id to @bug @marc-chevalier Your change (at version d99bfd1178361fa540842753bc92559ff5d318c3) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24792#issuecomment-2827111709 From aph at openjdk.org Thu Apr 24 10:30:50 2025 From: aph at openjdk.org (Andrew Haley) Date: Thu, 24 Apr 2025 10:30:50 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v2] In-Reply-To: References: Message-ID: <46g4wcnZe1Hiodlu9pe4VOoE6hzpKz5tousDFKzs8qA=.edca6b56-d299-41de-a714-4f5ad5bdaa6d@github.com> On Fri, 18 Apr 2025 01:36:10 GMT, erifan wrote: >> This patch optimizes the following patterns: >> For integer types: >> >> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) >> => (VectorMaskCmp src1 src2 ncond) >> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) >> => (VectorMaskCmp src1 src2 ncond) >> >> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the negative comparison of cond. >> >> For float and double types: >> >> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> >> cond can be eq or ne. 
>> >> Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option `-XX:UseSVE=2`: >> >> Benchmark Unit Before Score Error After Score Error Uplift >> testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 10266136.26 8955.008548 1.29 >> testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 1179760.772 448.031844 1.33 >> testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 2359520.803 896.305743 1.33 >> testCompareEQMaskNotInt ops/s 1787221.411 977.743935 2353952.519 960.069976 1.31 >> testCompareEQMaskNotLong ops/s 895297.1974 673.44808 1178449.02 323.804205 1.31 >> testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 4712761.965 2110.862053 1.41 >> testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 10251646.9 9486.699831 1.29 >> testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 2352855.205 1251.952546 1.39 >> testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 1177811.493 521.1229 1.37 >> testCompareGEMaskNotShort ops/s 3341860.309 1578.975338 4714008.434 1681.10365 1.41 >> testCompareGTMaskNotByte ops/s 7910823.674 2993.367032 10245063.58 9774.75138 1.29 >> testCompareGTMaskNotInt ops/s 1673393.928 3153.099431 2353654.521 1190.848583 1.4 >> testCompareGTMaskNotLong ops/s 849405.9159 2432.858159 1177952.041 359.96413 1.38 >> testCompareGTMaskNotShort ops/s 3339509.141 3339.976585 4711442.496 2673.364893 1.41 >> testCompareLEMaskNotByte ops/s 7911340.004 3114.69191 10231626.5 27134.20035 1.29 >> testCompareLEMaskNotInt ops/s 1675812.113 1340.969885 2353255.341 1452.4522 1.4 >> testCompareLEMaskNotLong ops/s 848862.8036 6564.841731 1177763.623 539.290106 1.38 >> testCompareLEMaskNotShort ops/s 3324951.54 2380.29473 4712116.251 1544.559684 1.41 >> testCompareLTMaskNotByte ops/s 7910390.844 2630.861436 10239567.69 6487.441672 1.29 >> testCompareLTMaskNotInt ops/s 16721... > > erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains two additional commits since the last revision: > > - Merge branch 'master' into JDK-8354242 > - 8354242: VectorAPI: combine vector not operation with compare > > This patch optimizes the following patterns: > For integer types: > ``` > (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) > => (VectorMaskCmp src1 src2 ncond) > (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) > => (VectorMaskCmp src1 src2 ncond) > ``` > cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the > negative comparison of cond. > > For float and double types: > ``` > (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) > => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) > (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) > => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) > ``` > cond can be eq or ne. > > Benchmarks on Nvidia Grace machine with 128-bit SVE2: > With option `-XX:UseSVE=2`: > ``` > Benchmark Unit Before Score Error After Score Error Uplift > testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 10266136.26 8955.008548 1.29 > testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 1179760.772 448.031844 1.33 > testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 2359520.803 896.305743 1.33 > testCompareEQMaskNotInt ops/s 1787221.411 977.743935 2353952.519 960.069976 1.31 > testCompareEQMaskNotLong ops/s 895297.1974 673.44808 1178449.02 323.804205 1.31 > testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 4712761.965 2110.862053 1.41 > testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 10251646.9 9486.699831 1.29 > testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 2352855.205 1251.952546 1.39 > testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 1177811.493 521.1229 1.37 > testCompareGEMaskNotShort ops/s 3341860.309 1578.975338 4714008.434 1681.10365 1.41 > testCompareGTMaskNotByte ops/s 7910823.674 2993.367032 10245063.58 9774.75138 1.29 > testCompareGTMaskNotInt ops/s 
1673393.928 3153.099431 2353654.521 1190.848583 1.4 > testCompareGTMaskNotLong ops/s 849405.9159 2432.858159 1177952.041 359.96413 1.38 > testCompareGTMaskNotShort ops/s 3339509.141 3339.976585 4711442.496 2673.364893 1.41 > testCompareLEMaskNotByte ops/s 7911340.004... src/hotspot/share/opto/node.cpp line 1226: > 1224: // be added to the IGVN worklist, then the optimization will not be applied. > 1225: // Therefore, add this node into IGVN worklist to make the optimization happen. > 1226: return true; Suggestion: } else if (n->Opcode() == Op_XorV || n->Opcode() == Op_XorVMask) { // Condition for removing an unnecessary not() following // a compare(...) operation. // The predecessor of n (this XorV or XorVMask) may also be used // by a useless VectorBox node which will later be eliminated by // RemoveUseless. Return true to ensure that subgraph // transformations are performed on n. return true; ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2058061667 From mchevalier at openjdk.org Thu Apr 24 10:30:51 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 24 Apr 2025 10:30:51 GMT Subject: Integrated: 8320909: C2: Adapt IGVN's enqueuing logic to match idealization of AndNode with LShift operand In-Reply-To: References: Message-ID: On Tue, 22 Apr 2025 08:06:48 GMT, Marc Chevalier wrote: > The JBS issues has 3 reproducers. Two of them don't reproduce anymore. Let's enumerate: > > - Test, TestSimple: > Disappeared with: > [JDK-8319372: C2 compilation fails with "Bad immediate dominator info"](https://bugs.openjdk.org/browse/JDK-8319372) in #16844 > which actually fixed it by removing the handling of the problematic pattern in `CastIINode::Value`. > Reverting this fix makes the issue reappear. 
> - Reduced2: I fix here > - Test3, Reduced3: > Disappeared with: > [JDK-8347459: C2: missing transformation for chain of shifts/multiplications by constants](https://bugs.openjdk.org/browse/JDK-8347459) in #23728 > which just shadowed it. The bug is essentially the same as Reduced2 otherwise. I also fix these here (even with reverted JDK-8347459) > > The issue comes from the fact that `And[IL]Node::Value` has a special handling when > an operand is a left-shift: in the expression > > lhs & (X << s) > > if `lhs` fits in less than `s` bits, the result is sure to be 0. This special handling > also tolerates a `ConvI2LNode` between the `AndLNode` and `LShiftINode`. In this case, > updating the Shift node during IGVN won't directly enqueue the And node, but only the > Conv node. If this conv node cannot be improved, the And node is not enqueued, and its > type is not as good as it could be. > > Such a case is illustrated on the following figures from Reduced2. Node `239 Phi` is a phi with a > dead branch, so the node is about to be eliminated. On the second figure, we can see > `152 LShiftI` taking its place. The node `243 ConvI2L` is enqueued, but no change happens > during its idealization, so the node `244 AndL` is not enqueued, while it could receive an update. > > > > > > The fix is pretty direct: we recognize this pattern and we enqueue the And node during IGVN > to make sure it has a chance to be refined. > > The case of Reduced3 is mostly the same, with a twist: the special handling of And nodes > also can see through casts. See the next figure with a `CastIINode` between the `AndINode` and > the `LShiftINode`. The fix has to take that into account. > > > > > Overall, the situation can be of the form: > > LShift -> Cast+ -> ConvI2L -> Cast+ -> And > > This second case was shadowed by [JDK-8347459](https://bugs... This pull request has now been integrated.
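The arithmetic fact behind the special handling in the quoted description can be checked with a small sketch: a left shift by `s` clears the low `s` bits, so AND-ing the result with a non-negative value below `1 << s` always gives 0, and this still holds when the int shift result is widened to long first (the `ConvI2L` case). This only illustrates the identity; it is not C2 code, and the class and method names are hypothetical.

```java
import java.util.Random;

public class AndShiftDemo {
    // lhs & (x << s) on ints.
    static int andShiftInt(int lhs, int x, int s) {
        return lhs & (x << s);
    }

    // Same shape with the int shift result converted to long before the AND.
    static long andShiftLong(long lhs, int x, int s) {
        return lhs & (long) (x << s);
    }

    public static void main(String[] args) {
        int s = 8;
        Random random = new Random(42);
        for (int i = 0; i < 1000; i++) {
            int x = random.nextInt();
            int lhs = random.nextInt(1 << s); // 0 <= lhs < 2^s
            if (andShiftInt(lhs, x, s) != 0 || andShiftLong(lhs, x, s) != 0L) {
                throw new AssertionError("unexpected non-zero result");
            }
        }
        System.out.println("all results were 0");
    }
}
```

Once `lhs` is allowed to have a bit at position `s` or above, the result is no longer guaranteed to be 0, which is why the And node's type depends on the shift feeding it and has to be revisited when that shift changes.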
Changeset: 6254046f Author: Marc Chevalier Committer: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/6254046f508049a4e568f0f2eae51dc10da392c1 Stats: 244 lines in 4 files changed: 244 ins; 0 del; 0 mod 8320909: C2: Adapt IGVN's enqueuing logic to match idealization of AndNode with LShift operand Reviewed-by: epeter, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/24792 From mli at openjdk.org Thu Apr 24 10:31:40 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 24 Apr 2025 10:31:40 GMT Subject: RFR: 8355293: [TEST] RISC-V: enable more ir tests [v2] In-Reply-To: <1GAnt3Sk1bXadYA_3Q0dVLV8e8ddTZULFgSVWZYRZko=.1db15ac4-ae6e-409d-96c2-f11c44f3297d@github.com> References: <35acCSxvX_QmszIDigxcJ2zN3RjIGCT2Fcb9enkM-Xk=.793165e0-c4a0-45e6-aa00-7b721d25ff57@github.com> <5PvVtYOh3PZVyX7cN9GR3jeVJ-9LKZ7fHefIeKQVjlc=.926a830b-f312-4a67-9684-84546afad26b@github.com> <1GAnt3Sk1bXadYA_3Q0dVLV8e8ddTZULFgSVWZYRZko=.1db15ac4-ae6e-409d-96c2-f11c44f3297d@github.com> Message-ID: <5HXiamN-I6OSifZO1-AHOArPIacgB3mYTjoflaM9hD4=.adc0f714-7315-4a9a-a2a2-44f52d94bf9f@github.com> On Thu, 24 Apr 2025 10:20:02 GMT, Fei Yang wrote: >> I added these `failOn` checks for several reasons: >> 1. It's clearer for users reading the code in the future why we have Mul Reduction for int/long but not for float/double. >> 2. catch any *wrong* attempt to implement these instructions in the future, unless we have an efficient way to implement the ordered computation. For this one, we could catch it manually by reviewing code, but it's always better to do it automatically. > > Ah, I see. There is a `ProdRed_Int.java` test there in the same directory. Then should we check for `MUL_REDUCTION_VD` instead of `MUL_REDUCTION_VF`? We are operating on doubles here. Thanks for catching! To confirm, do you mean the line 111 in this file? I'll fix it. Or are you suggesting some fix in ProdRed_Int.java too?
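On the "ordered computation" remark quoted above: a minimal sketch of why a float/double product reduction cannot simply be reassociated the way an int/long one can. Floating-point multiplication is not associative, so combining the same elements in a different order can change the result. The two-lane split below is only an assumed stand-in for what an unordered vector reduction might do; the class and method names are hypothetical.

```java
public class OrderedProductDemo {
    // Strict left-to-right product, as the Java loop semantics require.
    static double productInOrder(double[] a) {
        double p = 1.0;
        for (double v : a) {
            p *= v;
        }
        return p;
    }

    // Reassociated product: even and odd indices accumulate separately,
    // and the two partial products are combined at the end.
    static double productTwoLanes(double[] a) {
        double even = 1.0;
        double odd = 1.0;
        for (int i = 0; i < a.length; i++) {
            if ((i & 1) == 0) {
                even *= a[i];
            } else {
                odd *= a[i];
            }
        }
        return even * odd;
    }

    public static void main(String[] args) {
        double[] data = {1e308, 10.0, 0.1, 1.0};
        // In order, 1e308 * 10.0 overflows before 0.1 can scale it back down.
        System.out.println("in order : " + productInOrder(data));
        System.out.println("two lanes: " + productTwoLanes(data));
    }
}
```

With this input the in-order product overflows to infinity while the reassociated one stays finite, so the two orders are observably different.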
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24797#discussion_r2058062206 From fyang at openjdk.org Thu Apr 24 10:41:53 2025 From: fyang at openjdk.org (Fei Yang) Date: Thu, 24 Apr 2025 10:41:53 GMT Subject: RFR: 8355293: [TEST] RISC-V: enable more ir tests [v2] In-Reply-To: <5HXiamN-I6OSifZO1-AHOArPIacgB3mYTjoflaM9hD4=.adc0f714-7315-4a9a-a2a2-44f52d94bf9f@github.com> References: <35acCSxvX_QmszIDigxcJ2zN3RjIGCT2Fcb9enkM-Xk=.793165e0-c4a0-45e6-aa00-7b721d25ff57@github.com> <5PvVtYOh3PZVyX7cN9GR3jeVJ-9LKZ7fHefIeKQVjlc=.926a830b-f312-4a67-9684-84546afad26b@github.com> <1GAnt3Sk1bXadYA_3Q0dVLV8e8ddTZULFgSVWZYRZko=.1db15ac4-ae6e-409d-96c2-f11c44f3297d@github.com> <5HXiamN-I6OSifZO1-AHOArPIacgB3mYTjoflaM9hD4=.adc0f714-7315-4a9a-a2a2-44f52d94bf9f@github.com> Message-ID: On Thu, 24 Apr 2025 10:28:37 GMT, Hamlin Li wrote: >> Ah, I see. There is a `ProdRed_Int.java` test there in the same directory. Then should we check for `MUL_REDUCTION_VD` instead of `MUL_REDUCTION_VF`? We are operating on doubles here. > > Thanks for catching! > ~~To confirm, do you mean the line 111 in this file? I'll fix it.~~ > ~~Or you suggest some fix in ProdRed_Int.java too?~~ > I see what you meant, I'll fix it. I see two occurrences in file ProdRed_Double.java: Both L94 and L111. Regarding ProdRed_Int.java, seems that we can do a similar cleanup like you do in this PR for other tests. 
I mean: diff --git a/test/hotspot/jtreg/compiler/loopopts/superword/ProdRed_Int.java b/test/hotspot/jtreg/compiler/loopopts/superword/ProdRed_Int.java index ebc8251e025..9ca670117cf 100644 --- a/test/hotspot/jtreg/compiler/loopopts/superword/ProdRed_Int.java +++ b/test/hotspot/jtreg/compiler/loopopts/superword/ProdRed_Int.java @@ -85,8 +85,7 @@ public static void prodReductionInit(int[] a, int[] b) { @IR(applyIfCPUFeature = {"sse4.1", "true"}, applyIfAnd = {"SuperWordReductions", "true", "LoopMaxUnroll", ">= 8"}, counts = {IRNode.MUL_REDUCTION_VI, ">= 1", IRNode.MUL_REDUCTION_VI, "<= 2"}) // one for main-loop, one for vector-post-loop - @IR(applyIfPlatform = {"riscv64", "true"}, - applyIfCPUFeature = {"rvv", "true"}, + @IR(applyIfCPUFeature = {"rvv", "true"}, applyIfAnd = {"SuperWordReductions", "true", "LoopMaxUnroll", ">= 8"}, counts = {IRNode.MUL_REDUCTION_VI, ">= 1", IRNode.MUL_REDUCTION_VI, "<= 2"}) // one for main-loop, one for vector-post-loop public static int prodReductionImplement(int[] a, int[] b, int total) { ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24797#discussion_r2058079363 From rcastanedalo at openjdk.org Thu Apr 24 10:55:37 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 24 Apr 2025 10:55:37 GMT Subject: RFR: 8354520: IGV: dump contextual information [v4] In-Reply-To: References: Message-ID: <6vhZrRta9aAor4HaEOi2vpDXbJuZcEoJuP5sbjvekyA=.fba4a11e-d889-4537-8ce9-ea63fea359eb@github.com> > This changeset extends the IGV graph dumps with additional properties that ease tracing the dumps back to the context in which they were produced. 
The changeset dumps, for every compilation, the following additional properties: > > - JVM arguments > - platform information > - JVM version information > - date and time > - process ID > - (compiler) thread ID > > ![compilation-properties](https://github.com/user-attachments/assets/8ddc8fb9-c348-4761-8e19-c70633a1b59f) > > Additionally, the changeset produces and dumps the C2 stack trace from which each graph is dumped: > > ![c2-stack-trace](https://github.com/user-attachments/assets/085547ee-b0b3-4a38-86f1-9df79cf1cc01) > > This should be particularly useful in an interactive context, where the user steps through C2 code using a debugger and dumps graphs at different points. To produce a stack trace in this context, the usual debugger-entry C2 functions (`igv_print`, `igv_append`, `Node::dump_bfs`, ...) are extended with extra arguments to specify the stack handling registers (stack pointer, frame pointer, and program counter): > > ![c2-stack-trace-from-gdb](https://github.com/user-attachments/assets/29de2964-ee2d-4f5f-bcf7-d81e1bc6c8a6) > > The inconvenience of manually specifying the stack handling registers can be addressed by hiding them in debugger user-defined commands, e.g.: > > > define igv > p igv_print(true, $sp, $fp, $pc) > end > > define igv_node > p find_node($arg0)->dump_bfs(0, 0, "!", $sp, $fp, $pc) > end > > > Thanks to @TobiHartmann for providing useful feedback! > > #### Testing > > - tier1 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64; release and debug mode). > - Tested interactive usage manually via `gdb` and `rr` on linux-x64. > - Tested automatically that dumping thousands of graphs does not trigger any assertion failure. 
Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: Add relative link to compile.cpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24724/files - new: https://git.openjdk.org/jdk/pull/24724/files/86fdc673..8193767d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24724&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24724&range=02-03 Stats: 6 lines in 1 file changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/24724.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24724/head:pull/24724 PR: https://git.openjdk.org/jdk/pull/24724 From rcastanedalo at openjdk.org Thu Apr 24 10:57:41 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 24 Apr 2025 10:57:41 GMT Subject: RFR: 8354520: IGV: dump contextual information [v4] In-Reply-To: References: <81rrgzCaRmmpzDY4KCVmlcKS8-a4x3jtxbrsuFtKJNo=.ddaf541a-e9a5-4bee-8712-38fde6d3d886@github.com> Message-ID: On Thu, 24 Apr 2025 09:10:45 GMT, Roberto Castañeda Lozano wrote: >> I often find them helpful to find the files a readme is referring to, so I tend to add them. But it's up to you whether you like them as well :slightly_smiling_face: > > Fair enough, so the relative link will resolve correctly and point to the actual `compile.cpp` file on GitHub? I added the link now (commit 8193767d). Turns out GitHub would treat it, by default, as relative to the directory where the `README.md` file is placed, so I had to prefix it with `/` to make it relative to the root of the repo.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24724#discussion_r2058109735 From mli at openjdk.org Thu Apr 24 11:15:26 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 24 Apr 2025 11:15:26 GMT Subject: RFR: 8355293: [TEST] RISC-V: enable more ir tests [v3] In-Reply-To: <35acCSxvX_QmszIDigxcJ2zN3RjIGCT2Fcb9enkM-Xk=.793165e0-c4a0-45e6-aa00-7b721d25ff57@github.com> References: <35acCSxvX_QmszIDigxcJ2zN3RjIGCT2Fcb9enkM-Xk=.793165e0-c4a0-45e6-aa00-7b721d25ff57@github.com> Message-ID: <9gjT1atmhJpOcH78nea-xe4vrLhmLAlwXT2q-GjWbg0=.a414048d-30a0-4a2b-a0dc-abdbe6b93d97@github.com> > Hi, > Can you help to review this simple patch? > It just enables some test to run on riscv. > > Thanks Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: typo; clean ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24797/files - new: https://git.openjdk.org/jdk/pull/24797/files/eac56a87..9f666661 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24797&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24797&range=01-02 Stats: 9 lines in 3 files changed: 0 ins; 1 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/24797.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24797/head:pull/24797 PR: https://git.openjdk.org/jdk/pull/24797 From mli at openjdk.org Thu Apr 24 11:15:26 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 24 Apr 2025 11:15:26 GMT Subject: RFR: 8355293: [TEST] RISC-V: enable more ir tests [v2] In-Reply-To: References: <35acCSxvX_QmszIDigxcJ2zN3RjIGCT2Fcb9enkM-Xk=.793165e0-c4a0-45e6-aa00-7b721d25ff57@github.com> <5PvVtYOh3PZVyX7cN9GR3jeVJ-9LKZ7fHefIeKQVjlc=.926a830b-f312-4a67-9684-84546afad26b@github.com> <1GAnt3Sk1bXadYA_3Q0dVLV8e8ddTZULFgSVWZYRZko=.1db15ac4-ae6e-409d-96c2-f11c44f3297d@github.com> <5HXiamN-I6OSifZO1-AHOArPIacgB3mYTjoflaM9hD4=.adc0f714-7315-4a9a-a2a2-44f52d94bf9f@github.com> Message-ID: On Thu, 24 Apr 2025 10:39:30 GMT, Fei Yang wrote: >> Thanks 
for catching! >> ~~To confirm, do you mean the line 111 in this file? I'll fix it.~~ >> ~~Or you suggest some fix in ProdRed_Int.java too?~~ >> I see what you meant, I'll fix it. > > I see two occurrences in file ProdRed_Double.java: Both L94 and L111. > Regarding ProdRed_Int.java, seems that we can do a similar cleanup like you do in this PR for other tests. > I mean: > > diff --git a/test/hotspot/jtreg/compiler/loopopts/superword/ProdRed_Int.java b/test/hotspot/jtreg/compiler/loopopts/superword/ProdRed_Int.java > index ebc8251e025..9ca670117cf 100644 > --- a/test/hotspot/jtreg/compiler/loopopts/superword/ProdRed_Int.java > +++ b/test/hotspot/jtreg/compiler/loopopts/superword/ProdRed_Int.java > @@ -85,8 +85,7 @@ public static void prodReductionInit(int[] a, int[] b) { > @IR(applyIfCPUFeature = {"sse4.1", "true"}, > applyIfAnd = {"SuperWordReductions", "true", "LoopMaxUnroll", ">= 8"}, > counts = {IRNode.MUL_REDUCTION_VI, ">= 1", IRNode.MUL_REDUCTION_VI, "<= 2"}) // one for main-loop, one for vector-post-loop > - @IR(applyIfPlatform = {"riscv64", "true"}, > - applyIfCPUFeature = {"rvv", "true"}, > + @IR(applyIfCPUFeature = {"rvv", "true"}, > applyIfAnd = {"SuperWordReductions", "true", "LoopMaxUnroll", ">= 8"}, > counts = {IRNode.MUL_REDUCTION_VI, ">= 1", IRNode.MUL_REDUCTION_VI, "<= 2"}) // one for main-loop, one for vector-post-loop > public static int prodReductionImplement(int[] a, int[] b, int total) { Thanks! Fixed. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24797#discussion_r2058138872 From fyang at openjdk.org Thu Apr 24 11:20:01 2025 From: fyang at openjdk.org (Fei Yang) Date: Thu, 24 Apr 2025 11:20:01 GMT Subject: RFR: 8355293: [TEST] RISC-V: enable more ir tests [v3] In-Reply-To: <9gjT1atmhJpOcH78nea-xe4vrLhmLAlwXT2q-GjWbg0=.a414048d-30a0-4a2b-a0dc-abdbe6b93d97@github.com> References: <35acCSxvX_QmszIDigxcJ2zN3RjIGCT2Fcb9enkM-Xk=.793165e0-c4a0-45e6-aa00-7b721d25ff57@github.com> <9gjT1atmhJpOcH78nea-xe4vrLhmLAlwXT2q-GjWbg0=.a414048d-30a0-4a2b-a0dc-abdbe6b93d97@github.com> Message-ID: On Thu, 24 Apr 2025 11:15:26 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this simple patch? >> It just enables some test to run on riscv. >> >> Thanks > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > typo; clean Updated change LGTM. Thanks! ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24797#pullrequestreview-2790716756 From mli at openjdk.org Thu Apr 24 11:36:53 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 24 Apr 2025 11:36:53 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v13] In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 23:54:01 GMT, Vladimir Ivanov wrote: >> Migrate Vector API math library (SVML and SLEEF) linkage from native code (in JVM) to Java FFM API. >> >> Since FFM API doesn't support vector calling conventions yet, migration affects only symbol lookup for now. But it still enables significant simplifications on JVM side. 
>> >> The patch consists of the following parts: >> * on-demand symbol lookup in Java code replaces eager lookup from native code during JVM startup; >> * 2 new VM intrinsics for vector calls (support unary and binary shapes) (code separated from unary/binary vector operations); >> * new internal interface to query supported CPU ISA extensions (`jdk.incubator.vector.CPUFeatures`) used for CPU dispatching. >> >> `java.lang.foreign` API is used to perform symbol lookup in vector math library, then the address is cached and fed into corresponding JVM intrinsic, so C2 can turn it into a direct vector call in generated code. >> >> Once `java.lang.foreign` supports vectors & vector calling conventions, VM intrinsics can go away. >> >> Performance is on par with original implementation (tested with microbenchmarks on linux-x64 and macosx-aarch64). >> >> Testing: hs-tier1 - hs-tier6, microbenchmarks (on linux-x64 and macosx-aarch64) >> >> Thanks! > > Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: > > CPUFeatures: RISC-V support Marked as reviewed by mli (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24462#pullrequestreview-2790760838 From mli at openjdk.org Thu Apr 24 11:36:54 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 24 Apr 2025 11:36:54 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v12] In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 23:55:50 GMT, Vladimir Ivanov wrote: >> Does the following check catch `UseRVV == false` case on RISC-V? >> >> public boolean isSupported(Operator op, VectorSpecies vspecies) { >> ... >> int maxLaneCount = VectorSupport.getMaxLaneCount(vspecies.elementType()); >> if (vspecies.length() > maxLaneCount) { >> return false; // lacking vector support >> } >> ... 
> > FTR both `VectorSupport.getMaxLaneCount()` and `CPUFeatures` don't rely on raw list of ISA extensions CPU supports, but only those reported by the JVM. So, if some feature support is disabled on JVM side, it won't be reported by `VM_Version` and, hence, `CPUFeatures`. Thank you for updating! Looks good for riscv. I have run some basic tests for the vector API, and they passed. I did not run benchmarks, as riscv & aarch64 share the same way to bridge from java to sleef. > Does the following check catch UseRVV == false case on RISC-V? Yes. If you don't mind, an explicit comment might be helpful. To me, "lacking vector support" here means the vector length is not large enough, but it's quite subjective, so it's your call. > FTR both VectorSupport.getMaxLaneCount() and CPUFeatures don't rely on raw list of ISA extensions CPU supports, but only those reported by the JVM. So, if some feature support is disabled on JVM side, it won't be reported by VM_Version and, hence, CPUFeatures. I'm fine with this. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2058172975 From fyang at openjdk.org Thu Apr 24 11:44:50 2025 From: fyang at openjdk.org (Fei Yang) Date: Thu, 24 Apr 2025 11:44:50 GMT Subject: RFR: 8355074: RISC-V: C2: Support Vector-Scalar version of Zvbb Vector And-Not instruction In-Reply-To: References: Message-ID: On Thu, 17 Apr 2025 03:01:42 GMT, Anjian-Wen wrote: > support zvbb vand-not vector-scalar version, which Op1 is the sign-extended or truncated value in scalar register rs1 > add C2 match rule > add related Tests in IRNode structure > > passed jtreg test test/hotspot/jtreg/compiler/vectorapi/* test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 2129: > 2127: } > 2128: > 2129: public static final String VAND_NOTI_VX = PREFIX + "VAND_NOTI_VX" + POSTFIX; Since this is a shared file and these four instructions are RISCV-specific, seems better to add a `RISCV_` prefix to make that explicit.
Consider names like `RISCV_VAND_NOTI_VX`, `RISCV_VAND_NOTL_VX`, `RISCV_VAND_NOTI_VX_MASKED` and `RISCV_VAND_NOTL_VX_MASKED`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24709#discussion_r2058181155 From duke at openjdk.org Thu Apr 24 12:09:19 2025 From: duke at openjdk.org (kuaiwei) Date: Thu, 24 Apr 2025 12:09:19 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v14] In-Reply-To: References: Message-ID: > In this patch, I extend the merge stores optimization to merge adjacent loads. Tier1 tests pass on my machine. > > The benchmark result of MergeLoadBench.java > AMD EPYC 9T24 96-Core Processor: > > |name | -MergeLoads | +MergeLoads |delta| > |---|---|---|---| > |MergeLoadBench.getCharB |4352.150 |4407.435 | 55.29 | > |MergeLoadBench.getCharBU |4075.320 |4084.663 | 9.34 | > |MergeLoadBench.getCharBV |3221.302 |3221.528 | 0.23 | > |MergeLoadBench.getCharC |2235.433 |2238.796 | 3.36 | > |MergeLoadBench.getCharL |4363.244 |4372.281 | 9.04 | > |MergeLoadBench.getCharLU |4072.550 |4075.744 | 3.19 | > |MergeLoadBench.getCharLV |2227.825 |2231.612 | 3.79 | > |MergeLoadBench.getIntB |11199.935 |6869.030 | -4330.90 | > |MergeLoadBench.getIntBU |6853.862 |2763.923 | -4089.94 | > |MergeLoadBench.getIntBV |306.953 |309.911 | 2.96 | > |MergeLoadBench.getIntL |10426.843 |6523.716 | -3903.13 | > |MergeLoadBench.getIntLU |6740.847 |2602.701 | -4138.15 | > |MergeLoadBench.getIntLV |2233.151 |2231.745 | -1.41 | > |MergeLoadBench.getIntRB |11335.756 |8980.619 | -2355.14 | > |MergeLoadBench.getIntRBU |7439.873 |3190.208 | -4249.66 | > |MergeLoadBench.getIntRL |16323.040 |7786.842 | -8536.20 | > |MergeLoadBench.getIntRLU |7457.745 |3364.140 | -4093.61 | > |MergeLoadBench.getIntRU |2512.621 |2511.668 | -0.95 | > |MergeLoadBench.getIntU |2501.064 |2500.629 | -0.43 | > |MergeLoadBench.getLongB |21175.442 |21103.660 | -71.78 | > |MergeLoadBench.getLongBU |14042.046 |2512.784 | -11529.26 | >
|MergeLoadBench.getLongBV |606.448 |606.171 | -0.28 | > |MergeLoadBench.getLongL |23142.178 |23217.785 | 75.61 | > |MergeLoadBench.getLongLU |14112.972 |2237.659 | -11875.31 | > |MergeLoadBench.getLongLV |2230.416 |2231.224 | 0.81 | > |MergeLoadBench.getLongRB |21152.558 |21140.583 | -11.98 | > |MergeLoadBench.getLongRBU |14031.178 |2520.317 | -11510.86 | > |MergeLoadBench.getLongRL |23248.506 |23136.410 | -112.10 | > |MergeLoadBench.getLongRLU |14125.032 |2240.481 | -11884.55 | > |MergeLoadBench.getLongRU |3071.881 |3066.606 | -5.27 | > |Merg... kuaiwei has updated the pull request incrementally with two additional commits since the last revision: - Add check flag for combine operator - Make MergeLoadInfoList an in-place growable array ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24023/files - new: https://git.openjdk.org/jdk/pull/24023/files/a35d96e5..333b57bb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24023&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24023&range=12-13 Stats: 219 lines in 3 files changed: 141 ins; 14 del; 64 mod Patch: https://git.openjdk.org/jdk/pull/24023.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24023/head:pull/24023 PR: https://git.openjdk.org/jdk/pull/24023 From duke at openjdk.org Thu Apr 24 12:49:17 2025 From: duke at openjdk.org (Anjian-Wen) Date: Thu, 24 Apr 2025 12:49:17 GMT Subject: RFR: 8355074: RISC-V: C2: Support Vector-Scalar version of Zvbb Vector And-Not instruction [v2] In-Reply-To: References: Message-ID: > support zvbb vand-not vector-scalar version, which Op1 is the sign-extended or truncated value in scalar register rs1 > add C2 match rule > add related Tests in IRNode structure > > passed jtreg test test/hotspot/jtreg/compiler/vectorapi/* Anjian-Wen has updated the pull request incrementally with one additional commit since the last revision: add prefix for test String ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24709/files - new: 
https://git.openjdk.org/jdk/pull/24709/files/7b507b8d..dbdf9b87 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24709&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24709&range=00-01 Stats: 12 lines in 2 files changed: 0 ins; 0 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/24709.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24709/head:pull/24709 PR: https://git.openjdk.org/jdk/pull/24709 From mchevalier at openjdk.org Thu Apr 24 13:24:15 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 24 Apr 2025 13:24:15 GMT Subject: RFR: 8355492: MissedOptCastII is missing UnlockDiagnosticVMOptions flag Message-ID: As the title says, let's just add the flag! Sorry, my bad. It now works on my machine. Thanks, Marc ------------- Commit messages: - Add UnlockDiagnosticVMOptions to MissedOptCastII Changes: https://git.openjdk.org/jdk/pull/24849/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24849&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8355492 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24849.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24849/head:pull/24849 PR: https://git.openjdk.org/jdk/pull/24849 From rcastanedalo at openjdk.org Thu Apr 24 13:31:56 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 24 Apr 2025 13:31:56 GMT Subject: RFR: 8355492: MissedOptCastII is missing UnlockDiagnosticVMOptions flag In-Reply-To: References: Message-ID: On Thu, 24 Apr 2025 13:07:10 GMT, Marc Chevalier wrote: > As the title says, let's just add the flag! Sorry, my bad. > > It now works on my machine. > > Thanks, > Marc Looks good, and trivial. ------------- Marked as reviewed by rcastanedalo (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24849#pullrequestreview-2791208842 From chagedorn at openjdk.org Thu Apr 24 13:38:55 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 24 Apr 2025 13:38:55 GMT Subject: RFR: 8355492: MissedOptCastII is missing UnlockDiagnosticVMOptions flag In-Reply-To: References: Message-ID: On Thu, 24 Apr 2025 13:07:10 GMT, Marc Chevalier wrote: > As the title says, let's just add the flag! Sorry, my bad. > > It now works on my machine. > > Thanks, > Marc Looks good and trivial. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24849#pullrequestreview-2791228608 From mchevalier at openjdk.org Thu Apr 24 13:38:55 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 24 Apr 2025 13:38:55 GMT Subject: RFR: 8355492: MissedOptCastII is missing UnlockDiagnosticVMOptions flag In-Reply-To: References: Message-ID: On Thu, 24 Apr 2025 13:07:10 GMT, Marc Chevalier wrote: > As the title says, let's just add the flag! Sorry, my bad. > > It now works on my machine. > > Thanks, > Marc Thanks @robcasloz! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24849#issuecomment-2827653657 From mchevalier at openjdk.org Thu Apr 24 13:38:56 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 24 Apr 2025 13:38:56 GMT Subject: RFR: 8355492: MissedOptCastII is missing UnlockDiagnosticVMOptions flag In-Reply-To: References: Message-ID: <6pkGEhwnOEaF1-RQeOiNXAAVh0xeQHCnWL5K2nRbJ6k=.9ef0aeda-a20f-4f1a-8dc7-f8bcb7a2158b@github.com> On Thu, 24 Apr 2025 13:34:29 GMT, Christian Hagedorn wrote: >> As the title says, let's just add the flag! Sorry, my bad. >> >> It now works on my machine. >> >> Thanks, >> Marc > > Looks good and trivial. Oh, and thanks @chhagedorn as well! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24849#issuecomment-2827657834 From duke at openjdk.org Thu Apr 24 13:38:56 2025 From: duke at openjdk.org (duke) Date: Thu, 24 Apr 2025 13:38:56 GMT Subject: RFR: 8355492: MissedOptCastII is missing UnlockDiagnosticVMOptions flag In-Reply-To: References: Message-ID: On Thu, 24 Apr 2025 13:07:10 GMT, Marc Chevalier wrote: > As the title says, let's just add the flag! Sorry, my bad. > > It now works on my machine. > > Thanks, > Marc @marc-chevalier Your change (at version 3fb2bb7d559b6f6da42433fc55368defc3db7222) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24849#issuecomment-2827660645 From mchevalier at openjdk.org Thu Apr 24 13:42:51 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 24 Apr 2025 13:42:51 GMT Subject: Integrated: 8355492: MissedOptCastII is missing UnlockDiagnosticVMOptions flag In-Reply-To: References: Message-ID: On Thu, 24 Apr 2025 13:07:10 GMT, Marc Chevalier wrote: > As the title says, let's just add the flag! Sorry, my bad. > > It now works on my machine. > > Thanks, > Marc This pull request has now been integrated. 
Changeset: 0537c692 Author: Marc Chevalier Committer: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/0537c6927d4f617624672cfae06928f9738175ca Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod 8355492: MissedOptCastII is missing UnlockDiagnosticVMOptions flag Reviewed-by: rcastanedalo, chagedorn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/24849 From thartmann at openjdk.org Thu Apr 24 13:42:50 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 24 Apr 2025 13:42:50 GMT Subject: RFR: 8355492: MissedOptCastII is missing UnlockDiagnosticVMOptions flag In-Reply-To: References: Message-ID: <7ly59K99NkkK14BaJiiXkzDl3JbfZHgjglfj4Onr03M=.2e28754d-6b98-4e96-bb12-224cfd6be1cd@github.com> On Thu, 24 Apr 2025 13:07:10 GMT, Marc Chevalier wrote: > As the title says, let's just add the flag! Sorry, my bad. > > It now works on my machine. > > Thanks, > Marc ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24849#pullrequestreview-2791249674 From fjiang at openjdk.org Thu Apr 24 14:27:50 2025 From: fjiang at openjdk.org (Feilong Jiang) Date: Thu, 24 Apr 2025 14:27:50 GMT Subject: RFR: 8355074: RISC-V: C2: Support Vector-Scalar version of Zvbb Vector And-Not instruction [v2] In-Reply-To: References: Message-ID: On Thu, 24 Apr 2025 12:49:17 GMT, Anjian-Wen wrote: >> support zvbb vand-not vector-scalar version, which Op1 is the sign-extended or truncated value in scalar register rs1 >> add C2 match rule >> add related Tests in IRNode structure >> >> passed jtreg test test/hotspot/jtreg/compiler/vectorapi/* > > Anjian-Wen has updated the pull request incrementally with one additional commit since the last revision: > > add prefix for test String Thanks! Overall looks good, with one minor suggestion. 
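For readers unfamiliar with the vector-scalar And-Not form mentioned in the quoted summary, here is a scalar sketch. It assumes the semantics are `vd[i] = vs2[i] & ~op1`, where `op1` is the scalar register value adjusted to the lane width (truncated for narrower lanes, sign-extended for wider ones). That operand order is an assumption of this sketch and should be checked against the Zvbb specification. The class and method names are hypothetical and do not come from the PR.

```java
public class AndNotVxSketch {
    // 32-bit lanes with a 64-bit scalar: the scalar is truncated to the lane width.
    static int[] andNotInt(int[] vs2, long rs1) {
        int op1 = (int) rs1; // truncation
        int[] vd = new int[vs2.length];
        for (int i = 0; i < vs2.length; i++) {
            vd[i] = vs2[i] & ~op1;
        }
        return vd;
    }

    // 64-bit lanes with a 32-bit scalar value: the scalar is sign-extended.
    static long[] andNotLong(long[] vs2, int rs1) {
        long op1 = rs1; // sign extension
        long[] vd = new long[vs2.length];
        for (int i = 0; i < vs2.length; i++) {
            vd[i] = vs2[i] & ~op1;
        }
        return vd;
    }

    public static void main(String[] args) {
        int[] ints = andNotInt(new int[] {0xFF, 0x0F}, 0x10000000FL);
        long[] longs = andNotLong(new long[] {-1L}, -16);
        System.out.println(Integer.toHexString(ints[0]) + " " + Integer.toHexString(ints[1]));
        System.out.println(Long.toHexString(longs[0]));
    }
}
```

The point of a vector-scalar rule is that the scalar operand does not have to be broadcast into a vector register first.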
src/hotspot/cpu/riscv/riscv_v.ad line 416: > 414: // vector-immediate add (unpredicated) > 415: > 416: instruct vaddI_vi(vReg dst, vReg src1, immI5 con) %{ Perhaps we can do these naming refactorings in a separate PR first. This will look cleaner. ------------- Marked as reviewed by fjiang (Committer). PR Review: https://git.openjdk.org/jdk/pull/24709#pullrequestreview-2791413687 PR Review Comment: https://git.openjdk.org/jdk/pull/24709#discussion_r2058574940 From duke at openjdk.org Thu Apr 24 14:27:50 2025 From: duke at openjdk.org (Yuri Gaevsky) Date: Thu, 24 Apr 2025 14:27:50 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v3] In-Reply-To: References: Message-ID: > The patch adds possibility to use RVV instructions for faster vectorizedHashCode calculations on RVV v1.0.0 capable hardware. > > Testing: hotspot/jtreg/compiler/ under QEMU-8.1 with RVV v1.0.0. Yuri Gaevsky has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: - Merge master - num_8b_elems_in_vec --> nof_vec_elems - Removed checks for (MaxVectorSize >= 16) per @RealFYang suggestion. - 8322174: RISC-V: C2 VectorizedHashCode RVV Version ------------- Changes: https://git.openjdk.org/jdk/pull/17413/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17413&range=02 Stats: 479 lines in 6 files changed: 477 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/17413.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17413/head:pull/17413 PR: https://git.openjdk.org/jdk/pull/17413 From shade at openjdk.org Thu Apr 24 14:50:57 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 24 Apr 2025 14:50:57 GMT Subject: RFR: 8355432: Remove CompileTask from SA In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 17:14:17 GMT, Aleksey Shipilev wrote: > A lot of SA infrastructure was added to support SA compiler replay with [JDK-7088955](https://bugs.openjdk.org/browse/JDK-7088955). 
With [JDK-8315488](https://bugs.openjdk.org/browse/JDK-8315488), we got rid of most of it. `CompileTask` seems to be left behind. Nothing uses it in SA now. > > Now, for Leyden, we want to massage `CompileTask` for better performance and reliability ([JDK-8231269](https://bugs.openjdk.org/browse/JDK-8231269)), and keeping `CompileTask` in SA would require us to implement a whole bunch of complicated, but unnecessary code. > > So, it would be good to purge `CompileTask` from SA. > > Note that I left the related `vmStructs` definitions, because async-profiler uses those; I think to see which methods the current compiler is compiling. That use looks safe, as it polls the task from the already set up ciEnv. async-profiler would need to re-adjust after [JDK-8231269](https://bugs.openjdk.org/browse/JDK-8231269) makes relevant changes in `vmStructs`. This PR frees us from doing the same thing in SA. Thanks for the reviews! I think we are good to go. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24832#issuecomment-2827914359 From shade at openjdk.org Thu Apr 24 14:50:58 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 24 Apr 2025 14:50:58 GMT Subject: Integrated: 8355432: Remove CompileTask from SA In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 17:14:17 GMT, Aleksey Shipilev wrote: > A lot of SA infrastructure was added to support SA compiler replay with [JDK-7088955](https://bugs.openjdk.org/browse/JDK-7088955). With [JDK-8315488](https://bugs.openjdk.org/browse/JDK-8315488), we got rid of most of it. `CompileTask` seems to be left behind. Nothing uses it in SA now. > > Now, for Leyden, we want to massage `CompileTask` for better performance and reliability ([JDK-8231269](https://bugs.openjdk.org/browse/JDK-8231269)), and keeping `CompileTask` in SA would require us to implement a whole bunch of complicated, but unnecessary code. > > So, it would be good to purge `CompileTask` from SA.
> > Note that I left the related `vmStructs` definitions, because async-profiler uses those; I think to see which methods current compiler is compiling. That use looks safe, as it polls the task from the already set up ciEnv. async-profiler would need to re-adjust after [JDK-8231269](https://bugs.openjdk.org/browse/JDK-8231269) makes relevant changes in `vmStructs`. This PR frees us from doing the same thing in SA. This pull request has now been integrated. Changeset: 0edd018a Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/0edd018a48c202a6da4afe80e245799b47000885 Stats: 73 lines in 1 file changed: 0 ins; 73 del; 0 mod 8355432: Remove CompileTask from SA Reviewed-by: cjplummer, lmesnik ------------- PR: https://git.openjdk.org/jdk/pull/24832 From qamai at openjdk.org Thu Apr 24 16:12:42 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 24 Apr 2025 16:12:42 GMT Subject: RFR: 8346836: C2: Verify CastII/CastLL bounds at runtime [v11] In-Reply-To: References: Message-ID: > Hi, > > This patch adds a develop flag `VerifyConstraintCasts`, which will verify the correctness of `CastIINode`s and `CastLLNode`s at runtime and crash the VM if the dynamic value lies outside the type value range. > > Please take a look, thanks a lot. 
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: Emanuel's suggestion ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22880/files - new: https://git.openjdk.org/jdk/pull/22880/files/8d140fd9..8238894b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22880&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22880&range=09-10 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22880.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22880/head:pull/22880 PR: https://git.openjdk.org/jdk/pull/22880 From qamai at openjdk.org Thu Apr 24 16:12:46 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 24 Apr 2025 16:12:46 GMT Subject: RFR: 8346836: C2: Verify CastII/CastLL bounds at runtime [v10] In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 18:21:47 GMT, Vladimir Ivanov wrote: >> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 16 additional commits since the last revision: >> >> - Reconstruct FP >> - aarch64 support >> - Merge branch 'master' into verifycast >> - assert CastLL >> - reviews >> - make the flag diagnostic >> - Merge branch 'master' into verifycast >> - draft >> - Merge branch 'master' into verifycast >> - Merge branch 'master' into verifycast >> - ... and 6 more: https://git.openjdk.org/jdk/compare/9195a811...8d140fd9 > > src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2763: > >> 2761: >> 2762: if (lo != min_jint && hi != max_jint) { >> 2763: subsw(rtmp, rval, lo); > > It turns out it's equivalent to `cmpw(rval, lo)` which is clearer IMO. I don't think it is, `cmpw(rval, lo)` is equivalent to `subsw(zr, rval, lo)`. 
However, if `lo` does not fit into an immediate instruction, `MacroAssembler::subsw`, which calls into `wrap_adds_subs_imm_insn`, will use `Rd` as a temporary register to store `lo`, this is invalid if `Rd` is `zr`. Am I understanding it right? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22880#discussion_r2058796364 From qamai at openjdk.org Thu Apr 24 16:12:47 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 24 Apr 2025 16:12:47 GMT Subject: RFR: 8346836: C2: Verify CastII/CastLL bounds at runtime [v10] In-Reply-To: References: Message-ID: On Thu, 24 Apr 2025 06:24:06 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 16 additional commits since the last revision: >> >> - Reconstruct FP >> - aarch64 support >> - Merge branch 'master' into verifycast >> - assert CastLL >> - reviews >> - make the flag diagnostic >> - Merge branch 'master' into verifycast >> - draft >> - Merge branch 'master' into verifycast >> - Merge branch 'master' into verifycast >> - ... and 6 more: https://git.openjdk.org/jdk/compare/9195a811...8d140fd9 > > test/hotspot/jtreg/compiler/c2/TestVerifyConstraintCasts.java line 28: > >> 26: * @bug 8346836 >> 27: * @requires vm.debug == true & vm.flavor == "server" >> 28: * @summary Run with -Xcomp to test -XX:+StressGCM -XX:VerifyConstraintCasts=1 in debug builds. > > Nit: you are also running it with `-XX:VerifyConstraintCasts=1`. I would just keep the summary more generic. > Suggestion: > > * @summary Empty main program to run with flag VerifyConstraintCasts. 
Thanks, I have done so ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22880#discussion_r2058798271 From mli at openjdk.org Thu Apr 24 16:25:49 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 24 Apr 2025 16:25:49 GMT Subject: Integrated: 8355293: [TEST] RISC-V: enable more ir tests In-Reply-To: <35acCSxvX_QmszIDigxcJ2zN3RjIGCT2Fcb9enkM-Xk=.793165e0-c4a0-45e6-aa00-7b721d25ff57@github.com> References: <35acCSxvX_QmszIDigxcJ2zN3RjIGCT2Fcb9enkM-Xk=.793165e0-c4a0-45e6-aa00-7b721d25ff57@github.com> Message-ID: On Tue, 22 Apr 2025 11:54:58 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple patch? > It just enables some test to run on riscv. > > Thanks This pull request has now been integrated. Changeset: 862797f0 Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/862797f0c16ed0459cda4931824b6b17120a2abe Stats: 43 lines in 6 files changed: 14 ins; 7 del; 22 mod 8355293: [TEST] RISC-V: enable more ir tests Reviewed-by: fyang, luhenry ------------- PR: https://git.openjdk.org/jdk/pull/24797 From mli at openjdk.org Thu Apr 24 16:25:48 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 24 Apr 2025 16:25:48 GMT Subject: RFR: 8355293: [TEST] RISC-V: enable more ir tests [v3] In-Reply-To: References: <35acCSxvX_QmszIDigxcJ2zN3RjIGCT2Fcb9enkM-Xk=.793165e0-c4a0-45e6-aa00-7b721d25ff57@github.com> <9gjT1atmhJpOcH78nea-xe4vrLhmLAlwXT2q-GjWbg0=.a414048d-30a0-4a2b-a0dc-abdbe6b93d97@github.com> Message-ID: On Thu, 24 Apr 2025 11:17:21 GMT, Fei Yang wrote: > Updated change LGTM. Thanks! Thank you! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24797#issuecomment-2828194945 From kvn at openjdk.org Thu Apr 24 17:05:43 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 24 Apr 2025 17:05:43 GMT Subject: RFR: 8350209: Preserve adapters in AOT cache [v7] In-Reply-To: <0hoQ2BKezGCepBeBlxwfXUHw82Vk4Oia58NHyyD3KiM=.d6bcdd70-cc42-4dca-adff-0d881e60f979@github.com> References: <0hoQ2BKezGCepBeBlxwfXUHw82Vk4Oia58NHyyD3KiM=.d6bcdd70-cc42-4dca-adff-0d881e60f979@github.com> Message-ID: On Thu, 24 Apr 2025 02:18:27 GMT, Ioi Lam wrote: >> src/hotspot/share/cds/aotCacheAccess.hpp line 38: >> >>> 36: // AOT Cache API for AOT compiler >>> 37: >>> 38: class AOTCacheAccess : AllStatic { >> >> It looks related to `AOTCodeCache`? Maybe `AOTCodeCacheAccess` then? > > This file is called https://github.com/openjdk/leyden/blob/premain/src/hotspot/share/cds/cdsAccess.hpp in the Leyden repo and provides an abstract API for accessing contents of the AOT cache. In Leyden, we have APIs for accessing cached oops: > > > static int get_archived_object_permanent_index(oop obj) NOT_CDS_JAVA_HEAP_RETURN_(-1); > static oop get_archived_object(int permanent_index) NOT_CDS_JAVA_HEAP_RETURN_(nullptr); > > > and various pointer operations > > > static uint delta_from_shared_address_base(address addr); > > template > static void set_pointer(T** ptr, T* value) { > set_pointer((address*)ptr, (address)value); > } > static void set_pointer(address* ptr, address value); > > > Let's keep the AOTCacheAccess name for now and wait until we merge this PR down to the Leyden repo. There's some overlap between AOTCacheAccess and CDSConfig. Maybe we should do a refactor/rename later. Okay, I will keep the name. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24740#discussion_r2058883661 From jbhateja at openjdk.org Thu Apr 24 17:55:59 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 24 Apr 2025 17:55:59 GMT Subject: RFR: 8348638: Performance regression in Math.tanh [v2] In-Reply-To: <4KYVemCsJx4WaROYdA770DaFipRFOWUmlR-iGMkHkVk=.1d8ddb9b-ec6d-4fb5-828b-bc96c07ac756@github.com> References: <4KYVemCsJx4WaROYdA770DaFipRFOWUmlR-iGMkHkVk=.1d8ddb9b-ec6d-4fb5-828b-bc96c07ac756@github.com> Message-ID: On Wed, 2 Apr 2025 16:53:38 GMT, Mohamed Issa wrote: >> Please add a micro benchmark for different value ranges > >> Please add a micro benchmark for different value ranges > > @jatin-bhateja Should I add different value ranges to the existing tanh micro-benchmark or create a brand new micro-benchmark? Hi @missa-prime, please update the PR notes with the latest benchmark results for the following configurations: - With and without opt - With opt -XX:DisableIntrinsic=_dtanh ------------- PR Comment: https://git.openjdk.org/jdk/pull/23889#issuecomment-2828410491 From jbhateja at openjdk.org Thu Apr 24 17:56:01 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 24 Apr 2025 17:56:01 GMT Subject: RFR: 8348638: Performance regression in Math.tanh [v7] In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 16:30:49 GMT, Mohamed Issa wrote: >> The changes described below are meant to resolve the performance regression introduced by the **x86_64 tanh** double precision floating point scalar intrinsic in #20657. Additionally, new input values are provided for the existing micro-benchmark and a new micro-benchmark is included to check the performance of specific input value ranges to help prevent regressions in the future. >> >> 1. Check and handle high magnitude input values before those in other ranges. If found, **+/- 1** is returned almost immediately without having to go through too many computations or branches. >> 2.
Reduce the lower bound of the input range that triggers a quick **+/- 1** return from **|x| >= 32** to **|x| >= 22**. This new endpoint is the exact value required for correctness that's used by the original OpenJDK implementation. >> >> The results of all tests posted below were captured with an [Intel® Xeon 6761P](https://www.intel.com/content/www/us/en/products/sku/241842/intel-xeon-6761p-processor-336m-cache-2-50-ghz/specifications.html) using [OpenJDK v25-b15](https://github.com/openjdk/jdk/releases/tag/jdk-25%2B15) as the baseline version. >> >> For the first set of performance data collected with the new built-in **tanhRange** micro-benchmark, see the table below. Each result is the mean of 8 individual runs, and the input ranges used match those in the bug report with two additional ones included. In the high value scenarios (100, 1000, 10000, 100000), the changes increase throughput values over the baseline. Also, there is a small negative impact to the low value (1, 2, 10, 20) scenarios. >> >> | Input range(s) | Baseline (ops/s) | Change (ops/s) | Change vs baseline (%) | >> | :-------------------: | :----------------: | :----------------: | :------------------------: | >> | [-1, 1] | 22671 | 22190 | -2.12 | >> | [-2, 2] | 22680 | 22191 | -2.16 | >> | [-10, 10] | 22683 | 22149 | -2.35 | >> | [-20, 20] | 22694 | 22183 | -2.25 | >> | [-100, 100] | 29806 | 33675 | +12.98 | >> | [-1000, 1000] | 46747 | 49179 | +5.20 | >> | [-10000, 10000] ...
> > Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision: > > Make regular tanh benchmark inputs constant values test/micro/org/openjdk/bench/java/lang/MathBench.java line 67: > 65: public double double1 = 1.0d, double2 = 2.0d, double81 = 81.0d, doubleNegative12 = -12.0d, double4Dot1 = 4.1d, double0Dot5 = 0.5d; > 66: > 67: public static final double tanhConstInputs[] = {-2.0, -1.0, -0.5, -0.1, 0.0, 0.1, 0.5, 1.0, 2.0}; Static final arrays have mutable elements, so you should declare individual static final fields with different constant values. We can also use the @Stable annotation, but in that case the array index must be a constant. It is better to go with a few individual static final fields with constant values. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23889#discussion_r2058864286 From bulasevich at openjdk.org Thu Apr 24 18:30:11 2025 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Thu, 24 Apr 2025 18:30:11 GMT Subject: RFR: 8354119: Missing C2 proper allocation failure handling during initialization (during generate_uncommon_trap_blob) In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 14:04:35 GMT, Damon Fenacci wrote: > After [JDK-8347406](https://bugs.openjdk.org/browse/JDK-8347406), `OptoRuntime::generate_uncommon_trap_blob` and `OptoRuntime::generate_exception_blob` return an `UncommonTrapBlob`/`ExceptionBlob` if they succeed, `nullptr` if they don't. This is then used by the compiler to shut down gently if the code cache is full (instead of crashing). > Unfortunately if the full code cache is reached when creating the buffer at the start of these 2 methods (when calling `CodeBuffer buffer(name, 2048, 1024);`) an empty buffer is created, which in turn prevents `masm` from being properly initialized, which then causes an access violation when writing into the blob's address when first adding `subptr` later in the method (as seen in the snippet below for `generate_uncommon_trap_blob`).
> > https://github.com/openjdk/jdk/blob/3cc43b3224efdf1a3f35fff58b993027a9e1f4ad/src/hotspot/cpu/x86/runtime_x86_64.cpp#L55-L72 > > To fix this I suggest we return immediately from `OptoRuntime::generate_uncommon_trap_blob`/`OptoRuntime::generate_exception_blob` if the `buffer` creation failed. > > ### Testing > > Tier 1-3. > No specific regression test is added (very hard, i.a. dependent on thread scheduling. On the other hand `StartupOutput.java` might catch it rarely). ARM32 part is good. jtreg hotspot tests: passed. ------------- Marked as reviewed by bulasevich (Committer). PR Review: https://git.openjdk.org/jdk/pull/24549#pullrequestreview-2792139064 From jbhateja at openjdk.org Thu Apr 24 18:43:41 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 24 Apr 2025 18:43:41 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v5] In-Reply-To: References: Message-ID: > - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel® products and will be the vector ISA of choice moving into the future. > - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. Feature set supported by an AVX10 ISA version will be supported by all the versions above it. > - The initial, fully-featured version of Intel® AVX10 will be enumerated as Version 2 (denoted as Intel® AVX10.2). This will include the new ISA extension over the existing AVX512 instructions. > - An early version of Intel® AVX10 (Version 1, or Intel® AVX10.1) that only enumerates the Intel® AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling. > > This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids.
State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers. > > The patch has been regressed through tier1 and jvmci tests > > Please review and share your feedback. > > Best Regards, > Jatin > > [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8352675 - Add dynamic sized feature vectors - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8352675 - dropping unneeded feature enabling/checks - 8352675: Support Intel AVX10 converged vector ISA feature detection ------------- Changes: https://git.openjdk.org/jdk/pull/24329/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24329&range=04 Stats: 508 lines in 26 files changed: 278 ins; 13 del; 217 mod Patch: https://git.openjdk.org/jdk/pull/24329.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24329/head:pull/24329 PR: https://git.openjdk.org/jdk/pull/24329 From jbhateja at openjdk.org Thu Apr 24 18:43:41 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 24 Apr 2025 18:43:41 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v4] In-Reply-To: <3DRTEheyn6n6OYx38sL8-tqQbycO-QIjfwqwlErr5TI=.6cf2043b-f7f8-4a9d-9b59-3a844e74eaf2@github.com> References: <3DRTEheyn6n6OYx38sL8-tqQbycO-QIjfwqwlErr5TI=.6cf2043b-f7f8-4a9d-9b59-3a844e74eaf2@github.com> Message-ID: On Thu, 24 Apr 2025 01:32:19 GMT, Vladimir Ivanov wrote: > It looks much better! Thanks, Jatin. > > I'm curious why don't you represent feature bitmap as a POD (with all the accessors on it) and pass it around by value when needed? (It's size will vary across platforms, but will be fixed at runtime.) It should significantly simplify the implementation. 
> > As an example, take a look at `RegMask` in C2. It accommodates significantly more bits than needed for `VM_Version`. Hi @iwanowww, RegMask is part of opto code, and it may not be accessible to the JVMCI interface, Currently, JVMCI captures the native address of various fields of VM_Struct, which are of interest to Graal. In the proposed solution, we are adding a new dynamically sized feature vector whose each element is 64 bits wide. JVMCI book-keeps the dynamic feature vector and its size, then uses the UNSAFE access API to compute the enabled feature set on the Java side. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24329#issuecomment-2828550031 From jbhateja at openjdk.org Thu Apr 24 18:59:53 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 24 Apr 2025 18:59:53 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v13] In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 23:54:01 GMT, Vladimir Ivanov wrote: >> Migrate Vector API math library (SVML and SLEEF) linkage from native code (in JVM) to Java FFM API. >> >> Since FFM API doesn't support vector calling conventions yet, migration affects only symbol lookup for now. But it still enables significant simplifications on JVM side. >> >> The patch consists of the following parts: >> * on-demand symbol lookup in Java code replaces eager lookup from native code during JVM startup; >> * 2 new VM intrinsics for vector calls (support unary and binary shapes) (code separated from unary/binary vector operations); >> * new internal interface to query supported CPU ISA extensions (`jdk.incubator.vector.CPUFeatures`) used for CPU dispatching. >> >> `java.lang.foreign` API is used to perform symbol lookup in vector math library, then the address is cached and fed into corresponding JVM intrinsic, so C2 can turn it into a direct vector call in generated code. >> >> Once `java.lang.foreign` supports vectors & vector calling conventions, VM intrinsics can go away. 
>> >> Performance is on par with original implementation (tested with microbenchmarks on linux-x64 and macosx-aarch64). >> >> Testing: hs-tier1 - hs-tier6, microbenchmarks (on linux-x64 and macosx-aarch64) >> >> Thanks! > > Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: > > CPUFeatures: RISC-V support src/hotspot/share/opto/vectorIntrinsics.cpp line 563: > 561: debug_name = debug_name_oop->const_oop()->as_instance()->java_lang_String_str(buf, buflen); > 562: } > 563: Node* vcall = make_runtime_call(RC_VECTOR, By generating an upfront CallLeafVectorNode, we may miss out on performing any GVN-style optimization for trigonometric identities like the following. Do you think creating a macro node that can be lazily expanded into a call node during macro expansion would help? arcsin(sin(x)) => x arccos(cos(x)) => x sin(arcsin(x)) => x cos(arccos(x)) => x ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2059057257 From asmehra at openjdk.org Thu Apr 24 20:46:51 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Thu, 24 Apr 2025 20:46:51 GMT Subject: RFR: 8350209: Preserve adapters in AOT cache [v7] In-Reply-To: References: Message-ID: <906foFPYK2MMXfRZN4BbdV6cgV7LwG26qTsp0_IASzw=.29a21e39-fde8-41fb-8322-30b154800261@github.com> On Thu, 24 Apr 2025 01:39:33 GMT, Vladimir Ivanov wrote: >> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix message > > src/hotspot/share/runtime/sharedRuntime.cpp line 2966: > >> 2964: adapter_blob = AdapterHandlerLibrary::link_aot_adapter_handler(this); >> 2965: if (adapter_blob == nullptr) { >> 2966: log_warning(cds)("Failed to link AdapterHandlerEntry (fp=%s) to its code in the AOT code cache", _fingerprint->as_basic_args_string()); > > Doesn't it add noise in the output for not yet seen adapter shapes? It's a warning.
`AdapterHandlerEntry::link` method gets called only for the archived adapters. Not yet seen adapters do not come across this code path. Reason for making it a warning is the assumption that, if the AOT Code Cache is usable, then we should always be able to link the `AdapterHandlerEntry` to its code in the AOT code cache. If we fail to do so, then something is not right in this machinery and likely pointing to a bug. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24740#discussion_r2059204247 From dlong at openjdk.org Thu Apr 24 21:23:44 2025 From: dlong at openjdk.org (Dean Long) Date: Thu, 24 Apr 2025 21:23:44 GMT Subject: RFR: 8338194: ubsan: mulnode.cpp:862:59: runtime error: shift exponent 64 is too large for 64-bit type 'long unsigned int' In-Reply-To: References: Message-ID: <7BL1_Vc2E1_sywDOvpD-AWZLa2x60kynBke_Q-C6vz8=.becc41ce-71fc-4458-8100-6a0880e96cc5@github.com> On Thu, 24 Apr 2025 06:55:54 GMT, Marc Chevalier wrote: > We have a UB when the shift is equal to or bigger than the number of bits in > the type. Our expression is > > (julong)CONST64(1) << (julong)(BitsPerJavaLong - shift) > > so we have a UB when the RHS is `>= 64`, that is when `shift` is `<= 0`. Since shift is masked to > be in `[0, BitsPerJavaLong - 1]`, we actually have a UB when `shift == 0`. The > code doesn't forbid it, indeed, and it doesn't seem to be enforced by more global > invariants. > > This UB doesn't reproduce anymore with the provided cases. > I've replaced the UB with an explicit assert to try to find another failing > case. No hit when run with tier1, tier2, tier3, hs-precheckin-comp and hs-comp-stress. > > Nevertheless, the assert indeed hit on the master from when the issue was filed. > More precisely, I've bisected with the two tests `java/foreign/StdLibTest.java` > and `java/lang/invoke/PermuteArgsTest.java` and the assert hits until > [8339196: Optimize BufWriterImpl#writeU1/U2/Int/Long](https://bugs.openjdk.org/browse/JDK-8339196).
> > It is not clear to me why the issue stopped reproducing after this commit, but given > the lack of reproducer, I went with a semi-blind fix: it fixes the issue back then, > and still removes a chance of UB. It simply makes sure the RHS of this shift cannot be > 64 by making sure `shift` cannot be 0. > > If `shift` is indeed 0, since it is the RHS of a `RShiftLNode`, `RShiftLNode::Identity` > should simply return the LHS of the shift, and entirely eliminate the RShiftLNode. > > The implementation of `AndINode::Ideal` is, on the other hand, safe. Indeed, it uses > `right_n_bits(BitsPerJavaInteger - shift)` instead of doing manually > `(1 << (BitsPerJavaInteger - shift)) - 1`. This macro is safe as it would return `-1` if > the shift is too large rather than causing a UB. Yet, I didn't use this way since it would > cause the replacement of `AndI(X, RShiftI(Y, 0))` by `AndI(X, URShiftI(Y, 0))` before > simplifying the `URShiftI` into `Y`. In between, it also implies that all users of the > And node will be enqueued for IGVN for a not-very-interesting change. Simply skipping > the replacement of RShiftL into URShiftL allows us to arrive directly at `AndL(X, Y)` without > useless steps. > > Thanks, > Marc I agree -- this seems like the best fix. ------------- Marked as reviewed by dlong (Reviewer).
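For readers following the thread, the undefined behaviour described in the PR text above can be sketched in isolation. This is a minimal standalone C++ sketch, not the HotSpot code: `low_bits_mask` and `kBitsPerLong` are hypothetical names introduced here. It follows the `right_n_bits`-style convention mentioned in the description (all ones when the width covers the whole type), whereas the PR itself avoids the problem by skipping the transformation when `shift` is 0.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for BitsPerJavaLong; not the HotSpot constant.
constexpr unsigned kBitsPerLong = 64;

// Returns a mask with the low (kBitsPerLong - shift) bits set.
// Writing (1ULL << (kBitsPerLong - shift)) - 1 directly would be undefined
// behaviour for shift == 0, because the shift count would then be 64.
inline uint64_t low_bits_mask(unsigned shift) {
  assert(shift < kBitsPerLong);         // precondition: shift is in [0, 63]
  const unsigned n = kBitsPerLong - shift;  // n is in [1, 64]
  if (n >= kBitsPerLong) {
    return ~uint64_t{0};                // all ones, without shifting by 64
  }
  return (uint64_t{1} << n) - 1;
}
```

With this shape, `low_bits_mask(0)` yields all ones instead of invoking a shift by the full width of the type.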
PR Review: https://git.openjdk.org/jdk/pull/24841#pullrequestreview-2792549527 From dlong at openjdk.org Thu Apr 24 21:38:49 2025 From: dlong at openjdk.org (Dean Long) Date: Thu, 24 Apr 2025 21:38:49 GMT Subject: RFR: 8352422: [ubsan] Out-of-range reported in ciMethod.cpp:917:20: runtime error: 2.68435e+09 is outside the range of representable values of type 'int' In-Reply-To: References: Message-ID: <43xT9PWVHBKJVQ0fgZ1-kJIlkahhZgS9bxUiQL8z2xw=.3fca6b8f-40fd-41cb-84cf-0661303b25b1@github.com> On Wed, 23 Apr 2025 10:58:54 GMT, Marc Chevalier wrote: > The double `(double)count * prof_factor * method_life / counter_life + 0.5` > can overflow a 32-bit int, causing UB on casting, but in practice computing > a wrong scale, probably. > > We just need to compare that the cast is not going to overflow. This is possible > because `INT_MAX` is exactly representable in a `double`. It is also good to > notice that the expression `(double)count * prof_factor * method_life / counter_life + 0.5` > cannot overflow a `double`: > - `count` is an int, max value = 2^31-1 < 2.2e9 > - `method_life` is an int, max value < 2.2e9 > - `prof_factor` is a float, max value < 3.5e38 > - `counter_life` is an int, positive at this point, so min value = 1 > So, the whole expression is bounded by 16.94e56 + 0.5, which is much smaller than the > max value of a double (about 1.8e308). We probably would have precision issues, but > it probably doesn't matter a lot. > > The semantics I picked here is basically `min(INT_MAX, count_d)`, so it'd always fit. > > Thanks, > Marc Marked as reviewed by dlong (Reviewer).
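The `min(INT_MAX, count_d)` semantics described in the PR text above can be sketched as a small saturating conversion. This is a hedged illustration under stated assumptions, not the code in `ciMethod.cpp`: the helper name `saturating_to_int` is hypothetical, and it assumes a non-negative input, as is the case for the scaled count in the description.

```cpp
#include <cassert>
#include <climits>

// Hypothetical helper, not the HotSpot implementation.
// Converts a non-negative double to int, saturating at INT_MAX instead of
// performing an out-of-range (undefined) double-to-int conversion.
inline int saturating_to_int(double value) {
  assert(value >= 0.0);  // precondition; also rejects NaN
  // INT_MAX (2^31 - 1) is exactly representable in a double,
  // so this comparison involves no rounding of the bound.
  if (value >= static_cast<double>(INT_MAX)) {
    return INT_MAX;
  }
  return static_cast<int>(value);  // in range, so truncation is well-defined
}
```

Applied to the value from the ubsan report in the subject line, `saturating_to_int(2.68435e+09)` returns `INT_MAX` rather than triggering the out-of-range cast.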
------------- PR Review: https://git.openjdk.org/jdk/pull/24824#pullrequestreview-2792574919 From vlivanov at openjdk.org Thu Apr 24 22:17:51 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 24 Apr 2025 22:17:51 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v4] In-Reply-To: References: <3DRTEheyn6n6OYx38sL8-tqQbycO-QIjfwqwlErr5TI=.6cf2043b-f7f8-4a9d-9b59-3a844e74eaf2@github.com> Message-ID: On Thu, 24 Apr 2025 18:39:04 GMT, Jatin Bhateja wrote: > RegMask is part of opto code, and it may not be accessible to the JVMCI interface I'm not suggesting to reuse RegMask, but introduce a separate class (e.g., VMFeatures) and embed its instances into Abstract_VM_Version (as `VMFeatures _features` and `VMFeatures _cpu_features`). You can keep all the accessors and bit manipulation logic on `VMFeatures` class. JVMCI can still operate on in-memory representation at `Abstract_VM_Version::_features`. But it now needs to query its size (which becomes platform-specific constant). (BTW all CPU feature constants in `AMD64HotSpotVMConfig` change their meaning. I don't see any usages in JDK code. Should they go away now?) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24329#issuecomment-2828981702 From dhanalla at openjdk.org Thu Apr 24 22:20:58 2025 From: dhanalla at openjdk.org (Dhamoder Nalla) Date: Thu, 24 Apr 2025 22:20:58 GMT Subject: RFR: 8341293: Split field loads through Nested Phis [v7] In-Reply-To: References: <18TQt6vxN9KxSVwyeQtAWde-ezaVuUEioAl_5_3sAeE=.e5e76fb6-04a7-4f6f-9377-f1e64837ada6@github.com> <-Fs-Nim4P8TQMnjE9bs2HBY34vQtzhzH2dsU7MDlZrI=.34991658-bed4-46ac-b213-f4988c0f9c8b@github.com> Message-ID: On Fri, 7 Feb 2025 16:36:23 GMT, Emanuel Peter wrote: >>> @dhanalla Would you like this to be reviewed? We generally don't re-review until we get pinged again. The idea is that you are maybe still working on it, and so there is no point in reviewing half-processed code. 
So once you are happy, you can let us know ;) >> Thanks, @eme64 for checking with me. Yes, it's ready for review. > > @dhanalla Testing failed for this test: > `compiler/c2/irTests/scalarReplacement/AllocationMergesNestedPhiTests.java` > With flags: > - `-server -Xcomp` > - `-ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation` > - `-XX:-TieredCompilation -XX:+AlwaysIncrementalInline` > - `-XX:-TieredCompilation -XX:+StressReflectiveCode -XX:-ReduceInitialCardMarks -XX:-ReduceBulkZeroing -XX:-ReduceFieldZeroing` > > We also have an internal test that is failing with the same assert: > > `# assert(jobj != nullptr && jobj != phantom_obj) failed: escaped allocation` @eme64, could you please take a look at this PR? Let me know if there's anything else I should address. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21270#issuecomment-2828985852 From vlivanov at openjdk.org Thu Apr 24 23:06:50 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 24 Apr 2025 23:06:50 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v13] In-Reply-To: References: Message-ID: On Thu, 24 Apr 2025 18:57:11 GMT, Jatin Bhateja wrote: >> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: >> >> CPUFeatures: RISC-V support > > src/hotspot/share/opto/vectorIntrinsics.cpp line 563: > >> 561: debug_name = debug_name_oop->const_oop()->as_instance()->java_lang_String_str(buf, buflen); >> 562: } >> 563: Node* vcall = make_runtime_call(RC_VECTOR, > > By generating an upfront CallLeafVectorNode, we may miss out on performing any GVN-style optimization for trigonometric identities like the following. do you think creating a macro node which can lazily be expanded to call node during macro expansion will help. 
> > arcsin(sin(x)) => x > arccos(cos(x)) => x > sin(arcsin(x)) => x > cos(arccos(x)) => x It does look attractive, but a macro expansion-based solution requires the JVM to internalize such operations and their properties. IMO a higher-level solution based on more generic JVM primitives would enable libraries to properly annotate their operations in Java bytecodes/class files, so C2 can perform this type of transformation without the need to intrinsify each individual operation first. (Think of [JDK-8218414](https://bugs.openjdk.org/browse/JDK-8218414) / [JDK-8347901](https://bugs.openjdk.org/browse/JDK-8347901) on steroids.) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2059356271 From vlivanov at openjdk.org Thu Apr 24 23:16:47 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 24 Apr 2025 23:16:47 GMT Subject: RFR: 8346836: C2: Verify CastII/CastLL bounds at runtime [v10] In-Reply-To: References: Message-ID: On Thu, 24 Apr 2025 16:07:48 GMT, Quan Anh Mai wrote: >> src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2763: >> >>> 2761: >>> 2762: if (lo != min_jint && hi != max_jint) { >>> 2763: subsw(rtmp, rval, lo); >> >> It turns out it's equivalent to `cmpw(rval, lo)` which is clearer IMO. > > I don't think it is, `cmpw(rval, lo)` is equivalent to `subsw(zr, rval, lo)`. However, if `lo` does not fit into an immediate instruction, `MacroAssembler::subsw`, which calls into `wrap_adds_subs_imm_insn`, will use `Rd` as a temporary register to store `lo`, this is invalid if `Rd` is `zr`. Am I understanding it right? Yes, you are right. Completely forgot that there's only 12 bits available for the immediate (`Assembler::operand_valid_for_add_sub_immediate()`).
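A minimal sketch of the encoding constraint mentioned above, assuming the usual AArch64 rule that ADD/SUB (immediate) takes a 12-bit unsigned value, optionally shifted left by 12 bits. The name `fits_add_sub_immediate` is hypothetical; this is not HotSpot's `Assembler::operand_valid_for_add_sub_immediate()`, which may treat its argument differently (for example, negative values).

```cpp
#include <cstdint>

// Hypothetical stand-alone check, not HotSpot's implementation.
// Returns true if imm can be encoded directly in an AArch64
// ADD/SUB (immediate) instruction: either a plain 12-bit value,
// or a 12-bit value shifted left by 12.
static bool fits_add_sub_immediate(uint64_t imm) {
  const uint64_t low12  = 0xfffULL;        // bits 0..11
  const uint64_t high12 = 0xfffULL << 12;  // bits 12..23
  return (imm & ~low12) == 0 || (imm & ~high12) == 0;
}
```

When a bound such as `lo` fails a check like this, the macro assembler has to materialize it in a register first, which is why a destination of `zr` cannot double as the scratch register.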
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22880#discussion_r2059361881 From vlivanov at openjdk.org Thu Apr 24 23:16:48 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 24 Apr 2025 23:16:48 GMT Subject: RFR: 8346836: C2: Verify CastII/CastLL bounds at runtime [v10] In-Reply-To: References: Message-ID: On Thu, 24 Apr 2025 23:12:46 GMT, Vladimir Ivanov wrote: >> I don't think it is, `cmpw(rval, lo)` is equivalent to `subsw(zr, rval, lo)`. However, if `lo` does not fit into an immediate instruction, `MacroAssembler::subsw`, which calls into `wrap_adds_subs_imm_insn`, will use `Rd` as a temporary register to store `lo`, this is invalid if `Rd` is `zr`. Am I understanding it right? > > Yes, you are right. Completely forgot that there's only 12 bits available for the immediate (`Assembler::operand_valid_for_add_sub_immediate()`). Completely forgot what I was thinking about when writing that code :-) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22880#discussion_r2059362742 From vlivanov at openjdk.org Thu Apr 24 23:20:51 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 24 Apr 2025 23:20:51 GMT Subject: RFR: 8346836: C2: Verify CastII/CastLL bounds at runtime [v11] In-Reply-To: References: Message-ID: <_ZjeRB-yDt62rEKh7SOhvqWD-jR112ZYi8wY-r0nmos=.c3d90f95-f22e-4a04-84a0-19e3fa4db94f@github.com> On Thu, 24 Apr 2025 16:12:42 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds a develop flag `VerifyConstraintCasts`, which will verify the correctness of `CastIINode`s and `CastLLNode`s at runtime and crash the VM if the dynamic value lies outside the type value range. >> >> Please take a look, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > Emanuel's suggestion Marked as reviewed by vlivanov (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/22880#pullrequestreview-2792710891 From vlivanov at openjdk.org Thu Apr 24 23:20:52 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 24 Apr 2025 23:20:52 GMT Subject: RFR: 8346836: C2: Verify CastII/CastLL bounds at runtime [v10] In-Reply-To: References: Message-ID: On Thu, 24 Apr 2025 06:27:04 GMT, Emanuel Peter wrote: > Just out of curiosity: Is the whole reconstruct_frame_pointer mechanism general enough so that we could use it in other places as well? IMO it is general enough, but I haven't found a good place to put it. Ideally, all runtime/debug calls should keep frame pointer valid for diagnostic purposes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22880#issuecomment-2829055754 From duke at openjdk.org Thu Apr 24 23:25:11 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Thu, 24 Apr 2025 23:25:11 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v13] In-Reply-To: References: Message-ID: > This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). > > When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. > > This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. 
Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: Fix branch range check ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23573/files - new: https://git.openjdk.org/jdk/pull/23573/files/5552a860..1c6db6c6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=11-12 Stats: 11 lines in 2 files changed: 7 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/23573.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23573/head:pull/23573 PR: https://git.openjdk.org/jdk/pull/23573 From duke at openjdk.org Thu Apr 24 23:25:11 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Thu, 24 Apr 2025 23:25:11 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v12] In-Reply-To: <0TXIxgeQppMmgi8csVPUjNtFHj5QNqv6ZewL3eYc2TU=.89631480-490b-4fb7-882f-bb7be7f8e60d@github.com> References: <0TXIxgeQppMmgi8csVPUjNtFHj5QNqv6ZewL3eYc2TU=.89631480-490b-4fb7-882f-bb7be7f8e60d@github.com> Message-ID: On Mon, 21 Apr 2025 23:55:02 GMT, Chad Rakoczy wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). >> >> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. >> >> This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. 
> > Chad Rakoczy has updated the pull request incrementally with three additional commits since the last revision: > > - Add null check in StressNMethodRelocation > - Hold Compile_lock > - Use methodHandle for VM_Operation so pointer is not stale I've made several adjustments to address the key concerns. The relocation logic now avoids copying stale or invalid metadata by using MethodHandles to maintain valid references. This approach holds the corresponding method for the nmethod being relocated, which should be safe since we only relocate Java methods that have associated method objects. I've also added safety checks, including one to prevent relocating unloading nmethods. The relocation is now "opt-in," creating a new nmethod and copying only the necessary values, which should mitigate the risk of unintentionally carrying over invalid state. I'm still investigating the JVMCI-specific behavior, particularly how the nmethod mirror is handled, and will follow up once I have a clearer understanding. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23573#issuecomment-2829059302 From vlivanov at openjdk.org Thu Apr 24 23:29:28 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 24 Apr 2025 23:29:28 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v14] In-Reply-To: References: Message-ID: > Migrate Vector API math library (SVML and SLEEF) linkage from native code (in JVM) to Java FFM API. > > Since FFM API doesn't support vector calling conventions yet, migration affects only symbol lookup for now. But it still enables significant simplifications on JVM side.
> > The patch consists of the following parts: > * on-demand symbol lookup in Java code replaces eager lookup from native code during JVM startup; > * 2 new VM intrinsics for vector calls (support unary and binary shapes) (code separated from unary/binary vector operations); > * new internal interface to query supported CPU ISA extensions (`jdk.incubator.vector.CPUFeatures`) used for CPU dispatching. > > `java.lang.foreign` API is used to perform symbol lookup in vector math library, then the address is cached and fed into corresponding JVM intrinsic, so C2 can turn it into a direct vector call in generated code. > > Once `java.lang.foreign` supports vectors & vector calling conventions, VM intrinsics can go away. > > Performance is on par with original implementation (tested with microbenchmarks on linux-x64 and macosx-aarch64). > > Testing: hs-tier1 - hs-tier6, microbenchmarks (on linux-x64 and macosx-aarch64) > > Thanks! Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: Improve comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24462/files - new: https://git.openjdk.org/jdk/pull/24462/files/585312ae..541c4d7f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24462&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24462&range=12-13 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24462.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24462/head:pull/24462 PR: https://git.openjdk.org/jdk/pull/24462 From vlivanov at openjdk.org Thu Apr 24 23:29:29 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 24 Apr 2025 23:29:29 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v12] In-Reply-To: References: Message-ID: On Thu, 24 Apr 2025 11:33:43 GMT, Hamlin Li wrote: >> FTR both `VectorSupport.getMaxLaneCount()` and `CPUFeatures` don't rely on raw list of ISA extensions CPU 
supports, but only those reported by the JVM. So, if some feature support is disabled on JVM side, it won't be reported by `VM_Version` and, hence, `CPUFeatures`. > > Thank you for updating! Looks good for riscv. I have run some basic tests for vector API, and they passed. I did not run benchmarks, as riscv & aarch64 share the same way to bridge from java to sleef. > >> Does the following check catch UseRVV == false case on RISC-V? > > Yes. If you don't mind, an explicit comment might be helpful. To me, "lacking vector support" here means the vector length is not large enough, but it's quite subjective, so it's your call. > >> FTR both VectorSupport.getMaxLaneCount() and CPUFeatures don't rely on raw list of ISA extensions CPU supports, but only those reported by the JVM. So, if some feature support is disabled on JVM side, it won't be reported by VM_Version and, hence, CPUFeatures. > > I'm fine with this. Thanks, I added some clarifications in the comments. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2059367624 From dlong at openjdk.org Thu Apr 24 23:52:01 2025 From: dlong at openjdk.org (Dean Long) Date: Thu, 24 Apr 2025 23:52:01 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v13] In-Reply-To: References: Message-ID: On Thu, 24 Apr 2025 23:25:11 GMT, Chad Rakoczy wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). >> >> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation.
>> >> This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. > > Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: > > Fix branch range check src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 940: > 938: > 939: static bool reachable_from_branch_at(address branch, address target, bool use_max=false) { > 940: return uabs(target - branch) < (use_max ? max_branch_range : branch_range); This might be the wrong approach. Using the max range will make the assert below fail less often in debug builds, but disables the stress feature of using the shorter 2M range. src/hotspot/cpu/aarch64/relocInfo_aarch64.cpp line 86: > 84: if (!Assembler::reachable_from_branch_at(addr(), x, true)) { > 85: address trampoline = call->get_trampoline(); > 86: assert(trampoline != nullptr, "branch is too large with no available trampoline"); Doesn't this mean any nmethod relocation can cause the JVM to crash if there is no trampoline? Or are you creating new trampoline stubs in the relocated destination as needed (and deleting trampoline stubs that are no longer needed)? 
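To make the trade-off in the two comments above concrete, here is a rough stand-alone sketch of a range check with a stress limit and a maximum limit. The constants and the function name `reachable` are assumptions for illustration only: the 128 MiB figure is the architectural reach of an AArch64 `B`/`BL` instruction, the 2 MiB figure is the "shorter 2M range" mentioned above, and neither is copied from `assembler_aarch64.hpp`.

```cpp
#include <cstdint>

// Assumed limits for illustration; not taken from HotSpot sources.
static const uint64_t kMaxBranchRange    = 128ULL * 1024 * 1024; // B/BL reach
static const uint64_t kStressBranchRange = 2ULL * 1024 * 1024;   // shorter stress range

// Hypothetical check: is `target` within branch distance of `branch`?
// With use_max == false the shorter range is used, so far calls are
// forced through trampolines more often, which stresses that path.
static bool reachable(uintptr_t branch, uintptr_t target, bool use_max = false) {
  uint64_t distance = target > branch ? target - branch : branch - target;
  return distance < (use_max ? kMaxBranchRange : kStressBranchRange);
}
```

Under these assumptions, a call that is 4 MiB away is "unreachable" with the stress limit but reachable with the maximum limit, so passing `use_max = true` hides exactly the cases the stress range is meant to exercise.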
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2059381735 PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2059380801 From kvn at openjdk.org Thu Apr 24 23:52:59 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 24 Apr 2025 23:52:59 GMT Subject: RFR: 8350209: Preserve adapters in AOT cache [v7] In-Reply-To: References: Message-ID: On Thu, 24 Apr 2025 00:49:23 GMT, Vladimir Ivanov wrote: >> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix message > > src/hotspot/share/cds/aotCacheAccess.hpp line 40: > >> 38: class AOTCacheAccess : AllStatic { >> 39: public: >> 40: static void* allocate_aot_code(size_t size) NOT_CDS_RETURN_(nullptr); > > "allocate_aot_code_region", "get_aot_code_region_size", and "map_aot_code_region" would be clearer. done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24740#discussion_r2059381865 From sviswanathan at openjdk.org Fri Apr 25 00:13:49 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 25 Apr 2025 00:13:49 GMT Subject: RFR: 8342676: Unsigned Vector Min / Max transforms [v6] In-Reply-To: <1mKv_sNaaGW-z57wW0xpHdWjd974WzEkuMKEh0su-hY=.24d0373e-5674-45a2-bc83-490e71a7dec0@github.com> References: <21riF_Q0FMyzOh_sakTclKfYa-nJm4klfkyHEYi4ctI=.76933a14-fb5e-447e-873a-59a2b870b842@github.com> <1mKv_sNaaGW-z57wW0xpHdWjd974WzEkuMKEh0su-hY=.24d0373e-5674-45a2-bc83-490e71a7dec0@github.com> Message-ID: On Mon, 14 Apr 2025 13:50:17 GMT, Jatin Bhateja wrote: >> Adding following IR transforms for unsigned vector Min / Max nodes. >> >> => UMinV (UMinV(a, b), UMaxV(a, b)) => UMinV(a, b) >> => UMinV (UMinV(a, b), UMaxV(b, a)) => UMinV(a, b) >> => UMaxV (UMinV(a, b), UMaxV(a, b)) => UMaxV(a, b) >> => UMaxV (UMinV(a, b), UMaxV(b, a)) => UMaxV(a, b) >> => UMaxV (a, a) => a >> => UMinV (a, a) => a >> >> New IR validation test accompanies the patch. 
>> >> This is a follow-up PR for https://github.com/openjdk/jdk/pull/20507 >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Comment refinement test/hotspot/jtreg/compiler/vectorapi/VectorUnsignedMinMaxOperationsTest.java line 371: > 369: IntVector vec1 = IntVector.fromArray(ispec, int_in1, i); > 370: IntVector vec2 = IntVector.fromArray(ispec, int_in2, i); > 371: // UMaxV (UMinV vec2, vec1) (UMaxV vec1, vec2) => UMinV vec1 vec2 The above comment should be: // UMaxV (UMinV vec2, vec1) (UMaxV vec1, vec2) => UMaxV vec1 vec2 test/hotspot/jtreg/compiler/vectorapi/VectorUnsignedMinMaxOperationsTest.java line 452: > 450: IntVector vec1 = IntVector.fromArray(ispec, int_in1, i); > 451: IntVector vec2 = IntVector.fromArray(ispec, int_in2, i); > 452: // UMaxV (UMinV vec1, vec2) (UMaxV vec2, vec1) => UMinV vec1 vec2 The above comment should be: // UMaxV (UMinV vec1, vec2) (UMaxV vec2, vec1) => UMaxV vec1 vec2 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21604#discussion_r2059390266 PR Review Comment: https://git.openjdk.org/jdk/pull/21604#discussion_r2059392536 From duke at openjdk.org Fri Apr 25 00:14:51 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Fri, 25 Apr 2025 00:14:51 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v13] In-Reply-To: References: Message-ID: <-Y_yLYd2I_ZJk0kcEF8ZqyVKtI0OiXT0WFtvHLhUWJU=.be99c2c6-e27d-448d-910d-ee5dfc97e12d@github.com> On Thu, 24 Apr 2025 23:47:32 GMT, Dean Long wrote: >> Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix branch range check > > src/hotspot/cpu/aarch64/relocInfo_aarch64.cpp line 86: > >> 84: if (!Assembler::reachable_from_branch_at(addr(), x, true)) { >> 85: address trampoline = call->get_trampoline(); >> 86: assert(trampoline != nullptr, "branch is too large with no available trampoline"); > > Doesn't this 
mean any nmethod relocation can cause the JVM to crash if there is no trampoline? Or are you creating new trampoline stubs in the relocated destination as needed (and deleting trampoline stubs that are no longer needed)? From what I saw calls are _usually_ accompanied by a corresponding trampoline. I'll look deeper to see if _usually_ is _always_ ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2059393546 From duke at openjdk.org Fri Apr 25 00:31:08 2025 From: duke at openjdk.org (Mohamed Issa) Date: Fri, 25 Apr 2025 00:31:08 GMT Subject: RFR: 8348638: Performance regression in Math.tanh [v8] In-Reply-To: References: Message-ID: > The changes described below are meant to resolve the performance regression introduced by the **x86_64 tanh** double precision floating point scalar intrinsic in #20657. Additionally, new input values are provided for the existing micro-benchmark and a new micro-benchmark is included to check the performance of specific input value ranges to help prevent regressions in the future. > > 1. Check and handle high magnitude input values before those in other ranges. If found, **+/- 1** is returned almost immediately without having to go through too many computations or branches. > 2. Reduce the lower bound of the input range that triggers a quick **+/- 1** return from **|x| >= 32** to **|x| >= 22**. This new endpoint is the exact value required for correctness that's used by the original OpenJDK implementation. > > The results of all tests posted below were captured with an [Intel® Xeon 6761P](https://www.intel.com/content/www/us/en/products/sku/241842/intel-xeon-6761p-processor-336m-cache-2-50-ghz/specifications.html) using [OpenJDK v25-b15](https://github.com/openjdk/jdk/releases/tag/jdk-25%2B15) as the baseline version. The term _baseline1_ refers to runs with the intrinsic enabled and _baseline2_ refers to runs with the intrinsic disabled.
> > For the first set of performance data collected with the new built-in **tanhRange** micro-benchmark, see the tables below. Each result is the mean of 8 individual runs, and the input ranges used match those in the bug report with two additional ones included. In the high value scenarios (100, 1000, 10000, 100000), the changes increase throughput values over _baseline1_. Also, there is a small negative impact to the low value (1, 2, 10, 20) scenarios. > > | Input range(s) | Baseline1 (ops/s) | Change (ops/s) | Change vs baseline1 (%) | > | :-------------------: | :-----------------: | :----------------: | :-------------------------: | > | [-1, 1] | 22671 | 22190 | -2.12 | > | [-2, 2] | 22680 | 22191 | -2.16 | > | [-10, 10] | 22683 | 22149 | -2.35 | > | [-20, 20] | 22694 | 22183 | -2.25 | > | [-100, 100] | 29806 | 33675 | +12.98 | > | [-1000, 1000] ... Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision: Switch to constant double fields with separate micro-benchmarks ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23889/files - new: https://git.openjdk.org/jdk/pull/23889/files/64abe5f7..66be269e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23889&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23889&range=06-07 Stats: 31 lines in 1 file changed: 25 ins; 4 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23889.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23889/head:pull/23889 PR: https://git.openjdk.org/jdk/pull/23889 From fyang at openjdk.org Fri Apr 25 00:31:48 2025 From: fyang at openjdk.org (Fei Yang) Date: Fri, 25 Apr 2025 00:31:48 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v14] In-Reply-To: References: Message-ID: On Thu, 24 Apr 2025 23:29:28 GMT, Vladimir Ivanov wrote: >> Migrate Vector API math library (SVML and SLEEF) linkage from native code (in JVM) to Java FFM API. 
>> >> Since FFM API doesn't support vector calling conventions yet, migration affects only symbol lookup for now. But it still enables significant simplifications on JVM side. >> >> The patch consists of the following parts: >> * on-demand symbol lookup in Java code replaces eager lookup from native code during JVM startup; >> * 2 new VM intrinsics for vector calls (support unary and binary shapes) (code separated from unary/binary vector operations); >> * new internal interface to query supported CPU ISA extensions (`jdk.incubator.vector.CPUFeatures`) used for CPU dispatching. >> >> `java.lang.foreign` API is used to perform symbol lookup in vector math library, then the address is cached and fed into corresponding JVM intrinsic, so C2 can turn it into a direct vector call in generated code. >> >> Once `java.lang.foreign` supports vectors & vector calling conventions, VM intrinsics can go away. >> >> Performance is on par with original implementation (tested with microbenchmarks on linux-x64 and macosx-aarch64). >> >> Testing: hs-tier1 - hs-tier6, microbenchmarks (on linux-x64 and macosx-aarch64) >> >> Thanks! > > Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: > > Improve comments src/hotspot/cpu/riscv/riscv.ad line 1947: > 1945: // Vector calling convention not yet implemented. > 1946: bool Matcher::supports_vector_calling_convention(void) { > 1947: return EnableVectorSupport; You might want to remove the use of `UseVectorStubs` in `Matcher::vector_return_value` at L1951. 
assert(EnableVectorSupport && UseVectorStubs, "sanity"); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2059390324 From kvn at openjdk.org Fri Apr 25 00:36:51 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 25 Apr 2025 00:36:51 GMT Subject: RFR: 8350209: Preserve adapters in AOT cache [v7] In-Reply-To: References: Message-ID: On Thu, 24 Apr 2025 01:37:12 GMT, Vladimir Ivanov wrote: >> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix message > > src/hotspot/share/code/aotCodeCache.cpp line 645: > >> 643: return false; >> 644: } >> 645: log_info(aot, codecache, stubs)("Writing blob '%s' to AOT Code Cache", name); > > I'd revisit logging code in AOTCodeCache and downgrade info->debug and debug->trace where appropriate. It feels too low-level most of the time. done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24740#discussion_r2059406950 From kvn at openjdk.org Fri Apr 25 00:47:52 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 25 Apr 2025 00:47:52 GMT Subject: RFR: 8350209: Preserve adapters in AOT cache [v7] In-Reply-To: References: Message-ID: On Thu, 24 Apr 2025 01:51:40 GMT, Vladimir Ivanov wrote: >> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix message > > src/hotspot/share/code/aotCodeCache.cpp line 62: > >> 60: } >> 61: >> 62: static void exit_vm_on_store_failure() { > > It's a bit confusing to see `exit_vm_on_load_failure()` and `exit_vm_on_store_failure()` silently proceed unless a flag is explicitly specified. > > Moreover, how reliable is `AOTAdapterCaching = false` as a way to fail fast and avoid repeated load/store attempts? At least, I see that the `AOTCodeCache` ctor caches `AOTAdapterCaching`, so it won't see the update. > > How does it affect adapter code generation during assembly phase?
We check the failure state of the AOT code cache when querying whether adapter caching is in use: bool for_use() const { return _for_use && !_failed; } bool for_dump() const { return _for_dump && !_failed; } static bool is_on() CDS_ONLY({ return _cache != nullptr && !_cache->closing(); }) NOT_CDS_RETURN_(false); static bool is_on_for_use() { return is_on() && _cache->for_use(); } static bool is_on_for_dump() { return is_on() && _cache->for_dump(); } static bool is_dumping_adapters() { return is_on_for_dump() && _cache->adapter_caching(); } static bool is_using_adapters() { return is_on_for_use() && _cache->adapter_caching(); } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24740#discussion_r2059412190 From duke at openjdk.org Fri Apr 25 00:50:48 2025 From: duke at openjdk.org (Mohamed Issa) Date: Fri, 25 Apr 2025 00:50:48 GMT Subject: RFR: 8348638: Performance regression in Math.tanh [v2] In-Reply-To: <4KYVemCsJx4WaROYdA770DaFipRFOWUmlR-iGMkHkVk=.1d8ddb9b-ec6d-4fb5-828b-bc96c07ac756@github.com> References: <4KYVemCsJx4WaROYdA770DaFipRFOWUmlR-iGMkHkVk=.1d8ddb9b-ec6d-4fb5-828b-bc96c07ac756@github.com> Message-ID: On Wed, 2 Apr 2025 16:53:38 GMT, Mohamed Issa wrote: >> Please add a micro benchmark for different value ranges > >> Please add a micro benchmark for different value ranges > > @jatin-bhateja Should I add different value ranges to the existing tanh micro-benchmark or create a brand new micro-benchmark? > Hi @missa-prime , Please update the PR notes with the latest benchmark results with the following configurations > > * With and without opt > > * With opt -XX:DisableIntrinsic=_dtanh Ok, I updated the PR description.
------------- PR Comment: https://git.openjdk.org/jdk/pull/23889#issuecomment-2829140885 From duke at openjdk.org Fri Apr 25 00:50:49 2025 From: duke at openjdk.org (Mohamed Issa) Date: Fri, 25 Apr 2025 00:50:49 GMT Subject: RFR: 8348638: Performance regression in Math.tanh [v7] In-Reply-To: References: Message-ID: On Thu, 24 Apr 2025 16:51:06 GMT, Jatin Bhateja wrote: >> Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision: >> >> Make regular tanh benchmark inputs constant values > > test/micro/org/openjdk/bench/java/lang/MathBench.java line 67: > >> 65: public double double1 = 1.0d, double2 = 2.0d, double81 = 81.0d, doubleNegative12 = -12.0d, double4Dot1 = 4.1d, double0Dot5 = 0.5d; >> 66: >> 67: public static final double tanhConstInputs[] = {-2.0, -1.0, -0.5, -0.1, 0.0, 0.1, 0.5, 1.0, 2.0}; > > Static final arrays have mutable elements, so you should declare individual static fields with different constant values. > We can also use the @Stable annotation; in that case the array index must be a constant. It is better to go with a few individual static final fields with constant values. I switched to individual static final fields with constant values. There are separate micro-benchmarks included as well. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23889#discussion_r2059413513 From kvn at openjdk.org Fri Apr 25 00:51:53 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 25 Apr 2025 00:51:53 GMT Subject: RFR: 8350209: Preserve adapters in AOT cache [v7] In-Reply-To: References: Message-ID: On Fri, 25 Apr 2025 00:44:46 GMT, Vladimir Kozlov wrote: >> src/hotspot/share/code/aotCodeCache.cpp line 62: >> >>> 60: } >>> 61: >>> 62: static void exit_vm_on_store_failure() { >> >> It's a bit confusing to see `exit_vm_on_load_failure()` and `exit_vm_on_store_failure()` silently proceed unless a flag is explicitly specified.
>> >> Moreover, how reliable is `AOTAdapterCaching = false` as a way to fail fast and avoid repeated load/store attempts? At least, I see that the `AOTCodeCache` ctor caches `AOTAdapterCaching`, so it won't see the update. >> >> How does it affect adapter code generation during assembly phase? > > We check the failure state of the AOT code cache when querying whether adapter caching is in use: > > bool for_use() const { return _for_use && !_failed; } > bool for_dump() const { return _for_dump && !_failed; } > static bool is_on() CDS_ONLY({ return _cache != nullptr && !_cache->closing(); }) NOT_CDS_RETURN_(false); > static bool is_on_for_use() { return is_on() && _cache->for_use(); } > static bool is_on_for_dump() { return is_on() && _cache->for_dump(); } > > > > static bool is_dumping_adapters() { return is_on_for_dump() && _cache->adapter_caching(); } > static bool is_using_adapters() { return is_on_for_use() && _cache->adapter_caching(); } AOT adapter code caching and loading is guarded by these methods, not by the flag. Setting AOTAdapterCaching to false on failure is a simple indication that adapter caching is switched off, for anyone who looks at the final state of the flag.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24740#discussion_r2059413988 From kvn at openjdk.org Fri Apr 25 00:54:48 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 25 Apr 2025 00:54:48 GMT Subject: RFR: 8350209: Preserve adapters in AOT cache [v7] In-Reply-To: References: Message-ID: <2q_jgLrJim5ntOfr3awHdl1HTgrWtcTdcHWsO_CfnHU=.7958f1ef-9792-416a-9474-33f776e01fb5@github.com> On Fri, 25 Apr 2025 00:48:43 GMT, Vladimir Kozlov wrote: >> We check failure state of AOT code cache when query about using adapters caching: >> >> bool for_use() const { return _for_use && !_failed; } >> bool for_dump() const { return _for_dump && !_failed; } >> static bool is_on() CDS_ONLY({ return _cache != nullptr && !_cache->closing(); }) NOT_CDS_RETURN_(false); >> static bool is_on_for_use() { return is_on() && _cache->for_use(); } >> static bool is_on_for_dump() { return is_on() && _cache->for_dump(); } >> >> >> >> static bool is_dumping_adapters() { return is_on_for_dump() && _cache->adapter_caching(); } >> static bool is_using_adapters() { return is_on_for_use() && _cache->adapter_caching(); } > > AOT adapters code caching and loading is guarded by these methods not by flag. > > Setting AOTAdapterCaching to false on failure is simple indication that adapter caching is switched off for someone who will look on final state of flag. I added `log_info()` to `exit_vm_on_*_failure()` methods to produce notification when AbortVMOnAOTCodeFailure flag is off (default value). 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24740#discussion_r2059415739 From duke at openjdk.org Fri Apr 25 01:46:49 2025 From: duke at openjdk.org (Anjian-Wen) Date: Fri, 25 Apr 2025 01:46:49 GMT Subject: RFR: 8355074: RISC-V: C2: Support Vector-Scalar version of Zvbb Vector And-Not instruction [v2] In-Reply-To: References: Message-ID: On Thu, 24 Apr 2025 14:22:34 GMT, Feilong Jiang wrote: >> Anjian-Wen has updated the pull request incrementally with one additional commit since the last revision: >> >> add prefix for test String > > src/hotspot/cpu/riscv/riscv_v.ad line 416: > >> 414: // vector-immediate add (unpredicated) >> 415: >> 416: instruct vaddI_vi(vReg dst, vReg src1, immI5 con) %{ > > Perhaps we can do these naming refactorings in a separate PR first. This will look cleaner. thanks, that's a good idea, I'll split it into two patch ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24709#discussion_r2059442446 From duke at openjdk.org Fri Apr 25 02:06:13 2025 From: duke at openjdk.org (Anjian-Wen) Date: Fri, 25 Apr 2025 02:06:13 GMT Subject: RFR: 8355074: RISC-V: C2: Support Vector-Scalar version of Zvbb Vector And-Not instruction [v3] In-Reply-To: References: Message-ID: > support zvbb vand-not vector-scalar version, which Op1 is the sign-extended or truncated value in scalar register rs1 > add C2 match rule > add related Tests in IRNode structure > > passed jtreg test test/hotspot/jtreg/compiler/vectorapi/* Anjian-Wen has updated the pull request incrementally with one additional commit since the last revision: split the patch into cleanformat and enable zvbb ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24709/files - new: https://git.openjdk.org/jdk/pull/24709/files/dbdf9b87..f08705b7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24709&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24709&range=01-02 Stats: 146 lines in 1 file changed: 0 ins; 0 del; 146 mod Patch: 
https://git.openjdk.org/jdk/pull/24709.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24709/head:pull/24709 PR: https://git.openjdk.org/jdk/pull/24709 From qamai at openjdk.org Fri Apr 25 02:13:11 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 25 Apr 2025 02:13:11 GMT Subject: RFR: 8346836: C2: Verify CastII/CastLL bounds at runtime [v10] In-Reply-To: References: Message-ID: On Thu, 24 Apr 2025 23:18:08 GMT, Vladimir Ivanov wrote: >> Wow, this looks even much better with the improved printing on failure now! >> >> Just out of curiosity: Is the whole `reconstruct_frame_pointer` mechanism general enough so that we could use it in other places as well? It is not super important to me any more, but I've wanted to have something like this for `VerifyAlignVector` already :) > >> Just out of curiosity: Is the whole reconstruct_frame_pointer mechanism general enough so that we could use it in other places as well? > > IMO it is general enough, but I haven't found a good place to put it. Ideally, all runtime/debug calls should keep frame pointer valid for diagnostic purposes. @iwanowww @eme64 Thanks a lot for your reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/22880#issuecomment-2829218102 From qamai at openjdk.org Fri Apr 25 02:13:12 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 25 Apr 2025 02:13:12 GMT Subject: Integrated: 8346836: C2: Verify CastII/CastLL bounds at runtime In-Reply-To: References: Message-ID: On Wed, 25 Dec 2024 14:54:02 GMT, Quan Anh Mai wrote: > Hi, > > This patch adds a develop flag `VerifyConstraintCasts`, which will verify the correctness of `CastIINode`s and `CastLLNode`s at runtime and crash the VM if the dynamic value lies outside the type value range. > > Please take a look, thanks a lot. This pull request has now been integrated. 
Changeset: ed604038 Author: Quan Anh Mai URL: https://git.openjdk.org/jdk/commit/ed604038ffc4ca64113984324dde71c07f046b52 Stats: 372 lines in 10 files changed: 371 ins; 0 del; 1 mod 8346836: C2: Verify CastII/CastLL bounds at runtime Co-authored-by: Vladimir Ivanov Reviewed-by: vlivanov, epeter ------------- PR: https://git.openjdk.org/jdk/pull/22880 From qamai at openjdk.org Fri Apr 25 02:16:09 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 25 Apr 2025 02:16:09 GMT Subject: RFR: 8346836: C2: Verify CastII/CastLL bounds at runtime [v10] In-Reply-To: References: Message-ID: <_H-xWRwlJrRWrGdmqZF_RXu0vUD_foXf7HCcph5qQas=.ed8dedd5-7cb3-4110-a0aa-61b8d6b2cad0@github.com> On Thu, 24 Apr 2025 23:18:08 GMT, Vladimir Ivanov wrote: > IMO it is general enough, but I haven't found a good place to put it. Ideally, all runtime/debug calls should keep frame pointer valid for diagnostic purposes. Tbh I don't understand ... How does the VM normally walk the stack when we crash in a compiled method? I thought that it has to know the frame size of a compiled method? Why do we need to manually reconstruct the frame pointer? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22880#issuecomment-2829223189 From duke at openjdk.org Fri Apr 25 02:23:01 2025 From: duke at openjdk.org (Anjian-Wen) Date: Fri, 25 Apr 2025 02:23:01 GMT Subject: RFR: 8355562: RISC-V: Cleanup names of vector-scalar instructions in riscv_v.ad Message-ID: <9pPaDS5emcWnw5TSf6yPMx9oYEODHaJ8oNUB0v9SMuc=.bb4262ca-178c-4d30-be85-a7d6862252b3@github.com> RISC-V: Cleanup names of vector-scalar instructions in riscv_v.ad ------------- Commit messages: - RISC-V: Cleanup names of vector-scalar instructions in riscv_v.ad Changes: https://git.openjdk.org/jdk/pull/24865/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24865&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8355562 Stats: 146 lines in 1 file changed: 0 ins; 0 del; 146 mod Patch: https://git.openjdk.org/jdk/pull/24865.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24865/head:pull/24865 PR: https://git.openjdk.org/jdk/pull/24865 From duke at openjdk.org Fri Apr 25 02:23:47 2025 From: duke at openjdk.org (Anjian-Wen) Date: Fri, 25 Apr 2025 02:23:47 GMT Subject: RFR: 8355074: RISC-V: C2: Support Vector-Scalar version of Zvbb Vector And-Not instruction [v3] In-Reply-To: References: Message-ID: On Fri, 25 Apr 2025 02:06:13 GMT, Anjian-Wen wrote: >> support zvbb vand-not vector-scalar version, which Op1 is the sign-extended or truncated value in scalar register rs1 >> add C2 match rule >> add related Tests in IRNode structure >> >> passed jtreg test test/hotspot/jtreg/compiler/vectorapi/* > > Anjian-Wen has updated the pull request incrementally with one additional commit since the last revision: > > split the patch into cleanformat and enable zvbb here is the another related pr for clean up https://github.com/openjdk/jdk/pull/24865 ------------- PR Comment: https://git.openjdk.org/jdk/pull/24709#issuecomment-2829231235 From duke at openjdk.org Fri Apr 25 02:34:19 2025 From: duke at openjdk.org (Anjian-Wen) Date: Fri, 25 Apr 2025 02:34:19 
GMT Subject: RFR: 8355074: RISC-V: C2: Support Vector-Scalar version of Zvbb Vector And-Not instruction [v4] In-Reply-To: References: Message-ID: > support zvbb vand-not vector-scalar version, which Op1 is the sign-extended or truncated value in scalar register rs1 > add C2 match rule > add related Tests in IRNode structure > > passed jtreg test test/hotspot/jtreg/compiler/vectorapi/* Anjian-Wen has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: - Merge branch 'openjdk:master' into Vector_And-Not_vx - split the patch into cleanformat and enable zvbb - add prefix for test String - modify format change some '_imm' to '_vi' - modify v0.t to $v0 - add format fix - RISC-V: Support Zvbb Vector And-Not vx and add its tests Support Zvbb Vector And-Not vx and add its tests modify name of the match rule and test name ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24709/files - new: https://git.openjdk.org/jdk/pull/24709/files/f08705b7..a99d32ac Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24709&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24709&range=02-03 Stats: 36200 lines in 1014 files changed: 28117 ins; 5849 del; 2234 mod Patch: https://git.openjdk.org/jdk/pull/24709.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24709/head:pull/24709 PR: https://git.openjdk.org/jdk/pull/24709 From fyang at openjdk.org Fri Apr 25 02:34:19 2025 From: fyang at openjdk.org (Fei Yang) Date: Fri, 25 Apr 2025 02:34:19 GMT Subject: RFR: 8355074: RISC-V: C2: Support Vector-Scalar version of Zvbb Vector And-Not instruction [v4] In-Reply-To: References: Message-ID: On Fri, 25 Apr 2025 02:29:10 GMT, Anjian-Wen wrote: >> support zvbb vand-not vector-scalar version, which Op1 is the sign-extended or truncated value in scalar register rs1 >> add C2 match rule 
>> add related Tests in IRNode structure >> >> passed jtreg test test/hotspot/jtreg/compiler/vectorapi/* > > Anjian-Wen has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - Merge branch 'openjdk:master' into Vector_And-Not_vx > - split the patch into cleanformat and enable zvbb > - add prefix for test String > - modify format > > change some '_imm' to '_vi' > - modify v0.t to $v0 > - add format fix > - RISC-V: Support Zvbb Vector And-Not vx and add its tests > > Support Zvbb Vector And-Not vx and add its tests > modify name of the match rule and test name Updated change looks fine. Thanks. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24709#pullrequestreview-2792871684 From fyang at openjdk.org Fri Apr 25 02:35:45 2025 From: fyang at openjdk.org (Fei Yang) Date: Fri, 25 Apr 2025 02:35:45 GMT Subject: RFR: 8355562: RISC-V: Cleanup names of vector-scalar instructions in riscv_v.ad In-Reply-To: <9pPaDS5emcWnw5TSf6yPMx9oYEODHaJ8oNUB0v9SMuc=.bb4262ca-178c-4d30-be85-a7d6862252b3@github.com> References: <9pPaDS5emcWnw5TSf6yPMx9oYEODHaJ8oNUB0v9SMuc=.bb4262ca-178c-4d30-be85-a7d6862252b3@github.com> Message-ID: <7CZHgjlq5JxeEDbSKKfcLmS9YLgKToiVP32jZjd6Jq8=.e06f2f50-53f2-4993-aa9e-f993d502ce38@github.com> On Fri, 25 Apr 2025 02:15:57 GMT, Anjian-Wen wrote: > RISC-V: Cleanup names of vector-scalar instructions in riscv_v.ad Looks good. Thanks for the cleanup. ------------- Marked as reviewed by fyang (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24865#pullrequestreview-2792874869 From xgong at openjdk.org Fri Apr 25 02:36:59 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 25 Apr 2025 02:36:59 GMT Subject: RFR: 8351623: VectorAPI: Refactor subword gather load and add SVE implementation In-Reply-To: References: Message-ID: On Wed, 16 Apr 2025 08:58:34 GMT, Xiaohong Gong wrote: > ### Summary: > [JDK-8318650](http://java-service.client.nvidia.com/?q=8318650) added the hotspot intrinsifying of subword gather load APIs for X86 platforms [1]. This patch aims at implementing the equivalent functionality for AArch64 SVE platform. In addition to the AArch64 backend support, this patch also refactors the API implementation in Java side and the compiler mid-end part to make the operations more efficient and maintainable across different architectures. > > ### Background: > Vector gather load APIs load values from memory addresses calculated by adding a base pointer to integer indices stored in an int array. SVE provides native vector gather load instructions for byte/short types using an int vector saving indices (see [2][3]). > > The number of loaded elements must match the index vector's element count. Since int elements are 4/2 times larger than byte/short elements, and given `MaxVectorSize` constraints, the operation may need to be splitted into multiple parts. > > Using a 128-bit byte vector gather load as an example, there are four scenarios with different `MaxVectorSize`: > > 1. `MaxVectorSize = 16, byte_vector_size = 16`: > - Can load 4 indices per vector register > - So can finish 4 bytes per gather-load operation > - Requires 4 times of gather-loads and final merge > Example: > ``` > byte[] arr = [a, b, c, d, e, f, g, h, i, g, k, l, m, n, o, p, ...] 
> int[] idx = [3, 2, 4, 1, 5, 7, 5, 2, 0, 6, 7, 1, 15, 10, 11, 9] > > 4 gather-load: > idx_v1 = [1 4 2 3] gather_v1 = [0000 0000 0000 becd] > idx_v2 = [2 5 7 5] gather_v2 = [0000 0000 0000 cfhf] > idx_v3 = [1 7 6 0] gather_v3 = [0000 0000 0000 bhga] > idx_v4 = [9 11 10 15] gather_v4 = [0000 0000 0000 jlkp] > merge: v = [jlkp bhga cfhf becd] > ``` > > 2. `MaxVectorSize = 32, byte_vector_size = MaxVectorSize / 2`: > - Can load 8 indices per vector register > - So can finish 8 bytes per gather-load operation > - Requires 2 times of gather-loads and merge > Example: > ``` > byte[] arr = [a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, ...] > int[] index = [3, 2, 4, 1, 5, 7, 5, 2, 0, 6, 7, 1, 15, 10, 11, 9] > > 2 gather-load: > idx_v1 = [2 5 7 5 1 4 2 3] > idx_v2 = [9 11 10 15 1 7 6 0] > gather_v1 = [0000 0000 0000 0000 0000 0000 cfhf becd] > gather_v2 = [0000 0000 0000 0000 0000 0000 jlkp bhga] > merge: v = [0000 0000 0000 0000 jlkp bhga cfhf becd] > ``` > > 3. `MaxVectorSize = 64, byte_vector_size = MaxVectorSize / 4`: > - Can load 16 indices per vector register > - So can ... I'd like to close this PR and split the change into two new PRs. Thanks for the review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24679#issuecomment-2829245185 From fjiang at openjdk.org Fri Apr 25 02:36:59 2025 From: fjiang at openjdk.org (Feilong Jiang) Date: Fri, 25 Apr 2025 02:36:59 GMT Subject: RFR: 8355074: RISC-V: C2: Support Vector-Scalar version of Zvbb Vector And-Not instruction [v4] In-Reply-To: References: Message-ID: On Fri, 25 Apr 2025 02:34:19 GMT, Anjian-Wen wrote: >> support zvbb vand-not vector-scalar version, which Op1 is the sign-extended or truncated value in scalar register rs1 >> add C2 match rule >> add related Tests in IRNode structure >> >> passed jtreg test test/hotspot/jtreg/compiler/vectorapi/* > > Anjian-Wen has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - Merge branch 'openjdk:master' into Vector_And-Not_vx > - split the patch into cleanformat and enable zvbb > - add prefix for test String > - modify format > > change some '_imm' to '_vi' > - modify v0.t to $v0 > - add format fix > - RISC-V: Support Zvbb Vector And-Not vx and add its tests > > Support Zvbb Vector And-Not vx and add its tests > modify name of the match rule and test name Marked as reviewed by fjiang (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24709#pullrequestreview-2792876093 From xgong at openjdk.org Fri Apr 25 02:37:00 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 25 Apr 2025 02:37:00 GMT Subject: Withdrawn: 8351623: VectorAPI: Refactor subword gather load and add SVE implementation In-Reply-To: References: Message-ID: On Wed, 16 Apr 2025 08:58:34 GMT, Xiaohong Gong wrote: > ### Summary: > [JDK-8318650](http://java-service.client.nvidia.com/?q=8318650) added the hotspot intrinsifying of subword gather load APIs for X86 platforms [1]. This patch aims at implementing the equivalent functionality for AArch64 SVE platform. In addition to the AArch64 backend support, this patch also refactors the API implementation in Java side and the compiler mid-end part to make the operations more efficient and maintainable across different architectures. > > ### Background: > Vector gather load APIs load values from memory addresses calculated by adding a base pointer to integer indices stored in an int array. SVE provides native vector gather load instructions for byte/short types using an int vector saving indices (see [2][3]). > > The number of loaded elements must match the index vector's element count. 
Since int elements are 4/2 times larger than byte/short elements, and given `MaxVectorSize` constraints, the operation may need to be splitted into multiple parts. > > Using a 128-bit byte vector gather load as an example, there are four scenarios with different `MaxVectorSize`: > > 1. `MaxVectorSize = 16, byte_vector_size = 16`: > - Can load 4 indices per vector register > - So can finish 4 bytes per gather-load operation > - Requires 4 times of gather-loads and final merge > Example: > ``` > byte[] arr = [a, b, c, d, e, f, g, h, i, g, k, l, m, n, o, p, ...] > int[] idx = [3, 2, 4, 1, 5, 7, 5, 2, 0, 6, 7, 1, 15, 10, 11, 9] > > 4 gather-load: > idx_v1 = [1 4 2 3] gather_v1 = [0000 0000 0000 becd] > idx_v2 = [2 5 7 5] gather_v2 = [0000 0000 0000 cfhf] > idx_v3 = [1 7 6 0] gather_v3 = [0000 0000 0000 bhga] > idx_v4 = [9 11 10 15] gather_v4 = [0000 0000 0000 jlkp] > merge: v = [jlkp bhga cfhf becd] > ``` > > 2. `MaxVectorSize = 32, byte_vector_size = MaxVectorSize / 2`: > - Can load 8 indices per vector register > - So can finish 8 bytes per gather-load operation > - Requires 2 times of gather-loads and merge > Example: > ``` > byte[] arr = [a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, ...] > int[] index = [3, 2, 4, 1, 5, 7, 5, 2, 0, 6, 7, 1, 15, 10, 11, 9] > > 2 gather-load: > idx_v1 = [2 5 7 5 1 4 2 3] > idx_v2 = [9 11 10 15 1 7 6 0] > gather_v1 = [0000 0000 0000 0000 0000 0000 cfhf becd] > gather_v2 = [0000 0000 0000 0000 0000 0000 jlkp bhga] > merge: v = [0000 0000 0000 0000 jlkp bhga cfhf becd] > ``` > > 3. `MaxVectorSize = 64, byte_vector_size = MaxVectorSize / 4`: > - Can load 16 indices per vector register > - So can ... This pull request has been closed without being integrated. 
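For readers following the quoted description, the split-and-merge scheme can be modelled in scalar Java roughly as below. This is only an illustrative sketch, not the C2 or SVE implementation: `SubwordGatherModel`, `gather` and `partsNeeded` are hypothetical names, and `lanesPerPart` stands for the number of int indices that fit in one vector register.

```java
import java.nio.charset.StandardCharsets;

// Scalar model of a subword (byte) gather load that is split into several
// partial gathers because int indices are 4x wider than byte elements.
final class SubwordGatherModel {

    // Number of partial gather-loads needed for `byteLanes` result bytes when
    // one vector register holds `maxVectorSizeBytes / 4` int indices.
    static int partsNeeded(int byteLanes, int maxVectorSizeBytes) {
        int intLanes = maxVectorSizeBytes / Integer.BYTES;
        return (byteLanes + intLanes - 1) / intLanes;
    }

    // Gathers arr[idx[i]] for every i, `lanesPerPart` indices at a time,
    // then merges each partial result into its slot of the final vector.
    static byte[] gather(byte[] arr, int[] idx, int lanesPerPart) {
        byte[] result = new byte[idx.length];
        for (int start = 0; start < idx.length; start += lanesPerPart) {
            int end = Math.min(start + lanesPerPart, idx.length);
            byte[] part = new byte[end - start];
            for (int lane = 0; lane < part.length; lane++) {
                part[lane] = arr[idx[start + lane]]; // one lane of one partial gather
            }
            System.arraycopy(part, 0, result, start, part.length); // the "merge" step
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] arr = "abcdefghijklmnop".getBytes(StandardCharsets.US_ASCII);
        int[] idx = {3, 2, 4, 1, 5, 7, 5, 2, 0, 6, 7, 1, 15, 10, 11, 9};
        // MaxVectorSize = 16: 4 indices per register, 4 partial gathers.
        byte[] v = gather(arr, idx, 4);
        System.out.println(new String(v, StandardCharsets.US_ASCII));
    }
}
```

The quoted diagrams list lanes from highest to lowest, so `[jlkp bhga cfhf becd]` corresponds to the array order printed here.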
------------- PR: https://git.openjdk.org/jdk/pull/24679 From fjiang at openjdk.org Fri Apr 25 02:38:45 2025 From: fjiang at openjdk.org (Feilong Jiang) Date: Fri, 25 Apr 2025 02:38:45 GMT Subject: RFR: 8355562: RISC-V: Cleanup names of vector-scalar instructions in riscv_v.ad In-Reply-To: <9pPaDS5emcWnw5TSf6yPMx9oYEODHaJ8oNUB0v9SMuc=.bb4262ca-178c-4d30-be85-a7d6862252b3@github.com> References: <9pPaDS5emcWnw5TSf6yPMx9oYEODHaJ8oNUB0v9SMuc=.bb4262ca-178c-4d30-be85-a7d6862252b3@github.com> Message-ID: On Fri, 25 Apr 2025 02:15:57 GMT, Anjian-Wen wrote: > RISC-V: Cleanup names of vector-scalar instructions in riscv_v.ad Looks fine, thanks! ------------- Marked as reviewed by fjiang (Committer). PR Review: https://git.openjdk.org/jdk/pull/24865#pullrequestreview-2792877843 From vlivanov at openjdk.org Fri Apr 25 05:21:54 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 25 Apr 2025 05:21:54 GMT Subject: RFR: 8346836: C2: Verify CastII/CastLL bounds at runtime [v11] In-Reply-To: References: Message-ID: On Thu, 24 Apr 2025 16:12:42 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds a develop flag `VerifyConstraintCasts`, which will verify the correctness of `CastIINode`s and `CastLLNode`s at runtime and crash the VM if the dynamic value lies outside the type value range. >> >> Please take a look, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > Emanuel's suggestion JVM knows how to unwind the stack when crash happens in compiled code (compiled frame on top). When native frame is on top, it relies on platform ABI, so fails to unwind the stack at the border of native and compiled frames because compiled code doesn't follow platform ABI conventions. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22880#issuecomment-2829399402 From vlivanov at openjdk.org Fri Apr 25 05:26:35 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 25 Apr 2025 05:26:35 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v15] In-Reply-To: References: Message-ID: > Migrate Vector API math library (SVML and SLEEF) linkage from native code (in JVM) to Java FFM API. > > Since FFM API doesn't support vector calling conventions yet, migration affects only symbol lookup for now. But it still enables significant simplifications on JVM side. > > The patch consists of the following parts: > * on-demand symbol lookup in Java code replaces eager lookup from native code during JVM startup; > * 2 new VM intrinsics for vector calls (support unary and binary shapes) (code separated from unary/binary vector operations); > * new internal interface to query supported CPU ISA extensions (`jdk.incubator.vector.CPUFeatures`) used for CPU dispatching. > > `java.lang.foreign` API is used to perform symbol lookup in vector math library, then the address is cached and fed into corresponding JVM intrinsic, so C2 can turn it into a direct vector call in generated code. > > Once `java.lang.foreign` supports vectors & vector calling conventions, VM intrinsics can go away. > > Performance is on par with original implementation (tested with microbenchmarks on linux-x64 and macosx-aarch64). > > Testing: hs-tier1 - hs-tier6, microbenchmarks (on linux-x64 and macosx-aarch64) > > Thanks! 
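The "look up on demand, then cache the address" idea from the description can be sketched as below. This is a hedged illustration only: `LazySymbolCache` is a hypothetical class, not code from the PR, and the resolver is a plain function standing in for the `java.lang.foreign` symbol lookup that the PR describes; the symbol name in `main` is made up.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Resolves a symbol the first time it is requested and caches its address,
// instead of resolving every symbol eagerly at startup.
final class LazySymbolCache {
    private final Function<String, Optional<Long>> resolver; // assumed stand-in for a native lookup
    private final Map<String, Long> addresses = new ConcurrentHashMap<>();

    LazySymbolCache(Function<String, Optional<Long>> resolver) {
        this.resolver = resolver;
    }

    // Returns the cached address, resolving it on first use.
    // A failed lookup throws and leaves no entry behind.
    long addressOf(String name) {
        return addresses.computeIfAbsent(name, n ->
                resolver.apply(n).orElseThrow(
                        () -> new IllegalArgumentException("symbol not found: " + n)));
    }

    public static void main(String[] args) {
        LazySymbolCache cache = new LazySymbolCache(
                n -> n.equals("demo_sin") ? Optional.of(0x1000L) : Optional.empty());
        System.out.println(Long.toHexString(cache.addressOf("demo_sin")));
    }
}
```

In the design described above, the cached address is what gets passed to the VM intrinsic, so a repeated call does not pay for a second lookup.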
Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: Remove UseVectorStubs usage in riscv.ad ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24462/files - new: https://git.openjdk.org/jdk/pull/24462/files/541c4d7f..f4373e41 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24462&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24462&range=13-14 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24462.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24462/head:pull/24462 PR: https://git.openjdk.org/jdk/pull/24462 From vlivanov at openjdk.org Fri Apr 25 05:26:35 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 25 Apr 2025 05:26:35 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v14] In-Reply-To: References: Message-ID: On Fri, 25 Apr 2025 00:06:28 GMT, Fei Yang wrote: >> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: >> >> Improve comments > > src/hotspot/cpu/riscv/riscv.ad line 1947: > >> 1945: // Vector calling convention not yet implemented. >> 1946: bool Matcher::supports_vector_calling_convention(void) { >> 1947: return EnableVectorSupport; > > You might also want to remove the use of `UseVectorStubs` in `Matcher::vector_return_value` at L1951. > > assert(EnableVectorSupport && UseVectorStubs, "sanity"); Good catch. Fixed. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2059577657 From epeter at openjdk.org Fri Apr 25 06:13:57 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 25 Apr 2025 06:13:57 GMT Subject: RFR: 8341293: Split field loads through Nested Phis [v9] In-Reply-To: References: Message-ID: <5mlJKeQT322VSXQ2IZIqmc0Uk79E9P3yDsW9QAZ0ca8=.cec64ec6-03d1-442e-9874-e73e6ac775dc@github.com> On Tue, 8 Apr 2025 20:21:34 GMT, Dhamoder Nalla wrote: >> This enhances the changes introduced in [JDK PR 12897](https://github.com/openjdk/jdk/pull/12897) by handling nested Phi nodes (phi -> phi -> AddP -> Load*) during scalar replacement. The primary goal is to split field loads (AddP -> Load*) involving nested Phi parent nodes, thereby increasing opportunities for scalar replacement and reducing memory allocations. >> >> >> **Here is an illustration of the sequence of Ideal Graph Transformations applied to split through nested `Phi` nodes.** >> >> **1. Initial State (Before Transformation)** >> The graph contains a nested Phi structure where two Allocate nodes merge via a Phi node. >> >> ![image](https://github.com/user-attachments/assets/c18e5ca0-c554-475c-814a-7cb288d96569) >> >> **2. After Splitting Through Child Phi** >> The transformation separates field loads by introducing additional AddP and Load nodes for each Allocate input. >> >> ![image](https://github.com/user-attachments/assets/b279b5f2-9ec6-4d9b-a627-506451f1cf81) >> >> **3. After Splitting Load Field Through Parent Phi** >> The field load operation (Load) is pushed even further up in the graph. >> >> Instead of merging AddP pointers in a Phi node and then performing a Load, the transformation ensures that each path has its AddP -> Load sequence before merging. >> >> This further eliminates the need to perform field loads on a Phi node, making the graph more conducive to scalar replacement. 
>> >> ![image](https://github.com/user-attachments/assets/f506b918-2dd0-4dbe-a440-ff253afa3961) >> >> ### JMH Benchmark Results: >> >> #### With Disabled RAM >> >> | Benchmark | Mode | Count | Score | Error | Units | >> |-----------|------|-------|-------|-------|-------| >> | testBailOut_runner | avgt | 15 | 13.969 | ± 0.248 | ms/op | >> | testFieldEscapeWithMerge_runner | avgt | 15 | 80.300 | ± 4.306 | ms/op | >> | testMerge_TryCatchFinally_runner | avgt | 15 | 72.182 | ± 1.781 | ms/op | >> | testMultiParentPhi_runner | avgt | 15 | 2.983 | ± 0.001 | ms/op | >> | testNestedPhiPolymorphic_runner | avgt | 15 | 18.342 | ± 0.731 | ms/op | >> | testNestedPhiProcessOrder_runner | avgt | 15 | 14.315 | ± 0.443 | ms/op | >> | testNestedPhiWithLambda_runner | avgt | 15 | 18.511 | ± 1.212 | ms/op | >> | testNestedPhiWithTrap_runner | avgt | 15 | 66.277 | ± 1.478 | ms/op | >> | testNestedPhi_FieldLoad_runner | avgt | 15 | 17.968 | ± 0.306 | ms/op | >> | testNestedPhi_TryCatch_runner | avgt | 15 | 14.186 | ± 0.247 | ms/op | >> | testRematerialize_MultiObj_runner | avgt | 15 | 88.435 | ± 4.869 ... > > Dhamoder Nalla has updated the pull request incrementally with one additional commit since the last revision: > > address CR comments Looks like you deleted some of the tests you had there. Can you explain why? This was not by any chance the one that failed? https://github.com/openjdk/jdk/pull/21270/commits/3c56f98d88433a4fada2c7e43147fc2e91df5e89 ------------- PR Comment: https://git.openjdk.org/jdk/pull/21270#issuecomment-2829463601 From jbhateja at openjdk.org Fri Apr 25 06:28:34 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 25 Apr 2025 06:28:34 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v6] In-Reply-To: References: Message-ID: > - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel® products and will be the vector ISA of choice moving into the future. 
> - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. Feature set supported by an AVX10 ISA version will be supported by all the versions above it. > - The initial, fully-featured version of Intel® AVX10 will be enumerated as Version 2 (denoted as Intel® AVX10.2). This will include the new ISA extension over the existing AVX512 instructions. > - An early version of Intel® AVX10 (Version 1, or Intel® AVX10.1) that only enumerates the Intel® AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling. > > This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers. > > The patch has been regression-tested with tier1 and jvmci tests. > > Please review and share your feedback. 
> > Best Regards, > Jatin > > [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Fix windows build ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24329/files - new: https://git.openjdk.org/jdk/pull/24329/files/862217b7..f413e0e1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24329&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24329&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24329.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24329/head:pull/24329 PR: https://git.openjdk.org/jdk/pull/24329 From enikitin at openjdk.org Fri Apr 25 07:22:56 2025 From: enikitin at openjdk.org (Evgeny Nikitin) Date: Fri, 25 Apr 2025 07:22:56 GMT Subject: Integrated: 8355387: [jittester] Disable downcasts by default In-Reply-To: References: Message-ID: On Thu, 24 Apr 2025 06:23:11 GMT, Evgeny Nikitin wrote: > Currently, JITTester's love to downcast often produces something like this: > > ArrayList someVar = (TreeSet)(Object)(List)(new ArrayList()); > > ... which is possible because it goes up to Object and then starts downcasting to some totally unrelated class / type. > > Considering the JITTester's love to casts (they are more-or-less 'safe' expressions), it means a high probability (30-50%) of a gentest to fail compilation. Even worse is the situation for ByteCode tests - that they're faulty is only recognized during the run phase. > > I suggest to disable the downcasts for now. > Testing: 50-100 generated tests in different combinations (default, with the flag set to 'false' or 'true') with artificially increased chance to casts. This pull request has now been integrated. 
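To illustrate the failure mode described above: the quoted statement is rejected by javac as written, because `TreeSet` is not assignable to `ArrayList`. Even when the declared type is compatible, a cast that goes up to `Object` and then down to an unrelated class compiles but fails at run time. A minimal sketch, with `DowncastDemo` as a hypothetical name that is not part of JITTester:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Shows that an up-cast to Object followed by a down-cast to an unrelated
// class passes the compiler but throws ClassCastException when executed.
final class DowncastDemo {

    static boolean downcastFailsAtRuntime() {
        Object o = (Object) (List<Integer>) new ArrayList<Integer>();
        try {
            TreeSet<?> s = (TreeSet<?>) o; // compiles: Object may be cast to any class
            return s == null;              // not reached for an ArrayList
        } catch (ClassCastException e) {
            return true;                   // ArrayList is not a TreeSet
        }
    }

    public static void main(String[] args) {
        System.out.println(downcastFailsAtRuntime());
    }
}
```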
Changeset: b41e0b17 Author: Evgeny Nikitin Committer: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/b41e0b17490b203b19787a0d0742318fc0d03b33 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod 8355387: [jittester] Disable downcasts by default Reviewed-by: thartmann, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/24840 From duke at openjdk.org Fri Apr 25 07:24:15 2025 From: duke at openjdk.org (erifan) Date: Fri, 25 Apr 2025 07:24:15 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v3] In-Reply-To: References: Message-ID: > This patch optimizes the following patterns: > For integer types: > > (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) > => (VectorMaskCmp src1 src2 ncond) > (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) > => (VectorMaskCmp src1 src2 ncond) > > cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the negative comparison of cond. > > For float and double types: > > (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) > => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) > (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) > => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) > > cond can be eq or ne. 
> > Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option `-XX:UseSVE=2`: > > Benchmark Unit Before Score Error After Score Error Uplift > testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 10266136.26 8955.008548 1.29 > testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 1179760.772 448.031844 1.33 > testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 2359520.803 896.305743 1.33 > testCompareEQMaskNotInt ops/s 1787221.411 977.743935 2353952.519 960.069976 1.31 > testCompareEQMaskNotLong ops/s 895297.1974 673.44808 1178449.02 323.804205 1.31 > testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 4712761.965 2110.862053 1.41 > testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 10251646.9 9486.699831 1.29 > testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 2352855.205 1251.952546 1.39 > testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 1177811.493 521.1229 1.37 > testCompareGEMaskNotShort ops/s 3341860.309 1578.975338 4714008.434 1681.10365 1.41 > testCompareGTMaskNotByte ops/s 7910823.674 2993.367032 10245063.58 9774.75138 1.29 > testCompareGTMaskNotInt ops/s 1673393.928 3153.099431 2353654.521 1190.848583 1.4 > testCompareGTMaskNotLong ops/s 849405.9159 2432.858159 1177952.041 359.96413 1.38 > testCompareGTMaskNotShort ops/s 3339509.141 3339.976585 4711442.496 2673.364893 1.41 > testCompareLEMaskNotByte ops/s 7911340.004 3114.69191 10231626.5 27134.20035 1.29 > testCompareLEMaskNotInt ops/s 1675812.113 1340.969885 2353255.341 1452.4522 1.4 > testCompareLEMaskNotLong ops/s 848862.8036 6564.841731 1177763.623 539.290106 1.38 > testCompareLEMaskNotShort ops/s 3324951.54 2380.29473 4712116.251 1544.559684 1.41 > testCompareLTMaskNotByte ops/s 7910390.844 2630.861436 10239567.69 6487.441672 1.29 > testCompareLTMaskNotInt ops/s 1672180.09 995.238142 2353757.863 853.774734 1.4 > testCompareLTMaskNotLong ops/s 856502.26... erifan has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Addressed some review comments 1. Call VectorNode::Ideal() only once in XorVNode::Ideal. 2. Improve code comments. - Merge branch 'master' into JDK-8354242 - Merge branch 'master' into JDK-8354242 - 8354242: VectorAPI: combine vector not operation with compare This patch optimizes the following patterns: For integer types: ``` (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) => (VectorMaskCmp src1 src2 ncond) (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) => (VectorMaskCmp src1 src2 ncond) ``` cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the negative comparison of cond. For float and double types: ``` (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) ``` cond can be eq or ne. 
Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option `-XX:UseSVE=2`: ``` Benchmark Unit Before Score Error After Score Error Uplift testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 10266136.26 8955.008548 1.29 testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 1179760.772 448.031844 1.33 testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 2359520.803 896.305743 1.33 testCompareEQMaskNotInt ops/s 1787221.411 977.743935 2353952.519 960.069976 1.31 testCompareEQMaskNotLong ops/s 895297.1974 673.44808 1178449.02 323.804205 1.31 testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 4712761.965 2110.862053 1.41 testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 10251646.9 9486.699831 1.29 testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 2352855.205 1251.952546 1.39 testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 1177811.493 521.1229 1.37 testCompareGEMaskNotShort ops/s 3341860.309 1578.975338 4714008.434 1681.10365 1.41 testCompareGTMaskNotByte ops/s 7910823.674 2993.367032 10245063.58 9774.75138 1.29 testCompareGTMaskNotInt ops/s 1673393.928 3153.099431 2353654.521 1190.848583 1.4 testCompareGTMaskNotLong ops/s 849405.9159 2432.858159 1177952.041 359.96413 1.38 testCompareGTMaskNotShort ops/s 3339509.141 3339.976585 4711442.496 2673.364893 1.41 testCompareLEMaskNotByte ops/s 7911340.004 3114.69191 10231626.5 27134.20035 1.29 testCompareLEMaskNotInt ops/s 1675812.113 1340.969885 2353255.341 1452.4522 1.4 testCompareLEMaskNotLong ops/s 848862.8036 6564.841731 1177763.623 539.290106 1.38 testCompareLEMaskNotShort ops/s 3324951.54 2380.29473 4712116.251 1544.559684 1.41 testCompareLTMaskNotByte ops/s 7910390.844 2630.861436 10239567.69 6487.441672 1.29 testCompareLTMaskNotInt ops/s 1672180.09 995.238142 2353757.863 853.774734 1.4 testCompareLTMaskNotLong ops/s 856502.2695 12276.82851 1177671.815 496.723302 1.37 testCompareLTMaskNotShort ops/s 3325798.025 2412.702501 4711554.181 1779.302112 1.41 testCompareNEMaskNotByte ops/s 
7910002.518 2771.82477 10245315.33 16321.93935 1.29 testCompareNEMaskNotDouble ops/s 863754.6022 523.140788 1179133.982 476.572178 1.36 testCompareNEMaskNotFloat ops/s 1723321.883 2598.484803 2358492.186 877.1401 1.36 testCompareNEMaskNotInt ops/s 1670288.841 751.774826 2354158.125 835.720163 1.4 testCompareNEMaskNotLong ops/s 836327.6835 410.525466 1178178.825 308.757932 1.4 testCompareNEMaskNotShort ops/s 3327815.841 1511.978763 4711379.136 2336.505531 1.41 testCompareUGEMaskNotByte ops/s 7906699.024 3200.936474 10253843.74 15067.59401 1.29 testCompareUGEMaskNotInt ops/s 1674003.923 3287.191727 2353340.666 951.381021 1.4 testCompareUGEMaskNotLong ops/s 852424.5562 8920.408939 1177943.609 389.6621 1.38 testCompareUGEMaskNotShort ops/s 3327255.858 1584.885143 4711622.355 1247.215277 1.41 testCompareUGTMaskNotByte ops/s 7909249.189 4435.283667 10245541.34 10993.34739 1.29 testCompareUGTMaskNotInt ops/s 1693713.433 20650.00213 2353153.787 1055.343846 1.38 testCompareUGTMaskNotLong ops/s 851022.3395 7079.065268 1177910.677 538.604598 1.38 testCompareUGTMaskNotShort ops/s 3327236.988 1616.886789 4711209.865 3098.494145 1.41 testCompareULEMaskNotByte ops/s 7909350.825 3251.262342 10261449.03 7273.831341 1.29 testCompareULEMaskNotInt ops/s 1672350.925 1545.304304 2353231.755 914.231193 1.4 testCompareULEMaskNotLong ops/s 853349.4765 9804.906913 1177967.254 435.044367 1.38 testCompareULEMaskNotShort ops/s 3325757.891 1555.062257 4712873.187 1650.986905 1.41 testCompareULTMaskNotByte ops/s 7912218.621 2633.477744 10242095.98 21921.39902 1.29 testCompareULTMaskNotInt ops/s 1673994.849 2672.507666 2353449.22 946.105757 1.4 testCompareULTMaskNotLong ops/s 849032.5868 10406.06689 1177586.047 506.541456 1.38 testCompareULTMaskNotShort ops/s 3328062.026 1892.991844 4713247.216 1855.983724 1.41 ``` With option `-XX:UseSVE=0`: ``` Benchmark Unit Before Score Error After Score Error Uplift testCompareEQMaskNotByte ops/s 7895961.919 72712.90804 7746493.731 71481.92938 0.98 
testCompareEQMaskNotDouble ops/s 789811.0455 384.493088 766473.7994 2216.581793 0.97 testCompareEQMaskNotFloat ops/s 1806305.818 638.010451 1819616.613 3295.38958 1 testCompareEQMaskNotInt ops/s 1815820.144 1225.336135 1849538.401 766.29902 1.01 testCompareEQMaskNotLong ops/s 807336.492 335.451807 792732.9483 277.954432 0.98 testCompareEQMaskNotShort ops/s 4818266.38 1927.862665 4668903.001 1922.782715 0.96 testCompareGEMaskNotByte ops/s 7818439.678 75374.97739 16498003.98 41440.49653 2.11 testCompareGEMaskNotInt ops/s 1815159.05 1090.912209 2372095.779 1664.397112 1.3 testCompareGEMaskNotLong ops/s 804324.5575 2301.686878 927919.8507 371.766719 1.15 testCompareGEMaskNotShort ops/s 4818966.563 2443.643652 5385561.038 29558.37423 1.11 testCompareGTMaskNotByte ops/s 7893406.157 82687.74264 16470663.2 22165.55812 2.08 testCompareGTMaskNotInt ops/s 1815316.812 915.894106 2370447.198 655.016338 1.3 testCompareGTMaskNotLong ops/s 807019.456 526.525482 928079.0541 330.582693 1.15 testCompareGTMaskNotShort ops/s 4820552.881 1684.247747 5355902.93 5893.2915 1.11 testCompareLEMaskNotByte ops/s 7816263.323 79560.0015 16473621.19 56688.99585 2.1 testCompareLEMaskNotInt ops/s 1814915.724 926.998625 2368790.306 932.594778 1.3 testCompareLEMaskNotLong ops/s 806483.9 935.718082 928110.9074 407.096695 1.15 testCompareLEMaskNotShort ops/s 4813660.241 6817.870509 5357107.852 10061.47975 1.11 testCompareLTMaskNotByte ops/s 7838948.962 69136.4504 16424405.96 24464.75469 2.09 testCompareLTMaskNotInt ops/s 1815056.833 1187.6453 2369892.187 1103.819634 1.3 testCompareLTMaskNotLong ops/s 806602.1804 287.923365 928346.4118 617.682824 1.15 testCompareLTMaskNotShort ops/s 4817940.643 2767.1509 5372537.84 15397.47169 1.11 testCompareNEMaskNotByte ops/s 9078493.798 4630.339307 16484348.42 18925.88346 1.81 testCompareNEMaskNotDouble ops/s 661769.6272 398.712981 926763.5839 1808.843788 1.4 testCompareNEMaskNotFloat ops/s 1570527.252 563.642144 2312425.678 1815.844846 1.47 testCompareNEMaskNotInt 
ops/s 1619146.58 626.793854 2369711.543 942.330478 1.46 testCompareNEMaskNotLong ops/s 680201.5381 2252.836482 927808.6147 414.917863 1.36 testCompareNEMaskNotShort ops/s 3763508.054 3622.560798 5367808.015 8591.466599 1.42 testCompareUGEMaskNotByte ops/s 7886373.129 75917.74675 16480928.93 27524.31005 2.08 testCompareUGEMaskNotInt ops/s 1815636.832 750.036241 2369683.015 901.609404 1.3 testCompareUGEMaskNotLong ops/s 806862.5826 287.819616 928001.4394 361.063837 1.15 testCompareUGEMaskNotShort ops/s 4820581.361 2098.537435 5375854.248 25619.40165 1.11 testCompareUGTMaskNotByte ops/s 7891591.465 96614.93542 16410405.93 15012.37096 2.07 testCompareUGTMaskNotInt ops/s 1814871.179 662.825588 2371325.903 1170.491164 1.3 testCompareUGTMaskNotLong ops/s 804013.7658 2240.534209 928062.2169 531.306897 1.15 testCompareUGTMaskNotShort ops/s 4818150.337 3051.717685 5381449.337 21212.34187 1.11 testCompareULEMaskNotByte ops/s 7831540.628 81306.67253 16495250.78 38682.19675 2.1 testCompareULEMaskNotInt ops/s 1814484.14 687.860656 2369265.075 940.609586 1.3 testCompareULEMaskNotLong ops/s 807780.5749 769.876816 927538.0732 1278.267724 1.14 testCompareULEMaskNotShort ops/s 4817437.42 5141.336541 5356183.359 7015.608124 1.11 testCompareULTMaskNotByte ops/s 7849078.225 56753.59764 16395975.27 34043.67295 2.08 testCompareULTMaskNotInt ops/s 1814328.226 2697.219111 2370700.47 1991.841988 1.3 testCompareULTMaskNotLong ops/s 807166.8197 253.061506 927926.2803 252.933462 1.14 testCompareULTMaskNotShort ops/s 4821098.216 1625.959044 5348980.243 4100.768121 1.1 ``` Benchmarks on AMD EPYC 9124 16-Core Processor: With option `-XX:UseAVX=3`: ``` Benchmark Unit Before Score Error After Score Error Uplift testCompareEQMaskNotByte ops/s 16607323.35 1233692.631 18381557.66 1163201.522 1.1 testCompareEQMaskNotDouble ops/s 2114285.245 58782.2534 2959946.353 43016.0445 1.39 testCompareEQMaskNotFloat ops/s 4480874.437 89975.29074 6960151.436 64799.143 1.55 testCompareEQMaskNotInt ops/s 4370906.91 
51784.80889 6856955.043 313858.5504 1.56 testCompareEQMaskNotLong ops/s 2080065.895 26762.06732 2939142.143 67179.05314 1.41 testCompareEQMaskNotShort ops/s 7968282.563 210437.2781 12701214.56 473152.6407 1.59 testCompareGEMaskNotByte ops/s 18419141.89 473408.9451 19880059.68 321638.0397 1.07 testCompareGEMaskNotInt ops/s 4419015.62 77352.98633 7037639.227 151066.0383 1.59 testCompareGEMaskNotLong ops/s 2147982.48 49227.42782 3000275.928 39298.75344 1.39 testCompareGEMaskNotShort ops/s 8469039.613 17833.19707 12288229.49 244317.8812 1.45 testCompareGTMaskNotByte ops/s 18728997.5 468328.8358 20544730.05 392264.6466 1.09 testCompareGTMaskNotInt ops/s 4510009.705 78812.57357 7364629.942 70970.78473 1.63 testCompareGTMaskNotLong ops/s 2124104.969 40917.89257 2953536.279 35199.19687 1.39 testCompareGTMaskNotShort ops/s 8690557.621 311534.1159 12344017.51 457931.8741 1.42 testCompareLEMaskNotByte ops/s 17758400.53 478383.4945 19209183.26 1143297.241 1.08 testCompareLEMaskNotInt ops/s 4363664.862 43443.18063 7054093.064 78141.11476 1.61 testCompareLEMaskNotLong ops/s 2068632.213 29844.78023 2954766.412 50667.22502 1.42 testCompareLEMaskNotShort ops/s 8637608.548 183538.5511 12719010.27 473568.8825 1.47 testCompareLTMaskNotByte ops/s 14406138.95 423105.0163 17292417.96 371386.9689 1.2 testCompareLTMaskNotInt ops/s 4546707.266 131977.3144 7040483.394 213590.4657 1.54 testCompareLTMaskNotLong ops/s 2123277.356 47243.21499 2848720.442 58896.97045 1.34 testCompareLTMaskNotShort ops/s 7570169.363 649873.6295 11945383.75 988276.5955 1.57 testCompareNEMaskNotByte ops/s 18274529.55 683396.7384 19081938.8 1118739.778 1.04 testCompareNEMaskNotDouble ops/s 2112533.61 43295.50012 2912115.441 78189.51083 1.37 testCompareNEMaskNotFloat ops/s 4628683.814 93817.07362 6967208.729 145135.8544 1.5 testCompareNEMaskNotInt ops/s 4470900.214 75974.50842 7286913.662 116328.5277 1.62 testCompareNEMaskNotLong ops/s 2134091.061 46377.94061 2934667.477 81675.46021 1.37 testCompareNEMaskNotShort 
ops/s 8790384.287 396161.8599 13076858.35 286272.1155 1.48 testCompareUGEMaskNotByte ops/s 18009150.9 660803.8886 17551258.33 1667014.843 0.97 testCompareUGEMaskNotInt ops/s 4442928.74 83190.81019 6854088.277 329008.8901 1.54 testCompareUGEMaskNotLong ops/s 2088357.736 71696.24791 2973202.26 63278.78974 1.42 testCompareUGEMaskNotShort ops/s 8348624.02 116562.7876 12832250.78 546869.3006 1.53 testCompareUGTMaskNotByte ops/s 17871101.25 800199.6321 19902619.81 214003.3262 1.11 testCompareUGTMaskNotInt ops/s 4088304.421 137797.9723 7135454.33 124553.651 1.74 testCompareUGTMaskNotLong ops/s 2070610.42 19881.82182 2991536.365 36260.60767 1.44 testCompareUGTMaskNotShort ops/s 8637099.341 155822.1608 12756579.77 186068.199 1.47 testCompareULEMaskNotByte ops/s 17940901.36 1258029.364 18932484.94 694554.6305 1.05 testCompareULEMaskNotInt ops/s 4369177.511 74982.31936 6392773.082 550171.2266 1.46 testCompareULEMaskNotLong ops/s 2135905.761 43693.63178 2877579.631 41651.56289 1.34 testCompareULEMaskNotShort ops/s 8607710.544 132655.1676 12446370.04 441718.3035 1.44 testCompareULTMaskNotByte ops/s 17409912.23 1033204.537 20607479.99 362000.5056 1.18 testCompareULTMaskNotInt ops/s 4386455.9 119192.1635 6920123.264 186158.2845 1.57 testCompareULTMaskNotLong ops/s 2064995.149 38622.2734 2988343.589 39037.90006 1.44 testCompareULTMaskNotShort ops/s 8642182.752 230919.2442 13029582.09 437101.4923 1.5 ``` The small amount of performance degradation is due to test fluctuations. 
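The rewrite described in this patch relies on `not(a cond b)` being equal to `a ncond b` for every lane. The following is a minimal scalar sketch of that identity, not code from the patch: the `Cond` enum and the helper names are hypothetical and exist only for illustration. It also shows one plausible reason (an assumption, since the patch description does not state it) why the float and double rule is limited to eq and ne: with NaN operands, negating an ordered comparison such as lt does not give ge.

```java
public class NegatedCompareDemo {
    // Hypothetical model of the comparison conditions named in the patch description.
    public enum Cond { EQ, NE, LT, GE, GT, LE, ULT, UGE, UGT, ULE }

    // Maps each condition to its negated counterpart ("ncond").
    public static Cond negate(Cond c) {
        switch (c) {
            case EQ:  return Cond.NE;
            case NE:  return Cond.EQ;
            case LT:  return Cond.GE;
            case GE:  return Cond.LT;
            case GT:  return Cond.LE;
            case LE:  return Cond.GT;
            case ULT: return Cond.UGE;
            case UGE: return Cond.ULT;
            case UGT: return Cond.ULE;
            default:  return Cond.UGT; // ULE
        }
    }

    // Evaluates one integer comparison, signed or unsigned.
    public static boolean testInt(Cond c, int a, int b) {
        int u = Integer.compareUnsigned(a, b);
        switch (c) {
            case EQ:  return a == b;
            case NE:  return a != b;
            case LT:  return a < b;
            case GE:  return a >= b;
            case GT:  return a > b;
            case LE:  return a <= b;
            case ULT: return u < 0;
            case UGE: return u >= 0;
            case UGT: return u > 0;
            default:  return u <= 0; // ULE
        }
    }

    // Evaluates one floating-point comparison (no unsigned variants).
    public static boolean testDouble(Cond c, double a, double b) {
        switch (c) {
            case EQ: return a == b;
            case NE: return a != b;
            case LT: return a < b;
            case GE: return a >= b;
            case GT: return a > b;
            case LE: return a <= b;
            default: throw new IllegalArgumentException("unsigned compare on double");
        }
    }

    // True if !(a cond b) == (a ncond b) for every pair drawn from vals.
    public static boolean holdsForInts(Cond c, int[] vals) {
        for (int a : vals) {
            for (int b : vals) {
                if (!testInt(c, a, b) != testInt(negate(c), a, b)) return false;
            }
        }
        return true;
    }

    public static boolean holdsForDoubles(Cond c, double[] vals) {
        for (double a : vals) {
            for (double b : vals) {
                if (!testDouble(c, a, b) != testDouble(negate(c), a, b)) return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] ints = {Integer.MIN_VALUE, -1, 0, 1, Integer.MAX_VALUE};
        double[] doubles = {-1.0, 0.0, 1.0, Double.NaN};
        for (Cond c : Cond.values()) {
            System.out.println(c + " holds for ints: " + holdsForInts(c, ints));
        }
        for (Cond c : new Cond[] {Cond.EQ, Cond.NE, Cond.LT, Cond.GE, Cond.GT, Cond.LE}) {
            System.out.println(c + " holds for doubles: " + holdsForDoubles(c, doubles));
        }
    }
}
```

For the sample inputs, every integer condition satisfies the identity, while for doubles only EQ and NE do once NaN is among the operands.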
------------- Changes: - all: https://git.openjdk.org/jdk/pull/24674/files - new: https://git.openjdk.org/jdk/pull/24674/files/1b9c3b36..34eae981 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24674&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24674&range=01-02 Stats: 36244 lines in 1018 files changed: 28133 ins; 5853 del; 2258 mod Patch: https://git.openjdk.org/jdk/pull/24674.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24674/head:pull/24674 PR: https://git.openjdk.org/jdk/pull/24674 From duke at openjdk.org Fri Apr 25 07:31:59 2025 From: duke at openjdk.org (erifan) Date: Fri, 25 Apr 2025 07:31:59 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v2] In-Reply-To: <46g4wcnZe1Hiodlu9pe4VOoE6hzpKz5tousDFKzs8qA=.edca6b56-d299-41de-a714-4f5ad5bdaa6d@github.com> References: <46g4wcnZe1Hiodlu9pe4VOoE6hzpKz5tousDFKzs8qA=.edca6b56-d299-41de-a714-4f5ad5bdaa6d@github.com> Message-ID: On Thu, 24 Apr 2025 10:28:15 GMT, Andrew Haley wrote: >> erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8354242 >> - 8354242: VectorAPI: combine vector not operation with compare >> >> This patch optimizes the following patterns: >> For integer types: >> ``` >> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) >> => (VectorMaskCmp src1 src2 ncond) >> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) >> => (VectorMaskCmp src1 src2 ncond) >> ``` >> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the >> negative comparison of cond. 
>> >> For float and double types: >> ``` >> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> ``` >> cond can be eq or ne. >> >> Benchmarks on Nvidia Grace machine with 128-bit SVE2: >> With option `-XX:UseSVE=2`: >> ``` >> Benchmark Unit Before Score Error After Score Error Uplift >> testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 10266136.26 8955.008548 1.29 >> testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 1179760.772 448.031844 1.33 >> testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 2359520.803 896.305743 1.33 >> testCompareEQMaskNotInt ops/s 1787221.411 977.743935 2353952.519 960.069976 1.31 >> testCompareEQMaskNotLong ops/s 895297.1974 673.44808 1178449.02 323.804205 1.31 >> testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 4712761.965 2110.862053 1.41 >> testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 10251646.9 9486.699831 1.29 >> testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 2352855.205 1251.952546 1.39 >> testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 1177811.493 521.1229 1.37 >> testCompareGEMaskNotShort ops/s 3341860.309 1578.975338 4714008.434 1681.10365 1.41 >> testCompareGTMaskNotByte ops/s 7910823.674 2993.367032 10245063.58 9774.75138 1.29 >> testCompareGTMaskNotInt ops/s 1673393.928 3153.099431 2353654.521 1190.848583 1.4 >> testCompareGTMaskNotLong ops/s 849405.9159 2432.858159 1177952.041 359.96413 1.38 >> testCompareGTMaskNotShort ops/s 3339509.141 ... > > src/hotspot/share/opto/node.cpp line 1226: > >> 1224: // be added to the IGVN worklist, then the optimization will not be applied. >> 1225: // Therefore, add this node into IGVN worklist to make the optimization happen. 
>> 1226: return true; > > Suggestion: > > } else if (n->Opcode() == Op_XorV || n->Opcode() == Op_XorVMask) { > // Condition for removing an unnecessary not() following > // a compare(...) operation. > // The predecessor of n (this XorV or XorVMask) may also be used > // by a useless VectorBox node which will later be eliminated by > // RemoveUseless. Return true to ensure that subgraph > // transformations are performed on n. > return true; Done. Thanks for your review. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2059706930 From duke at openjdk.org Fri Apr 25 07:32:01 2025 From: duke at openjdk.org (erifan) Date: Fri, 25 Apr 2025 07:32:01 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v2] In-Reply-To: References: Message-ID: On Thu, 24 Apr 2025 09:39:58 GMT, erifan wrote: >> src/hotspot/share/opto/vectornode.cpp line 2265: >> >>> 2263: vmcmp = new VectorMaskCastNode(phase->transform(vmcmp), vmcast_vt); >>> 2264: } >>> 2265: return vmcmp; >> >> It would be preferable if you could kindly re-factor the code such that we only call VectorNode::Ideal once at return to comply with aesthetics of other idealization routines. > > Ok, I'll change it in the next commit. Done, thanks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2059702378 From duke at openjdk.org Fri Apr 25 07:42:15 2025 From: duke at openjdk.org (Yuri Gaevsky) Date: Fri, 25 Apr 2025 07:42:15 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v3] In-Reply-To: References: Message-ID: On Thu, 24 Apr 2025 14:27:50 GMT, Yuri Gaevsky wrote: >> The patch adds possibility to use RVV instructions for faster vectorizedHashCode calculations on RVV v1.0.0 capable hardware. >> >> Testing: hotspot/jtreg/compiler/ under QEMU-8.1 with RVV v1.0.0. > > Yuri Gaevsky has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains four commits: > > - Merge master > - num_8b_elems_in_vec --> nof_vec_elems > - Removed checks for (MaxVectorSize >= 16) per @RealFYang suggestion. > - 8322174: RISC-V: C2 VectorizedHashCode RVV Version . ------------- PR Comment: https://git.openjdk.org/jdk/pull/17413#issuecomment-2829618098 From azafari at openjdk.org Fri Apr 25 08:03:03 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Fri, 25 Apr 2025 08:03:03 GMT Subject: RFR: 8352141: UBSAN: fix the left shift of negative value in relocInfo.cpp, internal_word_Relocation::pack_data_to() In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 13:18:25 GMT, Afshin Zafari wrote: > The `offset` variable used in the left-shift operation can be a large number with its sign bit set. This yields a negative value, and left-shifting a negative value is UB; UBSAN reports it as > `runtime error: left shift of negative value -25 at relocInfo.cpp:...` > > Using the `java_shift_left()` function is the workaround that avoids the UB. This function uses reinterpret_cast to cast from signed to unsigned and back. > > Tests: > linux-x64-debug tier1 on a UBSAN-enabled build. not now, bot! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24196#issuecomment-2829661946 From hgreule at openjdk.org Fri Apr 25 08:49:30 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Fri, 25 Apr 2025 08:49:30 GMT Subject: RFR: 8353551: C2: Constant folding for ReverseBytes nodes [v5] In-Reply-To: References: Message-ID: > This change implements constant folding for ReverseBytes nodes. > > Currently, `byteswap` is included transitively by `reverse_bits.hpp`. I'm not sure if this is fine or if I need to add an explicit include there. > > I appreciate any reviews and comments.
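As a small illustration of what constant folding for ReverseBytes has to preserve, the sketch below lists the values the standard library calls produce for constant inputs; a folded node must yield exactly these results. The class and method names are hypothetical, and nothing here is taken from the patch or its test.

```java
public class ReverseBytesFoldDemo {
    // Each method applies reverseBytes to a compile-time constant, which is
    // the shape of code where a JIT compiler could fold the call to a constant.
    public static int reverseConstInt() {
        return Integer.reverseBytes(0x01020304);
    }

    public static long reverseConstLong() {
        return Long.reverseBytes(0x0102030405060708L);
    }

    public static short reverseConstShort() {
        // 0x0080 becomes 0x8000, which is negative as a short, so a folded
        // result has to respect the narrow signed type.
        return Short.reverseBytes((short) 0x0080);
    }

    public static char reverseConstChar() {
        return Character.reverseBytes((char) 0x0102);
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(reverseConstInt()));
        System.out.println(Long.toHexString(reverseConstLong()));
        System.out.println(reverseConstShort());
        System.out.println(Integer.toHexString(reverseConstChar()));
    }
}
```

The short case is included because the byte-swapped value changes sign, which is the kind of detail a folding implementation needs to handle for the narrower types.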
Hannes Greule has updated the pull request incrementally with one additional commit since the last revision: move test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24382/files - new: https://git.openjdk.org/jdk/pull/24382/files/278a2a7c..f0ec0e4a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24382&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24382&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24382.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24382/head:pull/24382 PR: https://git.openjdk.org/jdk/pull/24382 From hgreule at openjdk.org Fri Apr 25 08:49:30 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Fri, 25 Apr 2025 08:49:30 GMT Subject: RFR: 8353551: C2: Constant folding for ReverseBytes nodes [v4] In-Reply-To: References: Message-ID: On Thu, 17 Apr 2025 09:22:42 GMT, Emanuel Peter wrote: >> Looks good. > > @iwanowww I see you did some internal testing, but not for what version. Should we re-run testing? Sorry for the delay, I moved the test now @eme64. I think @iwanowww ran the tests after I changed the implementation to his suggestion. Please let me know when you think we can integrate. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24382#issuecomment-2829768857 From hgreule at openjdk.org Fri Apr 25 08:55:30 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Fri, 25 Apr 2025 08:55:30 GMT Subject: RFR: 8353551: C2: Constant folding for ReverseBytes nodes [v6] In-Reply-To: References: Message-ID: > This change implements constant folding for ReverseBytes nodes. > > Currently, `byteswap` is included transitively by `reverse_bits.hpp`. I'm not sure if this is fine or if I need to add an explicit include there. > > I appreciate any reviews and comments. 
Hannes Greule has updated the pull request incrementally with one additional commit since the last revision: correct driver path ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24382/files - new: https://git.openjdk.org/jdk/pull/24382/files/f0ec0e4a..3a94bbe6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24382&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24382&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24382.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24382/head:pull/24382 PR: https://git.openjdk.org/jdk/pull/24382 From jbhateja at openjdk.org Fri Apr 25 09:20:05 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 25 Apr 2025 09:20:05 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v2] In-Reply-To: References: Message-ID: <7IGU54ppPKoebXOt0BaS9r-eYf92mtILbHLu8RsfmSk=.a0b3259c-e5ea-43a2-b34c-38d439bd0a41@github.com> On Thu, 24 Apr 2025 09:37:07 GMT, erifan wrote: >> src/hotspot/share/opto/vectornode.cpp line 2243: >> >>> 2241: in1 = in1->in(1); >>> 2242: } >>> 2243: if (in1->Opcode() != Op_VectorMaskCmp || in1->outcnt() > 1 || >> >> Checks on outcnt on line 2243 and 2238 can be removed. Idealization looks for a specific graph palette and replaces it with a new node whose inputs are the same as the inputs of the palette. GVN will do the retention job if any intermediate node has users beyond the pattern being replaced. > > Thanks for telling me this information. Another more important reason to check outcnt here is to prevent this optimization when the uses of VectorMaskCmp is greater than 1, because this optimization may not be worthwhile. 
For example: > > > public static void testVectorMaskCmp() { > IntVector bv = IntVector.fromArray(I_SPECIES, ib, 0); > IntVector av = IntVector.fromArray(I_SPECIES, ia, 0); > VectorMask m1 = av.compare(VectorOperators.NE, bv); // two uses > VectorMask m2 = m1.not(); > m1.intoArray(m, 0); > av.lanewise(VectorOperators.ABS, m2).intoArray(ia, 0); > } > > > If we do not check outcnt and still do this optimization, two VectorMaskCmp nodes will be generated, and finally two VectorMaskCmp instructions will be generated. This is unreasonable because VectorMaskCmp has much higher latency than the xor instruction on aarch64. Thanks, we can add this comment to the code where we are checking outcnt. What if all the other users are also XorNodes? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2059874975 From jbhateja at openjdk.org Fri Apr 25 09:44:04 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 25 Apr 2025 09:44:04 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v13] In-Reply-To: References: Message-ID: On Thu, 24 Apr 2025 23:03:09 GMT, Vladimir Ivanov wrote: >> src/hotspot/share/opto/vectorIntrinsics.cpp line 563: >> >>> 561: debug_name = debug_name_oop->const_oop()->as_instance()->java_lang_String_str(buf, buflen); >>> 562: } >>> 563: Node* vcall = make_runtime_call(RC_VECTOR, >> >> By generating an upfront CallLeafVectorNode, we may miss out on performing any GVN-style optimization for trigonometric identities like the following. Do you think creating a macro node that can lazily be expanded to a call node during macro expansion would help? >> >> arcsin(sin(x)) => x >> arccos(cos(x)) => x >> sin(arcsin(x)) => x >> cos(arccos(x)) => x > > It does look attractive, but a macro-expansion-based solution requires the JVM to internalize such operations and their properties.
> > IMO a higher-level solution based on more generic JVM primitives would enable libraries to properly annotate their operations in Java bytecodes/class files, so C2 can perform this type of transformation without the need to intrinsify each individual operation first. (Think of [JDK-8218414](https://bugs.openjdk.org/browse/JDK-8218414) / [JDK-8347901](https://bugs.openjdk.org/browse/JDK-8347901) on steroids.) I agree, this is a typical graph transform which cannot be applied currently because we are generating CallLeafVectorNode upfront during parsing. If we prevent intrinsification, then the compiler will attempt inlining, generating a much more complex graph shape which may not be reducible. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2059911738 From shade at openjdk.org Fri Apr 25 09:47:01 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 25 Apr 2025 09:47:01 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v6] In-Reply-To: References: Message-ID: On Thu, 24 Apr 2025 00:16:25 GMT, Vladimir Ivanov wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Allow UMH::_method access from VMStructs > > src/hotspot/share/runtime/unloadableMethodHandle.inline.hpp line 93: > >> 91: >> 92: inline Method* UnloadableMethodHandle::method() const { >> 93: assert(!is_unloaded(), "Should not be unloaded"); > > Assert that `block_unloading()` was called before? Cannot do, since the lifecycle allows accessing `method()` shortly after initialization. See the new lifecycle docs. `CompileBroker` does this now, so checking that `block_unloading()` was called here would fail.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2059916538 From shade at openjdk.org Fri Apr 25 09:49:53 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 25 Apr 2025 09:49:53 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v6] In-Reply-To: References: Message-ID: <60zLnWRtRgOKEPcmmdcAnh_QCqf-kEajreRzMMMwee4=.cd312ca6-3aaa-44cf-b731-6adcbeca5833@github.com> On Thu, 24 Apr 2025 00:22:47 GMT, Vladimir Ivanov wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Allow UMH::_method access from VMStructs > > src/hotspot/share/runtime/unloadableMethodHandle.inline.hpp line 37: > >> 35: inline UnloadableMethodHandle::UnloadableMethodHandle(Method* method) { >> 36: _method = method; >> 37: if (method != nullptr) { > > Is it possible to require `method` (and hence `_method`) to always be non-null? Yes, we can. This is a remnant of the implementation that accepted `_hot_method == nullptr`. Not needed now, fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2059922071 From jbhateja at openjdk.org Fri Apr 25 09:53:53 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 25 Apr 2025 09:53:53 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v2] In-Reply-To: References: Message-ID: On Thu, 24 Apr 2025 08:57:14 GMT, erifan wrote: >> erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains two additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8354242 >> - 8354242: VectorAPI: combine vector not operation with compare >> >> This patch optimizes the following patterns: >> For integer types: >> ``` >> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) >> => (VectorMaskCmp src1 src2 ncond) >> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) >> => (VectorMaskCmp src1 src2 ncond) >> ``` >> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the >> negative comparison of cond. >> >> For float and double types: >> ``` >> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> ``` >> cond can be eq or ne. >> >> Benchmarks on Nvidia Grace machine with 128-bit SVE2: >> With option `-XX:UseSVE=2`: >> ``` >> Benchmark Unit Before Score Error After Score Error Uplift >> testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 10266136.26 8955.008548 1.29 >> testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 1179760.772 448.031844 1.33 >> testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 2359520.803 896.305743 1.33 >> testCompareEQMaskNotInt ops/s 1787221.411 977.743935 2353952.519 960.069976 1.31 >> testCompareEQMaskNotLong ops/s 895297.1974 673.44808 1178449.02 323.804205 1.31 >> testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 4712761.965 2110.862053 1.41 >> testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 10251646.9 9486.699831 1.29 >> testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 2352855.205 1251.952546 1.39 >> testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 1177811.493 521.1229 1.37 >> testCompareGEMaskNotShort ops/s 3341860.309 1578.975338 4714008.434 1681.10365 1.41 >> testCompareGTMaskNotByte ops/s 7910823.674 2993.367032 10245063.58 9774.75138 
1.29 >> testCompareGTMaskNotInt ops/s 1673393.928 3153.099431 2353654.521 1190.848583 1.4 >> testCompareGTMaskNotLong ops/s 849405.9159 2432.858159 1177952.041 359.96413 1.38 >> testCompareGTMaskNotShort ops/s 3339509.141 ... > > src/hotspot/share/opto/vectornode.cpp line 2234: > >> 2232: // XorV/XorVMask is commutative, swap VectorMaskCmp/Op_VectorMaskCast to in1. >> 2233: if (in1->Opcode() != Op_VectorMaskCmp && in1->Opcode() != Op_VectorMaskCast) { >> 2234: swap(in1, in2); > > The edges are not swapped; only the two variables in1 and in2 are. My bad. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2059881507 From jbhateja at openjdk.org Fri Apr 25 09:53:54 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 25 Apr 2025 09:53:54 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v2] In-Reply-To: References: Message-ID: On Thu, 24 Apr 2025 09:46:24 GMT, erifan wrote: >> test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java line 38: >> >>> 36: * @summary test combining vector not operation with compare >>> 37: * @modules jdk.incubator.vector >>> 38: * @requires ((os.arch!="x86" & os.arch!="i386" & os.arch!="amd64" & os.arch!="x86_64") | vm.cpu.features ~= ".*avx.*") >> >> You can remove this platform limitation and forward the constraints to @IR rules using applyIfCPUFeatureOr > > Since this is a platform-independent optimization, I tend to use this `@requires` because it's simpler. If we use `applyIfCPUFeatureOr`, we need to add the same restriction before each test. In addition, if a new architecture supports the vector node, this test may not cover it. What do you think? I don't see XorVMask implemented on all non-x86 targets, such as PPC.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2059923958 From shade at openjdk.org Fri Apr 25 09:56:47 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 25 Apr 2025 09:56:47 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v7] In-Reply-To: References: Message-ID: > [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. > > The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. > > This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. > > It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. 
> > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: - Do not accept nullptr methods - Attempt at phasing doc - Merge branch 'master' into JDK-8231269-compile-task-weaks - Inline guard - Merge branch 'master' into JDK-8231269-compile-task-weaks - Allow UMH::_method access from VMStructs - Fix VMStructs - Purge extra fluff - Touchups - Renames - ... and 5 more: https://git.openjdk.org/jdk/compare/b41e0b17...07a3cae4 ------------- Changes: https://git.openjdk.org/jdk/pull/24018/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=06 Stats: 292 lines in 11 files changed: 243 ins; 25 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/24018.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24018/head:pull/24018 PR: https://git.openjdk.org/jdk/pull/24018 From shade at openjdk.org Fri Apr 25 09:56:48 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 25 Apr 2025 09:56:48 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v6] In-Reply-To: References: Message-ID: <49DMOKtJkD7AtmJuFif9VIIZmM4VYYFYmb-aUmXnG7Q=.7b926824-a81e-4b38-902e-16191b4e46ac@github.com> On Thu, 24 Apr 2025 00:36:18 GMT, Vladimir Ivanov wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Allow UMH::_method access from VMStructs > > src/hotspot/share/runtime/unloadableMethodHandle.hpp line 43: > >> 41: // 3. Final released state. Relevant Method* is in unknown state, and cannot be >> 42: // accessed. >> 43: // > > Please, elaborate what state transitions are supported. 
Currently, my understanding is there are 3 transitions and 4 states: > * 1 -> 2 > * 2 -> 3 (terminal) > * 1 -> 3 (terminal) > * 0 (empty, terminal) I added class-level docs for this handle, see if it reads well? > src/hotspot/share/runtime/unloadableMethodHandle.inline.hpp line 57: > >> 55: >> 56: // Null holder, the relevant class would not be unloaded. >> 57: return nullptr; > > Is this the case of bootstrap classloader? As an optimization opportunity, it can be extended for other system loaders. Yes, this is about null (bootstrap) classloader; the system returns `nullptr` in this case. I don't think `UMH` gets to decide whether `!nullptr` holder is always alive or not, and it is safer to hold on to it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2059930415 PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2059929919 From jbhateja at openjdk.org Fri Apr 25 09:58:57 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 25 Apr 2025 09:58:57 GMT Subject: RFR: 8342676: Unsigned Vector Min / Max transforms [v7] In-Reply-To: <21riF_Q0FMyzOh_sakTclKfYa-nJm4klfkyHEYi4ctI=.76933a14-fb5e-447e-873a-59a2b870b842@github.com> References: <21riF_Q0FMyzOh_sakTclKfYa-nJm4klfkyHEYi4ctI=.76933a14-fb5e-447e-873a-59a2b870b842@github.com> Message-ID: > Adding following IR transforms for unsigned vector Min / Max nodes. > > => UMinV (UMinV(a, b), UMaxV(a, b)) => UMinV(a, b) > => UMinV (UMinV(a, b), UMaxV(b, a)) => UMinV(a, b) > => UMaxV (UMinV(a, b), UMaxV(a, b)) => UMaxV(a, b) > => UMaxV (UMinV(a, b), UMaxV(b, a)) => UMaxV(a, b) > => UMaxV (a, a) => a > => UMinV (a, a) => a > > New IR validation test accompanies the patch. 
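The transforms listed above can be sanity-checked on scalars. The following is a minimal sketch, assuming each vector lane behaves like an unsigned 32-bit integer; `umin`, `umax`, and the class name are hypothetical helpers built on `Integer.compareUnsigned`, not the C2 node implementations.

```java
public class UnsignedMinMaxSketch {
    // Unsigned minimum of two ints (bit patterns interpreted as unsigned).
    static int umin(int a, int b) {
        return Integer.compareUnsigned(a, b) <= 0 ? a : b;
    }

    // Unsigned maximum of two ints.
    static int umax(int a, int b) {
        return Integer.compareUnsigned(a, b) >= 0 ? a : b;
    }

    // Checks the six identities from the PR description for one pair of inputs.
    static boolean identitiesHold(int a, int b) {
        int lo = umin(a, b);
        int hi = umax(a, b);
        return umin(umin(a, b), umax(a, b)) == lo
            && umin(umin(a, b), umax(b, a)) == lo
            && umax(umin(a, b), umax(a, b)) == hi
            && umax(umin(a, b), umax(b, a)) == hi
            && umax(a, a) == a
            && umin(a, a) == a;
    }

    public static void main(String[] args) {
        // -1 is the largest unsigned value (0xFFFFFFFF); MIN_VALUE is 0x80000000.
        int[] samples = {0, 1, -1, Integer.MIN_VALUE, Integer.MAX_VALUE, 42};
        boolean all = true;
        for (int a : samples) {
            for (int b : samples) {
                all &= identitiesHold(a, b);
            }
        }
        System.out.println("all identities hold: " + all);
    }
}
```

The samples deliberately include values whose signed and unsigned orderings differ, since that is where a signed min/max would give a different answer.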
> > This is a follow-up PR for https://github.com/openjdk/jdk/pull/20507 > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Updating comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21604/files - new: https://git.openjdk.org/jdk/pull/21604/files/6a2cb635..b184324b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21604&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21604&range=05-06 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21604.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21604/head:pull/21604 PR: https://git.openjdk.org/jdk/pull/21604 From duke at openjdk.org Fri Apr 25 10:52:53 2025 From: duke at openjdk.org (duke) Date: Fri, 25 Apr 2025 10:52:53 GMT Subject: RFR: 8355074: RISC-V: C2: Support Vector-Scalar version of Zvbb Vector And-Not instruction [v4] In-Reply-To: References: Message-ID: On Fri, 25 Apr 2025 02:34:19 GMT, Anjian-Wen wrote: >> support zvbb vand-not vector-scalar version, which Op1 is the sign-extended or truncated value in scalar register rs1 >> add C2 match rule >> add related Tests in IRNode structure >> >> passed jtreg test test/hotspot/jtreg/compiler/vectorapi/* > > Anjian-Wen has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains seven additional commits since the last revision: > > - Merge branch 'openjdk:master' into Vector_And-Not_vx > - split the patch into cleanformat and enable zvbb > - add prefix for test String > - modify format > > change some '_imm' to '_vi' > - modify v0.t to $v0 > - add format fix > - RISC-V: Support Zvbb Vector And-Not vx and add its tests > > Support Zvbb Vector And-Not vx and add its tests > modify name of the match rule and test name @Anjian-Wen Your change (at version a99d32ac703ea6788059b8be9c132801edf05a53) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24709#issuecomment-2830076803 From dnsimon at openjdk.org Fri Apr 25 12:06:58 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 25 Apr 2025 12:06:58 GMT Subject: RFR: 8343789: Move mutable nmethod data out of CodeCache [v14] In-Reply-To: References: <9mDuowjpORyWudVnSB1FWCW_o1pBgMnAvJus6YGkXLs=.67ba4652-2470-448d-baa2-464e824b2fcb@github.com> Message-ID: On Thu, 6 Mar 2025 12:15:52 GMT, Boris Ulasevich wrote: >> This change relocates mutable data (such as relocations, metadata, jvmci data) from the nmethod. The change follows the recent PR #18984, which relocated immutable nmethod data out of the CodeCache. >> >> OOPs was initially moved to a new mutable data blob, but then moved back to nmethod due to performance issues on dacapo benchmarks on aarch with ShenandoahGC (why Shenandoah: it is the only GC with supports_instruction_patching=false - it requires loading from the oops table in compiled code, which takes three instructions for a remote data). >> >> Although performance is not the main focus, testing on AArch64 CPUs, where code density plays a significant role, has shown a 1-2% performance improvement in specific scenarios, such as the CodeCacheStress test and the Renaissance Dotty benchmark. >> >> The numbers. Immutable data constitutes **~30%** of the nmethod. Mutable data constitutes **~8%** of nmethod. 
Example (statistics collected on the CodeCacheStress benchmark): >> - nmethod_count:134000, total_compilation_time: 510460ms >> - total allocation time malloc_mutable/malloc_immutable/CodeCache_alloc: 62ms/114ms/6333ms, >> - total allocation size (mutable/immutable/nmethod): 64MB/192MB/488MB >> >> Functional testing: jtreg on arm/aarch/x86. >> Performance testing: renaissance/dacapo/SPECjvm2008 benchmarks. >> >> Alternative solution (see comments): In the future, relocations can be moved to _immutable_data. > > Boris Ulasevich has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: > > - swap matadata and jvmci data in outputs according to data layout > - cleanup > - returning oops back to nmethods. jtreg: Ok, performance: Ok. todo: cleanup > - Address review comments: cleanup, move fields to avoid padding, fix CodeBlob purge to call os::free, fix nmethod::print, update Layout description > - add a separate adrp_movk function to to support targets located more than 4GB away > - Force the use of movk in combination with adrp and ldr instructions to address scenarios > where os::malloc allocates buffers beyond the typical ±4GB range accessible with adrp > - Fixing TestFindInstMemRecursion test fail with XX:+StressReflectiveCode option: > _relocation_size can exceed 64Kb, in this case _metadata_offset do not fit into int16. > Fix: use _oops_size int16 field to calculate metadata offset > - removing dead code > - a bit of cleanup and addressing review suggestions > - rework movoop for not_supports_instruction_patching case: correcting in ldr_constant and relocations fixup > - ... 
and 5 more: https://git.openjdk.org/jdk/compare/cfab88b1...bc8c590c src/hotspot/share/code/nmethod.cpp line 1505: > 1503: CHECKED_CAST(_oops_size, uint16_t, align_up(code_buffer->total_oop_size(), oopSize)); > 1504: uint16_t metadata_size = (uint16_t)align_up(code_buffer->total_metadata_size(), wordSize); > 1505: JVMCI_ONLY(CHECKED_CAST(_jvmci_data_size, uint16_t, align_up(compiler->is_jvmci() ? jvmci_data->size() : 0, oopSize))); This cast is lossy in that `jvmci_data->size()` returns an int. It caused a `double free or corruption (out)` crash in Graal in the case where a `JVMCINMethodData` had a very long name. We've fixed this by [limiting the length of the name](https://github.com/openjdk/jdk/pull/24753) but I'm wondering if there was some special reason for this cast? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21276#discussion_r2060115430 From jbhateja at openjdk.org Fri Apr 25 12:09:50 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 25 Apr 2025 12:09:50 GMT Subject: RFR: 8348638: Performance regression in Math.tanh [v8] In-Reply-To: References: Message-ID: On Fri, 25 Apr 2025 00:31:08 GMT, Mohamed Issa wrote: >> The changes described below are meant to resolve the performance regression introduced by the **x86_64 tanh** double precision floating point scalar intrinsic in #20657. Additionally, new constant value micro-benchmarks are included alongside a new micro-benchmark to check the performance of specific input value ranges to help prevent regressions in the future. >> >> 1. Check and handle high magnitude input values before those in other ranges. If found, **+/- 1** is returned almost immediately without having to go through too many computations or branches. >> 2. Reduce the lower bound of the input range that triggers a quick **+/- 1** return from **|x| >= 32** to **|x| >= 22**. This new endpoint is the exact value required for correctness that's used by the original OpenJDK implementation. 
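The early-return idea in the two points above can be observed from plain Java. This is a minimal sketch, assuming only the documented behaviour of `StrictMath.tanh` (the fdlibm-derived reference implementation); the class and method names are hypothetical and the code does not model the x86_64 intrinsic itself.

```java
public class TanhSaturationSketch {
    // True when tanh has saturated to exactly +/-1.0 in double precision at |x|.
    static boolean saturatesAt(double x) {
        return StrictMath.tanh(x) == 1.0 && StrictMath.tanh(-x) == -1.0;
    }

    public static void main(String[] args) {
        // 1.0 is well inside the range that needs a real computation;
        // 22.0 and 32.0 are the new and old quick-return bounds from the PR.
        for (double x : new double[] {1.0, 22.0, 32.0}) {
            System.out.println("|x| = " + x + " saturates: " + saturatesAt(x));
        }
    }
}
```

Because the result is already exactly ±1.0 at |x| = 22, checking for large magnitudes first lets such inputs skip the general computation without changing the returned value.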
>> >> The results of all tests posted below were captured with an [Intel® Xeon 6761P](https://www.intel.com/content/www/us/en/products/sku/241842/intel-xeon-6761p-processor-336m-cache-2-50-ghz/specifications.html) using [OpenJDK v25-b15](https://github.com/openjdk/jdk/releases/tag/jdk-25%2B15) as the baseline version. The term _baseline1_ refers to runs with the intrinsic enabled and _baseline2_ refers to runs with the intrinsic disabled. >> >> For the first set of performance data collected with the new built-in **tanhRange** micro-benchmark, see the tables below. Each result is the mean of 8 individual runs, and the input ranges used match those in the bug report with two additional ones included. In the high value scenarios (100, 1000, 10000, 100000), the changes increase throughput values over _baseline1_. Also, there is a small negative impact to the low value (1, 2, 10, 20) scenarios compared to _baseline1_. When comparing against _baseline2_, the changes have significant uplift with the lower value inputs (1, 2, 10, 20, 100). However, they slightly lag behind _baseline2_ when the high value inputs (1000, 10000, 100000) are used. >> >> | Input range(s) | Baseline1 (ops/s) | Change (ops/s) | Change vs baseline1 (%) | >> | :-------------------: | :-----------------: | :----------------: | :-------------------------: | >> | [-1, 1] | 22671 | 22190 | -2.12 | >> | [-2, 2] | 22680 | 22191 | -2.16 | >> | [-10, 10] | 22683 | 22149 | -2.35 ... > > Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision: > > Switch to constant double fields with separate micro-benchmarks Overall the patch looks good to me now apart from concerns around benchmark, existing Java implementation handles special cases upfront, thereby compromising the performance of most common cases. Java implementation scores above intrinsic in two outlier ranges < 2^-55 and > 22. While intrinsic implementation is performant for a meaty generic range ie. 
> 2^-55 and < 22.0 We get around 30% performance uplift from intrinsic implementation over java implementation for the bulky generic input range. For ranges above 22.0, we now see better performance in comparison to the earlier intrinsic implementation. New benchmark show clear gain for the value range [A][B][C] this patch optimizes. Baseline: ========= Benchmark (tanhRangeIndex) Mode Cnt Score Error Units TanhPerf.TanhPerfConstant.tanhConstDouble1 N/A thrpt 2 117588.175 ops/ms TanhPerf.TanhPerfConstant.tanhConstDouble21 N/A thrpt 2 117550.954 ops/ms TanhPerf.TanhPerfConstant.tanhConstDoubleLarge N/A thrpt 2 117580.385 ops/ms => A TanhPerf.TanhPerfConstant.tanhConstDoubleSmall N/A thrpt 2 403652.485 ops/ms TanhPerf.TanhPerfConstant.tanhConstDoubleTiny N/A thrpt 2 408909.294 ops/ms TanhPerf.TanhPerfRanges.tanhNegRangeDouble 0 thrpt 2 397200.032 ops/ms TanhPerf.TanhPerfRanges.tanhNegRangeDouble 1 thrpt 2 116082.297 ops/ms TanhPerf.TanhPerfRanges.tanhNegRangeDouble 2 thrpt 2 112213.540 ops/ms TanhPerf.TanhPerfRanges.tanhNegRangeDouble 3 thrpt 2 433899.459 ops/ms => B TanhPerf.TanhPerfRanges.tanhPosDoubleRange 0 thrpt 2 396818.181 ops/ms TanhPerf.TanhPerfRanges.tanhPosDoubleRange 1 thrpt 2 115886.117 ops/ms TanhPerf.TanhPerfRanges.tanhPosDoubleRange 2 thrpt 2 112048.023 ops/ms TanhPerf.TanhPerfRanges.tanhPosDoubleRange 3 thrpt 2 440250.930 ops/ms => C WithOpt: ======== Benchmark (tanhRangeIndex) Mode Cnt Score Error Units TanhPerf.TanhPerfConstant.tanhConstDouble1 N/A thrpt 2 116459.753 ops/ms TanhPerf.TanhPerfConstant.tanhConstDouble21 N/A thrpt 2 116454.242 ops/ms TanhPerf.TanhPerfConstant.tanhConstDoubleLarge N/A thrpt 2 521156.905 ops/ms => A TanhPerf.TanhPerfConstant.tanhConstDoubleSmall N/A thrpt 2 400262.455 ops/ms TanhPerf.TanhPerfConstant.tanhConstDoubleTiny N/A thrpt 2 400339.293 ops/ms TanhPerf.TanhPerfRanges.tanhNegRangeDouble 0 thrpt 2 389451.159 ops/ms TanhPerf.TanhPerfRanges.tanhNegRangeDouble 1 thrpt 2 115750.146 ops/ms 
TanhPerf.TanhPerfRanges.tanhNegRangeDouble 2 thrpt 2 112043.952 ops/ms TanhPerf.TanhPerfRanges.tanhNegRangeDouble 3 thrpt 2 481931.138 ops/ms => B TanhPerf.TanhPerfRanges.tanhPosDoubleRange 0 thrpt 2 390072.384 ops/ms TanhPerf.TanhPerfRanges.tanhPosDoubleRange 1 thrpt 2 115738.869 ops/ms TanhPerf.TanhPerfRanges.tanhPosDoubleRange 2 thrpt 2 111868.620 ops/ms TanhPerf.TanhPerfRanges.tanhPosDoubleRange 3 thrpt 2 561509.564 ops/ms => C ------------- PR Comment: https://git.openjdk.org/jdk/pull/23889#issuecomment-2830244448 From jbhateja at openjdk.org Fri Apr 25 12:09:51 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 25 Apr 2025 12:09:51 GMT Subject: RFR: 8348638: Performance regression in Math.tanh [v7] In-Reply-To: References: Message-ID: On Fri, 25 Apr 2025 00:47:43 GMT, Mohamed Issa wrote: >> test/micro/org/openjdk/bench/java/lang/MathBench.java line 67: >> >>> 65: public double double1 = 1.0d, double2 = 2.0d, double81 = 81.0d, doubleNegative12 = -12.0d, double4Dot1 = 4.1d, double0Dot5 = 0.5d; >>> 66: >>> 67: public static final double tanhConstInputs[] = {-2.0, -1.0, -0.5, -0.1, 0.0, 0.1, 0.5, 1.0, 2.0}; >> >> Static final arrays have mutable elements, you should declare individual static fields with different constant values. >> We can also use @Stable annotation, in that case array index must be a constant. Is better to go with few individual static final fields with constant values. > > I switched to individual static final fields with constant values. There are separate micro-benchmarks included as well. Hi @missa-prime , I still feel that due to new parameter we have significantly increased the overall benchmark time as other kernels are redundantly executed multiple times. 
Please find attached a reference implementation of new benchmark and lets revert adding new parameters from MathBench.java [TanhPerf.txt](https://github.com/user-attachments/files/19907952/TanhPerf.txt) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23889#discussion_r2060120275 From duke at openjdk.org Fri Apr 25 12:52:51 2025 From: duke at openjdk.org (Anjian-Wen) Date: Fri, 25 Apr 2025 12:52:51 GMT Subject: Integrated: 8355074: RISC-V: C2: Support Vector-Scalar version of Zvbb Vector And-Not instruction In-Reply-To: References: Message-ID: <4fgqRmws8mlDWJeHWwqAn4VRL5eCRgrva3lH3acEE_U=.2041a3b7-4b73-4d1b-969e-538a9c929e77@github.com> On Thu, 17 Apr 2025 03:01:42 GMT, Anjian-Wen wrote: > support zvbb vand-not vector-scalar version, which Op1 is the sign-extended or truncated value in scalar register rs1 > add C2 match rule > add related Tests in IRNode structure > > passed jtreg test test/hotspot/jtreg/compiler/vectorapi/* This pull request has now been integrated. Changeset: 5c067232 Author: Anjian-Wen Committer: Fei Yang URL: https://git.openjdk.org/jdk/commit/5c067232bf21aaca2b7addd2a862e15a8696ffb8 Stats: 147 lines in 4 files changed: 147 ins; 0 del; 0 mod 8355074: RISC-V: C2: Support Vector-Scalar version of Zvbb Vector And-Not instruction Reviewed-by: fjiang, fyang ------------- PR: https://git.openjdk.org/jdk/pull/24709 From duke at openjdk.org Fri Apr 25 13:01:14 2025 From: duke at openjdk.org (Anjian-Wen) Date: Fri, 25 Apr 2025 13:01:14 GMT Subject: RFR: 8355562: RISC-V: Cleanup names of vector-scalar instructions in riscv_v.ad [v2] In-Reply-To: <9pPaDS5emcWnw5TSf6yPMx9oYEODHaJ8oNUB0v9SMuc=.bb4262ca-178c-4d30-be85-a7d6862252b3@github.com> References: <9pPaDS5emcWnw5TSf6yPMx9oYEODHaJ8oNUB0v9SMuc=.bb4262ca-178c-4d30-be85-a7d6862252b3@github.com> Message-ID: > RISC-V: Cleanup names of vector-scalar instructions in riscv_v.ad Anjian-Wen has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: - Merge branch 'openjdk:master' into JDK-8355562 - RISC-V: Cleanup names of vector-scalar instructions in riscv_v.ad ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24865/files - new: https://git.openjdk.org/jdk/pull/24865/files/b0ed2ea8..7c77276e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24865&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24865&range=00-01 Stats: 776 lines in 32 files changed: 565 ins; 169 del; 42 mod Patch: https://git.openjdk.org/jdk/pull/24865.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24865/head:pull/24865 PR: https://git.openjdk.org/jdk/pull/24865 From rcastanedalo at openjdk.org Fri Apr 25 13:08:52 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 25 Apr 2025 13:08:52 GMT Subject: RFR: 8327963: [Umbrella] Incorrect result of C2 compiled code since JDK-8237581 [v5] In-Reply-To: References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> Message-ID: On Thu, 24 Apr 2025 09:33:54 GMT, Roland Westrelin wrote: >> An `Initialize` node for an `Allocate` node is created with a memory >> `Proj` of adr type raw memory. In order for stores to be captured, the >> memory state out of the allocation is a `MergeMem` with slices for the >> various object fields/array element set to the raw memory `Proj` of >> the `Initialize` node. If `Phi`s need to be created during later >> transformations from this memory state, The `Phi` for a particular >> slice gets its adr type from the type of the `Proj` which is raw >> memory. If during macro expansion, the `Allocate` is found to have no >> use and so can be removed, the `Proj` out of the `Initialize` is >> replaced by the memory state on input to the `Allocate`. 
A `Phi` for >> some slice for a field of an object will end up with the raw memory >> state on input to the `Allocate` node. As a result, memory state at >> the `Phi` is incorrect and incorrect execution can happen. >> >> The fix I propose is, rather than have a single `Proj` for the memory >> state out of the `Initialize` with adr type raw memory, to use one >> `Proj` per slice added to the memory state after the `Initialize`. Each >> of the `Proj` should return the right adr type for its slice. For that >> I propose having a new type of `Proj`: `NarrowMemProj` that captures >> the right adr type. >> >> Logic for the construction of the `Allocate`/`Initialize` subgraph is >> tweaked so the right adr type captured in its own `NarrowMemProj` is >> added to the memory subgraph. Code that removes an allocation or moves >> it also has to be changed so it correctly takes the multiple memory >> projections out of the `Initialize` node into account. >> >> One tricky issue is that when EA split types for a scalar replaceable >> `Allocate` node: >> >> 1- the adr type captured in the `NarrowMemProj` becomes out of sync >> with the type of the slices for the allocation >> >> 2- before EA, the memory state for one particular field out of the >> `Initialize` node can be used for a `Store` to the just allocated >> object or some other. So we can have a chain of `Store`s, some to >> the newly allocated object, some to some other objects, all of them >> using the state of `NarrowMemProj` out of the `Initialize`. After >> split unique types, the `NarrowMemProj` is for the slice of a >> particular allocation. So `Store`s to some other objects shouldn't >> use that memory state but the memory state before the `Allocate`. >> >> For that, I added logic to update the adr type of `NarrowMemProj` >> during split uni... 
> > Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: > > - Update src/hotspot/share/opto/escape.cpp > > Co-authored-by: Emanuel Peter > - Update src/hotspot/share/opto/escape.cpp > > Co-authored-by: Emanuel Peter src/hotspot/share/opto/escape.cpp line 4879: > 4877: const TypePtr* at = mem->adr_type(); > 4878: uint alias_idx = (uint) _compile->get_alias_index(at->is_ptr()); > 4879: if (idx == i) { Could you rename this and all other occurrences of `idx` below to make the changeset buildable again? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2060206726 From duke at openjdk.org Fri Apr 25 13:10:09 2025 From: duke at openjdk.org (Yuri Gaevsky) Date: Fri, 25 Apr 2025 13:10:09 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v3] In-Reply-To: References: Message-ID: On Thu, 24 Apr 2025 14:27:50 GMT, Yuri Gaevsky wrote: >> The patch adds possibility to use RVV instructions for faster vectorizedHashCode calculations on RVV v1.0.0 capable hardware. >> >> Testing: hotspot/jtreg/compiler/ under QEMU-8.1 with RVV v1.0.0. > > Yuri Gaevsky has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: > > - Merge master > - num_8b_elems_in_vec --> nof_vec_elems > - Removed checks for (MaxVectorSize >= 16) per @RealFYang suggestion. 
> - 8322174: RISC-V: C2 VectorizedHashCode RVV Version

JFTR: the numbers after the above merge on real RVV-1.0 hardware (BPI-F3 16g box) are below:

Legend: UseVHI ==> UseVectorizedHashCodeIntrinsic

--------------------------------------------------------------------------------------------
                                |       (baseline)         |        (patch)           |
--------------------------------------------------------------------------------------------
                                | -XX:-UseVHI -XX:+UseRVV  | -XX:+UseVHI -XX:+UseRVV  |
--------------------------------------------------------------------------------------------
Benchmark    (size)  Mode  Cnt  |      Score       Error   |      Score       Error   | Units
--------------------------------------------------------------------------------------------
bytes             1  avgt   10       11.281 ±     0.005         11.279 ±     0.001    ns/op
bytes            10  avgt   10       35.096 ±     0.027         35.730 ±     0.032    ns/op
bytes           100  avgt   10      246.627 ±     0.144        132.879 ±     0.150    ns/op
bytes          1000  avgt   10     2368.472 ±     2.174        914.207 ±     0.931    ns/op
bytes         10000  avgt   10    23548.070 ±     3.285       8707.273 ±     5.666    ns/op
bytes        100000  avgt   10   236725.770 ±   591.357      86590.456 ±   173.544    ns/op
chars             1  avgt   10       11.282 ±     0.006         11.281 ±     0.005    ns/op
chars            10  avgt   10       35.726 ±     0.013         36.978 ±     0.015    ns/op
chars           100  avgt   10      246.913 ±     0.152        134.704 ±     0.036    ns/op
chars          1000  avgt   10     2370.329 ±    10.804        935.244 ±     0.385    ns/op
chars         10000  avgt   10    23596.177 ±    19.305       9495.412 ±     6.368    ns/op
chars        100000  avgt   10   384796.824 ±  3353.051     155220.554 ±  1753.764    ns/op
ints              1  avgt   10       11.280 ±     0.002         11.280 ±     0.002    ns/op
ints             10  avgt   10       35.774 ±     0.220         36.357 ±     0.014    ns/op
ints            100  avgt   10      246.935 ±     0.112        126.494 ±     0.159    ns/op
ints           1000  avgt   10     2372.602 ±     0.753        818.846 ±     0.253    ns/op
ints          10000  avgt   10    25309.538 ±   106.280       8942.238 ±    65.537    ns/op
ints         100000  avgt   10   409074.598 ±  4049.489      87796.390 ±   545.247    ns/op
multibytes        1  avgt   10        5.137 ±     0.006          5.138 ±     0.003    ns/op
multibytes       10  avgt   10       18.361 ±     0.022         19.618 ±     0.006    ns/op
multibytes      100  avgt   10      132.814 ±     0.543         96.236 ±     0.236    ns/op
multibytes     1000  avgt   10     2160.723 ±    22.749        596.236 ±     1.166    ns/op
multibytes    10000  avgt   10    22195.062 ±   300.592       5749.928 ±     5.748    ns/op
multibytes   100000  avgt   10   205825.738 ±  1340.919      47757.644 ±    80.729    ns/op
multichars        1  avgt   10        4.995 ±     0.003          5.003 ±     0.002    ns/op
multichars       10  avgt   10       18.512 ±     0.015         19.430 ±     0.011    ns/op
multichars      100  avgt   10      230.563 ±     1.320        101.515 ±     0.353    ns/op
multichars     1000  avgt   10     1396.042 ±    16.038        634.955 ±     0.824    ns/op
multichars    10000  avgt   10    13445.146 ±     8.403       5838.638 ±     9.851    ns/op
multichars   100000  avgt   10   127475.457 ±    81.919      50308.336 ±    26.640    ns/op
multiints         1  avgt   10        6.980 ±     2.561          5.017 ±     0.007    ns/op
multiints        10  avgt   10       29.743 ±     6.021         19.479 ±     0.008    ns/op
multiints       100  avgt   10      149.720 ±     0.516        110.728 ±     0.280    ns/op
multiints      1000  avgt   10     1442.903 ±    30.199       1023.673 ±    16.614    ns/op
multiints     10000  avgt   10    22702.792 ±   286.336       5941.205 ±    30.079    ns/op
multiints    100000  avgt   10   127134.718 ±   117.502      48745.440 ±    69.036    ns/op
multishorts       1  avgt   10        5.145 ±     0.009          5.140 ±     0.004    ns/op
multishorts      10  avgt   10       18.506 ±     0.006         19.419 ±     0.006    ns/op
multishorts     100  avgt   10      232.937 ±     2.433        100.298 ±     0.318    ns/op
multishorts    1000  avgt   10     1388.111 ±    16.740        657.008 ±     4.362    ns/op
multishorts   10000  avgt   10    13458.090 ±    10.975       5860.546 ±     8.239    ns/op
multishorts  100000  avgt   10   127463.240 ±   102.736      50534.548 ±    34.661    ns/op
shorts            1  avgt   10       11.280 ±     0.007         11.280 ±     0.004    ns/op
shorts           10  avgt   10       35.721 ±     0.011         62.661 ±     0.533    ns/op
shorts          100  avgt   10      246.942 ±     0.165        135.960 ±     0.029    ns/op
shorts         1000  avgt   10     2368.908 ±     0.955        935.607 ±     0.678    ns/op
shorts        10000  avgt   10    23608.074 ±    22.901       8913.395 ±     5.318    ns/op
shorts       100000  avgt   10   237055.625 ±   532.713      94719.177 ±   352.058    ns/op
--------------------------------------------------------------------------------------------

------------- PR Comment: https://git.openjdk.org/jdk/pull/17413#issuecomment-2830382940

From eastigeevich at openjdk.org Fri Apr 25 14:33:55 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Fri, 25 Apr 2025 14:33:55 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v13] In-Reply-To: References: Message-ID:

On Thu, 24 Apr 2025 23:25:11 GMT, Chad Rakoczy wrote:

>> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186).
>>
>> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation.
>>
>> This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release.
>
> Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision:
>
> Fix branch range check

src/hotspot/share/ci/ciEnv.cpp line 1143:

> 1141: VMThread::execute(&relocate);
> 1142: }
> 1143: }

I don't think we should have this code.

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2060362320

From jsikstro at openjdk.org Fri Apr 25 14:51:06 2025 From: jsikstro at openjdk.org (Joel Sikström) Date: Fri, 25 Apr 2025 14:51:06 GMT Subject: RFR: 8355616: Incorrect ifdef in compilationMemoryStatistic.cpp Message-ID:

Hi,

Working on a patch close to this area and saw that the ifdef didn't match the "#endif" just below.
The ifdef should be COMPILER1 instead of COMPILER2. ------------- Commit messages: - 8355616: Incorrect ifdef in compilationMemoryStatistic.cpp Changes: https://git.openjdk.org/jdk/pull/24876/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24876&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8355616 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24876.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24876/head:pull/24876 PR: https://git.openjdk.org/jdk/pull/24876 From shade at openjdk.org Fri Apr 25 14:56:52 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 25 Apr 2025 14:56:52 GMT Subject: RFR: 8355616: Incorrect ifdef in compilationMemoryStatistic.cpp In-Reply-To: References: Message-ID: On Fri, 25 Apr 2025 14:45:48 GMT, Joel Sikström wrote: > Hi, > > Working on a patch close to this area and saw that the ifdef didn't match the "#endif" just below. The ifdef should be COMPILER1 instead of COMPILER2. Looks reasonable, thanks! ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24876#pullrequestreview-2794416533 From dhanalla at openjdk.org Fri Apr 25 15:06:48 2025 From: dhanalla at openjdk.org (Dhamoder Nalla) Date: Fri, 25 Apr 2025 15:06:48 GMT Subject: RFR: 8341293: Split field loads through Nested Phis [v9] In-Reply-To: References: Message-ID: On Tue, 8 Apr 2025 20:21:34 GMT, Dhamoder Nalla wrote: >> This enhances the changes introduced in [JDK PR 12897](https://github.com/openjdk/jdk/pull/12897) by handling nested Phi nodes (phi -> phi -> AddP -> Load*) during scalar replacement. The primary goal is to split field loads (AddP -> Load*) involving nested Phi parent nodes, thereby increasing opportunities for scalar replacement and reducing memory allocations. >> >> >> **Here is an illustration of the sequence of Ideal Graph Transformations applied to split through nested `Phi` nodes.** >> >> **1. 
Initial State (Before Transformation)** >> The graph contains a nested Phi structure where two Allocate nodes merge via a Phi node. >> >> ![image](https://github.com/user-attachments/assets/c18e5ca0-c554-475c-814a-7cb288d96569) >> >> **2. After Splitting Through Child Phi** >> The transformation separates field loads by introducing additional AddP and Load nodes for each Allocate input. >> >> ![image](https://github.com/user-attachments/assets/b279b5f2-9ec6-4d9b-a627-506451f1cf81) >> >> **3. After Splitting Load Field Through Parent Phi** >> The field load operation (Load) is pushed even further up in the graph. >> >> Instead of merging AddP pointers in a Phi node and then performing a Load, the transformation ensures that each path has its AddP -> Load sequence before merging. >> >> This further eliminates the need to perform field loads on a Phi node, making the graph more conducive to scalar replacement. >> >> ![image](https://github.com/user-attachments/assets/f506b918-2dd0-4dbe-a440-ff253afa3961) >> >> ### JMH Benchmark Results: >> >> #### With Disabled RAM >>
>> | Benchmark | Mode | Count | Score | Error | Units |
>> |-----------|------|-------|-------|-------|-------|
>> | testBailOut_runner | avgt | 15 | 13.969 | ± 0.248 | ms/op |
>> | testFieldEscapeWithMerge_runner | avgt | 15 | 80.300 | ± 4.306 | ms/op |
>> | testMerge_TryCatchFinally_runner | avgt | 15 | 72.182 | ± 1.781 | ms/op |
>> | testMultiParentPhi_runner | avgt | 15 | 2.983 | ± 0.001 | ms/op |
>> | testNestedPhiPolymorphic_runner | avgt | 15 | 18.342 | ± 0.731 | ms/op |
>> | testNestedPhiProcessOrder_runner | avgt | 15 | 14.315 | ± 0.443 | ms/op |
>> | testNestedPhiWithLambda_runner | avgt | 15 | 18.511 | ± 1.212 | ms/op |
>> | testNestedPhiWithTrap_runner | avgt | 15 | 66.277 | ± 1.478 | ms/op |
>> | testNestedPhi_FieldLoad_runner | avgt | 15 | 17.968 | ± 0.306 | ms/op |
>> | testNestedPhi_TryCatch_runner | avgt | 15 | 14.186 | ± 0.247 | ms/op |
>> | testRematerialize_MultiObj_runner | avgt | 15 | 88.435 | ± 4.869 ...
> > Dhamoder Nalla has updated the pull request incrementally with one additional commit since the last revision: > > address CR comments > Looks like you deleted some of the tests you had there. Can you explain why? This was not by any chance the one that failed? [3c56f98](https://github.com/openjdk/jdk/commit/3c56f98d88433a4fada2c7e43147fc2e91df5e89) Thanks @eme64 for taking a look at it, I haven't deleted any tests; test cases list below is unchanged.
try { Asserts.assertEQ(testRematerialize_SingleObj_Interp(cond1, x, y), testRematerialize_SingleObj_C2(cond1, x, y)); } catch (Exception e) {}
Asserts.assertEQ(testRematerialize_TryCatch_Interp(cond1, l, x, y), testRematerialize_TryCatch_C2(cond1, l, x, y));
Asserts.assertEQ(testMerge_TryCatchFinally_Interp(cond1, l, x, y), testMerge_TryCatchFinally_C2(cond1, l, x, y));
Asserts.assertEQ(testRematerialize_MultiObj_Interp(cond1, cond2, x, y), testRematerialize_MultiObj_C2(cond1, cond2, x, y));
Asserts.assertEQ(testGlobalEscapeInThread_Intrep(cond1, l, x, y), testGlobalEscapeInThread_C2(cond1, l, x, y));
Asserts.assertEQ(testGlobalEscapeInThreadWithSync_Intrep(cond1, x, y), testGlobalEscapeInThreadWithSync_C2(cond1, x, y));
Asserts.assertEQ(testFieldEscapeWithMerge_Intrep(cond1, x, y), testFieldEscapeWithMerge_C2(cond1, x, y));
Asserts.assertEQ(testNestedPhi_FieldLoad_Interp(cond1, cond2, x, y), testNestedPhi_FieldLoad_C2(cond1, cond2, x, y));
Asserts.assertEQ(testThreeLevelNestedPhi_Interp(cond1, cond2, x, y), testThreeLevelNestedPhi_C2(cond1, cond2, x, y));
Asserts.assertEQ(testNestedPhiProcessOrder_Interp(cond1, cond2, x, y), testNestedPhiProcessOrder_C2(cond1, cond2, x, y));
Asserts.assertEQ(testNestedPhi_TryCatch_Interp(cond1, cond2, x, y), testNestedPhi_TryCatch_C2(cond1, cond2, x, y));
Asserts.assertEQ(testBailOut_Interp(cond1, cond2, x, y), testBailOut_C2(cond1, cond2, x, y));
Asserts.assertEQ(testNestedPhiPolymorphic_Interp(cond1, cond2, x, y), testNestedPhiPolymorphic_C2(cond1, cond2, x, y));
Asserts.assertEQ(testNestedPhiWithTrap_Interp(cond1, cond2, x, y), testNestedPhiWithTrap_C2(cond1, cond2, x, y));
Asserts.assertEQ(testNestedPhiWithLambda_Interp(cond1, cond2, x, y), testNestedPhiWithLambda_C2(cond1, cond2, x, y));
Asserts.assertEQ(testMultiParentPhi_Interp(cond1, x, y), testMultiParentPhi_C2(cond1, x, y));
------------- PR Comment: https://git.openjdk.org/jdk/pull/21270#issuecomment-2830685272 From stuefe at openjdk.org Fri Apr 25 15:25:45 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 25 Apr 2025 15:25:45 GMT Subject: RFR: 8355616: Incorrect ifdef in compilationMemoryStatistic.cpp In-Reply-To: References: Message-ID: On Fri, 25 Apr 2025 14:45:48 GMT, Joel Sikström wrote: > Hi, > > Working on a patch close to this area and saw that the ifdef didn't match the "#endif" just below. The ifdef should be COMPILER1 instead of COMPILER2. +1 thank you ------------- Marked as reviewed by stuefe (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/24876#pullrequestreview-2794501682 From eastigeevich at openjdk.org Fri Apr 25 15:57:51 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Fri, 25 Apr 2025 15:57:51 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v13] In-Reply-To: <-Y_yLYd2I_ZJk0kcEF8ZqyVKtI0OiXT0WFtvHLhUWJU=.be99c2c6-e27d-448d-910d-ee5dfc97e12d@github.com> References: <-Y_yLYd2I_ZJk0kcEF8ZqyVKtI0OiXT0WFtvHLhUWJU=.be99c2c6-e27d-448d-910d-ee5dfc97e12d@github.com> Message-ID: On Fri, 25 Apr 2025 00:12:25 GMT, Chad Rakoczy wrote: >> src/hotspot/cpu/aarch64/relocInfo_aarch64.cpp line 86: >> >>> 84: if (!Assembler::reachable_from_branch_at(addr(), x, true)) { >>> 85: address trampoline = call->get_trampoline(); >>> 86: assert(trampoline != nullptr, "branch is too large with no available trampoline"); >> >> Doesn't this mean any nmethod relocation can cause the JVM to crash if there is no trampoline? Or are you creating new trampoline stubs in the relocated destination as needed (and deleting trampoline stubs that are no longer needed)? > > From what I saw calls are _usually_ accompanied by a corresponding trampoline. I'll look deeper to see if _usually_ is _always_ In nmethods `bl` calls always come with trampolines. 
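For readers following the reachability question in this thread, here is a minimal C++ sketch of the decision being discussed: branch directly when the destination is in range, otherwise go through the trampoline. It assumes only the architectural AArch64 `bl` range (a signed 26-bit word offset, i.e. +/-128 MiB); HotSpot's `Assembler::reachable_from_branch_at` may apply a different limit, and `reachable_by_bl` and `select_branch_target` are hypothetical names, not HotSpot functions.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: architectural AArch64 BL range, a signed 26-bit word offset,
// which is +/-128 MiB in bytes. The JVM's own check may be stricter.
constexpr int64_t kBlRange = int64_t{1} << 27;

// True if a BL instruction located at `call_site` can branch directly to `target`.
inline bool reachable_by_bl(uint64_t call_site, uint64_t target) {
  const int64_t offset = static_cast<int64_t>(target - call_site);
  return offset >= -kBlRange && offset < kBlRange;
}

// Hypothetical helper: choose where the BL should point after the code moved.
// `trampoline` is 0 when the call site has no trampoline stub.
inline uint64_t select_branch_target(uint64_t call_site, uint64_t dest,
                                     uint64_t trampoline) {
  if (reachable_by_bl(call_site, dest)) {
    return dest;  // near call: branch straight to the destination
  }
  // Far call: only legal if a trampoline exists, mirroring the quoted assert.
  assert(trampoline != 0 && "branch is too large with no available trampoline");
  return trampoline;
}
```

In this sketch a relocated call site with an out-of-range destination and no trampoline trips the assert, which is the crash scenario raised in the review question.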
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2060498951 From eastigeevich at openjdk.org Fri Apr 25 16:42:06 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Fri, 25 Apr 2025 16:42:06 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v13] In-Reply-To: <-Y_yLYd2I_ZJk0kcEF8ZqyVKtI0OiXT0WFtvHLhUWJU=.be99c2c6-e27d-448d-910d-ee5dfc97e12d@github.com> References: <-Y_yLYd2I_ZJk0kcEF8ZqyVKtI0OiXT0WFtvHLhUWJU=.be99c2c6-e27d-448d-910d-ee5dfc97e12d@github.com> Message-ID: On Fri, 25 Apr 2025 00:12:25 GMT, Chad Rakoczy wrote: >> src/hotspot/cpu/aarch64/relocInfo_aarch64.cpp line 86: >> >>> 84: if (!Assembler::reachable_from_branch_at(addr(), x, true)) { >>> 85: address trampoline = call->get_trampoline(); >>> 86: assert(trampoline != nullptr, "branch is too large with no available trampoline"); >> >> Doesn't this mean any nmethod relocation can cause the JVM to crash if there is no trampoline? Or are you creating new trampoline stubs in the relocated destination as needed (and deleting trampoline stubs that are no longer needed)? > > From what I saw calls are _usually_ accompanied by a corresponding trampoline. I'll look deeper to see if _usually_ is _always_ @chadrako, what issue are you trying to fix with the code? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2060559902 From jbhateja at openjdk.org Fri Apr 25 17:27:53 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 25 Apr 2025 17:27:53 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v15] In-Reply-To: References: Message-ID: On Fri, 25 Apr 2025 05:26:35 GMT, Vladimir Ivanov wrote: >> Migrate Vector API math library (SVML and SLEEF) linkage from native code (in JVM) to Java FFM API. >> >> Since FFM API doesn't support vector calling conventions yet, migration affects only symbol lookup for now. But it still enables significant simplifications on JVM side. 
>> >> The patch consists of the following parts: >> * on-demand symbol lookup in Java code replaces eager lookup from native code during JVM startup; >> * 2 new VM intrinsics for vector calls (support unary and binary shapes) (code separated from unary/binary vector operations); >> * new internal interface to query supported CPU ISA extensions (`jdk.incubator.vector.CPUFeatures`) used for CPU dispatching. >> >> `java.lang.foreign` API is used to perform symbol lookup in vector math library, then the address is cached and fed into corresponding JVM intrinsic, so C2 can turn it into a direct vector call in generated code. >> >> Once `java.lang.foreign` supports vectors & vector calling conventions, VM intrinsics can go away. >> >> Performance is on par with original implementation (tested with microbenchmarks on linux-x64 and macosx-aarch64). >> >> Testing: hs-tier1 - hs-tier6, microbenchmarks (on linux-x64 and macosx-aarch64) >> >> Thanks! > > Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: > > Remove UseVectorStubs usage in riscv.ad Looks good to me. Best Regards ------------- Marked as reviewed by jbhateja (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24462#pullrequestreview-2794825283 From duke at openjdk.org Fri Apr 25 18:10:23 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Fri, 25 Apr 2025 18:10:23 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v13] In-Reply-To: References: <-Y_yLYd2I_ZJk0kcEF8ZqyVKtI0OiXT0WFtvHLhUWJU=.be99c2c6-e27d-448d-910d-ee5dfc97e12d@github.com> Message-ID: On Fri, 25 Apr 2025 16:38:55 GMT, Evgeny Astigeevich wrote: >> From what I saw calls are _usually_ accompanied by a corresponding trampoline. I'll look deeper to see if _usually_ is _always_ > > @chadrako, what issue are you trying to fix with the code? 
After relocation it is possible that the call can no longer reach the destination without calling the trampoline ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2060674909 From dlunden at openjdk.org Fri Apr 25 18:14:48 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Fri, 25 Apr 2025 18:14:48 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v18] In-Reply-To: References: Message-ID: > If a method has a large number of parameters, we currently bail out from C2 compilation. > > ### Changeset > > Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes. > > Changes: > - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed. > - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable. > - Remove all `can_represent` checks and bailouts. > - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`. 
> - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`. > > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450) > - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. > - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement. > - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, not worth it). > > ![c2-regression](https:/... 
Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: Updates after Emanuel's comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20404/files - new: https://git.openjdk.org/jdk/pull/20404/files/b7aa0351..2df00458 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20404&range=17 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20404&range=16-17 Stats: 486 lines in 12 files changed: 31 ins; 5 del; 450 mod Patch: https://git.openjdk.org/jdk/pull/20404.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20404/head:pull/20404 PR: https://git.openjdk.org/jdk/pull/20404 From dlunden at openjdk.org Fri Apr 25 18:14:53 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Fri, 25 Apr 2025 18:14:53 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v17] In-Reply-To: References: Message-ID: On Tue, 22 Apr 2025 17:06:04 GMT, Emanuel Peter wrote: >> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: >> >> Refactor and improve TestNestedSynchronize.java > > src/hotspot/share/opto/chaitin.cpp line 1614: > >> 1612: uint size = lrg->mask().Size(); >> 1613: ResourceMark r(C->regmask_arena()); >> 1614: RegMask rm(lrg->mask(), C->regmask_arena()); > > Absolute nit: above you already have a `rm`. There it refers to a `ResourceMark`, here to a `RegMask`. Feels a little confusing. A more expressive name could help ;) Yes, thanks. It is unfortunate that both `RegMask` and `ResourceMark` variables are usually named `rm`. I updated it here and also in other places now. > src/hotspot/share/opto/compile.hpp line 524: > >> 522: PhaseRegAlloc* _regalloc; // Results of register allocation.
>> 523: RegMask _FIRST_STACK_mask; // All stack slots usable for spills (depends on frame layout) >> 524: ResourceArea _regmask_arena; // Holds dynamically allocated extensions of short-lived register masks > > If they are short-lived ... then why not just resource allocate them? Is there a conflict? Would it be good to describe that somewhere? I assume you mean resource allocate in the default `Thread::current()->resource_area()`? The existing resource marks are often far away, leading to unnecessary memory consumption. Trying to narrow the existing resource marks does lead to conflicts. I added a motivation to the description in `compile.hpp`! > src/hotspot/share/opto/matcher.cpp line 1340: > >> 1338: // Empty them all. >> 1339: for (uint i = 0; i < cnt; i++) { >> 1340: ::new (&(msfpt->_in_rms[i])) RegMask(C->comp_arena()); > > Here you do array indexing with `&`. Above you did `base + i`. Just an observation, I leave it to you if you want to do something about that. Thanks, updated for consistency. > src/hotspot/share/opto/regmask.hpp line 143: > >> 141: // constructors. If we get copied/cloned, &_RM_UP_EXT will no longer equal >> 142: // orig_ext_adr. >> 143: uintptr_t** orig_ext_adr = &_RM_UP_EXT; > > There should also be an underscore for this field, for consistency, right? Yes, fixed. > src/hotspot/share/opto/regmask.hpp line 147: > >> 145: // If the original version, of which we may be a clone, is read-only. In such >> 146: // cases, we can allow read-only sharing. >> 147: bool orig_const = false; > > Could we have a more descriptive name? I was wondering at a use site what this is about... and it was not directly clear. Oh, and should it not be `_orig_const`, with the extra underscore at the very least? > > Maybe `_read_only_sharing`? Not sure about it, just an idea. But the underscore I'm more sure about. Sure, I now changed it to simply `_read_only`.
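The two fields discussed in the `regmask.hpp` comments above (the recorded address of the extension pointer and the read-only flag) can be illustrated with a small standalone C++ sketch. It is only a model of the idea stated in the quoted code comments, not the `RegMask` implementation: `MaskSketch`, `bitwise_clone`, `is_original` and `passes_clone_check` are hypothetical names, and the struct is deliberately kept trivially copyable so that the `memcpy` clone is well defined.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical stand-in for a mask with an optional dynamically allocated part.
struct MaskSketch {
  uintptr_t* ext = nullptr;                 // extension storage (unused in this sketch)
  uintptr_t** original_ext_address = &ext;  // address of `ext` when constructed
  bool read_only = false;                   // true if sharing a clone is acceptable

  // False once the object has been copied bit-for-bit to another address,
  // because the recorded address still points into the original object.
  bool is_original() const { return original_ext_address == &ext; }

  // Sanity condition: either we are the original, or the mask is read-only
  // and may therefore be shared between original and clone.
  bool passes_clone_check() const { return read_only || is_original(); }
};

static_assert(std::is_trivially_copyable<MaskSketch>::value,
              "the memcpy below relies on trivial copyability");

// Simulates a clone made without running any mask-aware constructor.
inline MaskSketch bitwise_clone(const MaskSketch& src) {
  MaskSketch dst;
  std::memcpy(&dst, &src, sizeof(MaskSketch));
  return dst;
}
```

A real class would also supply a copy constructor that re-seats the recorded address for proper copies; the sketch leaves that out to keep the bitwise case in focus.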
> src/hotspot/share/opto/regmask.hpp line 182: > >> 180: // r0 r1 r2 r3 r4 r5 r6 r7 r8 r9 s0 s1 s2 s3 s4 s5 s6 s7 s8 s9 s10 s11 ... >> 181: // [0 0 0 0 |0 1 1 0 |0 0 1 0 ] [1 1 0 1 |0 0 0 0] as as as >> 182: // [0] [1] [2] [0] [1] > > I kinda missed these two lines at first... it is not directly clear what they are for. > The first one is the mask, aaah and it is extended with the `as` values. > No idea yet what the second line is about... > > Maybe some annotation would help? For me it would ok if that means the lines become a little longer. Sure, now added "Content" and "Index" annotations. > src/hotspot/share/opto/regmask.hpp line 219: > >> 217: >> 218: // Access word i in the register mask. >> 219: const uintptr_t& _rm_up(unsigned int i) const { > > You have another set of underscore methods here... did you mean to make them private? I quickly checked, and did not find any underscore methods in the file before your change. > > This looks a little like python style, where you cannot make a method private, and so the convention is to make them underscore :) Thanks, fixed! I was not aware of the convention. The methods are already private. > src/hotspot/share/opto/regmask.hpp line 420: > >> 418: : RegMask(arena) { >> 419: Insert(reg); >> 420: DEBUG_ONLY(this->orig_const = orig_const;) > > Could this not be part of the initializer list? Or does the `Insert` prevent that? I moved things around a bit to make sure it can be put in the initializer list. I couldn't before because the initializer list contains a delegating constructor. > src/hotspot/share/opto/regmask.hpp line 429: > >> 427: } >> 428: >> 429: RegMask(const RegMask& rm) : RegMask(rm, nullptr) {} > > This is the copy constructor, right? Can you add a comment what kind of implementation you chose here, and why? Yes, it is the copy constructor. Can you elaborate a bit on what type of comment you expect? There are already some comments in the `copy` method. 
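As background for the two constructor remarks above: in C++, a constructor that delegates to another constructor cannot initialize members in the same initializer list, so any extra state has to be passed to the target constructor or set in the body. The sketch below shows one way to arrange that, and a copy constructor that delegates in the same style as the quoted `RegMask(const RegMask& rm) : RegMask(rm, nullptr) {}`. `MaskSketch` and `Arena` are hypothetical placeholders, and resetting the read-only flag on copy is an arbitrary choice made for the example.

```cpp
#include <cassert>
#include <cstdint>

struct Arena {};  // placeholder for an allocation arena

class MaskSketch {
 public:
  // Target constructor: the only place where members are initialized directly.
  MaskSketch(Arena* arena, bool read_only)
      : _arena(arena), _bits(0), _read_only(read_only) {}

  explicit MaskSketch(Arena* arena) : MaskSketch(arena, false) {}

  // Delegating constructor: `_read_only` is forwarded to the target
  // constructor because it cannot be listed next to the delegation.
  MaskSketch(unsigned bit, Arena* arena, bool read_only)
      : MaskSketch(arena, read_only) {
    insert(bit);
  }

  // Arena-aware copy: copies the bits, lets the caller choose the arena.
  MaskSketch(const MaskSketch& other, Arena* arena)
      : _arena(arena), _bits(other._bits), _read_only(false) {}

  // Copy constructor delegating to the arena-aware copy with no arena.
  MaskSketch(const MaskSketch& other) : MaskSketch(other, nullptr) {}

  void insert(unsigned bit) {
    assert(bit < 64);
    _bits |= uint64_t{1} << bit;
  }
  bool member(unsigned bit) const { return bit < 64 && ((_bits >> bit) & 1) != 0; }
  bool read_only() const { return _read_only; }
  Arena* arena() const { return _arena; }

 private:
  Arena* _arena;
  uint64_t _bits;
  bool _read_only;
};
```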
> test/hotspot/jtreg/compiler/arguments/TestMaxMethodArguments.java line 34: > >> 32: * -XX:+UnlockDiagnosticVMOptions >> 33: * -XX:+AbortVMOnCompilationFailure >> 34: * compiler.arguments.TestMaxMethodArguments > > Could there be a benefit to having a run without some of the flags here, and therefore also without the requires? Yes, good suggestion. Added! > test/hotspot/jtreg/compiler/locks/TestNestedSynchronize.java line 37: > >> 35: * -XX:+UnlockDiagnosticVMOptions >> 36: * -XX:+AbortVMOnCompilationFailure >> 37: * compiler.locks.TestNestedSynchronize > > How about a run with fewer flags? Yes, thanks. > test/jdk/java/lang/invoke/TestCatchExceptionWithVarargs.java line 32: > >> 30: * timeouts due to compilation of a large number of methods with a >> 31: * large number of parameters. >> 32: * @run main/othervm -XX:MaxNodeLimit=15000 TestCatchExceptionWithVarargs > > Why not have two runs here. One that requires that there is no `Xcomp`, where we have the normal node limit. And one where you lower the node limit, so that `Xcomp` is ok? Same here, I just added the `MaxNodeLimit`. I'd prefer leaving any other changes to a separate RFE (if needed). > test/jdk/java/lang/invoke/VarargsArrayTest.java line 46: > >> 44: * -DVarargsArrayTest.MAX_ARITY=255 >> 45: * -DVarargsArrayTest.START_ARITY=250 >> 46: * VarargsArrayTest > > Same here. > > But I mean are those timeouts ok with Xcomp? Are we sure that these timeouts are only a test issue and not a product issue? There is a thread in the PR about this already (difficult to find!), so I'm pasting it below for convenience. > @[robcasloz](https://github.com/robcasloz) robcasloz [3 weeks ago](https://github.com/openjdk/jdk/pull/20404#discussion_r2023195229) Just checking: these methods that cause C2 to consume an excessive amount of memory were not C2-compilable before the changeset, right?
> > @[robcasloz](https://github.com/robcasloz) robcasloz [3 weeks ago](https://github.com/openjdk/jdk/pull/20404#discussion_r2023204041) Same question for the other java/lang/invoke test changes. > > @[dlunde](https://github.com/dlunde) dlunde [3 weeks ago](https://github.com/openjdk/jdk/pull/20404#discussion_r2028661113) Yes, correct. No longer bailing out on too many arguments results in a lot more compilations (with -Xcomp) compared to before in these specific tests, which is why I've had to limit the tests with MaxNodeLimits. > > That said, I did look into these tests a bit more now after your comment, and there are some peculiar (but artificial) compilations that we no longer bail out on and that we may want to investigate in a future RFE. These compilations each take around 40 seconds (in a release build), are very close to the MaxNodeLimit (80 000 nodes), and spend 99% of the time in the register allocator (in the first round of conservative coalescing, specifically). I analyzed these register allocator runs and it looks like we run into the quadratic time complexity of graph-coloring register allocation, because we have a very large number of nodes to begin with and then the interference graph is additionally very dense (contains a very large number of interferences/edges). We already have bailouts related to node count in the register allocator, but no bailouts for the interference graph size. Perhaps we should consider adding this as part of a separate RFE. > > @[robcasloz](https://github.com/robcasloz) robcasloz [3 weeks ago](https://github.com/openjdk/jdk/pull/20404#discussion_r2028671476) > > Perhaps we should consider adding this as part of a separate RFE. > > This sounds like a good idea, I agree to postpone it to a separate RFE. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2060669571 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2060670292 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2060673178 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2060675000 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2060671473 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2060675790 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2060677513 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2060672618 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2060678381 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2060667754 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2060667653 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2060667453 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2060667370 From dlunden at openjdk.org Fri Apr 25 18:14:55 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Fri, 25 Apr 2025 18:14:55 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v17] In-Reply-To: <4YDlcsxJy-4NmTAijvFrfzEbyE7oNKDYqG2cuXP2xmI=.92aca6da-3b5d-4692-8aa6-9cfbbfcc7a41@github.com> References: <4YDlcsxJy-4NmTAijvFrfzEbyE7oNKDYqG2cuXP2xmI=.92aca6da-3b5d-4692-8aa6-9cfbbfcc7a41@github.com> Message-ID: <_oBt35lcOuEjkT2Q1z_gDxG739kPVKJhXztAJAn3Ysw=.87a04c5b-5a1f-4042-a0cd-1447f31827ce@github.com> On Wed, 23 Apr 2025 06:39:48 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/node.cpp line 558: >> >>> 556: MachProjNode* mach = n->as_MachProj(); >>> 557: MachProjNode* mthis = this->as_MachProj(); >>> 558: new (&mach->_rout) RegMask(mthis->_rout); >> >> Hmm. Do you have some defense against these kinds of shallow copies? 
Or are there cases where shallow copies of a RegMask are somehow valid? You could for example `nullptr` out the pointers in the copy constructor. Just an idea, not sure if that even works. Or you could do a proper copy in the copy constructor. Maybe you have already thought about both of those options, and neither is good... > > Ah, I see you have a `uintptr_t** orig_ext_adr = &_RM_UP_EXT;` trick. So shallow copies should not ever get used, right? Yes, I had a lot of headaches related to this because register masks often get cloned in unpredictable ways (without the use of a constructor). Correct, the `orig_ext_adr` (now `original_ext_address`) enables us to trigger asserts whenever we try to mutate a cloned register mask. We can allow the use of shallow copies if both the original and copy are immutable (only happens in one case at the moment, for `BoxLockNode` register masks). This is the assert `assert(_read_only || _original_ext_address == &_RM_UP_EXT, "clone sanity check");` >> src/hotspot/share/opto/regmask.cpp line 395: >> >>> 393: >>> 394: #ifndef PRODUCT >>> 395: bool RegMask::_dump_end_run(outputStream* st, OptoReg::Name start, >> >> In my understanding we only use underscore for fields, and not methods? > > I think just making it private is sufficient, no? Thanks, removed the underscore (and it is already private). >> src/hotspot/share/opto/regmask.hpp line 199: >> >>> 197: // >>> 198: // The only operation that may update the _offset attribute is >>> 199: // RegMask::rollover(). This operation requires the register mask to be clean >> >> What does `clean` mean? All zero? Or all ones? > > I think all ones. You could write `... mask to be clean (all ones) and ...` It actually means all zeroes (i.e., empty).
The rollover operation is for a very specific use case in the register allocator where we have exhausted every possible register/stack location in an all-stack register mask, and "roll it over" to expose a new set of stack locations to the register allocator (this is the now moved `chunk` logic in `Select` that I mentioned in an earlier comment). Updated now! Note that there is also a comment next to the definition of the `rollover` method. >> src/hotspot/share/opto/regmask.hpp line 448: >> >>> 446: } >>> 447: >>> 448: // Empty mask check. Ignores registers included through the all-stack flag. >> >> Suggestion: >> >> // Empty mask check. Ignores registers included through the all_stack flag. >> >> For "greppability" > > Why is that ok to ignore the `all_stack` registers? Would that be expected/clear at the use-site? Thanks, I've changed all occurrences of `all-stack` to `all_stack`. > Why is that ok to ignore the all_stack registers? Would that be expected/clear at the use-site? That's how it is expected to work. I had the same thought as you when working on this and changed it to not ignore all stack registers, but that broke a lot of things. It is not obvious why it works this way at all call sites. For some call sites, it is probably fine to also consider the all-stack flag (but not for all call sites). >> test/jdk/java/lang/invoke/BigArityTest.java line 38: >> >>> 36: * -XX:CompileCommand=memlimit,*.*,0 >>> 37: * -esa -DBigArityTest.ITERATION_COUNT=1 >>> 38: * test.java.lang.invoke.BigArityTest >> >> Would it make sense to also have a run with fewer flags? > > Or why do we need to set all these flags? If we need some of them, then a comment could be helpful. Note that I only added the `-XX:MaxNodeLimit=20000`, all the other flags are from before (I just added line breaks). It could very well make sense to have a run with fewer flags, but I'm not sure if that's compatible with what the original author intended. I'd prefer leaving it as it is. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2060668670 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2060674662 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2060676342 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2060682057 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2060667493 From dlunden at openjdk.org Fri Apr 25 18:14:56 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Fri, 25 Apr 2025 18:14:56 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v15] In-Reply-To: <4UprmugxFZfWfKirSA4aJNwf2FenfjfergB-FNExod8=.89841ab7-2e07-4a6b-9f4e-3b069560be93@github.com> References: <7UllX79lzodFBQgK59E_ywZWH-WVnQegcAsRbYjaYJQ=.a33bb997-1aab-4e01-bea7-982e831aba8a@github.com> <4UprmugxFZfWfKirSA4aJNwf2FenfjfergB-FNExod8=.89841ab7-2e07-4a6b-9f4e-3b069560be93@github.com> Message-ID: On Wed, 23 Apr 2025 06:31:59 GMT, Emanuel Peter wrote: >> It is indeed arbitrary (and should be very generous for all practical cases). We need a limit so that we can compute an upper bound for register mask sizes. I've updated the comment now, does it make more sense? > > Nice! > Optional: make it upper case to emphasize that it is a constant at the use site. > Suggestion: > > const int BoxLockNode_SLOT_LIMIT = 200; Thanks, fixed! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2060665553 From dlunden at openjdk.org Fri Apr 25 18:14:56 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Fri, 25 Apr 2025 18:14:56 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v17] In-Reply-To: References: <4YDlcsxJy-4NmTAijvFrfzEbyE7oNKDYqG2cuXP2xmI=.92aca6da-3b5d-4692-8aa6-9cfbbfcc7a41@github.com> Message-ID: On Wed, 23 Apr 2025 06:48:31 GMT, Emanuel Peter wrote: >> Is it an abbreviation for something? > > Ooooh, it stands for `_all_stack == as`. 
Would be nice if that was stated explicitly! I agree, now explicit! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2060675198 From dlunden at openjdk.org Fri Apr 25 18:26:42 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Fri, 25 Apr 2025 18:26:42 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v19] In-Reply-To: References: Message-ID: > If a method has a large number of parameters, we currently bail out from C2 compilation. > > ### Changeset > > Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes. > > Changes: > - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed. > - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable. > - Remove all `can_represent` checks and bailouts. > - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`. 
> - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`. > > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450) > - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. > - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement. > - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, not worth it). > > ![c2-regression](https:/... 
Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: Fix typo ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20404/files - new: https://git.openjdk.org/jdk/pull/20404/files/2df00458..d9e4219d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20404&range=18 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20404&range=17-18 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20404.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20404/head:pull/20404 PR: https://git.openjdk.org/jdk/pull/20404 From dlunden at openjdk.org Fri Apr 25 18:26:46 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Fri, 25 Apr 2025 18:26:46 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v17] In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 08:00:05 GMT, Emanuel Peter wrote: >> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: >> >> Refactor and improve TestNestedSynchronize.java > > src/hotspot/share/opto/regmask.hpp line 452: > >> 450: assert(valid_watermarks(), "sanity"); >> 451: for (unsigned i = _lwm; i <= _hwm; i++) { >> 452: if (_rm_up(i)) { > > Is this an implicit `nullptr` check? If so, you need to make it explicit, the style guide forbids implicit null checks. This is a bit tricky. It is indeed a pointer type (`uintptr_t`), but it is not used as a pointer. It is used to store the register mask bits. So I guess this is then not an implicit null check? It is an implicit zero check :slightly_smiling_face: > src/hotspot/share/opto/regmask.hpp line 464: > >> 462: for (unsigned i = _lwm; i <= _hwm; i++) { >> 463: uintptr_t bits = _rm_up(i); >> 464: if (bits) { > > This also looks like an implicit null check. Let's continue the discussion in the first comment (resolving this).
Depending on what we decide, I'll fix all the (potential) implicit null checks. > src/hotspot/share/opto/regmask.hpp line 473: > >> 471: >> 472: // Get highest-numbered register from mask, or BAD if mask is empty. Ignores >> 473: // registers included through the all-stack flag. > > Suggestion: > > // registers included through the all_stack flag. > > Greppability Yes, all fixed > src/hotspot/share/opto/regmask.hpp line 480: > >> 478: while (i > _lwm) { >> 479: uintptr_t bits = _rm_up(--i); >> 480: if (bits) { > > Implicit null check? Same here, resolving. > src/hotspot/share/opto/regmask.hpp line 512: > >> 510: tmp |= _rm_up(i); >> 511: } >> 512: return !tmp && is_AllStack(); > > Implicit null checks? > > Could you not check `is_AllStack` early and return already? Thanks, I added an early exit for `is_AllStack` and also an early exit in the loop. > src/hotspot/share/opto/regmask.hpp line 619: > >> 617: _hwm = _rm_max(); >> 618: _set_range(0, 0xFF, _rm_size); >> 619: set_AllStack(true); > > Suggestion: > > set_AllStack(); > > You already have a default `true` value, right? Check for other occurances. Thanks, I just removed the default value and made it explicit everywhere. > src/hotspot/share/opto/regmask.hpp line 747: > >> 745: >> 746: // Subtract 'rm' from 'this', but ignore everything in 'rm' that does not >> 747: // overlap with us and to not modify our all-stack flag. Supports masks of > > A little confused about the grammar in `and to not modify our all-stack flag`... Thanks, typo. 
Fixed "to" -> "do" > test/hotspot/jtreg/compiler/arguments/TestMaxMethodArguments.java line 43: > >> 41: try { >> 42: test(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 21 7, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255); >> 43: } catch (Exception e) { > > That makes me a little nervous. You catch all exceptions. What if we throw some unexpected null pointer exception? Is that fine too? > > Suggestion: define your own exception, and only catch that one. Thanks, fixed! > test/hotspot/jtreg/compiler/locks/TestNestedSynchronize.java line 90: > >> 88: // The above is a massive program. Therefore, we do not directly inline the >> 89: // program in TestNestedSynchronize and instead compile and run it via the >> 90: // CompileFramework. > > Nice, I like it. 
Of course I have no bias here ;) Great :slightly_smiling_face: ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2060686693 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2060688125 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2060688715 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2060689145 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2060690375 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2060692592 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2060695524 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2060697403 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2060698080 From dlunden at openjdk.org Fri Apr 25 18:26:48 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Fri, 25 Apr 2025 18:26:48 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v17] In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 08:06:21 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/regmask.hpp line 551: >> >>> 549: static int num_registers(uint ireg, LRG &lrg); >>> 550: >>> 551: // Overlap test. Non-zero if any registers in common, including all-stack. >> >> Suggestion: >> >> // Overlap test. Non-zero if any registers in common, including all_stack. >> >> greppability > > There are more, I won't tag them all now ;) All fixed ? >> src/hotspot/share/opto/regmask.hpp line 561: >> >>> 559: unsigned lwm = MAX2(_lwm, rm._lwm); >>> 560: for (unsigned i = lwm; i <= hwm; i++) { >>> 561: if (_rm_up(i) & rm._rm_up(i)) { >> >> Implicit null check? > > More cases below, won't tag them all now. Same here, resolving. >> src/hotspot/share/opto/regmask.hpp line 589: >> >>> 587: } >>> 588: } >>> 589: } >> >> There is a bit of code duplication here. A helper method could help, no pun intended. 
> > Boah, but it is also not horrible to leave as is. Maybe a helper method does not make it more readable. Not sure. I see your point, but I'm not sure how much it helps readability. I suggest leaving it as is. >> src/hotspot/share/opto/regmask.hpp line 799: >> >>> 797: >>> 798: public: >>> 799: unsigned int static basic_rm_size() { return _RM_SIZE; } >> >> What makes it `basic`? Elsewhere, you call `RM_SIZE` the "base size". I don't understand this part, so I'm just asking if you think it is consistent? > > The name divergence between `basic_rm_size` and `_RM_SIZE` generally makes me a little suspicious if we chose the names right? Why do we even need the `RM` / `rm` prefix everywhere? Is that not already clear from the context, we are after all in a register mask. Not sure if it is worth changing everything now, or ever. But we should at least look for consistency ;) Yes, it is confusing but consistent. Your intuition is correct: there is a difference between `_rm_size` (the current total size, including extension) and `_RM_SIZE` (the base static size). @robcasloz introduced the "basic" terminology when working on tests in `test_regmask.cpp` and needed some way to expose `_RM_SIZE` publicly in non-product code. Therefore, we have the method `basic_rm_size`. I don't really have a better suggestion. Perhaps `base_rm_size`, or `static_rm_size`? As in "the base/static part of rm_size". We cannot call the method `_RM_SIZE()` as that is prohibited by the style guide. We cannot call the method `RM_SIZE()` as `RM_SIZE` is a macro (and also not the same thing as `_RM_SIZE` on 64-bit machines). > Why do we even need the RM / rm prefix everywhere? We really don't, but that's how it is :slightly_smiling_face: Could be worth refactoring, but not in this changeset!
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2060690749 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2060691115 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2060691968 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2060696987 From dlunden at openjdk.org Fri Apr 25 18:30:11 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Fri, 25 Apr 2025 18:30:11 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v17] In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 08:39:28 GMT, Emanuel Peter wrote: >> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: >> >> Refactor and improve TestNestedSynchronize.java > > test/hotspot/jtreg/compiler/locks/TestNestedSynchronize.java line 107: > >> 105: acc.addFirst(String.format("public class %s {", test_class_name)); >> 106: acc.addLast("}"); >> 107: return String.join("\n", acc); > > Nit: What prevented you from generating it with an `ArrayList`, and just only append at the end? > > In fact, that would allow you to use a StringBuilder directly. > > And you could format the `synchronized` string only once. That might be even more efficient. > > You can also leave it, this is really a nit :) Nothing prevented it, it just felt more natural doing it in this way. I guess a habit from doing a lot of functional programming :slightly_smiling_face: It should have the same time complexity now as using a `StringBuilder` (linear), so I'd prefer leaving it as is unless you strongly object!
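For readers following the nit above, here is a minimal sketch of the two styles being compared: wrapping the program inside-out with `addFirst`/`addLast` on a deque, versus appending front-to-back with a `StringBuilder`. This is not the code from `TestNestedSynchronize.java`; the class name `NestedSynchronizeSketch`, the method names, and the generated program body (`lock`, `counter`, `test()`) are hypothetical and chosen only for illustration. Both variants do a linear amount of work in the nesting depth and produce the same text.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration only; not the generator used by the actual test.
final class NestedSynchronizeSketch {

    // Inside-out construction: start from the innermost statement and wrap it
    // one layer at a time, prepending the opener and appending the closer.
    static String buildWithDeque(String className, int depth) {
        Deque<String> acc = new ArrayDeque<>();
        acc.addLast("counter++;");
        for (int i = 0; i < depth; i++) {
            acc.addFirst("synchronized (lock) {");
            acc.addLast("}");
        }
        acc.addFirst("static void test() {");
        acc.addLast("}");
        acc.addFirst("static int counter = 0;");
        acc.addFirst("static final Object lock = new Object();");
        acc.addFirst(String.format("public class %s {", className));
        acc.addLast("}");
        return String.join("\n", acc);
    }

    // Front-to-back construction: append only at the end, writing the repeated
    // opener and closer lines with String.repeat (Java 11+).
    static String buildWithStringBuilder(String className, int depth) {
        StringBuilder sb = new StringBuilder();
        sb.append("public class ").append(className).append(" {\n");
        sb.append("static final Object lock = new Object();\n");
        sb.append("static int counter = 0;\n");
        sb.append("static void test() {\n");
        sb.append("synchronized (lock) {\n".repeat(depth));
        sb.append("counter++;\n");
        sb.append("}\n".repeat(depth));
        sb.append("}\n");
        sb.append("}");
        return sb.toString();
    }
}
```

Since both builders touch each generated line a constant number of times, the choice between them is mostly one of taste, which matches the conclusion reached in the thread.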
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2060705367 From eastigeevich at openjdk.org Fri Apr 25 19:13:04 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Fri, 25 Apr 2025 19:13:04 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v13] In-Reply-To: References: Message-ID: On Thu, 24 Apr 2025 23:25:11 GMT, Chad Rakoczy wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). >> >> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. >> >> This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. > > Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: > > Fix branch range check src/hotspot/share/code/relocInfo.cpp line 379: > 377: } else { > 378: // Reassert the callee address, this time in the new copy of the code. > 379: pd_set_call_destination(callee); if (src->contains(callee)) { // ... 
int offset = pointer_delta_as_int(callee, orig_addr); callee = addr() + offset; } pd_set_call_destination(callee); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2060761363 From sviswanathan at openjdk.org Fri Apr 25 19:51:50 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 25 Apr 2025 19:51:50 GMT Subject: RFR: 8342676: Unsigned Vector Min / Max transforms [v7] In-Reply-To: References: <21riF_Q0FMyzOh_sakTclKfYa-nJm4klfkyHEYi4ctI=.76933a14-fb5e-447e-873a-59a2b870b842@github.com> Message-ID: On Fri, 25 Apr 2025 09:58:57 GMT, Jatin Bhateja wrote: >> Adding following IR transforms for unsigned vector Min / Max nodes. >> >> => UMinV (UMinV(a, b), UMaxV(a, b)) => UMinV(a, b) >> => UMinV (UMinV(a, b), UMaxV(b, a)) => UMinV(a, b) >> => UMaxV (UMinV(a, b), UMaxV(a, b)) => UMaxV(a, b) >> => UMaxV (UMinV(a, b), UMaxV(b, a)) => UMaxV(a, b) >> => UMaxV (a, a) => a >> => UMinV (a, a) => a >> >> New IR validation test accompanies the patch. >> >> This is a follow-up PR for https://github.com/openjdk/jdk/pull/20507 >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Updating comments Looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21604#pullrequestreview-2795146502 From sparasa at openjdk.org Fri Apr 25 20:04:10 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Fri, 25 Apr 2025 20:04:10 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v14] In-Reply-To: References: Message-ID: > The current scheme for Intel APX NDD code generation favors the emission of NDD instruction on APX-enabled targets, even if destination and source registers are the same. To prevent this, this PR extends the assembler layer to demote EEVEX to REX encoding if dst matches with source operands. 
Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: cleanup ecmov, eorw and other refactoring ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24431/files - new: https://git.openjdk.org/jdk/pull/24431/files/2768cc52..6a01e747 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=12-13 Stats: 117 lines in 3 files changed: 25 ins; 61 del; 31 mod Patch: https://git.openjdk.org/jdk/pull/24431.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24431/head:pull/24431 PR: https://git.openjdk.org/jdk/pull/24431 From iveresov at openjdk.org Fri Apr 25 20:32:28 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Fri, 25 Apr 2025 20:32:28 GMT Subject: RFR: 8355003: Implement Ahead-of-Time Method Profiling Message-ID: Improve warm-up time by making profile data from a previous run of an application instantly available, when the HotSpot Java Virtual Machine starts. Specifically, enhance the [AOT cache](https://openjdk.org/jeps/483) to store method execution profiles from training runs, reducing profiling delays in subsequent production runs. More details in the JEP: https://bugs.openjdk.org/browse/JDK-8325147 ------------- Commit messages: - Merge branch 'master' into pp - Add AOTCompileEagerly flag to control compilation after clinit - Port 8355334: [leyden] Missing type profile info in archived training data - Port 8355296: [leyden] Some methods are stuck at level=0 with -XX:-TieredCompilation - Use ENABLE_IF macro - Missing part of the last commit - Fix value of CompLevel_count - Add test - Don't compile trainingData.cpp without CDS (part 2) - Don't compile trainingData.cpp without CDS (part 1) - ... 
and 18 more: https://git.openjdk.org/jdk/compare/5c067232...3ec132e7 Changes: https://git.openjdk.org/jdk/pull/24886/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24886&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8355003 Stats: 3202 lines in 57 files changed: 2976 ins; 103 del; 123 mod Patch: https://git.openjdk.org/jdk/pull/24886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24886/head:pull/24886 PR: https://git.openjdk.org/jdk/pull/24886 From duke at openjdk.org Fri Apr 25 20:50:35 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Fri, 25 Apr 2025 20:50:35 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v14] In-Reply-To: References: Message-ID: > This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). > > When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. > > This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. 
Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: Move post init and remove no entrant check ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23573/files - new: https://git.openjdk.org/jdk/pull/23573/files/1c6db6c6..027f5245 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=12-13 Stats: 8 lines in 1 file changed: 2 ins; 6 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23573.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23573/head:pull/23573 PR: https://git.openjdk.org/jdk/pull/23573 From duke at openjdk.org Fri Apr 25 20:56:51 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Fri, 25 Apr 2025 20:56:51 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v7] In-Reply-To: References: Message-ID: On Fri, 11 Apr 2025 14:43:00 GMT, Erik Österlund wrote: >> I have only skimmed through what you are doing but what I have read makes me worried from a GC point of view. In general, I am not fond of "special nmethods" that work subtly different to normal nmethods and have their own special life cycles. >> It might be that some of my concerns are false because this is more of a drive by review to sanity check if you thought about the GC implications. These are just random things on top of my head. >> 1) You can't just copy oops. Propagating stale pointers to a new nmethod is not valid and will make the GC vomit. The GC assumes that it can traverse a snapshot of nmethods, and that new nmethods created after that snapshot, will have sane valid oops initially, and hence do not need fixing. Copying stale oops to a new nmethod would violate those invariants and inevitably blow up. >> 2) Class redefinition tracks in an external data structure which nmethods contained metadata that we want to eventually throw away.
This is done to avoid walking the entire code cache just to keep tabs on the one nmethod that still uses the old metadata. If we clone the nmethod without putting it in said data structure, we will blow up. >> 3) I'm worried about the initial state of the nmethod entry barrier guard value being copied from the source nmethod, instead of having the initial value we expect for newly created nmethods. It means that the initial invocation will not get the nmethod entry barrier callback. The GC traverses the nmethods assuming that new nmethods created during the traversal will not start off with weird stale values. >> 4) I'm worried about copying the nmethod epoch counters used by virtual threads to mark which nmethods have been found on-stack. Copying it implies that this nmethod has been found on-stack even though it never has. To me, the implications are unknown, but perhaps you thought about it? >> 5) You don't check if the nmethod is_unloading() when cloning it. That means you can create a new nmethod that has dead oops from the get go - that cannot be allowed >> 6) Have you checked what the JVMCI speculation data and JVMCI data contains and if your approach will break that? JVMCI has an nmethod mirror object that refers back to the nmethod - this is unlikely to work out of the box with cloning. >> 7) By running the operation in a safepoint you a) introduce an obvious latency problem, b) create a new source for stale nmethod pointers that will become stale and burn. The _nm of the safepoint operation might not survive... > >> Hi @fisk, >> >> Thank you for the very valuable comment. It has point we have not thought about. >> >> > I am not fond of "special nmethods" that work subtly different to normal nmethods and have their own special life cycles. >> >> It's not clear to me what you mean "special nmethods". IMO we don't introduce any special nmethods. From my point of view, a normal nmethod is an nmethod for a ordinary Java method. 
Nmethods for non-ordinary Java methods are special, e.g. native nmethods or method handle linkers(JDK-8263377). I think normal nmethods should be relocatable within CodeCache. > > I mean nmethods with a subtly different life cycle where usual invariants/expectations don't hold. Like method handle intrinsics and enter special intrinsics for example. Used to have a different life cycle for OSR nmethods too. > >> > You can't just copy oops. >> >> Yes, this is the main issue at the moment. Can we do this at a safepoint? > > I don't think it solves much. You can't stash away a pointer to the nmethod, roll to a safepoint, and expect the nmethod to not be freed. Even if you did, you still can't copy the oops. > > If we are to do this, I think you want to apply nmethod entry barriers first. That stabilizes the oops. > >> > I'm worried about copying the nmethod epoch counters >> >> We should clear them. If not, it is a bug. > > I'd like to change copying from opt-out to opt-in instead; that would make me feel more comfortable. Then perhaps you can share initialization code that sets up the initial state of the nmethod exactly in the same way as normal nmethods. > > I didn't check but you need to take the Compile_lock and verify dependencies too if you didn't do that, I think, so you don't race with deoptimization. > >> > You don't check if the nmethod is_unloading() when cloning it. >> >> Should such nmethods be not entrant? We don't relocate not entrant nmethods. > > is_not_entrant doesn't imply is_unloading. > >> > What are the consequences of copying the deoptimization generation? >> >> What do you mean? > > I mean is it safe to racingly copy the deoptmization generation when there is concurrent deoptimization? This is why I'd prefer copying to be opt-in rather than opt-out so we don't have to stare at every single field and wonder what will happen when a new nmethod "inherits" state from a different nmethod in interesting races. 
I want it to work as much as possible as normal nmethod installation, starting with a state as close as possible to when the original nmethod was created, as opp... @fisk Thank you for the valuable feedback. Here is a more detailed response to the concerns you brought up > 1 You can't just copy oops. Propagating stale pointers to a new nmethod is not valid and will make the GC vomit. The GC assumes that it can traverse a snapshot of nmethods, and that new nmethods created after that snapshot, will have sane valid oops initially, and hence do not need fixing. Copying stale oops to a new nmethod would violate those invariants and inevitably blow up. Instead of tracking the nmethod pointer, which could become stale, I updated the code to use method handles. I believe the method handle should ensure the method remains valid and we can then relocate its corresponding nmethod. [Reference](https://github.com/chadrako/jdk/blob/027f5245a6829e79bd8624c1cca542c4c24ace5c/src/hotspot/share/runtime/vmOperations.cpp#L106-L110) > 2 Class redefinition tracks in an external data structure which nmethods contained metadata that we want to eventually throw away. This is done to avoid walking the entire code cache just to keep tabs on the one nmethod that still uses the old metadata. If we clone the nmethod without putting it in said data structure, we will blow up. The relocated nmethod is added as a dependent nmethod on all of the MethodHandles and InstanceKlass in its dependency scope. [Reference](https://github.com/chadrako/jdk/blob/027f5245a6829e79bd8624c1cca542c4c24ace5c/src/hotspot/share/code/nmethod.cpp#L1543-L1564) > 3 I'm worried about the initial state of the nmethod entry barrier guard value being copied from the source nmethod, instead of having the initial value we expect for newly created nmethods. It means that the initial invocation will not get the nmethod entry barrier callback.
The GC traverses the nmethods assuming that new nmethods created during the traversal will not start off with weird stale values. The source nmethod entry barrier is now called before copying. I believe this will disarm the barrier and reset the guard value for it to be safe to copy. [Reference](https://github.com/chadrako/jdk/blob/027f5245a6829e79bd8624c1cca542c4c24ace5c/src/hotspot/share/code/nmethod.cpp#L1530) > 4 I'm worried about copying the nmethod epoch counters used by virtual threads to mark which nmethods have been found on-stack. Copying it implies that this nmethod has been found on-stack even though it never has. To me, the implications are unknown, but perhaps you thought about it? Copying this value was not intentional. It should be correctly set to the default value now. [Reference](https://github.com/chadrako/jdk/blob/027f5245a6829e79bd8624c1cca542c4c24ace5c/src/hotspot/share/code/nmethod.cpp#L1441) > 5 You don't check if the nmethod is_unloading() when cloning it. That means you can create a new nmethod that has dead oops from the get go - that cannot be allowed I added this check to ensure the nmethod is not unloading and removed the not entrant check as is unloading implies not entrant. [Reference](https://github.com/chadrako/jdk/blob/027f5245a6829e79bd8624c1cca542c4c24ace5c/src/hotspot/share/code/nmethod.cpp#L1583-L1585) > 6 Have you checked what the JVMCI speculation data and JVMCI data contains and if your approach will break that? JVMCI has an nmethod mirror object that refers back to the nmethod - this is unlikely to work out of the box with cloning. I'm still investigating the JVMCI speculation data and how the nmethod mirror is used. I will follow up when I have a clearer understanding. > 7 By running the operation in a safepoint you a) introduce an obvious latency problem, b) create a new source for stale nmethod pointers that will become stale and burn. The _nm of the safepoint operation might not survive a safepoint.
For example, if a GC safepoint runs first, the GC might decide to unload the nmethod. It then traverses all known pointers to stale nmethods, and cleans them up so that nobody is referring to the nmethod any longer. Naturally, the GC won't know that there is a stale _nm pointer embedded into your VM operation. When you start messing around with it you enter a use-after-free situation and we will blow up. a) Due to the nature of oops it seems a safe point is necessary. I do not see a fix to the latency problem. b) For the stale nmethod pointer issues I updated to use methodHandle. Same reasoning as number 1 > 8 What are the consequences of copying the deoptimization generation? I don't know! This was unintentional and the value is no longer copied. [Reference](https://github.com/chadrako/jdk/blob/027f5245a6829e79bd8624c1cca542c4c24ace5c/src/hotspot/share/code/nmethod.cpp#L1440) > 9 Sometimes the method() is null when using Truffle. I added null check before updating nmethod reference to help avoid this. As mentioned earlier I do not have much knowledge around Truffle/JVMCI so I will follow up on this when I have a better understanding. > 10 Since you don't hold the Compile_lock across the safepoint, it's not obvious to me that you can't get a not_installed nmethod. Can you? I don't know what the consequences are of cloning one of those. The target nmethod will start off as not_installed, but I don't know that it will be made in_use. I updated the code to hold the Compile_lock to ensure we do not relocate nmethods during construction. [Reference](https://github.com/chadrako/jdk/blob/027f5245a6829e79bd8624c1cca542c4c24ace5c/src/hotspot/share/runtime/vmOperations.hpp#L93-L98) > 11 These new special nmethods call post_init after installing the nmethod in the Method, while normally the order is reversed. 
While this may or may not be okay, it introduces a new anomaly where new special nmethods are being special. I moved the post_init call to be more in line with the other constructors. Creating a "special" nmethod was not the intention and I agree the relocation should follow the normal creation where possible. [Reference](https://github.com/chadrako/jdk/blob/027f5245a6829e79bd8624c1cca542c4c24ace5c/src/hotspot/share/code/nmethod.cpp#L1522) ------------- PR Comment: https://git.openjdk.org/jdk/pull/23573#issuecomment-2831412968 From vlivanov at openjdk.org Fri Apr 25 21:20:50 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 25 Apr 2025 21:20:50 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v6] In-Reply-To: <49DMOKtJkD7AtmJuFif9VIIZmM4VYYFYmb-aUmXnG7Q=.7b926824-a81e-4b38-902e-16191b4e46ac@github.com> References: <49DMOKtJkD7AtmJuFif9VIIZmM4VYYFYmb-aUmXnG7Q=.7b926824-a81e-4b38-902e-16191b4e46ac@github.com> Message-ID: On Fri, 25 Apr 2025 09:52:43 GMT, Aleksey Shipilev wrote: > I don't think UMH gets to decide whether !nullptr holder is always alive or not, and it is safer to hold on to it. I looked around and stumbled upon the following code in `ClassLoaderData` [1]. I haven't checked myself, but it looks like a hidden class injected into bootstrap loader has `klass_holder == nullptr` while still being amenable to GC... IMO a check for `method->method_holder()->class_loader_data()->is_permanent_class_loader_data()` would do a better job serving the immediate needs and communicating the intentions. [1] bool ClassLoaderData::is_permanent_class_loader_data() const { return is_builtin_class_loader_data() && !has_class_mirror_holder(); } // Returns true if the class loader for this class loader data is one of // the 3 builtin (boot application/system or platform) class loaders, // including a user-defined system class loader.
Note that if the class // loader data is for a non-strong hidden class then it may // get freed by a GC even if its class loader is one of these loaders. bool ClassLoaderData::is_builtin_class_loader_data() const { ... ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2060892632 From vlivanov at openjdk.org Fri Apr 25 21:22:52 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 25 Apr 2025 21:22:52 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v15] In-Reply-To: References: Message-ID: On Fri, 25 Apr 2025 05:26:35 GMT, Vladimir Ivanov wrote: >> Migrate Vector API math library (SVML and SLEEF) linkage from native code (in JVM) to Java FFM API. >> >> Since FFM API doesn't support vector calling conventions yet, migration affects only symbol lookup for now. But it still enables significant simplifications on JVM side. >> >> The patch consists of the following parts: >> * on-demand symbol lookup in Java code replaces eager lookup from native code during JVM startup; >> * 2 new VM intrinsics for vector calls (support unary and binary shapes) (code separated from unary/binary vector operations); >> * new internal interface to query supported CPU ISA extensions (`jdk.incubator.vector.CPUFeatures`) used for CPU dispatching. >> >> `java.lang.foreign` API is used to perform symbol lookup in vector math library, then the address is cached and fed into corresponding JVM intrinsic, so C2 can turn it into a direct vector call in generated code. >> >> Once `java.lang.foreign` supports vectors & vector calling conventions, VM intrinsics can go away. >> >> Performance is on par with original implementation (tested with microbenchmarks on linux-x64 and macosx-aarch64). >> >> Testing: hs-tier1 - hs-tier6, microbenchmarks (on linux-x64 and macosx-aarch64) >> >> Thanks! 
> > Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: > > Remove UseVectorStubs usage in riscv.ad Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24462#issuecomment-2831450883 From vlivanov at openjdk.org Fri Apr 25 21:26:54 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 25 Apr 2025 21:26:54 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v13] In-Reply-To: References: Message-ID: On Fri, 25 Apr 2025 09:41:21 GMT, Jatin Bhateja wrote: >> It does look attractive, but macro expansion-based solution requires JVM to internalize such operations and their properties. >> >> IMO a higher-level solution based on more generic JVM primitives would enable libraries to properly annotate their operations in Java bytecodes/class files, so C2 can perform such type of transformations without the need to intrinsify each individual operation first. (Think of [JDK-8218414](https://bugs.openjdk.org/browse/JDK-8218414) / [JDK-8347901](https://bugs.openjdk.org/browse/JDK-8347901) on steroids.) > > I agree, this is a typical graph transform which cannot be applied currently because we are generating CallLeafVectorNode upfront during parsing, If we prevent intrinsification then compiler will attempt inlining, generating a much complex graph shape which may not be reducible. I don't see any insurmountable problems performing such transformations on chains of `CallLeafVector` nodes (or any other call nodes). But the missing piece is information about the algebraic properties of native functions JVM can't derive on its own. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2060898080 From vlivanov at openjdk.org Fri Apr 25 21:26:56 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 25 Apr 2025 21:26:56 GMT Subject: Integrated: 8353786: Migrate Vector API math library support to FFM API In-Reply-To: References: Message-ID: On Fri, 4 Apr 2025 22:52:24 GMT, Vladimir Ivanov wrote: > Migrate Vector API math library (SVML and SLEEF) linkage from native code (in JVM) to Java FFM API. > > Since FFM API doesn't support vector calling conventions yet, migration affects only symbol lookup for now. But it still enables significant simplifications on JVM side. > > The patch consists of the following parts: > * on-demand symbol lookup in Java code replaces eager lookup from native code during JVM startup; > * 2 new VM intrinsics for vector calls (support unary and binary shapes) (code separated from unary/binary vector operations); > * new internal interface to query supported CPU ISA extensions (`jdk.incubator.vector.CPUFeatures`) used for CPU dispatching. > > `java.lang.foreign` API is used to perform symbol lookup in vector math library, then the address is cached and fed into corresponding JVM intrinsic, so C2 can turn it into a direct vector call in generated code. > > Once `java.lang.foreign` supports vectors & vector calling conventions, VM intrinsics can go away. > > Performance is on par with original implementation (tested with microbenchmarks on linux-x64 and macosx-aarch64). > > Testing: hs-tier1 - hs-tier6, microbenchmarks (on linux-x64 and macosx-aarch64) > > Thanks! This pull request has now been integrated. 
Changeset: e57fd710 Author: Vladimir Ivanov URL: https://git.openjdk.org/jdk/commit/e57fd710496b2ac8aa93eb3d4ff2234170fa2e37 Stats: 1309 lines in 50 files changed: 834 ins; 389 del; 86 mod 8353786: Migrate Vector API math library support to FFM API Reviewed-by: jbhateja, kvn, psandoz, xgong, jvernee, mli ------------- PR: https://git.openjdk.org/jdk/pull/24462 From vlivanov at openjdk.org Fri Apr 25 21:27:49 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 25 Apr 2025 21:27:49 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v6] In-Reply-To: <49DMOKtJkD7AtmJuFif9VIIZmM4VYYFYmb-aUmXnG7Q=.7b926824-a81e-4b38-902e-16191b4e46ac@github.com> References: <49DMOKtJkD7AtmJuFif9VIIZmM4VYYFYmb-aUmXnG7Q=.7b926824-a81e-4b38-902e-16191b4e46ac@github.com> Message-ID: On Fri, 25 Apr 2025 09:53:03 GMT, Aleksey Shipilev wrote: >> src/hotspot/share/runtime/unloadableMethodHandle.hpp line 43: >> >>> 41: // 3. Final released state. Relevant Method* is in unknown state, and cannot be >>> 42: // accessed. >>> 43: // >> >> Please, elaborate what state transitions are supported. Currently, my understanding is there are 3 transitions and 4 states: >> * 1 -> 2 >> * 2 -> 3 (terminal) >> * 1 -> 3 (terminal) >> * 0 (empty, terminal) > > I added class-level docs for this handle, see if it reads well? Looks good. Thanks. 
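The transitions listed in the review comment above (1 -> 2, 2 -> 3, 1 -> 3, with 0 and 3 terminal) can be modelled as a small state machine. This is only a sketch in Java for illustration, not the HotSpot `UnloadableMethodHandle` code: the class name `HandleStateSketch` and the state names `EMPTY`, `STATE_1` and `STATE_2` are placeholders, since the excerpt only numbers the states and describes state 3 as the final released state.

```java
public class HandleStateSketch {
    // Placeholder names: the quoted comment only numbers the states.
    // 0 = empty, 1 and 2 = unnamed here, 3 = final released state.
    enum State { EMPTY, STATE_1, STATE_2, RELEASED }

    // Allowed moves as listed by the reviewer: 1 -> 2, 2 -> 3, 1 -> 3.
    static boolean canTransition(State from, State to) {
        switch (from) {
            case STATE_1:
                return to == State.STATE_2 || to == State.RELEASED;
            case STATE_2:
                return to == State.RELEASED;
            default:
                return false; // EMPTY (0) and RELEASED (3) are terminal
        }
    }

    // A state is terminal when no transition leaves it.
    static boolean isTerminal(State s) {
        for (State to : State.values()) {
            if (canTransition(s, to)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Print every allowed transition.
        for (State from : State.values()) {
            for (State to : State.values()) {
                if (canTransition(from, to)) {
                    System.out.println(from + " -> " + to);
                }
            }
        }
    }
}
```

Writing the table down this way makes it easy to check that exactly three transitions exist and that both state 0 and state 3 are dead ends, which is the reviewer's stated understanding rather than a confirmed property of the patch.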
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2060899202 From duke at openjdk.org Fri Apr 25 22:21:51 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Fri, 25 Apr 2025 22:21:51 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v13] In-Reply-To: References: Message-ID: <1_OG6abFZwl9AWbxsm5eCrL6RWq1wTnPngdDky6V3f8=.3a126cd0-22c0-45bb-9f85-3f096de116d6@github.com> On Thu, 24 Apr 2025 23:49:36 GMT, Dean Long wrote: >> Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix branch range check > > src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 940: > >> 938: >> 939: static bool reachable_from_branch_at(address branch, address target, bool use_max=false) { >> 940: return uabs(target - branch) < (use_max ? max_branch_range : branch_range); > > This might be the wrong approach. Using the max range will make the assert below fail less often in debug builds, but disables the stress feature of using the shorter 2M range. I did this because during code buffer expansion `pd_set_call_destination` gets called but there is no relocation info at that time. So with debug builds it was incorrectly trying to find a trampoline stub that did not exist yet because it believed it needed to when it didn't. I agree this is probably not the best approach though and I will look for a better solution ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2060956552 From eosterlund at openjdk.org Fri Apr 25 22:33:51 2025 From: eosterlund at openjdk.org (Erik Österlund) Date: Fri, 25 Apr 2025 22:33:51 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v7] In-Reply-To: References: Message-ID: On Fri, 11 Apr 2025 14:43:00 GMT, Erik Österlund wrote: >> I have only skimmed through what you are doing but what I have read makes me worried from a GC point of view.
In general, I am not fond of "special nmethods" that work subtly different to normal nmethods and have their own special life cycles. >> It might be that some of my concerns are false because this is more of a drive by review to sanity check if you thought about the GC implications. These are just random things on top of my head. >> 1) You can't just copy oops. Propagating stale pointers to a new nmethod is not valid and will make the GC vomit. The GC assumes that it can traverse a snapshot of nmethods, and that new nmethods created after that snapshot, will have sane valid oops initially, and hence do not need fixing. Copying stale oops to a new nmethod would violate those invariants and inevitably blow up. >> 2) Class redefinition tracks in an external data structure which nmethods contained metadata that we want to eventually throw away. This is done to avoid walking the entire code cache just to keep tabs on the one nmethod that still uses the old metadata. If we clone the nmethod without putting it in said data structure, we will blow up. >> 3) I'm worried about the initial state of the nmethod entry barrier guard value being copied from the source nmethod, instead of having the initial value we expect for newly created nmethods. It means that the initial invocation will not get the nmethod entry barrier callback. The GC traverses the nmethods assuming that new nmethods created during the traversal will not start off with weird stale values. >> 4) I'm worried about copying the nmethod epoch counters used by virtual threads to mark which nmethods have been found on-stack. Copying it implies that this nmethod has been found on-stack even though it never has. To me, the implications are unknown, but perhaps you thought about it? >> 5) You don't check if the nmethod is_unloading() when cloning it. 
That means you can create a new nmethod that has dead oops from the get go - that cannot be allowed >> 6) Have you checked what the JVMCI speculation data and JVMCI data contain and if your approach will break that? JVMCI has an nmethod mirror object that refers back to the nmethod - this is unlikely to work out of the box with cloning. >> 7) By running the operation in a safepoint you a) introduce an obvious latency problem, b) create a new source for stale nmethod pointers that will become stale and burn. The _nm of the safepoint operation might not survive... > >> Hi @fisk, >> >> Thank you for the very valuable comment. It raises points we have not thought about. >> >> > I am not fond of "special nmethods" that work subtly different to normal nmethods and have their own special life cycles. >> >> It's not clear to me what you mean by "special nmethods". IMO we don't introduce any special nmethods. From my point of view, a normal nmethod is an nmethod for an ordinary Java method. Nmethods for non-ordinary Java methods are special, e.g. native nmethods or method handle linkers (JDK-8263377). I think normal nmethods should be relocatable within CodeCache. > > I mean nmethods with a subtly different life cycle where usual invariants/expectations don't hold. Like method handle intrinsics and enter special intrinsics for example. Used to have a different life cycle for OSR nmethods too. > >> > You can't just copy oops. >> >> Yes, this is the main issue at the moment. Can we do this at a safepoint? > > I don't think it solves much. You can't stash away a pointer to the nmethod, roll to a safepoint, and expect the nmethod to not be freed. Even if you did, you still can't copy the oops. > > If we are to do this, I think you want to apply nmethod entry barriers first. That stabilizes the oops. > >> > I'm worried about copying the nmethod epoch counters >> >> We should clear them. If not, it is a bug.
> > I'd like to change copying from opt-out to opt-in instead; that would make me feel more comfortable. Then perhaps you can share initialization code that sets up the initial state of the nmethod exactly in the same way as normal nmethods. > > I didn't check but you need to take the Compile_lock and verify dependencies too if you didn't do that, I think, so you don't race with deoptimization. > >> > You don't check if the nmethod is_unloading() when cloning it. >> >> Should such nmethods be not entrant? We don't relocate not entrant nmethods. > > is_not_entrant doesn't imply is_unloading. > >> > What are the consequences of copying the deoptimization generation? >> >> What do you mean? > > I mean is it safe to racingly copy the deoptimization generation when there is concurrent deoptimization? This is why I'd prefer copying to be opt-in rather than opt-out so we don't have to stare at every single field and wonder what will happen when a new nmethod "inherits" state from a different nmethod in interesting races. I want it to work as much as possible as normal nmethod installation, starting with a state as close as possible to when the original nmethod was created, as opp... > @fisk Thank you for the valuable feedback. Here is a more detailed response to the concerns you brought up Thanks, it's shaping up. > Instead of tracking the nmethod pointer which could become stale I updated the code to use method handles. I believe the method handle should ensure the method remains valid and we can then relocate its corresponding nmethod. [Reference](https://github.com/chadrako/jdk/blob/027f5245a6829e79bd8624c1cca542c4c24ace5c/src/hotspot/share/runtime/vmOperations.cpp#L106-L110) The safepoint is still causing more trouble than it solves. It was introduced due to oop phobia. What the oops really needed to stabilize is to run the entry barrier which you do now.
The safepoint merely destabilizes the oops again while introducing latency problems and fun class redefinition interactions. It should be removed as I can't see that it serves any purpose. > The relocated nmethod is added as a dependent nmethod on all of the MethodHandles and InstanceKlass in its dependency scope. [Reference](https://github.com/chadrako/jdk/blob/027f5245a6829e79bd8624c1cca542c4c24ace5c/src/hotspot/share/code/nmethod.cpp#L1543-L1564) My concern was about something else - a table tracks all the nmethods that have old metadata in order to speed up a walk over the code cache that finds said nmethods. This should be dealt with by not relocating nmethods with evol dependencies/metadata and by not safepointing, which could introduce class redefinition which populates this table. > The source nmethod entry barrier is now called before copying. I believe this will disarm the barrier and reset the guard value for it to be safe to copy. [Reference](https://github.com/chadrako/jdk/blob/027f5245a6829e79bd8624c1cca542c4c24ace5c/src/hotspot/share/code/nmethod.cpp#L1530) Yes and fix the oops so you don't need a safepoint. > Copying this value was not intentional. It should be correctly set to the default value now. [Reference](https://github.com/chadrako/jdk/blob/027f5245a6829e79bd8624c1cca542c4c24ace5c/src/hotspot/share/code/nmethod.cpp#L1441) Good. > I added this check to ensure the nmethod is not unloading and removed the not entrant check as is unloading implies not entrant. [Reference](https://github.com/chadrako/jdk/blob/027f5245a6829e79bd8624c1cca542c4c24ace5c/src/hotspot/share/code/nmethod.cpp#L1583-L1585) That's not quite true. There are two separate mechanisms that guard the entry. When an nmethod becomes invalid due to, for example, a broken speculative assumption, the verified entry is patched with a jump to a trampoline. This is what is_not_entrant refers to and nmethods are made not entrant one by one.
is_unloaded() is a separate mechanism using the nmethod entry barriers which can be bulk armed instead, and uses a conditional branch to guard the entry. This is used by the GC to guard the entry. You should only relocate nmethods that are is_in_use(). > I'm still investigating the JVMCI speculation data and how the nmethod mirror is used. I will follow up when I have a clearer understanding. Sounds good. > a) Due to the nature of oops it seems a safepoint is necessary. I do not see a fix to the latency problem. > > b) For the stale nmethod pointer issues I updated to use methodHandle. Same reasoning as number 1 I hope I explained by now that the safepoint doesn't really help with the oops. > This was unintentional and the value is no longer copied. [Reference](https://github.com/chadrako/jdk/blob/027f5245a6829e79bd8624c1cca542c4c24ace5c/src/hotspot/share/code/nmethod.cpp#L1440) Good. > I added a null check before updating nmethod reference to help avoid this. As mentioned earlier I do not have much knowledge around Truffle/JVMCI so I will follow up on this when I have a better understanding. Sounds good. > I updated the code to hold the Compile_lock to ensure we do not relocate nmethods during construction. [Reference](https://github.com/chadrako/jdk/blob/027f5245a6829e79bd8624c1cca542c4c24ace5c/src/hotspot/share/runtime/vmOperations.hpp#L93-L98) Okay. Speaking of which, seems like the NMethodState_lock is held for way too long - usually just held when setting the Method code and updating the nmethod state after the initial state is set. Keeping the lock across other things makes me worried about deadlocks. > I moved the post_init call to be more in line with the other constructors. Creating a "special" nmethod was not the intention and I agree the relocation should follow the normal creation where possible. [Reference](https://github.com/chadrako/jdk/blob/027f5245a6829e79bd8624c1cca542c4c24ace5c/src/hotspot/share/code/nmethod.cpp#L1522) Thanks.
------------- PR Comment: https://git.openjdk.org/jdk/pull/23573#issuecomment-2831542576 From duke at openjdk.org Fri Apr 25 22:52:53 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Fri, 25 Apr 2025 22:52:53 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v13] In-Reply-To: <1_OG6abFZwl9AWbxsm5eCrL6RWq1wTnPngdDky6V3f8=.3a126cd0-22c0-45bb-9f85-3f096de116d6@github.com> References: <1_OG6abFZwl9AWbxsm5eCrL6RWq1wTnPngdDky6V3f8=.3a126cd0-22c0-45bb-9f85-3f096de116d6@github.com> Message-ID: On Fri, 25 Apr 2025 22:18:08 GMT, Chad Rakoczy wrote: >> src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 940: >> >>> 938: >>> 939: static bool reachable_from_branch_at(address branch, address target, bool use_max=false) { >>> 940: return uabs(target - branch) < (use_max ? max_branch_range : branch_range); >> >> This might be the wrong approach. Using the max range will make the assert below fail less often in debug builds, but disables the stress feature of using the shorter 2M range. > > ~I did this because during code buffer expansion `pd_set_call_destination` gets called but there is no relocation info at that time. So with debug builds it was incorrectly trying to find a trampoline stub that did not exist yet because it believed it needed to when it didn't. I agree this is probably not the best approach though and I will look for a better solution~ Actually the issue is not during code buffer expansion. It's called when creating a new nmethod that I can only get to occur when using the Graal compiler. So it may not be true that calls always have trampolines in the case of Graal. 
This _fix_ may just make the bug harder to encounter ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2060974982 From duke at openjdk.org Sat Apr 26 01:06:55 2025 From: duke at openjdk.org (Mohamed Issa) Date: Sat, 26 Apr 2025 01:06:55 GMT Subject: RFR: 8348638: Performance regression in Math.tanh [v9] In-Reply-To: References: Message-ID: > The changes described below are meant to resolve the performance regression introduced by the **x86_64 tanh** double precision floating point scalar intrinsic in #20657. Additionally, new constant value micro-benchmarks are included alongside a new micro-benchmark to check the performance of specific input value ranges to help prevent regressions in the future. > > 1. Check and handle high magnitude input values before those in other ranges. If found, **+/- 1** is returned almost immediately without having to go through too many computations or branches. > 2. Reduce the lower bound of the input range that triggers a quick **+/- 1** return from **|x| >= 32** to **|x| >= 22**. This new endpoint is the exact value required for correctness that's used by the original OpenJDK implementation. > > The results of all tests posted below were captured with an [Intel® Xeon 6761P](https://www.intel.com/content/www/us/en/products/sku/241842/intel-xeon-6761p-processor-336m-cache-2-50-ghz/specifications.html) using [OpenJDK v25-b15](https://github.com/openjdk/jdk/releases/tag/jdk-25%2B15) as the baseline version. The term _baseline1_ refers to runs with the intrinsic enabled and _baseline2_ refers to runs with the intrinsic disabled. > > For the first set of performance data collected with the new built-in **tanhRange** micro-benchmark, see the tables below. Each result is the mean of 8 individual runs, and the input ranges used match those in the bug report with two additional ones included. In the high value scenarios (100, 1000, 10000, 100000), the changes increase throughput values over _baseline1_.
Also, there is a small negative impact to the low value (1, 2, 10, 20) scenarios compared to _baseline1_. When comparing against _baseline2_, the changes have significant uplift with the lower value inputs (1, 2, 10, 20, 100). However, they slightly lag behind _baseline2_ when the high value inputs (1000, 10000, 100000) are used. > > | Input range(s) | Baseline1 (ops/s) | Change (ops/s) | Change vs baseline1 (%) | > | :-------------------: | :-----------------: | :----------------: | :-------------------------: | > | [-1, 1] | 103342 | 103705 | +0.35 | > | [-2, 2] | 99977 | 100819 | +0.84 | > | [-10, 10] | 99147 | 100240 | +1.10 | > | [-20, 20] ... Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision: Create separate tanh micro-benchmark module to avoid noise in MathBench ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23889/files - new: https://git.openjdk.org/jdk/pull/23889/files/66be269e..006eef6a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23889&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23889&range=07-08 Stats: 220 lines in 2 files changed: 154 ins; 65 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23889.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23889/head:pull/23889 PR: https://git.openjdk.org/jdk/pull/23889 From kvn at openjdk.org Sat Apr 26 01:30:01 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sat, 26 Apr 2025 01:30:01 GMT Subject: RFR: 8350209: Preserve adapters in AOT cache [v7] In-Reply-To: References: Message-ID: On Tue, 22 Apr 2025 23:05:13 GMT, Vladimir Kozlov wrote: >> [JEP 483](https://bugs.openjdk.org/browse/JDK-8315737) preserves class information in AOT cache which helps Java startup performance. >> >> We should also preserve adapters (i2c, c2i) to further improve performance of class linking where adapters are generated. >> >> Short running Java application can see several percents improvement. 
I got 6% improvement when ran `HelloWorld.java` on Linux-x64 Ice Lake CPU (2.5Ghz): >> >> >> (perf stat -r 100 java -XX:AOTCache=app.aotcache -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed >> 0.0299401 +- 0.0000504 seconds time elapsed ( +- 0.17% ) >> >> (perf stat -r 100 java -XX:AOTCache=app.aotcache -XX:+UnlockDiagnosticVMOptions -XX:-AOTAdapterCaching -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed >> 0.0318654 +- 0.0000535 seconds time elapsed ( +- 0.17% ) >> >> >> New diagnostic flags are introduced (use `-XX:+UnlockDiagnosticVMOptions` to unlock them): >> >> >> -XX:+AOTAdapterCaching - Enable or disable saving and restoring i2c2i adapters >> -XX:AOTCodeMaxSize=10*M - buffer size in bytes for AOT code caching >> -XX:+AbortVMOnAOTCodeFailure - Abort VM on the first occurrence of AOT code caching failure >> >> By default `AOTAdapterCaching` is `false` and enabled ergonomically when `-XX:AOTCache` is specified. >> This flag is ignored when `AOTCache` is not specified. >> >> To use AOT adapters follow process described in JEP 483: >> >> >> java -XX:AOTMode=record -XX:AOTConfiguration=app.aotconf -cp app.jar App >> java -XX:AOTMode=create -XX:AOTConfiguration=app.aotconf -XX:AOTCache=app.aot -cp app.jar >> java -XX:AOTCache=app.aot -cp app.jar App >> >> >> There are several new UL flag combinations to trace the AOT code caching process: >> >> >> -Xlog:aot+codecache+init -Xlog:aot+codecache+exit -Xlog:aot+codecache+stubs >> >> >> @ashu-mehra is main author of changes. He implemented adapters caching. >> I did main framework (`AOTCodeCache` class) for saving and loading AOT code. >> >> Tested tier1-6,10, which includes tests with `AOTClassLinking` enabled. Also Xcomp,stress and JCK. > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Fix message I found that C strings caching has issue - it is concurrent and needs synchronization. 
Also I see a lot more strings after recent changes in mainline. After investigation I filed separate RFE for mainline and I am working on it: [JDK-8355635: Do not collect C strings in C2 scratch buffer](https://bugs.openjdk.org/browse/JDK-8355635) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24740#issuecomment-2831713648 From duke at openjdk.org Sat Apr 26 02:27:51 2025 From: duke at openjdk.org (duke) Date: Sat, 26 Apr 2025 02:27:51 GMT Subject: RFR: 8355562: RISC-V: Cleanup names of vector-scalar instructions in riscv_v.ad [v2] In-Reply-To: References: <9pPaDS5emcWnw5TSf6yPMx9oYEODHaJ8oNUB0v9SMuc=.bb4262ca-178c-4d30-be85-a7d6862252b3@github.com> Message-ID: On Fri, 25 Apr 2025 13:01:14 GMT, Anjian-Wen wrote: >> RISC-V: Cleanup names of vector-scalar instructions in riscv_v.ad > > Anjian-Wen has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge branch 'openjdk:master' into JDK-8355562 > - RISC-V: Cleanup names of vector-scalar instructions in riscv_v.ad @Anjian-Wen Your change (at version 7c77276e61420c0bd85692b79aa57f7488cdbb44) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24865#issuecomment-2831751155 From kvn at openjdk.org Sat Apr 26 02:44:55 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sat, 26 Apr 2025 02:44:55 GMT Subject: RFR: 8355635: Do not collect C strings in C2 scratch buffer Message-ID: [JDK-8349479](https://bugs.openjdk.org/browse/JDK-8349479) added call to `code_string()` for Halt node mach node. I am observing several more creation and clearing C strings collections during C2 compilation: [17.405s][debug][codestrings] Clear 2 asm-remarks. [17.405s][debug][codestrings] Clear 1 dbg-string. Most are coming from temporary scratch buffer C2 uses for code size calculation. 
I suggest to not collect strings in this buffer. Note, `CodeSection::set_scratch_emit()` is only called from [PhaseOutput::scratch_emit_size()](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/output.cpp#L3368) for scratch buffer. I verified with `-XX:CompileCommand=print,::` that hsdis output is the same. Running on linux-x64 with fastdebug VM: before: $ java -XX:-TieredCompilation -Xlog:codestrings=debug com.sun.tools.javac.Main HelloWorld.java | grep codestrings |wc 1644 6576 80618 after again: $ java -XX:-TieredCompilation -Xlog:codestrings=debug com.sun.tools.javac.Main HelloWorld.java | grep codestrings |wc 0 0 0 It is more dramatic with `-Xcomp` we use for testing: Before $ java -XX:-TieredCompilation -Xcomp -Xlog:codestrings=debug com.sun.tools.javac.Main HelloWorld.java | grep codestrings |wc 70924 283696 3533261 After fix $ java -XX:-TieredCompilation -Xcomp -Xlog:codestrings=debug com.sun.tools.javac.Main HelloWorld.java | grep codestrings |wc 0 0 0 I was curious why it is 0 - we do deoptimize nmethod. But with big default CodeCache GC does not collect them. Reducing CodeCache to 8Mb shows deallocation: $ java -XX:-TieredCompilation -Xcomp -Xlog:codestrings=debug -Xlog:codecache=debug -XX:+PrintCodeCache -XX:ReservedCodeCacheSize=8M com.sun.tools.javac.Main HelloWorld.java ... [40.196s][debug][codestrings] Clear 42 asm-remarks. [40.196s][debug][codestrings] Clear 1 dbg-string. [40.196s][debug][codecache ] *flushing nmethod 5811/0x00007f71d74b7188. Live blobs:3370/Free CodeCache:2953Kb Tested tier1-5, Xcomp,comp-stress. 
------------- Commit messages: - 8355635: Do not collect C strings in C2 scratch buffer Changes: https://git.openjdk.org/jdk/pull/24893/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24893&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8355635 Stats: 6 lines in 1 file changed: 6 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24893.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24893/head:pull/24893 PR: https://git.openjdk.org/jdk/pull/24893 From duke at openjdk.org Sat Apr 26 03:01:50 2025 From: duke at openjdk.org (Anjian-Wen) Date: Sat, 26 Apr 2025 03:01:50 GMT Subject: Integrated: 8355562: RISC-V: Cleanup names of vector-scalar instructions in riscv_v.ad In-Reply-To: <9pPaDS5emcWnw5TSf6yPMx9oYEODHaJ8oNUB0v9SMuc=.bb4262ca-178c-4d30-be85-a7d6862252b3@github.com> References: <9pPaDS5emcWnw5TSf6yPMx9oYEODHaJ8oNUB0v9SMuc=.bb4262ca-178c-4d30-be85-a7d6862252b3@github.com> Message-ID: On Fri, 25 Apr 2025 02:15:57 GMT, Anjian-Wen wrote: > RISC-V: Cleanup names of vector-scalar instructions in riscv_v.ad This pull request has now been integrated. Changeset: 91a9043f Author: Anjian-Wen Committer: Fei Yang URL: https://git.openjdk.org/jdk/commit/91a9043f9df0e345719df3bfd0a7d0f2a96e6109 Stats: 146 lines in 1 file changed: 0 ins; 0 del; 146 mod 8355562: RISC-V: Cleanup names of vector-scalar instructions in riscv_v.ad Reviewed-by: fyang, fjiang ------------- PR: https://git.openjdk.org/jdk/pull/24865 From jbhateja at openjdk.org Sat Apr 26 03:33:55 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sat, 26 Apr 2025 03:33:55 GMT Subject: RFR: 8342676: Unsigned Vector Min / Max transforms [v2] In-Reply-To: References: <21riF_Q0FMyzOh_sakTclKfYa-nJm4klfkyHEYi4ctI=.76933a14-fb5e-447e-873a-59a2b870b842@github.com> Message-ID: On Tue, 25 Feb 2025 17:49:33 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: >> >> - Updating copyright year of modified files >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8342676 >> - Update IR transforms and tests >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8342676 >> - 8342676: Unsigned Vector Min / Max transforms > > @jatin-bhateja Just ping me here if this is ready for another review ;) Thanks @eme64 and @sviswa7 for your review and approvals. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21604#issuecomment-2831799393 From jbhateja at openjdk.org Sat Apr 26 03:33:56 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sat, 26 Apr 2025 03:33:56 GMT Subject: Integrated: 8342676: Unsigned Vector Min / Max transforms In-Reply-To: <21riF_Q0FMyzOh_sakTclKfYa-nJm4klfkyHEYi4ctI=.76933a14-fb5e-447e-873a-59a2b870b842@github.com> References: <21riF_Q0FMyzOh_sakTclKfYa-nJm4klfkyHEYi4ctI=.76933a14-fb5e-447e-873a-59a2b870b842@github.com> Message-ID: On Mon, 21 Oct 2024 11:01:04 GMT, Jatin Bhateja wrote: > Adding following IR transforms for unsigned vector Min / Max nodes. > > => UMinV (UMinV(a, b), UMaxV(a, b)) => UMinV(a, b) > => UMinV (UMinV(a, b), UMaxV(b, a)) => UMinV(a, b) > => UMaxV (UMinV(a, b), UMaxV(a, b)) => UMaxV(a, b) > => UMaxV (UMinV(a, b), UMaxV(b, a)) => UMaxV(a, b) > => UMaxV (a, a) => a > => UMinV (a, a) => a > > New IR validation test accompanies the patch. > > This is a follow-up PR for https://github.com/openjdk/jdk/pull/20507 > > Best Regards, > Jatin This pull request has now been integrated. 
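The six transforms quoted above are lane-wise identities, so they can be sanity-checked on scalar unsigned values. The following is a minimal sketch under that assumption: `umin` and `umax` are illustrative helpers built on `Integer.compareUnsigned`, standing in for one lane of `UMinV` / `UMaxV`, and the class name is made up for this example. It is not the C2 implementation or the IR test shipped with the patch.

```java
public class UnsignedMinMaxIdentities {
    // Scalar stand-ins for a single lane of UMinV / UMaxV.
    static int umin(int a, int b) {
        return Integer.compareUnsigned(a, b) <= 0 ? a : b;
    }

    static int umax(int a, int b) {
        return Integer.compareUnsigned(a, b) >= 0 ? a : b;
    }

    // Checks the six identities from the PR description for one input pair.
    static boolean identitiesHold(int a, int b) {
        return umin(umin(a, b), umax(a, b)) == umin(a, b)
            && umin(umin(a, b), umax(b, a)) == umin(a, b)
            && umax(umin(a, b), umax(a, b)) == umax(a, b)
            && umax(umin(a, b), umax(b, a)) == umax(a, b)
            && umax(a, a) == a
            && umin(a, a) == a;
    }

    public static void main(String[] args) {
        // Include values whose signed and unsigned ordering differ.
        int[] samples = {0, 1, 42, -1, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int a : samples) {
            for (int b : samples) {
                if (!identitiesHold(a, b)) {
                    throw new AssertionError("identity failed for " + a + ", " + b);
                }
            }
        }
        System.out.println("all identities hold on the sample set");
    }
}
```

The sample set deliberately mixes negative and positive `int` values, because those are the inputs where unsigned comparison disagrees with signed comparison (for example, `-1` is the largest unsigned value).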
Changeset: 3b3a055d Author: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/3b3a055d7605338e93814ccfe2a4a18a7786f43f Stats: 633 lines in 5 files changed: 633 ins; 0 del; 0 mod 8342676: Unsigned Vector Min / Max transforms Reviewed-by: sviswanathan, epeter ------------- PR: https://git.openjdk.org/jdk/pull/21604 From jbhateja at openjdk.org Sat Apr 26 03:40:45 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sat, 26 Apr 2025 03:40:45 GMT Subject: RFR: 8348638: Performance regression in Math.tanh [v9] In-Reply-To: References: Message-ID: On Sat, 26 Apr 2025 01:06:55 GMT, Mohamed Issa wrote: >> The changes described below are meant to resolve the performance regression introduced by the **x86_64 tanh** double precision floating point scalar intrinsic in #20657. Additionally, new constant value micro-benchmarks are included alongside a new micro-benchmark to check the performance of specific input value ranges to help prevent regressions in the future. >> >> 1. Check and handle high magnitude input values before those in other ranges. If found, **+/- 1** is returned almost immediately without having to go through too many computations or branches. >> 2. Reduce the lower bound of the input range that triggers a quick **+/- 1** return from **|x| >= 32** to **|x| >= 22**. This new endpoint is the exact value required for correctness that's used by the original OpenJDK implementation. >> >> The results of all tests posted below were captured with an [Intel® Xeon 6761P](https://www.intel.com/content/www/us/en/products/sku/241842/intel-xeon-6761p-processor-336m-cache-2-50-ghz/specifications.html) using [OpenJDK v25-b15](https://github.com/openjdk/jdk/releases/tag/jdk-25%2B15) as the baseline version. The term _baseline1_ refers to runs with the intrinsic enabled and _baseline2_ refers to runs with the intrinsic disabled. >> >> For the first set of performance data collected with the new built-in **tanhRange** micro-benchmark, see the tables below.
Each result is the mean of 8 individual runs, and the input ranges used match those in the bug report with two additional ones included. In the high value scenarios (100, 1000, 10000, 100000), the changes increase throughput values over _baseline1_. Also, there is a small negative impact to the low value (1, 2, 10, 20) scenarios compared to _baseline1_. When comparing against _baseline2_, the changes have significant uplift with the lower value inputs (1, 2, 10, 20, 100). However, they slightly lag behind _baseline2_ when the high value inputs (1000, 10000, 100000) are used.
>>
>> | Input range(s) | Baseline1 (ops/s) | Change (ops/s) | Change vs baseline1 (%) |
>> | :------------: | :---------------: | :------------: | :---------------------: |
>> | [-1, 1]        | 103342            | 103705         | +0.35                   |
>> | [-2, 2]        | 99977             | 100819         | +0.84                   |
>> | [-10, 10]      | 99147             | 100240         | +1.10                   |
> ...
> > Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision: > > Create separate tanh micro-benchmark module to avoid noise in MathBench Thanks @missa-prime. LGTM ------------- Marked as reviewed by jbhateja (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23889#pullrequestreview-2795713635 From duke at openjdk.org Sat Apr 26 04:11:49 2025 From: duke at openjdk.org (Mohamed Issa) Date: Sat, 26 Apr 2025 04:11:49 GMT Subject: RFR: 8348638: Performance regression in Math.tanh [v7] In-Reply-To: References: Message-ID: On Fri, 25 Apr 2025 12:07:33 GMT, Jatin Bhateja wrote: >> I switched to individual static final fields with constant values. There are separate micro-benchmarks included as well. > > Hi @missa-prime , I still feel that due to new parameter we have significantly increased the overall benchmark execution time as other kernels are redundantly executed multiple times.
> > Please find attached a reference implementation of new benchmark and lets revert adding new parameters from MathBench.java > [TanhPerf.txt](https://github.com/user-attachments/files/19907952/TanhPerf.txt) Thanks Jatin. The new benchmark is included. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23889#discussion_r2061149870 From kvn at openjdk.org Sat Apr 26 22:49:52 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sat, 26 Apr 2025 22:49:52 GMT Subject: RFR: 8355003: Implement Ahead-of-Time Method Profiling In-Reply-To: References: Message-ID: <3JsetDO0CDSY3TVxn9pC7YfbNq8-BDZ2UwNo38qJuOc=.e5b30111-294e-45ba-a9aa-cf8d09e26d45@github.com> On Fri, 25 Apr 2025 20:18:41 GMT, Igor Veresov wrote: > Improve warm-up time by making profile data from a previous run of an application instantly available, when the HotSpot Java Virtual Machine starts. Specifically, enhance the [AOT cache](https://openjdk.org/jeps/483) to store method execution profiles from training runs, reducing profiling delays in subsequent production runs. > > More details in the JEP: https://bugs.openjdk.org/browse/JDK-8325147 In general it is okay. Please, use UL with `(aot, training)` tags for your code. I noticed (and added comment) that you use `guarantee()` which crashes VM when you call new `verify` methods. Can you disable TD instead and continue execution? src/hotspot/share/cds/archiveBuilder.cpp line 770: > 768: relocate_embedded_pointers(&_rw_src_objs); > 769: relocate_embedded_pointers(&_ro_src_objs); > 770: log_info(cds)("Relocating %zu pointers, %zu tagged, %zu nulled", `log_info(aot)` if it is Leyden related. src/hotspot/share/cds/dumpAllocStats.hpp line 151: > 149: } > 150: > 151: void record_dynamic_proxy_class() { This is not called. This code seems not related. 
src/hotspot/share/ci/ciInstanceKlass.hpp line 47: > 45: friend class ciField; > 46: friend class ciReplay; > 47: friend class CompileTrainingData; Not referenced here src/hotspot/share/ci/ciMethod.cpp line 1147: > 1145: // heuristic (e.g. post call nop instructions; see InlineSkippedInstructionsCounter) > 1146: int ciMethod::inline_instructions_size() { > 1147: if (_inline_instructions_size == -1) { Why repeat this condition and not put new code under existing one? src/hotspot/share/ci/ciMethodData.cpp line 71: > 69: > 70: bool is_live(Method* m) { > 71: Klass* holder = m->method_holder(); Changes in this file seems not related and can be pushed/tested separately. If they are related - there should be condition for additional checks. src/hotspot/share/ci/ciObjectFactory.hpp line 2: > 1: /* > 2: * Copyright (c) 1999, 2020, Oracle and/or its affiliates. All rights reserved. Wrong year src/hotspot/share/compiler/compileBroker.hpp line 456: > 454: }; > 455: > 456: class TrainingReplayThread : public JavaThread { Add comment explaining what this thread does exactly. src/hotspot/share/memory/allocation.cpp line 89: > 87: } > 88: > 89: // Work-around -- see JDK-8331086 We should not use bug ID in comment (here and in .hpp file). Please, explain in the comment why you do that. src/hotspot/share/oops/methodCounters.cpp line 57: > 55: > 56: MethodCounters::MethodCounters() { > 57: assert(CDSConfig::is_dumping_static_archive() || UseSharedSpaces, "only for CDS"); The same comment as for `MethodData()`.
src/hotspot/share/oops/methodCounters.cpp line 82: > 80: > 81: void MethodCounters::metaspace_pointers_do(MetaspaceClosure* it) { > 82: log_trace(cds)("Iter(MethodCounters): %p", this); Use `log_trace(aot, training)` src/hotspot/share/oops/methodCounters.hpp line 129: > 127: void set_highest_osr_comp_level(int level) { _highest_osr_comp_level = (u1)level; } > 128: > 129: Extra empty line src/hotspot/share/oops/methodData.cpp line 1296: > 1294: > 1295: MethodData::MethodData() { > 1296: assert(CDSConfig::is_dumping_static_archive() || UseSharedSpaces, "only for CDS"); 1. Should its code be guarded by `#if INCLUDE_CDS`? 2. Comment where/how it is used. 3. Is it used in all phases or only during TRAINING and ASSEMBLY? 4. Can you add query methods into `CDSConfig` which you can call here and in other places?: is_dumping_training_data() is_using_training_data() src/hotspot/share/oops/methodData.cpp line 1434: > 1432: > 1433: bool MethodData::is_mature() const { > 1434: return CompilationPolicy::is_mature((MethodData*)this); Why you need the cast? src/hotspot/share/oops/methodData.cpp line 1796: > 1794: > 1795: void MethodData::metaspace_pointers_do(MetaspaceClosure* it) { > 1796: log_trace(cds)("Iter(MethodData): %p for %p %s", this, _method, _method->name_and_sig_as_C_string()); We discussed yesterday and we need to use `aot` instead `cds` for Leyden. `aot` tag is already in mainline. I suggest to use new tag `(aot, training)` to separate your output from general CDS/AOT output. src/hotspot/share/oops/trainingData.cpp line 2: > 1: /* > 2: * Copyright (c) 2023, 2025, Oracle and/or its affiliates. All rights reserved. Use one year src/hotspot/share/oops/trainingData.cpp line 54: > 52: > 53: MethodTrainingData::MethodTrainingData() { > 54: assert(CDSConfig::is_dumping_static_archive() || UseSharedSpaces, "only for CDS"); Consider adding and using `CDSConfig::is_dumping_training_data() ` or something. 
src/hotspot/share/oops/trainingData.cpp line 76: > 74: > 75: static void verify_archived_entry(TrainingData* td, const TrainingData::Key* k) { > 76: guarantee(TrainingData::Key::can_compute_cds_hash(k), ""); Should we gracefully disable using TD instead of crashing VM? src/hotspot/share/oops/trainingData.cpp line 545: > 543: if (is_excluded) { > 544: ResourceMark rm; > 545: log_debug(cds)("Cleanup KTD %s", name()->as_klass_external_name()); `log_debug(aot, training)` src/hotspot/share/oops/trainingData.hpp line 2: > 1: /* > 2: * Copyright (c) 2023, 2025, Oracle and/or its affiliates. All rights reserved. This is debatable but suggestion to use only one year (2025) for new files. src/hotspot/share/oops/trainingData.hpp line 756: > 754: int highest_level() const { return highest_level(_level_mask); } > 755: int highest_top_level() const { return _highest_top_level; } > 756: MethodData* final_profile() const { return _final_profile; } `never_inlined()`, `highest_level()` are not used. src/hotspot/share/runtime/init.cpp line 189: > 187: #endif > 188: > 189: if (TrainingData::have_data() || TrainingData::need_data()) { Why 2 checks? Comment please. test/hotspot/jtreg/runtime/cds/appcds/aotProfile/AOTProfileFlags.java line 30: > 28: * @requires vm.cds > 29: * @comment work around JDK-8345635 > 30: * @requires !vm.jvmci.enabled Consider adding: * @requires vm.cds.supports.aot.class.linking * @requires vm.flagless ------------- Changes requested by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24886#pullrequestreview-2796434540 PR Review Comment: https://git.openjdk.org/jdk/pull/24886#discussion_r2061680543 PR Review Comment: https://git.openjdk.org/jdk/pull/24886#discussion_r2061639984 PR Review Comment: https://git.openjdk.org/jdk/pull/24886#discussion_r2061668601 PR Review Comment: https://git.openjdk.org/jdk/pull/24886#discussion_r2061637565 PR Review Comment: https://git.openjdk.org/jdk/pull/24886#discussion_r2061635488 PR Review Comment: https://git.openjdk.org/jdk/pull/24886#discussion_r2061668094 PR Review Comment: https://git.openjdk.org/jdk/pull/24886#discussion_r2061631348 PR Review Comment: https://git.openjdk.org/jdk/pull/24886#discussion_r2061629193 PR Review Comment: https://git.openjdk.org/jdk/pull/24886#discussion_r2061626756 PR Review Comment: https://git.openjdk.org/jdk/pull/24886#discussion_r2061626460 PR Review Comment: https://git.openjdk.org/jdk/pull/24886#discussion_r2061626301 PR Review Comment: https://git.openjdk.org/jdk/pull/24886#discussion_r2061626021 PR Review Comment: https://git.openjdk.org/jdk/pull/24886#discussion_r2061625757 PR Review Comment: https://git.openjdk.org/jdk/pull/24886#discussion_r2061625222 PR Review Comment: https://git.openjdk.org/jdk/pull/24886#discussion_r2061687468 PR Review Comment: https://git.openjdk.org/jdk/pull/24886#discussion_r2061690598 PR Review Comment: https://git.openjdk.org/jdk/pull/24886#discussion_r2061694717 PR Review Comment: https://git.openjdk.org/jdk/pull/24886#discussion_r2061697540 PR Review Comment: https://git.openjdk.org/jdk/pull/24886#discussion_r2061627798 PR Review Comment: https://git.openjdk.org/jdk/pull/24886#discussion_r2061664186 PR Review Comment: https://git.openjdk.org/jdk/pull/24886#discussion_r2061612509 PR Review Comment: https://git.openjdk.org/jdk/pull/24886#discussion_r2061611330 From jrose at openjdk.org Sat Apr 26 22:58:44 2025 From: jrose at openjdk.org (John R Rose) Date: Sat, 26 Apr 2025 22:58:44 GMT Subject: 
RFR: 8355635: Do not collect C strings in C2 scratch buffer In-Reply-To: References: Message-ID: On Sat, 26 Apr 2025 02:39:58 GMT, Vladimir Kozlov wrote: > [JDK-8349479](https://bugs.openjdk.org/browse/JDK-8349479) added call to `code_string()` for Halt node mach node. > I am observing several more creation and clearing C strings collections during C2 compilation: > [17.405s][debug][codestrings] Clear 2 asm-remarks. > [17.405s][debug][codestrings] Clear 1 dbg-string. > > Most are coming from temporary scratch buffer C2 uses for code size calculation. I suggest to not collect strings in this buffer. > > Note, `CodeSection::set_scratch_emit()` is only called from [PhaseOutput::scratch_emit_size()](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/output.cpp#L3368) for scratch buffer. > > I verified with `-XX:CompileCommand=print,::` that hsdis output is the same. > > Running on linux-x64 with fastdebug VM: > > before: > $ java -XX:-TieredCompilation -Xlog:codestrings=debug com.sun.tools.javac.Main HelloWorld.java | grep codestrings |wc > 1644 6576 80618 > > after again: > $ java -XX:-TieredCompilation -Xlog:codestrings=debug com.sun.tools.javac.Main HelloWorld.java | grep codestrings |wc > 0 0 0 > > > It is more dramatic with `-Xcomp` we use for testing: > > Before > $ java -XX:-TieredCompilation -Xcomp -Xlog:codestrings=debug com.sun.tools.javac.Main HelloWorld.java | grep codestrings |wc > 70924 283696 3533261 > > After fix > $ java -XX:-TieredCompilation -Xcomp -Xlog:codestrings=debug com.sun.tools.javac.Main HelloWorld.java | grep codestrings |wc > 0 0 0 > > > I was curious why it is 0 - we do deoptimize nmethod. But with big default CodeCache GC does not collect them. > Reducing CodeCache to 8Mb shows deallocation: > > $ java -XX:-TieredCompilation -Xcomp -Xlog:codestrings=debug -Xlog:codecache=debug -XX:+PrintCodeCache -XX:ReservedCodeCacheSize=8M com.sun.tools.javac.Main HelloWorld.java > ... 
> [40.196s][debug][codestrings] Clear 42 asm-remarks. > [40.196s][debug][codestrings] Clear 1 dbg-string. > [40.196s][debug][codecache ] *flushing nmethod 5811/0x00007f71d74b7188. Live blobs:3370/Free CodeCache:2953Kb > > > Tested tier1-5, Xcomp,comp-stress. Yeah, the zero was puzzling to me too; looked like evidence that we killed all the strings, not just the ones in the scratch assembly. But your extra demonstration shows that we collect the strings we want for the non-scratch assemblies. Good change. ------------- Marked as reviewed by jrose (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24893#pullrequestreview-2796562739 From iveresov at openjdk.org Sat Apr 26 23:38:46 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Sat, 26 Apr 2025 23:38:46 GMT Subject: RFR: 8355003: Implement Ahead-of-Time Method Profiling In-Reply-To: <3JsetDO0CDSY3TVxn9pC7YfbNq8-BDZ2UwNo38qJuOc=.e5b30111-294e-45ba-a9aa-cf8d09e26d45@github.com> References: <3JsetDO0CDSY3TVxn9pC7YfbNq8-BDZ2UwNo38qJuOc=.e5b30111-294e-45ba-a9aa-cf8d09e26d45@github.com> Message-ID: <__a0S14OZpuwaQX6jFAlJhZJy2fZ9flb_1LW9gUD_sI=.b2b7851a-0aed-421b-9755-89fc63e404da@github.com> On Sat, 26 Apr 2025 21:07:04 GMT, Vladimir Kozlov wrote: >> Improve warm-up time by making profile data from a previous run of an application instantly available, when the HotSpot Java Virtual Machine starts. Specifically, enhance the [AOT cache](https://openjdk.org/jeps/483) to store method execution profiles from training runs, reducing profiling delays in subsequent production runs. >> >> More details in the JEP: https://bugs.openjdk.org/browse/JDK-8325147 > > src/hotspot/share/runtime/init.cpp line 189: > >> 187: #endif >> 188: >> 189: if (TrainingData::have_data() || TrainingData::need_data()) { > > Why 2 checks? Comment please. Because it's only needed if we're recording or replaying. I'll add a comment. 
> test/hotspot/jtreg/runtime/cds/appcds/aotProfile/AOTProfileFlags.java line 30: > >> 28: * @requires vm.cds >> 29: * @comment work around JDK-8345635 >> 30: * @requires !vm.jvmci.enabled > > Consider adding: > > * @requires vm.cds.supports.aot.class.linking > * @requires vm.flagless Could you please explain why? @iklam, what do you think? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24886#discussion_r2061836271 PR Review Comment: https://git.openjdk.org/jdk/pull/24886#discussion_r2061832468 From iveresov at openjdk.org Sat Apr 26 23:46:47 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Sat, 26 Apr 2025 23:46:47 GMT Subject: RFR: 8355003: Implement Ahead-of-Time Method Profiling In-Reply-To: <3JsetDO0CDSY3TVxn9pC7YfbNq8-BDZ2UwNo38qJuOc=.e5b30111-294e-45ba-a9aa-cf8d09e26d45@github.com> References: <3JsetDO0CDSY3TVxn9pC7YfbNq8-BDZ2UwNo38qJuOc=.e5b30111-294e-45ba-a9aa-cf8d09e26d45@github.com> Message-ID: On Sat, 26 Apr 2025 21:28:12 GMT, Vladimir Kozlov wrote: >> Improve warm-up time by making profile data from a previous run of an application instantly available, when the HotSpot Java Virtual Machine starts. Specifically, enhance the [AOT cache](https://openjdk.org/jeps/483) to store method execution profiles from training runs, reducing profiling delays in subsequent production runs. >> >> More details in the JEP: https://bugs.openjdk.org/browse/JDK-8325147 > > src/hotspot/share/oops/methodData.cpp line 1434: > >> 1432: >> 1433: bool MethodData::is_mature() const { >> 1434: return CompilationPolicy::is_mature((MethodData*)this); > > Why you need the cast? To remove constness. I'll make it a const_cast. 
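
A minimal sketch of that kind of change, using hypothetical stand-in types (`Profile`, `Policy`) rather than the HotSpot classes. It assumes the callee takes a non-const pointer, which is what the C-style cast in the quoted line suggests; the threshold is made up for illustration:

```cpp
// Hypothetical stand-ins; these are NOT the HotSpot classes.
struct Profile;

struct Policy {
  // Assumed to take a non-const pointer, mirroring the call in the quoted line.
  static bool is_mature(Profile* p);
};

struct Profile {
  int invocations = 0;

  // Inside a const member function, `this` has type `const Profile*`.
  bool is_mature() const {
    // const_cast can only add or remove cv-qualifiers, so the intent
    // ("drop constness, nothing else") is explicit. A C-style cast would
    // also compile if the target type were changed by accident.
    return Policy::is_mature(const_cast<Profile*>(this));
  }
};

inline bool Policy::is_mature(Profile* p) {
  return p->invocations >= 1000;  // arbitrary illustrative threshold
}
```

This is only well-defined as long as the callee does not modify an object that was originally declared `const`; here it only reads a field.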
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24886#discussion_r2061843635 From iveresov at openjdk.org Sat Apr 26 23:55:49 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Sat, 26 Apr 2025 23:55:49 GMT Subject: RFR: 8355003: Implement Ahead-of-Time Method Profiling In-Reply-To: <3JsetDO0CDSY3TVxn9pC7YfbNq8-BDZ2UwNo38qJuOc=.e5b30111-294e-45ba-a9aa-cf8d09e26d45@github.com> References: <3JsetDO0CDSY3TVxn9pC7YfbNq8-BDZ2UwNo38qJuOc=.e5b30111-294e-45ba-a9aa-cf8d09e26d45@github.com> Message-ID: On Sat, 26 Apr 2025 22:09:24 GMT, Vladimir Kozlov wrote: >> Improve warm-up time by making profile data from a previous run of an application instantly available, when the HotSpot Java Virtual Machine starts. Specifically, enhance the [AOT cache](https://openjdk.org/jeps/483) to store method execution profiles from training runs, reducing profiling delays in subsequent production runs. >> >> More details in the JEP: https://bugs.openjdk.org/browse/JDK-8325147 > > src/hotspot/share/ci/ciMethodData.cpp line 71: > >> 69: >> 70: bool is_live(Method* m) { >> 71: Klass* holder = m->method_holder(); > > Changes in this file seems not related and can be pushed/tested separately. If they are related - there should be condition for additional checks. You mean you want these checks to be done only if `TrainingData::have_data() == true` ? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24886#discussion_r2061846315 From iveresov at openjdk.org Sun Apr 27 00:00:56 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Sun, 27 Apr 2025 00:00:56 GMT Subject: RFR: 8355003: Implement Ahead-of-Time Method Profiling In-Reply-To: <3JsetDO0CDSY3TVxn9pC7YfbNq8-BDZ2UwNo38qJuOc=.e5b30111-294e-45ba-a9aa-cf8d09e26d45@github.com> References: <3JsetDO0CDSY3TVxn9pC7YfbNq8-BDZ2UwNo38qJuOc=.e5b30111-294e-45ba-a9aa-cf8d09e26d45@github.com> Message-ID: On Sat, 26 Apr 2025 22:42:02 GMT, Vladimir Kozlov wrote: >> Improve warm-up time by making profile data from a previous run of an application instantly available, when the HotSpot Java Virtual Machine starts. Specifically, enhance the [AOT cache](https://openjdk.org/jeps/483) to store method execution profiles from training runs, reducing profiling delays in subsequent production runs. >> >> More details in the JEP: https://bugs.openjdk.org/browse/JDK-8325147 > > src/hotspot/share/oops/trainingData.cpp line 76: > >> 74: >> 75: static void verify_archived_entry(TrainingData* td, const TrainingData::Key* k) { >> 76: guarantee(TrainingData::Key::can_compute_cds_hash(k), ""); > > Should we gracefully disable using TD instead of crashing VM? But this is a verification code. That seems to be the usual strategy, is it not? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24886#discussion_r2061847152 From iveresov at openjdk.org Sun Apr 27 00:05:46 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Sun, 27 Apr 2025 00:05:46 GMT Subject: RFR: 8355003: Implement Ahead-of-Time Method Profiling In-Reply-To: <3JsetDO0CDSY3TVxn9pC7YfbNq8-BDZ2UwNo38qJuOc=.e5b30111-294e-45ba-a9aa-cf8d09e26d45@github.com> References: <3JsetDO0CDSY3TVxn9pC7YfbNq8-BDZ2UwNo38qJuOc=.e5b30111-294e-45ba-a9aa-cf8d09e26d45@github.com> Message-ID: On Sat, 26 Apr 2025 22:15:27 GMT, Vladimir Kozlov wrote: >> Improve warm-up time by making profile data from a previous run of an application instantly available, when the HotSpot Java Virtual Machine starts. Specifically, enhance the [AOT cache](https://openjdk.org/jeps/483) to store method execution profiles from training runs, reducing profiling delays in subsequent production runs. >> >> More details in the JEP: https://bugs.openjdk.org/browse/JDK-8325147 > > src/hotspot/share/cds/dumpAllocStats.hpp line 151: > >> 149: } >> 150: >> 151: void record_dynamic_proxy_class() { > > This is not called. This code seems not related. True. @iklam, this came with a change you wanted me to take. Ok to cut this out? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24886#discussion_r2061851887 From iveresov at openjdk.org Sun Apr 27 00:23:45 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Sun, 27 Apr 2025 00:23:45 GMT Subject: RFR: 8355003: Implement Ahead-of-Time Method Profiling In-Reply-To: <3JsetDO0CDSY3TVxn9pC7YfbNq8-BDZ2UwNo38qJuOc=.e5b30111-294e-45ba-a9aa-cf8d09e26d45@github.com> References: <3JsetDO0CDSY3TVxn9pC7YfbNq8-BDZ2UwNo38qJuOc=.e5b30111-294e-45ba-a9aa-cf8d09e26d45@github.com> Message-ID: On Sat, 26 Apr 2025 21:30:25 GMT, Vladimir Kozlov wrote: >> Improve warm-up time by making profile data from a previous run of an application instantly available, when the HotSpot Java Virtual Machine starts. 
Specifically, enhance the [AOT cache](https://openjdk.org/jeps/483) to store method execution profiles from training runs, reducing profiling delays in subsequent production runs. >> >> More details in the JEP: https://bugs.openjdk.org/browse/JDK-8325147 > > src/hotspot/share/oops/methodData.cpp line 1296: > >> 1294: >> 1295: MethodData::MethodData() { >> 1296: assert(CDSConfig::is_dumping_static_archive() || UseSharedSpaces, "only for CDS"); > > 1. Should its code be guarded by `#if INCLUDE_CDS`? > 2. Comment where/how it is used. > 3. Is it used in all phases or only during TRAINING and ASSEMBLY? > 4. Can you add query methods into `CDSConfig` which you can call here and in other places?: > > is_dumping_training_data() > is_using_training_data() I think those are used for CDS serialization/deserialization, right, @iklam? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24886#discussion_r2061856884 From iveresov at openjdk.org Sun Apr 27 00:26:45 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Sun, 27 Apr 2025 00:26:45 GMT Subject: RFR: 8355003: Implement Ahead-of-Time Method Profiling In-Reply-To: <3JsetDO0CDSY3TVxn9pC7YfbNq8-BDZ2UwNo38qJuOc=.e5b30111-294e-45ba-a9aa-cf8d09e26d45@github.com> References: <3JsetDO0CDSY3TVxn9pC7YfbNq8-BDZ2UwNo38qJuOc=.e5b30111-294e-45ba-a9aa-cf8d09e26d45@github.com> Message-ID: On Sat, 26 Apr 2025 22:32:00 GMT, Vladimir Kozlov wrote: >> Improve warm-up time by making profile data from a previous run of an application instantly available, when the HotSpot Java Virtual Machine starts. Specifically, enhance the [AOT cache](https://openjdk.org/jeps/483) to store method execution profiles from training runs, reducing profiling delays in subsequent production runs. 
>> >> More details in the JEP: https://bugs.openjdk.org/browse/JDK-8325147 > > src/hotspot/share/ci/ciInstanceKlass.hpp line 47: > >> 45: friend class ciField; >> 46: friend class ciReplay; >> 47: friend class CompileTrainingData; > > Not referenced here It allows `CompileTrainingData` to peek into the `ciInstanceKlass` internals. We need the klass ptr specially. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24886#discussion_r2061864230 From iveresov at openjdk.org Sun Apr 27 00:34:45 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Sun, 27 Apr 2025 00:34:45 GMT Subject: RFR: 8355003: Implement Ahead-of-Time Method Profiling In-Reply-To: <3JsetDO0CDSY3TVxn9pC7YfbNq8-BDZ2UwNo38qJuOc=.e5b30111-294e-45ba-a9aa-cf8d09e26d45@github.com> References: <3JsetDO0CDSY3TVxn9pC7YfbNq8-BDZ2UwNo38qJuOc=.e5b30111-294e-45ba-a9aa-cf8d09e26d45@github.com> Message-ID: On Sat, 26 Apr 2025 22:11:58 GMT, Vladimir Kozlov wrote: >> Improve warm-up time by making profile data from a previous run of an application instantly available, when the HotSpot Java Virtual Machine starts. Specifically, enhance the [AOT cache](https://openjdk.org/jeps/483) to store method execution profiles from training runs, reducing profiling delays in subsequent production runs. >> >> More details in the JEP: https://bugs.openjdk.org/browse/JDK-8325147 > > src/hotspot/share/ci/ciMethod.cpp line 1147: > >> 1145: // heuristic (e.g. post call nop instructions; see InlineSkippedInstructionsCounter) >> 1146: int ciMethod::inline_instructions_size() { >> 1147: if (_inline_instructions_size == -1) { > > Why repeat this condition and not put new code under existing one? The caching logic sets the `_inline_instructions_size` if the value is found in the archive. The normal logic doesn't need to run if this happens. 
Something like:

    if (_value == -1) {
      _value = get_from_cache();
    }
    if (_value == -1) {
      // didn't get it from the cache
      _value = compute_value();
    }

So basically to delineate the caching logic from the normal path. Does it make sense? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24886#discussion_r2061875051 From iklam at openjdk.org Sun Apr 27 00:54:47 2025 From: iklam at openjdk.org (Ioi Lam) Date: Sun, 27 Apr 2025 00:54:47 GMT Subject: RFR: 8355003: Implement Ahead-of-Time Method Profiling In-Reply-To: <__a0S14OZpuwaQX6jFAlJhZJy2fZ9flb_1LW9gUD_sI=.b2b7851a-0aed-421b-9755-89fc63e404da@github.com> References: <3JsetDO0CDSY3TVxn9pC7YfbNq8-BDZ2UwNo38qJuOc=.e5b30111-294e-45ba-a9aa-cf8d09e26d45@github.com> <__a0S14OZpuwaQX6jFAlJhZJy2fZ9flb_1LW9gUD_sI=.b2b7851a-0aed-421b-9755-89fc63e404da@github.com> Message-ID: On Sat, 26 Apr 2025 23:35:01 GMT, Igor Veresov wrote: >> test/hotspot/jtreg/runtime/cds/appcds/aotProfile/AOTProfileFlags.java line 30: >> >>> 28: * @requires vm.cds >>> 29: * @comment work around JDK-8345635 >>> 30: * @requires !vm.jvmci.enabled >> >> Consider adding: >> >> * @requires vm.cds.supports.aot.class.linking >> * @requires vm.flagless > > Could you please explain why? @iklam, what do you think? I think it's OK to test without these two additional flags. This will make sure that the two diagnostic flags don't have any bad side effect even if AOT class linking is disabled (due to flags like -XX:+UseZGC).
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24886#discussion_r2061892882 From iklam at openjdk.org Sun Apr 27 01:06:00 2025 From: iklam at openjdk.org (Ioi Lam) Date: Sun, 27 Apr 2025 01:06:00 GMT Subject: RFR: 8355003: Implement Ahead-of-Time Method Profiling In-Reply-To: References: <3JsetDO0CDSY3TVxn9pC7YfbNq8-BDZ2UwNo38qJuOc=.e5b30111-294e-45ba-a9aa-cf8d09e26d45@github.com> Message-ID: On Sun, 27 Apr 2025 00:02:41 GMT, Igor Veresov wrote: >> src/hotspot/share/cds/dumpAllocStats.hpp line 151: >> >>> 149: } >>> 150: >>> 151: void record_dynamic_proxy_class() { >> >> This is not called. This code seems not related. > > True. @iklam, this came with a change you wanted me to take. Ok to cut this out? Yes, this part is not needed. AOT dynamic proxy classes are only supported in the Leyden repo. >> src/hotspot/share/oops/methodData.cpp line 1296: >> >>> 1294: >>> 1295: MethodData::MethodData() { >>> 1296: assert(CDSConfig::is_dumping_static_archive() || UseSharedSpaces, "only for CDS"); >> >> 1. Should its code be guarded by `#if INCLUDE_CDS`? >> 2. Comment where/how it is used. >> 3. Is it used in all phases or only during TRAINING and ASSEMBLY? >> 4. Can you add query methods into `CDSConfig` which you can call here and in other places?: >> >> is_dumping_training_data() >> is_using_training_data() > > I think those are used for CDS serialization/deserialization, right, @iklam? This constructor is used by cppVtables.cpp to calculate the size of the vtables for MethodData, and also for finding the address of the vtable of MethodData. All types in `CPP_VTABLE_TYPES_DO` must have such an empty constructor. E.g., `InstanceKlass::InstanceKlass()`. 
We have not been very consistent with comments around these constructors, but I think we can do this:

    #if INCLUDE_CDS
    MethodData::MethodData() {
      // Used by cppVtables.cpp only
      assert(CDSConfig::is_dumping_static_archive() || UseSharedSpaces, "only for CDS");
    }
    #endif

This method is called even if we are not dumping training data. The vtables of all types in `CPP_VTABLE_TYPES_DO` are unconditionally computed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24886#discussion_r2061901400 PR Review Comment: https://git.openjdk.org/jdk/pull/24886#discussion_r2061898996 From kvn at openjdk.org Sun Apr 27 01:09:44 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sun, 27 Apr 2025 01:09:44 GMT Subject: RFR: 8355635: Do not collect C strings in C2 scratch buffer In-Reply-To: References: Message-ID: On Sat, 26 Apr 2025 22:56:23 GMT, John R Rose wrote: >> [JDK-8349479](https://bugs.openjdk.org/browse/JDK-8349479) added call to `code_string()` for Halt node mach node. >> I am observing several more creation and clearing C strings collections during C2 compilation: >> [17.405s][debug][codestrings] Clear 2 asm-remarks. >> [17.405s][debug][codestrings] Clear 1 dbg-string. >> >> Most are coming from temporary scratch buffer C2 uses for code size calculation. I suggest to not collect strings in this buffer. >> >> Note, `CodeSection::set_scratch_emit()` is only called from [PhaseOutput::scratch_emit_size()](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/output.cpp#L3368) for scratch buffer. >> >> I verified with `-XX:CompileCommand=print,::` that hsdis output is the same.
>> >> Running on linux-x64 with fastdebug VM: >> >> before: >> $ java -XX:-TieredCompilation -Xlog:codestrings=debug com.sun.tools.javac.Main HelloWorld.java | grep codestrings |wc >> 1644 6576 80618 >> >> after again: >> $ java -XX:-TieredCompilation -Xlog:codestrings=debug com.sun.tools.javac.Main HelloWorld.java | grep codestrings |wc >> 0 0 0 >> >> >> It is more dramatic with `-Xcomp` we use for testing: >> >> Before >> $ java -XX:-TieredCompilation -Xcomp -Xlog:codestrings=debug com.sun.tools.javac.Main HelloWorld.java | grep codestrings |wc >> 70924 283696 3533261 >> >> After fix >> $ java -XX:-TieredCompilation -Xcomp -Xlog:codestrings=debug com.sun.tools.javac.Main HelloWorld.java | grep codestrings |wc >> 0 0 0 >> >> >> I was curious why it is 0 - we do deoptimize nmethod. But with big default CodeCache GC does not collect them. >> Reducing CodeCache to 8Mb shows deallocation: >> >> $ java -XX:-TieredCompilation -Xcomp -Xlog:codestrings=debug -Xlog:codecache=debug -XX:+PrintCodeCache -XX:ReservedCodeCacheSize=8M com.sun.tools.javac.Main HelloWorld.java >> ... >> [40.196s][debug][codestrings] Clear 42 asm-remarks. >> [40.196s][debug][codestrings] Clear 1 dbg-string. >> [40.196s][debug][codecache ] *flushing nmethod 5811/0x00007f71d74b7188. Live blobs:3370/Free CodeCache:2953Kb >> >> >> Tested tier1-5, Xcomp,comp-stress. > > Yeah, the zero was puzzling to me too; looked like evidence that we killed all the strings, not just the ones in the scratch assembly. But your extra demonstration shows that we collect the strings we want for the non-scratch assemblies. > > Good change. 
Thank you, @rose00 ------------- PR Comment: https://git.openjdk.org/jdk/pull/24893#issuecomment-2832846623 From kvn at openjdk.org Sun Apr 27 01:14:46 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sun, 27 Apr 2025 01:14:46 GMT Subject: RFR: 8355003: Implement Ahead-of-Time Method Profiling In-Reply-To: References: <3JsetDO0CDSY3TVxn9pC7YfbNq8-BDZ2UwNo38qJuOc=.e5b30111-294e-45ba-a9aa-cf8d09e26d45@github.com> <__a0S14OZpuwaQX6jFAlJhZJy2fZ9flb_1LW9gUD_sI=.b2b7851a-0aed-421b-9755-89fc63e404da@github.com> Message-ID: On Sun, 27 Apr 2025 00:52:37 GMT, Ioi Lam wrote: >> Could you please explain why? @iklam, what do you think? > > I think it's OK to test without these two additional flags. This will make sure that the two diagnostic flags don't have any bad side effect even if AOT class linking is disabled (due to flags like -XX:+UseZGC). I see them in `aotClassLinking/AOTClassLinkingVMOptions.java` test. @veresov please, add link to your mach5 testing to JBS. And add to description what tests you already ran. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24886#discussion_r2061907156 From kvn at openjdk.org Sun Apr 27 01:19:46 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sun, 27 Apr 2025 01:19:46 GMT Subject: RFR: 8355003: Implement Ahead-of-Time Method Profiling In-Reply-To: References: <3JsetDO0CDSY3TVxn9pC7YfbNq8-BDZ2UwNo38qJuOc=.e5b30111-294e-45ba-a9aa-cf8d09e26d45@github.com> Message-ID: On Sun, 27 Apr 2025 00:24:31 GMT, Igor Veresov wrote: >> src/hotspot/share/ci/ciInstanceKlass.hpp line 47: >> >>> 45: friend class ciField; >>> 46: friend class ciReplay; >>> 47: friend class CompileTrainingData; >> >> Not referenced here > > It allows `CompileTrainingData` to peek into the `ciInstanceKlass` internals. We need the klass ptr specially. I missed that it is "friend" declaration. 
>> src/hotspot/share/ci/ciMethodData.cpp line 71: >> >>> 69: >>> 70: bool is_live(Method* m) { >>> 71: Klass* holder = m->method_holder(); >> >> Changes in this file seems not related and can be pushed/tested separately. If they are related - there should be condition for additional checks. > > You mean you want these checks to be done only if `TrainingData::have_data() == true` ? Yes, if it is related. Otherwise you may change default behavior when Leyden code is not used. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24886#discussion_r2061910254 PR Review Comment: https://git.openjdk.org/jdk/pull/24886#discussion_r2061908809 From kvn at openjdk.org Sun Apr 27 01:23:52 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sun, 27 Apr 2025 01:23:52 GMT Subject: RFR: 8355003: Implement Ahead-of-Time Method Profiling In-Reply-To: References: <3JsetDO0CDSY3TVxn9pC7YfbNq8-BDZ2UwNo38qJuOc=.e5b30111-294e-45ba-a9aa-cf8d09e26d45@github.com> Message-ID: On Sat, 26 Apr 2025 23:58:04 GMT, Igor Veresov wrote: >> src/hotspot/share/oops/trainingData.cpp line 76: >> >>> 74: >>> 75: static void verify_archived_entry(TrainingData* td, const TrainingData::Key* k) { >>> 76: guarantee(TrainingData::Key::can_compute_cds_hash(k), ""); >> >> Should we gracefully disable using TD instead of crashing VM? > > But this is a verification code. That seems to be the usual strategy, is it not? I thought if we can not use AOT cache we issue warning and continue execution without it. Unless you are checking for damaged AOT cache which may affect execution without it. @iklam what do you think? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24886#discussion_r2061910742 From iveresov at openjdk.org Sun Apr 27 01:39:55 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Sun, 27 Apr 2025 01:39:55 GMT Subject: RFR: 8355003: Implement Ahead-of-Time Method Profiling In-Reply-To: References: <3JsetDO0CDSY3TVxn9pC7YfbNq8-BDZ2UwNo38qJuOc=.e5b30111-294e-45ba-a9aa-cf8d09e26d45@github.com> Message-ID: On Sun, 27 Apr 2025 01:20:45 GMT, Vladimir Kozlov wrote: >> But this is a verification code. That seems to be the usual strategy, is it not? > > I thought if we can not use AOT cache we issue warning and continue execution without it. > Unless you are checking for damaged AOT cache which may affect execution without it. > > @iklam what do you think? But this runs only with `-XX:+VerifyTrainingData`. It's a testing mode. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24886#discussion_r2061918596 From iveresov at openjdk.org Sun Apr 27 01:51:46 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Sun, 27 Apr 2025 01:51:46 GMT Subject: RFR: 8355003: Implement Ahead-of-Time Method Profiling In-Reply-To: <3JsetDO0CDSY3TVxn9pC7YfbNq8-BDZ2UwNo38qJuOc=.e5b30111-294e-45ba-a9aa-cf8d09e26d45@github.com> References: <3JsetDO0CDSY3TVxn9pC7YfbNq8-BDZ2UwNo38qJuOc=.e5b30111-294e-45ba-a9aa-cf8d09e26d45@github.com> Message-ID: <2ks6GsBxC4Ig0uyg6ufSmzxbU4pvbYcnRWF2IuB0-ac=.954c69b1-e8f3-4e69-abd3-0147ba798471@github.com> On Sat, 26 Apr 2025 21:30:25 GMT, Vladimir Kozlov wrote: >> Improve warm-up time by making profile data from a previous run of an application instantly available, when the HotSpot Java Virtual Machine starts. Specifically, enhance the [AOT cache](https://openjdk.org/jeps/483) to store method execution profiles from training runs, reducing profiling delays in subsequent production runs. 
>> >> More details in the JEP: https://bugs.openjdk.org/browse/JDK-8325147 > > src/hotspot/share/oops/methodData.cpp line 1296: > >> 1294: >> 1295: MethodData::MethodData() { >> 1296: assert(CDSConfig::is_dumping_static_archive() || UseSharedSpaces, "only for CDS"); > > 1. Should its code be guarded by `#if INCLUDE_CDS`? > 2. Comment where/how it is used. > 3. Is it used in all phases or only during TRAINING and ASSEMBLY? > 4. Can you add query methods into `CDSConfig` which you can call here and in other places?: > > is_dumping_training_data() > is_using_training_data() @vnkozlov Are you ok with Ioi's code change proposal? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24886#discussion_r2061936106 From kvn at openjdk.org Sun Apr 27 01:59:45 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sun, 27 Apr 2025 01:59:45 GMT Subject: RFR: 8355003: Implement Ahead-of-Time Method Profiling In-Reply-To: References: <3JsetDO0CDSY3TVxn9pC7YfbNq8-BDZ2UwNo38qJuOc=.e5b30111-294e-45ba-a9aa-cf8d09e26d45@github.com> Message-ID: On Sun, 27 Apr 2025 01:37:33 GMT, Igor Veresov wrote: >> I thought if we can not use AOT cache we issue warning and continue execution without it. >> Unless you are checking for damaged AOT cache which may affect execution without it. >> >> @iklam what do you think? > > But this runs only with `-XX:+VerifyTrainingData`. It's a testing mode. I missed that. Okay then. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24886#discussion_r2061939418 From kvn at openjdk.org Sun Apr 27 02:03:46 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sun, 27 Apr 2025 02:03:46 GMT Subject: RFR: 8355003: Implement Ahead-of-Time Method Profiling In-Reply-To: <2ks6GsBxC4Ig0uyg6ufSmzxbU4pvbYcnRWF2IuB0-ac=.954c69b1-e8f3-4e69-abd3-0147ba798471@github.com> References: <3JsetDO0CDSY3TVxn9pC7YfbNq8-BDZ2UwNo38qJuOc=.e5b30111-294e-45ba-a9aa-cf8d09e26d45@github.com> <2ks6GsBxC4Ig0uyg6ufSmzxbU4pvbYcnRWF2IuB0-ac=.954c69b1-e8f3-4e69-abd3-0147ba798471@github.com> Message-ID: <_unPFSSYWqe0ldilT96MjUkRFTkiPsYhWkLvWNPj0Ek=.1745d82b-22e3-4fbe-9b46-6d0b3aa177f0@github.com> On Sun, 27 Apr 2025 01:49:20 GMT, Igor Veresov wrote: >> src/hotspot/share/oops/methodData.cpp line 1296: >> >>> 1294: >>> 1295: MethodData::MethodData() { >>> 1296: assert(CDSConfig::is_dumping_static_archive() || UseSharedSpaces, "only for CDS"); >> >> 1. Should its code be guarded by `#if INCLUDE_CDS`? >> 2. Comment where/how it is used. >> 3. Is it used in all phases or only during TRAINING and ASSEMBLY? >> 4. Can you add query methods into `CDSConfig` which you can call here and in other places?: >> >> is_dumping_training_data() >> is_using_training_data() > > @vnkozlov Are you ok with Ioi's code change proposal? Yes, I am ok. I assume you will change accordingly all other similar empty constructors in your code: `MethodCounters()` and `*TrainingData()`. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24886#discussion_r2061941118 From iklam at openjdk.org Sun Apr 27 02:22:56 2025 From: iklam at openjdk.org (Ioi Lam) Date: Sun, 27 Apr 2025 02:22:56 GMT Subject: RFR: 8355003: Implement Ahead-of-Time Method Profiling In-Reply-To: References: <3JsetDO0CDSY3TVxn9pC7YfbNq8-BDZ2UwNo38qJuOc=.e5b30111-294e-45ba-a9aa-cf8d09e26d45@github.com> <__a0S14OZpuwaQX6jFAlJhZJy2fZ9flb_1LW9gUD_sI=.b2b7851a-0aed-421b-9755-89fc63e404da@github.com> Message-ID: On Sun, 27 Apr 2025 01:11:39 GMT, Vladimir Kozlov wrote: >> I think it's OK to test without these two additional flags. This will make sure that the two diagnostic flags don't have any bad side effect even if AOT class linking is disabled (due to flags like -XX:+UseZGC). > > I see them in `aotClassLinking/AOTClassLinkingVMOptions.java` test. > @veresov please, add link to your mach5 testing to JBS. And add to description what tests you already ran. OK, let's add those two `@requires` to be consistent with other tests. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24886#discussion_r2061952289 From iveresov at openjdk.org Sun Apr 27 05:20:46 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Sun, 27 Apr 2025 05:20:46 GMT Subject: RFR: 8355003: Implement Ahead-of-Time Method Profiling In-Reply-To: <_unPFSSYWqe0ldilT96MjUkRFTkiPsYhWkLvWNPj0Ek=.1745d82b-22e3-4fbe-9b46-6d0b3aa177f0@github.com> References: <3JsetDO0CDSY3TVxn9pC7YfbNq8-BDZ2UwNo38qJuOc=.e5b30111-294e-45ba-a9aa-cf8d09e26d45@github.com> <2ks6GsBxC4Ig0uyg6ufSmzxbU4pvbYcnRWF2IuB0-ac=.954c69b1-e8f3-4e69-abd3-0147ba798471@github.com> <_unPFSSYWqe0ldilT96MjUkRFTkiPsYhWkLvWNPj0Ek=.1745d82b-22e3-4fbe-9b46-6d0b3aa177f0@github.com> Message-ID: On Sun, 27 Apr 2025 02:00:48 GMT, Vladimir Kozlov wrote: >> @vnkozlov Are you ok with Ioi's code change proposal? > > Yes, I am ok. 
I assume you will change accordingly all other similar empty constructors in your code: `MethodCounters()` and `*TrainingData()`. Yes, will do. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24886#discussion_r2062292557 From duke at openjdk.org Sun Apr 27 07:56:24 2025 From: duke at openjdk.org (Anjian-Wen) Date: Sun, 27 Apr 2025 07:56:24 GMT Subject: RFR: 8355657: RISC-V: Improve PrintOptoAssembly output of vector-scalar instructions Message-ID: As the issue describe, some match rule and predict match not only type I, in case of the misleading, try to delete some "I" in the format and instruct name ------------- Commit messages: - RISC-V: Improve PrintOptoAssembly output of vector-scalar instructions Changes: https://git.openjdk.org/jdk/pull/24904/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24904&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8355657 Stats: 80 lines in 1 file changed: 0 ins; 0 del; 80 mod Patch: https://git.openjdk.org/jdk/pull/24904.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24904/head:pull/24904 PR: https://git.openjdk.org/jdk/pull/24904 From duke at openjdk.org Sun Apr 27 10:12:45 2025 From: duke at openjdk.org (erifan) Date: Sun, 27 Apr 2025 10:12:45 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v2] In-Reply-To: References: Message-ID: On Fri, 25 Apr 2025 09:48:59 GMT, Jatin Bhateja wrote: >> Since this is a platform independent optimization, I tend to use this `@requires` because it's simpler. If we use `applyIfCPUFeatureOr`, we need to add the same restriction before each test. In addition, if a new architecture supports the vector node, this test may not cover it. What do you think? > > I don't see XorVMask implemented on all non-x86 target, like PPC etc.. This is not specifically required on x86, but because this test fails on x86 when `-XX:UseAVX=0` is specified. 
When `-XX:UseAVX=0` is specified, the sub-graph is like this: `(XorV (VectorMaskCmp (LoadVector ...)) (Replicate -1))` It is not an optimization pattern supported by this patch because we don't know what's the comparison op. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2062583790 From duke at openjdk.org Sun Apr 27 11:53:31 2025 From: duke at openjdk.org (kuaiwei) Date: Sun, 27 Apr 2025 11:53:31 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v15] In-Reply-To: References: Message-ID: > In this patch, I extend the merge stores optimization to merge adjacent loads. Tier1 tests pass on my machine. > > The benchmark result of MergeLoadBench.java > AMD EPYC 9T24 96-Core Processor: >
> |name | -MergeLoads | +MergeLoads |delta|
> |---|---|---|---|
> |MergeLoadBench.getCharB |4352.150 |4407.435 | 55.29 |
> |MergeLoadBench.getCharBU |4075.320 |4084.663 | 9.34 |
> |MergeLoadBench.getCharBV |3221.302 |3221.528 | 0.23 |
> |MergeLoadBench.getCharC |2235.433 |2238.796 | 3.36 |
> |MergeLoadBench.getCharL |4363.244 |4372.281 | 9.04 |
> |MergeLoadBench.getCharLU |4072.550 |4075.744 | 3.19 |
> |MergeLoadBench.getCharLV |2227.825 |2231.612 | 3.79 |
> |MergeLoadBench.getIntB |11199.935 |6869.030 | -4330.90 |
> |MergeLoadBench.getIntBU |6853.862 |2763.923 | -4089.94 |
> |MergeLoadBench.getIntBV |306.953 |309.911 | 2.96 |
> |MergeLoadBench.getIntL |10426.843 |6523.716 | -3903.13 |
> |MergeLoadBench.getIntLU |6740.847 |2602.701 | -4138.15 |
> |MergeLoadBench.getIntLV |2233.151 |2231.745 | -1.41 |
> |MergeLoadBench.getIntRB |11335.756 |8980.619 | -2355.14 |
> |MergeLoadBench.getIntRBU |7439.873 |3190.208 | -4249.66 |
> |MergeLoadBench.getIntRL |16323.040 |7786.842 | -8536.20 |
> |MergeLoadBench.getIntRLU |7457.745 |3364.140 | -4093.61 |
> |MergeLoadBench.getIntRU |2512.621 |2511.668 | -0.95 |
> |MergeLoadBench.getIntU |2501.064 |2500.629 | -0.43 |
> |MergeLoadBench.getLongB |21175.442 |21103.660 | -71.78 |
> |MergeLoadBench.getLongBU |14042.046 |2512.784 | -11529.26 |
> |MergeLoadBench.getLongBV |606.448 |606.171 | -0.28 |
> |MergeLoadBench.getLongL |23142.178 |23217.785 | 75.61 |
> |MergeLoadBench.getLongLU |14112.972 |2237.659 | -11875.31 |
> |MergeLoadBench.getLongLV |2230.416 |2231.224 | 0.81 |
> |MergeLoadBench.getLongRB |21152.558 |21140.583 | -11.98 |
> |MergeLoadBench.getLongRBU |14031.178 |2520.317 | -11510.86 |
> |MergeLoadBench.getLongRL |23248.506 |23136.410 | -112.10 |
> |MergeLoadBench.getLongRLU |14125.032 |2240.481 | -11884.55 |
> |MergeLoadBench.getLongRU |3071.881 |3066.606 | -5.27 |
> |Merg...
kuaiwei has updated the pull request incrementally with one additional commit since the last revision: Fix build error on mac and windows ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24023/files - new: https://git.openjdk.org/jdk/pull/24023/files/333b57bb..2e205128 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24023&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24023&range=13-14 Stats: 4 lines in 2 files changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24023.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24023/head:pull/24023 PR: https://git.openjdk.org/jdk/pull/24023 From kvn at openjdk.org Sun Apr 27 21:52:43 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sun, 27 Apr 2025 21:52:43 GMT Subject: RFR: 8350209: Preserve adapters in AOT cache [v8] In-Reply-To: References: Message-ID: > [JEP 483](https://bugs.openjdk.org/browse/JDK-8315737) preserves class information in AOT cache which helps Java startup performance. > > We should also preserve adapters (i2c, c2i) to further improve performance of class linking where adapters are generated. > > Short running Java application can see several percents improvement.
I got 6% improvement when ran `HelloWorld.java` on Linux-x64 Ice Lake CPU (2.5Ghz): > > > (perf stat -r 100 java -XX:AOTCache=app.aotcache -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed > 0.0299401 +- 0.0000504 seconds time elapsed ( +- 0.17% ) > > (perf stat -r 100 java -XX:AOTCache=app.aotcache -XX:+UnlockDiagnosticVMOptions -XX:-AOTAdapterCaching -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed > 0.0318654 +- 0.0000535 seconds time elapsed ( +- 0.17% ) > > > New diagnostic flags are introduced (use `-XX:+UnlockDiagnosticVMOptions` to unlock them): > > > -XX:+AOTAdapterCaching - Enable or disable saving and restoring i2c2i adapters > -XX:AOTCodeMaxSize=10*M - buffer size in bytes for AOT code caching > -XX:+AbortVMOnAOTCodeFailure - Abort VM on the first occurrence of AOT code caching failure > > By default `AOTAdapterCaching` is `false` and enabled ergonomically when `-XX:AOTCache` is specified. > This flag is ignored when `AOTCache` is not specified. > > To use AOT adapters follow process described in JEP 483: > > > java -XX:AOTMode=record -XX:AOTConfiguration=app.aotconf -cp app.jar App > java -XX:AOTMode=create -XX:AOTConfiguration=app.aotconf -XX:AOTCache=app.aot -cp app.jar > java -XX:AOTCache=app.aot -cp app.jar App > > > There are several new UL flag combinations to trace the AOT code caching process: > > > -Xlog:aot+codecache+init -Xlog:aot+codecache+exit -Xlog:aot+codecache+stubs > > > @ashu-mehra is main author of changes. He implemented adapters caching. > I did main framework (`AOTCodeCache` class) for saving and loading AOT code. > > Tested tier1-6,10, which includes tests with `AOTClassLinking` enabled. Also Xcomp,stress and JCK. Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: Downgraded UL as asked. Added synchronization to C strings caching. 
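The kind of synchronization mentioned in the update note above can be sketched roughly as follows. This is only an illustrative model, assuming a table of strings that several threads may add to at once; `CStringTable`, `id_for` and `size` are hypothetical names and are not taken from the HotSpot sources, which use their own lock types rather than `std::mutex`.

```cpp
#include <cstddef>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical model of a shared string table; illustration only,
// not the AOTCodeCache implementation.
class CStringTable {
 public:
  // Returns a stable id for `s`, registering the string on first use.
  // The lock serializes lookups and insertions, so concurrent callers
  // cannot corrupt the map or hand out the same id twice.
  int id_for(const std::string& s) {
    std::lock_guard<std::mutex> guard(_lock);
    auto it = _ids.find(s);
    if (it != _ids.end()) {
      return it->second;
    }
    int id = static_cast<int>(_ids.size());
    _ids.emplace(s, id);
    return id;
  }

  // Number of distinct strings recorded so far.
  std::size_t size() const {
    std::lock_guard<std::mutex> guard(_lock);
    return _ids.size();
  }

 private:
  mutable std::mutex _lock;
  std::unordered_map<std::string, int> _ids;
};
```

Taking the lock around both the lookup and the insertion matters here: checking first and locking only for the insert would let two threads register the same string under different ids.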
------------- Changes: - all: https://git.openjdk.org/jdk/pull/24740/files - new: https://git.openjdk.org/jdk/pull/24740/files/91c37dad..2eb7e80d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24740&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24740&range=06-07 Stats: 135 lines in 10 files changed: 53 ins; 18 del; 64 mod Patch: https://git.openjdk.org/jdk/pull/24740.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24740/head:pull/24740 PR: https://git.openjdk.org/jdk/pull/24740 From kvn at openjdk.org Sun Apr 27 21:55:48 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sun, 27 Apr 2025 21:55:48 GMT Subject: RFR: 8350209: Preserve adapters in AOT cache [v7] In-Reply-To: References: Message-ID: On Tue, 22 Apr 2025 23:05:13 GMT, Vladimir Kozlov wrote: >> [JEP 483](https://bugs.openjdk.org/browse/JDK-8315737) preserves class information in AOT cache which helps Java startup performance. >> >> We should also preserve adapters (i2c, c2i) to further improve performance of class linking where adapters are generated. >> >> Short running Java application can see several percents improvement. 
I got 6% improvement when ran `HelloWorld.java` on Linux-x64 Ice Lake CPU (2.5Ghz): >> >> >> (perf stat -r 100 java -XX:AOTCache=app.aotcache -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed >> 0.0299401 +- 0.0000504 seconds time elapsed ( +- 0.17% ) >> >> (perf stat -r 100 java -XX:AOTCache=app.aotcache -XX:+UnlockDiagnosticVMOptions -XX:-AOTAdapterCaching -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed >> 0.0318654 +- 0.0000535 seconds time elapsed ( +- 0.17% ) >> >> >> New diagnostic flags are introduced (use `-XX:+UnlockDiagnosticVMOptions` to unlock them): >> >> >> -XX:+AOTAdapterCaching - Enable or disable saving and restoring i2c2i adapters >> -XX:AOTCodeMaxSize=10*M - buffer size in bytes for AOT code caching >> -XX:+AbortVMOnAOTCodeFailure - Abort VM on the first occurrence of AOT code caching failure >> >> By default `AOTAdapterCaching` is `false` and enabled ergonomically when `-XX:AOTCache` is specified. >> This flag is ignored when `AOTCache` is not specified. >> >> To use AOT adapters follow process described in JEP 483: >> >> >> java -XX:AOTMode=record -XX:AOTConfiguration=app.aotconf -cp app.jar App >> java -XX:AOTMode=create -XX:AOTConfiguration=app.aotconf -XX:AOTCache=app.aot -cp app.jar >> java -XX:AOTCache=app.aot -cp app.jar App >> >> >> There are several new UL flag combinations to trace the AOT code caching process: >> >> >> -Xlog:aot+codecache+init -Xlog:aot+codecache+exit -Xlog:aot+codecache+stubs >> >> >> @ashu-mehra is main author of changes. He implemented adapters caching. >> I did main framework (`AOTCodeCache` class) for saving and loading AOT code. >> >> Tested tier1-6,10, which includes tests with `AOTClassLinking` enabled. Also Xcomp,stress and JCK. > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Fix message Downgraded UL - minimum output for `log_info`. 
Added new lock `AOTCodeCStrings_lock` to synchronize C strings caching - several compiler threads may add C strings simultaneously. Also don't record c strings from C2 scratch buffer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24740#issuecomment-2833656527 From kvn at openjdk.org Sun Apr 27 21:55:49 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sun, 27 Apr 2025 21:55:49 GMT Subject: RFR: 8350209: Preserve adapters in AOT cache [v8] In-Reply-To: References: Message-ID: On Sun, 27 Apr 2025 21:52:43 GMT, Vladimir Kozlov wrote: >> [JEP 483](https://bugs.openjdk.org/browse/JDK-8315737) preserves class information in AOT cache which helps Java startup performance. >> >> We should also preserve adapters (i2c, c2i) to further improve performance of class linking where adapters are generated. >> >> Short running Java application can see several percents improvement. I got 6% improvement when ran `HelloWorld.java` on Linux-x64 Ice Lake CPU (2.5Ghz): >> >> >> (perf stat -r 100 java -XX:AOTCache=app.aotcache -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed >> 0.0299401 +- 0.0000504 seconds time elapsed ( +- 0.17% ) >> >> (perf stat -r 100 java -XX:AOTCache=app.aotcache -XX:+UnlockDiagnosticVMOptions -XX:-AOTAdapterCaching -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed >> 0.0318654 +- 0.0000535 seconds time elapsed ( +- 0.17% ) >> >> >> New diagnostic flags are introduced (use `-XX:+UnlockDiagnosticVMOptions` to unlock them): >> >> >> -XX:+AOTAdapterCaching - Enable or disable saving and restoring i2c2i adapters >> -XX:AOTCodeMaxSize=10*M - buffer size in bytes for AOT code caching >> -XX:+AbortVMOnAOTCodeFailure - Abort VM on the first occurrence of AOT code caching failure >> >> By default `AOTAdapterCaching` is `false` and enabled ergonomically when `-XX:AOTCache` is specified. >> This flag is ignored when `AOTCache` is not specified. 
>> >> To use AOT adapters follow process described in JEP 483: >> >> >> java -XX:AOTMode=record -XX:AOTConfiguration=app.aotconf -cp app.jar App >> java -XX:AOTMode=create -XX:AOTConfiguration=app.aotconf -XX:AOTCache=app.aot -cp app.jar >> java -XX:AOTCache=app.aot -cp app.jar App >> >> >> There are several new UL flag combinations to trace the AOT code caching process: >> >> >> -Xlog:aot+codecache+init -Xlog:aot+codecache+exit -Xlog:aot+codecache+stubs >> >> >> @ashu-mehra is main author of changes. He implemented adapters caching. >> I did main framework (`AOTCodeCache` class) for saving and loading AOT code. >> >> Tested tier1-6,10, which includes tests with `AOTClassLinking` enabled. Also Xcomp,stress and JCK. > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Downgraded UL as asked. Added synchronization to C strings caching. I repeated testing again - it passed. @iwanowww, @ashu-mehra please review again. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24740#issuecomment-2833656967 From kvn at openjdk.org Sun Apr 27 22:22:02 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sun, 27 Apr 2025 22:22:02 GMT Subject: RFR: 8350209: Preserve adapters in AOT cache [v9] In-Reply-To: References: Message-ID: > [JEP 483](https://bugs.openjdk.org/browse/JDK-8315737) preserves class information in AOT cache which helps Java startup performance. > > We should also preserve adapters (i2c, c2i) to further improve performance of class linking where adapters are generated. > > Short running Java application can see several percents improvement. 
I got 6% improvement when ran `HelloWorld.java` on Linux-x64 Ice Lake CPU (2.5Ghz): > > > (perf stat -r 100 java -XX:AOTCache=app.aotcache -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed > 0.0299401 +- 0.0000504 seconds time elapsed ( +- 0.17% ) > > (perf stat -r 100 java -XX:AOTCache=app.aotcache -XX:+UnlockDiagnosticVMOptions -XX:-AOTAdapterCaching -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed > 0.0318654 +- 0.0000535 seconds time elapsed ( +- 0.17% ) > > > New diagnostic flags are introduced (use `-XX:+UnlockDiagnosticVMOptions` to unlock them): > > > -XX:+AOTAdapterCaching - Enable or disable saving and restoring i2c2i adapters > -XX:AOTCodeMaxSize=10*M - buffer size in bytes for AOT code caching > -XX:+AbortVMOnAOTCodeFailure - Abort VM on the first occurrence of AOT code caching failure > > By default `AOTAdapterCaching` is `false` and enabled ergonomically when `-XX:AOTCache` is specified. > This flag is ignored when `AOTCache` is not specified. > > To use AOT adapters follow process described in JEP 483: > > > java -XX:AOTMode=record -XX:AOTConfiguration=app.aotconf -cp app.jar App > java -XX:AOTMode=create -XX:AOTConfiguration=app.aotconf -XX:AOTCache=app.aot -cp app.jar > java -XX:AOTCache=app.aot -cp app.jar App > > > There are several new UL flag combinations to trace the AOT code caching process: > > > -Xlog:aot+codecache+init -Xlog:aot+codecache+exit -Xlog:aot+codecache+stubs > > > @ashu-mehra is main author of changes. He implemented adapters caching. > I did main framework (`AOTCodeCache` class) for saving and loading AOT code. > > Tested tier1-6,10, which includes tests with `AOTClassLinking` enabled. Also Xcomp,stress and JCK. Vladimir Kozlov has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains nine commits: - Merge branch 'master' into JDK-8350209 - Downgraded UL as asked. Added synchronization to C strings caching. 
- Fix message - Generate far jumps for AOT code on AArch64 - remove _enabled suffix - Add sanity test for AOTAdapterCaching flag - AOT code flags are ignored when AOTCache is not specified. Set range for AOTCodeMaxSize values. - Removed unused AOTCodeSection class - 8350209: Preserve adapters in AOT cache ------------- Changes: https://git.openjdk.org/jdk/pull/24740/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24740&range=08 Stats: 3306 lines in 51 files changed: 2837 ins; 200 del; 269 mod Patch: https://git.openjdk.org/jdk/pull/24740.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24740/head:pull/24740 PR: https://git.openjdk.org/jdk/pull/24740 From gcao at openjdk.org Mon Apr 28 01:44:41 2025 From: gcao at openjdk.org (Gui Cao) Date: Mon, 28 Apr 2025 01:44:41 GMT Subject: RFR: 8355654: RISC-V: Relax register constraint for some vector-scalar instructions Message-ID: <-8ukYnvnLSOYQpSFEobuIWZtjHxutLqs4EJ7m2u2zQQ=.27d00e4e-c999-429e-84d9-fd87708d90cc@github.com> Hi, here we separate src from dst_src to give the compiler a greater degree of flexibility. Note that in the masked version we can't modify it, because we can't change inactive elements in the mask version. 
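The difference between the unmasked and masked forms described above can be modelled with a small scalar sketch. It assumes a four-lane vector and a simple add; `Vec`, `Mask`, `add_vx` and `add_vx_masked` are hypothetical names chosen for illustration and do not come from `riscv_v.ad`.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model of a vector-scalar add; illustration only,
// not code from riscv_v.ad.
constexpr std::size_t kLanes = 4;
using Vec  = std::array<std::int32_t, kLanes>;
using Mask = std::array<bool, kLanes>;

// Unmasked form: every lane of the result is written, so the destination
// can be a fresh register that is independent of the source.
Vec add_vx(const Vec& src, std::int32_t scalar) {
  Vec dst{};
  for (std::size_t i = 0; i < kLanes; ++i) {
    dst[i] = src[i] + scalar;
  }
  return dst;
}

// Masked form: inactive lanes must keep their previous contents, so the
// destination doubles as the first source (an in/out operand).
void add_vx_masked(Vec& dst_src, std::int32_t scalar, const Mask& mask) {
  for (std::size_t i = 0; i < kLanes; ++i) {
    if (mask[i]) {
      dst_src[i] += scalar;
    }
  }
}
```

In this model the unmasked function leaves its input untouched, which is the extra freedom a separate `src` operand gives the register allocator, while the masked function has to update its operand in place.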
### Testing qemu-system 9.1.50 with UseRVV: - [x] Run test/jdk/jdk/incubator/vector (fastdebug) ------------- Commit messages: - 8355654: RISC-V: Relax register constraint for some vector-scalar instructions Changes: https://git.openjdk.org/jdk/pull/24905/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24905&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8355654 Stats: 66 lines in 1 file changed: 0 ins; 0 del; 66 mod Patch: https://git.openjdk.org/jdk/pull/24905.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24905/head:pull/24905 PR: https://git.openjdk.org/jdk/pull/24905 From fyang at openjdk.org Mon Apr 28 01:44:54 2025 From: fyang at openjdk.org (Fei Yang) Date: Mon, 28 Apr 2025 01:44:54 GMT Subject: RFR: 8355657: RISC-V: Improve PrintOptoAssembly output of vector-scalar instructions In-Reply-To: References: Message-ID: On Sun, 27 Apr 2025 07:17:36 GMT, Anjian-Wen wrote: > As the issue describe, some match rule and predict match not only type I, in case of the misleading, try to delete some "I" in the format and instruct name src/hotspot/cpu/riscv/riscv_v.ad line 631: > 629: match(Set dst_src (SubVS (Binary dst_src (Replicate src2)) v0)); > 630: match(Set dst_src (SubVI (Binary dst_src (Replicate src2)) v0)); > 631: format %{ "vsub_vx_masked $dst_src, $dst_src, $src2, $v0" %} Can you update the format of `instruct vsubL_vx_masked` while you are on it? It should be `vsubL_vx_masked` instead of `vsub_vx_masked`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24904#discussion_r2062789808 From fyang at openjdk.org Mon Apr 28 03:09:33 2025 From: fyang at openjdk.org (Fei Yang) Date: Mon, 28 Apr 2025 03:09:33 GMT Subject: RFR: 8355667: RISC-V: Add backend implementation for unsigned vector Min / Max operations Message-ID: Hi, please review this change. https://bugs.openjdk.org/browse/JDK-8338021 proposed new vector operators including Unsigned Vector Min / Max. 
This intrinsify Unsigned Vector Min / Max operations with RVV extension for RISC-V backend to improve performance. This also enables some extra IR tests in file test/hotspot/jtreg/compiler/vectorapi/VectorCommutativeOperSharingTest.java. Testing: - [x] make test TEST="jdk_vector" (QEMU / fastdebug) JMH tested on BPI-F3 SBC for reference:
Before:
ByteMaxVector.UMAX 1024 thrpt 5 58.657 ± 17.216 ops/ms
ByteMaxVector.UMAXMasked 1024 thrpt 5 45.581 ± 18.164 ops/ms
ByteMaxVector.UMIN 1024 thrpt 5 55.275 ± 15.863 ops/ms
ByteMaxVector.UMINMasked 1024 thrpt 5 44.651 ± 29.209 ops/ms
ShortMaxVector.UMAX 1024 thrpt 5 24.146 ± 7.570 ops/ms
ShortMaxVector.UMAXMasked 1024 thrpt 5 21.506 ± 0.430 ops/ms
ShortMaxVector.UMIN 1024 thrpt 5 24.261 ± 6.993 ops/ms
ShortMaxVector.UMINMasked 1024 thrpt 5 20.980 ± 1.622 ops/ms
IntMaxVector.UMAX 1024 thrpt 5 10.780 ± 0.812 ops/ms
IntMaxVector.UMAXMasked 1024 thrpt 5 10.609 ± 0.851 ops/ms
IntMaxVector.UMIN 1024 thrpt 5 10.845 ± 0.578 ops/ms
IntMaxVector.UMINMasked 1024 thrpt 5 10.705 ± 0.562 ops/ms
LongMaxVector.UMAX 1024 thrpt 5 5.445 ± 0.439 ops/ms
LongMaxVector.UMAXMasked 1024 thrpt 5 5.387 ± 0.285 ops/ms
LongMaxVector.UMIN 1024 thrpt 5 5.379 ± 0.407 ops/ms
LongMaxVector.UMINMasked 1024 thrpt 5 5.373 ± 0.236 ops/ms
After:
ByteMaxVector.UMAX 1024 thrpt 5 2552.161 ± 121.213 ops/ms
ByteMaxVector.UMAXMasked 1024 thrpt 5 2444.001 ± 105.139 ops/ms
ByteMaxVector.UMIN 1024 thrpt 5 2616.963 ± 3.065 ops/ms
ByteMaxVector.UMINMasked 1024 thrpt 5 2367.968 ± 1057.028 ops/ms
ShortMaxVector.UMAX 1024 thrpt 5 1363.676 ± 9.294 ops/ms
ShortMaxVector.UMAXMasked 1024 thrpt 5 1321.759 ± 121.471 ops/ms
ShortMaxVector.UMIN 1024 thrpt 5 1368.598 ± 5.251 ops/ms
ShortMaxVector.UMINMasked 1024 thrpt 5 1337.044 ± 1.434 ops/ms
IntMaxVector.UMAX 1024 thrpt 5 566.509 ± 0.658 ops/ms
IntMaxVector.UMAXMasked 1024 thrpt 5 559.456 ± 0.491 ops/ms
IntMaxVector.UMIN 1024 thrpt 5 569.238 ± 1.309 ops/ms
IntMaxVector.UMINMasked 1024 thrpt 5 535.359 ± 15.765 ops/ms
LongMaxVector.UMAX 1024 thrpt 5 367.765 ± 0.835 ops/ms
LongMaxVector.UMAXMasked 1024 thrpt 5 366.470 ± 1.179 ops/ms
LongMaxVector.UMIN 1024 thrpt 5 365.960 ± 2.007 ops/ms
LongMaxVector.UMINMasked 1024 thrpt 5 268.859 ± 14.090 ops/ms
------------- Commit messages: - 8355667: RISC-V: Add backend implementation for unsigned vector Min / Max operations Changes: https://git.openjdk.org/jdk/pull/24909/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24909&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8355667 Stats: 91 lines in 3 files changed: 56 ins; 0 del; 35 mod Patch: https://git.openjdk.org/jdk/pull/24909.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24909/head:pull/24909 PR: https://git.openjdk.org/jdk/pull/24909 From duke at openjdk.org Mon Apr 28 03:10:30 2025 From: duke at openjdk.org (Anjian-Wen) Date: Mon, 28 Apr 2025 03:10:30 GMT Subject: RFR: 8355657: RISC-V: Improve PrintOptoAssembly output of vector-scalar instructions [v2] In-Reply-To: References: Message-ID: > As the issue describe, some match rule and predict match not only type I, in case of the misleading, try to delete some "I" in the format and instruct name Anjian-Wen has updated the pull request incrementally with one additional commit since the last revision: update vsubL_vx_masked format ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24904/files - new: https://git.openjdk.org/jdk/pull/24904/files/57934239..0885242f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24904&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24904&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24904.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24904/head:pull/24904 PR: https://git.openjdk.org/jdk/pull/24904 From gcao at openjdk.org Mon Apr 28 03:25:11 2025 From: gcao at openjdk.org (Gui Cao) Date: Mon, 28 Apr 2025 03:25:11 GMT Subject: RFR: 8355668: RISC-V:
jdk/incubator/vector/Int256VectorTests.java fails when using RVV Message-ID: <46jT8ol1f1Vd2Yfqq__tfAqT85RWMtLkFxxqVucgKQA=.392bade5-0d18-4a76-921d-b8e8463a55e8@github.com> Hi, After https://github.com/openjdk/jdk/pull/24129, some tests fail because the later predicate condition overrides the previous one. ### Testing qemu-system 9.1.50 with UseRVV: - [x] Run test/jdk/jdk/incubator/vector (fastdebug) ------------- Commit messages: - 8355668: RISC-V: jdk/incubator/vector/Int256VectorTests.java fails when using RVV Changes: https://git.openjdk.org/jdk/pull/24910/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24910&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8355668 Stats: 20 lines in 1 file changed: 0 ins; 8 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/24910.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24910/head:pull/24910 PR: https://git.openjdk.org/jdk/pull/24910 From duke at openjdk.org Mon Apr 28 03:25:47 2025 From: duke at openjdk.org (kuaiwei) Date: Mon, 28 Apr 2025 03:25:47 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v12] In-Reply-To: References: Message-ID: On Fri, 18 Apr 2025 08:37:17 GMT, kuaiwei wrote: >> First batch, need to change trains, I'll continue later :) > >> First batch, need to change trains, I'll continue later :) > > Thanks for your review. I will check them. > @kuaiwei Thanks for your response! > > What about these two things I brought up? > > > Do you have some tests where some of the nodes in the load/shift/or expression have other uses? > > It would be good to have these tests, even if we think your code is correct. It is good to verify it with tests. And someone in the future might break it. > > > I think your implementation should go into OrINode, and match the expression up from there. Because we want to replace the old OrI with the new LoadL. > > This is really the pattern we use in `Idea`.
We replace the node at the bottom of an expression with a new node (or new expression). Test is added. If Load/Or has other usage, they can not be merged. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24023#issuecomment-2833879090 From duke at openjdk.org Mon Apr 28 03:25:53 2025 From: duke at openjdk.org (kuaiwei) Date: Mon, 28 Apr 2025 03:25:53 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v12] In-Reply-To: References: Message-ID: On Thu, 17 Apr 2025 09:54:33 GMT, Emanuel Peter wrote: >> kuaiwei has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits: >> >> - Merge remote-tracking branch 'origin/master' into dev/merge_loads >> - Remove unused code >> - Move code to addnode.cpp and add more tests >> - Merge remote-tracking branch 'origin/master' into dev/merge_loads >> - Fix test >> - Add more tests >> - Enable StressIGVN and riscv platform >> - Change tests as review comments >> - Fix test failure and change for review comments >> - Revert extract value and add more tests >> - ... and 4 more: https://git.openjdk.org/jdk/compare/660b17a6...f6518b26 > > src/hotspot/share/opto/addnode.cpp line 816: > >> 814: LoadNode* const _load; >> 815: Node* const _combine; >> 816: int const _shift; > > Suggestion: > > jint const _shift; > > I prefer using `jint` etc when we are talking about values that correlate to java types. With the C/C++ numerical types, there could always be issues on different platforms with different bit sizes. 
Fixed > src/hotspot/share/opto/addnode.cpp line 843: > >> 841: Node* const _combine; >> 842: MemoryAdjacentStatus _order; >> 843: bool _require_reverse_bytes; // Do we need add a ReverseBytes for merged load > > Suggestion: > > bool _require_reverse_bytes; // Do we need to add a ReverseBytes for merged load Fixed > src/hotspot/share/opto/addnode.cpp line 857: > >> 855: >> 856: private: >> 857: // Detect the embedding combine node is a candidate for merging loads > > Suggestion: > > // Detect if the embedding combine node is a candidate for merging loads Fixed > src/hotspot/share/opto/addnode.cpp line 920: > >> 918: >> 919: // Go through ConvI2L which is unique output of input node >> 920: const Node* MergePrimitiveLoads::by_pass_i2l(const Node* n) { > > Suggestion: > > const Node* MergePrimitiveLoads::bypass_ConvI2L(const Node* n) { > > I think `bypass` is a single verb, not two wordy ;) Fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2062835992 PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2062835815 PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2062835896 PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2062836374 From duke at openjdk.org Mon Apr 28 03:25:53 2025 From: duke at openjdk.org (kuaiwei) Date: Mon, 28 Apr 2025 03:25:53 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v12] In-Reply-To: <3ZwBSGNwatEmf9zBAciFTFAMCingd1y6r6qQXnSBIw4=.fc72a7e0-a684-430c-8ee9-8c48001e0533@github.com> References: <3ZwBSGNwatEmf9zBAciFTFAMCingd1y6r6qQXnSBIw4=.fc72a7e0-a684-430c-8ee9-8c48001e0533@github.com> Message-ID: On Mon, 21 Apr 2025 11:20:38 GMT, Emanuel Peter wrote: >> I choose `GrowableArray` for convenience. And it can be changed as a stack allocate data. > > Ah, we have a misunderstanding. I was asking why not > Suggestion: > > typedef GrowableArray MergeLoadInfoList; > > i.e. 
allocate the elements of the array directly in the array (in-place), rather than allocating separate elements. I changed as `GrowableArray` . ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2062835731 From duke at openjdk.org Mon Apr 28 03:25:55 2025 From: duke at openjdk.org (kuaiwei) Date: Mon, 28 Apr 2025 03:25:55 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v9] In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 11:42:35 GMT, Emanuel Peter wrote: >> kuaiwei has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: >> >> - Merge remote-tracking branch 'origin/master' into dev/merge_loads >> - Fix test >> - Add more tests >> - Enable StressIGVN and riscv platform >> - Change tests as review comments >> - Fix test failure and change for review comments >> - Revert extract value and add more tests >> - Add tests >> - Fix test failure >> - Remove some debug trace >> - ... and 1 more: https://git.openjdk.org/jdk/compare/ee1577b7...e37c4bf3 > > src/hotspot/share/opto/memnode.cpp line 2396: > >> 2394: assert(last_op != nullptr && (last_op->Opcode() == Op_OrI || last_op->Opcode() == Op_OrL), "sanity"); >> 2395: _phase->is_IterGVN()->replace_node(last_op, replace); >> 2396: _phase->is_IterGVN()->_worklist.push(merged_load); > > If you did this in `OrNode::Ideal`, then you just have to return the new load, and `IGVN` takes care of the replacing. That is the code pattern we use everywhere else. 
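For readers following the MergeLoads thread, the kind of source-level code this optimization is aimed at can be sketched as follows. This is a minimal illustration in plain Java, assuming a byte-wise load/shift/or expression over adjacent array elements; the class and method names are hypothetical, and the thread does not confirm that C2 merges exactly this form.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustration only: a source-level shape, not C2 code from the patch.
final class MergeLoadsSketch {
    // Four adjacent byte loads combined with shift/or, lowest address in the low bits.
    static int loadIntLE(byte[] a, int i) {
        return  (a[i]     & 0xff)
             | ((a[i + 1] & 0xff) << 8)
             | ((a[i + 2] & 0xff) << 16)
             | ((a[i + 3] & 0xff) << 24);
    }

    // The same four bytes combined in the opposite order, lowest address in the high bits.
    static int loadIntBE(byte[] a, int i) {
        return ((a[i]     & 0xff) << 24)
             | ((a[i + 1] & 0xff) << 16)
             | ((a[i + 2] & 0xff) << 8)
             |  (a[i + 3] & 0xff);
    }

    public static void main(String[] args) {
        byte[] data = {0x11, 0x22, 0x33, 0x44, 0x55};
        // One wide little-endian load over the same four bytes.
        int wideLE = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN).getInt(0);
        System.out.println(loadIntLE(data, 0) == wideLE);                       // true
        System.out.println(loadIntBE(data, 0) == Integer.reverseBytes(wideLE)); // true
    }
}
```

The second method shows why a merged load may need a byte reversal: the bytes are the same, but they are combined in the opposite order from the one a single little-endian load produces. That is presumably the situation the `_require_reverse_bytes` flag quoted earlier in the review is meant to track.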
The optimization has been moved to `OrNode::Ideal` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2062834641 From fyang at openjdk.org Mon Apr 28 03:50:44 2025 From: fyang at openjdk.org (Fei Yang) Date: Mon, 28 Apr 2025 03:50:44 GMT Subject: RFR: 8355657: RISC-V: Improve PrintOptoAssembly output of vector-scalar instructions [v2] In-Reply-To: References: Message-ID: On Mon, 28 Apr 2025 03:10:30 GMT, Anjian-Wen wrote: >> As the issue describe, some match rule and predict match not only type I, in case of the misleading, try to delete some "I" in the format and instruct name > > Anjian-Wen has updated the pull request incrementally with one additional commit since the last revision: > > update vsubL_vx_masked format Marked as reviewed by fyang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24904#pullrequestreview-2797919372 From gcao at openjdk.org Mon Apr 28 03:50:44 2025 From: gcao at openjdk.org (Gui Cao) Date: Mon, 28 Apr 2025 03:50:44 GMT Subject: RFR: 8355657: RISC-V: Improve PrintOptoAssembly output of vector-scalar instructions [v2] In-Reply-To: References: Message-ID: On Mon, 28 Apr 2025 03:10:30 GMT, Anjian-Wen wrote: >> As the issue describe, some match rule and predict match not only type I, in case of the misleading, try to delete some "I" in the format and instruct name > > Anjian-Wen has updated the pull request incrementally with one additional commit since the last revision: > > update vsubL_vx_masked format LGTM, https://github.com/openjdk/jdk/pull/24910 Waiting for this PR to merge in. ------------- Marked as reviewed by gcao (Author). 
PR Review: https://git.openjdk.org/jdk/pull/24904#pullrequestreview-2797919804 From gcao at openjdk.org Mon Apr 28 04:15:34 2025 From: gcao at openjdk.org (Gui Cao) Date: Mon, 28 Apr 2025 04:15:34 GMT Subject: RFR: 8355668: RISC-V: jdk/incubator/vector/Int256VectorTests.java fails when using RVV [v2] In-Reply-To: <46jT8ol1f1Vd2Yfqq__tfAqT85RWMtLkFxxqVucgKQA=.392bade5-0d18-4a76-921d-b8e8463a55e8@github.com> References: <46jT8ol1f1Vd2Yfqq__tfAqT85RWMtLkFxxqVucgKQA=.392bade5-0d18-4a76-921d-b8e8463a55e8@github.com> Message-ID: > Hi, After https://github.com/openjdk/jdk/pull/24129, some tests run failed. due to the fact that the later predicate condition overrides the previous one. > > ### Testing > qemu-system 9.1.50 with UseRVV: > > - [x] Run test/jdk/jdk/incubator/vector (fastdebug) Gui Cao has updated the pull request incrementally with one additional commit since the last revision: Code format ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24910/files - new: https://git.openjdk.org/jdk/pull/24910/files/1d927a8d..4e40f649 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24910&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24910&range=00-01 Stats: 16 lines in 1 file changed: 4 ins; 0 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/24910.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24910/head:pull/24910 PR: https://git.openjdk.org/jdk/pull/24910 From iveresov at openjdk.org Mon Apr 28 04:22:32 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Mon, 28 Apr 2025 04:22:32 GMT Subject: RFR: 8355003: Implement Ahead-of-Time Method Profiling [v2] In-Reply-To: References: Message-ID: > Improve warm-up time by making profile data from a previous run of an application instantly available, when the HotSpot Java Virtual Machine starts. Specifically, enhance the [AOT cache](https://openjdk.org/jeps/483) to store method execution profiles from training runs, reducing profiling delays in subsequent production runs. 
> > More details in the JEP: https://bugs.openjdk.org/browse/JDK-8325147 Igor Veresov has updated the pull request incrementally with two additional commits since the last revision: - Remove the workaround of setting AOTRecordTraining during assembly - Address some of the review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24886/files - new: https://git.openjdk.org/jdk/pull/24886/files/3ec132e7..fea6171e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24886&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24886&range=00-01 Stats: 96 lines in 9 files changed: 30 ins; 49 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/24886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24886/head:pull/24886 PR: https://git.openjdk.org/jdk/pull/24886 From qamai at openjdk.org Mon Apr 28 05:02:00 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 28 Apr 2025 05:02:00 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v7] In-Reply-To: References: <-8hODpwHoSEctN2Oo2SrCY1aRpkvZ_kcpnOntQLXgC4=.ecf517f4-9442-4812-81dc-1c177f9d70bf@github.com> Message-ID: <3kJkhxljNnUZ_b6c3jGsi4YA-JEDnZG79jblC_G47Jc=.bd8bf380-8150-4c2c-b4c9-a1b11884c01a@github.com> On Tue, 22 Apr 2025 15:22:04 GMT, Emanuel Peter wrote: >> @eme64 It would be great if you can come back to this > > @merykitty Oh dear, I dropped it again. Thanks for the reminder! I actually just thought about this one over the easter weekend. And it seems to me we have had lots of "bit optimizations" that could be much more powerfully solved with "known bits". So let's continue working on this. @eme64 Ping. Please don't be annoyed as I think I will ping you more frequently in case you forget. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17508#issuecomment-2833983899 From thartmann at openjdk.org Mon Apr 28 05:14:45 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 28 Apr 2025 05:14:45 GMT Subject: RFR: 8355635: Do not collect C strings in C2 scratch buffer In-Reply-To: References: Message-ID: <9B1NJ_ekz4KUsWCugAvf1PksIH46dFsj1YRR-mTdqlw=.5d5a0cb8-88ea-4ae6-b64c-d6cce2f6189d@github.com> On Sat, 26 Apr 2025 02:39:58 GMT, Vladimir Kozlov wrote: > [JDK-8349479](https://bugs.openjdk.org/browse/JDK-8349479) added call to `code_string()` for Halt node mach node. > I am observing several more creation and clearing C strings collections during C2 compilation: > [17.405s][debug][codestrings] Clear 2 asm-remarks. > [17.405s][debug][codestrings] Clear 1 dbg-string. > > Most are coming from temporary scratch buffer C2 uses for code size calculation. I suggest to not collect strings in this buffer. > > Note, `CodeSection::set_scratch_emit()` is only called from [PhaseOutput::scratch_emit_size()](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/output.cpp#L3368) for scratch buffer. > > I verified with `-XX:CompileCommand=print,::` that hsdis output is the same. > > Running on linux-x64 with fastdebug VM: > > before: > $ java -XX:-TieredCompilation -Xlog:codestrings=debug com.sun.tools.javac.Main HelloWorld.java | grep codestrings |wc > 1644 6576 80618 > > after again: > $ java -XX:-TieredCompilation -Xlog:codestrings=debug com.sun.tools.javac.Main HelloWorld.java | grep codestrings |wc > 0 0 0 > > > It is more dramatic with `-Xcomp` we use for testing: > > Before > $ java -XX:-TieredCompilation -Xcomp -Xlog:codestrings=debug com.sun.tools.javac.Main HelloWorld.java | grep codestrings |wc > 70924 283696 3533261 > > After fix > $ java -XX:-TieredCompilation -Xcomp -Xlog:codestrings=debug com.sun.tools.javac.Main HelloWorld.java | grep codestrings |wc > 0 0 0 > > > I was curious why it is 0 - we do deoptimize nmethod. 
But with big default CodeCache GC does not collect them. > Reducing CodeCache to 8Mb shows deallocation: > > $ java -XX:-TieredCompilation -Xcomp -Xlog:codestrings=debug -Xlog:codecache=debug -XX:+PrintCodeCache -XX:ReservedCodeCacheSize=8M com.sun.tools.javac.Main HelloWorld.java > ... > [40.196s][debug][codestrings] Clear 42 asm-remarks. > [40.196s][debug][codestrings] Clear 1 dbg-string. > [40.196s][debug][codecache ] *flushing nmethod 5811/0x00007f71d74b7188. Live blobs:3370/Free CodeCache:2953Kb > > > Tested tier1-5, Xcomp,comp-stress. Good catch! The fix looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24893#pullrequestreview-2797997868 From duke at openjdk.org Mon Apr 28 06:02:45 2025 From: duke at openjdk.org (Anjian-Wen) Date: Mon, 28 Apr 2025 06:02:45 GMT Subject: RFR: 8355657: RISC-V: Improve PrintOptoAssembly output of vector-scalar instructions [v2] In-Reply-To: References: Message-ID: On Mon, 28 Apr 2025 03:47:51 GMT, Gui Cao wrote: >> Anjian-Wen has updated the pull request incrementally with one additional commit since the last revision: >> >> update vsubL_vx_masked format > > LGTM, https://github.com/openjdk/jdk/pull/24910 Waiting for this PR to merge in. @zifeihan thanks for the review and the fix in https://github.com/openjdk/jdk/pull/24910 ------------- PR Comment: https://git.openjdk.org/jdk/pull/24904#issuecomment-2834062271 From duke at openjdk.org Mon Apr 28 06:03:45 2025 From: duke at openjdk.org (Anjian-Wen) Date: Mon, 28 Apr 2025 06:03:45 GMT Subject: RFR: 8355668: RISC-V: jdk/incubator/vector/Int256VectorTests.java fails when using RVV [v2] In-Reply-To: References: <46jT8ol1f1Vd2Yfqq__tfAqT85RWMtLkFxxqVucgKQA=.392bade5-0d18-4a76-921d-b8e8463a55e8@github.com> Message-ID: On Mon, 28 Apr 2025 04:15:34 GMT, Gui Cao wrote: >> Hi, After https://github.com/openjdk/jdk/pull/24129, some tests run failed. 
due to the fact that the later predicate condition overrides the previous one. >> >> ### Testing >> qemu-system 9.1.50 with UseRVV: >> >> - [x] Run test/jdk/jdk/incubator/vector (fastdebug) > > Gui Cao has updated the pull request incrementally with one additional commit since the last revision: > > Code format Good fix! thanks! ------------- Marked as reviewed by Anjian-Wen at github.com (no known OpenJDK username). PR Review: https://git.openjdk.org/jdk/pull/24910#pullrequestreview-2798058408 From gcao at openjdk.org Mon Apr 28 06:06:44 2025 From: gcao at openjdk.org (Gui Cao) Date: Mon, 28 Apr 2025 06:06:44 GMT Subject: RFR: 8355667: RISC-V: Add backend implementation for unsigned vector Min / Max operations In-Reply-To: References: Message-ID: <8jUivjnkSiFzpTEHEU1Qhg2217ysUCTN_NtPqKQscNQ=.be3bff1c-f614-420d-a078-a3e055a5e3a7@github.com> On Mon, 28 Apr 2025 02:09:35 GMT, Fei Yang wrote: > Hi, please review this change. > https://bugs.openjdk.org/browse/JDK-8338021 proposed new vector operators including Unsigned Vector Min / Max. > This intrinsifies Unsigned Vector Min / Max operations with the RVV extension for the RISC-V backend to improve performance. > This also enables some extra IR tests in file `test/hotspot/jtreg/compiler/vectorapi/VectorCommutativeOperSharingTest.java`. > > Testing: > - [x] make test TEST="jdk_vector" (QEMU / fastdebug) > > JMH tested on BPI-F3 SBC for reference:
> Before:
>
> ByteMaxVector.UMAX 1024 thrpt 5 58.657 ± 17.216 ops/ms
> ByteMaxVector.UMAXMasked 1024 thrpt 5 45.581 ± 18.164 ops/ms
> ByteMaxVector.UMIN 1024 thrpt 5 55.275 ± 15.863 ops/ms
> ByteMaxVector.UMINMasked 1024 thrpt 5 44.651 ± 29.209 ops/ms
> ShortMaxVector.UMAX 1024 thrpt 5 24.146 ± 7.570 ops/ms
> ShortMaxVector.UMAXMasked 1024 thrpt 5 21.506 ± 0.430 ops/ms
> ShortMaxVector.UMIN 1024 thrpt 5 24.261 ± 6.993 ops/ms
> ShortMaxVector.UMINMasked 1024 thrpt 5 20.980 ± 1.622 ops/ms
> IntMaxVector.UMAX 1024 thrpt 5 10.780 ± 0.812 ops/ms
> IntMaxVector.UMAXMasked 1024 thrpt 5 10.609 ± 0.851 ops/ms
> IntMaxVector.UMIN 1024 thrpt 5 10.845 ± 0.578 ops/ms
> IntMaxVector.UMINMasked 1024 thrpt 5 10.705 ± 0.562 ops/ms
> LongMaxVector.UMAX 1024 thrpt 5 5.445 ± 0.439 ops/ms
> LongMaxVector.UMAXMasked 1024 thrpt 5 5.387 ± 0.285 ops/ms
> LongMaxVector.UMIN 1024 thrpt 5 5.379 ± 0.407 ops/ms
> LongMaxVector.UMINMasked 1024 thrpt 5 5.373 ± 0.236 ops/ms
>
> After:
>
> ByteMaxVector.UMAX 1024 thrpt 5 2552.161 ± 121.213 ops/ms
> ByteMaxVector.UMAXMasked 1024 thrpt 5 2444.001 ± 105.139 ops/ms
> ByteMaxVector.UMIN 1024 thrpt 5 2616.963 ± 3.065 ops/ms
> ByteMaxVector.UMINMasked 1024 thrpt 5 2367.968 ± 1057.028 ops/ms
> ShortMaxVector.UMAX 1024 thrpt 5 1363.676 ± 9.294 ops/ms
> ShortMaxVector.UMAXMasked 1024 thrpt 5 1321.759 ± 121.471 ops/ms
> ShortMaxVector.UMIN 1024 thrpt 5 1368.598 ± 5.251 ops/ms
> ShortMaxVector.UMINMasked 1024 thrpt 5 1337.044 ± 1.434 ops/ms
> IntMaxVector.UMAX 1024 thrpt 5 566.509 ± 0.658 ops/ms
> IntMaxVector.UMAXMasked 1024 thrpt 5 559.456 ± 0.491 ops/ms
> IntMaxVector.UMIN 1024 thrpt 5 569.238 ± 1.309 ops/ms
> IntMaxVector.UMINMasked 1024 thrpt 5 535.359 ± 15.765 ops/ms
> LongMaxVector.UMAX ...
LGTM. ------------- Marked as reviewed by gcao (Author). PR Review: https://git.openjdk.org/jdk/pull/24909#pullrequestreview-2798062768 From dfenacci at openjdk.org Mon Apr 28 06:21:58 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 28 Apr 2025 06:21:58 GMT Subject: RFR: 8354119: Missing C2 proper allocation failure handling during initialization (during generate_uncommon_trap_blob) In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 10:20:55 GMT, Martin Doerr wrote: >> After [JDK-8347406](https://bugs.openjdk.org/browse/JDK-8347406), `OptoRuntime::generate_uncommon_trap_blob` and `OptoRuntime::generate_exception_blob` return an `UncommonTrapBlob`/`ExceptionBlob` if they succeed, `nullptr` if they don't. This is then used by the compiler to shut down gently if the code cache is full (instead of crashing).
>> Unfortunately, if the code cache fills up while the buffer is being created at the start of these 2 methods (when calling `CodeBuffer buffer(name, 2048, 1024);`), an empty buffer is created. This in turn prevents `masm` from being properly initialized, which then causes an access violation when writing into the blob's address when first adding `subptr` later in the method (as seen in the snippet below for `generate_uncommon_trap_blob`). >> >> https://github.com/openjdk/jdk/blob/3cc43b3224efdf1a3f35fff58b993027a9e1f4ad/src/hotspot/cpu/x86/runtime_x86_64.cpp#L55-L72 >> >> To fix this I suggest we return immediately from `OptoRuntime::generate_uncommon_trap_blob`/`OptoRuntime::generate_exception_blob` if the `buffer` creation failed. >> >> ### Testing >> >> Tier 1-3. >> No specific regression test is added (it would be very hard to write, being dependent on thread scheduling among other things; on the other hand, `StartupOutput.java` might occasionally catch it). > > PPC64 parts look good and a few quick tests have passed. Thanks a lot @TheRealMDoerr, @offamitkumar, @RealFYang, @bulasevich for testing and reviewing! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24549#issuecomment-2834090488 From dfenacci at openjdk.org Mon Apr 28 06:21:58 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 28 Apr 2025 06:21:58 GMT Subject: Integrated: 8354119: Missing C2 proper allocation failure handling during initialization (during generate_uncommon_trap_blob) In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 14:04:35 GMT, Damon Fenacci wrote: > After [JDK-8347406](https://bugs.openjdk.org/browse/JDK-8347406), `OptoRuntime::generate_uncommon_trap_blob` and `OptoRuntime::generate_exception_blob` return an `UncommonTrapBlob`/`ExceptionBlob` if they succeed, `nullptr` if they don't. This is then used by the compiler to shut down gently if the code cache is full (instead of crashing).
> Unfortunately, if the code cache fills up while the buffer is being created at the start of these 2 methods (when calling `CodeBuffer buffer(name, 2048, 1024);`), an empty buffer is created. This in turn prevents `masm` from being properly initialized, which then causes an access violation when writing into the blob's address when first adding `subptr` later in the method (as seen in the snippet below for `generate_uncommon_trap_blob`). > > https://github.com/openjdk/jdk/blob/3cc43b3224efdf1a3f35fff58b993027a9e1f4ad/src/hotspot/cpu/x86/runtime_x86_64.cpp#L55-L72 > > To fix this I suggest we return immediately from `OptoRuntime::generate_uncommon_trap_blob`/`OptoRuntime::generate_exception_blob` if the `buffer` creation failed. > > ### Testing > > Tier 1-3. > No specific regression test is added (it would be very hard to write, being dependent on thread scheduling among other things; on the other hand, `StartupOutput.java` might occasionally catch it). This pull request has now been integrated. Changeset: 1f228e55 Author: Damon Fenacci URL: https://git.openjdk.org/jdk/commit/1f228e5539a5faa3b28e12548f8ad97eeacf3298 Stats: 36 lines in 8 files changed: 36 ins; 0 del; 0 mod 8354119: Missing C2 proper allocation failure handling during initialization (during generate_uncommon_trap_blob) Reviewed-by: kvn, chagedorn, mdoerr, amitkumar, fyang, bulasevich ------------- PR: https://git.openjdk.org/jdk/pull/24549 From epeter at openjdk.org Mon Apr 28 06:49:55 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 28 Apr 2025 06:49:55 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v3] In-Reply-To: References: Message-ID: On Fri, 25 Apr 2025 07:24:15 GMT, erifan wrote: >> This patch optimizes the following patterns: >> For integer types: >> >> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) >> => (VectorMaskCmp src1 src2 ncond) >> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) >> => (VectorMaskCmp src1 src2 ncond) >> >> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and 
ugt, ncond is the negative comparison of cond. >> >> For float and double types: >> >> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> >> cond can be eq or ne. >> >> Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option `-XX:UseSVE=2`: >> >> Benchmark Unit Before Score Error After Score Error Uplift >> testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 10266136.26 8955.008548 1.29 >> testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 1179760.772 448.031844 1.33 >> testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 2359520.803 896.305743 1.33 >> testCompareEQMaskNotInt ops/s 1787221.411 977.743935 2353952.519 960.069976 1.31 >> testCompareEQMaskNotLong ops/s 895297.1974 673.44808 1178449.02 323.804205 1.31 >> testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 4712761.965 2110.862053 1.41 >> testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 10251646.9 9486.699831 1.29 >> testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 2352855.205 1251.952546 1.39 >> testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 1177811.493 521.1229 1.37 >> testCompareGEMaskNotShort ops/s 3341860.309 1578.975338 4714008.434 1681.10365 1.41 >> testCompareGTMaskNotByte ops/s 7910823.674 2993.367032 10245063.58 9774.75138 1.29 >> testCompareGTMaskNotInt ops/s 1673393.928 3153.099431 2353654.521 1190.848583 1.4 >> testCompareGTMaskNotLong ops/s 849405.9159 2432.858159 1177952.041 359.96413 1.38 >> testCompareGTMaskNotShort ops/s 3339509.141 3339.976585 4711442.496 2673.364893 1.41 >> testCompareLEMaskNotByte ops/s 7911340.004 3114.69191 10231626.5 27134.20035 1.29 >> testCompareLEMaskNotInt ops/s 1675812.113 1340.969885 2353255.341 1452.4522 1.4 >> testCompareLEMaskNotLong ops/s 848862.8036 6564.841731 1177763.623 539.290106 1.38 >> 
testCompareLEMaskNotShort ops/s 3324951.54 2380.29473 4712116.251 1544.559684 1.41 >> testCompareLTMaskNotByte ops/s 7910390.844 2630.861436 10239567.69 6487.441672 1.29 >> testCompareLTMaskNotInt ops/s 16721... > > erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Addressed some review comments > > 1. Call VectorNode::Ideal() only once in XorVNode::Ideal. > 2. Improve code comments. > - Merge branch 'master' into JDK-8354242 > - Merge branch 'master' into JDK-8354242 > - 8354242: VectorAPI: combine vector not operation with compare > > This patch optimizes the following patterns: > For integer types: > ``` > (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) > => (VectorMaskCmp src1 src2 ncond) > (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) > => (VectorMaskCmp src1 src2 ncond) > ``` > cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the > negative comparison of cond. > > For float and double types: > ``` > (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) > => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) > (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) > => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) > ``` > cond can be eq or ne. 
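The restriction to eq and ne for float and double is presumably due to NaN (the thread itself does not spell this out). For integers the negation of `a < b` is exactly `a >= b`, but with IEEE 754 operands every ordered comparison involving NaN is false, so `!(a < b)` and `a >= b` can differ. A minimal scalar sketch in plain Java; the class and method names are hypothetical, and it deliberately does not use the Vector API or any code from the patch:

```java
// Hypothetical scalar illustration; not code from the patch.
final class NegatedCompareSketch {
    // For ints, NOT(a < b) always equals (a >= b), so folding the NOT into the compare is safe.
    static boolean intLtNegationHolds(int a, int b) {
        return (!(a < b)) == (a >= b);
    }

    // For doubles, the same identity breaks as soon as one operand is NaN.
    static boolean doubleLtNegationHolds(double a, double b) {
        return (!(a < b)) == (a >= b);
    }

    // eq/ne stay safe for doubles: (a != b) is exactly !(a == b), NaN included.
    static boolean doubleEqNegationHolds(double a, double b) {
        return (!(a == b)) == (a != b);
    }

    public static void main(String[] args) {
        System.out.println(intLtNegationHolds(3, 7));               // true
        System.out.println(doubleLtNegationHolds(1.0, 2.0));        // true
        System.out.println(doubleLtNegationHolds(1.0, Double.NaN)); // false
        System.out.println(doubleEqNegationHolds(1.0, Double.NaN)); // true
    }
}
```

The third call is the interesting one: with a NaN operand both `a < b` and `a >= b` are false, so replacing "not lt" by "ge" would change the result of the mask.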
> > Benchmarks on Nvidia Grace machine with 128-bit SVE2: > With option `-XX:UseSVE=2`: > ``` > Benchmark Unit Before Score Error After Score Error Uplift > testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 10266136.26 8955.008548 1.29 > testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 1179760.772 448.031844 1.33 > testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 2359520.803 896.305743 1.33 > testCompareEQMaskNotInt ops/s 1787221.411 977.743935 2353952.519 960.069976 1.31 > testCompareEQMaskNotLong ops/s 895297.1974 673.44808 1178449.02 323.804205 1.31 > testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 4712761.965 2110.862053 1.41 > testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 10251646.9 9486.699831 1.29 > testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 2352855.205 1251.952546 1.39 > testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 1177811.493 521.1229 1.37 > testCompareGEMaskNotShort ops/s 3341860.309 1578.975338 4714008.434 1681.10365 1.41 > testCompareGTMaskNotByte ops/s 7910823.674 2993.367032 10245063.58 9774.75138 1.29 > testCompareGTMaskNotInt ops/s 1673393.928 3153.099431 2353654.521 1190.848583 1.4 > testCompareGTMaskNotLong ops/s 849405.9159 2... Just a drive-by comment for now, I may review this later more fully. > I would also prefer if you added the IR restrictions rather than the JTREG requires. The benefit is that we can still run the tests on all platforms, at least for result verification. > > Imagine someone adds optimizations to a new platform, but does not know about this test here. They make a mistake, and there is a bug, leading either to a crash or wrong result. With the requires, you test would never even run, and we would not catch it. With the IR applyIf, we would catch the bug. Just copy pasting the IR applyIf everywhere is not that much work, and adding in a new platform later is not really hard either. ------------- Changes requested by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24674#pullrequestreview-2798141911 From epeter at openjdk.org Mon Apr 28 06:49:56 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 28 Apr 2025 06:49:56 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v2] In-Reply-To: References: Message-ID: On Sun, 27 Apr 2025 10:09:48 GMT, erifan wrote: >> I don't see XorVMask implemented on all non-x86 targets, like PPC etc. > > This is not specifically required on x86, but because this test fails on x86 when `-XX:UseAVX=0` is specified. When `-XX:UseAVX=0` is specified, the sub-graph is like this: > `(XorV (VectorMaskCmp (LoadVector ...)) (Replicate -1))` > It is not an optimization pattern supported by this patch because we don't know what's the comparison op. I would also prefer if you added the IR restrictions rather than the JTREG requires. The benefit is that we can still run the tests on all platforms, at least for result verification. Imagine someone adds optimizations to a new platform, but does not know about this test here. They make a mistake, and there is a bug, leading either to a crash or a wrong result. With the requires, your test would never even run, and we would not catch it. With the IR applyIf, we would catch the bug. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2062988123 From duke at openjdk.org Mon Apr 28 07:18:52 2025 From: duke at openjdk.org (Anjian-Wen) Date: Mon, 28 Apr 2025 07:18:52 GMT Subject: Integrated: 8355657: RISC-V: Improve PrintOptoAssembly output of vector-scalar instructions In-Reply-To: References: Message-ID: On Sun, 27 Apr 2025 07:17:36 GMT, Anjian-Wen wrote: > As the issue describes, some match rules and predicates match more than just type I. To avoid misleading output, this change removes the "I" from the affected formats and instruct names. This pull request has now been integrated.
Changeset: a05ff55b Author: Anjian-Wen Committer: Fei Yang URL: https://git.openjdk.org/jdk/commit/a05ff55be4e4e1ab11d756b88a9dfa1f0adb4592 Stats: 81 lines in 1 file changed: 0 ins; 0 del; 81 mod 8355657: RISC-V: Improve PrintOptoAssembly output of vector-scalar instructions Reviewed-by: fyang, gcao ------------- PR: https://git.openjdk.org/jdk/pull/24904 From gcao at openjdk.org Mon Apr 28 07:28:34 2025 From: gcao at openjdk.org (Gui Cao) Date: Mon, 28 Apr 2025 07:28:34 GMT Subject: RFR: 8355668: RISC-V: jdk/incubator/vector/Int256VectorTests.java fails when using RVV [v3] In-Reply-To: <46jT8ol1f1Vd2Yfqq__tfAqT85RWMtLkFxxqVucgKQA=.392bade5-0d18-4a76-921d-b8e8463a55e8@github.com> References: <46jT8ol1f1Vd2Yfqq__tfAqT85RWMtLkFxxqVucgKQA=.392bade5-0d18-4a76-921d-b8e8463a55e8@github.com> Message-ID: > Hi, After https://github.com/openjdk/jdk/pull/24129, some tests run failed. due to the fact that the later predicate condition overrides the previous one. > > ### Testing > qemu-system 9.1.50 with UseRVV: > > - [x] Run test/jdk/jdk/incubator/vector (fastdebug) Gui Cao has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains three commits: - Merge branch 'master' into JDK-8355668 - Code format - 8355668: RISC-V: jdk/incubator/vector/Int256VectorTests.java fails when using RVV ------------- Changes: https://git.openjdk.org/jdk/pull/24910/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24910&range=02 Stats: 28 lines in 1 file changed: 0 ins; 4 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/24910.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24910/head:pull/24910 PR: https://git.openjdk.org/jdk/pull/24910 From gcao at openjdk.org Mon Apr 28 07:31:00 2025 From: gcao at openjdk.org (Gui Cao) Date: Mon, 28 Apr 2025 07:31:00 GMT Subject: RFR: 8355668: RISC-V: jdk/incubator/vector/Int256VectorTests.java fails when using RVV [v4] In-Reply-To: <46jT8ol1f1Vd2Yfqq__tfAqT85RWMtLkFxxqVucgKQA=.392bade5-0d18-4a76-921d-b8e8463a55e8@github.com> References: <46jT8ol1f1Vd2Yfqq__tfAqT85RWMtLkFxxqVucgKQA=.392bade5-0d18-4a76-921d-b8e8463a55e8@github.com> Message-ID: > Hi, After https://github.com/openjdk/jdk/pull/24129, some tests run failed. due to the fact that the later predicate condition overrides the previous one. 
> > ### Testing > qemu-system 9.1.50 with UseRVV: > > - [x] Run test/jdk/jdk/incubator/vector (fastdebug) Gui Cao has updated the pull request incrementally with one additional commit since the last revision: Revert instruct name ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24910/files - new: https://git.openjdk.org/jdk/pull/24910/files/ed5c6842..80e9c6f7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24910&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24910&range=02-03 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/24910.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24910/head:pull/24910 PR: https://git.openjdk.org/jdk/pull/24910 From fyang at openjdk.org Mon Apr 28 07:36:48 2025 From: fyang at openjdk.org (Fei Yang) Date: Mon, 28 Apr 2025 07:36:48 GMT Subject: RFR: 8355668: RISC-V: jdk/incubator/vector/Int256VectorTests.java fails when using RVV [v4] In-Reply-To: References: <46jT8ol1f1Vd2Yfqq__tfAqT85RWMtLkFxxqVucgKQA=.392bade5-0d18-4a76-921d-b8e8463a55e8@github.com> Message-ID: On Mon, 28 Apr 2025 07:31:00 GMT, Gui Cao wrote: >> Hi, After https://github.com/openjdk/jdk/pull/24129, some tests fail because the later predicate condition overrides the previous one. >> >> ### Testing >> qemu-system 9.1.50 with UseRVV: >> >> - [x] Run test/jdk/jdk/incubator/vector (fastdebug) > > Gui Cao has updated the pull request incrementally with one additional commit since the last revision: > > Revert instruct name Good catch! One minor comment. Thanks. src/hotspot/cpu/riscv/riscv_v.ad line 1130:
> 1128: (Matcher::vector_element_basic_type(n) == T_INT ||
> 1129: Matcher::vector_element_basic_type(n) == T_BYTE ||
> 1130: Matcher::vector_element_basic_type(n) == T_SHORT));
Can you reorder this a bit while you are at it? We always order by the size of the type, that is, `T_BYTE`, `T_SHORT` and `T_INT`.
------------- PR Review: https://git.openjdk.org/jdk/pull/24910#pullrequestreview-2798256737 PR Review Comment: https://git.openjdk.org/jdk/pull/24910#discussion_r2063057057 From duke at openjdk.org Mon Apr 28 07:48:55 2025 From: duke at openjdk.org (erifan) Date: Mon, 28 Apr 2025 07:48:55 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v2] In-Reply-To: References: Message-ID: On Mon, 28 Apr 2025 06:45:58 GMT, Emanuel Peter wrote: >> This is not specifically required on x86, but because this test fails on x86 when `-XX:UseAVX=0` is specified. When `-XX:UseAVX=0` is specified, the sub-graph is like this: >> `(XorV (VectorMaskCmp (LoadVector ...)) (Replicate -1))` >> It is not an optimization pattern supported by this patch because we don't know what's the comparison op. > > I would also prefer if you added the IR restrictions rather than the JTREG requires. > The benefit is that we can still run the tests on all platforms, at least for result verification. > > Imagine someone adds optimizations to a new platform, but does not know about this test here. They make a mistake, and there is a bug, leading either to a crash or wrong result. With the requires, you test would never even run, and we would not catch it. With the IR applyIf, we would catch the bug. I can make the change, it's not complex, but it is different from what I thought before. I thought that supporting vector was the default behavior, is it right? So when I was doing an architecture-independent feature or optimization, I should just exclude those unsupported cases from the test, so that all potential environments would be tested. If I was doing an architecture- or feature-dependent optimization, then I should limit the test to run only in supported environments. 
For this case, **the current meaning of @requires is "skip this test when -XX:UseAVX=0 is specified on the x86 architecture, otherwise run the tests".** So if a new architecture (say s390) supports related vector operations in the future, then this test will be run on that platform by default. If all architecture-independent tests are restricted with applyIfCPUFeatureOr, then when we support a new architecture, we will need to modify all tests, otherwise no test will run on this architecture. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2063076640 From duke at openjdk.org Mon Apr 28 07:51:55 2025 From: duke at openjdk.org (erifan) Date: Mon, 28 Apr 2025 07:51:55 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v3] In-Reply-To: References: Message-ID: On Mon, 28 Apr 2025 06:47:38 GMT, Emanuel Peter wrote: > Just a drive-by comment for now, I may review this later more fully. > > > I would also prefer if you added the IR restrictions rather than the JTREG requires. > > The benefit is that we can still run the tests on all platforms, at least for result verification. > > Imagine someone adds optimizations to a new platform, but does not know about this test here. They make a mistake, and there is a bug, leading either to a crash or wrong result. With the requires, you test would never even run, and we would not catch it. With the IR applyIf, we would catch the bug. > > Just copy pasting the IR applyIf everywhere is not that much work, and adding in a new platform later is not really hard either. Thanks! The problem is that when a new platform is added, people may not even know there is a test. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-2834280547 From gcao at openjdk.org Mon Apr 28 08:02:34 2025 From: gcao at openjdk.org (Gui Cao) Date: Mon, 28 Apr 2025 08:02:34 GMT Subject: RFR: 8355668: RISC-V: jdk/incubator/vector/Int256VectorTests.java fails when using RVV [v5] In-Reply-To: <46jT8ol1f1Vd2Yfqq__tfAqT85RWMtLkFxxqVucgKQA=.392bade5-0d18-4a76-921d-b8e8463a55e8@github.com> References: <46jT8ol1f1Vd2Yfqq__tfAqT85RWMtLkFxxqVucgKQA=.392bade5-0d18-4a76-921d-b8e8463a55e8@github.com> Message-ID: > Hi, After https://github.com/openjdk/jdk/pull/24129, some tests run failed. due to the fact that the later predicate condition overrides the previous one. > > ### Testing > qemu-system 9.1.50 with UseRVV: > > - [x] Run test/jdk/jdk/incubator/vector (fastdebug) Gui Cao has updated the pull request incrementally with one additional commit since the last revision: Fix for RealYang comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24910/files - new: https://git.openjdk.org/jdk/pull/24910/files/80e9c6f7..b1a229b6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24910&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24910&range=03-04 Stats: 12 lines in 1 file changed: 0 ins; 0 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/24910.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24910/head:pull/24910 PR: https://git.openjdk.org/jdk/pull/24910 From fyang at openjdk.org Mon Apr 28 08:02:35 2025 From: fyang at openjdk.org (Fei Yang) Date: Mon, 28 Apr 2025 08:02:35 GMT Subject: RFR: 8355668: RISC-V: jdk/incubator/vector/Int256VectorTests.java fails when using RVV [v5] In-Reply-To: References: <46jT8ol1f1Vd2Yfqq__tfAqT85RWMtLkFxxqVucgKQA=.392bade5-0d18-4a76-921d-b8e8463a55e8@github.com> Message-ID: <_pSKRJiYsI7UCOsOG0I967_Fnjyv7qyNnahLFmYQBxU=.44dd0daa-562f-4bcf-b269-fb21fee29fcb@github.com> On Mon, 28 Apr 2025 07:58:56 GMT, Gui Cao wrote: >> Hi, After https://github.com/openjdk/jdk/pull/24129, 
some tests run failed. due to the fact that the later predicate condition overrides the previous one. >> >> ### Testing >> qemu-system 9.1.50 with UseRVV: >> >> - [x] Run test/jdk/jdk/incubator/vector (fastdebug) > > Gui Cao has updated the pull request incrementally with one additional commit since the last revision: > > Fix for RealYang comment Looks good. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24910#pullrequestreview-2798327824 From gcao at openjdk.org Mon Apr 28 08:02:35 2025 From: gcao at openjdk.org (Gui Cao) Date: Mon, 28 Apr 2025 08:02:35 GMT Subject: RFR: 8355668: RISC-V: jdk/incubator/vector/Int256VectorTests.java fails when using RVV [v4] In-Reply-To: References: <46jT8ol1f1Vd2Yfqq__tfAqT85RWMtLkFxxqVucgKQA=.392bade5-0d18-4a76-921d-b8e8463a55e8@github.com> Message-ID: On Mon, 28 Apr 2025 07:33:54 GMT, Fei Yang wrote: > Can you reoder this a bit while you are on it? We always order in size of the type, that is, `T_BYTE`, `T_SHORT` and `T_INT`. Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24910#discussion_r2063092943 From gcao at openjdk.org Mon Apr 28 08:15:01 2025 From: gcao at openjdk.org (Gui Cao) Date: Mon, 28 Apr 2025 08:15:01 GMT Subject: RFR: 8355654: RISC-V: Relax register constraint for some vector-scalar instructions [v2] In-Reply-To: <-8ukYnvnLSOYQpSFEobuIWZtjHxutLqs4EJ7m2u2zQQ=.27d00e4e-c999-429e-84d9-fd87708d90cc@github.com> References: <-8ukYnvnLSOYQpSFEobuIWZtjHxutLqs4EJ7m2u2zQQ=.27d00e4e-c999-429e-84d9-fd87708d90cc@github.com> Message-ID: <7Z3FpvSM_IvN0TeG3YY39v8gVkK1-QUoO8YPLOCAyK0=.58d9542b-8f57-40ad-9fec-6f95fd146cc2@github.com> > Hi, here we separate src from dst_src to give the compiler a greater degree of flexibility. Note that in the masked version we can't modify it, because we can't change inactive elements in the mask version. 
> > ### Testing > qemu-system 9.1.50 with UseRVV: > > - [x] Run test/jdk/jdk/incubator/vector (fastdebug) Gui Cao has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: - Merge branch 'master' into JDK-8355654 - 8355654: RISC-V: Relax register constraint for some vector-scalar instructions ------------- Changes: https://git.openjdk.org/jdk/pull/24905/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24905&range=01 Stats: 66 lines in 1 file changed: 0 ins; 0 del; 66 mod Patch: https://git.openjdk.org/jdk/pull/24905.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24905/head:pull/24905 PR: https://git.openjdk.org/jdk/pull/24905 From gcao at openjdk.org Mon Apr 28 08:27:00 2025 From: gcao at openjdk.org (Gui Cao) Date: Mon, 28 Apr 2025 08:27:00 GMT Subject: RFR: 8355654: RISC-V: Relax register constraint for some vector-scalar instructions [v3] In-Reply-To: <-8ukYnvnLSOYQpSFEobuIWZtjHxutLqs4EJ7m2u2zQQ=.27d00e4e-c999-429e-84d9-fd87708d90cc@github.com> References: <-8ukYnvnLSOYQpSFEobuIWZtjHxutLqs4EJ7m2u2zQQ=.27d00e4e-c999-429e-84d9-fd87708d90cc@github.com> Message-ID: > Hi, here we separate src from dst_src to give the compiler a greater degree of flexibility. Note that in the masked version we can't modify it, because we can't change inactive elements in the mask version. 
> > ### Testing > qemu-system 9.1.50 with UseRVV: > > - [x] Run test/jdk/jdk/incubator/vector (fastdebug) Gui Cao has updated the pull request incrementally with one additional commit since the last revision: Code format ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24905/files - new: https://git.openjdk.org/jdk/pull/24905/files/815c9eec..c00c40fb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24905&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24905&range=01-02 Stats: 57 lines in 1 file changed: 0 ins; 0 del; 57 mod Patch: https://git.openjdk.org/jdk/pull/24905.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24905/head:pull/24905 PR: https://git.openjdk.org/jdk/pull/24905 From duke at openjdk.org Mon Apr 28 08:28:18 2025 From: duke at openjdk.org (Manuel Hässig) Date: Mon, 28 Apr 2025 08:28:18 GMT Subject: RFR: 8355472: Clean up x86 nativeInst after 32-bit x86 removal Message-ID: This PR cleans up 32-bit x86 code in `nativeInst_x86.*` files. Testing: - [x] tier1 through tier4 on x86 plus Oracle internal testing ------------- Commit messages: - Clean up 32-bit x86 code in nativeInst_x86.* Changes: https://git.openjdk.org/jdk/pull/24911/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24911&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8355472 Stats: 82 lines in 2 files changed: 0 ins; 81 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24911.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24911/head:pull/24911 PR: https://git.openjdk.org/jdk/pull/24911 From shade at openjdk.org Mon Apr 28 08:36:45 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 28 Apr 2025 08:36:45 GMT Subject: RFR: 8355635: Do not collect C strings in C2 scratch buffer In-Reply-To: References: Message-ID: On Sat, 26 Apr 2025 02:39:58 GMT, Vladimir Kozlov wrote: > [JDK-8349479](https://bugs.openjdk.org/browse/JDK-8349479) added call to `code_string()` for Halt node mach node.
> I am observing several more creation and clearing C strings collections during C2 compilation: > [17.405s][debug][codestrings] Clear 2 asm-remarks. > [17.405s][debug][codestrings] Clear 1 dbg-string. > > Most are coming from temporary scratch buffer C2 uses for code size calculation. I suggest to not collect strings in this buffer. > > Note, `CodeSection::set_scratch_emit()` is only called from [PhaseOutput::scratch_emit_size()](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/output.cpp#L3368) for scratch buffer. > > I verified with `-XX:CompileCommand=print,::` that hsdis output is the same. > > Running on linux-x64 with fastdebug VM: > > before: > $ java -XX:-TieredCompilation -Xlog:codestrings=debug com.sun.tools.javac.Main HelloWorld.java | grep codestrings |wc > 1644 6576 80618 > > after again: > $ java -XX:-TieredCompilation -Xlog:codestrings=debug com.sun.tools.javac.Main HelloWorld.java | grep codestrings |wc > 0 0 0 > > > It is more dramatic with `-Xcomp` we use for testing: > > Before > $ java -XX:-TieredCompilation -Xcomp -Xlog:codestrings=debug com.sun.tools.javac.Main HelloWorld.java | grep codestrings |wc > 70924 283696 3533261 > > After fix > $ java -XX:-TieredCompilation -Xcomp -Xlog:codestrings=debug com.sun.tools.javac.Main HelloWorld.java | grep codestrings |wc > 0 0 0 > > > I was curious why it is 0 - we do deoptimize nmethod. But with big default CodeCache GC does not collect them. > Reducing CodeCache to 8Mb shows deallocation: > > $ java -XX:-TieredCompilation -Xcomp -Xlog:codestrings=debug -Xlog:codecache=debug -XX:+PrintCodeCache -XX:ReservedCodeCacheSize=8M com.sun.tools.javac.Main HelloWorld.java > ... > [40.196s][debug][codestrings] Clear 42 asm-remarks. > [40.196s][debug][codestrings] Clear 1 dbg-string. > [40.196s][debug][codecache ] *flushing nmethod 5811/0x00007f71d74b7188. Live blobs:3370/Free CodeCache:2953Kb > > > Tested tier1-5, Xcomp,comp-stress. Marked as reviewed by shade (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/24893#pullrequestreview-2798485762 From rcastanedalo at openjdk.org Mon Apr 28 08:41:47 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 28 Apr 2025 08:41:47 GMT Subject: RFR: 8355472: Clean up x86 nativeInst after 32-bit x86 removal In-Reply-To: References: Message-ID: On Mon, 28 Apr 2025 08:23:25 GMT, Manuel Hässig wrote: > This PR cleans up 32-bit x86 code in `nativeInst_x86.*` files. > > Testing: > - [x] tier1 through tier4 on x86 plus Oracle internal testing Looks good, thanks! ------------- Marked as reviewed by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24911#pullrequestreview-2798499048 From shade at openjdk.org Mon Apr 28 08:41:47 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 28 Apr 2025 08:41:47 GMT Subject: RFR: 8355472: Clean up x86 nativeInst after 32-bit x86 removal In-Reply-To: References: Message-ID: <0d7VNFfri5bHTa0oSmDh16dodUKHIbEVTsAUd_k5yNg=.7d67e515-7592-4ab3-9b32-d0dceff94619@github.com> On Mon, 28 Apr 2025 08:23:25 GMT, Manuel Hässig wrote: > This PR cleans up 32-bit x86 code in `nativeInst_x86.*` files. > > Testing: > - [x] tier1 through tier4 on x86 plus Oracle internal testing Looks good! I have a tiny readability suggestion. Feel free to ignore it: src/hotspot/cpu/x86/nativeInst_x86.cpp line 305: > 303: // make sure code pattern is actually a mov [reg+offset], reg instruction > 304: u_char test_byte = *(u_char*)instruction_address(); > 305: if (!((test_byte == lea_instruction_code) || (test_byte == mov64_instruction_code))) { Suggestion: if ((test_byte != lea_instruction_code) && (test_byte != mov64_instruction_code)) { ------------- Marked as reviewed by shade (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/24911#pullrequestreview-2798506760 PR Review Comment: https://git.openjdk.org/jdk/pull/24911#discussion_r2063189018 From dfenacci at openjdk.org Mon Apr 28 08:43:50 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 28 Apr 2025 08:43:50 GMT Subject: RFR: 8338194: ubsan: mulnode.cpp:862:59: runtime error: shift exponent 64 is too large for 64-bit type 'long unsigned int' In-Reply-To: References: Message-ID: <-aAGM8mfE-6t97p5kUud_OZbhUBxOH94nFDn6RiSyh8=.77b8e10e-1a51-4faf-893e-0ff53578cceb@github.com> On Thu, 24 Apr 2025 06:55:54 GMT, Marc Chevalier wrote: > We have a UB when the shift is equal to or bigger than the number of bits in > the type. Our expression is > > (julong)CONST64(1) << (julong)(BitsPerJavaLong - shift) > > so we have a UB when the RHS is `>= 64`, that is when `shift` is `<= 0`. Since shift is masked to > be in `[0, BitPerJavaLong - 1]`, we actually have a UB when `shift == 0`. The > code doesn't forbid it, indeed, and it doesn't seem to be enforced by more global > invariants. > > This UB doesn't reproduce anymore with the provided cases. > I've replaced the UB with an explicit assert to try to find another failing > case. No hit when run with tier1, tier2, tier3, hs-precheckin-comp and hs-comp-stress. > > Nevertheless, the assert indeed hit on the master of when the issue was filed. > More precisely, I've bisect for the two tests `java/foreign/StdLibTest.java` > and `java/lang/invoke/PermuteArgsTest.java` and the assert hits until > [8339196: Optimize BufWriterImpl#writeU1/U2/Int/Long](https://bugs.openjdk.org/browse/JDK-8339196). > > It is not clear to me why the issue stopped reproducing after this commit, but given > the lack of reproducer, I went with a semi-blind fix: it fixes the issue back then, > and still removes a chance of UB. It simply makes sure the RHS of this shift cannot be > 64 by making sure `shift` cannot be 0. 
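The shift problem described above can be reproduced outside HotSpot. What follows is only a minimal sketch, not the code in `mulnode.cpp`: the names `kBitsPerLong` and `low_bits_mask` are hypothetical, introduced here for illustration. It shows why a mask of the form `(1 << (64 - shift)) - 1` needs a guard for `shift == 0`, since shifting a 64-bit value by 64 is undefined behavior in C++.

```cpp
#include <cassert>
#include <cstdint>

// Number of bits in a 64-bit Java long. Plays the role that BitsPerJavaLong
// plays in the quoted description, but is defined locally for this sketch.
constexpr int kBitsPerLong = 64;

// Hypothetical helper: returns a mask with the low (64 - shift) bits set,
// for shift in [0, 63].
//
// The unguarded expression (1 << (64 - shift)) - 1 shifts by 64 when
// shift == 0, which is undefined behavior. The guard returns the
// mathematically expected value (all bits set) for that case instead.
inline std::uint64_t low_bits_mask(int shift) {
  assert(shift >= 0 && shift < kBitsPerLong);
  if (shift == 0) {
    return ~std::uint64_t{0};  // all 64 bits kept, no shift performed
  }
  return (std::uint64_t{1} << (kBitsPerLong - shift)) - 1;
}
```

For example, `low_bits_mask(60)` keeps the low four bits, while `low_bits_mask(0)` keeps all 64 without ever evaluating a shift by 64. The alternative the description prefers, ruling out `shift == 0` before the expression is reached, avoids the guard entirely.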
> > If `shift` is indeed 0, since it is the RHS of a `RShiftLNode`, `RShiftLNode::Identity` > should simply returns the LHS of the shift, and entirely eliminate the RShiftLNode. > > The implementation of `AndINode::Ideal` is, on the other hand, safe. Indeed, it uses > `right_n_bits(BitsPerJavaInteger - shift)` instead of doing manually > `(1 << (BitsPerJavaInteger - shift)) - 1`. This macro is safe as it would return `-1` if > the shift is too large rather than causing a UB. Yet, I didn't use this way since it would > cause the replacement of `AndI(X, RShiftI(Y, 0))` by `AndI(X, URShiftI(Y, 0))` before > simplifying the `URShiftI` into `Y`. In between, it also implies that all users of the > And node will be enqueued for IGVN for a not-very-interesting change. Simply skipping > the replacement of RShiftL into URShiftL allows to directly come to `AndL(X, Y)` without > useless steps. > > Thanks, > Marc This seems the most sensible fix to me too. Thanks for fixing it @marc-chevalier. ------------- Marked as reviewed by dfenacci (Committer). PR Review: https://git.openjdk.org/jdk/pull/24841#pullrequestreview-2798512954 From iveresov at openjdk.org Mon Apr 28 08:48:14 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Mon, 28 Apr 2025 08:48:14 GMT Subject: RFR: 8355003: Implement Ahead-of-Time Method Profiling [v3] In-Reply-To: References: Message-ID: > Improve warm-up time by making profile data from a previous run of an application instantly available, when the HotSpot Java Virtual Machine starts. Specifically, enhance the [AOT cache](https://openjdk.org/jeps/483) to store method execution profiles from training runs, reducing profiling delays in subsequent production runs. 
> > More details in the JEP: https://bugs.openjdk.org/browse/JDK-8325147 Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: Fix class filtering ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24886/files - new: https://git.openjdk.org/jdk/pull/24886/files/fea6171e..2a490c66 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24886&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24886&range=01-02 Stats: 7 lines in 2 files changed: 5 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24886/head:pull/24886 PR: https://git.openjdk.org/jdk/pull/24886 From gcao at openjdk.org Mon Apr 28 08:49:54 2025 From: gcao at openjdk.org (Gui Cao) Date: Mon, 28 Apr 2025 08:49:54 GMT Subject: RFR: 8355654: RISC-V: Relax register constraint for some vector-scalar instructions [v4] In-Reply-To: <-8ukYnvnLSOYQpSFEobuIWZtjHxutLqs4EJ7m2u2zQQ=.27d00e4e-c999-429e-84d9-fd87708d90cc@github.com> References: <-8ukYnvnLSOYQpSFEobuIWZtjHxutLqs4EJ7m2u2zQQ=.27d00e4e-c999-429e-84d9-fd87708d90cc@github.com> Message-ID: > Hi, here we separate src from dst_src to give the compiler a greater degree of flexibility. Note that in the masked version we can't modify it, because we can't change inactive elements in the mask version. 
> > ### Testing > qemu-system 9.1.50 with UseRVV: > > - [x] Run test/jdk/jdk/incubator/vector (fastdebug) Gui Cao has updated the pull request incrementally with one additional commit since the last revision: Polishing ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24905/files - new: https://git.openjdk.org/jdk/pull/24905/files/c00c40fb..4fa2fc89 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24905&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24905&range=02-03 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/24905.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24905/head:pull/24905 PR: https://git.openjdk.org/jdk/pull/24905 From jwaters at openjdk.org Mon Apr 28 09:10:51 2025 From: jwaters at openjdk.org (Julian Waters) Date: Mon, 28 Apr 2025 09:10:51 GMT Subject: RFR: 8355472: Clean up x86 nativeInst after 32-bit x86 removal In-Reply-To: References: Message-ID: On Mon, 28 Apr 2025 08:23:25 GMT, Manuel Hässig wrote: > This PR cleans up 32-bit x86 code in `nativeInst_x86.*` files. > > Testing: > - [x] tier1 through tier4 on x86 plus Oracle internal testing Looks good ------------- Marked as reviewed by jwaters (Committer). PR Review: https://git.openjdk.org/jdk/pull/24911#pullrequestreview-2798610740 From jwaters at openjdk.org Mon Apr 28 09:12:48 2025 From: jwaters at openjdk.org (Julian Waters) Date: Mon, 28 Apr 2025 09:12:48 GMT Subject: RFR: 8355616: Incorrect ifdef in compilationMemoryStatistic.cpp In-Reply-To: References: Message-ID: On Fri, 25 Apr 2025 14:45:48 GMT, Joel Sikström wrote: > Hi, > > Working on a patch close to this area and saw that the ifdef didn't match the "#endif" just below. The ifdef should be COMPILER1 instead of COMPILER2. Marked as reviewed by jwaters (Committer).
------------- PR Review: https://git.openjdk.org/jdk/pull/24876#pullrequestreview-2798615715 From iveresov at openjdk.org Mon Apr 28 09:15:28 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Mon, 28 Apr 2025 09:15:28 GMT Subject: RFR: 8355003: Implement Ahead-of-Time Method Profiling [v4] In-Reply-To: References: Message-ID: > Improve warm-up time by making profile data from a previous run of an application instantly available, when the HotSpot Java Virtual Machine starts. Specifically, enhance the [AOT cache](https://openjdk.org/jeps/483) to store method execution profiles from training runs, reducing profiling delays in subsequent production runs. > > More details in the JEP: https://bugs.openjdk.org/browse/JDK-8325147 Igor Veresov has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 32 commits: - Merge branch 'master' into pp2 - Fix class filtering - Remove the workaround of setting AOTRecordTraining during assembly - Address some of the review comments - Merge branch 'master' into pp - Add AOTCompileEagerly flag to control compilation after clinit - Port 8355334: [leyden] Missing type profile info in archived training data - Port 8355296: [leyden] Some methods are stuck at level=0 with -XX:-TieredCompilation - Use ENABLE_IF macro - Missing part of the last commit - ... 
and 22 more: https://git.openjdk.org/jdk/compare/2447b981...7fb7ae62 ------------- Changes: https://git.openjdk.org/jdk/pull/24886/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24886&range=03 Stats: 3185 lines in 57 files changed: 2960 ins; 103 del; 122 mod Patch: https://git.openjdk.org/jdk/pull/24886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24886/head:pull/24886 PR: https://git.openjdk.org/jdk/pull/24886 From epeter at openjdk.org Mon Apr 28 09:20:46 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 28 Apr 2025 09:20:46 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v3] In-Reply-To: References: Message-ID: On Mon, 28 Apr 2025 07:48:58 GMT, erifan wrote: > > Just a drive-by comment for now, I may review this later more fully. > > > I would also prefer if you added the IR restrictions rather than the JTREG requires. > > > The benefit is that we can still run the tests on all platforms, at least for result verification. > > > Imagine someone adds optimizations to a new platform, but does not know about this test here. They make a mistake, and there is a bug, leading either to a crash or wrong result. With the requires, you test would never even run, and we would not catch it. With the IR applyIf, we would catch the bug. > > > > > > Just copy pasting the IR applyIf everywhere is not that much work, and adding in a new platform later is not really hard either. > > Thanks! The problem is that when a new platform is added, people may not even know there is a test. @erifan That is true. But we have that problem either way. If you use `@requires`, then the person does not realize there is a test AND the test is not run. If you use `applyIf`, the person does not realize there is a test, but it is run at least for result verification, and then the person MIGHT notice if the test catches a wrong result or a crash.
------------- PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-2834565901 From fyang at openjdk.org Mon Apr 28 09:21:45 2025 From: fyang at openjdk.org (Fei Yang) Date: Mon, 28 Apr 2025 09:21:45 GMT Subject: RFR: 8355654: RISC-V: Relax register constraint for some vector-scalar instructions [v4] In-Reply-To: References: <-8ukYnvnLSOYQpSFEobuIWZtjHxutLqs4EJ7m2u2zQQ=.27d00e4e-c999-429e-84d9-fd87708d90cc@github.com> Message-ID: On Mon, 28 Apr 2025 08:49:54 GMT, Gui Cao wrote: >> Hi, here we separate src from dst_src to give the compiler a greater degree of flexibility. Note that in the masked version we can't modify it, because we can't change inactive elements in the mask version. >> >> ### Testing >> qemu-system 9.1.50 with UseRVV: >> >> - [x] Run test/jdk/jdk/incubator/vector (fastdebug) > > Gui Cao has updated the pull request incrementally with one additional commit since the last revision: > > Polishing Looks fine. Thanks! ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24905#pullrequestreview-2798643117 From galder at openjdk.org Mon Apr 28 09:29:50 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Mon, 28 Apr 2025 09:29:50 GMT Subject: RFR: 8327963: [Umbrella] Incorrect result of C2 compiled code since JDK-8237581 [v5] In-Reply-To: References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> Message-ID: On Thu, 24 Apr 2025 09:33:54 GMT, Roland Westrelin wrote: >> An `Initialize` node for an `Allocate` node is created with a memory >> `Proj` of adr type raw memory. In order for stores to be captured, the >> memory state out of the allocation is a `MergeMem` with slices for the >> various object fields/array element set to the raw memory `Proj` of >> the `Initialize` node.
If `Phi`s need to be created during later >> transformations from this memory state, The `Phi` for a particular >> slice gets its adr type from the type of the `Proj` which is raw >> memory. If during macro expansion, the `Allocate` is found to have no >> use and so can be removed, the `Proj` out of the `Initialize` is >> replaced by the memory state on input to the `Allocate`. A `Phi` for >> some slice for a field of an object will end up with the raw memory >> state on input to the `Allocate` node. As a result, memory state at >> the `Phi` is incorrect and incorrect execution can happen. >> >> The fix I propose is, rather than have a single `Proj` for the memory >> state out of the `Initialize` with adr type raw memory, to use one >> `Proj` per slice added to the memory state after the `Initalize`. Each >> of the `Proj` should return the right adr type for its slice. For that >> I propose having a new type of `Proj`: `NarrowMemProj` that captures >> the right adr type. >> >> Logic for the construction of the `Allocate`/`Initialize` subgraph is >> tweaked so the right adr type captured in is own `NarrowMemProj` is >> added to the memory sugraph. Code that removes an allocation or moves >> it also has to be changed so it correctly takes the multiple memory >> projections out of the `Initialize` node into account. >> >> One tricky issue is that when EA split types for a scalar replaceable >> `Allocate` node: >> >> 1- the adr type captured in the `NarrowMemProj` becomes out of sync >> with the type of the slices for the allocation >> >> 2- before EA, the memory state for one particular field out of the >> `Initialize` node can be used for a `Store` to the just allocated >> object or some other. So we can have a chain of `Store`s, some to >> the newly allocated object, some to some other objects, all of them >> using the state of `NarrowMemProj` out of the `Initialize`. After >> split unique types, the `NarrowMemProj` is for the slice of a >> particular allocation. 
So `Store`s to some other objects shouldn't >> use that memory state but the memory state before the `Allocate`. >> >> For that, I added logic to update the adr type of `NarrowMemProj` >> during split uni... > > Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: > > - Update src/hotspot/share/opto/escape.cpp > > Co-authored-by: Emanuel Peter > - Update src/hotspot/share/opto/escape.cpp > > Co-authored-by: Emanuel Peter Changes requested by galder (Author). src/hotspot/share/opto/library_call.cpp line 5554: > 5552: if (proj->_con == TypeFunc::Memory) { > 5553: int alias_idx = C->get_alias_index(proj->adr_type()); > 5554: assert(alias_idx == Compile::AliasIdxRaw || alias_idx == elemidx || alias_idx == mark_idx || alias_idx == klass_idx, "should be raw memory or array element type"); Shouldn't this `assert` be wrapped in an `#ifdef ASSERT` section? ------------- PR Review: https://git.openjdk.org/jdk/pull/24570#pullrequestreview-2798649209 PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2063266400 From fyang at openjdk.org Mon Apr 28 09:38:16 2025 From: fyang at openjdk.org (Fei Yang) Date: Mon, 28 Apr 2025 09:38:16 GMT Subject: RFR: 8355667: RISC-V: Add backend implementation for unsigned vector Min / Max operations [v2] In-Reply-To: References: Message-ID: > Hi, please review this change. > https://bugs.openjdk.org/browse/JDK-8338021 proposed new vector operators including Unsigned Vector Min / Max. > This change intrinsifies Unsigned Vector Min / Max operations with the RVV extension for the RISC-V backend to improve performance. > This also enables some extra IR tests in file `test/hotspot/jtreg/compiler/vectorapi/VectorCommutativeOperSharingTest.java`. > > Testing: > - [x] make test TEST="jdk_vector" (QEMU / fastdebug) > > JMH tested on BPI-F3 (256 bit VLEN) SBC for reference:
> Before:
>
> ByteMaxVector.UMAX         1024  thrpt  5    58.657 ±   17.216  ops/ms
> ByteMaxVector.UMAXMasked   1024  thrpt  5    45.581 ±   18.164  ops/ms
> ByteMaxVector.UMIN         1024  thrpt  5    55.275 ±   15.863  ops/ms
> ByteMaxVector.UMINMasked   1024  thrpt  5    44.651 ±   29.209  ops/ms
> ShortMaxVector.UMAX        1024  thrpt  5    24.146 ±    7.570  ops/ms
> ShortMaxVector.UMAXMasked  1024  thrpt  5    21.506 ±    0.430  ops/ms
> ShortMaxVector.UMIN        1024  thrpt  5    24.261 ±    6.993  ops/ms
> ShortMaxVector.UMINMasked  1024  thrpt  5    20.980 ±    1.622  ops/ms
> IntMaxVector.UMAX          1024  thrpt  5    10.780 ±    0.812  ops/ms
> IntMaxVector.UMAXMasked    1024  thrpt  5    10.609 ±    0.851  ops/ms
> IntMaxVector.UMIN          1024  thrpt  5    10.845 ±    0.578  ops/ms
> IntMaxVector.UMINMasked    1024  thrpt  5    10.705 ±    0.562  ops/ms
> LongMaxVector.UMAX         1024  thrpt  5     5.445 ±    0.439  ops/ms
> LongMaxVector.UMAXMasked   1024  thrpt  5     5.387 ±    0.285  ops/ms
> LongMaxVector.UMIN         1024  thrpt  5     5.379 ±    0.407  ops/ms
> LongMaxVector.UMINMasked   1024  thrpt  5     5.373 ±    0.236  ops/ms
>
> After:
>
> ByteMaxVector.UMAX         1024  thrpt  5  2552.161 ±  121.213  ops/ms
> ByteMaxVector.UMAXMasked   1024  thrpt  5  2444.001 ±  105.139  ops/ms
> ByteMaxVector.UMIN         1024  thrpt  5  2616.963 ±    3.065  ops/ms
> ByteMaxVector.UMINMasked   1024  thrpt  5  2367.968 ± 1057.028  ops/ms
> ShortMaxVector.UMAX        1024  thrpt  5  1363.676 ±    9.294  ops/ms
> ShortMaxVector.UMAXMasked  1024  thrpt  5  1321.759 ±  121.471  ops/ms
> ShortMaxVector.UMIN        1024  thrpt  5  1368.598 ±    5.251  ops/ms
> ShortMaxVector.UMINMasked  1024  thrpt  5  1337.044 ±    1.434  ops/ms
> IntMaxVector.UMAX          1024  thrpt  5   566.509 ±    0.658  ops/ms
> IntMaxVector.UMAXMasked    1024  thrpt  5   559.456 ±    0.491  ops/ms
> IntMaxVector.UMIN          1024  thrpt  5   569.238 ±    1.309  ops/ms
> IntMaxVector.UMINMasked    1024  thrpt  5   535.359 ±   15.765  ops/ms
> Long...
Fei Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains two additional commits since the last revision: - Merge remote-tracking branch 'upstream/master' into JDK-8355667 - 8355667: RISC-V: Add backend implementation for unsigned vector Min / Max operations ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24909/files - new: https://git.openjdk.org/jdk/pull/24909/files/59379015..b4d876f2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24909&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24909&range=00-01 Stats: 2025 lines in 165 files changed: 473 ins; 323 del; 1229 mod Patch: https://git.openjdk.org/jdk/pull/24909.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24909/head:pull/24909 PR: https://git.openjdk.org/jdk/pull/24909 From duke at openjdk.org Mon Apr 28 09:53:46 2025 From: duke at openjdk.org (erifan) Date: Mon, 28 Apr 2025 09:53:46 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v3] In-Reply-To: References: Message-ID: On Mon, 28 Apr 2025 09:17:58 GMT, Emanuel Peter wrote: > > > Just a drive-by comment for now, I may review this later more fully. > > > > I would also prefer if you added the IR restrictions rather than the JTREG requires. > > > > The benefit is that we can still run the tests on all platforms, at least for result verification. > > > > Imagine someone adds optimizations to a new platform, but does not know about this test here. They make a mistake, and there is a bug, leading either to a crash or wrong result. With the requires, you test would never even run, and we would not catch it. With the IR applyIf, we would catch the bug. > > > > > > > > > Just copy pasting the IR applyIf everywhere is not that much work, and adding in a new platform later is not really hard either. > > > > > > Thanks! The problem is that when a new platform is added, people may not even know there is a test. > > @erifan That is true. But we have that problem either way. 
If you use `@requires`, then the person does not realize there is a test AND the test is not run. If you use `applyIf`, the person does not realize there is a test, but it is run at least for result verification - and then the person MIGHT realize if the test catches a wrong result / crash. This test will run on new platforms when we use @requires. I explained the meaning of the @requires in the previous comment; it only excludes one case: when -XX:UseAVX=0 is specified on x86 platforms. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-2834650280 From duke at openjdk.org Mon Apr 28 12:15:00 2025 From: duke at openjdk.org (Manuel Hässig) Date: Mon, 28 Apr 2025 12:15:00 GMT Subject: RFR: 8355472: Clean up x86 nativeInst after 32-bit x86 removal [v2] In-Reply-To: References: Message-ID: > This PR cleans up 32-bit x86 code in `nativeInst_x86.*` files. > > Testing: > - [x] tier1 through tier4 on x86 plus Oracle internal testing Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: Apply readability suggestion Co-authored-by: Aleksey Shipilëv ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24911/files - new: https://git.openjdk.org/jdk/pull/24911/files/4bbfc501..f2382dce Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24911&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24911&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24911.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24911/head:pull/24911 PR: https://git.openjdk.org/jdk/pull/24911 From duke at openjdk.org Mon Apr 28 12:15:00 2025 From: duke at openjdk.org (Manuel Hässig) Date: Mon, 28 Apr 2025 12:15:00 GMT Subject: RFR: 8355472: Clean up x86 nativeInst after 32-bit x86 removal [v2] In-Reply-To: <0d7VNFfri5bHTa0oSmDh16dodUKHIbEVTsAUd_k5yNg=.7d67e515-7592-4ab3-9b32-d0dceff94619@github.com>
References: <0d7VNFfri5bHTa0oSmDh16dodUKHIbEVTsAUd_k5yNg=.7d67e515-7592-4ab3-9b32-d0dceff94619@github.com> Message-ID: On Mon, 28 Apr 2025 08:38:48 GMT, Aleksey Shipilev wrote: >> Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: >> >> Apply readability suggestion >> >> Co-authored-by: Aleksey Shipilëv > > src/hotspot/cpu/x86/nativeInst_x86.cpp line 305: > >> 303: // make sure code pattern is actually a mov [reg+offset], reg instruction >> 304: u_char test_byte = *(u_char*)instruction_address(); >> 305: if (!((test_byte == lea_instruction_code) || (test_byte == mov64_instruction_code))) { > > Suggestion: > > if ((test_byte != lea_instruction_code) && (test_byte != mov64_instruction_code)) { That looks much nicer. Thanks for the suggestion! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24911#discussion_r2063528814 From shade at openjdk.org Mon Apr 28 13:18:49 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 28 Apr 2025 13:18:49 GMT Subject: RFR: 8355472: Clean up x86 nativeInst after 32-bit x86 removal [v2] In-Reply-To: References: Message-ID: On Mon, 28 Apr 2025 12:15:00 GMT, Manuel Hässig wrote: >> This PR cleans up 32-bit x86 code in `nativeInst_x86.*` files. >> >> Testing: >> - [x] tier1 through tier4 on x86 plus Oracle internal testing > > Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: > > Apply readability suggestion > > Co-authored-by: Aleksey Shipilëv Ship it. ------------- Marked as reviewed by shade (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/24911#pullrequestreview-2799265785 From mchevalier at openjdk.org Mon Apr 28 13:19:47 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Mon, 28 Apr 2025 13:19:47 GMT Subject: RFR: 8338194: ubsan: mulnode.cpp:862:59: runtime error: shift exponent 64 is too large for 64-bit type 'long unsigned int' In-Reply-To: References: Message-ID: <3E-JQ5uGZ4d1JsgVr1fXkEYK2noY5JNYTHWR9pZfpko=.1ff4d08f-17cf-4688-93e4-0c1348f144e9@github.com> On Thu, 24 Apr 2025 06:55:54 GMT, Marc Chevalier wrote: > We have a UB when the shift is equal to or bigger than the number of bits in > the type. Our expression is > > (julong)CONST64(1) << (julong)(BitsPerJavaLong - shift) > > so we have a UB when the RHS is `>= 64`, that is when `shift` is `<= 0`. Since `shift` is masked to > be in `[0, BitsPerJavaLong - 1]`, we actually have a UB when `shift == 0`. The > code doesn't forbid it, indeed, and it doesn't seem to be enforced by more global > invariants. > > This UB doesn't reproduce anymore with the provided cases. > I've replaced the UB with an explicit assert to try to find another failing > case. No hit when run with tier1, tier2, tier3, hs-precheckin-comp and hs-comp-stress. > > Nevertheless, the assert does hit on master as of when the issue was filed. > More precisely, I've bisected for the two tests `java/foreign/StdLibTest.java` > and `java/lang/invoke/PermuteArgsTest.java` and the assert hits until > [8339196: Optimize BufWriterImpl#writeU1/U2/Int/Long](https://bugs.openjdk.org/browse/JDK-8339196). > > It is not clear to me why the issue stopped reproducing after this commit, but given > the lack of reproducer, I went with a semi-blind fix: it fixes the issue back then, > and still removes a chance of UB. It simply makes sure the RHS of this shift cannot be > 64 by making sure `shift` cannot be 0.
> > If `shift` is indeed 0, since it is the RHS of a `RShiftLNode`, `RShiftLNode::Identity` > should simply return the LHS of the shift, and entirely eliminate the RShiftLNode. > > The implementation of `AndINode::Ideal` is, on the other hand, safe. Indeed, it uses > `right_n_bits(BitsPerJavaInteger - shift)` instead of manually computing > `(1 << (BitsPerJavaInteger - shift)) - 1`. This macro is safe as it would return `-1` if > the shift is too large rather than causing a UB. Yet, I didn't take this approach since it would > cause the replacement of `AndI(X, RShiftI(Y, 0))` by `AndI(X, URShiftI(Y, 0))` before > simplifying the `URShiftI` into `Y`. In between, it also implies that all users of the > And node will be enqueued for IGVN for a not-very-interesting change. Simply skipping > the replacement of RShiftL into URShiftL leads directly to `AndL(X, Y)` without > useless steps. > > Thanks, > Marc Thanks @dean-long and @dafedafe for your reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24841#issuecomment-2835208391 From duke at openjdk.org Mon Apr 28 13:19:47 2025 From: duke at openjdk.org (duke) Date: Mon, 28 Apr 2025 13:19:47 GMT Subject: RFR: 8338194: ubsan: mulnode.cpp:862:59: runtime error: shift exponent 64 is too large for 64-bit type 'long unsigned int' In-Reply-To: References: Message-ID: <4IyjK-TFDI4CZGQItGqBM1rjcUsB48zRHhYojw_VL0o=.d72a59c9-faf9-4c67-8d08-fa295aa3aa41@github.com> On Thu, 24 Apr 2025 06:55:54 GMT, Marc Chevalier wrote: > We have a UB when the shift is equal to or bigger than the number of bits in > the type. Our expression is > > (julong)CONST64(1) << (julong)(BitsPerJavaLong - shift) > > so we have a UB when the RHS is `>= 64`, that is when `shift` is `<= 0`. Since `shift` is masked to > be in `[0, BitsPerJavaLong - 1]`, we actually have a UB when `shift == 0`. The > code doesn't forbid it, indeed, and it doesn't seem to be enforced by more global > invariants.
> > This UB doesn't reproduce anymore with the provided cases. > I've replaced the UB with an explicit assert to try to find another failing > case. No hit when run with tier1, tier2, tier3, hs-precheckin-comp and hs-comp-stress. > > Nevertheless, the assert does hit on master as of when the issue was filed. > More precisely, I've bisected for the two tests `java/foreign/StdLibTest.java` > and `java/lang/invoke/PermuteArgsTest.java` and the assert hits until > [8339196: Optimize BufWriterImpl#writeU1/U2/Int/Long](https://bugs.openjdk.org/browse/JDK-8339196). > > It is not clear to me why the issue stopped reproducing after this commit, but given > the lack of reproducer, I went with a semi-blind fix: it fixes the issue back then, > and still removes a chance of UB. It simply makes sure the RHS of this shift cannot be > 64 by making sure `shift` cannot be 0. > > If `shift` is indeed 0, since it is the RHS of a `RShiftLNode`, `RShiftLNode::Identity` > should simply return the LHS of the shift, and entirely eliminate the RShiftLNode. > > The implementation of `AndINode::Ideal` is, on the other hand, safe. Indeed, it uses > `right_n_bits(BitsPerJavaInteger - shift)` instead of manually computing > `(1 << (BitsPerJavaInteger - shift)) - 1`. This macro is safe as it would return `-1` if > the shift is too large rather than causing a UB. Yet, I didn't take this approach since it would > cause the replacement of `AndI(X, RShiftI(Y, 0))` by `AndI(X, URShiftI(Y, 0))` before > simplifying the `URShiftI` into `Y`. In between, it also implies that all users of the > And node will be enqueued for IGVN for a not-very-interesting change. Simply skipping > the replacement of RShiftL into URShiftL leads directly to `AndL(X, Y)` without > useless steps. > > Thanks, > Marc @marc-chevalier Your change (at version 96efa3eac25c88aa032c8bb8978f984185d63414) is now ready to be sponsored by a Committer.
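
For readers of the archive, the shift hazard described in the quoted PR text can be sketched in isolation. This is only an illustration under stated assumptions: `low_bits_mask` is a hypothetical helper, not HotSpot code, and it is not the fix that was integrated (the PR instead skips the transformation when `shift == 0`). It mirrors the behavior the description attributes to `right_n_bits`, namely returning all ones instead of evaluating a shift by the full type width.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (an assumption for illustration, not HotSpot code):
// returns a mask with the low `n` bits set, for 0 <= n <= 64.
// In C++, shifting a 64-bit value by 64 or more is undefined behavior,
// so n == 64 is handled explicitly instead of evaluating (1 << 64) - 1.
static uint64_t low_bits_mask(unsigned n) {
  assert(n <= 64);
  return n == 64 ? ~uint64_t{0} : (uint64_t{1} << n) - 1;
}

// Models the mask computed from a masked shift amount in [0, 63].
// With shift == 0 the width is 64, which is exactly the case that
// would hit the undefined shift if computed naively.
static uint64_t mask_for_shift(unsigned shift) {
  assert(shift < 64);
  return low_bits_mask(64 - shift);
}
```

Under these assumptions, `mask_for_shift(0)` yields an all-ones mask, so an AND with it leaves the other operand unchanged, which is consistent with the observation that a right shift by 0 is an identity.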
------------- PR Comment: https://git.openjdk.org/jdk/pull/24841#issuecomment-2835211914 From rcastanedalo at openjdk.org Mon Apr 28 13:23:48 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 28 Apr 2025 13:23:48 GMT Subject: RFR: 8327963: [Umbrella] Incorrect result of C2 compiled code since JDK-8237581 [v5] In-Reply-To: References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> Message-ID: On Thu, 24 Apr 2025 09:33:54 GMT, Roland Westrelin wrote: >> An `Initialize` node for an `Allocate` node is created with a memory >> `Proj` of adr type raw memory. In order for stores to be captured, the >> memory state out of the allocation is a `MergeMem` with slices for the >> various object fields/array element set to the raw memory `Proj` of >> the `Initialize` node. If `Phi`s need to be created during later >> transformations from this memory state, the `Phi` for a particular >> slice gets its adr type from the type of the `Proj` which is raw >> memory. If during macro expansion, the `Allocate` is found to have no >> use and so can be removed, the `Proj` out of the `Initialize` is >> replaced by the memory state on input to the `Allocate`. A `Phi` for >> some slice for a field of an object will end up with the raw memory >> state on input to the `Allocate` node. As a result, memory state at >> the `Phi` is incorrect and incorrect execution can happen. >> >> The fix I propose is, rather than have a single `Proj` for the memory >> state out of the `Initialize` with adr type raw memory, to use one >> `Proj` per slice added to the memory state after the `Initialize`. Each >> of the `Proj` should return the right adr type for its slice. For that >> I propose having a new type of `Proj`: `NarrowMemProj` that captures >> the right adr type.
>> >> Logic for the construction of the `Allocate`/`Initialize` subgraph is >> tweaked so the right adr type captured in its own `NarrowMemProj` is >> added to the memory subgraph. Code that removes an allocation or moves >> it also has to be changed so it correctly takes the multiple memory >> projections out of the `Initialize` node into account. >> >> One tricky issue is that when EA splits types for a scalar replaceable >> `Allocate` node: >> >> 1- the adr type captured in the `NarrowMemProj` becomes out of sync >> with the type of the slices for the allocation >> >> 2- before EA, the memory state for one particular field out of the >> `Initialize` node can be used for a `Store` to the just allocated >> object or some other. So we can have a chain of `Store`s, some to >> the newly allocated object, some to some other objects, all of them >> using the state of `NarrowMemProj` out of the `Initialize`. After >> split unique types, the `NarrowMemProj` is for the slice of a >> particular allocation. So `Store`s to some other objects shouldn't >> use that memory state but the memory state before the `Allocate`. >> >> For that, I added logic to update the adr type of `NarrowMemProj` >> during split uni... > > Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: > > - Update src/hotspot/share/opto/escape.cpp > > Co-authored-by: Emanuel Peter > - Update src/hotspot/share/opto/escape.cpp > > Co-authored-by: Emanuel Peter Thanks for working on this, Roland! A "dumb" question: could the issue also be addressed by ensuring that dead allocations are removed earlier (e.g. in the call to `PhaseMacroExpand::eliminate_allocate_node` performed as part of escape analysis/scalar replacement, before loop optimizations)? It seems this would also prevent the miscompilations in `TestEliminationOfAllocationWithoutUse`, no?
------------- PR Comment: https://git.openjdk.org/jdk/pull/24570#issuecomment-2835226115 From duke at openjdk.org Mon Apr 28 13:26:57 2025 From: duke at openjdk.org (Saranya Natarajan) Date: Mon, 28 Apr 2025 13:26:57 GMT Subject: RFR: 8352620: C2: rename MemNode::memory_type() to MemNode::value_basic_type() [v3] In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 09:42:01 GMT, Saranya Natarajan wrote: >> Description: The current name MemNode::memory_type() is misleading because the returned type is a property of the value that is loaded/stored, not the memory that is accessed. Usually, the two of them match, but for mismatched memory accesses (arising e.g. from using Unsafe or memory segments) they might differ, e.g. one might store a value of type 'short' into an array of elements of type 'long'. The proposal was to rename MemNode::memory_type() to MemNode::value_basic_type() to clarify these cases. >> >> Solution: Replaced all occurrence of MemNode::memory_type() with MemNode::value_basic_type() > > Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: > > addressing review comments Thank you for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24427#issuecomment-2835239103 From kvn at openjdk.org Mon Apr 28 13:32:55 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 28 Apr 2025 13:32:55 GMT Subject: Integrated: 8355635: Do not collect C strings in C2 scratch buffer In-Reply-To: References: Message-ID: On Sat, 26 Apr 2025 02:39:58 GMT, Vladimir Kozlov wrote: > [JDK-8349479](https://bugs.openjdk.org/browse/JDK-8349479) added call to `code_string()` for Halt node mach node. > I am observing several more creation and clearing C strings collections during C2 compilation: > [17.405s][debug][codestrings] Clear 2 asm-remarks. > [17.405s][debug][codestrings] Clear 1 dbg-string. > > Most are coming from temporary scratch buffer C2 uses for code size calculation. 
I suggest to not collect strings in this buffer. > > Note, `CodeSection::set_scratch_emit()` is only called from [PhaseOutput::scratch_emit_size()](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/output.cpp#L3368) for scratch buffer. > > I verified with `-XX:CompileCommand=print,::` that hsdis output is the same. > > Running on linux-x64 with fastdebug VM: > > before: > $ java -XX:-TieredCompilation -Xlog:codestrings=debug com.sun.tools.javac.Main HelloWorld.java | grep codestrings |wc > 1644 6576 80618 > > after again: > $ java -XX:-TieredCompilation -Xlog:codestrings=debug com.sun.tools.javac.Main HelloWorld.java | grep codestrings |wc > 0 0 0 > > > It is more dramatic with `-Xcomp` we use for testing: > > Before > $ java -XX:-TieredCompilation -Xcomp -Xlog:codestrings=debug com.sun.tools.javac.Main HelloWorld.java | grep codestrings |wc > 70924 283696 3533261 > > After fix > $ java -XX:-TieredCompilation -Xcomp -Xlog:codestrings=debug com.sun.tools.javac.Main HelloWorld.java | grep codestrings |wc > 0 0 0 > > > I was curious why it is 0 - we do deoptimize nmethod. But with big default CodeCache GC does not collect them. > Reducing CodeCache to 8Mb shows deallocation: > > $ java -XX:-TieredCompilation -Xcomp -Xlog:codestrings=debug -Xlog:codecache=debug -XX:+PrintCodeCache -XX:ReservedCodeCacheSize=8M com.sun.tools.javac.Main HelloWorld.java > ... > [40.196s][debug][codestrings] Clear 42 asm-remarks. > [40.196s][debug][codestrings] Clear 1 dbg-string. > [40.196s][debug][codecache ] *flushing nmethod 5811/0x00007f71d74b7188. Live blobs:3370/Free CodeCache:2953Kb > > > Tested tier1-5, Xcomp,comp-stress. This pull request has now been integrated. 
Changeset: 3eaec040 Author: Vladimir Kozlov URL: https://git.openjdk.org/jdk/commit/3eaec040b4e82e1a31bd12683dd783a33025d1bf Stats: 6 lines in 1 file changed: 6 ins; 0 del; 0 mod 8355635: Do not collect C strings in C2 scratch buffer Reviewed-by: jrose, thartmann, shade ------------- PR: https://git.openjdk.org/jdk/pull/24893 From erikj at openjdk.org Mon Apr 28 13:35:52 2025 From: erikj at openjdk.org (Erik Joelsson) Date: Mon, 28 Apr 2025 13:35:52 GMT Subject: RFR: 8355003: Implement Ahead-of-Time Method Profiling [v4] In-Reply-To: References: Message-ID: <8Sz4UCfCW_5NTFSp3BuIe_V5F4Dh6aikcgdTOtwm18g=.c4d1fee3-c21a-4221-a075-dbebc7a5be82@github.com> On Mon, 28 Apr 2025 09:15:28 GMT, Igor Veresov wrote: >> Improve warm-up time by making profile data from a previous run of an application instantly available, when the HotSpot Java Virtual Machine starts. Specifically, enhance the [AOT cache](https://openjdk.org/jeps/483) to store method execution profiles from training runs, reducing profiling delays in subsequent production runs. >> >> More details in the JEP: https://bugs.openjdk.org/browse/JDK-8325147 > > Igor Veresov has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 32 commits: > > - Merge branch 'master' into pp2 > - Fix class filtering > - Remove the workaround of setting AOTRecordTraining during assembly > - Address some of the review comments > - Merge branch 'master' into pp > - Add AOTCompileEagerly flag to control compilation after clinit > - Port 8355334: [leyden] Missing type profile info in archived training data > - Port 8355296: [leyden] Some methods are stuck at level=0 with -XX:-TieredCompilation > - Use ENABLE_IF macro > - Missing part of the last commit > - ... and 22 more: https://git.openjdk.org/jdk/compare/2447b981...7fb7ae62 Build changes look trivially good. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24886#issuecomment-2835265306 From duke at openjdk.org Mon Apr 28 13:44:55 2025 From: duke at openjdk.org (kuaiwei) Date: Mon, 28 Apr 2025 13:44:55 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v12] In-Reply-To: References: Message-ID: On Thu, 17 Apr 2025 09:59:03 GMT, Emanuel Peter wrote: >> kuaiwei has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits: >> >> - Merge remote-tracking branch 'origin/master' into dev/merge_loads >> - Remove unused code >> - Move code to addnode.cpp and add more tests >> - Merge remote-tracking branch 'origin/master' into dev/merge_loads >> - Fix test >> - Add more tests >> - Enable StressIGVN and riscv platform >> - Change tests as review comments >> - Fix test failure and change for review comments >> - Revert extract value and add more tests >> - ... and 4 more: https://git.openjdk.org/jdk/compare/660b17a6...f6518b26 > > src/hotspot/share/opto/addnode.cpp line 916: > >> 914: >> 915: bool MergePrimitiveLoads::is_supported_combine_opcode(int opc) { >> 916: return opc == Op_OrI || opc == Op_OrL; > > Ah, and here you do it "positively". I would also recommend using a switch case, it is nicer to extend later for AND and XOR. 
fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2063688262 From duke at openjdk.org Mon Apr 28 13:44:56 2025 From: duke at openjdk.org (kuaiwei) Date: Mon, 28 Apr 2025 13:44:56 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v4] In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 08:43:23 GMT, Emanuel Peter wrote: >> kuaiwei has updated the pull request incrementally with one additional commit since the last revision: >> >> Revert extract value and add more tests > > src/hotspot/share/opto/memnode.cpp line 1877: > >> 1875: _last_op(false) {} >> 1876: void set_last_op(bool v) { _last_op = v; } >> 1877: bool last_op() const { return _last_op; } > > Is it feasible to make all fields `const`? It can often make reasoning about code easier if you know that there can be no modifications. Not sure about `_last_op`, I'll have to keep reading to find out. When I change MergeLoadInfoList as GrowableArray, I need copy info into array, I added `operator=` and copy constructor. So these fields can not be const. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2063685689 From duke at openjdk.org Mon Apr 28 13:50:50 2025 From: duke at openjdk.org (kuaiwei) Date: Mon, 28 Apr 2025 13:50:50 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v4] In-Reply-To: <_IhK2U23lIUOtBKOt-WMxQ3L7b2t26RzclJRdqbIgms=.3ef9a630-f99c-4de7-994a-bcabf912230b@github.com> References: <96Ny_BPjRCbNlD14DNDUOuQ0IX-F8hx21gxQKVfim9M=.d502019a-27ed-4a35-81ef-bc2aec5e7557@github.com> <_IhK2U23lIUOtBKOt-WMxQ3L7b2t26RzclJRdqbIgms=.3ef9a630-f99c-4de7-994a-bcabf912230b@github.com> Message-ID: On Mon, 24 Mar 2025 11:41:46 GMT, Emanuel Peter wrote: >>> @kuaiwei I have not yet had the time to read through the PR, but I would like to talk about `LoadNode::Ideal`. The idea with `Ideal` in general, is that you replace one node with another. 
After `Ideal` returns, all usages of the old node now take the new node instead. >>> >>> You copied the structure from my MergeStores implementation in `StoreNode::Ideal`. There it made sense to replace `StoreB` nodes that have a memory output with `StoreI` nodes, which also have memory output. >>> >>> But it does not make sense to replace a `LoadB` that has a byte/int output with a `LoadL` that has a long output for example. >>> >>> I think your implementation should go into `OrINode`, and match the expression up from there. Because we want to replace the old `OrI` with the new `LoadL`. >>> >>> Another question: Do you have some tests where some of the nodes in the `load/shift/or` expression have other uses? Imagine this: >>> >>> ``` >>> l0 = a[0]; >>> l1 = a[1]; >>> l2 = a[2]; >>> l3 = a[3]; >>> l = ; >>> now use l1 for something else as well >>> ``` >>> >>> What happens now? Do you check that we only use the old `LoadB` in the expression we are replacing? >> >> Hi @eme64 , I understand your concern. In this patch, I check the usage of all `loadB` nodes and only allow them to have a single use, in the `OrNode`. I also check the `OrNode` as well. So I think it will not cause trouble. >> >> >> l0 = a[0]; >> l1 = a[1]; >> l2 = a[2]; >> l3 = a[3]; >> l = ; >> now use l1 for something else as well >> >> For this case, because l1 has other usage, all these loads will not be merged. >> >> In my previous patch, I tried to extract the value from the merged `LoadNode` if the original `loadB` has other usage, such as being used by an uncommon trap. You can find this in https://github.com/openjdk/jdk/pull/24023/commits/b621db1cf0c17885516254a2af4b5df43e06c098 by searching for MergePrimitiveLoads::extract_value_for_uncommon_trap . But in my test with jtreg tier1, it never hit a case which replaced a `LoadB` used by an uncommon trap; I think range check smearing removes all the uncommon trap usages. So I reverted it to keep the code simple.
In my opinion, the extract_value function can be used as a general solution for other usages. But we may need a cost model to weigh the cost of the new instructions used for extracting against the benefit of the merged load. To simplify, I chose to check usage strictly. > > @kuaiwei Thanks for your response! > > What about these two things I brought up? > >> Do you have some tests where some of the nodes in the load/shift/or expression have other uses? > > It would be good to have these tests, even if we think your code is correct. It is good to verify it with tests. And someone in the future might break it. > >> I think your implementation should go into OrINode, and match the expression up from there. Because we want to replace the old OrI with the new LoadL. > > This is really the pattern we use in `Ideal`. We replace the node at the bottom of an expression with a new node (or new expression). @eme64 , I made the changes per your comments. Could you review it again? Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24023#issuecomment-2835310467 From duke at openjdk.org Mon Apr 28 13:50:51 2025 From: duke at openjdk.org (kuaiwei) Date: Mon, 28 Apr 2025 13:50:51 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v12] In-Reply-To: <56jHI-HsjxkhkEQ5Dciu-thfFFPCErUEdzuzHr5k7HA=.168c6cdc-52b5-4c96-9068-feb00f621e61@github.com> References: <56jHI-HsjxkhkEQ5Dciu-thfFFPCErUEdzuzHr5k7HA=.168c6cdc-52b5-4c96-9068-feb00f621e61@github.com> Message-ID: On Tue, 22 Apr 2025 08:53:06 GMT, kuaiwei wrote: >> Also: I would change the name of the method if we keep it. It is really about checking down, i.e. that there is no other candidate below. Maybe `has_no_merge_load_combine_below`? >> >> Ah, another example: We could have two merged loads that we OR: >> >> int x = (... merge load pattern with OR ...); >> int y = (... 
merge load pattern with OR ...); >> int z = x | y; >> >> It would be nice if this could be optimized too, and we should have an IR test for it :) > > Yes, it's the limit of this implementation. I need to find the last `combine` node which can be replaced with merged load. But if it's used by other `Or` operator. So far I can not find a good way to distinguish these two cases. > I may add a new `checked` flag for combine operator. For case like: > > int x = (... merge load pattern with OR ...); > int y = (... merge load pattern with OR ...); > int z = x | y; > > When IGVN check the `Or` in `x | y`, it's the last one of combine nodes. But it will fail to merge because `collect_merge_list` can not find a related `Load` for it. And I can mark it as `checked`. So when IGVN check the `Or` nodes in line 1 and line2. it will find the next `Or` is checked and get the right one. > > Do you think if it is doable? Other suggestion is appreciated. Thanks. I added a flag, `_merge_memops_checked` in `AddNode`. It will be checked when search `MergePrimitiveLoads::has_no_merge_load_combine_below()` . So these cases can be handled. And tests are added. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2063697381 From thartmann at openjdk.org Mon Apr 28 13:58:27 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 28 Apr 2025 13:58:27 GMT Subject: Integrated: 8355717: Problem list tests until JDK-8355708 is fixed Message-ID: Problem listing the tests on AArch64 until [JDK-8355708](https://bugs.openjdk.org/browse/JDK-8355708) is fixed. 
Thanks, Tobias ------------- Commit messages: - 8355717: Problem list tests until JDK-8355708 is fixed Changes: https://git.openjdk.org/jdk/pull/24921/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24921&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8355717 Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24921.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24921/head:pull/24921 PR: https://git.openjdk.org/jdk/pull/24921 From chagedorn at openjdk.org Mon Apr 28 13:58:27 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 28 Apr 2025 13:58:27 GMT Subject: Integrated: 8355717: Problem list tests until JDK-8355708 is fixed In-Reply-To: References: Message-ID: On Mon, 28 Apr 2025 13:38:59 GMT, Tobias Hartmann wrote: > Problem listing the tests on AArch64 until [JDK-8355708](https://bugs.openjdk.org/browse/JDK-8355708) is fixed. > > Thanks, > Tobias Looks good and trivial. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24921#pullrequestreview-2799354853 From thartmann at openjdk.org Mon Apr 28 13:58:27 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 28 Apr 2025 13:58:27 GMT Subject: Integrated: 8355717: Problem list tests until JDK-8355708 is fixed In-Reply-To: References: Message-ID: <0GBDssJz7JiPiv3K2CpNaxQDwhEVoF4wXfWYhc92WCM=.cd86270f-8c9f-4a21-b4e5-40d40c1ecdc8@github.com> On Mon, 28 Apr 2025 13:38:59 GMT, Tobias Hartmann wrote: > Problem listing the tests on AArch64 until [JDK-8355708](https://bugs.openjdk.org/browse/JDK-8355708) is fixed. > > Thanks, > Tobias Thanks for the review, Christian! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24921#issuecomment-2835327687 From thartmann at openjdk.org Mon Apr 28 13:58:28 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 28 Apr 2025 13:58:28 GMT Subject: Integrated: 8355717: Problem list tests until JDK-8355708 is fixed In-Reply-To: References: Message-ID: On Mon, 28 Apr 2025 13:38:59 GMT, Tobias Hartmann wrote: > Problem listing the tests on AArch64 until [JDK-8355708](https://bugs.openjdk.org/browse/JDK-8355708) is fixed. > > Thanks, > Tobias This pull request has now been integrated. Changeset: e7a41625 Author: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/e7a416254be88ad3af74d874e444a4921b2a31f7 Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod 8355717: Problem list tests until JDK-8355708 is fixed Reviewed-by: chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/24921 From epeter at openjdk.org Mon Apr 28 14:09:59 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 28 Apr 2025 14:09:59 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v3] In-Reply-To: References: Message-ID: On Fri, 25 Apr 2025 07:24:15 GMT, erifan wrote: >> This patch optimizes the following patterns: >> For integer types: >> >> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) >> => (VectorMaskCmp src1 src2 ncond) >> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) >> => (VectorMaskCmp src1 src2 ncond) >> >> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the negative comparison of cond. >> >> For float and double types: >> >> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> >> cond can be eq or ne. 
>> >> Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option `-XX:UseSVE=2`: >> >> Benchmark Unit Before Score Error After Score Error Uplift >> testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 10266136.26 8955.008548 1.29 >> testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 1179760.772 448.031844 1.33 >> testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 2359520.803 896.305743 1.33 >> testCompareEQMaskNotInt ops/s 1787221.411 977.743935 2353952.519 960.069976 1.31 >> testCompareEQMaskNotLong ops/s 895297.1974 673.44808 1178449.02 323.804205 1.31 >> testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 4712761.965 2110.862053 1.41 >> testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 10251646.9 9486.699831 1.29 >> testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 2352855.205 1251.952546 1.39 >> testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 1177811.493 521.1229 1.37 >> testCompareGEMaskNotShort ops/s 3341860.309 1578.975338 4714008.434 1681.10365 1.41 >> testCompareGTMaskNotByte ops/s 7910823.674 2993.367032 10245063.58 9774.75138 1.29 >> testCompareGTMaskNotInt ops/s 1673393.928 3153.099431 2353654.521 1190.848583 1.4 >> testCompareGTMaskNotLong ops/s 849405.9159 2432.858159 1177952.041 359.96413 1.38 >> testCompareGTMaskNotShort ops/s 3339509.141 3339.976585 4711442.496 2673.364893 1.41 >> testCompareLEMaskNotByte ops/s 7911340.004 3114.69191 10231626.5 27134.20035 1.29 >> testCompareLEMaskNotInt ops/s 1675812.113 1340.969885 2353255.341 1452.4522 1.4 >> testCompareLEMaskNotLong ops/s 848862.8036 6564.841731 1177763.623 539.290106 1.38 >> testCompareLEMaskNotShort ops/s 3324951.54 2380.29473 4712116.251 1544.559684 1.41 >> testCompareLTMaskNotByte ops/s 7910390.844 2630.861436 10239567.69 6487.441672 1.29 >> testCompareLTMaskNotInt ops/s 16721... > > erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: > > - Addressed some review comments > > 1. Call VectorNode::Ideal() only once in XorVNode::Ideal. > 2. Improve code comments. > - Merge branch 'master' into JDK-8354242 > - Merge branch 'master' into JDK-8354242 > - 8354242: VectorAPI: combine vector not operation with compare > > This patch optimizes the following patterns: > For integer types: > ``` > (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) > => (VectorMaskCmp src1 src2 ncond) > (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) > => (VectorMaskCmp src1 src2 ncond) > ``` > cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the > negative comparison of cond. > > For float and double types: > ``` > (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) > => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) > (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) > => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) > ``` > cond can be eq or ne. 
> > Benchmarks on Nvidia Grace machine with 128-bit SVE2: > With option `-XX:UseSVE=2`: > ``` > Benchmark Unit Before Score Error After Score Error Uplift > testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 10266136.26 8955.008548 1.29 > testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 1179760.772 448.031844 1.33 > testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 2359520.803 896.305743 1.33 > testCompareEQMaskNotInt ops/s 1787221.411 977.743935 2353952.519 960.069976 1.31 > testCompareEQMaskNotLong ops/s 895297.1974 673.44808 1178449.02 323.804205 1.31 > testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 4712761.965 2110.862053 1.41 > testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 10251646.9 9486.699831 1.29 > testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 2352855.205 1251.952546 1.39 > testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 1177811.493 521.1229 1.37 > testCompareGEMaskNotShort ops/s 3341860.309 1578.975338 4714008.434 1681.10365 1.41 > testCompareGTMaskNotByte ops/s 7910823.674 2993.367032 10245063.58 9774.75138 1.29 > testCompareGTMaskNotInt ops/s 1673393.928 3153.099431 2353654.521 1190.848583 1.4 > testCompareGTMaskNotLong ops/s 849405.9159 2... test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java line 237: > 235: public static void testCompareEQMaskNotByte() { > 236: testCompareMaskNotByte(VectorOperators.EQ); > 237: } Another comment: you now only have "negative" tests, where you check for count `=0`. It would be good if you also had a positive rule here, one where you do see an XOR in a similar case, where your optimization does not apply. This would basically be a "control" that checks that you are testing the right thing.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2063741513 From jsikstro at openjdk.org Mon Apr 28 14:11:55 2025 From: jsikstro at openjdk.org (Joel Sikström) Date: Mon, 28 Apr 2025 14:11:55 GMT Subject: RFR: 8355616: Incorrect ifdef in compilationMemoryStatistic.cpp In-Reply-To: References: Message-ID: On Fri, 25 Apr 2025 14:45:48 GMT, Joel Sikström wrote: > Hi, > > Working on a patch close to this area and saw that the ifdef didn't match the "#endif" just below. The ifdef should be COMPILER1 instead of COMPILER2. Thank you for the reviews everyone! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24876#issuecomment-2835375614 From jsikstro at openjdk.org Mon Apr 28 14:11:56 2025 From: jsikstro at openjdk.org (Joel Sikström) Date: Mon, 28 Apr 2025 14:11:56 GMT Subject: Integrated: 8355616: Incorrect ifdef in compilationMemoryStatistic.cpp In-Reply-To: References: Message-ID: On Fri, 25 Apr 2025 14:45:48 GMT, Joel Sikström wrote: > Hi, > > Working on a patch close to this area and saw that the ifdef didn't match the "#endif" just below. The ifdef should be COMPILER1 instead of COMPILER2. This pull request has now been integrated.
Changeset: 66358fa2 Author: Joel Sikström URL: https://git.openjdk.org/jdk/commit/66358fa2c0074b02f6087f1e1501eff9364a25f2 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8355616: Incorrect ifdef in compilationMemoryStatistic.cpp Reviewed-by: shade, stuefe, jwaters ------------- PR: https://git.openjdk.org/jdk/pull/24876 From dzhang at openjdk.org Mon Apr 28 14:12:47 2025 From: dzhang at openjdk.org (Dingli Zhang) Date: Mon, 28 Apr 2025 14:12:47 GMT Subject: RFR: 8355654: RISC-V: Relax register constraint for some vector-scalar instructions [v4] In-Reply-To: References: <-8ukYnvnLSOYQpSFEobuIWZtjHxutLqs4EJ7m2u2zQQ=.27d00e4e-c999-429e-84d9-fd87708d90cc@github.com> Message-ID: <6sRU3HD11yPkQNy7h4YdcIShr5ck-rOpJL9F77Q3_Nc=.69cf8401-d6d9-4e36-b4d2-71e9d8cef422@github.com> On Mon, 28 Apr 2025 08:49:54 GMT, Gui Cao wrote: >> Hi, here we separate src from dst_src to give the compiler a greater degree of flexibility. Note that in the masked version we can't modify it, because we can't change inactive elements in the mask version. >> >> ### Testing >> qemu-system 9.1.50 with UseRVV: >> >> - [x] Run test/jdk/jdk/incubator/vector (fastdebug) > > Gui Cao has updated the pull request incrementally with one additional commit since the last revision: > > Polishing LGTM, thanks! ------------- Marked as reviewed by dzhang (Author). PR Review: https://git.openjdk.org/jdk/pull/24905#pullrequestreview-2799451046 From epeter at openjdk.org Mon Apr 28 14:16:53 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 28 Apr 2025 14:16:53 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v3] In-Reply-To: References: Message-ID: <41Y44tSReVYSBYslzzcs-GGyQ6SPM2CfTkJ3-MpGBYM=.cbb47073-5d30-4391-9d67-b2fa758f4dca@github.com> On Mon, 28 Apr 2025 09:51:10 GMT, erifan wrote: > > > > Just a drive-by comment for now, I may review this later more fully. > > > > > I would also prefer if you added the IR restrictions rather than the JTREG requires.
> > > > > The benefit is that we can still run the tests on all platforms, at least for result verification. > > > > > Imagine someone adds optimizations to a new platform, but does not know about this test here. They make a mistake, and there is a bug, leading either to a crash or wrong result. With the requires, your test would never even run, and we would not catch it. With the IR applyIf, we would catch the bug. > > > > > > > > > > > > Just copy pasting the IR applyIf everywhere is not that much work, and adding in a new platform later is not really hard either. > > > > > > > > > Thanks! The problem is that when a new platform is added, people may not even know there is a test. > > > > > > @erifan That is true. But we have that problem either way. If you use `@requires`, then the person does not realize there is a test AND the test is not run. If you use `applyIf`, the person does not realize there is a test, but it is run at least for result verification - and then the person MIGHT realize if the test catches a wrong result / crash. > > This test will run on new platforms when we use @requires. I explained the meaning of the @requires in the previous comment, it only excludes one case: when -XX:UseAVX=0 is specified on x86 platforms. I see. You should probably add a comment there, to say that you are only excluding `AVX=0`. But even `UseAVX = 0` would profit from result verification. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-2835397926 From epeter at openjdk.org Mon Apr 28 14:16:53 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 28 Apr 2025 14:16:53 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v2] In-Reply-To: References: Message-ID: On Mon, 28 Apr 2025 07:46:14 GMT, erifan wrote: >> I would also prefer if you added the IR restrictions rather than the JTREG requires. >> The benefit is that we can still run the tests on all platforms, at least for result verification.
>> >> Imagine someone adds optimizations to a new platform, but does not know about this test here. They make a mistake, and there is a bug, leading either to a crash or wrong result. With the requires, your test would never even run, and we would not catch it. With the IR applyIf, we would catch the bug. > > I can make the change, it's not complex, but it is different from what I thought before. > > I thought that supporting vector was the default behavior, is it right? So when I was doing an architecture-independent feature or optimization, I should just exclude those unsupported cases from the test, so that all potential environments would be tested. If I was doing an architecture- or feature-dependent optimization, then I should limit the test to run only in supported environments. > > For this case, **the current meaning of @requires is "skip this test when -XX:UseAVX=0 is specified on the x86 architecture, otherwise run the tests".** So if a new architecture (say s390) supports related vector operations in the future, then this test will be run on that platform by default. > > If all architecture-independent tests are restricted with applyIfCPUFeatureOr, then when we support a new architecture, we will need to modify all tests, otherwise no test will run on this architecture. Hmm, I'm not convinced that vector operations are supported by default. Every platform supports a different set. In your case, you so far only have negative rules, i.e. you expect certain nodes NOT to be generated / in the final IR. I suppose in that case you can assert that you NEVER get those nodes, because if you have vectors not supported, they will not show up because of that, and if you do support vectors, they should be optimized away.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2063752141 From epeter at openjdk.org Mon Apr 28 14:16:56 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 28 Apr 2025 14:16:56 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v3] In-Reply-To: References: Message-ID: On Mon, 28 Apr 2025 14:06:49 GMT, Emanuel Peter wrote: >> erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - Addressed some review comments >> >> 1. Call VectorNode::Ideal() only once in XorVNode::Ideal. >> 2. Improve code comments. >> - Merge branch 'master' into JDK-8354242 >> - Merge branch 'master' into JDK-8354242 >> - 8354242: VectorAPI: combine vector not operation with compare >> >> This patch optimizes the following patterns: >> For integer types: >> ``` >> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) >> => (VectorMaskCmp src1 src2 ncond) >> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) >> => (VectorMaskCmp src1 src2 ncond) >> ``` >> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the >> negative comparison of cond. >> >> For float and double types: >> ``` >> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> ``` >> cond can be eq or ne. 
>> >> Benchmarks on Nvidia Grace machine with 128-bit SVE2: >> With option `-XX:UseSVE=2`: >> ``` >> Benchmark Unit Before Score Error After Score Error Uplift >> testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 10266136.26 8955.008548 1.29 >> testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 1179760.772 448.031844 1.33 >> testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 2359520.803 896.305743 1.33 >> testCompareEQMaskNotInt ops/s 1787221.411 977.743935 2353952.519 960.069976 1.31 >> testCompareEQMaskNotLong ops/s 895297.1974 673.44808 1178449.02 323.804205 1.31 >> testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 4712761.965 2110.862053 1.41 >> testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 10251646.9 9486.699831 1.29 >> testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 2352855.205 1251.952546 1.39 >> testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 1177811.493 521.1229 1.37 >> testCompareGEMaskNotShort ops/s 3341860.309 1578.975338 4714008.434 1681.10365 1.41 >> testCompareGTMaskNotByte ops/s 7910823.674 2993.367032 10245063.58 9774.75138 1.29 >> testCompareGTMaskNotInt ops/s 1673393... > > test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java line 237: > >> 235: public static void testCompareEQMaskNotByte() { >> 236: testCompareMaskNotByte(VectorOperators.EQ); >> 237: } > > Another comment: you now only have "negative" tests, where you check for count `=0`. It would be good if you also had a positive rule here, one where you do see an XOR in a similar case, where your optimization does not apply. > > This would basically be a "control" that checks that you are testing the right thing. Also: this test should vectorize on some platforms, right? A compare, correct? Would it not be good to actively check that with an IR rule?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2063749337 From duke at openjdk.org Mon Apr 28 14:23:52 2025 From: duke at openjdk.org (Mohamed Issa) Date: Mon, 28 Apr 2025 14:23:52 GMT Subject: RFR: 8348638: Performance regression in Math.tanh [v9] In-Reply-To: References: Message-ID: On Sat, 26 Apr 2025 01:06:55 GMT, Mohamed Issa wrote: >> The changes described below are meant to resolve the performance regression introduced by the **x86_64 tanh** double precision floating point scalar intrinsic in #20657. Additionally, a new set of micro-benchmarks are included to check the performance of specific input value ranges to help prevent regressions in the future. >> >> 1. Check and handle high magnitude input values before those in other ranges. If found, **+/- 1** is returned almost immediately without having to go through too many computations or branches. >> 2. Reduce the lower bound of the input range that triggers a quick **+/- 1** return from **|x| >= 32** to **|x| >= 22**. This new endpoint is the exact value required for correctness that's used by the original OpenJDK implementation. >> >> The results of all tests posted below were captured with an [Intel® Xeon 6761P](https://www.intel.com/content/www/us/en/products/sku/241842/intel-xeon-6761p-processor-336m-cache-2-50-ghz/specifications.html) using [OpenJDK v25-b15](https://github.com/openjdk/jdk/releases/tag/jdk-25%2B15) as the baseline version. The term _baseline1_ refers to runs with the intrinsic enabled and _baseline2_ refers to runs with the intrinsic disabled. >> >> For the first set of performance data collected with the new built-in range micro-benchmark, see the tables below. Each result is the mean of 8 individual runs, and the input ranges used match those in the bug report with two additional ones included. In all scenarios, the changes increase throughput values over _baseline1_.
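The two numbered changes in the description above (test for large magnitudes first, and use 22 as the saturation bound) can be pictured with a small scalar model. This is only a sketch under stated assumptions: the real change lives in the x86_64 intrinsic stub, and the class name, constant name and method name below are invented for this illustration. `StrictMath.tanh` stands in for the slower general path.

```java
// Scalar model of the reordered range check; not the intrinsic itself.
// SATURATION_BOUND, TanhSaturationSketch and tanhWithEarlyExit are hypothetical names.
final class TanhSaturationSketch {
    // Bound quoted in the PR description: |x| >= 22 yields +/- 1.
    static final double SATURATION_BOUND = 22.0;

    static double tanhWithEarlyExit(double x) {
        // Handle high-magnitude inputs first so they return almost immediately.
        // NaN fails this comparison and falls through to the general path.
        if (Math.abs(x) >= SATURATION_BOUND) {
            return Math.copySign(1.0, x);
        }
        // General path for everything else (stand-in for the full computation).
        return StrictMath.tanh(x);
    }

    public static void main(String[] args) {
        double[] inputs = {0.5, 10.0, 21.999, 22.0, 32.0, -1000.0};
        for (double x : inputs) {
            System.out.println(x + " -> " + tanhWithEarlyExit(x)
                    + " (StrictMath: " + StrictMath.tanh(x) + ")");
        }
    }
}
```

For the inputs at or beyond the bound, the early exit should agree with `StrictMath.tanh`, which already rounds to exactly +/- 1.0 there; the model only changes how quickly that answer is reached.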
The uplift over _baseline1_ is quite significant for the high value (100, 1000, 10000, 100000) scenarios. When comparing against _baseline2_, the changes have significant uplift with the lower value inputs (1, 2, 10, 20, 100). However, they significantly lag behind _baseline2_ when the high value inputs (1000, 10000, 100000) are used. >> >> | Input range(s) | Baseline1 (ops/s) | Change (ops/s) | Change vs baseline1 (%) | >> | :-------------------: | :-----------------: | :----------------: | :-------------------------: | >> | [-1, 1] | 103342 | 103705 | +0.35 | >> | [-2, 2] | 99977 | 100819 | +0.84 | >> | [-10, 10] | 99147 | 100240 | +1.10 | >> | [-20, 20] | 99419 | 99492 |... > > Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision: > > Create separate tanh micro-benchmark module to avoid noise in MathBench @TobiHartmann @vnkozlov Ok to run this through Oracle test framework before integration? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23889#issuecomment-2835422011 From sparasa at openjdk.org Mon Apr 28 15:33:47 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Mon, 28 Apr 2025 15:33:47 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v14] In-Reply-To: References: Message-ID: On Fri, 25 Apr 2025 20:04:10 GMT, Srinivas Vamsi Parasa wrote: >> The current scheme for Intel APX NDD code generation favors the emission of NDD instruction on APX-enabled targets, even if destination and source registers are the same. To prevent this, this PR extends the assembler layer to demote EEVEX to REX encoding if dst matches with source operands. > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > cleanup ecmov, eorw and other refactoring Hi Sandhya (@sviswa7) and Jatin (@jatin-bhateja), Could you please review the refactored changes? 
Thanks, Vamsi ------------- PR Comment: https://git.openjdk.org/jdk/pull/24431#issuecomment-2835648617 From dlunden at openjdk.org Mon Apr 28 15:33:58 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Mon, 28 Apr 2025 15:33:58 GMT Subject: RFR: 8351568: Improve source code documentation for PhaseCFG::insert_anti_dependences Message-ID: The current documentation for `PhaseCFG::insert_anti_dependences` is difficult to follow and sometimes even misleading. We should ensure the method is appropriately documented. ### Changeset - Rename `PhaseCFG::insert_anti_dependences` to `PhaseCFG::raise_above_anti_dependences`. The purpose of `PhaseCFG::raise_above_anti_dependences` is twofold: raise the load's LCA so that the load is scheduled before anti-dependent stores, and if necessary add anti-dependence edges between the load and certain anti-dependent stores (to ensure we later "raise" the load before anti-dependent stores in LCM). The name `PhaseCFG::insert_anti_dependences` suggests that we only add anti-dependence edges. The name `PhaseCFG::raise_above_anti_dependences`, therefore, seems more appropriate. - Significantly add to and revise the source code documentation of `PhaseCFG::raise_above_anti_dependences`. - Add, move, and revise `assert`s in `PhaseCFG::raise_above_anti_dependences`, including improved `assert` messages in a few places. - In the main worklist loop of `PhaseCFG::raise_above_anti_dependences`: - Clean up how we identify the search root (avoid mutation). - Add a missing early exit for `Phi` nodes when `LCA == early`. ### Testing - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/14706896111) - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64.
------------- Commit messages: - Implement changeset Changes: https://git.openjdk.org/jdk/pull/24926/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24926&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8351568 Stats: 308 lines in 6 files changed: 169 ins; 29 del; 110 mod Patch: https://git.openjdk.org/jdk/pull/24926.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24926/head:pull/24926 PR: https://git.openjdk.org/jdk/pull/24926 From galder at openjdk.org Mon Apr 28 15:45:52 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Mon, 28 Apr 2025 15:45:52 GMT Subject: RFR: 8258229: Crash in nmethod::reloc_string_for In-Reply-To: <6wxhOTq8-vRcBjfw6HdHD9nZzwdT7SgvXfgnQseFF7w=.05dc5242-cad0-4a6a-a96b-9754b2edc927@github.com> References: <6wxhOTq8-vRcBjfw6HdHD9nZzwdT7SgvXfgnQseFF7w=.05dc5242-cad0-4a6a-a96b-9754b2edc927@github.com> Message-ID: <-erDwDMbT780ArCHyfj3LTBs1FbMw-rr_EBKRUbhRz8=.5a30cc12-2091-4001-a650-446b92be1569@github.com> On Wed, 23 Apr 2025 15:12:54 GMT, Manuel Hässig wrote: > ## Issue Summary > > The issue manifests in intermittent failures of test cases with `-XX:+PrintAssembly`. The reason for these intermittent failures is a deoptimization of the method before or during printing its assembly. In case that deoptimization makes the method not entrant, then the entry of that method is patched, but the relocation information is not updated. If the instruction at the method entry before patching had relocation info that prints a comment during assembly printing, printing that comment for the patched entry fails in case the operands of the original and patched instructions do not match. > > ## Change Summary > > To fix this issue, this PR updates the relocation info when patching the method entry. To avoid any races between printing and deoptimizing, this PR acquires the `NMethodState_lock` for printing an `nmethod`.
> > All changes of this PR summarized: > - add a regression test, > - update the relocation information after patching the method entry for making it not entrant, > - acquire the `NMethodState_lock` in `print_nmethod()` to avoid changing the relocation information during printing. > > ## Testing > > I ran tiers 1 through 3 and Oracle internal testing. Changes requested by galder (Author). test/hotspot/jtreg/compiler/print/TestPrintAssemblyDeoptRace.java line 28: > 26: * @bug 8258229 > 27: * @summary If a method is made not entrant while prinint the assembly, hotspot crashes due to mismatched relocation information. > 28: * @run main/othervm -XX:+UnlockDiagnosticVMOptions -XX:-TieredCompilation -XX:CompileCommand=print,java/math/BitSieve.bit Shouldn't the test run with `-XX:+DeoptimizeALot`? That way we would get more confidence that the deoptimization support works as expected. ------------- PR Review: https://git.openjdk.org/jdk/pull/24831#pullrequestreview-2799801578 PR Review Comment: https://git.openjdk.org/jdk/pull/24831#discussion_r2063962347 From shade at openjdk.org Mon Apr 28 16:07:48 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 28 Apr 2025 16:07:48 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v6] In-Reply-To: References: <49DMOKtJkD7AtmJuFif9VIIZmM4VYYFYmb-aUmXnG7Q=.7b926824-a81e-4b38-902e-16191b4e46ac@github.com> Message-ID: On Fri, 25 Apr 2025 21:17:08 GMT, Vladimir Ivanov wrote: >> Yes, this is about null (bootstrap) classloader; the system returns `nullptr` in this case. I don't think `UMH` gets to decide whether `!nullptr` holder is always alive or not, and it is safer to hold on to it. > >> I don't think UMH gets to decide whether !nullptr holder is always alive or not, and it is safer to hold on to it. > > I looked around and stumbled upon the following code in `ClassLoaderData` [1].
I haven't checked myself, but it looks like a hidden class injected into bootstrap loader has `klass_holder == nullptr` while still is amenable to GC... > > IMO a check for `ik->class_loader_data()->is_permanent_class_loader_data()` would do a better job serving the immediate needs and communicating the intentions. > > [1] > > bool ClassLoaderData::is_permanent_class_loader_data() const { > return is_builtin_class_loader_data() && !has_class_mirror_holder(); > } > > // Returns true if the class loader for this class loader data is one of > // the 3 builtin (boot application/system or platform) class loaders, > // including a user-defined system class loader. Note that if the class > // loader data is for a non-strong hidden class then it may > // get freed by a GC even if its class loader is one of these loaders. > bool ClassLoaderData::is_builtin_class_loader_data() const { > ... I remember looking at this, and convinced myself that non-strong hidden classes report their related Java mirror as `klass_holder`, and that is enough to maintain them as alive. See calls to `ClassLoaderData::initialize_holder`: https://github.com/openjdk/jdk/blob/b1e778d9d2ad13ee5f1ed629a8805008580f86c0/src/hotspot/share/classfile/classLoaderData.cpp#L159-L162 https://github.com/openjdk/jdk/blob/b1e778d9d2ad13ee5f1ed629a8805008580f86c0/src/hotspot/share/classfile/systemDictionary.cpp#L810-L814 This is what the comment in `UnloadableMethodHandle::get_unload_blocker` refers to. So I believe current code is correct. I agree `is_permanent_class_loader_data()` captures the intent better. Let me see if it fits well here. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2064005189 From shade at openjdk.org Mon Apr 28 16:14:02 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 28 Apr 2025 16:14:02 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v8] In-Reply-To: References: Message-ID: > [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. > > The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. > > This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. > > It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer.
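As a rough picture of the "separate weak and strong fields" idea mentioned above, here is a Java analogy. It is explicitly not the PR's implementation: the HotSpot change is C++ and works with `OopStorage` handles, whereas this sketch uses `java.lang.ref.WeakReference`, and every name in it (`UnloadBlockerSketch`, `block`, `release`, `isUnloaded`, `isBlocking`) is hypothetical. The point it tries to show is that with two fields, "is the holder pinned?" becomes a plain null check instead of a query about which kind of handle a shared field currently contains.

```java
import java.lang.ref.WeakReference;

// Java analogy only; names and structure are assumptions made for illustration.
final class UnloadBlockerSketch<T> {
    private final WeakReference<T> weak; // never keeps the holder alive
    private T strong;                    // non-null only while unloading is blocked

    UnloadBlockerSketch(T holder) {
        this.weak = new WeakReference<>(holder);
    }

    /** True once the holder is gone and nothing pinned it. */
    boolean isUnloaded() {
        return strong == null && weak.get() == null;
    }

    /** Tries to pin the holder; returns false if it was already collected. */
    boolean block() {
        if (strong == null) {
            strong = weak.get();
        }
        return strong != null;
    }

    /** Drops the pin so the holder may be collected again. */
    void release() {
        strong = null;
    }

    /** Cheap state check: no lookup into any handle storage is needed. */
    boolean isBlocking() {
        return strong != null;
    }

    public static void main(String[] args) {
        Object holder = new Object();
        UnloadBlockerSketch<Object> handle = new UnloadBlockerSketch<>(holder);
        System.out.println("blocking before block(): " + handle.isBlocking());
        System.out.println("block() succeeded: " + handle.block());
        System.out.println("blocking after block(): " + handle.isBlocking());
        handle.release();
        System.out.println("unloaded while holder is reachable: " + handle.isUnloaded());
        java.lang.ref.Reference.reachabilityFence(holder);
    }
}
```

A caller would typically check `isUnloaded()` when scanning a queue and call `block()` only right before using the holder, mirroring the weak-then-strong sequence the description talks about.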
> > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits: - Merge branch 'master' into JDK-8231269-compile-task-weaks - Do not accept nullptr methods - Attempt at phasing doc - Merge branch 'master' into JDK-8231269-compile-task-weaks - Inline guard - Merge branch 'master' into JDK-8231269-compile-task-weaks - Allow UMH::_method access from VMStructs - Fix VMStructs - Purge extra fluff - Touchups - ... and 6 more: https://git.openjdk.org/jdk/compare/2447b981...be3a3d62 ------------- Changes: https://git.openjdk.org/jdk/pull/24018/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=07 Stats: 292 lines in 11 files changed: 243 ins; 25 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/24018.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24018/head:pull/24018 PR: https://git.openjdk.org/jdk/pull/24018 From shade at openjdk.org Mon Apr 28 16:20:34 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 28 Apr 2025 16:20:34 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v9] In-Reply-To: References: Message-ID: > [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. > > The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading.
Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. > > This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. > > It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Simplify a bit ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24018/files - new: https://git.openjdk.org/jdk/pull/24018/files/be3a3d62..eaf3f14d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=07-08 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24018.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24018/head:pull/24018 PR: https://git.openjdk.org/jdk/pull/24018 From shade at openjdk.org Mon Apr 28 16:20:34 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 28 Apr 2025 16:20:34 GMT Subject: RFR: 
8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v6] In-Reply-To: References: <49DMOKtJkD7AtmJuFif9VIIZmM4VYYFYmb-aUmXnG7Q=.7b926824-a81e-4b38-902e-16191b4e46ac@github.com> Message-ID: On Mon, 28 Apr 2025 16:05:06 GMT, Aleksey Shipilev wrote: > I agree is_permanent_class_loader_data() captures the intent better. Let me see if it fits well here. Ah wait, it does not. We need to hold on to something that blocks the unloading. Just checking `is_permanent_class_loader_data()` does not get us there. We would need to ask for some holder for it. For the reasons above, `method->method_holder()->klass_holder()` works for non-strong hidden classes as well. This is also why current mainline code works -- it captures the same thing. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2064032610 From kvn at openjdk.org Mon Apr 28 17:37:54 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 28 Apr 2025 17:37:54 GMT Subject: RFR: 8355003: Implement Ahead-of-Time Method Profiling [v4] In-Reply-To: References: Message-ID: On Mon, 28 Apr 2025 09:15:28 GMT, Igor Veresov wrote: >> Improve warm-up time by making profile data from a previous run of an application instantly available, when the HotSpot Java Virtual Machine starts. Specifically, enhance the [AOT cache](https://openjdk.org/jeps/483) to store method execution profiles from training runs, reducing profiling delays in subsequent production runs. >> >> More details in the JEP: https://bugs.openjdk.org/browse/JDK-8325147 > > Igor Veresov has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 32 commits: > > - Merge branch 'master' into pp2 > - Fix class filtering > - Remove the workaround of setting AOTRecordTraining during assembly > - Address some of the review comments > - Merge branch 'master' into pp > - Add AOTCompileEagerly flag to control compilation after clinit > - Port 8355334: [leyden] Missing type profile info in archived training data > - Port 8355296: [leyden] Some methods are stuck at level=0 with -XX:-TieredCompilation > - Use ENABLE_IF macro > - Missing part of the last commit > - ... and 22 more: https://git.openjdk.org/jdk/compare/2447b981...7fb7ae62 Looks better. There are still places where UL is used specifically for TD processing. Consider using `(aot, training)` there instead of `(cds)`. ------------- PR Review: https://git.openjdk.org/jdk/pull/24886#pullrequestreview-2800131686 From iveresov at openjdk.org Mon Apr 28 18:20:48 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Mon, 28 Apr 2025 18:20:48 GMT Subject: RFR: 8355003: Implement Ahead-of-Time Method Profiling [v4] In-Reply-To: References: Message-ID: On Mon, 28 Apr 2025 17:35:13 GMT, Vladimir Kozlov wrote: > Looks better. There are still places where UL is used specifically for TD processing. Consider using `(aot, training)` there instead of `(cds)`. Right. I haven't addressed these review comments yet. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24886#issuecomment-2836100964 From vlivanov at openjdk.org Mon Apr 28 18:51:48 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 28 Apr 2025 18:51:48 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v6] In-Reply-To: References: <49DMOKtJkD7AtmJuFif9VIIZmM4VYYFYmb-aUmXnG7Q=.7b926824-a81e-4b38-902e-16191b4e46ac@github.com> Message-ID: On Mon, 28 Apr 2025 16:16:41 GMT, Aleksey Shipilev wrote: >> I remember looking at this, and convinced myself that non-strong hidden classes report their related Java mirror as `klass_holder`, and that is enough to maintain them as alive. See calls to `ClassLoaderData::initialize_holder`: >> >> https://github.com/openjdk/jdk/blob/b1e778d9d2ad13ee5f1ed629a8805008580f86c0/src/hotspot/share/classfile/classLoaderData.cpp#L159-L162 >> >> https://github.com/openjdk/jdk/blob/b1e778d9d2ad13ee5f1ed629a8805008580f86c0/src/hotspot/share/classfile/systemDictionary.cpp#L810-L814 >> >> This is what the comment in `UnloadableMethodHandle::get_unload_blocker` refers to. So I believe current code is correct. >> >> I agree `is_permanent_class_loader_data()` captures the intent better. Let me see if it fits well here. > >> I agree is_permanent_class_loader_data() captures the intent better. Let me see if it fits well here. > > Ah wait, it does not. We need to hold on to something that blocks the unloading. Just checking `is_permanent_class_loader_data()` does not get us there. We would need to ask for some holder for it. For the reasons above, `method->method_holder()->klass_holder()` works for non-strong hidden classes as well. > > This is also why current mainline code works -- it captures the same thing. Ok, thanks for checking! Good to know there's no existing bug. 
What I had in mind is as follows:

    InstanceKlass* holder = method->method_holder();
    if (holder->class_loader_data()->is_permanent_class_loader_data()) {
      return nullptr; // method holder class can't be unloaded
    } else {
      // Normal class, return the holder that would block unloading.
      // This would be either classloader oop for non-hidden classes,
      // or Java mirror oop for hidden classes.
      assert(holder->klass_holder() != nullptr, "");
      return holder->klass_holder();
    }

IMO it makes the check more precise and, at the same time, communicates the intent better. What do you think?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2064315717

From duke at openjdk.org Mon Apr 28 19:30:54 2025
From: duke at openjdk.org (Chad Rakoczy)
Date: Mon, 28 Apr 2025 19:30:54 GMT
Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v13]
In-Reply-To: References: Message-ID:

On Fri, 25 Apr 2025 19:10:27 GMT, Evgeny Astigeevich wrote:

>> Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Fix branch range check
>
> src/hotspot/share/code/relocInfo.cpp line 379:
>
>> 377: } else {
>> 378: // Reassert the callee address, this time in the new copy of the code.
>> 379: pd_set_call_destination(callee);
>
> if (src->contains(callee)) {
> // ...
> int offset = pointer_delta_as_int(callee, orig_addr); > callee = addr() + offset; > } > pd_set_call_destination(callee); I'll use this refactor to remove the else but can't use `pointer_delta_as_int` as it only works for positive offsets ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2064380935 From ihse at openjdk.org Mon Apr 28 19:35:55 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Mon, 28 Apr 2025 19:35:55 GMT Subject: RFR: 8355003: Implement Ahead-of-Time Method Profiling [v4] In-Reply-To: References: Message-ID: On Mon, 28 Apr 2025 09:15:28 GMT, Igor Veresov wrote: >> Improve warm-up time by making profile data from a previous run of an application instantly available, when the HotSpot Java Virtual Machine starts. Specifically, enhance the [AOT cache](https://openjdk.org/jeps/483) to store method execution profiles from training runs, reducing profiling delays in subsequent production runs. >> >> More details in the JEP: https://bugs.openjdk.org/browse/JDK-8325147 > > Igor Veresov has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 32 commits: > > - Merge branch 'master' into pp2 > - Fix class filtering > - Remove the workaround of setting AOTRecordTraining during assembly > - Address some of the review comments > - Merge branch 'master' into pp > - Add AOTCompileEagerly flag to control compilation after clinit > - Port 8355334: [leyden] Missing type profile info in archived training data > - Port 8355296: [leyden] Some methods are stuck at level=0 with -XX:-TieredCompilation > - Use ENABLE_IF macro > - Missing part of the last commit > - ... and 22 more: https://git.openjdk.org/jdk/compare/2447b981...7fb7ae62 Build changes are trivially fine. ------------- Marked as reviewed by ihse (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24886#pullrequestreview-2800489948

From duke at openjdk.org Mon Apr 28 19:50:26 2025
From: duke at openjdk.org (Alexandre Jacob)
Date: Mon, 28 Apr 2025 19:50:26 GMT
Subject: RFR: 8350621: Code cache stops scheduling GC
Message-ID:

The purpose of this PR is to fix a bug where we can end up in a situation where the GC is no longer scheduled by `CodeCache`.

This situation is possible because the `_unloading_threshold_gc_requested` flag is set to `true` when triggering the GC, and we expect the GC to call `CodeCache::on_gc_marking_cycle_finish`, which in turn calls `CodeCache::update_cold_gc_count`, which resets the `_unloading_threshold_gc_requested` flag and so allows further GC scheduling.

Unfortunately, this does not work properly under certain circumstances. For example, with G1GC, calling `G1CollectedHeap::collect` does not guarantee that the GC will actually run, because a collection may already be running (see [here](https://github.com/openjdk/jdk/blob/7d11418c820b46926a25907766d16083a4b349de/src/hotspot/share/gc/g1/g1CollectedHeap.cpp#L1763)).

I have observed this behavior on JVMs running version 21 that were recently migrated from Java 17. Those JVMs have some pressure on the code cache and quite a large heap in comparison to their allocation rate, which means that objects are mostly collected by young collections and full GCs take a long time to happen.

I have been able to reproduce this issue with ParallelGC and G1GC, and I imagine that other GCs can be impacted as well.
In order to reproduce this issue, I found a very simple and convenient way:

    public class CodeCacheMain {
        public static void main(String[] args) throws InterruptedException {
            while (true) {
                Thread.sleep(100);
            }
        }
    }

Run this simple app with the following JVM flags:

    -Xlog:gc*=info,codecache=info -Xmx512m -XX:ReservedCodeCacheSize=2496k -XX:StartAggressiveSweepingAt=15

- 512m for the heap, just to clarify the intent that we don't want to be bothered by a full GC
- low `ReservedCodeCacheSize` to put pressure on the code cache quickly
- `StartAggressiveSweepingAt` can be set to 20 or 15 for faster bug reproduction

By itself, the program puts hardly any pressure on the code cache, but the good news is that it is sufficient to attach jconsole to it, which will:
- allow us to monitor the code cache
- indirectly generate activity on the code cache, just what we need to reproduce the bug

Some logs related to the code cache will show up at some point, along with GC activity:

    [648.733s][info][codecache ] Triggering aggressive GC due to having only 14.970% free memory

And then it will stop, and we'll end up with the following message:

    [672.714s][info][codecache ] Code cache is full - disabling compilation

This leaves the JVM in an unstable situation.

I considered a few different options before making this change:

1) Always call `Universe::heap()->collect(...)` without making any check (the GC impl should handle the situation)
2) Fix all GC implementations to ensure `_unloading_threshold_gc_requested` gets back to `false` at some point (probably what is supposed to happen today)
3) Change `CollectedHeap::collect` to return a `bool` instead of `void` to indicate whether a GC was run or scheduled

But I discarded them:

1) Dumb option that I used to check that the bug would be corrected, but it will probably put a bit of pressure on resources when allocations need to be performed at code cache level (as it will be called at each allocation attempt).
   In addition, the log indicating that we trigger a GC gets spammed, and it is not easy to decide how to handle the log correctly.
2) This option is possible and was my favorite up to some point. A GC implementation can have quite a lot of branches, and it can be difficult to ensure we don't forget a case where the flag has to be reset. This could possibly be explored in addition to the solution I propose in the PR. For example, we could introduce a static method in `CodeCache` that would let a GC implementation simply reset the flag when the GC will not actually run (to be discussed).
3) I explored this solution, but it adds quite a lot of changes and is risky in the long term (in my opinion). G1GC already has a [G1CollectedHeap::try_collect](https://github.com/openjdk/jdk/blob/7d11418c820b46926a25907766d16083a4b349de/src/hotspot/share/gc/g1/g1CollectedHeap.cpp#L1870) method that returns a `bool`, but this bool is `true` even when the GC is not run.

As a result, I decided to simply add a way for `CodeCache` to recover from this situation. The idea is to leave the GC code as-is, but remember the time of the last GC request and reset the flag to `false` if it was not reset within a certain amount of time (250ms in my PR). This should only be helpful in corner cases where the GC impl has not reset the flag by itself. Among the advantages of this solution: it provides a safety net to recover from a situation that may be created by changes in a GC implementation, where someone forgot to take care of the code cache.

I spent a lot of time investigating this issue and exploring solutions, and I welcome any input on it, as it is my first PR on the project.
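
To make the recovery idea described above concrete, here is a minimal standalone C++ sketch. It is not the code from the PR and not HotSpot code: `GcRequestGate`, `should_request_gc` and `on_gc_finished` are hypothetical names, the current time is passed in explicitly so the behaviour can be checked deterministically, and a plain mutex is assumed so that only one thread makes the decision at a time. The 250ms value is the one mentioned in the description.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>

// Hypothetical stand-in for the code cache's "GC already requested" state.
// Illustrative only; names and structure are assumptions.
class GcRequestGate {
 public:
  using Millis = std::chrono::milliseconds;

  explicit GcRequestGate(Millis reset_after) : _reset_after(reset_after) {}

  // Returns true if the caller should ask the heap for a collection now.
  bool should_request_gc(Millis now) {
    std::lock_guard<std::mutex> guard(_lock);  // one thread decides at a time
    if (_requested && (now - _requested_at) < _reset_after) {
      return false;  // a recent request is still considered pending
    }
    // Either nothing is pending, or the pending request looks lost:
    // (re)arm the flag and let the caller request a GC.
    _requested = true;
    _requested_at = now;
    return true;
  }

  // Called when a GC cycle has finished and cleared the request.
  void on_gc_finished() {
    std::lock_guard<std::mutex> guard(_lock);
    _requested = false;
  }

 private:
  std::mutex _lock;
  const Millis _reset_after;
  bool _requested = false;
  Millis _requested_at{0};
};
```

With a 250ms window, a second call 100ms after the first request is suppressed, while a call 300ms after it is allowed through again even if `on_gc_finished` was never called. Whether a lock or an atomic compare-and-swap is the better fit for the real code is a separate design question.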
------------- Commit messages: - _unloading_gc_requested should remain volatile - remove early returns from gc_on_allocation - fix race condition in try_to_gc - log before GC - fix log message - XXXXXXX: Fix code cache GC Changes: https://git.openjdk.org/jdk/pull/23656/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23656&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8350621 Stats: 77 lines in 2 files changed: 45 ins; 10 del; 22 mod Patch: https://git.openjdk.org/jdk/pull/23656.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23656/head:pull/23656 PR: https://git.openjdk.org/jdk/pull/23656 From duke at openjdk.org Mon Apr 28 19:50:26 2025 From: duke at openjdk.org (Alexandre Jacob) Date: Mon, 28 Apr 2025 19:50:26 GMT Subject: RFR: 8350621: Code cache stops scheduling GC In-Reply-To: References: Message-ID: On Sun, 16 Feb 2025 18:39:29 GMT, Alexandre Jacob wrote: > The purpose of this PR is to fix a bug where we can end up in a situation where the GC is not scheduled anymore by `CodeCache`. > > This situation is possible because the `_unloading_threshold_gc_requested` flag is set to `true` when triggering the GC and we expect the GC to call `CodeCache::on_gc_marking_cycle_finish` which in turn will call `CodeCache::update_cold_gc_count`, which will reset the flag `_unloading_threshold_gc_requested` allowing further GC scheduling. > > Unfortunately this can't work properly under certain circumstances. > For example, if using G1GC, calling `G1CollectedHeap::collect` does no give the guarantee that the GC will actually run as it can be already running (see [here](https://github.com/openjdk/jdk/blob/7d11418c820b46926a25907766d16083a4b349de/src/hotspot/share/gc/g1/g1CollectedHeap.cpp#L1763)). > > I have observed this behavior on JVM in version 21 that were migrated recently from java 17. 
> Those JVMs have some pressure on code cache and quite a large heap in comparison to allocation rate, which means that objects are mostly GC'd by young collections and full GC take a long time to happen. > > I have been able to reproduce this issue with ParallelGC and G1GC, and I imagine that other GC can be impacted as well. > > In order to reproduce this issue, I found a very simple and convenient way: > > > public class CodeCacheMain { > public static void main(String[] args) throws InterruptedException { > while (true) { > Thread.sleep(100); > } > } > } > > > Run this simple app with the following JVM flags: > > > -Xlog:gc*=info,codecache=info -Xmx512m -XX:ReservedCodeCacheSize=2496k -XX:StartAggressiveSweepingAt=15 > > > - 512m for the heap just to clarify the intent that we don't want to be bothered by a full GC > - low `ReservedCodeCacheSize` to put pressure on code cache quickly > - `StartAggressiveSweepingAt` can be set to 20 or 15 for faster bug reproduction > > Itself, the program will hardly get pressure on code cache, but the good news is that it is sufficient to attach a jconsole on it which will: > - allows us to monitor code cache > - indirectly generate activity on the code cache, just what we need to reproduce the bug > > Some logs related to code cache will show up at some point with GC activity: > > > [648.733s][info][codecache ] Triggering aggressive GC due to having only 14.970% free memory > > > And then it will stop and we'll end up with the following message: > > > [672.714s][info][codecache ] Code cache is full - disabling compilation > > > L... Here is a log sample that shows how it behaves when the bug occurs. Logs starting with `>>>` are some logs I added when working on steps to reproduce. The bug occurs at ~648.762s, G1GC has reset the flag to `false` but is still running, CodeCache has called `Universe::heap()->collect(...)`, which was discarded because of the current GC routine. 
Note that during this test `-XX:StartAggressiveSweepingAt` was set to `25` instead of `15` but I confirm I can reproduce with `15` as well (as explained in the description of the PR) [648.733s][info][codecache ] >>> should start GC (_unloading_threshold_gc_requested = 0) [648.733s][info][codecache ] Triggering aggressive GC due to having only 24.970% free memory [648.733s][info][gc,start ] GC(6210) Pause Young (CodeCache GC Aggressive) [648.733s][info][gc,heap ] GC(6210) PSYoungGen: 2851K(132096K)->224K(132096K) Eden: 2851K(131584K)->0K(131584K) From: 0K(512K)->224K(512K) [648.733s][info][gc,heap ] GC(6210) ParOldGen: 11691K(349696K)->11691K(349696K) [648.733s][info][gc,metaspace ] GC(6210) Metaspace: 7690K(8192K)->7690K(8192K) NonClass: 6904K(7168K)->6904K(7168K) Class: 786K(1024K)->786K(1024K) [648.733s][info][gc ] GC(6210) Pause Young (CodeCache GC Aggressive) 14M->11M(470M) 0.238ms [648.733s][info][gc,cpu ] GC(6210) User=0.00s Sys=0.00s Real=0.00s [648.733s][info][gc,start ] GC(6211) Pause Full (CodeCache GC Aggressive) [648.733s][info][gc,phases,start] GC(6211) Marking Phase [648.742s][info][gc,phases ] GC(6211) Marking Phase 8.585ms [648.742s][info][gc,phases,start] GC(6211) Summary Phase [648.742s][info][gc,phases ] GC(6211) Summary Phase 0.009ms [648.742s][info][gc,phases,start] GC(6211) Adjust Roots [648.742s][info][gc,phases ] GC(6211) Adjust Roots 0.311ms [648.742s][info][gc,phases,start] GC(6211) Compaction Phase [648.747s][info][gc,phases ] GC(6211) Compaction Phase 4.701ms [648.747s][info][gc,phases,start] GC(6211) Post Compact [648.747s][info][codecache ] >>> CodeCache::update_cold_gc_count _unloading_threshold_gc_requested = false [648.747s][info][codecache ] Code cache critically low; use aggressive aging [648.747s][info][gc,phases ] GC(6211) Post Compact 0.106ms [648.747s][info][gc,heap ] GC(6211) PSYoungGen: 224K(132096K)->0K(132096K) Eden: 0K(131584K)->0K(131584K) From: 224K(512K)->0K(512K) [648.747s][info][gc,heap ] GC(6211) ParOldGen: 
11691K(349696K)->11688K(349696K) [648.747s][info][gc,metaspace ] GC(6211) Metaspace: 7690K(8192K)->7690K(8192K) NonClass: 6904K(7168K)->6904K(7168K) Class: 786K(1024K)->786K(1024K) [648.747s][info][gc ] GC(6211) Pause Full (CodeCache GC Aggressive) 11M->11M(470M) 13.799ms [648.747s][info][gc,cpu ] GC(6211) User=0.11s Sys=0.00s Real=0.01s [648.747s][info][codecache ] >>> should start GC (_unloading_threshold_gc_requested = 0) [648.747s][info][codecache ] Triggering aggressive GC due to having only 24.865% free memory [648.747s][info][gc,start ] GC(6212) Pause Young (CodeCache GC Aggressive) [648.748s][info][gc,heap ] GC(6212) PSYoungGen: 2851K(132096K)->224K(132096K) Eden: 2851K(131584K)->0K(131584K) From: 0K(512K)->224K(512K) [648.748s][info][gc,heap ] GC(6212) ParOldGen: 11688K(349696K)->11688K(349696K) [648.748s][info][gc,metaspace ] GC(6212) Metaspace: 7690K(8192K)->7690K(8192K) NonClass: 6904K(7168K)->6904K(7168K) Class: 786K(1024K)->786K(1024K) [648.748s][info][gc ] GC(6212) Pause Young (CodeCache GC Aggressive) 14M->11M(470M) 0.257ms [648.748s][info][gc,cpu ] GC(6212) User=0.00s Sys=0.00s Real=0.00s [648.748s][info][gc,start ] GC(6213) Pause Full (CodeCache GC Aggressive) [648.748s][info][gc,phases,start] GC(6213) Marking Phase [648.756s][info][gc,phases ] GC(6213) Marking Phase 8.512ms [648.756s][info][gc,phases,start] GC(6213) Summary Phase [648.756s][info][gc,phases ] GC(6213) Summary Phase 0.007ms [648.756s][info][gc,phases,start] GC(6213) Adjust Roots [648.757s][info][gc,phases ] GC(6213) Adjust Roots 0.331ms [648.757s][info][gc,phases,start] GC(6213) Compaction Phase [648.761s][info][gc,phases ] GC(6213) Compaction Phase 4.734ms [648.761s][info][gc,phases,start] GC(6213) Post Compact [648.761s][info][codecache ] >>> CodeCache::update_cold_gc_count _unloading_threshold_gc_requested = false [648.761s][info][codecache ] Code cache critically low; use aggressive aging [648.761s][info][gc,phases ] GC(6213) Post Compact 0.059ms [648.761s][info][gc,heap ] 
GC(6213) PSYoungGen: 224K(132096K)->0K(132096K) Eden: 0K(131584K)->0K(131584K) From: 224K(512K)->0K(512K) [648.761s][info][gc,heap ] GC(6213) ParOldGen: 11688K(349696K)->11689K(349696K) [648.761s][info][gc,metaspace ] GC(6213) Metaspace: 7690K(8192K)->7690K(8192K) NonClass: 6904K(7168K)->6904K(7168K) Class: 786K(1024K)->786K(1024K) [648.761s][info][gc ] GC(6213) Pause Full (CodeCache GC Aggressive) 11M->11M(470M) 13.725ms [648.761s][info][gc,cpu ] GC(6213) User=0.09s Sys=0.02s Real=0.01s [648.762s][info][codecache ] >>> should start GC (_unloading_threshold_gc_requested = 0) [648.762s][info][codecache ] Triggering aggressive GC due to having only 24.895% free memory [648.762s][info][codecache ] >>> should start GC (_unloading_threshold_gc_requested = 1) [648.762s][info][gc,start ] GC(6214) Pause Young (GCLocker Initiated GC) [648.762s][info][gc,heap ] GC(6214) PSYoungGen: 1973K(132096K)->224K(132096K) Eden: 1973K(131584K)->0K(131584K) From: 0K(512K)->224K(512K) [648.762s][info][gc,heap ] GC(6214) ParOldGen: 11689K(349696K)->11689K(349696K) [648.762s][info][gc,metaspace ] GC(6214) Metaspace: 7691K(8192K)->7691K(8192K) NonClass: 6905K(7168K)->6905K(7168K) Class: 786K(1024K)->786K(1024K) [648.762s][info][gc ] GC(6214) Pause Young (GCLocker Initiated GC) 13M->11M(470M) 0.310ms [648.762s][info][gc,cpu ] GC(6214) User=0.00s Sys=0.00s Real=0.00s [648.762s][info][codecache ] >>> should start GC (_unloading_threshold_gc_requested = 1) ** removed 278 occurrences of the same log ** [672.714s][info][codecache ] >>> should start GC (_unloading_threshold_gc_requested = 1) [672.714s][info][codecache ] Code cache is full - disabling compilation [672.714s][warning][codecache ] CodeCache is full. Compiler has been disabled. [672.714s][warning][codecache ] Try increasing the code cache size using -XX:ReservedCodeCacheSize= OpenJDK 64-Bit Server VM warning: CodeCache is full. Compiler has been disabled. 
OpenJDK 64-Bit Server VM warning: Try increasing the code cache size using -XX:ReservedCodeCacheSize=
CodeCache: size=2496Kb used=2479Kb max_used=2479Kb free=16Kb
 bounds [0x00007923ed490000, 0x00007923ed700000, 0x00007923ed700000]
 total_blobs=1127 nmethods=640 adapters=399
 compilation: disabled (not enough contiguous free space left)
              stopped_count=1, restarted_count=0
 full_count=1

Both JVMs were run for ~20 minutes

### jconsole (reproducing the bug)

![image](https://github.com/user-attachments/assets/61db5c41-5ce8-4aad-ba98-dbbef142f420)

Started to misbehave at ~315.181s

### jconsole (with the fix from the PR)

![image](https://github.com/user-attachments/assets/f21dd6fc-7b7d-4b7c-a57a-86e60b2577ce)

[13.078s][debug][codecache ] Previous GC request has not been reset after 13.018797s, force auto-reset
[412.985s][debug][codecache ] Previous GC request has not been reset after 23.985252s, force auto-reset
[464.974s][debug][codecache ] Previous GC request has not been reset after 7.970082s, force auto-reset
[524.953s][debug][codecache ] Previous GC request has not been reset after 3.937477s, force auto-reset

Converted to draft: I would like to change it to ensure we log before calling `Universe::heap()->collect(...)` (same way as before)

While performing more tests on this (different configurations, different GCs, ...), I noticed a race condition when multiple threads enter the `try_to_gc` method I introduced.

The impact of the race condition was:
- an unwanted auto-reset of the flag
- an invalid "duration since last GC request" log
- an unneeded GC request

It is possible under the following conditions:
- thread1: reads `_unloading_gc_requested_time` (with `elapsed_since_last_gc_request` > 250ms)
- thread2: has `_unloading_gc_requested == false` -> it requests GC
- thread1: has `_unloading_gc_requested == true` -> it resets `_unloading_gc_requested` + log + request GC
In order to avoid that I propose to simply ensure we don't have multiple threads performing the checks in `gc_on_allocation` ------------- PR Comment: https://git.openjdk.org/jdk/pull/23656#issuecomment-2661567656 PR Comment: https://git.openjdk.org/jdk/pull/23656#issuecomment-2661578204 PR Comment: https://git.openjdk.org/jdk/pull/23656#issuecomment-2662439874 PR Comment: https://git.openjdk.org/jdk/pull/23656#issuecomment-2664920007 From ihse at openjdk.org Mon Apr 28 20:12:51 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Mon, 28 Apr 2025 20:12:51 GMT Subject: RFR: 8350209: Preserve adapters in AOT cache [v9] In-Reply-To: References: Message-ID: On Sun, 27 Apr 2025 22:22:02 GMT, Vladimir Kozlov wrote: >> [JEP 483](https://bugs.openjdk.org/browse/JDK-8315737) preserves class information in AOT cache which helps Java startup performance. >> >> We should also preserve adapters (i2c, c2i) to further improve performance of class linking where adapters are generated. >> >> Short running Java application can see several percents improvement. 
I got 6% improvement when ran `HelloWorld.java` on Linux-x64 Ice Lake CPU (2.5Ghz): >> >> >> (perf stat -r 100 java -XX:AOTCache=app.aotcache -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed >> 0.0299401 +- 0.0000504 seconds time elapsed ( +- 0.17% ) >> >> (perf stat -r 100 java -XX:AOTCache=app.aotcache -XX:+UnlockDiagnosticVMOptions -XX:-AOTAdapterCaching -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed >> 0.0318654 +- 0.0000535 seconds time elapsed ( +- 0.17% ) >> >> >> New diagnostic flags are introduced (use `-XX:+UnlockDiagnosticVMOptions` to unlock them): >> >> >> -XX:+AOTAdapterCaching - Enable or disable saving and restoring i2c2i adapters >> -XX:AOTCodeMaxSize=10*M - buffer size in bytes for AOT code caching >> -XX:+AbortVMOnAOTCodeFailure - Abort VM on the first occurrence of AOT code caching failure >> >> By default `AOTAdapterCaching` is `false` and enabled ergonomically when `-XX:AOTCache` is specified. >> This flag is ignored when `AOTCache` is not specified. >> >> To use AOT adapters follow process described in JEP 483: >> >> >> java -XX:AOTMode=record -XX:AOTConfiguration=app.aotconf -cp app.jar App >> java -XX:AOTMode=create -XX:AOTConfiguration=app.aotconf -XX:AOTCache=app.aot -cp app.jar >> java -XX:AOTCache=app.aot -cp app.jar App >> >> >> There are several new UL flag combinations to trace the AOT code caching process: >> >> >> -Xlog:aot+codecache+init -Xlog:aot+codecache+exit -Xlog:aot+codecache+stubs >> >> >> @ashu-mehra is main author of changes. He implemented adapters caching. >> I did main framework (`AOTCodeCache` class) for saving and loading AOT code. >> >> Tested tier1-6,10, which includes tests with `AOTClassLinking` enabled. Also Xcomp,stress and JCK. > > Vladimir Kozlov has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains nine commits: > > - Merge branch 'master' into JDK-8350209 > - Downgraded UL as asked. Added synchronization to C strings caching. 
> - Fix message > - Generate far jumps for AOT code on AArch64 > - remove _enabled suffix > - Add sanity test for AOTAdapterCaching flag > - AOT code flags are ignored when AOTCache is not specified. Set range for AOTCodeMaxSize values. > - Removed unused AOTCodeSection class > - 8350209: Preserve adapters in AOT cache Build change is trivially fine. ------------- Marked as reviewed by ihse (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24740#pullrequestreview-2800601222 From gcao at openjdk.org Tue Apr 29 01:27:01 2025 From: gcao at openjdk.org (Gui Cao) Date: Tue, 29 Apr 2025 01:27:01 GMT Subject: RFR: 8355668: RISC-V: jdk/incubator/vector/Int256VectorTests.java fails when using RVV [v5] In-Reply-To: References: <46jT8ol1f1Vd2Yfqq__tfAqT85RWMtLkFxxqVucgKQA=.392bade5-0d18-4a76-921d-b8e8463a55e8@github.com> Message-ID: On Mon, 28 Apr 2025 08:02:34 GMT, Gui Cao wrote: >> Hi, After https://github.com/openjdk/jdk/pull/24129, some tests run failed. due to the fact that the later predicate condition overrides the previous one. >> >> ### Testing >> qemu-system 9.1.50 with UseRVV: >> >> - [x] Run test/jdk/jdk/incubator/vector (fastdebug) > > Gui Cao has updated the pull request incrementally with one additional commit since the last revision: > > Fix for RealYang comment Thanks all for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24910#issuecomment-2837174911 From gcao at openjdk.org Tue Apr 29 01:27:01 2025 From: gcao at openjdk.org (Gui Cao) Date: Tue, 29 Apr 2025 01:27:01 GMT Subject: Integrated: 8355668: RISC-V: jdk/incubator/vector/Int256VectorTests.java fails when using RVV In-Reply-To: <46jT8ol1f1Vd2Yfqq__tfAqT85RWMtLkFxxqVucgKQA=.392bade5-0d18-4a76-921d-b8e8463a55e8@github.com> References: <46jT8ol1f1Vd2Yfqq__tfAqT85RWMtLkFxxqVucgKQA=.392bade5-0d18-4a76-921d-b8e8463a55e8@github.com> Message-ID: On Mon, 28 Apr 2025 03:20:48 GMT, Gui Cao wrote: > Hi, After https://github.com/openjdk/jdk/pull/24129, some tests run failed. 
due to the fact that the later predicate condition overrides the previous one. > > ### Testing > qemu-system 9.1.50 with UseRVV: > > - [x] Run test/jdk/jdk/incubator/vector (fastdebug) This pull request has now been integrated. Changeset: ea3cf1b8 Author: Gui Cao Committer: Fei Yang URL: https://git.openjdk.org/jdk/commit/ea3cf1b882c89bfe96af3aa389b69b842d72159c Stats: 24 lines in 1 file changed: 0 ins; 4 del; 20 mod 8355668: RISC-V: jdk/incubator/vector/Int256VectorTests.java fails when using RVV Reviewed-by: fyang ------------- PR: https://git.openjdk.org/jdk/pull/24910 From duke at openjdk.org Tue Apr 29 01:35:49 2025 From: duke at openjdk.org (erifan) Date: Tue, 29 Apr 2025 01:35:49 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v3] In-Reply-To: References: Message-ID: On Mon, 28 Apr 2025 14:10:40 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java line 237: >> >>> 235: public static void testCompareEQMaskNotByte() { >>> 236: testCompareMaskNotByte(VectorOperators.EQ); >>> 237: } >> >> Another comment: you now only have "negative" tests, where you check for count `=0`. It would be good if you also had a positive rule here, one where you do see an XOR in a similar case, where your optimization does apply. >> >> This would basically be a "control" that checks that your are testing the right thing. > > Also: this test should vectorize on some plarforms, right? A compare, correct? Would it not be good to actively check that with an IR rule? > Another comment: you now only have "negative" tests, where you check for count =0. It would be good if you also had a positive rule here, one where you do see an XOR in a similar case, where your optimization does apply. This would basically be a "control" that checks that your are testing the right thing. This is not a negative test, this is a positive test. What this patch does is: `Vector compare(NE) + not() => vector compare(EQ)`. 
The `not()` operation is removed. For details, please see the commit message and related code. So here we check XorV and XorVMask == 0, which I think is reasonable. For negative tests, I think it's not necessary, because I only test what I optimized, and I won't write a test to say what optimization cannot be done. > Also: this test should vectorize on some plarforms, right? A compare, correct? Would it not be good to actively check that with an IR rule? The `compare` operation is not eliminated, the patch aims to eliminate the `not` operation following `compare`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2065195738 From gcao at openjdk.org Tue Apr 29 01:37:11 2025 From: gcao at openjdk.org (Gui Cao) Date: Tue, 29 Apr 2025 01:37:11 GMT Subject: RFR: 8355654: RISC-V: Relax register constraint for some vector-scalar instructions [v5] In-Reply-To: <-8ukYnvnLSOYQpSFEobuIWZtjHxutLqs4EJ7m2u2zQQ=.27d00e4e-c999-429e-84d9-fd87708d90cc@github.com> References: <-8ukYnvnLSOYQpSFEobuIWZtjHxutLqs4EJ7m2u2zQQ=.27d00e4e-c999-429e-84d9-fd87708d90cc@github.com> Message-ID: > Hi, here we separate src from dst_src to give the compiler a greater degree of flexibility. Note that in the masked version we can't modify it, because we can't change inactive elements in the mask version. > > ### Testing > qemu-system 9.1.50 with UseRVV: > > - [x] Run test/jdk/jdk/incubator/vector (fastdebug) Gui Cao has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains five commits: - Merge branch 'master' into JDK-8355654 - Polishing - Code format - Merge branch 'master' into JDK-8355654 - 8355654: RISC-V: Relax register constraint for some vector-scalar instructions ------------- Changes: https://git.openjdk.org/jdk/pull/24905/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24905&range=04 Stats: 115 lines in 1 file changed: 0 ins; 0 del; 115 mod Patch: https://git.openjdk.org/jdk/pull/24905.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24905/head:pull/24905 PR: https://git.openjdk.org/jdk/pull/24905 From dlong at openjdk.org Tue Apr 29 01:38:00 2025 From: dlong at openjdk.org (Dean Long) Date: Tue, 29 Apr 2025 01:38:00 GMT Subject: RFR: 8343789: Move mutable nmethod data out of CodeCache [v14] In-Reply-To: References: <9mDuowjpORyWudVnSB1FWCW_o1pBgMnAvJus6YGkXLs=.67ba4652-2470-448d-baa2-464e824b2fcb@github.com> Message-ID: On Fri, 25 Apr 2025 12:03:41 GMT, Doug Simon wrote: >> Boris Ulasevich has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: >> >> - swap matadata and jvmci data in outputs according to data layout >> - cleanup >> - returning oops back to nmethods. jtreg: Ok, performance: Ok. todo: cleanup >> - Address review comments: cleanup, move fields to avoid padding, fix CodeBlob purge to call os::free, fix nmethod::print, update Layout description >> - add a separate adrp_movk function to to support targets located more than 4GB away >> - Force the use of movk in combination with adrp and ldr instructions to address scenarios >> where os::malloc allocates buffers beyond the typical ?4GB range accessible with adrp >> - Fixing TestFindInstMemRecursion test fail with XX:+StressReflectiveCode option: >> _relocation_size can exceed 64Kb, in this case _metadata_offset do not fit into int16. 
>> Fix: use _oops_size int16 field to calculate metadata offset >> - removing dead code >> - a bit of cleanup and addressing review suggestions >> - rework movoop for not_supports_instruction_patching case: correcting in ldr_constant and relocations fixup >> - ... and 5 more: https://git.openjdk.org/jdk/compare/cfab88b1...bc8c590c > > src/hotspot/share/code/nmethod.cpp line 1505: > >> 1503: CHECKED_CAST(_oops_size, uint16_t, align_up(code_buffer->total_oop_size(), oopSize)); >> 1504: uint16_t metadata_size = (uint16_t)align_up(code_buffer->total_metadata_size(), wordSize); >> 1505: JVMCI_ONLY(CHECKED_CAST(_jvmci_data_size, uint16_t, align_up(compiler->is_jvmci() ? jvmci_data->size() : 0, oopSize))); > > This cast is lossy in that `jvmci_data->size()` returns an int. It caused a `double free or corruption (out)` crash in Graal in the case where a `JVMCINMethodData` had a very long name. We've fixed this by [limiting the length of the name](https://github.com/openjdk/jdk/pull/24753) but I'm wondering if there was some special reason for this cast? If so, can you please add extra logic preventing this code from running off the end of allocated memory: > > #if INCLUDE_JVMCI > if (compiler->is_jvmci()) { > // Initialize the JVMCINMethodData object inlined into nm > jvmci_nmethod_data()->copy(jvmci_data); > } > #endif > > If not, please remove the cast. The cast was added by 8331087, which reduced the supported JVMCI data size to uint16_t. I don't remember this issue with long names coming up during that review, so I guess we all missed it. @dougxc please file a bug so we can track this. It seems like JVMCINMethodData::copy should do something like truncate long names rather than blindly assuming it has enough space. If uint16_t is unreasonably small for JVMCI nmethod data we could revert that change and make it 32 bits again. 
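To illustrate the truncation described above, here is a minimal standalone sketch. It is not HotSpot code: `lossy_narrow`, `fits_in_uint16`, and `clamp_to_uint16` are hypothetical helper names chosen for this example, and the clamping variant only mirrors the "truncate rather than blindly assume enough space" idea in general terms.

```cpp
#include <cstdint>
#include <limits>

// Plain narrowing cast: an int that does not fit is reduced modulo 2^16,
// so the stored size can be much smaller than the real size.
inline uint16_t lossy_narrow(int size) {
  return static_cast<uint16_t>(size);
}

// Hypothetical range check: true only if the value survives the narrowing unchanged.
inline bool fits_in_uint16(int size) {
  return size >= 0 && size <= std::numeric_limits<uint16_t>::max();
}

// Hypothetical clamp: cap the size at the largest representable value
// instead of wrapping around.
inline uint16_t clamp_to_uint16(int size) {
  const int max = std::numeric_limits<uint16_t>::max();
  if (size < 0) return 0;
  if (size > max) return static_cast<uint16_t>(max);
  return static_cast<uint16_t>(size);
}
```

With these helpers, a size of 65560 narrows to 24 under the plain cast, while the range check reports that it does not fit and the clamp yields 65535. A copy routine that trusts the narrowed value would then write past a buffer sized from it, which is the kind of overrun the review comment warns about.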
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21276#discussion_r2065196967 From fyang at openjdk.org Tue Apr 29 01:51:58 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 29 Apr 2025 01:51:58 GMT Subject: RFR: 8355654: RISC-V: Relax register constraint for some vector-scalar instructions [v5] In-Reply-To: References: <-8ukYnvnLSOYQpSFEobuIWZtjHxutLqs4EJ7m2u2zQQ=.27d00e4e-c999-429e-84d9-fd87708d90cc@github.com> Message-ID: <3PlqTC6UatDpHIoC7y-v22ZR1AxreIYGgpglzwhAFSI=.ebcb5aa1-1142-44aa-9d83-7d44f76616ba@github.com> On Tue, 29 Apr 2025 01:37:11 GMT, Gui Cao wrote: >> Hi, here we separate src from dst_src to give the compiler a greater degree of flexibility. Note that in the masked version we can't modify it, because we can't change inactive elements in the mask version. >> >> ### Testing >> qemu-system 9.1.50 with UseRVV: >> >> - [x] Run test/jdk/jdk/incubator/vector (fastdebug) > > Gui Cao has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - Merge branch 'master' into JDK-8355654 > - Polishing > - Code format > - Merge branch 'master' into JDK-8355654 > - 8355654: RISC-V: Relax register constraint for some vector-scalar instructions Marked as reviewed by fyang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24905#pullrequestreview-2801586436 From duke at openjdk.org Tue Apr 29 01:55:46 2025 From: duke at openjdk.org (erifan) Date: Tue, 29 Apr 2025 01:55:46 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v2] In-Reply-To: References: Message-ID: On Mon, 28 Apr 2025 14:12:02 GMT, Emanuel Peter wrote: > I suppose in that case you can assert that you NEVER get those nodes, because if you have vectors not supported, they will not show up because of that, and if you do support vectors, they should be optimized away. This is expected. - If vectors are supported, the test checks that Vector not() is optimized away. 
- If vectors are not supported, the vector IR nodes won't be generated, so the IR check will pass. The correctness verification also runs, which ensures that the patch does not break correctness. If vectors are supported in the future, the test will also run without any modification. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2065211944 From gcao at openjdk.org Tue Apr 29 02:11:50 2025 From: gcao at openjdk.org (Gui Cao) Date: Tue, 29 Apr 2025 02:11:50 GMT Subject: RFR: 8355654: RISC-V: Relax register constraint for some vector-scalar instructions [v5] In-Reply-To: References: <-8ukYnvnLSOYQpSFEobuIWZtjHxutLqs4EJ7m2u2zQQ=.27d00e4e-c999-429e-84d9-fd87708d90cc@github.com> Message-ID: On Tue, 29 Apr 2025 01:37:11 GMT, Gui Cao wrote: >> Hi, here we separate src from dst_src to give the compiler a greater degree of flexibility. Note that in the masked version we can't modify it, because we can't change inactive elements in the mask version. >> >> ### Testing >> qemu-system 9.1.50 with UseRVV: >> >> - [x] Run test/jdk/jdk/incubator/vector (fastdebug) > > Gui Cao has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - Merge branch 'master' into JDK-8355654 > - Polishing > - Code format > - Merge branch 'master' into JDK-8355654 > - 8355654: RISC-V: Relax register constraint for some vector-scalar instructions Thanks all for the review.
------------- PR Comment: https://git.openjdk.org/jdk/pull/24905#issuecomment-2837239261 From duke at openjdk.org Tue Apr 29 02:11:50 2025 From: duke at openjdk.org (duke) Date: Tue, 29 Apr 2025 02:11:50 GMT Subject: RFR: 8355654: RISC-V: Relax register constraint for some vector-scalar instructions [v5] In-Reply-To: References: <-8ukYnvnLSOYQpSFEobuIWZtjHxutLqs4EJ7m2u2zQQ=.27d00e4e-c999-429e-84d9-fd87708d90cc@github.com> Message-ID: On Tue, 29 Apr 2025 01:37:11 GMT, Gui Cao wrote: >> Hi, here we separate src from dst_src to give the compiler a greater degree of flexibility. Note that in the masked version we can't modify it, because we can't change inactive elements in the mask version. >> >> ### Testing >> qemu-system 9.1.50 with UseRVV: >> >> - [x] Run test/jdk/jdk/incubator/vector (fastdebug) > > Gui Cao has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - Merge branch 'master' into JDK-8355654 > - Polishing > - Code format > - Merge branch 'master' into JDK-8355654 > - 8355654: RISC-V: Relax register constraint for some vector-scalar instructions @zifeihan Your change (at version fedb60efb5e9c228cab1dfdf07c4410076033396) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24905#issuecomment-2837241323 From gcao at openjdk.org Tue Apr 29 02:14:50 2025 From: gcao at openjdk.org (Gui Cao) Date: Tue, 29 Apr 2025 02:14:50 GMT Subject: Integrated: 8355654: RISC-V: Relax register constraint for some vector-scalar instructions In-Reply-To: <-8ukYnvnLSOYQpSFEobuIWZtjHxutLqs4EJ7m2u2zQQ=.27d00e4e-c999-429e-84d9-fd87708d90cc@github.com> References: <-8ukYnvnLSOYQpSFEobuIWZtjHxutLqs4EJ7m2u2zQQ=.27d00e4e-c999-429e-84d9-fd87708d90cc@github.com> Message-ID: On Sun, 27 Apr 2025 14:28:53 GMT, Gui Cao wrote: > Hi, here we separate src from dst_src to give the compiler a greater degree of flexibility. 
Note that in the masked version we can't modify it, because we can't change inactive elements in the mask version. > > ### Testing > qemu-system 9.1.50 with UseRVV: > > - [x] Run test/jdk/jdk/incubator/vector (fastdebug) This pull request has now been integrated. Changeset: 7bde2bb5 Author: Gui Cao Committer: Fei Yang URL: https://git.openjdk.org/jdk/commit/7bde2bb57159aaac36a6a585f70c4672919c8c16 Stats: 115 lines in 1 file changed: 0 ins; 0 del; 115 mod 8355654: RISC-V: Relax register constraint for some vector-scalar instructions Reviewed-by: fyang, dzhang ------------- PR: https://git.openjdk.org/jdk/pull/24905 From duke at openjdk.org Tue Apr 29 02:45:46 2025 From: duke at openjdk.org (erifan) Date: Tue, 29 Apr 2025 02:45:46 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v3] In-Reply-To: <41Y44tSReVYSBYslzzcs-GGyQ6SPM2CfTkJ3-MpGBYM=.cbb47073-5d30-4391-9d67-b2fa758f4dca@github.com> References: <41Y44tSReVYSBYslzzcs-GGyQ6SPM2CfTkJ3-MpGBYM=.cbb47073-5d30-4391-9d67-b2fa758f4dca@github.com> Message-ID: <5OOJ1Arw03ZOr1T6BhfQSTNL2SShAuJGxhXep1HweGU=.f9a59f0d-f9d0-4e91-b37b-9460089accbf@github.com> On Mon, 28 Apr 2025 14:13:43 GMT, Emanuel Peter wrote: > > > > > Just a drive-by comment for now, I may review this later more fully. > > > > > > I would also prefer if you added the IR restrictions rather than the JTREG requires. > > > > > > The benefit is that we can still run the tests on all platforms, at least for result verification. > > > > > > Imagine someone adds optimizations to a new platform, but does not know about this test here. They make a mistake, and there is a bug, leading either to a crash or wrong result. With the requires, you test would never even run, and we would not catch it. With the IR applyIf, we would catch the bug. > > > > > > > > > > > > > > > Just copy pasting the IR applyIf everywhere is not that much work, and adding in a new platform later is not really hard either. > > > > > > > > > > > > Thanks! 
The problem is that when a new platform is added, people may not even know there is a test. > > > > > > > > > @erifan That is true. But we have that problem either way. If you use `@require`, then the person does not realize there is a test AND the test is not run. If you use `applyIf`, the person does not realize there is a test, but it is run at least for result verification - and then the person MIGHT realize if the test catches a wrong result / crash. > > > > > > This test will run on new platforms when we use @requires. I explained the meaning of the @requires in the previous comment, it only excludes one case: when -XX:UseAVX=0 is specified on x86 platforms. > > I see. You should probably add a comment there, to say that you are only excluding `AVX=0`. But even `UseAVX = 0` would profit from result verification. @requires is a special comment itself. I feel like it's a bit weird to add a comment to a comment, and I don't think the @requires is hard to understand. If we want to verify the correctness of AVX=0, we have to use ApplyIf. This is back to the beginning of the question, should we use @requires or ApplyIf? Personally I tend to use the former. By the way, I have tested the correctness of AVX=0 locally. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-2837292655 From duke at openjdk.org Tue Apr 29 02:53:32 2025 From: duke at openjdk.org (Anjian-Wen) Date: Tue, 29 Apr 2025 02:53:32 GMT Subject: RFR: 8355796: RISC-V: compiler/vectorapi/AllBitsSetVectorMatchRuleTest.java fails after JDK-8355657 Message-ID: <3UYqsbAUEohLo8ubL7wccwEV5UrcX4wqYbC6e6gMOpE=.075bd026-616a-4793-8fcb-ea4f70cf237c@github.com> One or more IR match rules failed in AllBitsSetVectorMatchRuleTest. For the IR testing requirements, the matching rules are split by type. After fix, the test on fastdebug can passed!
------------- Commit messages: - RISC-V: compiler/vectorapi/AllBitsSetVectorMatchRuleTest.java fails after JDK-8355657 Changes: https://git.openjdk.org/jdk/pull/24918/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24918&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8355796 Stats: 135 lines in 1 file changed: 104 ins; 12 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/24918.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24918/head:pull/24918 PR: https://git.openjdk.org/jdk/pull/24918 From fyang at openjdk.org Tue Apr 29 02:58:44 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 29 Apr 2025 02:58:44 GMT Subject: RFR: 8355796: RISC-V: compiler/vectorapi/AllBitsSetVectorMatchRuleTest.java fails after JDK-8355657 In-Reply-To: <3UYqsbAUEohLo8ubL7wccwEV5UrcX4wqYbC6e6gMOpE=.075bd026-616a-4793-8fcb-ea4f70cf237c@github.com> References: <3UYqsbAUEohLo8ubL7wccwEV5UrcX4wqYbC6e6gMOpE=.075bd026-616a-4793-8fcb-ea4f70cf237c@github.com> Message-ID: On Mon, 28 Apr 2025 11:17:19 GMT, Anjian-Wen wrote: > One or more IR match rules failed in AllBitsSetVectorMatchRuleTest. For the IR testing requirements, the matching rules are split by type. After fix, the test on fastdebug can passed! src/hotspot/cpu/riscv/riscv_v.ad line 1289: > 1287: predicate(UseZvbb && Matcher::vector_element_basic_type(n) == T_BYTE); > 1288: match(Set dst_src1 (AndV (Binary dst_src1 (Replicate (XorI src2 m1))) v0)); > 1289: format %{ "vand_not_vx_masked $dst_src1, $dst_src1, $src2, $v0" %} You might want to update `vand_not_vx_masked` to `vand_notB_vx_masked`. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24918#discussion_r2065317393 From duke at openjdk.org Tue Apr 29 03:06:35 2025 From: duke at openjdk.org (Anjian-Wen) Date: Tue, 29 Apr 2025 03:06:35 GMT Subject: RFR: 8355796: RISC-V: compiler/vectorapi/AllBitsSetVectorMatchRuleTest.java fails after JDK-8355657 [v2] In-Reply-To: <3UYqsbAUEohLo8ubL7wccwEV5UrcX4wqYbC6e6gMOpE=.075bd026-616a-4793-8fcb-ea4f70cf237c@github.com> References: <3UYqsbAUEohLo8ubL7wccwEV5UrcX4wqYbC6e6gMOpE=.075bd026-616a-4793-8fcb-ea4f70cf237c@github.com> Message-ID: > One or more IR match rules failed in AllBitsSetVectorMatchRuleTest. For the IR testing requirements, the matching rules are split by type. After fix, the test on fastdebug can passed! Anjian-Wen has updated the pull request incrementally with one additional commit since the last revision: fix format ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24918/files - new: https://git.openjdk.org/jdk/pull/24918/files/b9ee6002..66e24b67 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24918&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24918&range=00-01 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/24918.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24918/head:pull/24918 PR: https://git.openjdk.org/jdk/pull/24918 From duke at openjdk.org Tue Apr 29 03:06:36 2025 From: duke at openjdk.org (Anjian-Wen) Date: Tue, 29 Apr 2025 03:06:36 GMT Subject: RFR: 8355796: RISC-V: compiler/vectorapi/AllBitsSetVectorMatchRuleTest.java fails after JDK-8355657 [v2] In-Reply-To: References: <3UYqsbAUEohLo8ubL7wccwEV5UrcX4wqYbC6e6gMOpE=.075bd026-616a-4793-8fcb-ea4f70cf237c@github.com> Message-ID: On Tue, 29 Apr 2025 02:55:27 GMT, Fei Yang wrote: >> Anjian-Wen has updated the pull request incrementally with one additional commit since the last revision: >> >> fix format > > src/hotspot/cpu/riscv/riscv_v.ad line 1289: > >> 1287: predicate(UseZvbb 
&& Matcher::vector_element_basic_type(n) == T_BYTE); >> 1288: match(Set dst_src1 (AndV (Binary dst_src1 (Replicate (XorI src2 m1))) v0)); >> 1289: format %{ "vand_not_vx_masked $dst_src1, $dst_src1, $src2, $v0" %} > > You might want to update `vand_not_vx_masked` to `vand_notB_vx_masked`. thanks for pointing out. fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24918#discussion_r2065324938 From fyang at openjdk.org Tue Apr 29 03:48:48 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 29 Apr 2025 03:48:48 GMT Subject: RFR: 8355796: RISC-V: compiler/vectorapi/AllBitsSetVectorMatchRuleTest.java fails after JDK-8355657 [v2] In-Reply-To: References: <3UYqsbAUEohLo8ubL7wccwEV5UrcX4wqYbC6e6gMOpE=.075bd026-616a-4793-8fcb-ea4f70cf237c@github.com> Message-ID: On Tue, 29 Apr 2025 03:06:35 GMT, Anjian-Wen wrote: >> One or more IR match rules failed in AllBitsSetVectorMatchRuleTest. For the IR testing requirements, the matching rules are split by type. After fix, the test on fastdebug can passed! > > Anjian-Wen has updated the pull request incrementally with one additional commit since the last revision: > > fix format Thanks! ------------- Marked as reviewed by fyang (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24918#pullrequestreview-2801845822 From amitkumar at openjdk.org Tue Apr 29 03:50:48 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 29 Apr 2025 03:50:48 GMT Subject: RFR: 8353500: [s390x] Intrinsify Unsafe::setMemory [v2] In-Reply-To: <3IOfTTpkYSmmQcYRagn4f5uLP4wU8ArZRsGmpHdnV2I=.d9c57e7b-5275-435a-abdb-2a7a28734fec@github.com> References: <3IOfTTpkYSmmQcYRagn4f5uLP4wU8ArZRsGmpHdnV2I=.d9c57e7b-5275-435a-abdb-2a7a28734fec@github.com> Message-ID: On Wed, 9 Apr 2025 21:28:26 GMT, Martin Doerr wrote: >> Amit Kumar has updated the pull request incrementally with four additional commits since the last revision: >> >> - reviews for Martin >> - Revert "minor improvement" >> >> This reverts commit a6af6da26d1e0590dc24486131d1bc752e047f98. >> - minor improvement >> - reviews from Lutz and Martin > > This looks good to me. I suggest measuring performance with the latest version. Hi @TheRealMDoerr, can I get a second approval? We are trying to see if we can squeeze out more optimization, and if I see good results we can raise another patch. But I think this already adds a good performance boost, so it should be fine to integrate. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24480#issuecomment-2837369707 From galder at openjdk.org Tue Apr 29 05:04:46 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Tue, 29 Apr 2025 05:04:46 GMT Subject: RFR: 8351568: Improve source code documentation for PhaseCFG::insert_anti_dependences In-Reply-To: References: Message-ID: On Mon, 28 Apr 2025 15:28:52 GMT, Daniel Lundén wrote: > The current documentation for `PhaseCFG::insert_anti_dependences` is difficult to follow and sometimes even misleading. We should ensure the method is appropriately documented. > > ### Changeset > > - Rename `PhaseCFG::insert_anti_dependences` to `PhaseCFG::raise_above_anti_dependences`.
The purpose of `PhaseCFG::raise_above_anti_dependences` is twofold: raise the load's LCA so that the load is scheduled before anti-dependent stores, and if necessary add anti-dependence edges between the load and certain anti-dependent stores (to ensure we later "raise" the load before anti-dependent stores in LCM). The name `PhaseCFG::insert_anti_dependences` suggests that we only add anti-dependence edges. The name `PhaseCFG::raise_above_anti_dependences`, therefore, seems more appropriate. > - Significantly add to and revise the source code documentation of `PhaseCFG::raise_above_anti_dependences`. > - Add, move, and revise `assert`s in `PhaseCFG::raise_above_anti_dependences`, including improved `assert` messages in a few places. > - In the main worklist loop of `PhaseCFG::raise_above_anti_dependences`: > - Clean up how we identify the search root (avoid mutation). > - Add a missing early exit for `Phi` nodes when `LCA == early`. > > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/14706896111) > - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. Thanks @dlunde for the PR. Some small comments. src/hotspot/share/opto/gcm.cpp line 889: > 887: // since the load will be forced into a block preceding the Phi. > 888: pred_block->set_raise_LCA_mark(load_index); > 889: assert(!LCA_orig->dominates(pred_block) || Has this assert moved elsewhere? Or do we really want to remove it altogether? src/hotspot/share/opto/gcm.cpp line 912: > 910: // they CAN write to Java memory. > 911: if (muse->ideal_Opcode() == Op_CallStaticJava) { > 912: assert(muse->is_MachSafePoint(), ""); I know there was not assert message before, but can we use the opportunity to add a meaningful message for this assert? There's another empty message assert a few lines before. 
test/hotspot/jtreg/compiler/loopopts/TestSplitIfPinnedLoadInStripMinedLoop.java line 141: > 139: > 140: // Same as test2 but with reference to inner loop induction variable 'j' and different order of instructions. > 141: // Triggers an assert in PhaseCFG::raise_above_anti_dependences if loop strip mining verification is disabled: Is this test still valid? According to the comment it should trigger an assert but this assert appears to be removed? Is the test correct if the test is passing even though the assert has been removed? See my above comment on the removal of this assertion. ------------- Changes requested by galder (Author). PR Review: https://git.openjdk.org/jdk/pull/24926#pullrequestreview-2801983806 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2065453924 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2065450925 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2065456016 From kvn at openjdk.org Tue Apr 29 06:27:41 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 29 Apr 2025 06:27:41 GMT Subject: RFR: 8350209: Preserve adapters in AOT cache [v10] In-Reply-To: References: Message-ID: > [JEP 483](https://bugs.openjdk.org/browse/JDK-8315737) preserves class information in AOT cache which helps Java startup performance. > > We should also preserve adapters (i2c, c2i) to further improve performance of class linking where adapters are generated. > > Short running Java application can see several percents improvement. 
I got 6% improvement when ran `HelloWorld.java` on Linux-x64 Ice Lake CPU (2.5Ghz): > > > (perf stat -r 100 java -XX:AOTCache=app.aotcache -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed > 0.0299401 +- 0.0000504 seconds time elapsed ( +- 0.17% ) > > (perf stat -r 100 java -XX:AOTCache=app.aotcache -XX:+UnlockDiagnosticVMOptions -XX:-AOTAdapterCaching -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed > 0.0318654 +- 0.0000535 seconds time elapsed ( +- 0.17% ) > > > New diagnostic flags are introduced (use `-XX:+UnlockDiagnosticVMOptions` to unlock them): > > > -XX:+AOTAdapterCaching - Enable or disable saving and restoring i2c2i adapters > -XX:AOTCodeMaxSize=10*M - buffer size in bytes for AOT code caching > -XX:+AbortVMOnAOTCodeFailure - Abort VM on the first occurrence of AOT code caching failure > > By default `AOTAdapterCaching` is `false` and enabled ergonomically when `-XX:AOTCache` is specified. > This flag is ignored when `AOTCache` is not specified. > > To use AOT adapters follow process described in JEP 483: > > > java -XX:AOTMode=record -XX:AOTConfiguration=app.aotconf -cp app.jar App > java -XX:AOTMode=create -XX:AOTConfiguration=app.aotconf -XX:AOTCache=app.aot -cp app.jar > java -XX:AOTCache=app.aot -cp app.jar App > > > There are several new UL flag combinations to trace the AOT code caching process: > > > -Xlog:aot+codecache+init -Xlog:aot+codecache+exit -Xlog:aot+codecache+stubs > > > @ashu-mehra is main author of changes. He implemented adapters caching. > I did main framework (`AOTCodeCache` class) for saving and loading AOT code. > > Tested tier1-6,10, which includes tests with `AOTClassLinking` enabled. Also Xcomp,stress and JCK. Vladimir Kozlov has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: - Fix C strings caching - Merge branch 'master' into JDK-8350209 - Merge branch 'master' into JDK-8350209 - Downgraded UL as asked. 
Added synchronization to C strings caching. - Fix message - Generate far jumps for AOT code on AArch64 - remove _enabled suffix - Add sanity test for AOTAdapterCaching flag - AOT code flags are ignored when AOTCache is not specified. Set range for AOTCodeMaxSize values. - Removed unused AOTCodeSection class - ... and 1 more: https://git.openjdk.org/jdk/compare/7cf190fb...1b0c89f6 ------------- Changes: https://git.openjdk.org/jdk/pull/24740/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24740&range=09 Stats: 3314 lines in 51 files changed: 2845 ins; 200 del; 269 mod Patch: https://git.openjdk.org/jdk/pull/24740.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24740/head:pull/24740 PR: https://git.openjdk.org/jdk/pull/24740 From kvn at openjdk.org Tue Apr 29 06:27:42 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 29 Apr 2025 06:27:42 GMT Subject: RFR: 8350209: Preserve adapters in AOT cache [v9] In-Reply-To: References: Message-ID: On Mon, 28 Apr 2025 20:10:19 GMT, Magnus Ihse Bursie wrote: >> Vladimir Kozlov has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains nine commits: >> >> - Merge branch 'master' into JDK-8350209 >> - Downgraded UL as asked. Added synchronization to C strings caching. >> - Fix message >> - Generate far jumps for AOT code on AArch64 >> - remove _enabled suffix >> - Add sanity test for AOTAdapterCaching flag >> - AOT code flags are ignored when AOTCache is not specified. Set range for AOTCodeMaxSize values. >> - Removed unused AOTCodeSection class >> - 8350209: Preserve adapters in AOT cache > > Build change is trivially fine. Thank you, @magicus, for review. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24740#issuecomment-2837612510 From kvn at openjdk.org Tue Apr 29 06:27:43 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 29 Apr 2025 06:27:43 GMT Subject: RFR: 8350209: Preserve adapters in AOT cache [v9] In-Reply-To: References: Message-ID: On Sun, 27 Apr 2025 22:22:02 GMT, Vladimir Kozlov wrote: >> [JEP 483](https://bugs.openjdk.org/browse/JDK-8315737) preserves class information in AOT cache which helps Java startup performance. >> >> We should also preserve adapters (i2c, c2i) to further improve performance of class linking where adapters are generated. >> >> Short running Java application can see several percents improvement. I got 6% improvement when ran `HelloWorld.java` on Linux-x64 Ice Lake CPU (2.5Ghz): >> >> >> (perf stat -r 100 java -XX:AOTCache=app.aotcache -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed >> 0.0299401 +- 0.0000504 seconds time elapsed ( +- 0.17% ) >> >> (perf stat -r 100 java -XX:AOTCache=app.aotcache -XX:+UnlockDiagnosticVMOptions -XX:-AOTAdapterCaching -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed >> 0.0318654 +- 0.0000535 seconds time elapsed ( +- 0.17% ) >> >> >> New diagnostic flags are introduced (use `-XX:+UnlockDiagnosticVMOptions` to unlock them): >> >> >> -XX:+AOTAdapterCaching - Enable or disable saving and restoring i2c2i adapters >> -XX:AOTCodeMaxSize=10*M - buffer size in bytes for AOT code caching >> -XX:+AbortVMOnAOTCodeFailure - Abort VM on the first occurrence of AOT code caching failure >> >> By default `AOTAdapterCaching` is `false` and enabled ergonomically when `-XX:AOTCache` is specified. >> This flag is ignored when `AOTCache` is not specified. 
>> >> To use AOT adapters follow process described in JEP 483: >> >> >> java -XX:AOTMode=record -XX:AOTConfiguration=app.aotconf -cp app.jar App >> java -XX:AOTMode=create -XX:AOTConfiguration=app.aotconf -XX:AOTCache=app.aot -cp app.jar >> java -XX:AOTCache=app.aot -cp app.jar App >> >> >> There are several new UL flag combinations to trace the AOT code caching process: >> >> >> -Xlog:aot+codecache+init -Xlog:aot+codecache+exit -Xlog:aot+codecache+stubs >> >> >> @ashu-mehra is main author of changes. He implemented adapters caching. >> I did main framework (`AOTCodeCache` class) for saving and loading AOT code. >> >> Tested tier1-6,10, which includes tests with `AOTClassLinking` enabled. Also Xcomp,stress and JCK. > > Vladimir Kozlov has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains nine commits: > > - Merge branch 'master' into JDK-8350209 > - Downgraded UL as asked. Added synchronization to C strings caching. > - Fix message > - Generate far jumps for AOT code on AArch64 > - remove _enabled suffix > - Add sanity test for AOTAdapterCaching flag > - AOT code flags are ignored when AOTCache is not specified. Set range for AOTCodeMaxSize values. > - Removed unused AOTCodeSection class > - 8350209: Preserve adapters in AOT cache Did new merge from mainline and fixed C strings caching issue found in `leyden/premain` branch. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24740#issuecomment-2837614987 From gcao at openjdk.org Tue Apr 29 07:34:47 2025 From: gcao at openjdk.org (Gui Cao) Date: Tue, 29 Apr 2025 07:34:47 GMT Subject: RFR: 8355796: RISC-V: compiler/vectorapi/AllBitsSetVectorMatchRuleTest.java fails after JDK-8355657 [v2] In-Reply-To: References: <3UYqsbAUEohLo8ubL7wccwEV5UrcX4wqYbC6e6gMOpE=.075bd026-616a-4793-8fcb-ea4f70cf237c@github.com> Message-ID: On Tue, 29 Apr 2025 03:06:35 GMT, Anjian-Wen wrote: >> One or more IR match rules failed in AllBitsSetVectorMatchRuleTest. 
For the IR testing requirements, the matching rules are split by type. After fix, the test on fastdebug can passed! > > Anjian-Wen has updated the pull request incrementally with one additional commit since the last revision: > > fix format Looks good to me. ------------- Marked as reviewed by gcao (Author). PR Review: https://git.openjdk.org/jdk/pull/24918#pullrequestreview-2802358184 From mchevalier at openjdk.org Tue Apr 29 07:46:55 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Tue, 29 Apr 2025 07:46:55 GMT Subject: Integrated: 8338194: ubsan: mulnode.cpp:862:59: runtime error: shift exponent 64 is too large for 64-bit type 'long unsigned int' In-Reply-To: References: Message-ID: On Thu, 24 Apr 2025 06:55:54 GMT, Marc Chevalier wrote: > We have a UB when the shift is equal to or bigger than the number of bits in > the type. Our expression is > > (julong)CONST64(1) << (julong)(BitsPerJavaLong - shift) > > so we have a UB when the RHS is `>= 64`, that is when `shift` is `<= 0`. Since shift is masked to > be in `[0, BitsPerJavaLong - 1]`, we actually have a UB when `shift == 0`. The > code doesn't forbid it, indeed, and it doesn't seem to be enforced by more global > invariants. > > This UB doesn't reproduce anymore with the provided cases. > I've replaced the UB with an explicit assert to try to find another failing > case. No hit when run with tier1, tier2, tier3, hs-precheckin-comp and hs-comp-stress. > > Nevertheless, the assert indeed hit on master as of when the issue was filed. > More precisely, I've bisected for the two tests `java/foreign/StdLibTest.java` > and `java/lang/invoke/PermuteArgsTest.java` and the assert hits until > [8339196: Optimize BufWriterImpl#writeU1/U2/Int/Long](https://bugs.openjdk.org/browse/JDK-8339196). > > It is not clear to me why the issue stopped reproducing after this commit, but given > the lack of reproducer, I went with a semi-blind fix: it fixes the issue back then, > and still removes a chance of UB.
It simply makes sure the RHS of this shift cannot be > 64 by making sure `shift` cannot be 0. > > If `shift` is indeed 0, since it is the RHS of a `RShiftLNode`, `RShiftLNode::Identity` > should simply return the LHS of the shift, and entirely eliminate the RShiftLNode. > > The implementation of `AndINode::Ideal` is, on the other hand, safe. Indeed, it uses > `right_n_bits(BitsPerJavaInteger - shift)` instead of doing manually > `(1 << (BitsPerJavaInteger - shift)) - 1`. This macro is safe as it would return `-1` if > the shift is too large rather than causing a UB. Yet, I didn't use this way since it would > cause the replacement of `AndI(X, RShiftI(Y, 0))` by `AndI(X, URShiftI(Y, 0))` before > simplifying the `URShiftI` into `Y`. In between, it also implies that all users of the > And node will be enqueued for IGVN for a not-very-interesting change. Simply skipping > the replacement of RShiftL into URShiftL allows us to arrive directly at `AndL(X, Y)` without > useless steps. > > Thanks, > Marc This pull request has now been integrated. Changeset: 108078a6 Author: Marc Chevalier Committer: Damon Fenacci URL: https://git.openjdk.org/jdk/commit/108078a6813f49fa82b6f97a8a6665d200d95e28 Stats: 9 lines in 1 file changed: 2 ins; 0 del; 7 mod 8338194: ubsan: mulnode.cpp:862:59: runtime error: shift exponent 64 is too large for 64-bit type 'long unsigned int' Reviewed-by: dlong, dfenacci ------------- PR: https://git.openjdk.org/jdk/pull/24841 From gcao at openjdk.org Tue Apr 29 07:49:21 2025 From: gcao at openjdk.org (Gui Cao) Date: Tue, 29 Apr 2025 07:49:21 GMT Subject: RFR: 8355878: RISC-V: jdk/incubator/vector/DoubleMaxVectorTests.java fails when using RVV Message-ID: Hi, when I use the qemu-system mode, the jdk/incubator/vector/DoubleMaxVectorTests.java test run fails. As discussed on JBS, the SIGILL instruction is vmv1r.v v1,v3.
I found the cause of the problem on the qemu source code, qemu Add vill check for whole vector register move instructions [1]. The ratified version of RISC-V V spec section 16.6 says that `The instructions operate as if EEW=SEW`. So the whole vector register move instructions depend on the vtype We need to add vsetvli before move vector register. In order not to have a misunderstanding here, maybe Vector Whole Vector Register Move without setting vsetvli, I replaced vmv1r_v with vmv_v_v. [1] https://patchwork.kernel.org/project/qemu-devel/patch/20231129170400.21251-2-max.chou at sifive.com/#25621160 ### Testing qemu-system 9.1.50 with UseRVV: - [x] Run test/jdk/jdk/incubator/vector (fastdebug) ------------- Commit messages: - 8355878: RISC-V: jdk/incubator/vector/DoubleMaxVectorTests.java fails when using RVV Changes: https://git.openjdk.org/jdk/pull/24943/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24943&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8355878 Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24943.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24943/head:pull/24943 PR: https://git.openjdk.org/jdk/pull/24943 From chagedorn at openjdk.org Tue Apr 29 07:53:47 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 29 Apr 2025 07:53:47 GMT Subject: RFR: 8354284: Add more compiler test folders to tier1 runs In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 08:44:04 GMT, Marc Chevalier wrote: > Some folders in jtreg/compiler have been reported not to be run in any tier, while tier1 was probably intended, but the tier definition was mistakenly not updated. I've checked which folders are not referenced into `TEST.groups`. 
> > The unmentioned ones: > - `ccp` > - `ciReplay` > - `ciTypeFlow` > - `compilercontrol` > - `debug` > - `oracle` > - `predicates` > - `print` > - `relocations` > - `sharedstubs` > - `splitif` > - `tiered` > - `whitebox` > > And those, that are not test folders: > - `lib` > - `patches` > - `testlibraries` > > I'm adding `ccp`, `ciTypeFlow`, `predicates`, `sharedstubs` and `splitif` to tier1. > > The other folders seems to have been around for very long (since at least mid-2021). It's not clear how meaningful it'd be to add them/what the intent from them was. I've rather focused on the recently(-ish) added folders, that one forgot to put in a tier when adding it. > > Feel free to tell if other folders should be included (and in which tier). > > Thanks, > Marc Looks good to me and I agree that the proposed tests should always have been in tier1 for the laid out reasons above. > It seems not to increase by more than the existing fluctuations. Thanks for double-checking! > `predicates/TestCloningWithManyDiamondsInExpression.java` with a timeout of 30 A lower than standard timeout is always fine. I originally added this test which was stuck in an endless loop before the fix. I put a lower than standard timeout there because I wanted to make sure it's now finishing quickly. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24817#pullrequestreview-2802403211 From mchevalier at openjdk.org Tue Apr 29 07:56:45 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Tue, 29 Apr 2025 07:56:45 GMT Subject: RFR: 8354284: Add more compiler test folders to tier1 runs In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 08:44:04 GMT, Marc Chevalier wrote: > Some folders in jtreg/compiler have been reported not to be run in any tier, while tier1 was probably intended, but the tier definition was mistakenly not updated. I've checked which folders are not referenced into `TEST.groups`. 
> > The unmentioned ones: > - `ccp` > - `ciReplay` > - `ciTypeFlow` > - `compilercontrol` > - `debug` > - `oracle` > - `predicates` > - `print` > - `relocations` > - `sharedstubs` > - `splitif` > - `tiered` > - `whitebox` > > And those, that are not test folders: > - `lib` > - `patches` > - `testlibraries` > > I'm adding `ccp`, `ciTypeFlow`, `predicates`, `sharedstubs` and `splitif` to tier1. > > The other folders seems to have been around for very long (since at least mid-2021). It's not clear how meaningful it'd be to add them/what the intent from them was. I've rather focused on the recently(-ish) added folders, that one forgot to put in a tier when adding it. > > Feel free to tell if other folders should be included (and in which tier). > > Thanks, > Marc I couldn't find where the standard timeout is specified. How much is it? And maybe, where I could have found it? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24817#issuecomment-2837846115 From chagedorn at openjdk.org Tue Apr 29 08:17:46 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 29 Apr 2025 08:17:46 GMT Subject: RFR: 8354284: Add more compiler test folders to tier1 runs In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 08:44:04 GMT, Marc Chevalier wrote: > Some folders in jtreg/compiler have been reported not to be run in any tier, while tier1 was probably intended, but the tier definition was mistakenly not updated. I've checked which folders are not referenced into `TEST.groups`. > > The unmentioned ones: > - `ccp` > - `ciReplay` > - `ciTypeFlow` > - `compilercontrol` > - `debug` > - `oracle` > - `predicates` > - `print` > - `relocations` > - `sharedstubs` > - `splitif` > - `tiered` > - `whitebox` > > And those, that are not test folders: > - `lib` > - `patches` > - `testlibraries` > > I'm adding `ccp`, `ciTypeFlow`, `predicates`, `sharedstubs` and `splitif` to tier1. > > The other folders seems to have been around for very long (since at least mid-2021). 
It's not clear how meaningful it'd be to add them/what the intent from them was. I've rather focused on the recently(-ish) added folders, that one forgot to put in a tier when adding it. > > Feel free to tell if other folders should be included (and in which tier). > > Thanks, > Marc At some point I've run into the default timeout by specifying flags that are too slow. It is set to 120s. I quickly checked the jtreg FAQ. It can also be found there ([link](https://openjdk.org/jtreg/faq.html#what-do-i-need-to-know-about-test-timeouts)): [...] The default time is 120 seconds (2 minutes) but can be changed using the /timeout option for the action. [...] ------------- PR Comment: https://git.openjdk.org/jdk/pull/24817#issuecomment-2837893421 From fyang at openjdk.org Tue Apr 29 08:24:45 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 29 Apr 2025 08:24:45 GMT Subject: RFR: 8355878: RISC-V: jdk/incubator/vector/DoubleMaxVectorTests.java fails when using RVV In-Reply-To: References: Message-ID: <8T8T9hqhQc3q2i6jWndrLsNcOOLDVqMbMP4Glvan2Ys=.6291b5f6-d012-4940-a22d-d7be86899b96@github.com> On Tue, 29 Apr 2025 07:17:01 GMT, Gui Cao wrote: > Hi, when I use the qemu-system mode, the jdk/incubator/vector/DoubleMaxVectorTests.java test run fails. > > As discussed on JBS, As discussed on jbs, the SIGILL instruction is vmv1r.v v1,v3. > > I found the cause of the problem on the qemu source code, qemu Add vill check for whole vector register move instructions [1]. > > The ratified version of RISC-V V spec section 16.6 says that `The instructions operate as if EEW=SEW`. > > So the whole vector register move instructions depend on the vtype > > We need to add vsetvli before move vector register. In order not to have a misunderstanding here, maybe Vector Whole Vector Register Move without setting vsetvli, I replaced vmv1r_v with vmv_v_v. 
> > > [1] https://patchwork.kernel.org/project/qemu-devel/patch/20231129170400.21251-2-max.chou at sifive.com/#25621160 > > ### Testing > > qemu-system 9.1.50 with UseRVV: > > - [x] Run test/jdk/jdk/incubator/vector (fastdebug) Looks reasonable. Thanks! ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24943#pullrequestreview-2802481984 From shade at openjdk.org Tue Apr 29 08:48:20 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 29 Apr 2025 08:48:20 GMT Subject: RFR: 8355769: Optimize nmethod dependency recording Message-ID: During nmethod installation, we record the dependencies between InstanceKlass/CallSite and newly coming `nmethod`. In `DependencyContext::add_dependent_nmethod`, we are linearly scanning to see if the `nmethod` is already in the dependencies list. This costs quite a bit, especially with lots of compiled methods per IK. This is not a significant issue for normal JIT compilations, where the JIT costs dominate. But for Leyden, this kind of scan is a significant part of AOT code installation. For example in well-trained javac runs, there are chains of 500+ `nmethods` for some IKs that take 10+ us to scan. This is easily half of the entire AOT method installation cost. Fortunately, the way we do the nmethod dependency recording, it allows us to shortcut the scan. Since dependency recording holds the `CodeCache_lock` while adding new `nmethod` all over the various dependency lists, those dependency lists are ever in two states: no `nmethod` in the chain (no need to scan!), or `nmethod` is at the head of the chain (no need to scan!). 
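
The head-of-chain shortcut described above can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions, not the HotSpot code: `ToyNMethod`, `ToyDependencyList`, `add_dependent` and `record_dependencies` are hypothetical names, a `std::mutex` stands in for `CodeCache_lock`, and new entries are assumed to be pushed at the head of each list.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for an nmethod; only its identity matters here.
struct ToyNMethod { int id; };

// Hypothetical stand-in for one dependency list (one per class or call site).
class ToyDependencyList {
  struct Entry {
    const ToyNMethod* nm;
    std::unique_ptr<Entry> next;
  };
  std::unique_ptr<Entry> head_;
  std::size_t size_ = 0;

public:
  // Assumption of this sketch: the caller holds the installation lock for
  // the whole time it records `nm` in all of its lists. Under that
  // assumption `nm` is either absent from this list or sits at its head,
  // so checking the head replaces the linear scan.
  bool add_dependent(const ToyNMethod* nm) {
    if (head_ && head_->nm == nm) {
      return false;  // already recorded during this same installation
    }
    auto entry = std::make_unique<Entry>();
    entry->nm = nm;
    entry->next = std::move(head_);
    head_ = std::move(entry);
    ++size_;
    return true;
  }

  std::size_t size() const { return size_; }
};

// Stand-in for CodeCache_lock (an assumption of this sketch).
inline std::mutex& toy_install_lock() {
  static std::mutex m;
  return m;
}

// Records `nm` in every list in `lists` (which may name the same list more
// than once) while holding the lock. Returns how many entries were added.
inline std::size_t record_dependencies(const ToyNMethod* nm,
                                       const std::vector<ToyDependencyList*>& lists) {
  std::lock_guard<std::mutex> guard(toy_install_lock());
  std::size_t added = 0;
  for (ToyDependencyList* list : lists) {
    if (list->add_dependent(nm)) {
      ++added;
    }
  }
  return added;
}
```

In this model, recording one `ToyNMethod` against a list that is named twice adds a single entry, and the duplicate is rejected in constant time. The shortcut is only valid while the lock is held across the whole recording step; without that, another thread could push a different entry in between and the head check would miss the duplicate.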
Additional testing: - [x] Ad-hoc benchmarks - [x] Linux x86_64 server fastdebug, `all` - [x] Linux AArch64 server fastdebug, `all` ------------- Commit messages: - More touchups - More touchups - Touchup comments - Fix Changes: https://git.openjdk.org/jdk/pull/24933/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24933&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8355769 Stats: 28 lines in 1 file changed: 23 ins; 1 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/24933.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24933/head:pull/24933 PR: https://git.openjdk.org/jdk/pull/24933 From shade at openjdk.org Tue Apr 29 08:48:20 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 29 Apr 2025 08:48:20 GMT Subject: RFR: 8355769: Optimize nmethod dependency recording In-Reply-To: References: Message-ID: On Mon, 28 Apr 2025 18:01:42 GMT, Aleksey Shipilev wrote: > During nmethod installation, we record the dependencies between InstanceKlass/CallSite and newly coming `nmethod`. In `DependencyContext::add_dependent_nmethod`, we are linearly scanning to see if the `nmethod` is already in the dependencies list. This costs quite a bit, especially with lots of compiled methods per IK. > > This is not a significant issue for normal JIT compilations, where the JIT costs dominate. But for Leyden, this kind of scan is a significant part of AOT code installation. For example in well-trained javac runs, there are chains of 500+ `nmethods` for some IKs that take 10+ us to scan. This is easily half of the entire AOT method installation cost. > > Fortunately, the way we do the nmethod dependency recording, it allows us to shortcut the scan. Since dependency recording holds the `CodeCache_lock` while adding new `nmethod` all over the various dependency lists, those dependency lists are ever in two states: no `nmethod` in the chain (no need to scan!), or `nmethod` is at the head of the chain (no need to scan!). 
>
> Additional testing:
>  - [x] Ad-hoc benchmarks
>  - [x] Linux x86_64 server fastdebug, `all`
>  - [x] Linux AArch64 server fastdebug, `all`

A crude way to show impact on this in mainline: compile lots of simple methods.

    $ taskset -c 0-7 hyperfine -w 3 -r 10 \
        "build/linux-x86_64-server-release/images/jdk/bin/java -Xcomp -XX:TieredStopAtLevel=1 -XX:-Inline Hello.java"

    # Before
      Time (mean ± σ):      1.510 s ±  0.007 s    [User: 1.386 s, System: 0.138 s]
      Range (min … max):    1.501 s …  1.526 s    10 runs

    # After
      Time (mean ± σ):      1.504 s ±  0.007 s    [User: 1.378 s, System: 0.140 s]
      Range (min … max):    1.494 s …  1.516 s    10 runs

On Leyden, and well-trained javac runs, the impact is good:

    $ hyperfine -w 10 -r 30 "build/linux-x86_64-server-release/images/jdk/bin/java -Xms64m -Xmx1g -XX:+UseSerialGC -cp JavacBenchApp.jar -XX:AOTCache=app.aot JavacBenchApp 50"

    # Before
      Time (mean ± σ):     340.0 ms ±   4.4 ms    [User: 678.1 ms, System: 121.0 ms]
      Range (min … max):   329.8 ms … 351.0 ms    30 runs

    # After
      Time (mean ± σ):     331.2 ms ±   4.0 ms    [User: 659.6 ms, System: 119.4 ms]
      Range (min … max):   324.3 ms … 338.7 ms    30 runs

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24933#issuecomment-2836097346
PR Comment: https://git.openjdk.org/jdk/pull/24933#issuecomment-2836099693

From mdoerr at openjdk.org Tue Apr 29 09:05:49 2025
From: mdoerr at openjdk.org (Martin Doerr)
Date: Tue, 29 Apr 2025 09:05:49 GMT
Subject: RFR: 8353500: [s390x] Intrinsify Unsafe::setMemory [v4]
In-Reply-To:
References:
Message-ID:

On Wed, 23 Apr 2025 06:09:25 GMT, Amit Kumar wrote:

>> Unsafe::setMemory intrinsic implementation for s390x.
>> >> Stub Code: >> >> >> StubRoutines::unsafe_setmemory [0x000003ffb04b63c0, 0x000003ffb04b64d0] (272 bytes) >> -------------------------------------------------------------------------------- >> 0x000003ffb04b63c0: ogrk %r1,%r2,%r3 >> 0x000003ffb04b63c4: nill %r1,7 >> 0x000003ffb04b63c8: je 0x000003ffb04b6410 >> 0x000003ffb04b63cc: nill %r1,3 >> 0x000003ffb04b63d0: je 0x000003ffb04b6460 >> 0x000003ffb04b63d4: nill %r1,1 >> 0x000003ffb04b63d8: jlh 0x000003ffb04b64a0 >> 0x000003ffb04b63dc: risbg %r4,%r4,48,55,8 >> 0x000003ffb04b63e2: risbgz %r1,%r3,32,63,62 >> 0x000003ffb04b63e8: je 0x000003ffb04b6402 >> 0x000003ffb04b63ec: nopr >> 0x000003ffb04b63ee: nopr >> 0x000003ffb04b63f0: sth %r4,0(%r2) >> 0x000003ffb04b63f4: sth %r4,2(%r2) >> 0x000003ffb04b63f8: agfi %r2,4 >> 0x000003ffb04b63fe: brct %r1,0x000003ffb04b63f0 >> 0x000003ffb04b6402: nilf %r3,2 >> 0x000003ffb04b6408: ber %r14 >> 0x000003ffb04b640a: sth %r4,0(%r2) >> 0x000003ffb04b640e: br %r14 >> 0x000003ffb04b6410: risbg %r4,%r4,48,55,8 >> 0x000003ffb04b6416: risbg %r4,%r4,32,47,16 >> 0x000003ffb04b641c: risbg %r4,%r4,0,31,32 >> 0x000003ffb04b6422: risbgz %r1,%r3,32,63,60 >> 0x000003ffb04b6428: je 0x000003ffb04b6446 >> 0x000003ffb04b642c: nopr >> 0x000003ffb04b642e: nopr >> 0x000003ffb04b6430: stg %r4,0(%r2) >> 0x000003ffb04b6436: stg %r4,8(%r2) >> 0x000003ffb04b643c: agfi %r2,16 >> 0x000003ffb04b6442: brct %r1,0x000003ffb04b6430 >> 0x000003ffb04b6446: nilf %r3,8 >> 0x000003ffb04b644c: ber %r14 >> 0x000003ffb04b644e: stg %r4,0(%r2) >> 0x000003ffb04b6454: br %r14 >> 0x000003ffb04b6456: nopr >> 0x000003ffb04b6458: nopr >> 0x000003ffb04b645a: nopr >> 0x000003ffb04b645c: nopr >> 0x000003ffb04b645e: nopr >> 0x000003ffb04b6460: risbg %r4,%r4,48,55,8 >> 0x000003ffb04b6466: risbg %r4,%r4,32,47,16 >> 0x000003ffb04b646c: risbgz %r1,%r3,32,63,61 >> 0x000003ffb04b6472: je 0x000003ffb04b6492 >> 0x000003ffb04b6476: nopr >> 0x000003ffb04b6478: nopr >> 0x000003ffb04b647a: nopr >> 0x000003ffb04b647c: nopr >> 0x000003ffb04b647e: 
nopr
>> 0x000003ffb04b6480: st %r4,0(%r2)
>> 0x000003ffb04b6484: st %r4,4(%r2)
>> 0x000003ffb04b6488: agfi %r2,8
>> 0x000003ffb04b648e: brct %r1,0x000003ffb04b6480
>> 0x000003ffb04b6492: nilf %r3,4
>> 0x000003ffb04b6498: ber %r14
>> 0x000003ffb04b649a: st %r4,0(%r2)
>> 0x0000...
>
> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision:
>
>   improved mvc implementation

How is the behavior of mvc specified when hitting a signal (SIGSEGV or SIGBUS)? Will all bytes before that place be written? I guess that is the reason why the C++ code uses the atomic version.
Did you check which instructions are generated by the C++ compiler?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24480#issuecomment-2838014992

From duke at openjdk.org Tue Apr 29 09:18:49 2025
From: duke at openjdk.org (Manuel Hässig)
Date: Tue, 29 Apr 2025 09:18:49 GMT
Subject: RFR: 8355472: Clean up x86 nativeInst after 32-bit x86 removal [v2]
In-Reply-To:
References:
Message-ID: <_xYfxZFxmWuwN7hehKE40cyl5VmUS-vU-f5rCfEj-KU=.dd2a0b7b-97ee-4670-91ac-3e115e13b727@github.com>

On Mon, 28 Apr 2025 12:15:00 GMT, Manuel Hässig wrote:

>> This PR cleans up 32-bit x86 code in `nativeInst_x86.*` files.
>>
>> Testing:
>>  - [x] tier1 through tier4 on x86 plus Oracle internal testing
>
> Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision:
>
>   Apply readability suggestion
>
>   Co-authored-by: Aleksey Shipilëv

Thank you for your reviews!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24911#issuecomment-2838049647

From duke at openjdk.org Tue Apr 29 09:18:50 2025
From: duke at openjdk.org (duke)
Date: Tue, 29 Apr 2025 09:18:50 GMT
Subject: RFR: 8355472: Clean up x86 nativeInst after 32-bit x86 removal [v2]
In-Reply-To:
References:
Message-ID:

On Mon, 28 Apr 2025 12:15:00 GMT, Manuel Hässig wrote:

>> This PR cleans up 32-bit x86 code in `nativeInst_x86.*` files.
>>
>> Testing:
>>  - [x] tier1 through tier4 on x86 plus Oracle internal testing
>
> Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision:
>
>   Apply readability suggestion
>
>   Co-authored-by: Aleksey Shipilëv

@mhaessig
Your change (at version f2382dce7ccc30fc32138d5362ad92c6e9c51407) is now ready to be sponsored by a Committer.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24911#issuecomment-2838052424

From dzhang at openjdk.org Tue Apr 29 09:19:52 2025
From: dzhang at openjdk.org (Dingli Zhang)
Date: Tue, 29 Apr 2025 09:19:52 GMT
Subject: RFR: 8355878: RISC-V: jdk/incubator/vector/DoubleMaxVectorTests.java fails when using RVV
In-Reply-To:
References:
Message-ID:

On Tue, 29 Apr 2025 07:17:01 GMT, Gui Cao wrote:

> Hi, when I use the qemu-system mode, the jdk/incubator/vector/DoubleMaxVectorTests.java test run fails.
>
> As discussed on JBS, the SIGILL instruction is vmv1r.v v1,v3.
>
> I found the cause of the problem on the qemu source code, qemu Add vill check for whole vector register move instructions [1].
>
> The ratified version of RISC-V V spec section 16.6 says that `The instructions operate as if EEW=SEW`.
>
> So the whole vector register move instructions depend on the vtype
>
> We need to add vsetvli before move vector register. In order not to have a misunderstanding here, maybe Vector Whole Vector Register Move without setting vsetvli, I replaced vmv1r_v with vmv_v_v.
>
>
> [1] https://patchwork.kernel.org/project/qemu-devel/patch/20231129170400.21251-2-max.chou at sifive.com/#25621160
>
> ### Testing
>
> qemu-system 9.1.50 with UseRVV:
>
> - [x] Run test/jdk/jdk/incubator/vector (fastdebug)

LGTM, thanks!

-------------

Marked as reviewed by dzhang (Author).
PR Review: https://git.openjdk.org/jdk/pull/24943#pullrequestreview-2802678335 From shade at openjdk.org Tue Apr 29 09:22:10 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 29 Apr 2025 09:22:10 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v10] In-Reply-To: References: Message-ID: > [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. > > The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. > > This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. > > It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. 
> > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Improve get_method_blocker ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24018/files - new: https://git.openjdk.org/jdk/pull/24018/files/eaf3f14d..9f44cb5c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=08-09 Stats: 12 lines in 1 file changed: 4 ins; 1 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/24018.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24018/head:pull/24018 PR: https://git.openjdk.org/jdk/pull/24018 From shade at openjdk.org Tue Apr 29 09:22:10 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 29 Apr 2025 09:22:10 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v6] In-Reply-To: References: <49DMOKtJkD7AtmJuFif9VIIZmM4VYYFYmb-aUmXnG7Q=.7b926824-a81e-4b38-902e-16191b4e46ac@github.com> Message-ID: On Mon, 28 Apr 2025 18:48:47 GMT, Vladimir Ivanov wrote: >>> I agree is_permanent_class_loader_data() captures the intent better. Let me see if it fits well here. >> >> Ah wait, it does not. We need to hold on to something that blocks the unloading. Just checking `is_permanent_class_loader_data()` does not get us there. We would need to ask for some holder for it. For the reasons above, `method->method_holder()->klass_holder()` works for non-strong hidden classes as well. >> >> This is also why current mainline code works -- it captures the same thing. > > Ok, thanks for checking! Good to know there's no existing bug. 
> > What I had in mind is as follows: > > InstanceKlass* holder = method->method_holder(); > if (holder->class_loader_data()->is_permanent_class_loader_data()) { > return nullptr; // method holder class can't be unloaded > } else { > // Normal class, return the holder that would block unloading. > // This would be either classloader oop for non-hidden classes, > // or Java mirror oop for hidden classes. > assert(holder->klass_holder() != nullptr, ""); > return holder->klass_holder(); > } > > > IMO it makes the check more precise and, at the same time, communicates the intent better. What do you think? Yes, OK, let's do a variant of that. Committed. I'll re-run test to see if there are any surprises about these asserts. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2065897750 From duke at openjdk.org Tue Apr 29 09:24:52 2025 From: duke at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Tue, 29 Apr 2025 09:24:52 GMT Subject: Integrated: 8355472: Clean up x86 nativeInst after 32-bit x86 removal In-Reply-To: References: Message-ID: <8s4u_UjQ1uK_Ri4RfMAaTOqPVy9nxSFFi30nNhXAoFc=.50394bd9-94de-4ff2-b20f-f092520e1e52@github.com> On Mon, 28 Apr 2025 08:23:25 GMT, Manuel H?ssig wrote: > This PR cleans up code 32-bit x86 code in `nativeInst_x86.*` files. > > Testing: > - [x] tier1 through tier4 on x86 plus Oracle internal testing This pull request has now been integrated. 

Changeset: 6a0c24f9
Author: Manuel Hässig
Committer: Roberto Castañeda Lozano
URL: https://git.openjdk.org/jdk/commit/6a0c24f9db0b15a00ecadca6e853ed5aa3775b78
Stats: 82 lines in 2 files changed: 0 ins; 81 del; 1 mod

8355472: Clean up x86 nativeInst after 32-bit x86 removal

Reviewed-by: shade, rcastanedalo, jwaters

-------------

PR: https://git.openjdk.org/jdk/pull/24911

From epeter at openjdk.org Tue Apr 29 09:31:49 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 29 Apr 2025 09:31:49 GMT
Subject: RFR: 8348638: Performance regression in Math.tanh [v9]
In-Reply-To:
References:
Message-ID:

On Mon, 28 Apr 2025 14:20:57 GMT, Mohamed Issa wrote:

>> Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Create separate tanh micro-benchmark module to avoid noise in MathBench
>
> @TobiHartmann @vnkozlov Ok to run this through Oracle test framework before integration?

@missa-prime I launched some internal testing :)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23889#issuecomment-2838088852

From adinn at openjdk.org Tue Apr 29 09:39:46 2025
From: adinn at openjdk.org (Andrew Dinn)
Date: Tue, 29 Apr 2025 09:39:46 GMT
Subject: RFR: 8355769: Optimize nmethod dependency recording
In-Reply-To:
References:
Message-ID:

On Mon, 28 Apr 2025 18:01:42 GMT, Aleksey Shipilev wrote:

> During nmethod installation, we record the dependencies between InstanceKlass/CallSite and newly coming `nmethod`. In `DependencyContext::add_dependent_nmethod`, we are linearly scanning to see if the `nmethod` is already in the dependencies list. This costs quite a bit, especially with lots of compiled methods per IK.
>
> This is not a significant issue for normal JIT compilations, where the JIT costs dominate. But for Leyden, this kind of scan is a significant part of AOT code installation. For example in well-trained javac runs, there are chains of 500+ `nmethods` for some IKs that take 10+ us to scan.
This is easily half of the entire AOT method installation cost.
>
> Fortunately, the way we do the nmethod dependency recording, it allows us to shortcut the scan. Since dependency recording holds the `CodeCache_lock` while adding new `nmethod` all over the various dependency lists, those dependency lists are ever in two states: no `nmethod` in the chain (no need to scan!), or `nmethod` is at the head of the chain (no need to scan!).
>
> Additional testing:
>  - [x] Ad-hoc benchmarks
>  - [x] Linux x86_64 server fastdebug, `all`
>  - [x] Linux AArch64 server fastdebug, `all`

I checked the two callers of this method and noticed that `MethodHandles::add_dependent_nmethod` includes `assert_locked_or_safepoint(CodeCache_lock)` while `InstanceKlass::add_dependent_nmethod` has no assert. First, should both not include an assert? Second, why allow for being at a safepoint in the case of a MethodHandle rather than just assert the lock is held?

-------------

PR Review: https://git.openjdk.org/jdk/pull/24933#pullrequestreview-2802741153

From duke at openjdk.org Tue Apr 29 09:47:51 2025
From: duke at openjdk.org (Manuel Hässig)
Date: Tue, 29 Apr 2025 09:47:51 GMT
Subject: RFR: 8258229: Crash in nmethod::reloc_string_for
In-Reply-To: <-erDwDMbT780ArCHyfj3LTBs1FbMw-rr_EBKRUbhRz8=.5a30cc12-2091-4001-a650-446b92be1569@github.com>
References: <6wxhOTq8-vRcBjfw6HdHD9nZzwdT7SgvXfgnQseFF7w=.05dc5242-cad0-4a6a-a96b-9754b2edc927@github.com> <-erDwDMbT780ArCHyfj3LTBs1FbMw-rr_EBKRUbhRz8=.5a30cc12-2091-4001-a650-446b92be1569@github.com>
Message-ID:

On Mon, 28 Apr 2025 15:43:11 GMT, Galder Zamarreño wrote:

>> ## Issue Summary
>>
>> The issue manifests in intermittent failures of test cases with `-XX:+PrintAssembly`. The reason for these intermittent failures is a deoptimization of the method before or during printing its assembly. In case that deoptimization makes the method not entrant, then the entry of that method is patched, but the relocation information is not updated. If the instruction at the method entry before patching had relocation info that prints a comment during assembly printing, printing that comment for the patched entry fails in case the operands of the original and patched instructions do not match.
>>
>> ## Change Summary
>>
>> To fix this issue, this PR updates the relocation info when patching the method entry. To avoid any races between printing and deoptimizing, this PR acquires the `NMethodState_lock` for printing an `nmethod`.
>>
>> All changes of this PR summarized:
>>  - add a regression test,
>>  - update the relocation information after patching the method entry for making it not entrant,
>>  - acquire the `NMethodState_lock` in `print_nmethod()` to avoid changing the relocation information during printing.
>>
>> ## Testing
>>
>> I ran tiers 1 through 3 and Oracle internal testing.
>
> test/hotspot/jtreg/compiler/print/TestPrintAssemblyDeoptRace.java line 28:
>
>> 26: * @bug 8258229
>> 27: * @summary If a method is made not entrant while prinint the assembly, hotspot crashes due to mismatched relocation information.
>> 28: * @run main/othervm -XX:+UnlockDiagnosticVMOptions -XX:-TieredCompilation -XX:CompileCommand=print,java/math/BitSieve.bit
>
> Shouldn't the test run with `-XX:+DeoptimizeALot`? That way we would get more confidence that the deoptimization support works as expected.

That is a good point. I'll add it in v2.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24831#discussion_r2065948873

From duke at openjdk.org Tue Apr 29 09:53:20 2025
From: duke at openjdk.org (Saranya Natarajan)
Date: Tue, 29 Apr 2025 09:53:20 GMT
Subject: RFR: 8347515: C2: assert(!success || (C->macro_count() == (old_macro_count - 1))) failed: elimination must have deleted one node from macro list
Message-ID:

Issue:
The assertion failure `assert(!success || (C->macro_count() == (old_macro_count - 1))) failed: elimination must have deleted one node from macro list` occurs when [loop strip mining](https://bugs.openjdk.org/browse/JDK-8186027) may create a [MaxL](https://bugs.openjdk.org/browse/JDK-8324655) after macro expansion.

Analysis:
Before the macro nodes are expanded in `expand_macro_nodes`, there is a process where nodes from the macro list are eliminated. This also includes elimination of any `OuterStripMinedLoop` node in the macro list. The bug occurs due to the refining of the strip mined loop in the `adjust_strip_mined_loop` function just before it is eliminated. In this case, a `MaxL` node is added to the macro list in `adjust_strip_mined_loop`.

Fix:
The fix involves performing the refining of the strip mined loop before the elimination process. More specifically, it moves the `adjust_strip_mined_loop` call outside the elimination loop.

Improvement:
The process of eliminating macro nodes by calling `eliminate_macro_nodes` and performing additional Opaque and LoopLimit node elimination in `expand_macro_nodes` is unintuitive, as suggested in [JDK-8325478](https://bugs.openjdk.org/browse/JDK-8325478), and the current fix should be moved along with the other elimination code.
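
The ordering problem described in the Analysis and Fix paragraphs above can be shown with a toy model. This is only a sketch under stated assumptions and not C2 code: `ToyCompile`, `ToyKind`, `refine_strip_mined_loop`, `eliminate` and `refine_all_before_elimination` are hypothetical names, the "macro list" is just a vector of node kinds, and refinement is assumed to register exactly one new `MaxL` entry.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical node kinds; only the ones needed for the illustration.
enum class ToyKind { OuterStripMinedLoop, MaxL, Other };

struct ToyCompile {
  std::vector<ToyKind> macro_list;

  // Stand-in for the refinement step. Assumption: refining a strip mined
  // loop may register a new macro node (a MaxL in the reported failure).
  void refine_strip_mined_loop() { macro_list.push_back(ToyKind::MaxL); }

  // Removes entry `i` from the macro list. With `refine_inside_loop` set,
  // the refinement runs right before the removal (the problematic order).
  // Returns whether the macro count dropped by exactly one, which is the
  // condition the failing assert checks.
  bool eliminate(std::size_t i, bool refine_inside_loop) {
    const std::size_t old_count = macro_list.size();
    if (refine_inside_loop && macro_list[i] == ToyKind::OuterStripMinedLoop) {
      refine_strip_mined_loop();
    }
    macro_list.erase(macro_list.begin() + static_cast<std::ptrdiff_t>(i));
    return macro_list.size() == old_count - 1;
  }

  // The reordered variant: refine every strip mined loop up front, so that
  // the elimination loop itself never grows the macro list.
  void refine_all_before_elimination() {
    const std::size_t n = macro_list.size();
    for (std::size_t i = 0; i < n; ++i) {
      if (macro_list[i] == ToyKind::OuterStripMinedLoop) {
        refine_strip_mined_loop();
      }
    }
  }
};
```

With refinement inside the elimination step, one entry is added and one removed, so the count is unchanged and the "dropped by exactly one" check fails. With refinement done beforehand, the same elimination satisfies the check.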
------------- Commit messages: - Inital fix Changes: https://git.openjdk.org/jdk/pull/24890/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24890&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8347515 Stats: 8 lines in 1 file changed: 7 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24890.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24890/head:pull/24890 PR: https://git.openjdk.org/jdk/pull/24890 From mli at openjdk.org Tue Apr 29 10:05:49 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 29 Apr 2025 10:05:49 GMT Subject: RFR: 8355667: RISC-V: Add backend implementation for unsigned vector Min / Max operations [v2] In-Reply-To: References: Message-ID: On Mon, 28 Apr 2025 09:38:16 GMT, Fei Yang wrote: >> Hi, please review this change. >> https://bugs.openjdk.org/browse/JDK-8338021 proposed new vector operators including Unsigned Vector Min / Max. >> This intrinsify Unsigned Vector Min / Max operations with RVV extension for RISC-V backend to improve performance. >> This also enables some extra IR tests in file `test/hotspot/jtreg/compiler/vectorapi/VectorCommutativeOperSharingTest.java`. >> >> Testing: >> - [x] make test TEST="jdk_vector" (QEMU / fastdebug) >> >> JMH tested on BPI-F3 SBC (256-bit VLEN) for reference: >> Before: >> >> ByteMaxVector.UMAX 1024 thrpt 5 58.657 ? 17.216 ops/ms >> ByteMaxVector.UMAXMasked 1024 thrpt 5 45.581 ? 18.164 ops/ms >> ByteMaxVector.UMIN 1024 thrpt 5 55.275 ? 15.863 ops/ms >> ByteMaxVector.UMINMasked 1024 thrpt 5 44.651 ? 29.209 ops/ms >> ShortMaxVector.UMAX 1024 thrpt 5 24.146 ? 7.570 ops/ms >> ShortMaxVector.UMAXMasked 1024 thrpt 5 21.506 ? 0.430 ops/ms >> ShortMaxVector.UMIN 1024 thrpt 5 24.261 ? 6.993 ops/ms >> ShortMaxVector.UMINMasked 1024 thrpt 5 20.980 ? 1.622 ops/ms >> IntMaxVector.UMAX 1024 thrpt 5 10.780 ? 0.812 ops/ms >> IntMaxVector.UMAXMasked 1024 thrpt 5 10.609 ? 0.851 ops/ms >> IntMaxVector.UMIN 1024 thrpt 5 10.845 ? 
0.578 ops/ms >> IntMaxVector.UMINMasked 1024 thrpt 5 10.705 ± 0.562 ops/ms >> LongMaxVector.UMAX 1024 thrpt 5 5.445 ± 0.439 ops/ms >> LongMaxVector.UMAXMasked 1024 thrpt 5 5.387 ± 0.285 ops/ms >> LongMaxVector.UMIN 1024 thrpt 5 5.379 ± 0.407 ops/ms >> LongMaxVector.UMINMasked 1024 thrpt 5 5.373 ± 0.236 ops/ms >> >> >> After: >> >> ByteMaxVector.UMAX 1024 thrpt 5 2552.161 ± 121.213 ops/ms >> ByteMaxVector.UMAXMasked 1024 thrpt 5 2444.001 ± 105.139 ops/ms >> ByteMaxVector.UMIN 1024 thrpt 5 2616.963 ± 3.065 ops/ms >> ByteMaxVector.UMINMasked 1024 thrpt 5 2367.968 ± 1057.028 ops/ms >> ShortMaxVector.UMAX 1024 thrpt 5 1363.676 ± 9.294 ops/ms >> ShortMaxVector.UMAXMasked 1024 thrpt 5 1321.759 ± 121.471 ops/ms >> ShortMaxVector.UMIN 1024 thrpt 5 1368.598 ± 5.251 ops/ms >> ShortMaxVector.UMINMasked 1024 thrpt 5 1337.044 ± 1.434 ops/ms >> IntMaxVector.UMAX 1024 thrpt 5 566.509 ± 0.658 ops/ms >> IntMaxVector.UMAXMasked 1024 thrpt 5 559.456 ± 0.491 ops/ms >> IntMaxVector.UMIN 1024 thrpt 5 569.238 ± 1.30... > > Fei Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge remote-tracking branch 'upstream/master' into JDK-8355667 > - 8355667: RISC-V: Add backend implementation for unsigned vector Min / Max operations Looks good. Just one minor question: should we add an assert like the one below for these instructions? It seems not, but I'm not quite sure. assert(Matcher::vector_element_basic_type(n) != T_FLOAT && Matcher::vector_element_basic_type(n) != T_DOUBLE); ------------- Marked as reviewed by mli (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/24909#pullrequestreview-2802834980 From epeter at openjdk.org Tue Apr 29 10:14:51 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 29 Apr 2025 10:14:51 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v7] In-Reply-To: <0b56TIXbIwSy7Zo77WAx4uweu2kM8iAmjPMomeT3sts=.06d78493-7b94-4386-a7be-4fb65837926b@github.com> References: <0b56TIXbIwSy7Zo77WAx4uweu2kM8iAmjPMomeT3sts=.06d78493-7b94-4386-a7be-4fb65837926b@github.com> Message-ID: On Tue, 1 Apr 2025 07:18:45 GMT, Emanuel Peter wrote: >> **Goal** >> We want to generate Java source code: >> - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. >> - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). >> >> Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). >> >> **How to get started** >> When reviewing, please start by looking at: >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 >> >> We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. 
>> >> Second, look at this advanced test: >> https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 >> >> And then for a "tutorial", look at: >> `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` >> >> It shows these features: >> - The `body` of a Template is essentially a list of `Token`s that are concatenated. >> - Templates can be nested: a `TemplateWithArgs` is also a `Token`. >> - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. >> - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. >> - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. >> - The use of recursive templates, and `fuel` to limit the recursion. >> - `Name`s: useful to register field and variable names in code scopes. >> >> Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 >> >> For a better experience, you may want... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > typo Keep alive - just waiting for reviews. 
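To make the `#name` and `$var` features listed in the quoted description more concrete, here is a toy renderer. It is a sketch only and does not use or mirror the Template Framework API: the class `TemplateSketch`, its `render` method, and the numeric `id` parameter are all hypothetical, chosen just to show the two kinds of replacement.

```java
import java.util.Map;

public class TemplateSketch {
    // Hypothetical toy renderer, NOT the Template Framework:
    //  - "#name" is replaced with the argument value registered under "name"
    //  - "$var" is replaced with "var_<id>" so two uses of a template do not collide
    static String render(String body, int id, Map<String, String> args) {
        String out = body;
        for (Map.Entry<String, String> e : args.entrySet()) {
            out = out.replace("#" + e.getKey(), e.getValue());
        }
        // "$1" refers to the captured variable name; "_<id>" is appended literally.
        return out.replaceAll("\\$(\\w+)", "$1_" + id);
    }

    public static void main(String[] args) {
        String body = "int $x = #v;";
        System.out.println(render(body, 7, Map.of("v", "42")));
        System.out.println(render(body, 42, Map.of("v", "42")));
    }
}
```

Rendering the same body twice with different ids yields distinct variable names (`x_7` and `x_42`), which is the collision-avoidance idea the description attributes to `$var`.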
------------- PR Comment: https://git.openjdk.org/jdk/pull/24217#issuecomment-2838216549 From yzheng at openjdk.org Tue Apr 29 10:17:46 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Tue, 29 Apr 2025 10:17:46 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v4] In-Reply-To: References: <3DRTEheyn6n6OYx38sL8-tqQbycO-QIjfwqwlErr5TI=.6cf2043b-f7f8-4a9d-9b59-3a844e74eaf2@github.com> Message-ID: On Thu, 24 Apr 2025 22:15:32 GMT, Vladimir Ivanov wrote: > BTW all CPU feature constants in AMD64HotSpotVMConfig change their meaning. I don't see any usages in JDK code. Should they go away now? I will remove them in https://github.com/openjdk/jdk/pull/23159 ------------- PR Comment: https://git.openjdk.org/jdk/pull/24329#issuecomment-2838223030 From epeter at openjdk.org Tue Apr 29 10:24:47 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 29 Apr 2025 10:24:47 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v3] In-Reply-To: References: Message-ID: <1Qo4pB9I7Ok4ntXSE-KkE0sv-Tp5EVCWriWnjcf2iEE=.a7e28640-85df-436a-9c82-3c067cc88dee@github.com> On Fri, 25 Apr 2025 07:24:15 GMT, erifan wrote: >> This patch optimizes the following patterns: >> For integer types: >> >> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) >> => (VectorMaskCmp src1 src2 ncond) >> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) >> => (VectorMaskCmp src1 src2 ncond) >> >> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the negative comparison of cond. >> >> For float and double types: >> >> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> >> cond can be eq or ne. 
>> >> Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option `-XX:UseSVE=2`: >> >> Benchmark Unit Before Score Error After Score Error Uplift >> testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 10266136.26 8955.008548 1.29 >> testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 1179760.772 448.031844 1.33 >> testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 2359520.803 896.305743 1.33 >> testCompareEQMaskNotInt ops/s 1787221.411 977.743935 2353952.519 960.069976 1.31 >> testCompareEQMaskNotLong ops/s 895297.1974 673.44808 1178449.02 323.804205 1.31 >> testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 4712761.965 2110.862053 1.41 >> testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 10251646.9 9486.699831 1.29 >> testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 2352855.205 1251.952546 1.39 >> testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 1177811.493 521.1229 1.37 >> testCompareGEMaskNotShort ops/s 3341860.309 1578.975338 4714008.434 1681.10365 1.41 >> testCompareGTMaskNotByte ops/s 7910823.674 2993.367032 10245063.58 9774.75138 1.29 >> testCompareGTMaskNotInt ops/s 1673393.928 3153.099431 2353654.521 1190.848583 1.4 >> testCompareGTMaskNotLong ops/s 849405.9159 2432.858159 1177952.041 359.96413 1.38 >> testCompareGTMaskNotShort ops/s 3339509.141 3339.976585 4711442.496 2673.364893 1.41 >> testCompareLEMaskNotByte ops/s 7911340.004 3114.69191 10231626.5 27134.20035 1.29 >> testCompareLEMaskNotInt ops/s 1675812.113 1340.969885 2353255.341 1452.4522 1.4 >> testCompareLEMaskNotLong ops/s 848862.8036 6564.841731 1177763.623 539.290106 1.38 >> testCompareLEMaskNotShort ops/s 3324951.54 2380.29473 4712116.251 1544.559684 1.41 >> testCompareLTMaskNotByte ops/s 7910390.844 2630.861436 10239567.69 6487.441672 1.29 >> testCompareLTMaskNotInt ops/s 16721... > > erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: > > - Addressed some review comments > > 1. Call VectorNode::Ideal() only once in XorVNode::Ideal. > 2. Improve code comments. > - Merge branch 'master' into JDK-8354242 > - Merge branch 'master' into JDK-8354242 > - 8354242: VectorAPI: combine vector not operation with compare > > This patch optimizes the following patterns: > For integer types: > ``` > (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) > => (VectorMaskCmp src1 src2 ncond) > (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) > => (VectorMaskCmp src1 src2 ncond) > ``` > cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the > negative comparison of cond. > > For float and double types: > ``` > (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) > => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) > (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) > => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) > ``` > cond can be eq or ne. 
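The rewrite rules quoted above rely on the negation of a comparison being another comparison. A small scalar sketch of that idea follows; it is an illustration only, with hypothetical helper names, and it says nothing about how C2 implements the transformation. It also shows why, for floating-point values, ordered comparisons cannot simply be flipped when a NaN is involved, whereas `eq` and `ne` remain exact complements.

```java
public class NegatedCompareSketch {
    // Integers: NOT(a < b) is exactly a >= b.
    static boolean notLt(int a, int b) { return !(a < b); }
    static boolean ge(int a, int b)    { return a >= b; }

    // Floats: with NaN, both (a < b) and (a >= b) are false,
    // so NOT(a < b) is not the same as (a >= b).
    static boolean notLt(float a, float b) { return !(a < b); }
    static boolean ge(float a, float b)    { return a >= b; }

    // Floats: eq and ne stay complements even for NaN.
    static boolean notEq(float a, float b) { return !(a == b); }
    static boolean ne(float a, float b)    { return a != b; }

    public static void main(String[] args) {
        System.out.println(notLt(3, 5) == ge(3, 5));
        System.out.println(notLt(Float.NaN, 1f) == ge(Float.NaN, 1f));
        System.out.println(notEq(Float.NaN, 1f) == ne(Float.NaN, 1f));
    }
}
```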
> > Benchmarks on Nvidia Grace machine with 128-bit SVE2: > With option `-XX:UseSVE=2`: > ``` > Benchmark Unit Before Score Error After Score Error Uplift > testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 10266136.26 8955.008548 1.29 > testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 1179760.772 448.031844 1.33 > testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 2359520.803 896.305743 1.33 > testCompareEQMaskNotInt ops/s 1787221.411 977.743935 2353952.519 960.069976 1.31 > testCompareEQMaskNotLong ops/s 895297.1974 673.44808 1178449.02 323.804205 1.31 > testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 4712761.965 2110.862053 1.41 > testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 10251646.9 9486.699831 1.29 > testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 2352855.205 1251.952546 1.39 > testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 1177811.493 521.1229 1.37 > testCompareGEMaskNotShort ops/s 3341860.309 1578.975338 4714008.434 1681.10365 1.41 > testCompareGTMaskNotByte ops/s 7910823.674 2993.367032 10245063.58 9774.75138 1.29 > testCompareGTMaskNotInt ops/s 1673393.928 3153.099431 2353654.521 1190.848583 1.4 > testCompareGTMaskNotLong ops/s 849405.9159 2... Yes, this discussion is down to `requires` vs `applyIf`. This is my argument for `applyIf`, quoted from above, I have not yet seen an argument against it: > If you use @require, then the person does not realize there is a test AND the test is not run. If you use applyIf, the person does not realize there is a test, but it is run at least for result verifiation - and then the person MIGHT realize if the test catches a wrong result / crash. In my understanding, `requires` should only be used if the test really **requires** a certain platform or feature. That can be because some flags are only available under certain platforms for example. But for IR tests, we should try to always use `applyIf`, because it allows testing on other platforms. 
Actually, I filed this RFE a while ago: https://bugs.openjdk.org/browse/JDK-8310891 We should try to move as many tests from using `requires` to `applyIf`, so that we have an increased test coverage. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-2838240704 From dnsimon at openjdk.org Tue Apr 29 10:33:04 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 29 Apr 2025 10:33:04 GMT Subject: RFR: 8343789: Move mutable nmethod data out of CodeCache [v14] In-Reply-To: References: <9mDuowjpORyWudVnSB1FWCW_o1pBgMnAvJus6YGkXLs=.67ba4652-2470-448d-baa2-464e824b2fcb@github.com> Message-ID: On Tue, 29 Apr 2025 01:35:03 GMT, Dean Long wrote: >> src/hotspot/share/code/nmethod.cpp line 1505: >> >>> 1503: CHECKED_CAST(_oops_size, uint16_t, align_up(code_buffer->total_oop_size(), oopSize)); >>> 1504: uint16_t metadata_size = (uint16_t)align_up(code_buffer->total_metadata_size(), wordSize); >>> 1505: JVMCI_ONLY(CHECKED_CAST(_jvmci_data_size, uint16_t, align_up(compiler->is_jvmci() ? jvmci_data->size() : 0, oopSize))); >> >> This cast is lossy in that `jvmci_data->size()` returns an int. It caused a `double free or corruption (out)` crash in Graal in the case where a `JVMCINMethodData` had a very long name. We've fixed this by [limiting the length of the name](https://github.com/openjdk/jdk/pull/24753) but I'm wondering if there was some special reason for this cast? If so, can you please add extra logic preventing this code from running off the end of allocated memory: >> >> #if INCLUDE_JVMCI >> if (compiler->is_jvmci()) { >> // Initialize the JVMCINMethodData object inlined into nm >> jvmci_nmethod_data()->copy(jvmci_data); >> } >> #endif >> >> If not, please remove the cast. > > The cast was added by 8331087, which reduced the supported JVMCI data size to uint16_t. I don't remember this issue with long names coming up during that review, so I guess we all missed it. @dougxc please file a bug so we can track this. 
It seems like JVMCINMethodData::copy should do something like truncate long names rather than blindly assuming it has enough space. If uint16_t is unreasonably small for JVMCI nmethod data we could revert that change and make it 32 bits again. I think that in 8331087 only `_jvmci_data_offset` was subject to the [narrowing cast](https://github.com/openjdk/jdk/pull/18984/files#diff-c345a29edc19eb49f833e007143e00f897bb762b0f451a202e1d2f5304ed2308R1496). I've opened https://bugs.openjdk.org/browse/JDK-8355896. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21276#discussion_r2066017673 From duke at openjdk.org Tue Apr 29 11:58:48 2025 From: duke at openjdk.org (Hendrik Schick) Date: Tue, 29 Apr 2025 11:58:48 GMT Subject: RFR: 8351568: Improve source code documentation for PhaseCFG::insert_anti_dependences In-Reply-To: References: Message-ID: On Mon, 28 Apr 2025 15:28:52 GMT, Daniel Lundén wrote: > The current documentation for `PhaseCFG::insert_anti_dependences` is difficult to follow and sometimes even misleading. We should ensure the method is appropriately documented. > > ### Changeset > > - Rename `PhaseCFG::insert_anti_dependences` to `PhaseCFG::raise_above_anti_dependences`. The purpose of `PhaseCFG::raise_above_anti_dependences` is twofold: raise the load's LCA so that the load is scheduled before anti-dependent stores, and if necessary add anti-dependence edges between the load and certain anti-dependent stores (to ensure we later "raise" the load before anti-dependent stores in LCM). The name `PhaseCFG::insert_anti_dependences` suggests that we only add anti-dependence edges. The name `PhaseCFG::raise_above_anti_dependences`, therefore, seems more appropriate. > - Significantly add to and revise the source code documentation of `PhaseCFG::raise_above_anti_dependences`. > - Add, move, and revise `assert`s in `PhaseCFG::raise_above_anti_dependences`, including improved `assert` messages in a few places.
> - In the main worklist loop of `PhaseCFG::raise_above_anti_dependences`: > - Clean up how we identify the search root (avoid mutation). > - Add a missing early exit for `Phi` nodes when `LCA == early`. > > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/14706896111) > - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. src/hotspot/share/opto/gcm.cpp line 674: > 672: // > 673: // 1. raise the load's LCA to force the load to (eventually) be scheduled at > 674: // latest in the stores's block, and Suggestion: // latest in the store's block, and src/hotspot/share/opto/gcm.cpp line 682: > 680: // path relative to the load if there are no paths from early to LCA that go > 681: // through the store's block. Such stores are not anti-dependent, and there is > 682: // no need to update the LCA nor to add anti-depencence edges. Suggestion: // no need to update the LCA nor to add anti-dependence edges. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2066163563 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2066159325 From shade at openjdk.org Tue Apr 29 12:47:25 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 29 Apr 2025 12:47:25 GMT Subject: RFR: 8355769: Optimize nmethod dependency recording [v2] In-Reply-To: References: Message-ID: <1uNBgSbbBgdiHXyYL1fkY6HIKaH9CgwF5kN8ABlt3xo=.07c945af-3840-45d8-a1b3-2fe55c52df81@github.com> > During nmethod installation, we record the dependencies between InstanceKlass/CallSite and newly coming `nmethod`. In `DependencyContext::add_dependent_nmethod`, we are linearly scanning to see if the `nmethod` is already in the dependencies list. This costs quite a bit, especially with lots of compiled methods per IK. > > This is not a significant issue for normal JIT compilations, where the JIT costs dominate. 
But for Leyden, this kind of scan is a significant part of AOT code installation. For example in well-trained javac runs, there are chains of 500+ `nmethods` for some IKs that take 10+ us to scan. This is easily half of the entire AOT method installation cost. > > Fortunately, the way we do the nmethod dependency recording, it allows us to shortcut the scan. Since dependency recording holds the `CodeCache_lock` while adding new `nmethod` all over the various dependency lists, those dependency lists are ever in two states: no `nmethod` in the chain (no need to scan!), or `nmethod` is at the head of the chain (no need to scan!). > > Additional testing: > - [x] Ad-hoc benchmarks > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Fiddle with locks ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24933/files - new: https://git.openjdk.org/jdk/pull/24933/files/f8864d2a..82cff0e9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24933&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24933&range=00-01 Stats: 3 lines in 2 files changed: 1 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24933.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24933/head:pull/24933 PR: https://git.openjdk.org/jdk/pull/24933 From shade at openjdk.org Tue Apr 29 12:47:25 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 29 Apr 2025 12:47:25 GMT Subject: RFR: 8355769: Optimize nmethod dependency recording [v2] In-Reply-To: References: Message-ID: On Tue, 29 Apr 2025 09:36:41 GMT, Andrew Dinn wrote: > I checked the two callers of this method and noticed that `MethodHandles::add_dependent_nmethod` includes `assert_locked_or_safepoint(CodeCache_lock)` while `InstanceKlass::add_dependent_nmethod` has no assert. First, should both not include an assert? 
Second, why allow for being at a safepoint in the case of a MethodHandle rather than just assert the lock is held? Yeah, the lock checks in the callers are too weak. We can make them stronger, as in the new commit. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24933#issuecomment-2838689318 From mli at openjdk.org Tue Apr 29 12:53:18 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 29 Apr 2025 12:53:18 GMT Subject: RFR: 8355704: RISC-V: enable TestIRFma.java Message-ID: Hi, could you help review this patch to enable TestIRFma.java? FmaF/D (checked by TestIRFma.java) are supported on RISC-V; for some reason the test cannot be enabled easily, but we should enable it. Also tested on a machine with `asimd` support. Thanks! ------------- Commit messages: - change matching type - initial commit Changes: https://git.openjdk.org/jdk/pull/24947/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24947&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8355704 Stats: 50 lines in 2 files changed: 16 ins; 0 del; 34 mod Patch: https://git.openjdk.org/jdk/pull/24947.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24947/head:pull/24947 PR: https://git.openjdk.org/jdk/pull/24947 From duke at openjdk.org Tue Apr 29 12:56:07 2025 From: duke at openjdk.org (Manuel Hässig) Date: Tue, 29 Apr 2025 12:56:07 GMT Subject: RFR: 8258229: Crash in nmethod::reloc_string_for [v2] In-Reply-To: <6wxhOTq8-vRcBjfw6HdHD9nZzwdT7SgvXfgnQseFF7w=.05dc5242-cad0-4a6a-a96b-9754b2edc927@github.com> References: <6wxhOTq8-vRcBjfw6HdHD9nZzwdT7SgvXfgnQseFF7w=.05dc5242-cad0-4a6a-a96b-9754b2edc927@github.com> Message-ID: > ## Issue Summary > > The issue manifests in intermittent failures of test cases with `-XX:+PrintAssembly`. The reason for these intermittent failures is a deoptimization of the method before or during printing its assembly. In case that deoptimization makes the method not entrant, then the entry of that method is patched, but the relocation information is not updated.
If the instruction at the method entry before patching had relocation info that prints a comment during assembly printing, printing that comment for the patched entry fails in case the operands of the original and patched instructions do not match. > > ## Change Summary > > To fix this issue, this PR updates the relocation info when patching the method entry. To avoid any races between printing and deoptimizing, this PR acquires the `NMethodState_lock` for printing an `nmethod`. > > All changes of this PR summarized: > - add a regression test, > - update the relocation information after patching the method entry for making it not entrant, > - acquire the `NMethodState_lock` in `print_nmethod()` to avoid changing the relocation information during printing. > > ## Testing > > I ran tiers 1 through 3 and Oracle internal testing. Manuel Hässig has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: - Merge branch 'master' into jdk-8258229-nmethod - Add DeoptimizeALot and fix typo in test - Hold NMethodState_lock while printing an nmethod This prevents data races on the relocation info when code is patched.
- Update relocation info when making method not entrant - Add regression test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24831/files - new: https://git.openjdk.org/jdk/pull/24831/files/8cf462f2..956f45a6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24831&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24831&range=00-01 Stats: 24438 lines in 752 files changed: 17057 ins; 4744 del; 2637 mod Patch: https://git.openjdk.org/jdk/pull/24831.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24831/head:pull/24831 PR: https://git.openjdk.org/jdk/pull/24831 From duke at openjdk.org Tue Apr 29 12:56:08 2025 From: duke at openjdk.org (Manuel Hässig) Date: Tue, 29 Apr 2025 12:56:08 GMT Subject: RFR: 8258229: Crash in nmethod::reloc_string_for In-Reply-To: <6wxhOTq8-vRcBjfw6HdHD9nZzwdT7SgvXfgnQseFF7w=.05dc5242-cad0-4a6a-a96b-9754b2edc927@github.com> References: <6wxhOTq8-vRcBjfw6HdHD9nZzwdT7SgvXfgnQseFF7w=.05dc5242-cad0-4a6a-a96b-9754b2edc927@github.com> Message-ID: On Wed, 23 Apr 2025 15:12:54 GMT, Manuel Hässig wrote: > ## Issue Summary > > The issue manifests in intermittent failures of test cases with `-XX:+PrintAssembly`. The reason for these intermittent failures is a deoptimization of the method before or during printing its assembly. In case that deoptimization makes the method not entrant, then the entry of that method is patched, but the relocation information is not updated. If the instruction at the method entry before patching had relocation info that prints a comment during assembly printing, printing that comment for the patched entry fails in case the operands of the original and patched instructions do not match. > > ## Change Summary > > To fix this issue, this PR updates the relocation info when patching the method entry. To avoid any races between printing and deoptimizing, this PR acquires the `NMethodState_lock` for printing an `nmethod`.
> > All changes of this PR summarized: > - add a regression test, > - update the relocation information after patching the method entry for making it not entrant, > - acquire the `NMethodState_lock` in `print_nmethod()` to avoid changing the relocation information during printing. > > ## Testing > > I ran tiers 1 through 3 and Oracle internal testing. - Added `-XX:+DeoptimizeALot` to regression test - Fixed a typo - [ ] Reran testing ------------- PR Comment: https://git.openjdk.org/jdk/pull/24831#issuecomment-2838729410 From fyang at openjdk.org Tue Apr 29 13:11:33 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 29 Apr 2025 13:11:33 GMT Subject: RFR: 8355667: RISC-V: Add backend implementation for unsigned vector Min / Max operations [v3] In-Reply-To: References: Message-ID: > Hi, please review this change. > https://bugs.openjdk.org/browse/JDK-8338021 proposed new vector operators including Unsigned Vector Min / Max. > This intrinsifies Unsigned Vector Min / Max operations with the RVV extension for the RISC-V backend to improve performance. > This also enables some extra IR tests in file `test/hotspot/jtreg/compiler/vectorapi/VectorCommutativeOperSharingTest.java`. > > Testing: > - [x] make test TEST="jdk_vector" (QEMU / fastdebug) > > JMH tested on BPI-F3 SBC (256-bit VLEN) for reference: > Before: > > ByteMaxVector.UMAX 1024 thrpt 5 58.657 ± 17.216 ops/ms > ByteMaxVector.UMAXMasked 1024 thrpt 5 45.581 ± 18.164 ops/ms > ByteMaxVector.UMIN 1024 thrpt 5 55.275 ± 15.863 ops/ms > ByteMaxVector.UMINMasked 1024 thrpt 5 44.651 ± 29.209 ops/ms > ShortMaxVector.UMAX 1024 thrpt 5 24.146 ± 7.570 ops/ms > ShortMaxVector.UMAXMasked 1024 thrpt 5 21.506 ± 0.430 ops/ms > ShortMaxVector.UMIN 1024 thrpt 5 24.261 ± 6.993 ops/ms > ShortMaxVector.UMINMasked 1024 thrpt 5 20.980 ± 1.622 ops/ms > IntMaxVector.UMAX 1024 thrpt 5 10.780 ± 0.812 ops/ms > IntMaxVector.UMAXMasked 1024 thrpt 5 10.609 ± 0.851 ops/ms > IntMaxVector.UMIN 1024 thrpt 5 10.845 ±
0.578 ops/ms > IntMaxVector.UMINMasked 1024 thrpt 5 10.705 ± 0.562 ops/ms > LongMaxVector.UMAX 1024 thrpt 5 5.445 ± 0.439 ops/ms > LongMaxVector.UMAXMasked 1024 thrpt 5 5.387 ± 0.285 ops/ms > LongMaxVector.UMIN 1024 thrpt 5 5.379 ± 0.407 ops/ms > LongMaxVector.UMINMasked 1024 thrpt 5 5.373 ± 0.236 ops/ms > > > After: > > ByteMaxVector.UMAX 1024 thrpt 5 2552.161 ± 121.213 ops/ms > ByteMaxVector.UMAXMasked 1024 thrpt 5 2444.001 ± 105.139 ops/ms > ByteMaxVector.UMIN 1024 thrpt 5 2616.963 ± 3.065 ops/ms > ByteMaxVector.UMINMasked 1024 thrpt 5 2367.968 ± 1057.028 ops/ms > ShortMaxVector.UMAX 1024 thrpt 5 1363.676 ± 9.294 ops/ms > ShortMaxVector.UMAXMasked 1024 thrpt 5 1321.759 ± 121.471 ops/ms > ShortMaxVector.UMIN 1024 thrpt 5 1368.598 ± 5.251 ops/ms > ShortMaxVector.UMINMasked 1024 thrpt 5 1337.044 ± 1.434 ops/ms > IntMaxVector.UMAX 1024 thrpt 5 566.509 ± 0.658 ops/ms > IntMaxVector.UMAXMasked 1024 thrpt 5 559.456 ± 0.491 ops/ms > IntMaxVector.UMIN 1024 thrpt 5 569.238 ± 1.309 ops/ms > IntMaxVector.UMINMasked 1024 thrpt 5 535.359 ± 15.765 ops/ms > Long... Fei Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains four additional commits since the last revision: - Review Comment - Merge remote-tracking branch 'upstream/master' into JDK-8355667 - Merge remote-tracking branch 'upstream/master' into JDK-8355667 - 8355667: RISC-V: Add backend implementation for unsigned vector Min / Max operations ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24909/files - new: https://git.openjdk.org/jdk/pull/24909/files/b4d876f2..0ad6748e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24909&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24909&range=01-02 Stats: 4680 lines in 70 files changed: 4112 ins; 308 del; 260 mod Patch: https://git.openjdk.org/jdk/pull/24909.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24909/head:pull/24909 PR: https://git.openjdk.org/jdk/pull/24909 From fyang at openjdk.org Tue Apr 29 13:11:33 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 29 Apr 2025 13:11:33 GMT Subject: RFR: 8355667: RISC-V: Add backend implementation for unsigned vector Min / Max operations [v2] In-Reply-To: References: Message-ID: On Tue, 29 Apr 2025 10:03:21 GMT, Hamlin Li wrote: > Looks good. Just one minor question, should we add an assert like below for these instructs? Seems not, but I'm not quite sure. > > ``` > assert(Matcher::vector_element_basic_type(n) != T_FLOAT && > Matcher::vector_element_basic_type(n) != T_DOUBLE); > ``` Good suggestion! I have added following assertion for these instructions just like other CPU ports. Please take another look. Thanks. 
`assert(is_integral_type(bt), "unsupported type");` ------------- PR Comment: https://git.openjdk.org/jdk/pull/24909#issuecomment-2838817355 From mli at openjdk.org Tue Apr 29 13:21:46 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 29 Apr 2025 13:21:46 GMT Subject: RFR: 8355667: RISC-V: Add backend implementation for unsigned vector Min / Max operations [v3] In-Reply-To: References: Message-ID: On Tue, 29 Apr 2025 13:11:33 GMT, Fei Yang wrote: >> Hi, please review this change. >> https://bugs.openjdk.org/browse/JDK-8338021 proposed new vector operators including Unsigned Vector Min / Max. >> This intrinsifies Unsigned Vector Min / Max operations with the RVV extension for the RISC-V backend to improve performance. >> This also enables some extra IR tests in file `test/hotspot/jtreg/compiler/vectorapi/VectorCommutativeOperSharingTest.java`. >> >> Testing: >> - [x] make test TEST="jdk_vector" (QEMU / fastdebug) >> >> JMH tested on BPI-F3 SBC (256-bit VLEN) for reference: >> Before: >> >> ByteMaxVector.UMAX 1024 thrpt 5 58.657 ± 17.216 ops/ms >> ByteMaxVector.UMAXMasked 1024 thrpt 5 45.581 ± 18.164 ops/ms >> ByteMaxVector.UMIN 1024 thrpt 5 55.275 ± 15.863 ops/ms >> ByteMaxVector.UMINMasked 1024 thrpt 5 44.651 ± 29.209 ops/ms >> ShortMaxVector.UMAX 1024 thrpt 5 24.146 ± 7.570 ops/ms >> ShortMaxVector.UMAXMasked 1024 thrpt 5 21.506 ± 0.430 ops/ms >> ShortMaxVector.UMIN 1024 thrpt 5 24.261 ± 6.993 ops/ms >> ShortMaxVector.UMINMasked 1024 thrpt 5 20.980 ± 1.622 ops/ms >> IntMaxVector.UMAX 1024 thrpt 5 10.780 ± 0.812 ops/ms >> IntMaxVector.UMAXMasked 1024 thrpt 5 10.609 ± 0.851 ops/ms >> IntMaxVector.UMIN 1024 thrpt 5 10.845 ± 0.578 ops/ms >> IntMaxVector.UMINMasked 1024 thrpt 5 10.705 ± 0.562 ops/ms >> LongMaxVector.UMAX 1024 thrpt 5 5.445 ± 0.439 ops/ms >> LongMaxVector.UMAXMasked 1024 thrpt 5 5.387 ± 0.285 ops/ms >> LongMaxVector.UMIN 1024 thrpt 5 5.379 ± 0.407 ops/ms >> LongMaxVector.UMINMasked 1024 thrpt 5 5.373 ±
0.236 ops/ms >> >> >> After: >> >> ByteMaxVector.UMAX 1024 thrpt 5 2552.161 ± 121.213 ops/ms >> ByteMaxVector.UMAXMasked 1024 thrpt 5 2444.001 ± 105.139 ops/ms >> ByteMaxVector.UMIN 1024 thrpt 5 2616.963 ± 3.065 ops/ms >> ByteMaxVector.UMINMasked 1024 thrpt 5 2367.968 ± 1057.028 ops/ms >> ShortMaxVector.UMAX 1024 thrpt 5 1363.676 ± 9.294 ops/ms >> ShortMaxVector.UMAXMasked 1024 thrpt 5 1321.759 ± 121.471 ops/ms >> ShortMaxVector.UMIN 1024 thrpt 5 1368.598 ± 5.251 ops/ms >> ShortMaxVector.UMINMasked 1024 thrpt 5 1337.044 ± 1.434 ops/ms >> IntMaxVector.UMAX 1024 thrpt 5 566.509 ± 0.658 ops/ms >> IntMaxVector.UMAXMasked 1024 thrpt 5 559.456 ± 0.491 ops/ms >> IntMaxVector.UMIN 1024 thrpt 5 569.238 ± 1.30... > > Fei Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Review Comment > - Merge remote-tracking branch 'upstream/master' into JDK-8355667 > - Merge remote-tracking branch 'upstream/master' into JDK-8355667 > - 8355667: RISC-V: Add backend implementation for unsigned vector Min / Max operations Thanks for updating, still good! ------------- Marked as reviewed by mli (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24909#pullrequestreview-2803455693 From fyang at openjdk.org Tue Apr 29 13:43:55 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 29 Apr 2025 13:43:55 GMT Subject: Integrated: 8355667: RISC-V: Add backend implementation for unsigned vector Min / Max operations In-Reply-To: References: Message-ID: On Mon, 28 Apr 2025 02:09:35 GMT, Fei Yang wrote: > Hi, please review this change. > https://bugs.openjdk.org/browse/JDK-8338021 proposed new vector operators including Unsigned Vector Min / Max. > This intrinsifies Unsigned Vector Min / Max operations with the RVV extension for the RISC-V backend to improve performance.
> This also enables some extra IR tests in file `test/hotspot/jtreg/compiler/vectorapi/VectorCommutativeOperSharingTest.java`. > > Testing: > - [x] make test TEST="jdk_vector" (QEMU / fastdebug) > > JMH tested on BPI-F3 SBC (256-bit VLEN) for reference: > Before: > > ByteMaxVector.UMAX 1024 thrpt 5 58.657 ± 17.216 ops/ms > ByteMaxVector.UMAXMasked 1024 thrpt 5 45.581 ± 18.164 ops/ms > ByteMaxVector.UMIN 1024 thrpt 5 55.275 ± 15.863 ops/ms > ByteMaxVector.UMINMasked 1024 thrpt 5 44.651 ± 29.209 ops/ms > ShortMaxVector.UMAX 1024 thrpt 5 24.146 ± 7.570 ops/ms > ShortMaxVector.UMAXMasked 1024 thrpt 5 21.506 ± 0.430 ops/ms > ShortMaxVector.UMIN 1024 thrpt 5 24.261 ± 6.993 ops/ms > ShortMaxVector.UMINMasked 1024 thrpt 5 20.980 ± 1.622 ops/ms > IntMaxVector.UMAX 1024 thrpt 5 10.780 ± 0.812 ops/ms > IntMaxVector.UMAXMasked 1024 thrpt 5 10.609 ± 0.851 ops/ms > IntMaxVector.UMIN 1024 thrpt 5 10.845 ± 0.578 ops/ms > IntMaxVector.UMINMasked 1024 thrpt 5 10.705 ± 0.562 ops/ms > LongMaxVector.UMAX 1024 thrpt 5 5.445 ± 0.439 ops/ms > LongMaxVector.UMAXMasked 1024 thrpt 5 5.387 ± 0.285 ops/ms > LongMaxVector.UMIN 1024 thrpt 5 5.379 ± 0.407 ops/ms > LongMaxVector.UMINMasked 1024 thrpt 5 5.373 ± 0.236 ops/ms > > > After: > > ByteMaxVector.UMAX 1024 thrpt 5 2552.161 ± 121.213 ops/ms > ByteMaxVector.UMAXMasked 1024 thrpt 5 2444.001 ± 105.139 ops/ms > ByteMaxVector.UMIN 1024 thrpt 5 2616.963 ± 3.065 ops/ms > ByteMaxVector.UMINMasked 1024 thrpt 5 2367.968 ± 1057.028 ops/ms > ShortMaxVector.UMAX 1024 thrpt 5 1363.676 ± 9.294 ops/ms > ShortMaxVector.UMAXMasked 1024 thrpt 5 1321.759 ± 121.471 ops/ms > ShortMaxVector.UMIN 1024 thrpt 5 1368.598 ± 5.251 ops/ms > ShortMaxVector.UMINMasked 1024 thrpt 5 1337.044 ± 1.434 ops/ms > IntMaxVector.UMAX 1024 thrpt 5 566.509 ± 0.658 ops/ms > IntMaxVector.UMAXMasked 1024 thrpt 5 559.456 ± 0.491 ops/ms > IntMaxVector.UMIN 1024 thrpt 5 569.238 ± 1.309 ops/ms > IntMaxVector.UMINMasked 1024 thrpt 5 535.359 ± 15.765 ops/ms > Long... 
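For readers less familiar with the UMIN / UMAX rows in the quoted benchmark, the lane-wise semantics can be sketched in scalar C++. This is only an illustration of what an unsigned min / max computes on bit patterns that a signed comparison would order differently; the helper names (`umin_lane`, `umax_lane`, `umin_array`) are made up for this note and are not taken from the patch or from HotSpot.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical helper: compare two signed lanes as if they were unsigned,
// and return the smaller one with its original bit pattern.
template <typename S>
S umin_lane(S a, S b) {
  using U = std::make_unsigned_t<S>;
  return static_cast<U>(a) < static_cast<U>(b) ? a : b;
}

// Hypothetical helper: same idea, returning the larger lane.
template <typename S>
S umax_lane(S a, S b) {
  using U = std::make_unsigned_t<S>;
  return static_cast<U>(a) > static_cast<U>(b) ? a : b;
}

// Scalar reference loop; a vectorized version should produce the same result per lane.
template <typename S>
void umin_array(const S* a, const S* b, S* r, std::size_t n) {
  for (std::size_t i = 0; i < n; i++) {
    r[i] = umin_lane(a[i], b[i]);
  }
}
```

The point of the sketch is that a byte lane holding -1 is treated as 255, so it loses an unsigned min against 1 and wins an unsigned max.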
This pull request has now been integrated. Changeset: 2ed7ad4b Author: Fei Yang URL: https://git.openjdk.org/jdk/commit/2ed7ad4b5c7d2344ae6571c186f8a2903770aa57 Stats: 95 lines in 3 files changed: 60 ins; 0 del; 35 mod 8355667: RISC-V: Add backend implementation for unsigned vector Min / Max operations Reviewed-by: mli, gcao ------------- PR: https://git.openjdk.org/jdk/pull/24909 From fyang at openjdk.org Tue Apr 29 13:43:54 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 29 Apr 2025 13:43:54 GMT Subject: RFR: 8355667: RISC-V: Add backend implementation for unsigned vector Min / Max operations [v3] In-Reply-To: References: Message-ID: On Tue, 29 Apr 2025 13:11:33 GMT, Fei Yang wrote: >> Hi, please review this change. >> https://bugs.openjdk.org/browse/JDK-8338021 proposed new vector operators including Unsigned Vector Min / Max. >> This intrinsify Unsigned Vector Min / Max operations with RVV extension for RISC-V backend to improve performance. >> This also enables some extra IR tests in file `test/hotspot/jtreg/compiler/vectorapi/VectorCommutativeOperSharingTest.java`. >> >> Testing: >> - [x] make test TEST="jdk_vector" (QEMU / fastdebug) >> >> JMH tested on BPI-F3 SBC (256-bit VLEN) for reference: >> Before: >> >> ByteMaxVector.UMAX 1024 thrpt 5 58.657 ± 17.216 ops/ms >> ByteMaxVector.UMAXMasked 1024 thrpt 5 45.581 ± 18.164 ops/ms >> ByteMaxVector.UMIN 1024 thrpt 5 55.275 ± 15.863 ops/ms >> ByteMaxVector.UMINMasked 1024 thrpt 5 44.651 ± 29.209 ops/ms >> ShortMaxVector.UMAX 1024 thrpt 5 24.146 ± 7.570 ops/ms >> ShortMaxVector.UMAXMasked 1024 thrpt 5 21.506 ± 0.430 ops/ms >> ShortMaxVector.UMIN 1024 thrpt 5 24.261 ± 6.993 ops/ms >> ShortMaxVector.UMINMasked 1024 thrpt 5 20.980 ± 1.622 ops/ms >> IntMaxVector.UMAX 1024 thrpt 5 10.780 ± 0.812 ops/ms >> IntMaxVector.UMAXMasked 1024 thrpt 5 10.609 ± 0.851 ops/ms >> IntMaxVector.UMIN 1024 thrpt 5 10.845 ± 0.578 ops/ms >> IntMaxVector.UMINMasked 1024 thrpt 5 10.705 ± 
0.562 ops/ms >> LongMaxVector.UMAX 1024 thrpt 5 5.445 ± 0.439 ops/ms >> LongMaxVector.UMAXMasked 1024 thrpt 5 5.387 ± 0.285 ops/ms >> LongMaxVector.UMIN 1024 thrpt 5 5.379 ± 0.407 ops/ms >> LongMaxVector.UMINMasked 1024 thrpt 5 5.373 ± 0.236 ops/ms >> >> >> After: >> >> ByteMaxVector.UMAX 1024 thrpt 5 2552.161 ± 121.213 ops/ms >> ByteMaxVector.UMAXMasked 1024 thrpt 5 2444.001 ± 105.139 ops/ms >> ByteMaxVector.UMIN 1024 thrpt 5 2616.963 ± 3.065 ops/ms >> ByteMaxVector.UMINMasked 1024 thrpt 5 2367.968 ± 1057.028 ops/ms >> ShortMaxVector.UMAX 1024 thrpt 5 1363.676 ± 9.294 ops/ms >> ShortMaxVector.UMAXMasked 1024 thrpt 5 1321.759 ± 121.471 ops/ms >> ShortMaxVector.UMIN 1024 thrpt 5 1368.598 ± 5.251 ops/ms >> ShortMaxVector.UMINMasked 1024 thrpt 5 1337.044 ± 1.434 ops/ms >> IntMaxVector.UMAX 1024 thrpt 5 566.509 ± 0.658 ops/ms >> IntMaxVector.UMAXMasked 1024 thrpt 5 559.456 ± 0.491 ops/ms >> IntMaxVector.UMIN 1024 thrpt 5 569.238 ± 1.30... > > Fei Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Review Comment > - Merge remote-tracking branch 'upstream/master' into JDK-8355667 > - Merge remote-tracking branch 'upstream/master' into JDK-8355667 > - 8355667: RISC-V: Add backend implementation for unsigned vector Min / Max operations Thanks for the review! Let's ------------- PR Comment: https://git.openjdk.org/jdk/pull/24909#issuecomment-2838964108 From mli at openjdk.org Tue Apr 29 13:48:23 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 29 Apr 2025 13:48:23 GMT Subject: RFR: 8355913: RISC-V: improve hotspot/jtreg/compiler/vectorization/runner/BasicFloatOpTest.java Message-ID: <2Y39meZMDbec4LfN-EqdwJTvT_NIO1FqRBAv8bL5m6k=.b8359d80-9aa3-445e-a3d7-07868f7dff1b@github.com> Hi, Can you help to review this simple patch? 
Previously, BasicFloatOpTest.java was accidentally not really enabled on riscv. Also, FmaVF/FmaVD depend on both UseFMA and UseRVV, and the code should make this clear. IR verification of FmaVF in BasicFloatOpTest.java should only be enabled when UseFMA && rvv. Thanks! ------------- Commit messages: - initial commit Changes: https://git.openjdk.org/jdk/pull/24950/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24950&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8355913 Stats: 10 lines in 3 files changed: 2 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/24950.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24950/head:pull/24950 PR: https://git.openjdk.org/jdk/pull/24950 From adinn at openjdk.org Tue Apr 29 14:26:47 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Tue, 29 Apr 2025 14:26:47 GMT Subject: RFR: 8355769: Optimize nmethod dependency recording [v2] In-Reply-To: <1uNBgSbbBgdiHXyYL1fkY6HIKaH9CgwF5kN8ABlt3xo=.07c945af-3840-45d8-a1b3-2fe55c52df81@github.com> References: <1uNBgSbbBgdiHXyYL1fkY6HIKaH9CgwF5kN8ABlt3xo=.07c945af-3840-45d8-a1b3-2fe55c52df81@github.com> Message-ID: On Tue, 29 Apr 2025 12:47:25 GMT, Aleksey Shipilev wrote: >> During nmethod installation, we record the dependencies between InstanceKlass/CallSite and newly coming `nmethod`. In `DependencyContext::add_dependent_nmethod`, we are linearly scanning to see if the `nmethod` is already in the dependencies list. This costs quite a bit, especially with lots of compiled methods per IK. >> >> This is not a significant issue for normal JIT compilations, where the JIT costs dominate. But for Leyden, this kind of scan is a significant part of AOT code installation. For example in well-trained javac runs, there are chains of 500+ `nmethods` for some IKs that take 10+ us to scan. This is easily half of the entire AOT method installation cost. >> >> Fortunately, the way we do the nmethod dependency recording, it allows us to shortcut the scan.
Since dependency recording holds the `CodeCache_lock` while adding new `nmethod` all over the various dependency lists, those dependency lists are ever in two states: no `nmethod` in the chain (no need to scan!), or `nmethod` is at the head of the chain (no need to scan!). >> >> Additional testing: >> - [x] Ad-hoc benchmarks >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Fiddle with locks Looks good. ------------- Marked as reviewed by adinn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24933#pullrequestreview-2803881954 From asmehra at openjdk.org Tue Apr 29 14:46:51 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Tue, 29 Apr 2025 14:46:51 GMT Subject: RFR: 8350209: Preserve adapters in AOT cache [v8] In-Reply-To: References: Message-ID: On Sun, 27 Apr 2025 21:52:43 GMT, Vladimir Kozlov wrote: >> [JEP 483](https://bugs.openjdk.org/browse/JDK-8315737) preserves class information in AOT cache which helps Java startup performance. >> >> We should also preserve adapters (i2c, c2i) to further improve performance of class linking where adapters are generated. >> >> Short running Java application can see several percents improvement. 
I got 6% improvement when ran `HelloWorld.java` on Linux-x64 Ice Lake CPU (2.5Ghz): >> >> >> (perf stat -r 100 java -XX:AOTCache=app.aotcache -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed >> 0.0299401 +- 0.0000504 seconds time elapsed ( +- 0.17% ) >> >> (perf stat -r 100 java -XX:AOTCache=app.aotcache -XX:+UnlockDiagnosticVMOptions -XX:-AOTAdapterCaching -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed >> 0.0318654 +- 0.0000535 seconds time elapsed ( +- 0.17% ) >> >> >> New diagnostic flags are introduced (use `-XX:+UnlockDiagnosticVMOptions` to unlock them): >> >> >> -XX:+AOTAdapterCaching - Enable or disable saving and restoring i2c2i adapters >> -XX:AOTCodeMaxSize=10*M - buffer size in bytes for AOT code caching >> -XX:+AbortVMOnAOTCodeFailure - Abort VM on the first occurrence of AOT code caching failure >> >> By default `AOTAdapterCaching` is `false` and enabled ergonomically when `-XX:AOTCache` is specified. >> This flag is ignored when `AOTCache` is not specified. >> >> To use AOT adapters follow process described in JEP 483: >> >> >> java -XX:AOTMode=record -XX:AOTConfiguration=app.aotconf -cp app.jar App >> java -XX:AOTMode=create -XX:AOTConfiguration=app.aotconf -XX:AOTCache=app.aot -cp app.jar >> java -XX:AOTCache=app.aot -cp app.jar App >> >> >> There are several new UL flag combinations to trace the AOT code caching process: >> >> >> -Xlog:aot+codecache+init -Xlog:aot+codecache+exit -Xlog:aot+codecache+stubs >> >> >> @ashu-mehra is main author of changes. He implemented adapters caching. >> I did main framework (`AOTCodeCache` class) for saving and loading AOT code. >> >> Tested tier1-6,10, which includes tests with `AOTClassLinking` enabled. Also Xcomp,stress and JCK. > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Downgraded UL as asked. Added synchronization to C strings caching. 
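The flag behaviour described in the quoted text (off by default, switched on ergonomically when `-XX:AOTCache` is given, ignored otherwise) can be modelled roughly as below. This is a hypothetical sketch and not HotSpot code: `resolve_aot_adapter_caching` and its parameters are invented for illustration, and the real ergonomics may take more inputs into account.

```cpp
#include <optional>

// Hypothetical model of the described ergonomics; not the HotSpot implementation.
//   aot_cache_specified: whether -XX:AOTCache=<file> was passed.
//   user_setting: an explicit -XX:+AOTAdapterCaching / -XX:-AOTAdapterCaching,
//                 or empty if the user did not set the flag.
bool resolve_aot_adapter_caching(bool aot_cache_specified,
                                 std::optional<bool> user_setting) {
  if (!aot_cache_specified) {
    // Per the description, the flag is ignored when no AOT cache is specified.
    return false;
  }
  // With an AOT cache, caching is enabled unless the user set the flag explicitly.
  return user_setting.value_or(true);
}
```

Under this model, the second `perf stat` command in the quoted text (AOT cache plus `-XX:-AOTAdapterCaching`) corresponds to the one case where a cache is present but adapter caching is off.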
src/hotspot/share/code/aotCodeCache.hpp line 340: > 338: static const char* add_C_string(const char* str) NOT_CDS_RETURN_(str); > 339: > 340: static const char* add_C_string2(const char* str) NOT_CDS_RETURN_(str); This method is not used. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24740#discussion_r2066731291 From amitkumar at openjdk.org Tue Apr 29 14:47:51 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 29 Apr 2025 14:47:51 GMT Subject: RFR: 8353500: [s390x] Intrinsify Unsafe::setMemory [v4] In-Reply-To: References: Message-ID: On Tue, 29 Apr 2025 09:02:45 GMT, Martin Doerr wrote: > How is the behavior of mvc specified when hitting a signal (SIGSEGV or SIGBUS)? Will all Bytes before that place be written? I guess that is the reason why the C++ code uses the atomic version. Did you check which instructions are generated by the C++ compiler? I couldn't find any such information in the book, but I will keep looking. @RealLucy can you help :-) I looked at the code generated by the C++ compiler initially. I see some further improvements being generated but that will come with additional checks. But in the end you can see the `mvc` instruction being used for the unaligned case:
But in the end you can see the `mvc` instruction being used for unaligned case: (gdb) disassemble Copy::fill_to_memory_atomic(void*, unsigned long, unsigned char) Dump of assembler code for function _ZN4Copy21fill_to_memory_atomicEPvmh: 0x000003fffc7b7358 <+0>: lgr %r0,%r2 0x000003fffc7b735c <+4>: ogr %r0,%r3 0x000003fffc7b7360 <+8>: lgr %r1,%r4 0x000003fffc7b7364 <+12>: risbgz %r5,%r0,61,63,0 0x000003fffc7b736a <+18>: jne 0x3fffc7b73aa <_ZN4Copy21fill_to_memory_atomicEPvmh+82> 0x000003fffc7b736e <+22>: cgijne %r4,0,0x3fffc7b756e <_ZN4Copy21fill_to_memory_atomicEPvmh+534> 0x000003fffc7b7374 <+28>: lghi %r4,0 0x000003fffc7b7378 <+32>: cgije %r3,0,0x3fffc7b7598 <_ZN4Copy21fill_to_memory_atomicEPvmh+576> 0x000003fffc7b737e <+38>: lgr %r1,%r3 0x000003fffc7b7382 <+42>: aghi %r1,-1 0x000003fffc7b7386 <+46>: aghi %r3,7 0x000003fffc7b738a <+50>: srlg %r1,%r1,3 0x000003fffc7b7390 <+56>: aghi %r1,1 0x000003fffc7b7394 <+60>: clgijnh %r3,7,0x3fffc7b7590 <_ZN4Copy21fill_to_memory_atomicEPvmh+568> 0x000003fffc7b739a <+66>: stg %r4,0(%r5,%r2) 0x000003fffc7b73a0 <+72>: aghi %r5,8 0x000003fffc7b73a4 <+76>: brctg %r1,0x3fffc7b739a <_ZN4Copy21fill_to_memory_atomicEPvmh+66> 0x000003fffc7b73a8 <+80>: br %r14 0x000003fffc7b73aa <+82>: risbgz %r4,%r0,62,63,0 0x000003fffc7b73b0 <+88>: ldgr %f0,%r11 0x000003fffc7b73b4 <+92>: ldgr %f1,%r7 0x000003fffc7b73b8 <+96>: ldgr %f6,%r8 0x000003fffc7b73bc <+100>: ldgr %f4,%r9 0x000003fffc7b73c0 <+104>: ldgr %f2,%r10 0x000003fffc7b73c4 <+108>: lgr %r11,%r15 0x000003fffc7b73c8 <+112>: jne 0x3fffc7b747c <_ZN4Copy21fill_to_memory_atomicEPvmh+292> 0x000003fffc7b73cc <+116>: cgijne %r1,0,0x3fffc7b759a <_ZN4Copy21fill_to_memory_atomicEPvmh+578> 0x000003fffc7b73d2 <+122>: lhi %r5,0 0x000003fffc7b73d6 <+126>: cgije %r3,0,0x3fffc7b7466 <_ZN4Copy21fill_to_memory_atomicEPvmh+270> 0x000003fffc7b73dc <+132>: lghi %r0,-4 0x000003fffc7b73e0 <+136>: lgr %r1,%r3 0x000003fffc7b73e4 <+140>: aghi %r1,-1 0x000003fffc7b73e8 <+144>: clgrjh %r3,%r0,0x3fffc7b7626 
<_ZN4Copy21fill_to_memory_atomicEPvmh+718> 0x000003fffc7b73ee <+150>: clgijnh %r1,103,0x3fffc7b7626 <_ZN4Copy21fill_to_memory_atomicEPvmh+718> 0x000003fffc7b73f4 <+156>: srlg %r0,%r1,2 0x000003fffc7b73fa <+162>: risbgz %r3,%r2,63,63,62 0x000003fffc7b7400 <+168>: aghi %r0,1 0x000003fffc7b7404 <+172>: cgije %r3,0,0x3fffc7b760e <_ZN4Copy21fill_to_memory_atomicEPvmh+694> 0x000003fffc7b740a <+178>: st %r5,0(%r2) 0x000003fffc7b740e <+182>: lghi %r8,4 0x000003fffc7b7412 <+186>: sgr %r0,%r3 0x000003fffc7b7416 <+190>: sllg %r3,%r3,2 0x000003fffc7b741c <+196>: lghi %r9,0 --Type for more, q to quit, c to continue without paging--c 0x000003fffc7b7420 <+200>: risbg %r9,%r5,0,31,32 0x000003fffc7b7426 <+206>: lr %r9,%r5 0x000003fffc7b7428 <+208>: srlg %r1,%r0,1 0x000003fffc7b742e <+214>: la %r3,0(%r3,%r2) 0x000003fffc7b7432 <+218>: cgije %r1,0,0x3fffc7b7606 <_ZN4Copy21fill_to_memory_atomicEPvmh+686> 0x000003fffc7b7438 <+224>: sllg %r10,%r4,3 0x000003fffc7b743e <+230>: aghi %r4,1 0x000003fffc7b7442 <+234>: stg %r9,0(%r10,%r3) 0x000003fffc7b7448 <+240>: brctg %r1,0x3fffc7b7438 <_ZN4Copy21fill_to_memory_atomicEPvmh+224> 0x000003fffc7b744c <+244>: risbgz %r3,%r0,0,62,0 0x000003fffc7b7452 <+250>: sllg %r1,%r3,2 0x000003fffc7b7458 <+256>: agr %r1,%r8 0x000003fffc7b745c <+260>: cgrje %r0,%r3,0x3fffc7b7466 <_ZN4Copy21fill_to_memory_atomicEPvmh+270> 0x000003fffc7b7462 <+266>: st %r5,0(%r1,%r2) 0x000003fffc7b7466 <+270>: lgdr %r11,%f0 0x000003fffc7b746a <+274>: lgdr %r10,%f2 0x000003fffc7b746e <+278>: lgdr %r9,%f4 0x000003fffc7b7472 <+282>: lgdr %r8,%f6 0x000003fffc7b7476 <+286>: lgdr %r7,%f1 0x000003fffc7b747a <+290>: br %r14 0x000003fffc7b747c <+292>: risbgz %r4,%r0,63,63,0 0x000003fffc7b7482 <+298>: lr %r5,%r1 0x000003fffc7b7484 <+300>: jne 0x3fffc7b75ae <_ZN4Copy21fill_to_memory_atomicEPvmh+598> 0x000003fffc7b7488 <+304>: lr %r0,%r1 0x000003fffc7b748a <+306>: sll %r0,8 0x000003fffc7b748e <+310>: ar %r0,%r1 0x000003fffc7b7490 <+312>: lr %r5,%r0 0x000003fffc7b7492 <+314>: cgije 
%r3,0,0x3fffc7b7466 <_ZN4Copy21fill_to_memory_atomicEPvmh+270> 0x000003fffc7b7498 <+320>: lgr %r1,%r3 0x000003fffc7b749c <+324>: aghi %r1,-1 0x000003fffc7b74a0 <+328>: clgijnh %r1,39,0x3fffc7b7674 <_ZN4Copy21fill_to_memory_atomicEPvmh+796> 0x000003fffc7b74a6 <+334>: cgije %r3,-1,0x3fffc7b7674 <_ZN4Copy21fill_to_memory_atomicEPvmh+796> 0x000003fffc7b74ac <+340>: srlg %r5,%r2,1 0x000003fffc7b74b2 <+346>: srlg %r9,%r1,1 0x000003fffc7b74b8 <+352>: lcgr %r5,%r5 0x000003fffc7b74bc <+356>: aghi %r9,1 0x000003fffc7b74c0 <+360>: risbgz %r5,%r5,62,63,0 0x000003fffc7b74c6 <+366>: je 0x3fffc7b766c <_ZN4Copy21fill_to_memory_atomicEPvmh+788> 0x000003fffc7b74ca <+370>: sth %r0,0(%r2) 0x000003fffc7b74ce <+374>: cgije %r5,1,0x3fffc7b7664 <_ZN4Copy21fill_to_memory_atomicEPvmh+780> 0x000003fffc7b74d4 <+380>: sth %r0,2(%r2) 0x000003fffc7b74d8 <+384>: cgijne %r5,3,0x3fffc7b7616 <_ZN4Copy21fill_to_memory_atomicEPvmh+702> 0x000003fffc7b74de <+390>: sth %r0,4(%r2) 0x000003fffc7b74e2 <+394>: lghi %r7,6 0x000003fffc7b74e6 <+398>: lghi %r8,0 0x000003fffc7b74ea <+402>: risbg %r8,%r0,0,15,48 0x000003fffc7b74f0 <+408>: sgr %r9,%r5 0x000003fffc7b74f4 <+412>: sllg %r5,%r5,1 0x000003fffc7b74fa <+418>: risbg %r8,%r0,16,31,32 0x000003fffc7b7500 <+424>: srlg %r1,%r9,2 0x000003fffc7b7506 <+430>: la %r5,0(%r5,%r2) 0x000003fffc7b750a <+434>: risbg %r8,%r0,32,47,16 0x000003fffc7b7510 <+440>: risbg %r8,%r0,48,63,0 0x000003fffc7b7516 <+446>: cgije %r1,0,0x3fffc7b761e <_ZN4Copy21fill_to_memory_atomicEPvmh+710> 0x000003fffc7b751c <+452>: sllg %r10,%r4,3 0x000003fffc7b7522 <+458>: aghi %r4,1 0x000003fffc7b7526 <+462>: stg %r8,0(%r10,%r5) 0x000003fffc7b752c <+468>: brctg %r1,0x3fffc7b751c <_ZN4Copy21fill_to_memory_atomicEPvmh+452> 0x000003fffc7b7530 <+472>: risbgz %r4,%r9,0,61,0 0x000003fffc7b7536 <+478>: sllg %r1,%r4,1 0x000003fffc7b753c <+484>: agr %r1,%r7 0x000003fffc7b7540 <+488>: cgrje %r9,%r4,0x3fffc7b7466 <_ZN4Copy21fill_to_memory_atomicEPvmh+270> 0x000003fffc7b7546 <+494>: sth %r0,0(%r1,%r2) 
0x000003fffc7b754a <+498>: lgr %r4,%r1 0x000003fffc7b754e <+502>: aghi %r4,2 0x000003fffc7b7552 <+506>: clgrjnl %r4,%r3,0x3fffc7b7466 <_ZN4Copy21fill_to_memory_atomicEPvmh+270> 0x000003fffc7b7558 <+512>: sth %r0,2(%r1,%r2) 0x000003fffc7b755c <+516>: la %r4,2(%r4) 0x000003fffc7b7560 <+520>: clgrjnl %r4,%r3,0x3fffc7b7466 <_ZN4Copy21fill_to_memory_atomicEPvmh+270> 0x000003fffc7b7566 <+526>: sth %r0,4(%r1,%r2) 0x000003fffc7b756a <+530>: j 0x3fffc7b7466 <_ZN4Copy21fill_to_memory_atomicEPvmh+270> 0x000003fffc7b756e <+534>: sllg %r4,%r4,8 0x000003fffc7b7574 <+540>: agr %r4,%r1 0x000003fffc7b7578 <+544>: sllg %r1,%r4,16 0x000003fffc7b757e <+550>: agr %r4,%r1 0x000003fffc7b7582 <+554>: sllg %r1,%r4,32 0x000003fffc7b7588 <+560>: agr %r4,%r1 0x000003fffc7b758c <+564>: j 0x3fffc7b7378 <_ZN4Copy21fill_to_memory_atomicEPvmh+32> 0x000003fffc7b7590 <+568>: lghi %r1,1 0x000003fffc7b7594 <+572>: j 0x3fffc7b739a <_ZN4Copy21fill_to_memory_atomicEPvmh+66> 0x000003fffc7b7598 <+576>: br %r14 0x000003fffc7b759a <+578>: lr %r5,%r1 0x000003fffc7b759c <+580>: sll %r5,8 0x000003fffc7b75a0 <+584>: ar %r5,%r1 0x000003fffc7b75a2 <+586>: lr %r1,%r5 0x000003fffc7b75a4 <+588>: sll %r1,16 0x000003fffc7b75a8 <+592>: ar %r5,%r1 0x000003fffc7b75aa <+594>: j 0x3fffc7b73d6 <_ZN4Copy21fill_to_memory_atomicEPvmh+126> 0x000003fffc7b75ae <+598>: cgije %r3,0,0x3fffc7b7466 <_ZN4Copy21fill_to_memory_atomicEPvmh+270> 0x000003fffc7b75b4 <+604>: lgr %r4,%r2 0x000003fffc7b75b8 <+608>: cgije %r3,1,0x3fffc7b76b2 <_ZN4Copy21fill_to_memory_atomicEPvmh+858> 0x000003fffc7b75be <+614>: aghi %r3,-2 0x000003fffc7b75c2 <+618>: srlg %r2,%r3,8 0x000003fffc7b75c8 <+624>: cgije %r2,0,0x3fffc7b75e6 <_ZN4Copy21fill_to_memory_atomicEPvmh+654> 0x000003fffc7b75ce <+630>: pfd 2,1024(%r4) 0x000003fffc7b75d4 <+636>: stc %r5,0(%r4) 0x000003fffc7b75d8 <+640>: mvc 1(255,%r4),0(%r4) 0x000003fffc7b75de <+646>: la %r4,256(%r4) 0x000003fffc7b75e2 <+650>: brctg %r2,0x3fffc7b75ce <_ZN4Copy21fill_to_memory_atomicEPvmh+630> 0x000003fffc7b75e6 
<+654>: stc %r1,0(%r4) 0x000003fffc7b75ea <+658>: exrl %r3,0x3fffc7b76ba <_ZN4Copy21fill_to_memory_atomicEPvmh+866> 0x000003fffc7b75f0 <+664>: lgdr %r11,%f0 0x000003fffc7b75f4 <+668>: lgdr %r10,%f2 0x000003fffc7b75f8 <+672>: lgdr %r9,%f4 0x000003fffc7b75fc <+676>: lgdr %r8,%f6 0x000003fffc7b7600 <+680>: lgdr %r7,%f1 0x000003fffc7b7604 <+684>: br %r14 0x000003fffc7b7606 <+686>: lghi %r1,1 0x000003fffc7b760a <+690>: j 0x3fffc7b7438 <_ZN4Copy21fill_to_memory_atomicEPvmh+224> 0x000003fffc7b760e <+694>: lghi %r8,0 0x000003fffc7b7612 <+698>: j 0x3fffc7b7412 <_ZN4Copy21fill_to_memory_atomicEPvmh+186> 0x000003fffc7b7616 <+702>: lghi %r7,4 0x000003fffc7b761a <+706>: j 0x3fffc7b74e6 <_ZN4Copy21fill_to_memory_atomicEPvmh+398> 0x000003fffc7b761e <+710>: lghi %r1,1 0x000003fffc7b7622 <+714>: j 0x3fffc7b751c <_ZN4Copy21fill_to_memory_atomicEPvmh+452> 0x000003fffc7b7626 <+718>: lgr %r1,%r3 0x000003fffc7b762a <+722>: aghi %r1,-1 0x000003fffc7b762e <+726>: lgr %r0,%r3 0x000003fffc7b7632 <+730>: aghi %r0,3 0x000003fffc7b7636 <+734>: srlg %r1,%r1,2 0x000003fffc7b763c <+740>: aghi %r1,1 0x000003fffc7b7640 <+744>: clgijnh %r0,3,0x3fffc7b765c <_ZN4Copy21fill_to_memory_atomicEPvmh+772> 0x000003fffc7b7646 <+750>: cgije %r3,0,0x3fffc7b765c <_ZN4Copy21fill_to_memory_atomicEPvmh+772> 0x000003fffc7b764c <+756>: st %r5,0(%r4,%r2) 0x000003fffc7b7650 <+760>: aghi %r4,4 0x000003fffc7b7654 <+764>: brctg %r1,0x3fffc7b764c <_ZN4Copy21fill_to_memory_atomicEPvmh+756> 0x000003fffc7b7658 <+768>: j 0x3fffc7b7466 <_ZN4Copy21fill_to_memory_atomicEPvmh+270> 0x000003fffc7b765c <+772>: lghi %r1,1 0x000003fffc7b7660 <+776>: j 0x3fffc7b764c <_ZN4Copy21fill_to_memory_atomicEPvmh+756> 0x000003fffc7b7664 <+780>: lghi %r7,2 0x000003fffc7b7668 <+784>: j 0x3fffc7b74e6 <_ZN4Copy21fill_to_memory_atomicEPvmh+398> 0x000003fffc7b766c <+788>: lghi %r7,0 0x000003fffc7b7670 <+792>: j 0x3fffc7b74e6 <_ZN4Copy21fill_to_memory_atomicEPvmh+398> 0x000003fffc7b7674 <+796>: lgr %r1,%r3 0x000003fffc7b7678 <+800>: aghi %r1,-1 
0x000003fffc7b767c <+804>: lgr %r0,%r3 0x000003fffc7b7680 <+808>: aghi %r0,1 0x000003fffc7b7684 <+812>: srlg %r1,%r1,1 0x000003fffc7b768a <+818>: aghi %r1,1 0x000003fffc7b768e <+822>: clgijnh %r0,1,0x3fffc7b76aa <_ZN4Copy21fill_to_memory_atomicEPvmh+850> 0x000003fffc7b7694 <+828>: cgije %r3,0,0x3fffc7b76aa <_ZN4Copy21fill_to_memory_atomicEPvmh+850> 0x000003fffc7b769a <+834>: sth %r5,0(%r4,%r2) 0x000003fffc7b769e <+838>: aghi %r4,2 0x000003fffc7b76a2 <+842>: brctg %r1,0x3fffc7b769a <_ZN4Copy21fill_to_memory_atomicEPvmh+834> 0x000003fffc7b76a6 <+846>: j 0x3fffc7b7466 <_ZN4Copy21fill_to_memory_atomicEPvmh+270> 0x000003fffc7b76aa <+850>: lghi %r1,1 0x000003fffc7b76ae <+854>: j 0x3fffc7b769a <_ZN4Copy21fill_to_memory_atomicEPvmh+834> 0x000003fffc7b76b2 <+858>: stc %r1,0(%r2) 0x000003fffc7b76b6 <+862>: j 0x3fffc7b7466 <_ZN4Copy21fill_to_memory_atomicEPvmh+270> 0x000003fffc7b76ba <+866>: mvc 1(1,%r4),0(%r4) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24480#issuecomment-2839202077 From asmehra at openjdk.org Tue Apr 29 14:49:51 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Tue, 29 Apr 2025 14:49:51 GMT Subject: RFR: 8350209: Preserve adapters in AOT cache [v10] In-Reply-To: References: Message-ID: On Tue, 29 Apr 2025 06:27:41 GMT, Vladimir Kozlov wrote: >> [JEP 483](https://bugs.openjdk.org/browse/JDK-8315737) preserves class information in AOT cache which helps Java startup performance. >> >> We should also preserve adapters (i2c, c2i) to further improve performance of class linking where adapters are generated. >> >> Short running Java application can see several percents improvement. 
I got 6% improvement when ran `HelloWorld.java` on Linux-x64 Ice Lake CPU (2.5Ghz): >> >> >> (perf stat -r 100 java -XX:AOTCache=app.aotcache -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed >> 0.0299401 +- 0.0000504 seconds time elapsed ( +- 0.17% ) >> >> (perf stat -r 100 java -XX:AOTCache=app.aotcache -XX:+UnlockDiagnosticVMOptions -XX:-AOTAdapterCaching -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed >> 0.0318654 +- 0.0000535 seconds time elapsed ( +- 0.17% ) >> >> >> New diagnostic flags are introduced (use `-XX:+UnlockDiagnosticVMOptions` to unlock them): >> >> >> -XX:+AOTAdapterCaching - Enable or disable saving and restoring i2c2i adapters >> -XX:AOTCodeMaxSize=10*M - buffer size in bytes for AOT code caching >> -XX:+AbortVMOnAOTCodeFailure - Abort VM on the first occurrence of AOT code caching failure >> >> By default `AOTAdapterCaching` is `false` and enabled ergonomically when `-XX:AOTCache` is specified. >> This flag is ignored when `AOTCache` is not specified. >> >> To use AOT adapters follow process described in JEP 483: >> >> >> java -XX:AOTMode=record -XX:AOTConfiguration=app.aotconf -cp app.jar App >> java -XX:AOTMode=create -XX:AOTConfiguration=app.aotconf -XX:AOTCache=app.aot -cp app.jar >> java -XX:AOTCache=app.aot -cp app.jar App >> >> >> There are several new UL flag combinations to trace the AOT code caching process: >> >> >> -Xlog:aot+codecache+init -Xlog:aot+codecache+exit -Xlog:aot+codecache+stubs >> >> >> @ashu-mehra is main author of changes. He implemented adapters caching. >> I did main framework (`AOTCodeCache` class) for saving and loading AOT code. >> >> Tested tier1-6,10, which includes tests with `AOTClassLinking` enabled. Also Xcomp,stress and JCK. > > Vladimir Kozlov has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 11 commits: > > - Fix C strings caching > - Merge branch 'master' into JDK-8350209 > - Merge branch 'master' into JDK-8350209 > - Downgraded UL as asked. Added synchronization to C strings caching. > - Fix message > - Generate far jumps for AOT code on AArch64 > - remove _enabled suffix > - Add sanity test for AOTAdapterCaching flag > - AOT code flags are ignored when AOTCache is not specified. Set range for AOTCodeMaxSize values. > - Removed unused AOTCodeSection class > - ... and 1 more: https://git.openjdk.org/jdk/compare/7cf190fb...1b0c89f6 Just to note there is still some code related to blobs but we are not storing them yet in the aot code cache. But I think we would need them pretty soon. So I think it should be okay to keep that code as is. rest lgtm ------------- PR Comment: https://git.openjdk.org/jdk/pull/24740#issuecomment-2839210239 PR Comment: https://git.openjdk.org/jdk/pull/24740#issuecomment-2839211140 From shade at openjdk.org Tue Apr 29 15:07:50 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 29 Apr 2025 15:07:50 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v6] In-Reply-To: References: <49DMOKtJkD7AtmJuFif9VIIZmM4VYYFYmb-aUmXnG7Q=.7b926824-a81e-4b38-902e-16191b4e46ac@github.com> Message-ID: On Tue, 29 Apr 2025 09:18:59 GMT, Aleksey Shipilev wrote: >> Ok, thanks for checking! Good to know there's no existing bug. >> >> What I had in mind is as follows: >> >> InstanceKlass* holder = method->method_holder(); >> if (holder->class_loader_data()->is_permanent_class_loader_data()) { >> return nullptr; // method holder class can't be unloaded >> } else { >> // Normal class, return the holder that would block unloading. >> // This would be either classloader oop for non-hidden classes, >> // or Java mirror oop for hidden classes. 
>> assert(holder->klass_holder() != nullptr, ""); >> return holder->klass_holder(); >> } >> >> >> IMO it makes the check more precise and, at the same time, communicates the intent better. What do you think? > > Yes, OK, let's do a variant of that. Committed. I'll re-run test to see if there are any surprises about these asserts. Testing is still green, no surprises. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2066776548 From shade at openjdk.org Tue Apr 29 15:15:47 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 29 Apr 2025 15:15:47 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v10] In-Reply-To: References: Message-ID: On Tue, 29 Apr 2025 09:22:10 GMT, Aleksey Shipilev wrote: >> [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. >> >> The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. >> >> This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. >> >> It is possible to mitigate this issue by splitting the related fields into weak and strong ones.
But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Improve get_method_blocker @kimbarrett, is this what you had in mind when suggesting this originally: "But I don't think that's the way to go, because I think this code shouldn't be using JNIHandles and jobjects at all. It should be using oop* from VMGlobal and VMWeak."? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24018#issuecomment-2839296934 From kvn at openjdk.org Tue Apr 29 15:21:46 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 29 Apr 2025 15:21:46 GMT Subject: RFR: 8355769: Optimize nmethod dependency recording [v2] In-Reply-To: <1uNBgSbbBgdiHXyYL1fkY6HIKaH9CgwF5kN8ABlt3xo=.07c945af-3840-45d8-a1b3-2fe55c52df81@github.com> References: <1uNBgSbbBgdiHXyYL1fkY6HIKaH9CgwF5kN8ABlt3xo=.07c945af-3840-45d8-a1b3-2fe55c52df81@github.com> Message-ID: <9gAx0rQZkbjxoNdEHs7DEbM4LKU0-yhI8N6Ro2xOZ10=.66056776-715f-4597-b1d4-caeed0b54e39@github.com> On Tue, 29 Apr 2025 12:47:25 GMT, Aleksey Shipilev wrote: >> During nmethod installation, we record the dependencies between InstanceKlass/CallSite and newly coming `nmethod`. In `DependencyContext::add_dependent_nmethod`, we are linearly scanning to see if the `nmethod` is already in the dependencies list. This costs quite a bit, especially with lots of compiled methods per IK. >> >> This is not a significant issue for normal JIT compilations, where the JIT costs dominate. 
But for Leyden, this kind of scan is a significant part of AOT code installation. For example in well-trained javac runs, there are chains of 500+ `nmethods` for some IKs that take 10+ us to scan. This is easily half of the entire AOT method installation cost. >> >> Fortunately, the way we do the nmethod dependency recording, it allows us to shortcut the scan. Since dependency recording holds the `CodeCache_lock` while adding new `nmethod` all over the various dependency lists, those dependency lists are ever in two states: no `nmethod` in the chain (no need to scan!), or `nmethod` is at the head of the chain (no need to scan!). >> >> Additional testing: >> - [x] Ad-hoc benchmarks >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Fiddle with locks Let me test it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24933#issuecomment-2839317685 From kvn at openjdk.org Tue Apr 29 15:29:08 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 29 Apr 2025 15:29:08 GMT Subject: RFR: 8350209: Preserve adapters in AOT cache [v11] In-Reply-To: References: Message-ID: > [JEP 483](https://bugs.openjdk.org/browse/JDK-8315737) preserves class information in AOT cache which helps Java startup performance. > > We should also preserve adapters (i2c, c2i) to further improve performance of class linking where adapters are generated. > > Short running Java application can see several percents improvement. 
I got 6% improvement when ran `HelloWorld.java` on Linux-x64 Ice Lake CPU (2.5Ghz): > > > (perf stat -r 100 java -XX:AOTCache=app.aotcache -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed > 0.0299401 +- 0.0000504 seconds time elapsed ( +- 0.17% ) > > (perf stat -r 100 java -XX:AOTCache=app.aotcache -XX:+UnlockDiagnosticVMOptions -XX:-AOTAdapterCaching -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed > 0.0318654 +- 0.0000535 seconds time elapsed ( +- 0.17% ) > > > New diagnostic flags are introduced (use `-XX:+UnlockDiagnosticVMOptions` to unlock them): > > > -XX:+AOTAdapterCaching - Enable or disable saving and restoring i2c2i adapters > -XX:AOTCodeMaxSize=10*M - buffer size in bytes for AOT code caching > -XX:+AbortVMOnAOTCodeFailure - Abort VM on the first occurrence of AOT code caching failure > > By default `AOTAdapterCaching` is `false` and enabled ergonomically when `-XX:AOTCache` is specified. > This flag is ignored when `AOTCache` is not specified. > > To use AOT adapters follow process described in JEP 483: > > > java -XX:AOTMode=record -XX:AOTConfiguration=app.aotconf -cp app.jar App > java -XX:AOTMode=create -XX:AOTConfiguration=app.aotconf -XX:AOTCache=app.aot -cp app.jar > java -XX:AOTCache=app.aot -cp app.jar App > > > There are several new UL flag combinations to trace the AOT code caching process: > > > -Xlog:aot+codecache+init -Xlog:aot+codecache+exit -Xlog:aot+codecache+stubs > > > @ashu-mehra is main author of changes. He implemented adapters caching. > I did main framework (`AOTCodeCache` class) for saving and loading AOT code. > > Tested tier1-6,10, which includes tests with `AOTClassLinking` enabled. Also Xcomp,stress and JCK. 
Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: Remove unused method ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24740/files - new: https://git.openjdk.org/jdk/pull/24740/files/1b0c89f6..b2466e6e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24740&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24740&range=09-10 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24740.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24740/head:pull/24740 PR: https://git.openjdk.org/jdk/pull/24740 From kvn at openjdk.org Tue Apr 29 15:29:16 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 29 Apr 2025 15:29:16 GMT Subject: RFR: 8350209: Preserve adapters in AOT cache [v8] In-Reply-To: References: Message-ID: On Tue, 29 Apr 2025 14:42:42 GMT, Ashutosh Mehra wrote: >> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: >> >> Downgraded UL as asked. Added synchronization to C strings caching. > > src/hotspot/share/code/aotCodeCache.hpp line 340: > >> 338: static const char* add_C_string(const char* str) NOT_CDS_RETURN_(str); >> 339: >> 340: static const char* add_C_string2(const char* str) NOT_CDS_RETURN_(str); > > This method is not used. removed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24740#discussion_r2066822894 From asmehra at openjdk.org Tue Apr 29 15:40:56 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Tue, 29 Apr 2025 15:40:56 GMT Subject: RFR: 8350209: Preserve adapters in AOT cache [v11] In-Reply-To: References: Message-ID: On Tue, 29 Apr 2025 15:29:08 GMT, Vladimir Kozlov wrote: >> [JEP 483](https://bugs.openjdk.org/browse/JDK-8315737) preserves class information in AOT cache which helps Java startup performance. 
>> >> We should also preserve adapters (i2c, c2i) to further improve performance of class linking where adapters are generated. >> >> Short running Java application can see several percents improvement. I got 6% improvement when ran `HelloWorld.java` on Linux-x64 Ice Lake CPU (2.5Ghz): >> >> >> (perf stat -r 100 java -XX:AOTCache=app.aotcache -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed >> 0.0299401 +- 0.0000504 seconds time elapsed ( +- 0.17% ) >> >> (perf stat -r 100 java -XX:AOTCache=app.aotcache -XX:+UnlockDiagnosticVMOptions -XX:-AOTAdapterCaching -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed >> 0.0318654 +- 0.0000535 seconds time elapsed ( +- 0.17% ) >> >> >> New diagnostic flags are introduced (use `-XX:+UnlockDiagnosticVMOptions` to unlock them): >> >> >> -XX:+AOTAdapterCaching - Enable or disable saving and restoring i2c2i adapters >> -XX:AOTCodeMaxSize=10*M - buffer size in bytes for AOT code caching >> -XX:+AbortVMOnAOTCodeFailure - Abort VM on the first occurrence of AOT code caching failure >> >> By default `AOTAdapterCaching` is `false` and enabled ergonomically when `-XX:AOTCache` is specified. >> This flag is ignored when `AOTCache` is not specified. >> >> To use AOT adapters follow process described in JEP 483: >> >> >> java -XX:AOTMode=record -XX:AOTConfiguration=app.aotconf -cp app.jar App >> java -XX:AOTMode=create -XX:AOTConfiguration=app.aotconf -XX:AOTCache=app.aot -cp app.jar >> java -XX:AOTCache=app.aot -cp app.jar App >> >> >> There are several new UL flag combinations to trace the AOT code caching process: >> >> >> -Xlog:aot+codecache+init -Xlog:aot+codecache+exit -Xlog:aot+codecache+stubs >> >> >> @ashu-mehra is main author of changes. He implemented adapters caching. >> I did main framework (`AOTCodeCache` class) for saving and loading AOT code. >> >> Tested tier1-6,10, which includes tests with `AOTClassLinking` enabled. Also Xcomp,stress and JCK. 
> > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Remove unused method Marked as reviewed by asmehra (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24740#pullrequestreview-2804162746 From kvn at openjdk.org Tue Apr 29 16:08:59 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 29 Apr 2025 16:08:59 GMT Subject: RFR: 8350209: Preserve adapters in AOT cache [v10] In-Reply-To: References: Message-ID: On Tue, 29 Apr 2025 14:47:42 GMT, Ashutosh Mehra wrote: >> Vladimir Kozlov has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: >> >> - Fix C strings caching >> - Merge branch 'master' into JDK-8350209 >> - Merge branch 'master' into JDK-8350209 >> - Downgraded UL as asked. Added synchronization to C strings caching. >> - Fix message >> - Generate far jumps for AOT code on AArch64 >> - remove _enabled suffix >> - Add sanity test for AOTAdapterCaching flag >> - AOT code flags are ignored when AOTCache is not specified. Set range for AOTCodeMaxSize values. >> - Removed unused AOTCodeSection class >> - ... and 1 more: https://git.openjdk.org/jdk/compare/7cf190fb...1b0c89f6 > > rest lgtm Thank you, @ashu-mehra, for review! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24740#issuecomment-2839457011 From epeter at openjdk.org Tue Apr 29 16:21:46 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 29 Apr 2025 16:21:46 GMT Subject: RFR: 8347515: C2: assert(!success || (C->macro_count() == (old_macro_count - 1))) failed: elimination must have deleted one node from macro list In-Reply-To: References: Message-ID: On Fri, 25 Apr 2025 21:56:10 GMT, Saranya Natarajan wrote: > Issue: The assertion failure, `assert(!success || (C->macro_count() == (old_macro_count - 1))) failed: elimination must have deleted one node from macro list`, occurs when [loop strip mining](https://bugs.openjdk.org/browse/JDK-8186027) may create a [MaxL](https://bugs.openjdk.org/browse/JDK-8324655) after macro expansion. > > Analysis: Before the macro nodes are expanded in `expand_macro_nodes`, there is a process where nodes from the macro list are eliminated. This also includes elimination of any `OuterStripMinedLoop` node in the macro list. The bug occurs due to the refining of the strip mined loop in `adjust_strip_mined_loop` function just before it is eliminated. In this case, a `MaxL` node is added to the macro list in `adjust_strip_mined_loop`. > > Fix: The fix involves performing the refining of the strip mined loop before the elimination process. More specifically, moving the `adjust_strip_mined_loop` function outside the elimination loop. > > Improvement: The process of eliminating macro nodes by calling `eliminate_macro_nodes` and performing additional Opaque and LoopLimit nodes elimination in `expand_macro_nodes` is unintuitive as suggested in [JDK-8325478](https://bugs.openjdk.org/browse/JDK-8325478) and the current fix should be moved along with the other elimination code. @sarannat The fix looks reasonable to me, and looking forward to the follow-up JDK-8325478 :) It looks like the JBS issue has some `Reduced.java`, which is from a fuzzer `Test.java`, correct?
Can you attach that one as a JTREG test here? Or is there already a very good and minimal reproducer in the repository? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24890#issuecomment-2839499146 From epeter at openjdk.org Tue Apr 29 16:29:51 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 29 Apr 2025 16:29:51 GMT Subject: RFR: 8352422: [ubsan] Out-of-range reported in ciMethod.cpp:917:20: runtime error: 2.68435e+09 is outside the range of representable values of type 'int' In-Reply-To: References: Message-ID: <0w6QGqakZpPeVMuHTef6H4Bv-iXWCsycZB4jpd1ZO_0=.7c4669c3-cccf-4ea4-8546-c3c0b883bacf@github.com> On Wed, 23 Apr 2025 10:58:54 GMT, Marc Chevalier wrote: > The double `(double)count * prof_factor * method_life / counter_life + 0.5` > can overflow a 32-bit int, causing UB on casting, but in practice computing > a wrong scale, probably. > > We just need to compare that the cast is not going to overflow. This is possible > because `INT_MAX` is exactly representable in a `double`. It is also good to > notice that the expression `(double)count * prof_factor * method_life / counter_life + 0.5` > cannot overflow a `double`: > - `count` is an int, max value = 2^31-1 < 2.2e9 > - `method_life` is an int, max value < 2.2e9 > - `prof_factor` is a float, max value < 3.5e38 > - `counter_life` is an int, positive at this point, so min value = 1 > So, the whole expression is bounded by 16.94e56 + 0.5, which is much smaller than the > max value of a double (about 1.8e308). We probably would have precision issues, but > it probably doesn't matter a lot. > > The semantic I picked here is basically `min(INT_MAX, count_d)`, so it'd always fit. > > Thanks, > Marc Looks reasonable, I would just add a little comment to the code.
src/hotspot/share/ci/ciMethod.cpp line 919: > 917: double count_d = (double)count * prof_factor * method_life / counter_life + 0.5; > 918: if (count_d >= static_cast<double>(INT_MAX)) { > 919: count = INT_MAX; Suggestion: // Clamp in case of overflowing int range. count = INT_MAX; ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24824#pullrequestreview-2804298750 PR Review Comment: https://git.openjdk.org/jdk/pull/24824#discussion_r2066928575 From epeter at openjdk.org Tue Apr 29 16:36:49 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 29 Apr 2025 16:36:49 GMT Subject: RFR: 8252473: [TESTBUG] compiler tests fail with minimal VM: Unrecognized VM option In-Reply-To: References: Message-ID: <_Lde8-U9gDYDIMGDWg5aQKrWTq2suhAmkz97-xy54QU=.b73a1b9b-64dc-49f7-95d5-d00bd98a67eb@github.com> On Tue, 1 Apr 2025 16:23:49 GMT, Zdenek Zambersky wrote: >> @zzambers >> Generally we want to get away from `@requires vm.compiler2.enabled`, because it means tests are only run on C2 and not other compilers. For example if C2 is disabled and we only have C1. Or only interpreter. Or Graal ... >> >> Why not just add the compile flag `-XX:+IgnoreUnrecognizedVMOptions`? That could be a good alternative for most cases, I think. > >> Why not just add the compile flag `-XX:+IgnoreUnrecognizedVMOptions`? That could be a good alternative for most cases, I think. > > I saw that approach sometimes used as well. (My little, probably unfounded concern, would be, that typos in args could then be silently ignored.) > > I can change my PR to use `-XX:+IgnoreUnrecognizedVMOptions` instead, if that approach is preferable. @zzambers Are you still working on this?
------------- PR Comment: https://git.openjdk.org/jdk/pull/24262#issuecomment-2839535834 From mchevalier at openjdk.org Tue Apr 29 16:37:02 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Tue, 29 Apr 2025 16:37:02 GMT Subject: RFR: 8352422: [ubsan] Out-of-range reported in ciMethod.cpp:917:20: runtime error: 2.68435e+09 is outside the range of representable values of type 'int' [v2] In-Reply-To: References: Message-ID: <1rMdWncORHmyw5LXnDdyCC4-6-9ejGhNj2uLXmKI3So=.316e8847-6c33-42bc-ad79-7c0113335850@github.com> > The double `(double)count * prof_factor * method_life / counter_life + 0.5` > can overflow a 32-bit int, causing UB on casting, but in practice computing > a wrong scale, probably. > > We just need to compare that the cast is not going to overflow. This is possible > because `INT_MAX` is exactly representable in a `double`. It is also good to > notice that the expression `(double)count * prof_factor * method_life / counter_life + 0.5` > cannot overflow a `double`: > - `count` is an int, max value = 2^31-1 < 2.2e9 > - `method_life` is an int, max value < 2.2e9 > - `prof_factor` is a float, max value < 3.5e38 > - `counter_life` is an int, positive at this point, so min value = 1 > So, the whole expression is bounded by 16.94e56 + 0.5, which is much smaller than the > max value of a double (about 1.8e308). We probably would have precision issues, but > it probably doesn't matter a lot. > > The semantic I picked here is basically `min(INT_MAX, count_d)`, so it'd always fit.
> > Thanks, > Marc Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: +comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24824/files - new: https://git.openjdk.org/jdk/pull/24824/files/b5d80813..125d9c12 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24824&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24824&range=00-01 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24824.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24824/head:pull/24824 PR: https://git.openjdk.org/jdk/pull/24824 From epeter at openjdk.org Tue Apr 29 16:37:02 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 29 Apr 2025 16:37:02 GMT Subject: RFR: 8352422: [ubsan] Out-of-range reported in ciMethod.cpp:917:20: runtime error: 2.68435e+09 is outside the range of representable values of type 'int' [v2] In-Reply-To: <1rMdWncORHmyw5LXnDdyCC4-6-9ejGhNj2uLXmKI3So=.316e8847-6c33-42bc-ad79-7c0113335850@github.com> References: <1rMdWncORHmyw5LXnDdyCC4-6-9ejGhNj2uLXmKI3So=.316e8847-6c33-42bc-ad79-7c0113335850@github.com> Message-ID: <2qXklMgo-0_GQWzQCtqD_2eKwbttqHb0m6eWHYWyNZE=.278a6b14-179b-47ff-9673-2305d1481bec@github.com> On Tue, 29 Apr 2025 16:33:50 GMT, Marc Chevalier wrote: >> The double `(double)count * prof_factor * method_life / counter_life + 0.5` >> can overflow a 32-bit int, causing UB on casting, but in practice computing >> a wrong scale, probably. >> >> We just need to compare that the cast is not going to overflow. This is possible >> because `INT_MAX` is exactly representable in a `double`. 
It is also good to >> notice that the expression `(double)count * prof_factor * method_life / counter_life + 0.5` >> cannot overflow a `double`: >> - `count` is an int, max value = 2^31-1 < 2.2e9 >> - `method_life` is an int, max value < 2.2e9 >> - `prof_factor` is a float, max value < 3.5e38 >> - `counter_life` is an int, positive at this point, so min value = 1 >> So, the whole expression is bounded by 16.94e56 + 0.5, which is much smaller than the >> max value of a double (about 1.8e308). We probably would have precision issues, but >> it probably doesn't matter a lot. >> >> The semantic I picked here is basically `min(INT_MAX, count_d)`, so it'd always fit. >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > +comment Marked as reviewed by epeter (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24824#pullrequestreview-2804315847 From mchevalier at openjdk.org Tue Apr 29 16:37:02 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Tue, 29 Apr 2025 16:37:02 GMT Subject: RFR: 8352422: [ubsan] Out-of-range reported in ciMethod.cpp:917:20: runtime error: 2.68435e+09 is outside the range of representable values of type 'int' [v2] In-Reply-To: <0w6QGqakZpPeVMuHTef6H4Bv-iXWCsycZB4jpd1ZO_0=.7c4669c3-cccf-4ea4-8546-c3c0b883bacf@github.com> References: <0w6QGqakZpPeVMuHTef6H4Bv-iXWCsycZB4jpd1ZO_0=.7c4669c3-cccf-4ea4-8546-c3c0b883bacf@github.com> Message-ID: On Tue, 29 Apr 2025 16:26:16 GMT, Emanuel Peter wrote: >> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: >> >> +comment > > src/hotspot/share/ci/ciMethod.cpp line 919: > >> 917: double count_d = (double)count * prof_factor * method_life / counter_life + 0.5; >> 918: if (count_d >= static_cast<double>(INT_MAX)) { >> 919: count = INT_MAX; > > Suggestion: > > // Clamp in case of overflowing int range. > count = INT_MAX; Yes, good. Done.
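For readers following the clamping change discussed in this thread, here is a minimal standalone sketch of the idea. It is not the HotSpot code: `scale_count_clamped` is a hypothetical helper name, and only the scaling expression and the `INT_MAX` comparison are taken from the quoted diff. It assumes non-negative inputs and `counter_life > 0`, as stated in the PR description.

```cpp
#include <climits>

// Hypothetical helper, not part of ciMethod.cpp: scales a counter in
// double precision and clamps the result to the int range.
static int scale_count_clamped(int count, float prof_factor,
                               int method_life, int counter_life) {
  // Same expression as in the quoted diff; it is evaluated in double and,
  // per the bound argued in the PR description, cannot overflow a double.
  double count_d = (double)count * prof_factor * method_life / counter_life + 0.5;
  // INT_MAX is exactly representable as a double, so this comparison is exact.
  if (count_d >= static_cast<double>(INT_MAX)) {
    // Clamp instead of performing an out-of-range double-to-int cast (UB).
    return INT_MAX;
  }
  return static_cast<int>(count_d);
}
```

With `count = 268435456`, `prof_factor = 10.0f` and both lifetimes equal to 1, the intermediate value is about 2.68435e+09 (the value named in the ubsan report), so the helper returns `INT_MAX` rather than casting out of range.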
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24824#discussion_r2066936765 From kvn at openjdk.org Tue Apr 29 16:39:46 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 29 Apr 2025 16:39:46 GMT Subject: RFR: 8354284: Add more compiler test folders to tier1 runs In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 08:44:04 GMT, Marc Chevalier wrote: > Some folders in jtreg/compiler have been reported not to be run in any tier, while tier1 was probably intended, but the tier definition was mistakenly not updated. I've checked which folders are not referenced in `TEST.groups`. > > The unmentioned ones: > - `ccp` > - `ciReplay` > - `ciTypeFlow` > - `compilercontrol` > - `debug` > - `oracle` > - `predicates` > - `print` > - `relocations` > - `sharedstubs` > - `splitif` > - `tiered` > - `whitebox` > > And those, that are not test folders: > - `lib` > - `patches` > - `testlibraries` > > I'm adding `ccp`, `ciTypeFlow`, `predicates`, `sharedstubs` and `splitif` to tier1. > > The other folders seem to have been around for very long (since at least mid-2021). It's not clear how meaningful it'd be to add them/what the intent from them was. I've rather focused on the recently(-ish) added folders, that one forgot to put in a tier when adding it. > > Feel free to tell if other folders should be included (and in which tier). > > Thanks, > Marc I looked through the testing logs on the slowest machine, `macosx-x64`, and the following test took more than 10 sec to run. Consider removing it from tier1: TEST: compiler/ccp/TestAndConZeroCCP.java build: 0.36 seconds compile: 0.327 seconds main: 17.989 seconds build: 0.0 seconds main: 19.864 seconds build: 0.0 seconds main: 12.58 seconds TEST RESULT: Passed.
Execution successful ------------- PR Comment: https://git.openjdk.org/jdk/pull/24817#issuecomment-2839543605 From epeter at openjdk.org Tue Apr 29 16:44:53 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 29 Apr 2025 16:44:53 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v7] In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 16:55:15 GMT, Kangcheng Xu wrote: >> @tabjy Do you want us to review again? We generally wait for a ping, otherwise we assume you are still working on it ;) > > Hello @eme64. I pinged you in [an in-line review](https://github.com/openjdk/jdk/pull/23506#discussion_r2042974649). Could you please provide some comments on this assertion? This is currently blocking my progress and breaking the build. Thank you very much! @tabjy Thanks for your patience, this one took me longer than I wanted. I responded like this above: > Hmm, ok I see. Why don't you remove the asserts for now, and we see how clear the code looks now. I think I asked for the consistency check because I was confused by the previous code structure. Maybe it is ok now as it is. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23506#issuecomment-2839557998 From epeter at openjdk.org Tue Apr 29 16:44:53 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 29 Apr 2025 16:44:53 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v9] In-Reply-To: References: Message-ID: <4AmJHDIvXVOmlND7BlncW1q2KZGYs3IQyGTXdua1Qqc=.65c53ed6-4c18-4aeb-85b0-fa5a5de77a0a@github.com> On Mon, 14 Apr 2025 21:32:47 GMT, Kangcheng Xu wrote: >> Added `assert()` to `Mul[IL]Node::Ideal()` > > After experimenting for a while, I found it not practical to assert power-of-2 patterns in `Ideal()` (or anywhere else).
> > https://github.com/openjdk/jdk/blob/e9b42dcce268f32bd2ec3c01ac6221073b888538/src/hotspot/share/opto/mulnode.cpp#L263-L266 > > First, `phase->transform()` could transform `LShiftINode`s into something very different. Second, there is no guarantee the `res = AddINode` will remain untransformed by the time it's passed into `find_power_of_two_addition_pattern()`. Asserting power-of-2 patterns before such transformation is not very helpful. > >> @eme64: [...] check that we are only generating patterns that find_power_of_two_addition_pattern recognizes > > In short, we can't guarantee we'll always pick up *transformed* power-of-2 patterns, at least not with significant effort and difficulty predicting all possible transformations. However, I would argue making this guarantee is outside the scope of this issue as all serial addition patterns are correctly picked up and optimized right now. > > I'm not sure how to proceed with this. Please let me know what you think. Thanks! Hmm, ok I see. Why don't you remove the asserts for now, and we see how clear the code looks now. I think I asked for the consistency check because I was confused by the previous code structure. Maybe it is ok now as it is. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r2066951920 From sviswanathan at openjdk.org Tue Apr 29 18:31:00 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 29 Apr 2025 18:31:00 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v14] In-Reply-To: References: Message-ID: On Fri, 25 Apr 2025 20:04:10 GMT, Srinivas Vamsi Parasa wrote: >> The current scheme for Intel APX NDD code generation favors the emission of NDD instruction on APX-enabled targets, even if destination and source registers are the same. To prevent this, this PR extends the assembler layer to demote EEVEX to REX encoding if dst matches with source operands. 
> > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > cleanup ecmov, eorw and other refactoring src/hotspot/cpu/x86/assembler_x86.cpp line 1819: > 1817: void Assembler::ecmovl(Condition cc, Register dst, Register src1, Register src2) { > 1818: InstructionAttr attributes(AVX_128bit, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); > 1819: evex_opcode_and_int16_ndd(dst->encoding(), src1->encoding(), src2->encoding(), VEX_SIMD_NONE, VEX_OPCODE_0F_3C, &attributes, 0x40 | cc, 0xC0, false); This could be instead call to: evex_opcode_int16_ndd(dst->encoding(), src2->encoding(), src1->encoding(), ..) We can remove the separate method evex_opcode_and_int16_ndd() method altogether. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2067108373 From jbhateja at openjdk.org Tue Apr 29 18:52:23 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 29 Apr 2025 18:52:23 GMT Subject: RFR: 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding Message-ID: This is a follow-up PR that fixes the crashes seen after the integration of PR #24664 ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding to save costly comparisons against global state [1]. While most of the relocation records the patching offsets from the end of the instruction, SHL/R instructions used for pointer coloring/uncoloring, compute the patching offset from the starting address of the instruction. This was done to prevent accidental sharing of relocation information with subsequent relocatable instructions, e.g., static call. 
[2] In case the destination register operand of SHL/R instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in the patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte, resulting in an ILLEGAL instruction exception. This patch fixes reported failures by computing the relocation offset of the SHL/R instruction from the end of the instruction, thereby making the patch offset agnostic to the REX/REX2 prefix. To be safe, we emit a NOP instruction between the SHL/R and the subsequent relocatable instruction. Please review and share your feedback. Best Regards, Jatin [1] https://openjdk.org/jeps/439#:~:text=we%20reduce%20this,changes%20phase%3B [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L1873 PS: Validations were performed using the latest Intel Software Development Emulator after modifying the static register allocation order in x86_64.ad file giving preference to EGPRs. ------------- Commit messages: - 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding Changes: https://git.openjdk.org/jdk/pull/24919/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24919&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8355364 Stats: 18 lines in 4 files changed: 7 ins; 5 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/24919.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24919/head:pull/24919 PR: https://git.openjdk.org/jdk/pull/24919 From jbhateja at openjdk.org Tue Apr 29 18:52:23 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 29 Apr 2025 18:52:23 GMT Subject: RFR: 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding In-Reply-To: References: Message-ID: On Mon, 28 Apr 2025 12:28:55 GMT, Jatin Bhateja wrote: > This is a follow-up PR that fixes the crashes seen after the integration of PR #24664 > > ZGC bookkeeps multiple place holders in barrier code snippets through 
relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding to save costly comparisons against global state [1]. While most of the relocation records the patching offsets from the end of the instruction, SHL/R instructions used for pointer coloring/uncoloring, compute the patching offset from the starting address of the instruction. This was done to prevent accidental sharing of relocation information with subsequent relocatable instructions, e.g., static call. [2] > > In case the destination register operand of SHL/R instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in the patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte, resulting in an ILLEGAL instruction exception. > > This patch fixes reported failures by computing the relocation offset of the SHL/R instruction from the end of the instruction, thereby making the patch offset agnostic to the REX/REX2 prefix. To be safe, we emit a NOP instruction between the SHL/R and the subsequent relocatable instruction. > > Please review and share your feedback. > > Best Regards, > Jatin > > [1] https://openjdk.org/jeps/439#:~:text=we%20reduce%20this,changes%20phase%3B > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L1873 > > > PS: Validations were performed using the latest Intel Software Development Emulator after modifying the static register allocation order in x86_64.ad file giving preference to EGPRs. Please refer to following comments in relocInfo, which warns against recording relocation against exact patch site as it may pose problems in querying / iterating over relocations corresponding to particular instruction starting address. https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/relocInfo.hpp#L85 @TobiHartmann confirmed that the patch fixed crashes. 
https://bugs.openjdk.org/browse/JDK-8355363#:~:text=Sounds%20reasonable%2C%20maybe%20mention%20that%20in%20the%20PR%20as%20well.%20All%20testing%20passed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24919#issuecomment-2839879447 From sparasa at openjdk.org Tue Apr 29 18:58:49 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Tue, 29 Apr 2025 18:58:49 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v14] In-Reply-To: References: Message-ID: On Tue, 29 Apr 2025 18:23:30 GMT, Sandhya Viswanathan wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: >> >> cleanup ecmov, eorw and other refactoring > > src/hotspot/cpu/x86/assembler_x86.cpp line 1819: > >> 1817: void Assembler::ecmovl(Condition cc, Register dst, Register src1, Register src2) { >> 1818: InstructionAttr attributes(AVX_128bit, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); >> 1819: evex_opcode_and_int16_ndd(dst->encoding(), src1->encoding(), src2->encoding(), VEX_SIMD_NONE, VEX_OPCODE_0F_3C, &attributes, 0x40 | cc, 0xC0, false); > > This could be instead call to: > evex_opcode_int16_ndd(dst->encoding(), src2->encoding(), src1->encoding(), ..) > We can remove the separate method evex_opcode_and_int16_ndd() method altogether. Actually, I tried implementing using only one function but due to the argument ordering changes in the inner functions, it was not possible to do it using one high level function like `evex_opcode_int16_ndd`. 
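As a rough illustration of the demotion rule stated in the PR description for 8351994 (use the shorter legacy encoding when the destination matches a source operand), here is a small standalone sketch. It is not the assembler code under review: `Form`, `Choice` and `choose_form` are hypothetical names, the operand swap for commutative operations is an assumption of this sketch, and it ignores details such as flag-suppression bits and memory operands. Register numbers 16 to 31 are taken to be the APX extended GPRs, which need a REX2 prefix in the legacy form.

```cpp
#include <utility>

// Illustrative only: these types do not exist in assembler_x86.cpp.
enum class Form { LegacyTwoOperand, ExtendedEvexNdd };

struct Choice {
  Form form;
  int dst;          // destination register number
  int src1;         // first source (equals dst in the legacy form)
  int src2;         // second source
  bool needs_rex2;  // legacy form touches an extended GPR (16..31)
};

// Decide how `dst = src1 op src2` could be encoded under the assumptions above.
static Choice choose_form(int dst, int src1, int src2, bool commutative) {
  // Assumption: for commutative operations the sources may be exchanged
  // so that a dst == src2 match can also be demoted.
  if (commutative && dst == src2 && dst != src1) {
    std::swap(src1, src2);
  }
  if (dst == src1) {
    // Destination doubles as the first source: the two-operand form suffices.
    return {Form::LegacyTwoOperand, dst, src1, src2, dst >= 16 || src2 >= 16};
  }
  // Otherwise the three-operand NDD form is still required.
  return {Form::ExtendedEvexNdd, dst, src1, src2, false};
}
```

For example, `choose_form(3, 3, 5, false)` selects the two-operand form, while `choose_form(3, 5, 3, false)` keeps the NDD form because the operation is not marked commutative.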
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2067156847 From vlivanov at openjdk.org Tue Apr 29 19:17:53 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 29 Apr 2025 19:17:53 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v10] In-Reply-To: References: Message-ID: <6GKM--4QaU2R3zwcgKb-zueVIrKX9MvYHsE-95HDHYI=.350ce1e8-0190-4d09-b35c-b37ab66eb883@github.com> On Tue, 29 Apr 2025 09:22:10 GMT, Aleksey Shipilev wrote: >> [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. >> >> The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. >> >> This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. >> >> It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer.
>> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Improve get_method_blocker Looks good. I'll submit it for testing. ------------- PR Review: https://git.openjdk.org/jdk/pull/24018#pullrequestreview-2804718558 From qamai at openjdk.org Tue Apr 29 19:34:48 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 29 Apr 2025 19:34:48 GMT Subject: RFR: 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding In-Reply-To: References: Message-ID: <17WJx_sXIF4A7rrZmzOLuJ4WjyvTNm957aJ35MG2XLU=.063b8bf5-8626-4113-b8f5-814a3b314d47@github.com> On Mon, 28 Apr 2025 12:28:55 GMT, Jatin Bhateja wrote: > This is a follow-up PR that fixes the crashes seen after the integration of PR #24664 > > ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding to save costly comparisons against global state [1]. While most of the relocation records the patching offsets from the end of the instruction, SHL/R instructions used for pointer coloring/uncoloring, compute the patching offset from the starting address of the instruction. This was done to prevent accidental sharing of relocation information with subsequent relocatable instructions, e.g., static call. [2] > > In case the destination register operand of SHL/R instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in the patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte, resulting in an ILLEGAL instruction exception. 
> > This patch fixes reported failures by computing the relocation offset of the SHL/R instruction from the end of the instruction, thereby making the patch offset agnostic to the REX/REX2 prefix. To be safe, we emit a NOP instruction between the SHL/R and the subsequent relocatable instruction. > > Please review and share your feedback. > > Best Regards, > Jatin > > [1] https://openjdk.org/jeps/439#:~:text=we%20reduce%20this,changes%20phase%3B > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L1873 > > > PS: Validations were performed using the latest Intel Software Development Emulator after modifying the static register allocation order in x86_64.ad file giving preference to EGPRs. I think it is more future-proof to enhance the relocation information with the offset of the exact relocation patch from the instruction start instead. I also don't agree with adding `nop` to the fast path, especially `uncolor` is used in the load fast path IIUC. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24919#issuecomment-2840022636 From coleenp at openjdk.org Tue Apr 29 19:39:52 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 29 Apr 2025 19:39:52 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v10] In-Reply-To: References: Message-ID: On Tue, 29 Apr 2025 09:22:10 GMT, Aleksey Shipilev wrote: >> [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. >> >> The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. 
Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. >> >> This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. >> >> It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Improve get_method_blocker src/hotspot/share/runtime/unloadableMethodHandle.hpp line 26: > 24: > 25: #ifndef SHARE_RUNTIME_UNLOADABLE_METHOD_HANDLE_HPP > 26: #define SHARE_RUNTIME_UNLOADABLE_METHOD_HANDLE_HPP I think this should be in the oops directory like OopHandle and WeakHandle and Method*, the thing it contains. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2067225946 From dlong at openjdk.org Tue Apr 29 20:58:56 2025 From: dlong at openjdk.org (Dean Long) Date: Tue, 29 Apr 2025 20:58:56 GMT Subject: RFR: 8343789: Move mutable nmethod data out of CodeCache [v14] In-Reply-To: References: <9mDuowjpORyWudVnSB1FWCW_o1pBgMnAvJus6YGkXLs=.67ba4652-2470-448d-baa2-464e824b2fcb@github.com> Message-ID: On Tue, 29 Apr 2025 10:30:24 GMT, Doug Simon wrote: >> The cast was added by 8331087, which reduced the supported JVMCI data size to uint16_t. I don't remember this issue with long names coming up during that review, so I guess we all missed it. @dougxc please file a bug so we can track this. It seems like JVMCINMethodData::copy should do something like truncate long names rather than blindly assuming it has enough space. If uint16_t is unreasonably small for JVMCI nmethod data we could revert that change and make it 32 bits again. > > I think in 8331087, I think only `_jvmci_data_offset` was subject to the [narrowing cast](https://github.com/openjdk/jdk/pull/18984/files#diff-c345a29edc19eb49f833e007143e00f897bb762b0f451a202e1d2f5304ed2308R1496). > I've opened https://bugs.openjdk.org/browse/JDK-8355896. Yes, my mistake. I was thinking `_jvmci_data_offset` was used to compute `jvmci_data_end()`, not `jvmci_data_begin()`. 
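The narrowing-cast concern discussed above can be illustrated with a minimal sketch. This is not HotSpot code: `fits_in_u16`, `checked_narrow_u16`, and `clamp_to_u16` are hypothetical helpers, assumed here only to show how a size larger than 65535 silently wraps when cast to `uint16_t`, and how a check or a clamp (truncation) would avoid that.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical helper (not from HotSpot): true if the value survives a
// round trip through uint16_t, i.e. the narrowing cast loses nothing.
inline bool fits_in_u16(size_t value) {
  return static_cast<size_t>(static_cast<uint16_t>(value)) == value;
}

// Hypothetical helper: narrow to uint16_t, asserting in debug builds that
// the value fits instead of blindly assuming there is enough space.
inline uint16_t checked_narrow_u16(size_t value) {
  assert(fits_in_u16(value) && "value does not fit in 16 bits");
  return static_cast<uint16_t>(value);
}

// Hypothetical helper: clamp to the largest representable value, which is
// one way to "truncate" oversized data rather than wrap around.
inline uint16_t clamp_to_u16(size_t value) {
  const size_t max = std::numeric_limits<uint16_t>::max();
  return static_cast<uint16_t>(value < max ? value : max);
}
```

A plain `static_cast<uint16_t>(65541)` yields 5, which is the kind of silent wrap the review thread is worried about; whether to clamp, assert, or widen the field to 32 bits is the design choice being debated.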
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21276#discussion_r2067417832 From kvn at openjdk.org Tue Apr 29 21:20:03 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 29 Apr 2025 21:20:03 GMT Subject: RFR: 8343789: Move mutable nmethod data out of CodeCache [v14] In-Reply-To: References: <9mDuowjpORyWudVnSB1FWCW_o1pBgMnAvJus6YGkXLs=.67ba4652-2470-448d-baa2-464e824b2fcb@github.com> Message-ID: On Tue, 29 Apr 2025 20:55:40 GMT, Dean Long wrote: >> I think in 8331087, I think only `_jvmci_data_offset` was subject to the [narrowing cast](https://github.com/openjdk/jdk/pull/18984/files#diff-c345a29edc19eb49f833e007143e00f897bb762b0f451a202e1d2f5304ed2308R1496). >> I've opened https://bugs.openjdk.org/browse/JDK-8355896. > > Yes, my mistake. I was thinking `_jvmci_data_offset` was used to compute `jvmci_data_end()`, not `jvmci_data_begin()`. Yes we should use 32 bits. Even if we revert back to using _jvmci_data_offset we can **NOT** use uint16_t because size of relocation (after which JVMCI data is placed) data is bigger. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21276#discussion_r2067442878 From vlivanov at openjdk.org Tue Apr 29 21:48:48 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 29 Apr 2025 21:48:48 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v10] In-Reply-To: References: Message-ID: <8whv0N23B1N6GRZl7ASNlvvRObm0Y7RVWldnaRIXplo=.39171c24-7b80-4721-8ef5-d5c55affddab@github.com> On Tue, 29 Apr 2025 09:22:10 GMT, Aleksey Shipilev wrote: >> [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. 
>> >> The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. >> >> This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. >> >> It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Improve get_method_blocker Testing results (hs-tier1 - hs-tier4) look good. ------------- Marked as reviewed by vlivanov (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24018#pullrequestreview-2805157762 From vlivanov at openjdk.org Tue Apr 29 21:59:46 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 29 Apr 2025 21:59:46 GMT Subject: RFR: 8355769: Optimize nmethod dependency recording [v2] In-Reply-To: <1uNBgSbbBgdiHXyYL1fkY6HIKaH9CgwF5kN8ABlt3xo=.07c945af-3840-45d8-a1b3-2fe55c52df81@github.com> References: <1uNBgSbbBgdiHXyYL1fkY6HIKaH9CgwF5kN8ABlt3xo=.07c945af-3840-45d8-a1b3-2fe55c52df81@github.com> Message-ID: On Tue, 29 Apr 2025 12:47:25 GMT, Aleksey Shipilev wrote: >> During nmethod installation, we record the dependencies between InstanceKlass/CallSite and newly coming `nmethod`. In `DependencyContext::add_dependent_nmethod`, we are linearly scanning to see if the `nmethod` is already in the dependencies list. This costs quite a bit, especially with lots of compiled methods per IK. >> >> This is not a significant issue for normal JIT compilations, where the JIT costs dominate. But for Leyden, this kind of scan is a significant part of AOT code installation. For example in well-trained javac runs, there are chains of 500+ `nmethods` for some IKs that take 10+ us to scan. This is easily half of the entire AOT method installation cost. >> >> Fortunately, the way we do the nmethod dependency recording, it allows us to shortcut the scan. Since dependency recording holds the `CodeCache_lock` while adding new `nmethod` all over the various dependency lists, those dependency lists are ever in two states: no `nmethod` in the chain (no need to scan!), or `nmethod` is at the head of the chain (no need to scan!). >> >> Additional testing: >> - [x] Ad-hoc benchmarks >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Fiddle with locks Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24933#pullrequestreview-2805173763 From dlong at openjdk.org Tue Apr 29 22:38:50 2025 From: dlong at openjdk.org (Dean Long) Date: Tue, 29 Apr 2025 22:38:50 GMT Subject: RFR: 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding In-Reply-To: References: Message-ID: On Mon, 28 Apr 2025 12:28:55 GMT, Jatin Bhateja wrote: > This is a follow-up PR that fixes the crashes seen after the integration of PR #24664 > > ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding to save costly comparisons against global state [1]. While most of the relocation records the patching offsets from the end of the instruction, SHL/R instructions used for pointer coloring/uncoloring, compute the patching offset from the starting address of the instruction. This was done to prevent accidental sharing of relocation information with subsequent relocatable instructions, e.g., static call. [2] > > In case the destination register operand of SHL/R instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in the patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte, resulting in an ILLEGAL instruction exception. > > This patch fixes reported failures by computing the relocation offset of the SHL/R instruction from the end of the instruction, thereby making the patch offset agnostic to the REX/REX2 prefix. To be safe, we emit a NOP instruction between the SHL/R and the subsequent relocatable instruction. > > Please review and share your feedback. 
> > Best Regards, > Jatin > > [1] https://openjdk.org/jeps/439#:~:text=we%20reduce%20this,changes%20phase%3B > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L1873 > > > PS: Validations were performed using the latest Intel Software Development Emulator after modifying the static register allocation order in x86_64.ad file giving preference to EGPRs. An alternative fix would be to change CompiledDirectCall::find_stub_for() so that it ignores relocInfo::barrier_type. Adding a nop for ZBarrierRelocationFormatLoadGoodAfterShX but not other relocations, like ZBarrierRelocationFormatStoreGoodAfterOr, seems less robust. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24919#issuecomment-2840387447 From iklam at openjdk.org Wed Apr 30 01:17:57 2025 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 30 Apr 2025 01:17:57 GMT Subject: RFR: 8350209: Preserve adapters in AOT cache [v11] In-Reply-To: References: Message-ID: On Tue, 29 Apr 2025 15:29:08 GMT, Vladimir Kozlov wrote: >> [JEP 483](https://bugs.openjdk.org/browse/JDK-8315737) preserves class information in AOT cache which helps Java startup performance. >> >> We should also preserve adapters (i2c, c2i) to further improve performance of class linking where adapters are generated. >> >> Short running Java application can see several percents improvement. 
I got 6% improvement when ran `HelloWorld.java` on Linux-x64 Ice Lake CPU (2.5Ghz): >> >> >> (perf stat -r 100 java -XX:AOTCache=app.aotcache -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed >> 0.0299401 +- 0.0000504 seconds time elapsed ( +- 0.17% ) >> >> (perf stat -r 100 java -XX:AOTCache=app.aotcache -XX:+UnlockDiagnosticVMOptions -XX:-AOTAdapterCaching -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed >> 0.0318654 +- 0.0000535 seconds time elapsed ( +- 0.17% ) >> >> >> New diagnostic flags are introduced (use `-XX:+UnlockDiagnosticVMOptions` to unlock them): >> >> >> -XX:+AOTAdapterCaching - Enable or disable saving and restoring i2c2i adapters >> -XX:AOTCodeMaxSize=10*M - buffer size in bytes for AOT code caching >> -XX:+AbortVMOnAOTCodeFailure - Abort VM on the first occurrence of AOT code caching failure >> >> By default `AOTAdapterCaching` is `false` and enabled ergonomically when `-XX:AOTCache` is specified. >> This flag is ignored when `AOTCache` is not specified. >> >> To use AOT adapters follow process described in JEP 483: >> >> >> java -XX:AOTMode=record -XX:AOTConfiguration=app.aotconf -cp app.jar App >> java -XX:AOTMode=create -XX:AOTConfiguration=app.aotconf -XX:AOTCache=app.aot -cp app.jar >> java -XX:AOTCache=app.aot -cp app.jar App >> >> >> There are several new UL flag combinations to trace the AOT code caching process: >> >> >> -Xlog:aot+codecache+init -Xlog:aot+codecache+exit -Xlog:aot+codecache+stubs >> >> >> @ashu-mehra is main author of changes. He implemented adapters caching. >> I did main framework (`AOTCodeCache` class) for saving and loading AOT code. >> >> Tested tier1-6,10, which includes tests with `AOTClassLinking` enabled. Also Xcomp,stress and JCK. > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Remove unused method LGTM. Some small nits. 
src/hotspot/share/cds/filemap.cpp line 1296: > 1294: mapped_base = requested_base; > 1295: } else { > 1296: bool read_only = false, allow_exec = false; Add comment for clarity: // We do not execute in-place in the AOT code region. AOT code is copied to // the CodeCache for execution. bool read_only = false, allow_exec = false; src/hotspot/share/include/cds.h line 38: > 36: // Also, this is a C header file. Do not use C++ here. > 37: > 38: #define NUM_CDS_REGIONS 5 // this must be the same as MetaspaceShared::n_regions Change `CURRENT_CDS_ARCHIVE_VERSION` below to `19` ------------- Marked as reviewed by iklam (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24740#pullrequestreview-2805483199 PR Review Comment: https://git.openjdk.org/jdk/pull/24740#discussion_r2067701064 PR Review Comment: https://git.openjdk.org/jdk/pull/24740#discussion_r2067702057 From duke at openjdk.org Wed Apr 30 01:26:51 2025 From: duke at openjdk.org (erifan) Date: Wed, 30 Apr 2025 01:26:51 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v3] In-Reply-To: <1Qo4pB9I7Ok4ntXSE-KkE0sv-Tp5EVCWriWnjcf2iEE=.a7e28640-85df-436a-9c82-3c067cc88dee@github.com> References: <1Qo4pB9I7Ok4ntXSE-KkE0sv-Tp5EVCWriWnjcf2iEE=.a7e28640-85df-436a-9c82-3c067cc88dee@github.com> Message-ID: On Tue, 29 Apr 2025 10:22:22 GMT, Emanuel Peter wrote: > Yes, this discussion is down to `requires` vs `applyIf`. This is my argument for `applyIf`, quoted from above, I have not yet seen an argument against it: > > > If you use @require, then the person does not realize there is a test AND the test is not run. If you use applyIf, the person does not realize there is a test, but it is run at least for result verifiation - and then the person MIGHT realize if the test catches a wrong result / crash. > > In my understanding, `requires` should only be used if the test really **requires** a certain platform or feature. 
That can be because some flags are only available under certain platforms for example. But for IR tests, we should try to always use `applyIf`, because it allows testing on other platforms. > > Actually, I filed this RFE a while ago: https://bugs.openjdk.org/browse/JDK-8310891 We should try to move as many tests from using `requires` to `applyIf`, so that we have an increased test coverage. I see, I'll update the code. Thanks~ ------------- PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-2840582699 From jbhateja at openjdk.org Wed Apr 30 01:46:49 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 30 Apr 2025 01:46:49 GMT Subject: RFR: 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding In-Reply-To: <17WJx_sXIF4A7rrZmzOLuJ4WjyvTNm957aJ35MG2XLU=.063b8bf5-8626-4113-b8f5-814a3b314d47@github.com> References: <17WJx_sXIF4A7rrZmzOLuJ4WjyvTNm957aJ35MG2XLU=.063b8bf5-8626-4113-b8f5-814a3b314d47@github.com> Message-ID: <1_USVhqRqOqwC7RkPEUyIJ2Mew529yUNVEA-hTcNJY4=.dbe3355f-9b1a-48a7-99ae-ee56760ae9f3@github.com> On Tue, 29 Apr 2025 19:31:46 GMT, Quan Anh Mai wrote: > I think it is more future-proof to enhance the relocation information with the offset of the exact relocation patch from the instruction start instead. I also don't agree with adding `nop` to the fast path, especially `uncolor` is used in the load fast path IIUC. Thanks for supporting this idea. Specializing the barrier relocation is an alternative we already discussed, but it may not be able to shield against false mapping with a subsequent relocatable instruction, which is what is causing the crash currently.
https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2025-April/088895.html @dean-long's suggestion to map a relocation to the exact patch site was a foolproof solution to overcome any such limitation, but it may pose problems while querying/iterating over the relocation set against the starting address of an instruction, and it is a bigger change which we plan to address after evaluating and considering alternative schemes: https://bugs.openjdk.org/browse/JDK-8355341 The current scheme of adding a relocation from the end of the instruction is not robust enough either to prevent incorrect mapping with a subsequent relocatable instruction. The NOP is not dispatched to an execution unit but adds an additional byte to the code cache. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24919#issuecomment-2840602447 From duke at openjdk.org Wed Apr 30 01:47:55 2025 From: duke at openjdk.org (duke) Date: Wed, 30 Apr 2025 01:47:55 GMT Subject: RFR: 8355796: RISC-V: compiler/vectorapi/AllBitsSetVectorMatchRuleTest.java fails after JDK-8355657 [v2] In-Reply-To: References: <3UYqsbAUEohLo8ubL7wccwEV5UrcX4wqYbC6e6gMOpE=.075bd026-616a-4793-8fcb-ea4f70cf237c@github.com> Message-ID: On Tue, 29 Apr 2025 03:06:35 GMT, Anjian-Wen wrote: >> One or more IR match rules failed in AllBitsSetVectorMatchRuleTest. For the IR testing requirements, the matching rules are split by type. After fix, the test on fastdebug can passed! > > Anjian-Wen has updated the pull request incrementally with one additional commit since the last revision: > > fix format @Anjian-Wen Your change (at version 66e24b67d2bb0cbe1054cee3ae7bbb5057b4cdf3) is now ready to be sponsored by a Committer.
------------- PR Comment: https://git.openjdk.org/jdk/pull/24918#issuecomment-2840603574 From duke at openjdk.org Wed Apr 30 02:01:49 2025 From: duke at openjdk.org (Anjian-Wen) Date: Wed, 30 Apr 2025 02:01:49 GMT Subject: Integrated: 8355796: RISC-V: compiler/vectorapi/AllBitsSetVectorMatchRuleTest.java fails after JDK-8355657 In-Reply-To: <3UYqsbAUEohLo8ubL7wccwEV5UrcX4wqYbC6e6gMOpE=.075bd026-616a-4793-8fcb-ea4f70cf237c@github.com> References: <3UYqsbAUEohLo8ubL7wccwEV5UrcX4wqYbC6e6gMOpE=.075bd026-616a-4793-8fcb-ea4f70cf237c@github.com> Message-ID: On Mon, 28 Apr 2025 11:17:19 GMT, Anjian-Wen wrote: > One or more IR match rules failed in AllBitsSetVectorMatchRuleTest. For the IR testing requirements, the matching rules are split by type. After fix, the test on fastdebug can passed! This pull request has now been integrated. Changeset: 375ac6d4 Author: Anjian-Wen Committer: Fei Yang URL: https://git.openjdk.org/jdk/commit/375ac6d446332f0763ce294b200143ff63865cf6 Stats: 136 lines in 1 file changed: 104 ins; 12 del; 20 mod 8355796: RISC-V: compiler/vectorapi/AllBitsSetVectorMatchRuleTest.java fails after JDK-8355657 Reviewed-by: fyang, gcao ------------- PR: https://git.openjdk.org/jdk/pull/24918 From kvn at openjdk.org Wed Apr 30 02:05:41 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 30 Apr 2025 02:05:41 GMT Subject: RFR: 8350209: Preserve adapters in AOT cache [v12] In-Reply-To: References: Message-ID: > [JEP 483](https://bugs.openjdk.org/browse/JDK-8315737) preserves class information in AOT cache which helps Java startup performance. > > We should also preserve adapters (i2c, c2i) to further improve performance of class linking where adapters are generated. > > Short running Java application can see several percents improvement. 
I got 6% improvement when ran `HelloWorld.java` on Linux-x64 Ice Lake CPU (2.5Ghz): > > > (perf stat -r 100 java -XX:AOTCache=app.aotcache -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed > 0.0299401 +- 0.0000504 seconds time elapsed ( +- 0.17% ) > > (perf stat -r 100 java -XX:AOTCache=app.aotcache -XX:+UnlockDiagnosticVMOptions -XX:-AOTAdapterCaching -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed > 0.0318654 +- 0.0000535 seconds time elapsed ( +- 0.17% ) > > > New diagnostic flags are introduced (use `-XX:+UnlockDiagnosticVMOptions` to unlock them): > > > -XX:+AOTAdapterCaching - Enable or disable saving and restoring i2c2i adapters > -XX:AOTCodeMaxSize=10*M - buffer size in bytes for AOT code caching > -XX:+AbortVMOnAOTCodeFailure - Abort VM on the first occurrence of AOT code caching failure > > By default `AOTAdapterCaching` is `false` and enabled ergonomically when `-XX:AOTCache` is specified. > This flag is ignored when `AOTCache` is not specified. > > To use AOT adapters follow process described in JEP 483: > > > java -XX:AOTMode=record -XX:AOTConfiguration=app.aotconf -cp app.jar App > java -XX:AOTMode=create -XX:AOTConfiguration=app.aotconf -XX:AOTCache=app.aot -cp app.jar > java -XX:AOTCache=app.aot -cp app.jar App > > > There are several new UL flag combinations to trace the AOT code caching process: > > > -Xlog:aot+codecache+init -Xlog:aot+codecache+exit -Xlog:aot+codecache+stubs > > > @ashu-mehra is main author of changes. He implemented adapters caching. > I did main framework (`AOTCodeCache` class) for saving and loading AOT code. > > Tested tier1-6,10, which includes tests with `AOTClassLinking` enabled. Also Xcomp,stress and JCK. 
Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: address Ioi's comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24740/files - new: https://git.openjdk.org/jdk/pull/24740/files/b2466e6e..78f2828d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24740&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24740&range=10-11 Stats: 3 lines in 2 files changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24740.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24740/head:pull/24740 PR: https://git.openjdk.org/jdk/pull/24740 From kvn at openjdk.org Wed Apr 30 02:05:42 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 30 Apr 2025 02:05:42 GMT Subject: RFR: 8350209: Preserve adapters in AOT cache [v11] In-Reply-To: References: Message-ID: On Wed, 30 Apr 2025 01:15:21 GMT, Ioi Lam wrote: >> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove unused method > > LGTM. Some small nits. Thank you, @iklam, for review. I pushed your suggestions. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24740#issuecomment-2840619267 From qamai at openjdk.org Wed Apr 30 02:31:44 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 30 Apr 2025 02:31:44 GMT Subject: RFR: 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding In-Reply-To: References: Message-ID: On Mon, 28 Apr 2025 12:28:55 GMT, Jatin Bhateja wrote: > This is a follow-up PR that fixes the crashes seen after the integration of PR #24664 > > ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding to save costly comparisons against global state [1]. 
While most of the relocation records the patching offsets from the end of the instruction, SHL/R instructions used for pointer coloring/uncoloring, compute the patching offset from the starting address of the instruction. This was done to prevent accidental sharing of relocation information with subsequent relocatable instructions, e.g., static call. [2] > > In case the destination register operand of SHL/R instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in the patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte, resulting in an ILLEGAL instruction exception. > > This patch fixes reported failures by computing the relocation offset of the SHL/R instruction from the end of the instruction, thereby making the patch offset agnostic to the REX/REX2 prefix. To be safe, we emit a NOP instruction between the SHL/R and the subsequent relocatable instruction. > > Please review and share your feedback. > > Best Regards, > Jatin > > [1] https://openjdk.org/jeps/439#:~:text=we%20reduce%20this,changes%20phase%3B > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L1873 > > > PS: Validations were performed using the latest Intel Software Development Emulator after modifying the static register allocation order in x86_64.ad file giving preference to EGPRs. What I meant is that we should map a relocation to BOTH the instruction start and the patch site. APX has not even been released yet, so I think it is more efficient to make a better fix than to make a quicker one.
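The prefix-length problem described in the quoted PR text can be sketched with a toy model. This is not HotSpot code: `Insn`, `patch_from_start`, `patch_from_end`, `shl_with_rex`, and `shl_with_rex2` are hypothetical names, and the byte values are illustrative only. The point it tries to show is that a start-relative patch offset depends on how many prefix bytes precede the opcode, while an end-relative offset to a trailing 8-bit immediate does not.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model (assumption, not HotSpot): an encoded instruction whose last
// byte is an 8-bit immediate that the runtime patches later.
using Insn = std::vector<uint8_t>;

// Patch at an offset counted from the instruction start. The correct
// offset changes whenever the prefix length changes.
inline void patch_from_start(Insn& insn, size_t offset, uint8_t imm) {
  assert(offset < insn.size());
  insn[offset] = imm;
}

// Patch at a distance counted back from the instruction end. For a
// trailing immediate this is independent of the prefix length.
inline void patch_from_end(Insn& insn, size_t back, uint8_t imm) {
  assert(back >= 1 && back <= insn.size());
  insn[insn.size() - back] = imm;
}

// Illustrative encodings of a shift-by-immediate with a placeholder
// immediate of 0: one-byte prefix versus two-byte prefix.
inline Insn shl_with_rex()  { return {0x48, 0xC1, 0xE0, 0x00}; }
inline Insn shl_with_rex2() { return {0xD5, 0x18, 0xC1, 0xE0, 0x00}; }
```

In this model, `patch_from_start(insn, 3, imm)` hits the immediate only for the one-byte-prefix form and overwrites a different byte of the instruction for the two-byte-prefix form, whereas `patch_from_end(insn, 1, imm)` hits the immediate in both. It does not address the separate concern raised in the thread about a relocation at the instruction end being confused with the next relocatable instruction.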
------------- PR Comment: https://git.openjdk.org/jdk/pull/24919#issuecomment-2840648169 From iklam at openjdk.org Wed Apr 30 02:45:48 2025 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 30 Apr 2025 02:45:48 GMT Subject: RFR: 8350209: Preserve adapters in AOT cache [v12] In-Reply-To: References: Message-ID: On Wed, 30 Apr 2025 02:05:41 GMT, Vladimir Kozlov wrote: >> [JEP 483](https://bugs.openjdk.org/browse/JDK-8315737) preserves class information in AOT cache which helps Java startup performance. >> >> We should also preserve adapters (i2c, c2i) to further improve performance of class linking where adapters are generated. >> >> Short running Java application can see several percents improvement. I got 6% improvement when ran `HelloWorld.java` on Linux-x64 Ice Lake CPU (2.5Ghz): >> >> >> (perf stat -r 100 java -XX:AOTCache=app.aotcache -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed >> 0.0299401 +- 0.0000504 seconds time elapsed ( +- 0.17% ) >> >> (perf stat -r 100 java -XX:AOTCache=app.aotcache -XX:+UnlockDiagnosticVMOptions -XX:-AOTAdapterCaching -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed >> 0.0318654 +- 0.0000535 seconds time elapsed ( +- 0.17% ) >> >> >> New diagnostic flags are introduced (use `-XX:+UnlockDiagnosticVMOptions` to unlock them): >> >> >> -XX:+AOTAdapterCaching - Enable or disable saving and restoring i2c2i adapters >> -XX:AOTCodeMaxSize=10*M - buffer size in bytes for AOT code caching >> -XX:+AbortVMOnAOTCodeFailure - Abort VM on the first occurrence of AOT code caching failure >> >> By default `AOTAdapterCaching` is `false` and enabled ergonomically when `-XX:AOTCache` is specified. >> This flag is ignored when `AOTCache` is not specified. 
>> >> To use AOT adapters follow process described in JEP 483: >> >> >> java -XX:AOTMode=record -XX:AOTConfiguration=app.aotconf -cp app.jar App >> java -XX:AOTMode=create -XX:AOTConfiguration=app.aotconf -XX:AOTCache=app.aot -cp app.jar >> java -XX:AOTCache=app.aot -cp app.jar App >> >> >> There are several new UL flag combinations to trace the AOT code caching process: >> >> >> -Xlog:aot+codecache+init -Xlog:aot+codecache+exit -Xlog:aot+codecache+stubs >> >> >> @ashu-mehra is main author of changes. He implemented adapters caching. >> I did main framework (`AOTCodeCache` class) for saving and loading AOT code. >> >> Tested tier1-6,10, which includes tests with `AOTClassLinking` enabled. Also Xcomp,stress and JCK. > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > address Ioi's comments Marked as reviewed by iklam (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24740#pullrequestreview-2805579280 From iveresov at openjdk.org Wed Apr 30 05:39:04 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Wed, 30 Apr 2025 05:39:04 GMT Subject: RFR: 8355003: Implement Ahead-of-Time Method Profiling [v5] In-Reply-To: References: Message-ID: > Improve warm-up time by making profile data from a previous run of an application instantly available, when the HotSpot Java Virtual Machine starts. Specifically, enhance the [AOT cache](https://openjdk.org/jeps/483) to store method execution profiles from training runs, reducing profiling delays in subsequent production runs. > > More details in the JEP: https://bugs.openjdk.org/browse/JDK-8325147 Igor Veresov has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 34 commits: - Address review comments part 2 - Merge branch 'master' into pp2 - Merge branch 'master' into pp2 - Fix class filtering - Remove the workaround of setting AOTRecordTraining during assembly - Address some of the review comments - Merge branch 'master' into pp - Add AOTCompileEagerly flag to control compilation after clinit - Port 8355334: [leyden] Missing type profile info in archived training data - Port 8355296: [leyden] Some methods are stuck at level=0 with -XX:-TieredCompilation - ... and 24 more: https://git.openjdk.org/jdk/compare/6850757f...4514d032 ------------- Changes: https://git.openjdk.org/jdk/pull/24886/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24886&range=04 Stats: 3197 lines in 57 files changed: 2972 ins; 103 del; 122 mod Patch: https://git.openjdk.org/jdk/pull/24886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24886/head:pull/24886 PR: https://git.openjdk.org/jdk/pull/24886 From iveresov at openjdk.org Wed Apr 30 05:39:04 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Wed, 30 Apr 2025 05:39:04 GMT Subject: RFR: 8355003: Implement Ahead-of-Time Method Profiling [v5] In-Reply-To: References: <3JsetDO0CDSY3TVxn9pC7YfbNq8-BDZ2UwNo38qJuOc=.e5b30111-294e-45ba-a9aa-cf8d09e26d45@github.com> Message-ID: <2b3hUB0FEH3l0nU-C6rkeihJwHawxzfkHI3UVbGfO88=.dc289b11-8c83-4473-96fb-40381c986c59@github.com> On Sun, 27 Apr 2025 01:15:54 GMT, Vladimir Kozlov wrote: >> You mean you want these checks to be done only if `TrainingData::have_data() == true` ? > > Yes, if it is related. Otherwise you may change default behavior when Leyden code is not used. I simplified it and factored out the new semantics. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24886#discussion_r2067883564 From fyang at openjdk.org Wed Apr 30 05:48:50 2025 From: fyang at openjdk.org (Fei Yang) Date: Wed, 30 Apr 2025 05:48:50 GMT Subject: RFR: 8355913: RISC-V: improve hotspot/jtreg/compiler/vectorization/runner/BasicFloatOpTest.java In-Reply-To: <2Y39meZMDbec4LfN-EqdwJTvT_NIO1FqRBAv8bL5m6k=.b8359d80-9aa3-445e-a3d7-07868f7dff1b@github.com> References: <2Y39meZMDbec4LfN-EqdwJTvT_NIO1FqRBAv8bL5m6k=.b8359d80-9aa3-445e-a3d7-07868f7dff1b@github.com> Message-ID: On Tue, 29 Apr 2025 13:42:22 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple patch? > Previously, BasicFloatOpTest.java was accidentally not really enabled on riscv. > And FmaVF/FmaVD depend on both UseFMA and UseRVV; the code should make this clear. > And IR verification of FmaVF in BasicFloatOpTest.java should only be enabled when UseFMA && rvv. > > Thanks! test/hotspot/jtreg/compiler/vectorization/runner/BasicFloatOpTest.java line 35: > 33: * @run main/othervm -Xbootclasspath/a:. > 34: * -XX:+UnlockDiagnosticVMOptions > 35: * -XX:+WhiteBoxAPI This test also covers x86 avx and aarch64 asimd targets. I think UseFMA is also needed for IR checks for these platforms, right? Seems to me more reasonable to add a `-XX:+UseFMA` VM option when running this test. Then we won't need to add all these `applyIf = {"UseFMA", "true"}`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24950#discussion_r2067889769 From duke at openjdk.org Wed Apr 30 06:36:30 2025 From: duke at openjdk.org (Anjian-Wen) Date: Wed, 30 Apr 2025 06:36:30 GMT Subject: RFR: 8351140: RISC-V: Intrinsify Unsafe::setMemory [v3] In-Reply-To: References: Message-ID: > From [JDK-8329331](https://bugs.openjdk.org/browse/JDK-8329331), add riscv unsafe::setMemory intrinsic's generator generate_unsafe_setmemory.
This intrinsic reduces Unsafe setMemory time quite a lot > > on my musebook, the JMH test micro:java.lang.foreign.MemorySegmentZeroUnsafe shows the results below > > before the patch > `Benchmark (aligned) (size) Mode Cnt Score Error Units > MemorySegmentZeroUnsafe.panama true 1 avgt 30 24.198 ± 0.392 ns/op > MemorySegmentZeroUnsafe.panama true 2 avgt 30 20.688 ± 0.013 ns/op > MemorySegmentZeroUnsafe.panama true 3 avgt 30 20.703 ± 0.045 ns/op > MemorySegmentZeroUnsafe.panama true 4 avgt 30 20.053 ± 0.016 ns/op > MemorySegmentZeroUnsafe.panama true 5 avgt 30 20.682 ± 0.016 ns/op > MemorySegmentZeroUnsafe.panama true 6 avgt 30 20.732 ± 0.061 ns/op > MemorySegmentZeroUnsafe.panama true 7 avgt 30 21.403 ± 0.096 ns/op > MemorySegmentZeroUnsafe.panama true 8 avgt 30 25.268 ± 0.197 ns/op > MemorySegmentZeroUnsafe.panama true 15 avgt 30 27.481 ± 0.195 ns/op > MemorySegmentZeroUnsafe.panama true 16 avgt 30 27.577 ± 0.019 ns/op > MemorySegmentZeroUnsafe.panama true 63 avgt 30 208.893 ± 2.795 ns/op > MemorySegmentZeroUnsafe.panama true 64 avgt 30 199.167 ± 0.936 ns/op > MemorySegmentZeroUnsafe.panama true 255 avgt 30 220.672 ± 0.879 ns/op > MemorySegmentZeroUnsafe.panama true 256 avgt 30 246.256 ± 0.756 ns/op > MemorySegmentZeroUnsafe.panama false 1 avgt 30 23.849 ± 0.088 ns/op > MemorySegmentZeroUnsafe.panama false 2 avgt 30 20.671 ± 0.006 ns/op > MemorySegmentZeroUnsafe.panama false 3 avgt 30 20.694 ± 0.037 ns/op > MemorySegmentZeroUnsafe.panama false 4 avgt 30 20.048 ± 0.010 ns/op > MemorySegmentZeroUnsafe.panama false 5 avgt 30 20.684 ± 0.020 ns/op > MemorySegmentZeroUnsafe.panama false 6 avgt 30 20.685 ± 0.016 ns/op > MemorySegmentZeroUnsafe.panama false 7 avgt 30 21.383 ± 0.086 ns/op > MemorySegmentZeroUnsafe.panama false 8 avgt 30 25.684 ± 0.006 ns/op > MemorySegmentZeroUnsafe.panama false 15 avgt 30 27.593 ± 0.043 ns/op > MemorySegmentZeroUnsafe.panama false 16 avgt 30 28.437 ± 0.228 ns/op > MemorySegmentZeroUnsafe.panama false 63 avgt 30...
Anjian-Wen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: - delete useless label - fix label jump - Merge branch 'openjdk:master' into temp_test_unsafe - fix address alignment at tail - modify loop logic to generate_fill and add unsafe mark - RISC-V: Intrinsify Unsafe::setMemory ------------- Changes: https://git.openjdk.org/jdk/pull/23890/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23890&range=02 Stats: 116 lines in 1 file changed: 116 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23890.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23890/head:pull/23890 PR: https://git.openjdk.org/jdk/pull/23890 From duke at openjdk.org Wed Apr 30 06:36:31 2025 From: duke at openjdk.org (Anjian-Wen) Date: Wed, 30 Apr 2025 06:36:31 GMT Subject: RFR: 8351140: RISC-V: Intrinsify Unsafe::setMemory [v2] In-Reply-To: <6TTfJ_5ui_Ls2jcfcV6VTXajsVND0x_Gwm6YmSQp-rY=.e9a8f90e-5b68-4321-9335-a055414d228b@github.com> References: <6TTfJ_5ui_Ls2jcfcV6VTXajsVND0x_Gwm6YmSQp-rY=.e9a8f90e-5b68-4321-9335-a055414d228b@github.com> Message-ID: On Fri, 28 Mar 2025 11:24:43 GMT, Anjian-Wen wrote: >> From [JDK-8329331](https://bugs.openjdk.org/browse/JDK-8329331), add riscv unsafe::setMemory intrinsic's generator generate_unsafe_setmemory. This intrinsic reduces Unsafe setMemory time quite a lot >> >> on my musebook, the JMH test micro:java.lang.foreign.MemorySegmentZeroUnsafe shows the results below >> >> before the patch >> `Benchmark (aligned) (size) Mode Cnt Score Error Units >> MemorySegmentZeroUnsafe.panama true 1 avgt 30 24.198 ± 0.392 ns/op >> MemorySegmentZeroUnsafe.panama true 2 avgt 30 20.688 ± 0.013 ns/op >> MemorySegmentZeroUnsafe.panama true 3 avgt 30 20.703 ± 0.045 ns/op >> MemorySegmentZeroUnsafe.panama true 4 avgt 30 20.053 ± 0.016 ns/op >> MemorySegmentZeroUnsafe.panama true 5 avgt 30 20.682 ± 0.016 ns/op >> MemorySegmentZeroUnsafe.panama true 6 avgt 30 20.732 ±
0.061 ns/op >> MemorySegmentZeroUnsafe.panama true 7 avgt 30 21.403 ± 0.096 ns/op >> MemorySegmentZeroUnsafe.panama true 8 avgt 30 25.268 ± 0.197 ns/op >> MemorySegmentZeroUnsafe.panama true 15 avgt 30 27.481 ± 0.195 ns/op >> MemorySegmentZeroUnsafe.panama true 16 avgt 30 27.577 ± 0.019 ns/op >> MemorySegmentZeroUnsafe.panama true 63 avgt 30 208.893 ± 2.795 ns/op >> MemorySegmentZeroUnsafe.panama true 64 avgt 30 199.167 ± 0.936 ns/op >> MemorySegmentZeroUnsafe.panama true 255 avgt 30 220.672 ± 0.879 ns/op >> MemorySegmentZeroUnsafe.panama true 256 avgt 30 246.256 ± 0.756 ns/op >> MemorySegmentZeroUnsafe.panama false 1 avgt 30 23.849 ± 0.088 ns/op >> MemorySegmentZeroUnsafe.panama false 2 avgt 30 20.671 ± 0.006 ns/op >> MemorySegmentZeroUnsafe.panama false 3 avgt 30 20.694 ± 0.037 ns/op >> MemorySegmentZeroUnsafe.panama false 4 avgt 30 20.048 ± 0.010 ns/op >> MemorySegmentZeroUnsafe.panama false 5 avgt 30 20.684 ± 0.020 ns/op >> MemorySegmentZeroUnsafe.panama false 6 avgt 30 20.685 ± 0.016 ns/op >> MemorySegmentZeroUnsafe.panama false 7 avgt 30 21.383 ± 0.086 ns/op >> MemorySegmentZeroUnsafe.panama false 8 avgt 30 25.684 ± 0.006 ns/op >> MemorySegmentZeroUnsafe.panama false 15 avgt 30 27.593 ± 0.043 ns/op >> MemorySegmentZeroUnsafe.panama false 16 avgt 30 28.437 ± 0.228 ns/op > ... > > Anjian-Wen has refreshed the contents of this pull request, and previous commits have been removed. Incremental views are not available.
added the UnsafeMemoryAccessMark and modified the format added the jmh benchmark ------------- PR Comment: https://git.openjdk.org/jdk/pull/23890#issuecomment-2830172176 From mchevalier at openjdk.org Wed Apr 30 07:18:46 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 30 Apr 2025 07:18:46 GMT Subject: RFR: 8352422: [ubsan] Out-of-range reported in ciMethod.cpp:917:20: runtime error: 2.68435e+09 is outside the range of representable values of type 'int' [v2] In-Reply-To: <1rMdWncORHmyw5LXnDdyCC4-6-9ejGhNj2uLXmKI3So=.316e8847-6c33-42bc-ad79-7c0113335850@github.com> References: <1rMdWncORHmyw5LXnDdyCC4-6-9ejGhNj2uLXmKI3So=.316e8847-6c33-42bc-ad79-7c0113335850@github.com> Message-ID: On Tue, 29 Apr 2025 16:37:02 GMT, Marc Chevalier wrote: >> The double `(double)count * prof_factor * method_life / counter_life + 0.5` >> can overflow a 32-bit int, causing UB on casting, but in practice computing >> a wrong scale, probably. >> >> We just need to compare that the cast is not going to overflow. This is possible >> because `INT_MAX` is exactly representable in a `double`. It is also good to >> notice that the expression `(double)count * prof_factor * method_life / counter_life + 0.5` >> cannot overflow a `double`: >> - `count` is a int, max value = 2^31-1 < 2.2e9 >> - `method_lie` is a int, max value < 2.2e9 >> - `prof_factor` is a float, max value < 3.5e38 >> - `counter_life` is a int, positive at this point, so min value = 1 >> So, the whole expression is bounded by 16.94e56 + 0.5, which is much smaller than the >> max value of a double (about 1.8e308). We probably would have precision issues, but >> it probably doesn't matter a lot. >> >> The semantic I picked here is basically `min(INT_MAX, count_d)`, so it'd always fit. >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > +comment Thanks @dean-long and @eme64 for reviews! 
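The clamping that the 8352422 description summarizes as `min(INT_MAX, count_d)` can be sketched in a few lines of C++. This is only a minimal illustration and not the code that was integrated into ciMethod.cpp: the function name `scale_count_saturating` is hypothetical, the parameter names are borrowed from the PR description, and the sketch assumes finite, non-negative inputs with `counter_life > 0`.

```cpp
#include <cassert>
#include <climits>

// Hypothetical helper for illustration; not the actual HotSpot code.
// Assumes finite, non-negative inputs and counter_life > 0.
int scale_count_saturating(int count, float prof_factor,
                           int method_life, int counter_life) {
  assert(counter_life > 0);
  // Evaluated in double: with int and float operands this cannot overflow
  // a double, but it can exceed the range of int.
  double count_d = (double)count * prof_factor * method_life / counter_life + 0.5;
  // INT_MAX (2^31 - 1) is exactly representable in a double, so this
  // comparison is exact and guards the cast below.
  if (count_d >= (double)INT_MAX) {
    return INT_MAX;
  }
  // In range here, so the conversion is well defined (truncates toward zero).
  return (int)count_d;
}
```

With this shape, a call such as `scale_count_saturating(INT_MAX, 2.0f, 1, 1)` returns `INT_MAX` instead of performing an out-of-range cast, which is the case UBSan reported.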
------------- PR Comment: https://git.openjdk.org/jdk/pull/24824#issuecomment-2841038128 From shade at openjdk.org Wed Apr 30 07:23:39 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 30 Apr 2025 07:23:39 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v11] In-Reply-To: References: Message-ID: > [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. > > The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. > > This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. > > It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. 
> > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Move to oops ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24018/files - new: https://git.openjdk.org/jdk/pull/24018/files/9f44cb5c..baea6cde Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=09-10 Stats: 11 lines in 5 files changed: 1 ins; 1 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/24018.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24018/head:pull/24018 PR: https://git.openjdk.org/jdk/pull/24018 From shade at openjdk.org Wed Apr 30 07:23:41 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 30 Apr 2025 07:23:41 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v10] In-Reply-To: References: Message-ID: <-ZNfojmvOVBb11JmAr_91o6CnxXMnv2DLe82gbZNwEs=.45c66691-575a-48db-b1d6-f1c5611c6ea3@github.com> On Tue, 29 Apr 2025 19:37:14 GMT, Coleen Phillimore wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Improve get_method_blocker > > src/hotspot/share/runtime/unloadableMethodHandle.hpp line 26: > >> 24: >> 25: #ifndef SHARE_RUNTIME_UNLOADABLE_METHOD_HANDLE_HPP >> 26: #define SHARE_RUNTIME_UNLOADABLE_METHOD_HANDLE_HPP > > I think this should be in the oops directory like OopHandle and WeakHandle and Method*, the thing it contains. Good call, moved. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2068035888 From gcao at openjdk.org Wed Apr 30 07:30:45 2025 From: gcao at openjdk.org (Gui Cao) Date: Wed, 30 Apr 2025 07:30:45 GMT Subject: RFR: 8355878: RISC-V: jdk/incubator/vector/DoubleMaxVectorTests.java fails when using RVV In-Reply-To: References: Message-ID: On Tue, 29 Apr 2025 07:17:01 GMT, Gui Cao wrote: > Hi, when I use the qemu-system mode, the jdk/incubator/vector/DoubleMaxVectorTests.java test run fails. > > As discussed on JBS, the instruction that raises SIGILL is vmv1r.v v1,v3. > > I found the cause of the problem in the QEMU source code: the QEMU patch "Add vill check for whole vector register move instructions" [1]. > > The ratified version of the RISC-V V spec, section 16.6, says that `The instructions operate as if EEW=SEW`. > > So the whole vector register move instructions depend on the vtype. > > We need to add vsetvli before moving a vector register. To avoid giving the impression that a whole vector register move can be used without first executing vsetvli, I replaced vmv1r_v with vmv_v_v. > > > [1] https://patchwork.kernel.org/project/qemu-devel/patch/20231129170400.21251-2-max.chou@sifive.com/#25621160 > > ### Testing > > qemu-system 9.1.50 with UseRVV: > > - [x] Run test/jdk/jdk/incubator/vector (fastdebug) Thanks all for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24943#issuecomment-2841061921 From duke at openjdk.org Wed Apr 30 07:30:45 2025 From: duke at openjdk.org (duke) Date: Wed, 30 Apr 2025 07:30:45 GMT Subject: RFR: 8355878: RISC-V: jdk/incubator/vector/DoubleMaxVectorTests.java fails when using RVV In-Reply-To: References: Message-ID: On Tue, 29 Apr 2025 07:17:01 GMT, Gui Cao wrote: > Hi, when I use the qemu-system mode, the jdk/incubator/vector/DoubleMaxVectorTests.java test run fails. > > As discussed on JBS, the instruction that raises SIGILL is vmv1r.v v1,v3.
> > I found the cause of the problem in the QEMU source code: the QEMU patch "Add vill check for whole vector register move instructions" [1]. > > The ratified version of the RISC-V V spec, section 16.6, says that `The instructions operate as if EEW=SEW`. > > So the whole vector register move instructions depend on the vtype. > > We need to add vsetvli before moving a vector register. To avoid giving the impression that a whole vector register move can be used without first executing vsetvli, I replaced vmv1r_v with vmv_v_v. > > > [1] https://patchwork.kernel.org/project/qemu-devel/patch/20231129170400.21251-2-max.chou@sifive.com/#25621160 > > ### Testing > > qemu-system 9.1.50 with UseRVV: > > - [x] Run test/jdk/jdk/incubator/vector (fastdebug) @zifeihan Your change (at version 400865fab31de8ac1bc8c540fefea9a9d048aff7) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24943#issuecomment-2841064033 From mli at openjdk.org Wed Apr 30 08:18:45 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 30 Apr 2025 08:18:45 GMT Subject: RFR: 8355913: RISC-V: improve hotspot/jtreg/compiler/vectorization/runner/BasicFloatOpTest.java In-Reply-To: References: <2Y39meZMDbec4LfN-EqdwJTvT_NIO1FqRBAv8bL5m6k=.b8359d80-9aa3-445e-a3d7-07868f7dff1b@github.com> Message-ID: On Wed, 30 Apr 2025 05:43:17 GMT, Fei Yang wrote: >> Hi, >> Can you help to review this simple patch? >> Previously, BasicFloatOpTest.java was accidentally not really enabled on riscv. >> And FmaVF/FmaVD depend on both UseFMA and UseRVV; the code should make this clear. >> And IR verification of FmaVF in BasicFloatOpTest.java should only be enabled when UseFMA && rvv. >> >> Thanks! > > test/hotspot/jtreg/compiler/vectorization/runner/BasicFloatOpTest.java line 35: > >> 33: * @run main/othervm -Xbootclasspath/a:. >> 34: * -XX:+UnlockDiagnosticVMOptions >> 35: * -XX:+WhiteBoxAPI > > This test also covers x86 avx and aarch64 asimd targets.
I think UseFMA is also needed for IR checks for these platforms, right? Seems to me more reasonable to add a `-XX:+UseFMA` VM option when running this test. Then we won't need to add all these `applyIf = {"UseFMA", "true"}`. I suggest we don't touch other platforms unless I have to do so (e.g. in https://github.com/openjdk/jdk/pull/24947, I have to touch an existing test in another platform), because we need to test it, and it could introduce potential failures, so I won't risk for it. For `-XX:+UseFMA`, I suggest we don't add it to the whole test, as there are other IR verifications, which does not depend on it. How do you think about it? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24950#discussion_r2068136856 From epeter at openjdk.org Wed Apr 30 08:24:46 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 30 Apr 2025 08:24:46 GMT Subject: RFR: 8352422: [ubsan] Out-of-range reported in ciMethod.cpp:917:20: runtime error: 2.68435e+09 is outside the range of representable values of type 'int' [v2] In-Reply-To: <1rMdWncORHmyw5LXnDdyCC4-6-9ejGhNj2uLXmKI3So=.316e8847-6c33-42bc-ad79-7c0113335850@github.com> References: <1rMdWncORHmyw5LXnDdyCC4-6-9ejGhNj2uLXmKI3So=.316e8847-6c33-42bc-ad79-7c0113335850@github.com> Message-ID: On Tue, 29 Apr 2025 16:37:02 GMT, Marc Chevalier wrote: >> The double `(double)count * prof_factor * method_life / counter_life + 0.5` >> can overflow a 32-bit int, causing UB on casting, but in practice computing >> a wrong scale, probably. >> >> We just need to compare that the cast is not going to overflow. This is possible >> because `INT_MAX` is exactly representable in a `double`. 
It is also good to >> notice that the expression `(double)count * prof_factor * method_life / counter_life + 0.5` >> cannot overflow a `double`: >> - `count` is a int, max value = 2^31-1 < 2.2e9 >> - `method_lie` is a int, max value < 2.2e9 >> - `prof_factor` is a float, max value < 3.5e38 >> - `counter_life` is a int, positive at this point, so min value = 1 >> So, the whole expression is bounded by 16.94e56 + 0.5, which is much smaller than the >> max value of a double (about 1.8e308). We probably would have precision issues, but >> it probably doesn't matter a lot. >> >> The semantic I picked here is basically `min(INT_MAX, count_d)`, so it'd always fit. >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > +comment Marked as reviewed by epeter (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24824#pullrequestreview-2806199875 From epeter at openjdk.org Wed Apr 30 08:24:47 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 30 Apr 2025 08:24:47 GMT Subject: RFR: 8352422: [ubsan] Out-of-range reported in ciMethod.cpp:917:20: runtime error: 2.68435e+09 is outside the range of representable values of type 'int' [v2] In-Reply-To: References: <1rMdWncORHmyw5LXnDdyCC4-6-9ejGhNj2uLXmKI3So=.316e8847-6c33-42bc-ad79-7c0113335850@github.com> Message-ID: <4YWU-iKW-7glvGtW9ptvPBYzM0jCDraxY710vwi-TEU=.2a9979c3-beb9-40bf-bfe3-04a586265ebf@github.com> On Wed, 30 Apr 2025 07:15:42 GMT, Marc Chevalier wrote: >> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: >> >> +comment > > Thanks @dean-long and @eme64 for reviews! @marc-chevalier seems the bot is struggling to mark it as ready... 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24824#issuecomment-2841193043 From epeter at openjdk.org Wed Apr 30 08:24:48 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 30 Apr 2025 08:24:48 GMT Subject: RFR: 8352422: [ubsan] Out-of-range reported in ciMethod.cpp:917:20: runtime error: 2.68435e+09 is outside the range of representable values of type 'int' [v2] In-Reply-To: <43xT9PWVHBKJVQ0fgZ1-kJIlkahhZgS9bxUiQL8z2xw=.3fca6b8f-40fd-41cb-84cf-0661303b25b1@github.com> References: <43xT9PWVHBKJVQ0fgZ1-kJIlkahhZgS9bxUiQL8z2xw=.3fca6b8f-40fd-41cb-84cf-0661303b25b1@github.com> Message-ID: <6YUaUJrC8R-3bmA109PdMSF4K1URsaBB0BnzXkAMCb8=.7ba9250c-4fd6-4b8a-9f71-bdfc1b7022b0@github.com> On Thu, 24 Apr 2025 21:36:17 GMT, Dean Long wrote: >> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: >> >> +comment > > Marked as reviewed by dlong (Reviewer). @dean-long Maybe the issue is that we have 2 reviewers required, and so the last commit must be approved by all 2 reviewers? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24824#issuecomment-2841196954 From mchevalier at openjdk.org Wed Apr 30 08:37:46 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 30 Apr 2025 08:37:46 GMT Subject: RFR: 8352422: [ubsan] Out-of-range reported in ciMethod.cpp:917:20: runtime error: 2.68435e+09 is outside the range of representable values of type 'int' [v2] In-Reply-To: <1rMdWncORHmyw5LXnDdyCC4-6-9ejGhNj2uLXmKI3So=.316e8847-6c33-42bc-ad79-7c0113335850@github.com> References: <1rMdWncORHmyw5LXnDdyCC4-6-9ejGhNj2uLXmKI3So=.316e8847-6c33-42bc-ad79-7c0113335850@github.com> Message-ID: <9cEyGJgyV4_f9kfxYc_Cx-mlKJ5hnzMv_V-0Ng3Ma94=.e57b93af-73c0-46e8-a54e-162f11cd629e@github.com> On Tue, 29 Apr 2025 16:37:02 GMT, Marc Chevalier wrote: >> The double `(double)count * prof_factor * method_life / counter_life + 0.5` >> can overflow a 32-bit int, causing UB on casting, but in practice computing >> a wrong scale, probably. >> >> We just need to compare that the cast is not going to overflow. This is possible >> because `INT_MAX` is exactly representable in a `double`. It is also good to >> notice that the expression `(double)count * prof_factor * method_life / counter_life + 0.5` >> cannot overflow a `double`: >> - `count` is a int, max value = 2^31-1 < 2.2e9 >> - `method_lie` is a int, max value < 2.2e9 >> - `prof_factor` is a float, max value < 3.5e38 >> - `counter_life` is a int, positive at this point, so min value = 1 >> So, the whole expression is bounded by 16.94e56 + 0.5, which is much smaller than the >> max value of a double (about 1.8e308). We probably would have precision issues, but >> it probably doesn't matter a lot. >> >> The semantic I picked here is basically `min(INT_MAX, count_d)`, so it'd always fit. >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > +comment Yes, that makes sense. Well, no rush, no problem! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24824#issuecomment-2841231049 From epeter at openjdk.org Wed Apr 30 08:44:48 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 30 Apr 2025 08:44:48 GMT Subject: RFR: 8352422: [ubsan] Out-of-range reported in ciMethod.cpp:917:20: runtime error: 2.68435e+09 is outside the range of representable values of type 'int' [v2] In-Reply-To: <1rMdWncORHmyw5LXnDdyCC4-6-9ejGhNj2uLXmKI3So=.316e8847-6c33-42bc-ad79-7c0113335850@github.com> References: <1rMdWncORHmyw5LXnDdyCC4-6-9ejGhNj2uLXmKI3So=.316e8847-6c33-42bc-ad79-7c0113335850@github.com> Message-ID: On Tue, 29 Apr 2025 16:37:02 GMT, Marc Chevalier wrote: >> The double `(double)count * prof_factor * method_life / counter_life + 0.5` >> can overflow a 32-bit int, causing UB on casting, but in practice computing >> a wrong scale, probably. >> >> We just need to compare that the cast is not going to overflow. This is possible >> because `INT_MAX` is exactly representable in a `double`. It is also good to >> notice that the expression `(double)count * prof_factor * method_life / counter_life + 0.5` >> cannot overflow a `double`: >> - `count` is a int, max value = 2^31-1 < 2.2e9 >> - `method_lie` is a int, max value < 2.2e9 >> - `prof_factor` is a float, max value < 3.5e38 >> - `counter_life` is a int, positive at this point, so min value = 1 >> So, the whole expression is bounded by 16.94e56 + 0.5, which is much smaller than the >> max value of a double (about 1.8e308). We probably would have precision issues, but >> it probably doesn't matter a lot. >> >> The semantic I picked here is basically `min(INT_MAX, count_d)`, so it'd always fit. >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > +comment I'm just going to lower the requirement. All we changed since @dean-long 's last review is an additional comment, and I'll assume that he would not object to that. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24824#issuecomment-2841242504 From mchevalier at openjdk.org Wed Apr 30 08:44:48 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 30 Apr 2025 08:44:48 GMT Subject: RFR: 8352422: [ubsan] Out-of-range reported in ciMethod.cpp:917:20: runtime error: 2.68435e+09 is outside the range of representable values of type 'int' [v2] In-Reply-To: <1rMdWncORHmyw5LXnDdyCC4-6-9ejGhNj2uLXmKI3So=.316e8847-6c33-42bc-ad79-7c0113335850@github.com> References: <1rMdWncORHmyw5LXnDdyCC4-6-9ejGhNj2uLXmKI3So=.316e8847-6c33-42bc-ad79-7c0113335850@github.com> Message-ID: On Tue, 29 Apr 2025 16:37:02 GMT, Marc Chevalier wrote: >> The double `(double)count * prof_factor * method_life / counter_life + 0.5` >> can overflow a 32-bit int, causing UB on casting, but in practice computing >> a wrong scale, probably. >> >> We just need to compare that the cast is not going to overflow. This is possible >> because `INT_MAX` is exactly representable in a `double`. It is also good to >> notice that the expression `(double)count * prof_factor * method_life / counter_life + 0.5` >> cannot overflow a `double`: >> - `count` is a int, max value = 2^31-1 < 2.2e9 >> - `method_lie` is a int, max value < 2.2e9 >> - `prof_factor` is a float, max value < 3.5e38 >> - `counter_life` is a int, positive at this point, so min value = 1 >> So, the whole expression is bounded by 16.94e56 + 0.5, which is much smaller than the >> max value of a double (about 1.8e308). We probably would have precision issues, but >> it probably doesn't matter a lot. >> >> The semantic I picked here is basically `min(INT_MAX, count_d)`, so it'd always fit. >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > +comment let's try again! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24824#issuecomment-2841247275 From duke at openjdk.org Wed Apr 30 08:44:48 2025 From: duke at openjdk.org (duke) Date: Wed, 30 Apr 2025 08:44:48 GMT Subject: RFR: 8352422: [ubsan] Out-of-range reported in ciMethod.cpp:917:20: runtime error: 2.68435e+09 is outside the range of representable values of type 'int' [v2] In-Reply-To: <1rMdWncORHmyw5LXnDdyCC4-6-9ejGhNj2uLXmKI3So=.316e8847-6c33-42bc-ad79-7c0113335850@github.com> References: <1rMdWncORHmyw5LXnDdyCC4-6-9ejGhNj2uLXmKI3So=.316e8847-6c33-42bc-ad79-7c0113335850@github.com> Message-ID: On Tue, 29 Apr 2025 16:37:02 GMT, Marc Chevalier wrote: >> The double `(double)count * prof_factor * method_life / counter_life + 0.5` >> can overflow a 32-bit int, causing UB on casting, but in practice computing >> a wrong scale, probably. >> >> We just need to compare that the cast is not going to overflow. This is possible >> because `INT_MAX` is exactly representable in a `double`. It is also good to >> notice that the expression `(double)count * prof_factor * method_life / counter_life + 0.5` >> cannot overflow a `double`: >> - `count` is a int, max value = 2^31-1 < 2.2e9 >> - `method_lie` is a int, max value < 2.2e9 >> - `prof_factor` is a float, max value < 3.5e38 >> - `counter_life` is a int, positive at this point, so min value = 1 >> So, the whole expression is bounded by 16.94e56 + 0.5, which is much smaller than the >> max value of a double (about 1.8e308). We probably would have precision issues, but >> it probably doesn't matter a lot. >> >> The semantic I picked here is basically `min(INT_MAX, count_d)`, so it'd always fit. >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > +comment @marc-chevalier Your change (at version 125d9c12bf8d41840ad99e64a25d42dd57f2f9c7) is now ready to be sponsored by a Committer. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24824#issuecomment-2841249497 From mchevalier at openjdk.org Wed Apr 30 08:48:54 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 30 Apr 2025 08:48:54 GMT Subject: Integrated: 8352422: [ubsan] Out-of-range reported in ciMethod.cpp:917:20: runtime error: 2.68435e+09 is outside the range of representable values of type 'int' In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 10:58:54 GMT, Marc Chevalier wrote: > The double `(double)count * prof_factor * method_life / counter_life + 0.5` > can overflow a 32-bit int, causing UB on casting, but in practice computing > a wrong scale, probably. > > We just need to compare that the cast is not going to overflow. This is possible > because `INT_MAX` is exactly representable in a `double`. It is also good to > notice that the expression `(double)count * prof_factor * method_life / counter_life + 0.5` > cannot overflow a `double`: > - `count` is a int, max value = 2^31-1 < 2.2e9 > - `method_lie` is a int, max value < 2.2e9 > - `prof_factor` is a float, max value < 3.5e38 > - `counter_life` is a int, positive at this point, so min value = 1 > So, the whole expression is bounded by 16.94e56 + 0.5, which is much smaller than the > max value of a double (about 1.8e308). We probably would have precision issues, but > it probably doesn't matter a lot. > > The semantic I picked here is basically `min(INT_MAX, count_d)`, so it'd always fit. > > Thanks, > Marc This pull request has now been integrated. 
Changeset: d802fd0d Author: Marc Chevalier Committer: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/d802fd0da234275c79b67f74f2cfb15fbe18d7b9 Stats: 8 lines in 1 file changed: 6 ins; 0 del; 2 mod 8352422: [ubsan] Out-of-range reported in ciMethod.cpp:917:20: runtime error: 2.68435e+09 is outside the range of representable values of type 'int' Reviewed-by: epeter, dlong ------------- PR: https://git.openjdk.org/jdk/pull/24824 From mli at openjdk.org Wed Apr 30 08:50:48 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 30 Apr 2025 08:50:48 GMT Subject: RFR: 8355913: RISC-V: improve hotspot/jtreg/compiler/vectorization/runner/BasicFloatOpTest.java In-Reply-To: References: <2Y39meZMDbec4LfN-EqdwJTvT_NIO1FqRBAv8bL5m6k=.b8359d80-9aa3-445e-a3d7-07868f7dff1b@github.com> Message-ID: On Wed, 30 Apr 2025 08:16:36 GMT, Hamlin Li wrote: >> test/hotspot/jtreg/compiler/vectorization/runner/BasicFloatOpTest.java line 35: >> >>> 33: * @run main/othervm -Xbootclasspath/a:. >>> 34: * -XX:+UnlockDiagnosticVMOptions >>> 35: * -XX:+WhiteBoxAPI >> >> This test also covers x86 avx and aarch64 asimd targets. I think UseFMA is also needed for IR checks for these platforms, right? Seems to me more reasonable to add a `-XX:+UseFMA` VM option when running this test. Then we won't need to add all these `applyIf = {"UseFMA", "true"}`. > > I suggest we don't touch other platforms unless I have to do so (e.g. in https://github.com/openjdk/jdk/pull/24947, I have to touch an existing test in another platform), because we need to test it, and it could introduce potential failures, so I won't risk for it. > > For `-XX:+UseFMA`, I suggest we don't add it to the whole test, as there are other IR verifications, which does not depend on it. > > How do you think about it? And, as you can see `"fma", "true", "avx", "true"`, it's doing the similar thing as riscv. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24950#discussion_r2068189823 From fyang at openjdk.org Wed Apr 30 09:03:46 2025 From: fyang at openjdk.org (Fei Yang) Date: Wed, 30 Apr 2025 09:03:46 GMT Subject: RFR: 8355913: RISC-V: improve hotspot/jtreg/compiler/vectorization/runner/BasicFloatOpTest.java In-Reply-To: <2Y39meZMDbec4LfN-EqdwJTvT_NIO1FqRBAv8bL5m6k=.b8359d80-9aa3-445e-a3d7-07868f7dff1b@github.com> References: <2Y39meZMDbec4LfN-EqdwJTvT_NIO1FqRBAv8bL5m6k=.b8359d80-9aa3-445e-a3d7-07868f7dff1b@github.com> Message-ID: On Tue, 29 Apr 2025 13:42:22 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple patch? > Previously, BasicFloatOpTest.java is accidently not really enabled on riscv. > And FmaVF/FmaVD depends on both UseFMA and UseRVV, the code should make it clear in this sense. > And IR verification of FmaVF in BasicFloatOpTest.java should only be enabled when UseFMA && rvv. > > Thanks! Marked as reviewed by fyang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24950#pullrequestreview-2806317768 From fyang at openjdk.org Wed Apr 30 09:03:46 2025 From: fyang at openjdk.org (Fei Yang) Date: Wed, 30 Apr 2025 09:03:46 GMT Subject: RFR: 8355913: RISC-V: improve hotspot/jtreg/compiler/vectorization/runner/BasicFloatOpTest.java In-Reply-To: References: <2Y39meZMDbec4LfN-EqdwJTvT_NIO1FqRBAv8bL5m6k=.b8359d80-9aa3-445e-a3d7-07868f7dff1b@github.com> Message-ID: <34G-9G1B1R5f8t7FlDDjN8pMFDrcA_nVp58BZdxPox4=.b57445b4-d4d0-41a6-91b3-f8b2362ccbd0@github.com> On Wed, 30 Apr 2025 08:48:18 GMT, Hamlin Li wrote: >> I suggest we don't touch other platforms unless I have to do so (e.g. in https://github.com/openjdk/jdk/pull/24947, I have to touch an existing test in another platform), because we need to test it, and it could introduce potential failures, so I won't risk for it. >> >> For `-XX:+UseFMA`, I suggest we don't add it to the whole test, as there are other IR verifications, which does not depend on it. 
>> >> How do you think about it? > > And, as you can see `"fma", "true", "avx", "true"`, it's doing the similar thing as riscv. > I suggest we don't touch other platforms unless I have to do so (e.g. in #24947, I have to touch an existing test in another platform), because we need to test it, and it could introduce potential failures, so I won't risk for it. OK if you want to keep this change RISC-V specific. > For `-XX:+UseFMA`, I suggest we don't add it to the whole test, as there are other IR verifications, which does not depend on it. That make sense to me. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24950#discussion_r2068212508 From gcao at openjdk.org Wed Apr 30 09:07:51 2025 From: gcao at openjdk.org (Gui Cao) Date: Wed, 30 Apr 2025 09:07:51 GMT Subject: Integrated: 8355878: RISC-V: jdk/incubator/vector/DoubleMaxVectorTests.java fails when using RVV In-Reply-To: References: Message-ID: On Tue, 29 Apr 2025 07:17:01 GMT, Gui Cao wrote: > Hi, when I use the qemu-system mode, the jdk/incubator/vector/DoubleMaxVectorTests.java test run fails. > > As discussed on JBS, As discussed on jbs, the SIGILL instruction is vmv1r.v v1,v3. > > I found the cause of the problem on the qemu source code, qemu Add vill check for whole vector register move instructions [1]. > > The ratified version of RISC-V V spec section 16.6 says that `The instructions operate as if EEW=SEW`. > > So the whole vector register move instructions depend on the vtype > > We need to add vsetvli before move vector register. In order not to have a misunderstanding here, maybe Vector Whole Vector Register Move without setting vsetvli, I replaced vmv1r_v with vmv_v_v. > > > [1] https://patchwork.kernel.org/project/qemu-devel/patch/20231129170400.21251-2-max.chou at sifive.com/#25621160 > > ### Testing > > qemu-system 9.1.50 with UseRVV: > > - [x] Run test/jdk/jdk/incubator/vector (fastdebug) This pull request has now been integrated. 
Changeset: 765cef45 Author: Gui Cao Committer: Fei Yang URL: https://git.openjdk.org/jdk/commit/765cef45465806e53f11fa7d92b9c184899b0932 Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod 8355878: RISC-V: jdk/incubator/vector/DoubleMaxVectorTests.java fails when using RVV Reviewed-by: fyang, dzhang ------------- PR: https://git.openjdk.org/jdk/pull/24943 From mli at openjdk.org Wed Apr 30 09:13:58 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 30 Apr 2025 09:13:58 GMT Subject: RFR: 8355913: RISC-V: improve hotspot/jtreg/compiler/vectorization/runner/BasicFloatOpTest.java In-Reply-To: <34G-9G1B1R5f8t7FlDDjN8pMFDrcA_nVp58BZdxPox4=.b57445b4-d4d0-41a6-91b3-f8b2362ccbd0@github.com> References: <2Y39meZMDbec4LfN-EqdwJTvT_NIO1FqRBAv8bL5m6k=.b8359d80-9aa3-445e-a3d7-07868f7dff1b@github.com> <34G-9G1B1R5f8t7FlDDjN8pMFDrcA_nVp58BZdxPox4=.b57445b4-d4d0-41a6-91b3-f8b2362ccbd0@github.com> Message-ID: On Wed, 30 Apr 2025 09:00:59 GMT, Fei Yang wrote: >> And, as you can see `"fma", "true", "avx", "true"`, it's doing the similar thing as riscv. > >> I suggest we don't touch other platforms unless I have to do so (e.g. in #24947, I have to touch an existing test in another platform), because we need to test it, and it could introduce potential failures, so I won't risk for it. > > OK if you want to keep this change RISC-V specific. > >> For `-XX:+UseFMA`, I suggest we don't add it to the whole test, as there are other IR verifications, which does not depend on it. > > That make sense to me. Thank you! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24950#discussion_r2068228787 From duke at openjdk.org Wed Apr 30 09:14:48 2025 From: duke at openjdk.org (Saranya Natarajan) Date: Wed, 30 Apr 2025 09:14:48 GMT Subject: RFR: 8347515: C2: assert(!success || (C->macro_count() == (old_macro_count - 1))) failed: elimination must have deleted one node from macro list In-Reply-To: References: Message-ID: On Tue, 29 Apr 2025 16:18:45 GMT, Emanuel Peter wrote: >> Issue: The assertion failure, `assert(!success || (C->macro_count() == (old_macro_count - 1))) failed: elimination must have deleted one node from macro list`, occurs because [loop strip mining](https://bugs.openjdk.org/browse/JDK-8186027) may create a [MaxL](https://bugs.openjdk.org/browse/JDK-8324655) after macro expansion. >> >> Analysis: Before the macro nodes are expanded in `expand_macro_nodes`, there is a process where nodes from the macro list are eliminated. This also includes elimination of any `OuterStripMinedLoop` node in the macro list. The bug occurs due to the refining of the strip mined loop in the `adjust_strip_mined_loop` function just before it is eliminated. In this case, a `MaxL` node is added to the macro list in `adjust_strip_mined_loop`. >> >> Fix: The fix involves performing the refining of the strip mined loop before the elimination process. More specifically, moving the `adjust_strip_mined_loop` function outside the elimination loop. >> >> Improvement: The process of eliminating macro nodes by calling `eliminate_macro_nodes` and performing additional Opaque and LoopLimit nodes elimination in `expand_macro_nodes` is unintuitive as suggested in [JDK-8325478](https://bugs.openjdk.org/browse/JDK-8325478) and the current fix should be moved along with the other elimination code. > > @sarannat The fix looks reasonable to me, and looking forward to the follow-up JDK-8325478 :) > > It looks like the JBS issue has some `Reduced.java`, which is from a fuzzer `Test.java`, correct?
Can you attach that one as a JTREG test here? Or is there already a very good and minimal reproducer in the repository? @eme64 : I will attach a version of the `Reduced.java` to the JTREG test in the next commit. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24890#issuecomment-2841323530 From dlunden at openjdk.org Wed Apr 30 09:51:45 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Wed, 30 Apr 2025 09:51:45 GMT Subject: RFR: 8351568: Improve source code documentation for PhaseCFG::insert_anti_dependences In-Reply-To: References: Message-ID: <36IC7thNmhW7_AN1MsC4XWNGhBAmhlQweEylKPAENiA=.9df24353-0e3e-4092-8f59-7cf28cbbdeb7@github.com> On Tue, 29 Apr 2025 04:56:59 GMT, Galder Zamarre?o wrote: >> The current documentation for `PhaseCFG::insert_anti_dependences` is difficult to follow and sometimes even misleading. We should ensure the method is appropriately documented. >> >> ### Changeset >> >> - Rename `PhaseCFG::insert_anti_dependences` to `PhaseCFG::raise_above_anti_dependences`. The purpose of `PhaseCFG::raise_above_anti_dependences` is twofold: raise the load's LCA so that the load is scheduled before anti-dependent stores, and if necessary add anti-dependence edges between the load and certain anti-dependent stores (to ensure we later "raise" the load before anti-dependent stores in LCM). The name `PhaseCFG::insert_anti_dependences` suggests that we only add anti-dependence edges. The name `PhaseCFG::raise_above_anti_dependences`, therefore, seems more appropriate. >> - Significantly add to and revise the source code documentation of `PhaseCFG::raise_above_anti_dependences`. >> - Add, move, and revise `assert`s in `PhaseCFG::raise_above_anti_dependences`, including improved `assert` messages in a few places. >> - In the main worklist loop of `PhaseCFG::raise_above_anti_dependences`: >> - Clean up how we identify the search root (avoid mutation). >> - Add a missing early exit for `Phi` nodes when `LCA == early`. 
>> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/14706896111) >> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. > > src/hotspot/share/opto/gcm.cpp line 912: > >> 910: // they CAN write to Java memory. >> 911: if (muse->ideal_Opcode() == Op_CallStaticJava) { >> 912: assert(muse->is_MachSafePoint(), ""); > > I know there was not assert message before, but can we use the opportunity to add a meaningful message for this assert? There's another empty message assert a few lines before. Thanks for the comments @galderz! I do not know the specifics of this particular part of `insert_anti_dependences`. I could add generic assert messages, based on the checks, for the purpose of avoiding empty messages. But I'm not sure those are then meaningful messages. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2068332601 From dlunden at openjdk.org Wed Apr 30 10:08:46 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Wed, 30 Apr 2025 10:08:46 GMT Subject: RFR: 8351568: Improve source code documentation for PhaseCFG::insert_anti_dependences In-Reply-To: References: Message-ID: On Tue, 29 Apr 2025 04:58:39 GMT, Galder Zamarre?o wrote: >> The current documentation for `PhaseCFG::insert_anti_dependences` is difficult to follow and sometimes even misleading. We should ensure the method is appropriately documented. >> >> ### Changeset >> >> - Rename `PhaseCFG::insert_anti_dependences` to `PhaseCFG::raise_above_anti_dependences`. The purpose of `PhaseCFG::raise_above_anti_dependences` is twofold: raise the load's LCA so that the load is scheduled before anti-dependent stores, and if necessary add anti-dependence edges between the load and certain anti-dependent stores (to ensure we later "raise" the load before anti-dependent stores in LCM). 
The name `PhaseCFG::insert_anti_dependences` suggests that we only add anti-dependence edges. The name `PhaseCFG::raise_above_anti_dependences`, therefore, seems more appropriate. >> - Significantly add to and revise the source code documentation of `PhaseCFG::raise_above_anti_dependences`. >> - Add, move, and revise `assert`s in `PhaseCFG::raise_above_anti_dependences`, including improved `assert` messages in a few places. >> - In the main worklist loop of `PhaseCFG::raise_above_anti_dependences`: >> - Clean up how we identify the search root (avoid mutation). >> - Add a missing early exit for `Phi` nodes when `LCA == early`. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/14706896111) >> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. > > src/hotspot/share/opto/gcm.cpp line 889: > >> 887: // since the load will be forced into a block preceding the Phi. >> 888: pred_block->set_raise_LCA_mark(load_index); >> 889: assert(!LCA_orig->dominates(pred_block) || > > Has this assert moved elsewhere? Or do we really want to remove it altogether? Thanks for bringing this up; I was meaning to write about this particular removal (and a similar removal towards the end of `insert_anti_dependences`) in the PR description, but it slipped my mind. The first removed `assert` is the following. assert(!LCA_orig->dominates(pred_block) || early->dominates(pred_block), "early is high enough"); This `assert` is equivalent to the implication "LCA_orig dominates pred_block" => "early dominates pred_block". We can assert something much stronger, namely that early dominates LCA_orig. This, by definition, entails that "LCA_orig dominates b" => "early dominates b" for _any_ block b. I added such an assert (see below) early in `insert_anti_dependences` and removed the current assert.
assert(early->dominates(LCA_orig), "precondition failed"); The situation is analogous for the other `assert` that I removed. assert(LCA->dominates(store_block) || !LCA_orig->dominates(store_block), "no stray stores"); I simply replaced it with another assert at the end of `insert_anti_dependences`: assert(LCA->dominates(LCA_orig), "unsound updated LCA"); Furthermore, we also already have an assert in `raise_LCA_above_marks` that verifies the new LCA every time we update it. assert(early->dominates(LCA), "unsound LCA update"); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2068360455 From dlunden at openjdk.org Wed Apr 30 10:12:48 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Wed, 30 Apr 2025 10:12:48 GMT Subject: RFR: 8351568: Improve source code documentation for PhaseCFG::insert_anti_dependences In-Reply-To: References: Message-ID: <3xXLZZOHl6oejisEzmNv206aQo4y6FuJoWhsOO_GWqM=.682d7701-baa2-4654-8216-e4de526456d1@github.com> On Tue, 29 Apr 2025 05:01:16 GMT, Galder Zamarreño wrote: >> The current documentation for `PhaseCFG::insert_anti_dependences` is difficult to follow and sometimes even misleading. We should ensure the method is appropriately documented. >> >> ### Changeset >> >> - Rename `PhaseCFG::insert_anti_dependences` to `PhaseCFG::raise_above_anti_dependences`. The purpose of `PhaseCFG::raise_above_anti_dependences` is twofold: raise the load's LCA so that the load is scheduled before anti-dependent stores, and if necessary add anti-dependence edges between the load and certain anti-dependent stores (to ensure we later "raise" the load before anti-dependent stores in LCM). The name `PhaseCFG::insert_anti_dependences` suggests that we only add anti-dependence edges. The name `PhaseCFG::raise_above_anti_dependences`, therefore, seems more appropriate. >> - Significantly add to and revise the source code documentation of `PhaseCFG::raise_above_anti_dependences`.
>> - Add, move, and revise `assert`s in `PhaseCFG::raise_above_anti_dependences`, including improved `assert` messages in a few places. >> - In the main worklist loop of `PhaseCFG::raise_above_anti_dependences`: >> - Clean up how we identify the search root (avoid mutation). >> - Add a missing early exit for `Phi` nodes when `LCA == early`. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/14706896111) >> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. > > test/hotspot/jtreg/compiler/loopopts/TestSplitIfPinnedLoadInStripMinedLoop.java line 141: > >> 139: >> 140: // Same as test2 but with reference to inner loop induction variable 'j' and different order of instructions. >> 141: // Triggers an assert in PhaseCFG::raise_above_anti_dependences if loop strip mining verification is disabled: > > Is this test still valid? According to the comment it should trigger an assert but this assert appears to be removed? Is the test correct if the test is passing even though the assert has been removed? See my above comment on the removal of this assertion. Ah, good catch. Let me try to verify that my new asserts also trigger if I revert the fix for [JDK-8260420](https://bugs.openjdk.org/browse/JDK-8260420). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2068365753 From dlunden at openjdk.org Wed Apr 30 10:17:34 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Wed, 30 Apr 2025 10:17:34 GMT Subject: RFR: 8351568: Improve source code documentation for PhaseCFG::insert_anti_dependences [v2] In-Reply-To: References: Message-ID: <95YBckz3m_3L4DtOY38G7BjOFvljWoqGRqV3EIJi2-8=.f06b86d4-6e2f-4cb6-b5e2-382c7831b4d3@github.com> > The current documentation for `PhaseCFG::insert_anti_dependences` is difficult to follow and sometimes even misleading. We should ensure the method is appropriately documented. 
> > ### Changeset > > - Rename `PhaseCFG::insert_anti_dependences` to `PhaseCFG::raise_above_anti_dependences`. The purpose of `PhaseCFG::raise_above_anti_dependences` is twofold: raise the load's LCA so that the load is scheduled before anti-dependent stores, and if necessary add anti-dependence edges between the load and certain anti-dependent stores (to ensure we later "raise" the load before anti-dependent stores in LCM). The name `PhaseCFG::insert_anti_dependences` suggests that we only add anti-dependence edges. The name `PhaseCFG::raise_above_anti_dependences`, therefore, seems more appropriate. > - Significantly add to and revise the source code documentation of `PhaseCFG::raise_above_anti_dependences`. > - Add, move, and revise `assert`s in `PhaseCFG::raise_above_anti_dependences`, including improved `assert` messages in a few places. > - In the main worklist loop of `PhaseCFG::raise_above_anti_dependences`: > - Clean up how we identify the search root (avoid mutation). > - Add a missing early exit for `Phi` nodes when `LCA == early`. > > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/14706896111) > - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. 
Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: Updates after reviews ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24926/files - new: https://git.openjdk.org/jdk/pull/24926/files/eceb18e8..3d11b554 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24926&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24926&range=00-01 Stats: 3 lines in 2 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/24926.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24926/head:pull/24926 PR: https://git.openjdk.org/jdk/pull/24926 From dlunden at openjdk.org Wed Apr 30 10:17:34 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Wed, 30 Apr 2025 10:17:34 GMT Subject: RFR: 8351568: Improve source code documentation for PhaseCFG::insert_anti_dependences [v2] In-Reply-To: References: Message-ID: On Tue, 29 Apr 2025 11:56:25 GMT, Hendrik Schick wrote: >> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: >> >> Updates after reviews > > src/hotspot/share/opto/gcm.cpp line 674: > >> 672: // >> 673: // 1. raise the load's LCA to force the load to (eventually) be scheduled at >> 674: // latest in the stores's block, and > > Suggestion: > > // latest in the store's block, and Thanks, fixed! > src/hotspot/share/opto/gcm.cpp line 682: > >> 680: // path relative to the load if there are no paths from early to LCA that go >> 681: // through the store's block. Such stores are not anti-dependent, and there is >> 682: // no need to update the LCA nor to add anti-depencence edges. > > Suggestion: > > // no need to update the LCA nor to add anti-dependence edges. Thanks, fixed!
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2068372597 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2068372685 From dlunden at openjdk.org Wed Apr 30 10:35:55 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Wed, 30 Apr 2025 10:35:55 GMT Subject: RFR: 8354767: Test crashed: assert(increase < max_live_nodes_increase_per_iteration) failed: excessive live node increase in single iteration of IGVN: 4470 (should be at most 4000) Message-ID: <15ATYTrX3CtTnuj-s2Z84wMZNwpo9Qve0OTxnwYVVYU=.82ace3c4-08c3-45e9-ab12-c71e6bc37d93@github.com> Certain idealizations introduce more new nodes than expected when adding the new assert in the changeset for [JDK-8351833](https://bugs.openjdk.org/browse/JDK-8351833). The limit checked by the new assert is too optimistic. ### Changeset Tweak the maximum live node increase per iteration in the main IGVN loop from `NodeLimitFudgeFactor * 2` (4000 by default) to `NodeLimitFudgeFactor * 3` (6000 by default). This change does not only affect the newly added assert in [JDK-8351833](https://bugs.openjdk.org/browse/JDK-8351833), but also the IGVN live node count bailout which is `MaxNodeLimit` minus the maximum live node increase per iteration. That is, the bailout by default is currently at 80000 - 4000 = 76000 live nodes, and 80000 - 6000 = 74000 live nodes after this changeset. In practice, the difference does not matter (see Testing below). The motivation for just tweaking the limit and keeping the assert added by [JDK-8351833](https://bugs.openjdk.org/browse/JDK-8351833) is that individual IGVN transformations (within a single iteration of the IGVN loop) should, in theory, only affect a local set of nodes in the ideal graph. Therefore, the assert is a good sanity check that various transformations (current ones and whatever we might add in the future) do not scale in the size of the ideal graph (i.e., they are local transformations). 
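The bailout arithmetic in the changeset description above can be double-checked with a small sketch. The constants mirror the default values quoted in the message (`MaxNodeLimit` of 80000, and `NodeLimitFudgeFactor` of 2000, the latter inferred from "`NodeLimitFudgeFactor * 2` (4000 by default)"). The function `igvn_bailout_threshold` is a hypothetical name used for illustration only and is not a HotSpot function.

```cpp
// Illustrative constants named after the HotSpot flags mentioned in the
// message; the values are the defaults quoted (or implied) there.
constexpr int MaxNodeLimit = 80000;
constexpr int NodeLimitFudgeFactor = 2000;

// Hypothetical helper: the live node count at which IGVN would bail out,
// given the multiplier applied to NodeLimitFudgeFactor to obtain the maximum
// live node increase per iteration.
constexpr int igvn_bailout_threshold(int fudge_multiplier) {
  return MaxNodeLimit - NodeLimitFudgeFactor * fudge_multiplier;
}

// Before the changeset: 80000 - 4000 = 76000 live nodes.
static_assert(igvn_bailout_threshold(2) == 76000, "threshold before the change");
// After the changeset: 80000 - 6000 = 74000 live nodes.
static_assert(igvn_bailout_threshold(3) == 74000, "threshold after the change");
```

The two `static_assert`s restate the numbers from the description, so the sketch fails to compile if the arithmetic were inconsistent.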
I have not managed to construct a reliable regression test, as triggering the assert is difficult (highly intermittent). Also, the issue is benign (a too optimistic limit). ### Testing - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/14594986152) - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. - Checked IGVN live node count bailouts in DaCapo, Renaissance, SPECjvm, and SPECjbb and observed no bailouts before nor after this changeset. ------------- Commit messages: - Fix issue Changes: https://git.openjdk.org/jdk/pull/24960/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24960&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8354767 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24960.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24960/head:pull/24960 PR: https://git.openjdk.org/jdk/pull/24960 From galder at openjdk.org Wed Apr 30 10:54:56 2025 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Wed, 30 Apr 2025 10:54:56 GMT Subject: RFR: 8258229: Crash in nmethod::reloc_string_for [v2] In-Reply-To: References: <6wxhOTq8-vRcBjfw6HdHD9nZzwdT7SgvXfgnQseFF7w=.05dc5242-cad0-4a6a-a96b-9754b2edc927@github.com> Message-ID: On Tue, 29 Apr 2025 12:56:07 GMT, Manuel H?ssig wrote: >> ## Issue Summary >> >> The issue manifests in intermittent failures of test cases with `-XX:+PrintAssembly`. The reason for these intermittent failures is a deoptimization of the method before or during printing its assembly. In case that deoptimization makes the method not entrant, then the entry of that method is patched, but the relocation information is not updated. 
If the instruction at the method entry before patching had relocation info that prints a comment during assembly printing, printing that comment for the patched entry fails in case the operands of the original and patched instructions do not match. >> >> ## Change Summary >> >> To fix this issue, this PR updates the relocation info when patching the method entry. To avoid any races between printing and deoptimizing, this PR acquires the `NMethodState_lock` for printing an `nmethod`. >> >> All changes of this PR summarized: >> - add a regression test, >> - update the relocation information after patching the method entry for making it not entrant, >> - acquire the `NMethodState_lock` in `print_nmethod()` to avoid changing the relocation information during printing. >> >> ## Testing >> >> I ran tiers 1 through 3 and Oracle internal testing. > > Manuel Hässig has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge branch 'master' into jdk-8258229-nmethod > - Add DeoptimizeALot and fix typo in test > - Hold NMethodState_lock while printing an nmethod > > This prevents data races on the relocation info when code is patched. > - Update relocation info when making method not entrant > - Add regression test Marked as reviewed by galder (Author).
------------- PR Review: https://git.openjdk.org/jdk/pull/24831#pullrequestreview-2806660889 From mchevalier at openjdk.org Wed Apr 30 11:31:47 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 30 Apr 2025 11:31:47 GMT Subject: RFR: 8354284: Add more compiler test folders to tier1 runs In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 08:44:04 GMT, Marc Chevalier wrote: > Some folders in jtreg/compiler have been reported not to be run in any tier, while tier1 was probably intended, but the tier definition was mistakenly not updated. I've checked which folders are not referenced into `TEST.groups`. > > The unmentioned ones: > - `ccp` > - `ciReplay` > - `ciTypeFlow` > - `compilercontrol` > - `debug` > - `oracle` > - `predicates` > - `print` > - `relocations` > - `sharedstubs` > - `splitif` > - `tiered` > - `whitebox` > > And those, that are not test folders: > - `lib` > - `patches` > - `testlibraries` > > I'm adding `ccp`, `ciTypeFlow`, `predicates`, `sharedstubs` and `splitif` to tier1. > > The other folders seems to have been around for very long (since at least mid-2021). It's not clear how meaningful it'd be to add them/what the intent from them was. I've rather focused on the recently(-ish) added folders, that one forgot to put in a tier when adding it. > > Feel free to tell if other folders should be included (and in which tier). > > Thanks, > Marc Rather than simply removing the test from tier1, can we make it faster? (spoiler: yes). There are two big slow things: - a `RepeatCompilation=300`: https://github.com/openjdk/jdk/blob/526951dba731f0e733e22a3bff7ac7a18ce9dece/test/hotspot/jtreg/compiler/ccp/TestAndConZeroCCP.java#L29-L30 we can remove this flag entirely. It's useful for debugging, but tier1 is run often, we probably don't need this flag. It is good for debugging, but I think we usually don't include such flags in tests. 
- a rather big loop: https://github.com/openjdk/jdk/blob/526951dba731f0e733e22a3bff7ac7a18ce9dece/test/hotspot/jtreg/compiler/ccp/TestAndConZeroCCP.java#L45-L47 calling another rather big loop: https://github.com/openjdk/jdk/blob/526951dba731f0e733e22a3bff7ac7a18ce9dece/test/hotspot/jtreg/compiler/ccp/TestAndConZeroCCP.java#L51-L55 that is, overall, 2^16 * 10^5 ~ 6.6e9 iterations. That's where most of the time is spent. But actually, the outer loop (with the `10000` bound) is not needed: I've reproduced the corresponding issue [JDK-8350563](https://bugs.openjdk.org/browse/JDK-8350563) replacing the loop by a mere `run()` and it still reproduces, and is working after the fix. The inner loop (with the `1 << 16` bound) cannot be shortened: the bound is necessary for having the correct range in some node types that trigger the bug. We also cannot make a much bigger step (`cp += 2` works, but not more); I'm not sure why. With these improvements, this test is doing much better! Running this test alone in testing takes only 1h18 of machine time when it used to take 2h30. The longest duration of a single run went from 40s to 8s. I think with these changes, the test is still meaningful and is short enough for tier1. If you agree, I can push this tiny fix. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24817#issuecomment-2841681522 From rehn at openjdk.org Wed Apr 30 11:53:49 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 30 Apr 2025 11:53:49 GMT Subject: RFR: 8355913: RISC-V: improve hotspot/jtreg/compiler/vectorization/runner/BasicFloatOpTest.java In-Reply-To: <2Y39meZMDbec4LfN-EqdwJTvT_NIO1FqRBAv8bL5m6k=.b8359d80-9aa3-445e-a3d7-07868f7dff1b@github.com> References: <2Y39meZMDbec4LfN-EqdwJTvT_NIO1FqRBAv8bL5m6k=.b8359d80-9aa3-445e-a3d7-07868f7dff1b@github.com> Message-ID: On Tue, 29 Apr 2025 13:42:22 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple patch? > Previously, BasicFloatOpTest.java is accidently not really enabled on riscv.
> And FmaVF/FmaVD depends on both UseFMA and UseRVV, the code should make it clear in this sense. > And IR verification of FmaVF in BasicFloatOpTest.java should only be enabled when UseFMA && rvv. > > Thanks! Thanks, seems fine! ------------- Marked as reviewed by rehn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24950#pullrequestreview-2806793580 From rehn at openjdk.org Wed Apr 30 11:56:47 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 30 Apr 2025 11:56:47 GMT Subject: RFR: 8355704: RISC-V: enable TestIRFma.java In-Reply-To: References: Message-ID: On Tue, 29 Apr 2025 12:47:25 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch to enable TestIRFma.java? > FmaF/D (checked by TestIRFma.java) are supported on riscv, but for some reason we can not enable it easily, but we should enable it. > > NOTE: the reason I change IRNode matching rules is that, previously it verify the `FINAL CODE` where every platform could have different instruct name; I change it from machOnlyNameRegex to beforeMatchingNameRegex, to make it verify the `PrintIdeal` where every platform share the same names. > > Also tested on machine with `asimd` support. > > Thanks! test/hotspot/jtreg/compiler/c2/irTests/TestIRFma.java line 89: > 87: > 88: @Test > 89: @IR(counts = {IRNode.FMSUB_F, "> 0", IRNode.NEG_F, "> 0"}, What is the reason for adding the "IRNode.NEG_F" ? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24947#discussion_r2068508778 From rehn at openjdk.org Wed Apr 30 12:05:00 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 30 Apr 2025 12:05:00 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v3] In-Reply-To: References: Message-ID: On Thu, 24 Apr 2025 14:27:50 GMT, Yuri Gaevsky wrote: >> The patch adds possibility to use RVV instructions for faster vectorizedHashCode calculations on RVV v1.0.0 capable hardware. >> >> Testing: hotspot/jtreg/compiler/ under QEMU-8.1 with RVV v1.0.0. 
> > Yuri Gaevsky has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: > > - Merge master > - num_8b_elems_in_vec --> nof_vec_elems > - Removed checks for (MaxVectorSize >= 16) per @RealFYang suggestion. > - 8322174: RISC-V: C2 VectorizedHashCode RVV Version Hey, certainly an improvement. Can you resolve the conflict by merging with master? ------------- PR Comment: https://git.openjdk.org/jdk/pull/17413#issuecomment-2841754229 From rcastanedalo at openjdk.org Wed Apr 30 12:10:00 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 30 Apr 2025 12:10:00 GMT Subject: RFR: 8354520: IGV: dump contextual information [v5] In-Reply-To: References: Message-ID: > This changeset extends the IGV graph dumps with additional properties that ease tracing the dumps back to the context in which they were produced. The changeset dumps, for every compilation, the following additional properties: > > - JVM arguments > - platform information > - JVM version information > - date and time > - process ID > - (compiler) thread ID > > ![compilation-properties](https://github.com/user-attachments/assets/8ddc8fb9-c348-4761-8e19-c70633a1b59f) > > Additionally, the changeset produces and dumps the C2 stack trace from which each graph is dumped: > > ![c2-stack-trace](https://github.com/user-attachments/assets/085547ee-b0b3-4a38-86f1-9df79cf1cc01) > > This should be particularly useful in an interactive context, where the user steps through C2 code using a debugger and dumps graphs at different points. To produce a stack trace in this context, the usual debugger-entry C2 functions (`igv_print`, `igv_append`, `Node::dump_bfs`, ...) 
are extended with extra arguments to specify the stack handling registers (stack pointer, frame pointer, and program counter): > > ![c2-stack-trace-from-gdb](https://github.com/user-attachments/assets/29de2964-ee2d-4f5f-bcf7-d81e1bc6c8a6) > > The inconvenience of manually specifying the stack handling registers can be addressed by hiding them in debugger user-defined commands, e.g.: > > > define igv > p igv_print(true, $sp, $fp, $pc) > end > > define igv_node > p find_node($arg0)->dump_bfs(0, 0, "!", $sp, $fp, $pc) > end > > > Thanks to @TobiHartmann for providing useful feedback! > > #### Testing > > - tier1 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64; release and debug mode). > - Tested interactive usage manually via `gdb` and `rr` on linux-x64. > - Tested automatically that dumping thousands of graphs does not trigger any assertion failure. Roberto Casta?eda Lozano has updated the pull request incrementally with one additional commit since the last revision: Document workaround for lldb issue ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24724/files - new: https://git.openjdk.org/jdk/pull/24724/files/8193767d..dd1ad6ad Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24724&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24724&range=03-04 Stats: 7 lines in 1 file changed: 7 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24724.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24724/head:pull/24724 PR: https://git.openjdk.org/jdk/pull/24724 From rcastanedalo at openjdk.org Wed Apr 30 12:13:48 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 30 Apr 2025 12:13:48 GMT Subject: RFR: 8354520: IGV: dump contextual information [v5] In-Reply-To: References: Message-ID: On Wed, 30 Apr 2025 12:10:00 GMT, Roberto Casta?eda Lozano wrote: >> This changeset extends the IGV graph dumps with additional properties that ease tracing the dumps back to the 
context in which they were produced. The changeset dumps, for every compilation, the following additional properties: >> >> - JVM arguments >> - platform information >> - JVM version information >> - date and time >> - process ID >> - (compiler) thread ID >> >> ![compilation-properties](https://github.com/user-attachments/assets/8ddc8fb9-c348-4761-8e19-c70633a1b59f) >> >> Additionally, the changeset produces and dumps the C2 stack trace from which each graph is dumped: >> >> ![c2-stack-trace](https://github.com/user-attachments/assets/085547ee-b0b3-4a38-86f1-9df79cf1cc01) >> >> This should be particularly useful in an interactive context, where the user steps through C2 code using a debugger and dumps graphs at different points. To produce a stack trace in this context, the usual debugger-entry C2 functions (`igv_print`, `igv_append`, `Node::dump_bfs`, ...) are extended with extra arguments to specify the stack handling registers (stack pointer, frame pointer, and program counter): >> >> ![c2-stack-trace-from-gdb](https://github.com/user-attachments/assets/29de2964-ee2d-4f5f-bcf7-d81e1bc6c8a6) >> >> The inconvenience of manually specifying the stack handling registers can be addressed by hiding them in debugger user-defined commands, e.g.: >> >> >> define igv >> p igv_print(true, $sp, $fp, $pc) >> end >> >> define igv_node >> p find_node($arg0)->dump_bfs(0, 0, "!", $sp, $fp, $pc) >> end >> >> >> Thanks to @TobiHartmann for providing useful feedback! >> >> #### Testing >> >> - tier1 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64; release and debug mode). >> - Tested interactive usage manually via `gdb` and `rr` on linux-x64. >> - Tested automatically that dumping thousands of graphs does not trigger any assertion failure. 
> > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Document workaround for lldb issue Commit dd1ad6a documents how to dump the C2 stack trace when using `lldb` (default debugger on macOS platforms). Thanks @dafedafe for reporting and helping out! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24724#issuecomment-2841775972 From dfenacci at openjdk.org Wed Apr 30 13:41:46 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 30 Apr 2025 13:41:46 GMT Subject: RFR: 8354520: IGV: dump contextual information [v5] In-Reply-To: References: Message-ID: <0X2LHkci-EGTDEG5RsRtrOXK7jLsSGUeoXgvFMeHqZA=.63571bd2-7130-44a0-9151-b054b34dbe6c@github.com> On Wed, 30 Apr 2025 12:10:00 GMT, Roberto Castañeda Lozano wrote: >> This changeset extends the IGV graph dumps with additional properties that ease tracing the dumps back to the context in which they were produced. The changeset dumps, for every compilation, the following additional properties: >> >> - JVM arguments >> - platform information >> - JVM version information >> - date and time >> - process ID >> - (compiler) thread ID >> >> ![compilation-properties](https://github.com/user-attachments/assets/8ddc8fb9-c348-4761-8e19-c70633a1b59f) >> >> Additionally, the changeset produces and dumps the C2 stack trace from which each graph is dumped: >> >> ![c2-stack-trace](https://github.com/user-attachments/assets/085547ee-b0b3-4a38-86f1-9df79cf1cc01) >> >> This should be particularly useful in an interactive context, where the user steps through C2 code using a debugger and dumps graphs at different points. To produce a stack trace in this context, the usual debugger-entry C2 functions (`igv_print`, `igv_append`, `Node::dump_bfs`, ...)
are extended with extra arguments to specify the stack handling registers (stack pointer, frame pointer, and program counter): >> >> ![c2-stack-trace-from-gdb](https://github.com/user-attachments/assets/29de2964-ee2d-4f5f-bcf7-d81e1bc6c8a6) >> >> The inconvenience of manually specifying the stack handling registers can be addressed by hiding them in debugger user-defined commands, e.g.: >> >> >> define igv >> p igv_print(true, $sp, $fp, $pc) >> end >> >> define igv_node >> p find_node($arg0)->dump_bfs(0, 0, "!", $sp, $fp, $pc) >> end >> >> >> Thanks to @TobiHartmann for providing useful feedback! >> >> #### Testing >> >> - tier1 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64; release and debug mode). >> - Tested interactive usage manually via `gdb` and `rr` on linux-x64. >> - Tested automatically that dumping thousands of graphs does not trigger any assertion failure. > > Roberto Casta?eda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Document workaround for lldb issue Thanks a lot for adding this feature @robcasloz. ------------- Marked as reviewed by dfenacci (Committer). PR Review: https://git.openjdk.org/jdk/pull/24724#pullrequestreview-2806913786 From dfenacci at openjdk.org Wed Apr 30 13:41:48 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 30 Apr 2025 13:41:48 GMT Subject: RFR: 8354520: IGV: dump contextual information [v4] In-Reply-To: <6vhZrRta9aAor4HaEOi2vpDXbJuZcEoJuP5sbjvekyA=.fba4a11e-d889-4537-8ce9-ea63fea359eb@github.com> References: <6vhZrRta9aAor4HaEOi2vpDXbJuZcEoJuP5sbjvekyA=.fba4a11e-d889-4537-8ce9-ea63fea359eb@github.com> Message-ID: On Thu, 24 Apr 2025 10:55:37 GMT, Roberto Casta?eda Lozano wrote: >> This changeset extends the IGV graph dumps with additional properties that ease tracing the dumps back to the context in which they were produced. 
The changeset dumps, for every compilation, the following additional properties: >> >> - JVM arguments >> - platform information >> - JVM version information >> - date and time >> - process ID >> - (compiler) thread ID >> >> ![compilation-properties](https://github.com/user-attachments/assets/8ddc8fb9-c348-4761-8e19-c70633a1b59f) >> >> Additionally, the changeset produces and dumps the C2 stack trace from which each graph is dumped: >> >> ![c2-stack-trace](https://github.com/user-attachments/assets/085547ee-b0b3-4a38-86f1-9df79cf1cc01) >> >> This should be particularly useful in an interactive context, where the user steps through C2 code using a debugger and dumps graphs at different points. To produce a stack trace in this context, the usual debugger-entry C2 functions (`igv_print`, `igv_append`, `Node::dump_bfs`, ...) are extended with extra arguments to specify the stack handling registers (stack pointer, frame pointer, and program counter): >> >> ![c2-stack-trace-from-gdb](https://github.com/user-attachments/assets/29de2964-ee2d-4f5f-bcf7-d81e1bc6c8a6) >> >> The inconvenience of manually specifying the stack handling registers can be addressed by hiding them in debugger user-defined commands, e.g.: >> >> >> define igv >> p igv_print(true, $sp, $fp, $pc) >> end >> >> define igv_node >> p find_node($arg0)->dump_bfs(0, 0, "!", $sp, $fp, $pc) >> end >> >> >> Thanks to @TobiHartmann for providing useful feedback! >> >> #### Testing >> >> - tier1 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64; release and debug mode). >> - Tested interactive usage manually via `gdb` and `rr` on linux-x64. >> - Tested automatically that dumping thousands of graphs does not trigger any assertion failure. 
> > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Add relative link to compile.cpp src/hotspot/share/opto/node.cpp line 2061: > 2059: Compile* C = Compile::current(); > 2060: C->init_igv(); > 2061: C->igv_print_graph_to_network(nullptr, _print_list, _frame); I suppose you removed "PrintBFS" to make the graph name be "Debug" like the other ones and make it easier to handle name and stack printing, right? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24724#discussion_r2068579904 From epeter at openjdk.org Wed Apr 30 14:13:53 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 30 Apr 2025 14:13:53 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v12] In-Reply-To: References: <4G5Po8SEYFxSylfIJtndUpu0LLboJPgGgmE8FL3t1S4=.39c5d519-7a78-43c7-a1fe-8cef72901490@github.com> Message-ID: On Tue, 8 Apr 2025 16:22:17 GMT, Galder Zamarreño wrote: >> @turbanoff Thanks for the two whitespace fixes :) > > @eme64 No, I don't think my review counts that much on this one. I think you need someone to review it who has more background on the history of this. @galderz Christian already reviewed it, and generally it is ok for someone with deeper knowledge and someone with a little less familiarity to review.
So far, on `Verify.java` the only other commit is https://github.com/openjdk/jdk/pull/22715, so there is not too much history ;) But totally up to you, I'm thankful for the comments you already left, and they do not obligate you to do a complete review :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24224#issuecomment-2842122127 From rcastanedalo at openjdk.org Wed Apr 30 14:32:50 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 30 Apr 2025 14:32:50 GMT Subject: RFR: 8354520: IGV: dump contextual information [v5] In-Reply-To: References: Message-ID: <5V9ebfuzDLssnpTsSlCsGIGKs71Ic_YZV9dO2F7J21c=.cb97f630-8784-48e0-94f1-de7c0262daea@github.com> On Wed, 30 Apr 2025 12:11:32 GMT, Roberto Castañeda Lozano wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> Document workaround for lldb issue > > Commit dd1ad6a documents how to dump the C2 stack trace when using `lldb` (default debugger on macOS platforms). Thanks @dafedafe for reporting and helping out! > Thanks a lot for adding this feature @robcasloz. Thanks for reviewing, Damon!
------------- PR Comment: https://git.openjdk.org/jdk/pull/24724#issuecomment-2842183627 From rcastanedalo at openjdk.org Wed Apr 30 14:32:51 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 30 Apr 2025 14:32:51 GMT Subject: RFR: 8354520: IGV: dump contextual information [v4] In-Reply-To: References: <6vhZrRta9aAor4HaEOi2vpDXbJuZcEoJuP5sbjvekyA=.fba4a11e-d889-4537-8ce9-ea63fea359eb@github.com> Message-ID: On Wed, 30 Apr 2025 12:38:17 GMT, Damon Fenacci wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> Add relative link to compile.cpp > > src/hotspot/share/opto/node.cpp line 2061: > >> 2059: Compile* C = Compile::current(); >> 2060: C->init_igv(); >> 2061: C->igv_print_graph_to_network(nullptr, _print_list, _frame); > > I suppose you removed "PrintBFS" to make the graph name be "Debug" like the other ones and make it easier to handle name and stack printing, right? That's right. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24724#discussion_r2068787927 From mli at openjdk.org Wed Apr 30 14:45:49 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 30 Apr 2025 14:45:49 GMT Subject: RFR: 8355704: RISC-V: enable TestIRFma.java In-Reply-To: References: Message-ID: On Wed, 30 Apr 2025 11:51:55 GMT, Robbin Ehn wrote: >> Hi, >> Can you help to review this patch to enable TestIRFma.java? >> FmaF/D (checked by TestIRFma.java) are supported on riscv, but for some reason we can not enable it easily, but we should enable it. >> >> NOTE: the reason I changed the IRNode matching rules is that previously it verified the `FINAL CODE`, where every platform could have different instruction names; I changed it from machOnlyNameRegex to beforeMatchingNameRegex, to make it verify the `PrintIdeal`, where every platform shares the same names. >> >> Also tested on a machine with `asimd` support. >> >> Thanks!
> > test/hotspot/jtreg/compiler/c2/irTests/TestIRFma.java line 89: > >> 87: >> 88: @Test >> 89: @IR(counts = {IRNode.FMSUB_F, "> 0", IRNode.NEG_F, "> 0"}, > > What is the reason for adding the "IRNode.NEG_F" ? I think I need to reconsider the test improvement; I will update the PR later. Thanks for having a look. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24947#discussion_r2068814573 From mli at openjdk.org Wed Apr 30 14:49:47 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 30 Apr 2025 14:49:47 GMT Subject: RFR: 8355913: RISC-V: improve hotspot/jtreg/compiler/vectorization/runner/BasicFloatOpTest.java In-Reply-To: References: <2Y39meZMDbec4LfN-EqdwJTvT_NIO1FqRBAv8bL5m6k=.b8359d80-9aa3-445e-a3d7-07868f7dff1b@github.com> Message-ID: On Wed, 30 Apr 2025 11:51:08 GMT, Robbin Ehn wrote: > Thanks, seems fine! Thank you! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24950#issuecomment-2842239359 From mli at openjdk.org Wed Apr 30 14:52:23 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 30 Apr 2025 14:52:23 GMT Subject: RFR: 8355980: RISC-V: remove vmclr_m before vmsXX and vmfXX Message-ID: Hi, Can you help to review this simple improvement? According to the RVV spec, "integer compare instructions write 1 to the destination mask register element if the comparison evaluates to true, and 0 otherwise." "These vector FP compare instructions compare two source operands and write the comparison result to a mask register." So, it's not always necessary to clear the mask register before a vector comparison operation, e.g. when `vm != Assembler::v0_t`. Thanks!
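As a rough illustration of the argument above (this is not code from the PR), a scalar C++ model of a vector compare might look as follows. It assumes that `vm != Assembler::v0_t` selects the unmasked form, and that inactive elements of a masked compare keep their old value (mask-undisturbed policy). `model_vmseq` and its parameters are hypothetical names, and tail elements are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar model of an element-wise "set mask if equal" compare.
// vd   : previous contents of the destination mask register (may hold stale bits)
// mask : nullptr models the unmasked form; otherwise only active elements are written
// Assumption: inactive elements keep their old value (mask-undisturbed).
std::vector<bool> model_vmseq(std::vector<bool> vd,
                              const std::vector<int32_t>& vs2,
                              const std::vector<int32_t>& vs1,
                              const std::vector<bool>* mask) {
  assert(vd.size() == vs2.size() && vs1.size() == vs2.size());
  assert(mask == nullptr || mask->size() == vs2.size());
  for (std::size_t i = 0; i < vs2.size(); i++) {
    if (mask == nullptr || (*mask)[i]) {
      // Active element: the compare writes both 1 and 0 results.
      vd[i] = (vs2[i] == vs1[i]);
    }
    // Inactive element: left untouched in this model.
  }
  return vd;
}
```

In the unmasked case every element of the result is overwritten, so stale bits in the destination cannot survive and a preceding clear would be redundant in this model. In the masked case the stale bits of inactive elements remain, which is where clearing first still matters.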
------------- Commit messages: - fix - initial commit Changes: https://git.openjdk.org/jdk/pull/24968/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24968&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8355980 Stats: 6 lines in 1 file changed: 4 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24968.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24968/head:pull/24968 PR: https://git.openjdk.org/jdk/pull/24968 From epeter at openjdk.org Wed Apr 30 14:57:56 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 30 Apr 2025 14:57:56 GMT Subject: RFR: 8354520: IGV: dump contextual information [v5] In-Reply-To: References: Message-ID: On Wed, 30 Apr 2025 12:10:00 GMT, Roberto Casta?eda Lozano wrote: >> This changeset extends the IGV graph dumps with additional properties that ease tracing the dumps back to the context in which they were produced. The changeset dumps, for every compilation, the following additional properties: >> >> - JVM arguments >> - platform information >> - JVM version information >> - date and time >> - process ID >> - (compiler) thread ID >> >> ![compilation-properties](https://github.com/user-attachments/assets/8ddc8fb9-c348-4761-8e19-c70633a1b59f) >> >> Additionally, the changeset produces and dumps the C2 stack trace from which each graph is dumped: >> >> ![c2-stack-trace](https://github.com/user-attachments/assets/085547ee-b0b3-4a38-86f1-9df79cf1cc01) >> >> This should be particularly useful in an interactive context, where the user steps through C2 code using a debugger and dumps graphs at different points. To produce a stack trace in this context, the usual debugger-entry C2 functions (`igv_print`, `igv_append`, `Node::dump_bfs`, ...) 
are extended with extra arguments to specify the stack handling registers (stack pointer, frame pointer, and program counter): >> >> ![c2-stack-trace-from-gdb](https://github.com/user-attachments/assets/29de2964-ee2d-4f5f-bcf7-d81e1bc6c8a6) >> >> The inconvenience of manually specifying the stack handling registers can be addressed by hiding them in debugger user-defined commands, e.g.: >> >> >> define igv >> p igv_print(true, $sp, $fp, $pc) >> end >> >> define igv_node >> p find_node($arg0)->dump_bfs(0, 0, "!", $sp, $fp, $pc) >> end >> >> >> Thanks to @TobiHartmann for providing useful feedback! >> >> #### Testing >> >> - tier1 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64; release and debug mode). >> - Tested interactive usage manually via `gdb` and `rr` on linux-x64. >> - Tested automatically that dumping thousands of graphs does not trigger any assertion failure. > > Roberto Casta?eda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Document workaround for lldb issue Nice work @robcasloz ! I just left a few suggestions below, but I think they are basically all nits / optional :) src/hotspot/share/opto/compile.cpp line 5207: > 5205: > 5206: // Called from debugger. Prints method to the default file with the default phase name. > 5207: // This works regardless of any Ideal Graph Visualizer flags set or not. Suggestion: // This works regardless of any Ideal Graph Visualizer flags set or not. // Use in debugger (gdb / rr): p igv_print($sp, $fp, $pc) src/hotspot/share/opto/compile.cpp line 5222: > 5220: // the network flags for the Ideal Graph Visualizer, or to the default file depending on the 'network' argument. > 5221: // This works regardless of any Ideal Graph Visualizer flags set or not. 
> 5222: void igv_print(bool network, void* sp, void* fp, void* pc) { Suggestion: // Use in debugger (gdb / rr): p igv_print(true, $sp, $fp, $pc) void igv_print(bool network, void* sp, void* fp, void* pc) { src/hotspot/share/opto/compile.cpp line 5231: > 5229: } > 5230: > 5231: // Same as igv_print(bool network) above but with a specified phase name. Suggestion: // Same as igv_print(bool network, void* sp, void* fp, void* pc) above but with a specified phase name. // Use in debugger (gdb / rr): p igv_print(true, "MyPhase", $sp, $fp, $pc) src/hotspot/share/opto/compile.cpp line 5248: > 5246: // Called from debugger, especially when replaying a trace in which the program state cannot be altered like with rr replay. > 5247: // A method is appended to an existing default file with the default phase name. This means that igv_append() must follow > 5248: // an earlier igv_print(*) call which sets up the file. This works regardless of any Ideal Graph Visualizer flags set or not. Suggestion: // an earlier igv_print(*) call which sets up the file. This works regardless of any Ideal Graph Visualizer flags set or not. // Use in debugger (gdb / rr): p igv_append($sp, $fp, $pc) src/hotspot/share/opto/compile.cpp line 5254: > 5252: } > 5253: > 5254: // Same as igv_append() above but with a specified phase name. Suggestion: // Same as igv_append(void* sp, void* fp, void* pc) above but with a specified phase name. // Use in debugger (gdb / rr): p igv_append("MyPhase", $sp, $fp, $pc) src/hotspot/share/opto/idealGraphPrinter.cpp line 380: > 378: print_prop(COMPILATION_PROCESS_ID_PROPERTY, os::current_process_id()); > 379: print_prop(COMPILATION_THREAD_ID_PROPERTY, os::current_thread_id()); > 380: What about CPU features? Could be nice to know if we have `avx2` or `asimd`, etc. src/hotspot/share/opto/idealGraphPrinter.cpp line 907: > 905: } > 906: > 907: static bool skip_frame(const char* name) { Nit: the name suggests that this is "skipping a frame". 
But you are asking if we "should skip the frame". So I would recommend a name like `is_skip_frame`, `must_skip_frame`, `should_skip_frame` or alike. You could even invert the condition, and make the condition positive: `can_print_stack_frame`. Totally optional, up to you :) src/hotspot/share/opto/idealGraphPrinter.cpp line 917: > 915: static bool stop_frame_walk(const char* name) { > 916: return strstr(name, "C2Compiler::compile_method") != nullptr; > 917: } Nit: You could write `must_stop_frame_walk`. Btw, it could be nice to have a comment to explain why this is the condition to stop on. Maybe that comment could then turn into an even better method name? src/hotspot/share/opto/idealGraphPrinter.cpp line 921: > 919: void IdealGraphPrinter::print_stack(frame fr, outputStream* graph_name) { > 920: char buf[O_BUFLEN]; > 921: Thread* _current = Thread::current_or_null(); Is this a local variable? If so, I would drop the underscore. It generally suggests that it is a field, right? It would also not hurt to call it `current_thread` for clarity. src/hotspot/share/opto/idealGraphPrinter.cpp line 924: > 922: int count = 0; > 923: int frame = 0; > 924: while (count++ < StackPrintLimit && fr.pc() != nullptr) { Could this be formulated as a `for` loop? Suggestion: for (int count = 0; count < StackPrintLimit && fr.pc() != nullptr; count++) { Just an idea, totally optional. src/hotspot/share/opto/idealGraphPrinter.cpp line 967: > 965: if (!_current_method || !_should_send_method || node == nullptr) return; > 966: > 967: frame current = fr == nullptr ? os::current_frame() : *fr; Suggestion: frame current_frame = fr == nullptr ? os::current_frame() : *fr; Just to make sure there is no confusion with `_current_method`. Optional, up to you. src/hotspot/share/opto/idealGraphPrinter.hpp line 122: > 120: // graph_name == nullptr) or the graph name based on the highest C2 frame (if > 121: // graph_name != nullptr). 
> 122: void print_stack(frame fr, outputStream* graph_name); Are you passing this `frame` by value on purpose? src/hotspot/share/opto/node.cpp line 1799: > 1797: const char* _options; > 1798: outputStream* _output; > 1799: frame* _frame; Could this be `const`, like the other pointers? src/hotspot/share/opto/node.cpp line 2428: > 2426: } > 2427: > 2428: // Call this from debugger, with stack handling register arguments for IGV dumps. Suggestion: // Call this from debugger, with stack handling register arguments for IGV dumps. // Example: p find_node(741)->dump_bfs(7, find_node(741), "c+A!", $sp, $fp, $pc) ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24724#pullrequestreview-2807226660 PR Review Comment: https://git.openjdk.org/jdk/pull/24724#discussion_r2068758515 PR Review Comment: https://git.openjdk.org/jdk/pull/24724#discussion_r2068761567 PR Review Comment: https://git.openjdk.org/jdk/pull/24724#discussion_r2068763703 PR Review Comment: https://git.openjdk.org/jdk/pull/24724#discussion_r2068766019 PR Review Comment: https://git.openjdk.org/jdk/pull/24724#discussion_r2068767093 PR Review Comment: https://git.openjdk.org/jdk/pull/24724#discussion_r2068770939 PR Review Comment: https://git.openjdk.org/jdk/pull/24724#discussion_r2068779813 PR Review Comment: https://git.openjdk.org/jdk/pull/24724#discussion_r2068784104 PR Review Comment: https://git.openjdk.org/jdk/pull/24724#discussion_r2068787022 PR Review Comment: https://git.openjdk.org/jdk/pull/24724#discussion_r2068793710 PR Review Comment: https://git.openjdk.org/jdk/pull/24724#discussion_r2068804402 PR Review Comment: https://git.openjdk.org/jdk/pull/24724#discussion_r2068817524 PR Review Comment: https://git.openjdk.org/jdk/pull/24724#discussion_r2068840383 PR Review Comment: https://git.openjdk.org/jdk/pull/24724#discussion_r2068846400 From epeter at openjdk.org Wed Apr 30 14:57:57 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 30 Apr 2025 
14:57:57 GMT Subject: RFR: 8354520: IGV: dump contextual information [v5] In-Reply-To: References: Message-ID: On Wed, 30 Apr 2025 14:39:15 GMT, Emanuel Peter wrote: >> Roberto Casta?eda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> Document workaround for lldb issue > > src/hotspot/share/opto/idealGraphPrinter.cpp line 967: > >> 965: if (!_current_method || !_should_send_method || node == nullptr) return; >> 966: >> 967: frame current = fr == nullptr ? os::current_frame() : *fr; > > Suggestion: > > frame current_frame = fr == nullptr ? os::current_frame() : *fr; > > Just to make sure there is no confusion with `_current_method`. > Optional, up to you. Also: what is the implication of having the `frame` object on the stack, rather than a pointer to it? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24724#discussion_r2068812834 From epeter at openjdk.org Wed Apr 30 14:57:57 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 30 Apr 2025 14:57:57 GMT Subject: RFR: 8354520: IGV: dump contextual information [v5] In-Reply-To: References: Message-ID: On Wed, 30 Apr 2025 14:42:30 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/idealGraphPrinter.cpp line 967: >> >>> 965: if (!_current_method || !_should_send_method || node == nullptr) return; >>> 966: >>> 967: frame current = fr == nullptr ? os::current_frame() : *fr; >> >> Suggestion: >> >> frame current_frame = fr == nullptr ? os::current_frame() : *fr; >> >> Just to make sure there is no confusion with `_current_method`. >> Optional, up to you. > > Also: what is the implication of having the `frame` object on the stack, rather than a pointer to it? Well, never mind, I see it all over the runtime code base. Seems ok, though I'm a little uneasy about it. 
But not your problem, I guess ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24724#discussion_r2068832679 From epeter at openjdk.org Wed Apr 30 15:11:07 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 30 Apr 2025 15:11:07 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v48] In-Reply-To: References: <1JdbQdQdyLjm1VeDO6ESqsuu15kHSZs87duzD3BlhyE=.900ff431-6fe6-435b-9056-9a5d9301f3c4@github.com> Message-ID: On Tue, 22 Apr 2025 16:38:31 GMT, Quan Anh Mai wrote: >> src/hotspot/share/opto/rangeinference.cpp line 84: >> >>> 82: // Find the minimum value that is not less than lo and satisfies bits. If there >>> 83: // does not exist one such number, the calculation will overflow and return a >>> 84: // value < lo. >> >> I'm wondering if we should say anything more specific for this case. Maybe at least an example? It should probably not go here at the beginning, but somewhere further down. > > What do you mean? It is the doc for `adjust_lo`, of course it needs to be here, I think it is logical to say what the function does before saying how it does it. For that purpose I think it is very clear already and an example is not needed. @merykitty Hmm, I'm a little nervous about the case where `there does not exist one such number`. Because all your proof does is basically assume that there is such an `r`, and then show that we compute it correctly. But such proofs do not give us the guarantee that if there is no such `r`, the computation indeed overflows, i.e. produces a number smaller than `lo`. That would be required for correctness, no? So I guess the proof / examples with overflow should happen further down, I'm just mentioning it up here because it is here that you say there can be an overflow. Hope that makes sense.
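To make the contract in the quoted comment concrete, here is a brute-force 8-bit reference model, assuming that "satisfies bits" means `(v & zeros) == 0 && (v & ones) == ones`. This is only a sketch for checking expectations, not the `adjust_lo` implementation from the PR; `adjust_lo_reference` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical brute-force reference, NOT the algorithm in rangeinference.cpp.
// Returns the smallest v >= lo (unsigned, 8 bits) with
// (v & zeros) == 0 and (v & ones) == ones.
// If no such v exists in [lo, 255], the search wraps around and the
// returned value is < lo, mirroring the "overflow" case in the comment.
// Precondition: zeros and ones do not contradict each other.
uint8_t adjust_lo_reference(uint8_t lo, uint8_t zeros, uint8_t ones) {
  assert((zeros & ones) == 0);
  uint8_t v = lo;
  // Terminates: v == ones always satisfies both constraints,
  // and the wrapping increment visits every 8-bit value.
  while ((v & zeros) != 0 || (v & ones) != ones) {
    v = static_cast<uint8_t>(v + 1);  // wraps from 255 to 0
  }
  return v;
}
```

For `lo = 250` with bit 7 required to be zero, no value in [250, 255] qualifies, so the search wraps around and returns 0, which is smaller than `lo`. Comparing the real computation against a model like this over all small inputs is one possible way to exercise the no-solution case that a proof assuming a solution exists does not cover.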
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2068879473 From dzhang at openjdk.org Wed Apr 30 15:26:43 2025 From: dzhang at openjdk.org (Dingli Zhang) Date: Wed, 30 Apr 2025 15:26:43 GMT Subject: RFR: 8355980: RISC-V: remove vmclr_m before vmsXX and vmfXX In-Reply-To: References: Message-ID: On Wed, 30 Apr 2025 14:46:23 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple improvement? > > According to the RVV spec, > "integer compare instructions write 1 to the destination mask register element if the comparison evaluates to true, and 0 otherwise." > "These vector FP compare instructions compare two source operands and write the comparison result to a mask register." > > So, it's not always necessary to clear the mask register before a vector comparison operation, e.g. when `vm != Assembler::v0_t`. > > Thanks! Looks good, thanks! ------------- Marked as reviewed by dzhang (Author). PR Review: https://git.openjdk.org/jdk/pull/24968#pullrequestreview-2807480879 From epeter at openjdk.org Wed Apr 30 15:31:07 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 30 Apr 2025 15:31:07 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v47] In-Reply-To: References: Message-ID: On Tue, 22 Apr 2025 16:11:27 GMT, Quan Anh Mai wrote: > We want to obtain a value that is larger than lo, has the bit at a certain position set and all bits after that unset. So much I knew already, but I still could not understand the formula. > This is the standard operation for alignment when we know that lo is unaligned. To me it is still not "standard". And you use a similar formula elsewhere, so maybe it could be helpful to explain it in more detail somewhere? - if `first_violation == 0`: `alignment = 100..00 = -alignment`. So if `lo >= alignment` -> `lo & -alignment = 100..000`, and `new_lo = 0`, we have an overflow it seems? I guess that would make sense.
And if `lo < alignment`, the result is rounded up to `100..000`, also good. - if `first_violation > 0`: `alignment = 0..010..0`. So `-alignment = 1..110..0`. Now you could probably continue with arguing about the bits of `lo`, and continue that way in a case distinction. To me this seems less than immediately clear or trivial. Maybe I'm just missing some "standard" math, that is well possible ? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2068924501 From epeter at openjdk.org Wed Apr 30 15:40:08 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 30 Apr 2025 15:40:08 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v50] In-Reply-To: References: Message-ID: On Tue, 22 Apr 2025 17:01:48 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. >> >> In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. >> >> This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. >> >> Please kindly review, thanks a lot. 
>> >> Testing >> >> - [x] GHA >> - [x] Linux x64, tier 1-4 > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > more rigour for ~hi I had a quick look at some of our conversations, and responded in these: https://github.com/openjdk/jdk/pull/17508/files#r2068879473 https://github.com/openjdk/jdk/pull/17508/files#r2068924501 https://github.com/openjdk/jdk/pull/17508/files#r2068935771 ------------- PR Comment: https://git.openjdk.org/jdk/pull/17508#issuecomment-2842405700 From epeter at openjdk.org Wed Apr 30 15:40:08 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 30 Apr 2025 15:40:08 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v7] In-Reply-To: <3kJkhxljNnUZ_b6c3jGsi4YA-JEDnZG79jblC_G47Jc=.bd8bf380-8150-4c2c-b4c9-a1b11884c01a@github.com> References: <-8hODpwHoSEctN2Oo2SrCY1aRpkvZ_kcpnOntQLXgC4=.ecf517f4-9442-4812-81dc-1c177f9d70bf@github.com> <3kJkhxljNnUZ_b6c3jGsi4YA-JEDnZG79jblC_G47Jc=.bd8bf380-8150-4c2c-b4c9-a1b11884c01a@github.com> Message-ID: On Mon, 28 Apr 2025 04:58:42 GMT, Quan Anh Mai wrote: >> @merykitty Oh dear, I dropped it again. Thanks for the reminder! I actually just thought about this one over the easter weekend. And it seems to me we have had lots of "bit optimizations" that could be much more powerfully solved with "known bits". So let's continue working on this. > > @eme64 Ping. Please don't be annoyed as I think I will ping you more frequently in case you forget. I have to say I'm really impressed by all the bit tricks you are using here @merykitty . I'm learning a lot, and I'm very thankful for your patience with me here, and constructing the proofs ? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17508#issuecomment-2842409428 From epeter at openjdk.org Wed Apr 30 15:40:09 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 30 Apr 2025 15:40:09 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v47] In-Reply-To: References: Message-ID: On Tue, 22 Apr 2025 16:34:36 GMT, Quan Anh Mai wrote: >> src/hotspot/share/opto/rangeinference.cpp line 307: >> >>> 305: // violation, which is the last set bit of tmp >>> 306: // 0 1 1 0 0 0 0 0 >>> 307: U tmp = ~either & find_mask; >> >> Did I understand that right: `tmp` is the bits that we cannot flip? Or is it the ones we can flip? A better name would be appreciated :) >> Same for `either`. Worst case, you call it `lo_or_zeros`... but that's not great either. >> >> I'm not fully seeing through the logic here yet, so I struggle to make good suggestions. > > We can say that `tmp` is all the bits we can flip, although I think that is too ambiguous: what does "can" mean here? It is better to think of it as all the bits that are not lower than `first_violation` and are 0 in both `lo` and `zeros`. Hmm ok. Well, it would still be helpful to at least have some kind of intuition, and a "name" for it. Are all these bits the candidates for `alignment`, of which we must pick the rightmost one, so that we get the smallest `alignment` value? And then why is it these bits, and not any others? An argument would be good here.
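For readers following the `alignment` discussion in this thread, the bit manipulation can be sketched in isolation. This is an illustration under stated assumptions, not the patch itself: the helper names are hypothetical, 32-bit unsigned values are used, and `alignment` is assumed to be the lowest set bit of `tmp`.

```cpp
#include <cstdint>

// Lowest set bit of x (0 if x == 0). Assumed here to be how `alignment`
// is picked from `tmp`; hypothetical helper name.
constexpr uint32_t lowest_set_bit(uint32_t x) { return x & (0u - x); }

// For a power-of-2 alignment, (0 - alignment) has every bit set from the
// most significant bit down to the alignment bit, so the AND clears all
// lower bits: this rounds lo down to a multiple of alignment.
constexpr uint32_t round_down(uint32_t lo, uint32_t alignment) {
  return lo & (0u - alignment);
}

// Rounds up, valid only when lo is not already a multiple of alignment.
// The addition wraps modulo 2^32, which models the overflow case.
constexpr uint32_t round_up_unaligned(uint32_t lo, uint32_t alignment) {
  return round_down(lo, alignment) + alignment;
}
```

With `tmp = 0b01100000` the lowest set bit is `0b00100000`. For `lo = 0b10001010` rounding down gives `0b10000000` and rounding up gives `0b10100000`. When `alignment` is the top bit and `lo` is already above it, the sum wraps to 0, a value smaller than `lo`.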
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2068935771 From rriggs at openjdk.org Wed Apr 30 15:52:52 2025 From: rriggs at openjdk.org (Roger Riggs) Date: Wed, 30 Apr 2025 15:52:52 GMT Subject: RFR: 8355223: Improve documentation on @IntrinsicCandidate [v5] In-Reply-To: <4cfCydBqbJcxVuRmEf7W4ehFuz_MMf_zc4k5dxpQoCU=.cc4aaf74-5a6a-4213-857b-6b5f69fb63d1@github.com> References: <4cfCydBqbJcxVuRmEf7W4ehFuz_MMf_zc4k5dxpQoCU=.cc4aaf74-5a6a-4213-857b-6b5f69fb63d1@github.com> Message-ID: On Wed, 23 Apr 2025 14:12:29 GMT, Chen Liang wrote: >> In offline discussion, we noted that the documentation on this annotation does not recommend minimizing the intrinsified section and moving whatever can be done in Java to Java; thus I prepared this documentation update, to shrink a "TLDR" essay to something concise for readers, such as pointing to that list at `vmIntrinsics.hpp` instead of "a list". > > Chen Liang has updated the pull request incrementally with one additional commit since the last revision: > > Update src/java.base/share/classes/jdk/internal/vm/annotation/IntrinsicCandidate.java > > Co-authored-by: Raffaello Giulietti I would also point out that IntrinsicCandidate has some off-label uses, in which the method is not replaced by the runtime but is just recognized as distinguished methods and identified for some optimization case. The most common cases are related to the string concatenation subsystem, for example, StringBuilder.append(String). This might be covered under jrose00's more general comment a week ago. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24777#issuecomment-2842453540 From qamai at openjdk.org Wed Apr 30 15:53:41 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 30 Apr 2025 15:53:41 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v51] In-Reply-To: References: Message-ID: > Hi, > > This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. > > In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. > > This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. > > Please kindly review, thanks a lot. > > Testing > > - [x] GHA > - [x] Linux x64, tier 1-4 Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 66 commits: - formality for the non-existence case - Merge branch 'master' into unsignedbounds - more rigour for ~hi - More for Emanuel - Explain what alignment means - Merge branch 'master' into unsignedbounds - Merge branch 'master' into unsignedbounds - reviews - Merge branch 'master' into unsignedbounds - refine comments - ... 
and 56 more: https://git.openjdk.org/jdk/compare/4c695fa8...547926e3 ------------- Changes: https://git.openjdk.org/jdk/pull/17508/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=50 Stats: 2406 lines in 13 files changed: 1842 ins; 328 del; 236 mod Patch: https://git.openjdk.org/jdk/pull/17508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17508/head:pull/17508 PR: https://git.openjdk.org/jdk/pull/17508 From qamai at openjdk.org Wed Apr 30 15:53:42 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 30 Apr 2025 15:53:42 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v48] In-Reply-To: References: <1JdbQdQdyLjm1VeDO6ESqsuu15kHSZs87duzD3BlhyE=.900ff431-6fe6-435b-9056-9a5d9301f3c4@github.com> Message-ID: <6wRetlYEA7DpdE-aPXlUVbGtnMu2YFFJF01MbqpD198=.3ce0cd6b-5822-4091-95ee-f9f58a622a7b@github.com> On Wed, 30 Apr 2025 15:08:14 GMT, Emanuel Peter wrote: >> What do you mean? It is the doc for `adjust_lo`, of course it needs to be here, I think it is logical to say what the function does before saying how it does it. For that purpose I think it is very clear already and an example is not needed. > > @merykitty > Hmm, I'm a little nervous about the case where `there does not exist one such number`. Because all your proof does is basically assume that there is such a `r`, and then shows that we compute it correctly. > > But such proofs do not give us the guarantee that if there is no such `r`, that the computation indeed overflows, i.e. produces a number smaller than `lo`. That would be required for the correctness, no? > > So I guess the proof / examples with overflow should happen further down, I'm just mentioning it up here because it is here that you say there can be an overflow. Hope that makes sense ? Thanks for the clarification, I have added a section for this case. The fundamental logic here is that if there exists a result `r`, then the algorithm will find it. 
The opposite is also true, if the algorithm finds a satisfying `r`, then of course there exists one. This makes the negation of those statements also equivalent, the algorithm does not find a satisfying result if and only if there is no such value. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2068970743 From duke at openjdk.org Wed Apr 30 15:58:58 2025 From: duke at openjdk.org (duke) Date: Wed, 30 Apr 2025 15:58:58 GMT Subject: Withdrawn: 8349138: Optimize Math.copySign API for Intel e-core targets In-Reply-To: References: Message-ID: <4lXRBgPB36eb9lTEl3wE5nu2BD1mshRb81pXk05g-Iw=.8ebedf5d-130a-4546-bfb5-3f0c903ec77f@github.com> On Fri, 31 Jan 2025 11:22:47 GMT, Jatin Bhateja wrote: > Math.copySign is only intrinsified on x86 targets supporting the AVX512 feature. > Intel E-core Xeons support only the AVX2 feature set and still compile Java implementation which is composed of logical operations. > > Since there is a 3-cycle penalty for copying incoming float/double values to GPRs before being operated upon by logical operation there is an opportunity to optimize this using an efficient instruction sequence. > > Patch uses ANDPS and ANDPD logical instruction to generate efficient instruction sequences to absorb domain copy over penalty. Also, performs minor tuning for existing AVX512 instruction sequence based on VPTERNLOG instruction. 
> > Following are the performance numbers of the following existing microbenchmark > https://github.com/openjdk/jdk/blob/master/test/micro/org/openjdk/bench/vm/compiler/Signum.java > > Patch passes following validation test > [test/jdk/java/lang/Math/IeeeRecommendedTests.java > ](https://github.com/openjdk/jdk/blob/master/test/jdk/java/lang/Math/IeeeRecommendedTests.java) > > > Granite Rapids-AP (P-core Xeon) > Baseline AVX512: > Benchmark Mode Cnt Score Error Units > Signum._5_copySignFloatTest thrpt 2 1296.141 ops/ns > Signum._7_copySignDoubleTest thrpt 2 838.954 ops/ns > > Withopt : > Benchmark Mode Cnt Score Error Units > Signum._5_copySignFloatTest thrpt 2 940.240 ops/ns > Signum._7_copySignDoubleTest thrpt 2 967.370 ops/ns > > Baseline AVX2: > Benchmark Mode Cnt Score Error Units > Signum._5_copySignFloatTest thrpt 2 63.673 ops/ns > Signum._7_copySignDoubleTest thrpt 2 26.898 ops/ns > > Withopt : > Benchmark Mode Cnt Score Error Units > Signum._5_copySignFloatTest thrpt 2 785.801 ops/ns > Signum._7_copySignDoubleTest thrpt 2 558.710 ops/ns > > Sierra Forest (E-core Xeon) > Baseline: > Benchmark (seed) Mode Cnt Score Error Units > o.o.b.vm.compiler.Signum._5_copySignFloatTest N/A thrpt 2 40.528 ops/ns > o.o.b.vm.compiler.Signum._7_copySignDoubleTest N/A thrpt 2 25.101 ops/ns > > Withopt: > Benchmark (seed) Mode Cnt Score Error Units > o.o.b.vm.compiler.Signum._5_copySignFloatTest N/A thrpt 2 676.101 ops/ns > o.o.b.vm.compiler.Signum._7_copySignDoubleTest N/A thrpt 2 ... This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.org/jdk/pull/23386 From qamai at openjdk.org Wed Apr 30 16:00:06 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 30 Apr 2025 16:00:06 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v47] In-Reply-To: References: Message-ID: On Wed, 30 Apr 2025 15:28:18 GMT, Emanuel Peter wrote: >> We want to obtain a value that is larger than `lo`, has the bit at a certain position set and all bits after that unset. It is aligning `lo` up to `alignment`. This is the standard operation for alignment when we know that `lo` is unaligned. > >> We want to obtain a value that is larger than lo, has the bit at a certain position set and all bits after that unset. > > So much I knew already, but could still not understand the formula. > >> This is the standard operation for alignment when we know that lo is unaligned. > > To me it is still not "standard". And you use a similar formula elsewhere, so maybe it could be helpful to explain it in more detail somewhere? > > - if `first_violation == 0`: `alignment = 100..00 = -alignment`. So if `lo >= alignment` -> `lo & -alignment = 100..000`, and `new_lo = 0`, we have an overflow it seems? I guess that would make sense. And if `lo < alignment`, the result is rounded up to `100..000`, also good. > - if `first_violation > 0`: `alignment = 0..010..0`. So `-alignment = 1..110..0`. Now you could probably continue with arguing about the bits of `lo`, and continue that way in a case distinction. > > To me this seems less than immediately clear or trivial. Maybe I'm just missing some "standard" math, that is well possible. Maybe the thing you are missing here is that if a value `v` is a power of 2, then its negation `-v` has all the bits set from the most significant bit down to, and including, the bit that is set in `v`. E.g. `4 = 0b00...0100 -> -4 = 0b11...1100`. So `lo & -alignment` is the act of unsetting all the bits after the set bit in `alignment`. In other words, it rounds `lo` down to a multiple of `alignment`.
So if you want to round up you just add `alignment` since `lo` is known to not be divisible by `alignment`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2068984301 From qamai at openjdk.org Wed Apr 30 16:05:01 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 30 Apr 2025 16:05:01 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v47] In-Reply-To: References: Message-ID: <9soKOmnELjXW5rHyAoX3RJc8-CcqBqXmtzmTSAWhabc=.e421906f-e833-4835-bfb5-79a4f18f76bc@github.com> On Wed, 30 Apr 2025 15:34:07 GMT, Emanuel Peter wrote: >> We can say that `tmp` is all the bits we can flip, although I think that is too ambiguous, what does "can" mean here. It is better to think of it as all the bits that are not lower than `first_violation` and are 0 in both `lo` and `zeros`. > > Hmm ok. Well it would still be helpful to at least have some kind of intuition, and "name" for it. Are all these bits the candidates for `alignment`, of we must pick the one most to the right, i.e. so that we get the smallest `alignment` value? > And then why is it these bits, and not any others? An argument would be good here. Hmmm these bits are the ones that are not lower than `first_violation` and are 0 in both `lo` and `zeros`. This description aligns perfectly with the conclusion in the formality section above. So what we are finding is the last such bit here. I say that in the following computation. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2068991249 From qamai at openjdk.org Wed Apr 30 16:11:45 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 30 Apr 2025 16:11:45 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v52] In-Reply-To: References: Message-ID: > Hi, > > This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. 
This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. > > In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. > > This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. > > Please kindly review, thanks a lot. > > Testing > > - [x] GHA > - [x] Linux x64, tier 1-4 Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: alignment note ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17508/files - new: https://git.openjdk.org/jdk/pull/17508/files/547926e3..0eafb3ab Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=51 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=50-51 Stats: 8 lines in 1 file changed: 7 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17508/head:pull/17508 PR: https://git.openjdk.org/jdk/pull/17508 From qamai at openjdk.org Wed Apr 30 16:11:47 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 30 Apr 2025 16:11:47 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v7] In-Reply-To: References: <-8hODpwHoSEctN2Oo2SrCY1aRpkvZ_kcpnOntQLXgC4=.ecf517f4-9442-4812-81dc-1c177f9d70bf@github.com> <3kJkhxljNnUZ_b6c3jGsi4YA-JEDnZG79jblC_G47Jc=.bd8bf380-8150-4c2c-b4c9-a1b11884c01a@github.com> Message-ID: On Wed, 30 Apr 2025 15:37:20 GMT, Emanuel Peter 
wrote: >> @eme64 Ping. Please don't be annoyed as I think I will ping you more frequently in case you forget. > > I have to say I'm really impressed by all the bit tricks you are using here @merykitty . I'm learning a lot, and I'm very thankful for your patience with me here, and constructing the proofs ? @eme64 I have answered all of your questions, I have added a section for the cases there does not exist such a value and a section explaining the details of rounding up `lo` to a multiple of `alignment`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17508#issuecomment-2842505500 From kvn at openjdk.org Wed Apr 30 16:45:49 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 30 Apr 2025 16:45:49 GMT Subject: RFR: 8355769: Optimize nmethod dependency recording [v2] In-Reply-To: <1uNBgSbbBgdiHXyYL1fkY6HIKaH9CgwF5kN8ABlt3xo=.07c945af-3840-45d8-a1b3-2fe55c52df81@github.com> References: <1uNBgSbbBgdiHXyYL1fkY6HIKaH9CgwF5kN8ABlt3xo=.07c945af-3840-45d8-a1b3-2fe55c52df81@github.com> Message-ID: <8VySQB_bhRDQ7Kx2QfD73s9CSezPAMvHKcyuyiPMEy0=.61ce2b02-2d6c-4e44-90db-85d9bf3bf9b8@github.com> On Tue, 29 Apr 2025 12:47:25 GMT, Aleksey Shipilev wrote: >> During nmethod installation, we record the dependencies between InstanceKlass/CallSite and newly coming `nmethod`. In `DependencyContext::add_dependent_nmethod`, we are linearly scanning to see if the `nmethod` is already in the dependencies list. This costs quite a bit, especially with lots of compiled methods per IK. >> >> This is not a significant issue for normal JIT compilations, where the JIT costs dominate. But for Leyden, this kind of scan is a significant part of AOT code installation. For example in well-trained javac runs, there are chains of 500+ `nmethods` for some IKs that take 10+ us to scan. This is easily half of the entire AOT method installation cost. >> >> Fortunately, the way we do the nmethod dependency recording, it allows us to shortcut the scan. 
Since dependency recording holds the `CodeCache_lock` while adding new `nmethod` all over the various dependency lists, those dependency lists are ever in two states: no `nmethod` in the chain (no need to scan!), or `nmethod` is at the head of the chain (no need to scan!). >> >> Additional testing: >> - [x] Ad-hoc benchmarks >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Fiddle with locks My testing passed. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24933#pullrequestreview-2807742467 From kvn at openjdk.org Wed Apr 30 16:59:45 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 30 Apr 2025 16:59:45 GMT Subject: RFR: 8354284: Add more compiler test folders to tier1 runs In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 08:44:04 GMT, Marc Chevalier wrote: > Some folders in jtreg/compiler have been reported not to be run in any tier, while tier1 was probably intended, but the tier definition was mistakenly not updated. I've checked which folders are not referenced into `TEST.groups`. > > The unmentioned ones: > - `ccp` > - `ciReplay` > - `ciTypeFlow` > - `compilercontrol` > - `debug` > - `oracle` > - `predicates` > - `print` > - `relocations` > - `sharedstubs` > - `splitif` > - `tiered` > - `whitebox` > > And those, that are not test folders: > - `lib` > - `patches` > - `testlibraries` > > I'm adding `ccp`, `ciTypeFlow`, `predicates`, `sharedstubs` and `splitif` to tier1. > > The other folders seems to have been around for very long (since at least mid-2021). It's not clear how meaningful it'd be to add them/what the intent from them was. I've rather focused on the recently(-ish) added folders, that one forgot to put in a tier when adding it. > > Feel free to tell if other folders should be included (and in which tier). 
> > Thanks, > Marc > -XX:RepeatCompilation=300 Based on run's output times this add only 2 sec. I agree to reduce number of iterations (may be 100) but not complete remove it. > replacing the loop by a mere run() I would not do that. First compilation of `run()` will be OSR and we will never run it fully compiled. You need several iterations in `main()` to trigger and use normal compilation. But 100 iterations should be fine. This should put execution time under 1 sec. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24817#issuecomment-2842642683 From kxu at openjdk.org Wed Apr 30 18:04:38 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Wed, 30 Apr 2025 18:04:38 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v15] In-Reply-To: References: Message-ID: > [JDK-8347555](https://bugs.openjdk.org/browse/JDK-8347555) is a redo of [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) was [first merged](https://git.openjdk.org/jdk/pull/20754) then backed out due to a regression. This patch redos the feature and fixes the bit shift overflow problem. For more information please refer to the previous PR. > > When constanlizing multiplications (possibly in forms on `lshifts`), the multiplier is upgraded to long and then later narrowed to int if needed. However, when a `lshift` operand is exactly `32`, overflowing an int, using long has an unexpected result. (i.e., `(1 << 32) = 1` and `(int) (1L << 32) = 0`) > > The following was implemented to address this issue. > > if (UseNewCode2) { > *multiplier = bt == T_INT > ? (jlong) (1 << con->get_int()) // loss of precision is expected for int as it overflows > : ((jlong) 1) << con->get_int(); > } else { > *multiplier = ((jlong) 1 << con->get_int()); > } > > > Two new bitshift overflow tests were added. 
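The two identities quoted above, `(1 << 32) = 1` and `(int) (1L << 32) = 0`, appear to follow Java shift semantics, where the shift count is masked to 5 bits for int and 6 bits for long. A small C++ emulation makes the difference visible. The helper names are hypothetical and this is not the code from the PR; shifting a 32-bit value by 32 directly in C++ would be undefined behavior, hence the explicit masking.

```cpp
#include <cstdint>

// Emulates a Java int left shift: only the low 5 bits of the count are used.
constexpr int32_t java_shl_int(int32_t x, int32_t s) {
  return static_cast<int32_t>(static_cast<uint32_t>(x) << (s & 31));
}

// Emulates a Java long left shift: only the low 6 bits of the count are used.
constexpr int64_t java_shl_long(int64_t x, int32_t s) {
  return static_cast<int64_t>(static_cast<uint64_t>(x) << (s & 63));
}

// Emulates a Java (int) cast of a long: keep the low 32 bits.
constexpr int32_t narrow_to_int(int64_t v) {
  return static_cast<int32_t>(static_cast<uint32_t>(v));
}
```

Under these assumptions `java_shl_int(1, 32)` is 1, because the count wraps to 0, while `narrow_to_int(java_shl_long(1, 32))` is 0, because the single set bit is shifted out of the low 32 bits. This is why computing the multiplier in long and narrowing afterwards can disagree with the int computation.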
Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: remove asserts, add more documentation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23506/files - new: https://git.openjdk.org/jdk/pull/23506/files/e9b42dcc..7cce522f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23506&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23506&range=13-14 Stats: 47 lines in 3 files changed: 20 ins; 12 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/23506.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23506/head:pull/23506 PR: https://git.openjdk.org/jdk/pull/23506 From mli at openjdk.org Wed Apr 30 18:54:44 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 30 Apr 2025 18:54:44 GMT Subject: RFR: 8355980: RISC-V: remove vmclr_m before vmsXX and vmfXX In-Reply-To: References: Message-ID: On Wed, 30 Apr 2025 15:23:55 GMT, Dingli Zhang wrote: > Looks good, thanks! Thank you! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24968#issuecomment-2842979261 From shade at openjdk.org Wed Apr 30 19:08:31 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 30 Apr 2025 19:08:31 GMT Subject: RFR: 8356000: C1/C2-only modes use 2 compiler threads on 1 CPU machine Message-ID: There is an unfortunate limitation with default tiered policy that we would have at least 2 threads on 1 CPU machine: 1 thread for C1, and 1 thread for C2. But if we select C1-only or C2-only modes, we _also_ get 2 compiler threads, for which we have no good reason. These threads would just step on each other toes. 
$ build/linux-x86_64-server-release/images/jdk/bin/java -XX:-TieredCompilation -XX:ActiveProcessorCount=1 -XX:+PrintFlagsFinal 2>&1 | grep CICompilerCount intx CICompilerCount = 2 {product} {ergonomic} bool CICompilerCountPerCPU = true {product} {default} $ build/linux-x86_64-server-release/images/jdk/bin/java -XX:TieredStopAtLevel=1 -XX:ActiveProcessorCount=1 -XX:+PrintFlagsFinal 2>&1 | grep CICompilerCount intx CICompilerCount = 2 {product} {ergonomic} bool CICompilerCountPerCPU = true {product} {default} It is a minor bug in `CompilationPolicy::initialize`, but it gets in the way studying Leyden in tight CPU scenarios. Additional testing: - [x] New regression test passes with the fix, fails without it - [ ] GHA ------------- Commit messages: - Unnecessary arch limitation - Simplify test - Adjust test bound - Fix Changes: https://git.openjdk.org/jdk/pull/24972/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24972&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8356000 Stats: 83 lines in 2 files changed: 78 ins; 1 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/24972.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24972/head:pull/24972 PR: https://git.openjdk.org/jdk/pull/24972 From bulasevich at openjdk.org Wed Apr 30 19:35:23 2025 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Wed, 30 Apr 2025 19:35:23 GMT Subject: RFR: 8355896: Lossy narrowing cast of JVMCINMethodData::size Message-ID: <_NK25ix3znr_ZqssJTXhXQ1-BqHNq_d3toJSGQ9S1mU=.a5279c90-4441-4a70-943e-c9c5f97d9252@github.com> In https://github.com/openjdk/jdk/pull/21276 mutable_data, which includes relocations, metadata, and jvmci_data, was moved to a separately malloc'ed blob. The nmethod (a CodeBlob) holds a pointer to the mutable_data blob and stores its internal offsets. As part of that change, I reused the former uint16_t offset field to store jvmci_data_size. 
This turned out to be incorrect, since jvmci_data can exceed 64 KB (as shown in https://github.com/openjdk/jdk/pull/24753). The most direct fix would be to change jvmci_data_size to uint, placing it alongside other int fields to avoid padding. However, in fact on my build this increases the size of the nmethod structure from 240 to 248 bytes, which I would prefer to avoid. Instead, I propose storing metadata_size in the existing uint16_t field. The average metadata_size is approximately 140 bytes, and the maximum observed in practice is around 4 KB. While, like oops_size, this value is not formally guaranteed to remain below 64 KB, no cases have been observed where this limit is exceeded. A GUARANTEE check is included to immediately catch any overflow if it ever occurs. Testing: in progress. ------------- Commit messages: - 8355896: lossy narrowing cast of JVMCINMethodData::size Changes: https://git.openjdk.org/jdk/pull/24965/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24965&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8355896 Stats: 12 lines in 2 files changed: 4 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/24965.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24965/head:pull/24965 PR: https://git.openjdk.org/jdk/pull/24965 From bulasevich at openjdk.org Wed Apr 30 19:38:58 2025 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Wed, 30 Apr 2025 19:38:58 GMT Subject: RFR: 8343789: Move mutable nmethod data out of CodeCache [v14] In-Reply-To: References: <9mDuowjpORyWudVnSB1FWCW_o1pBgMnAvJus6YGkXLs=.67ba4652-2470-448d-baa2-464e824b2fcb@github.com> Message-ID: On Tue, 29 Apr 2025 21:16:34 GMT, Vladimir Kozlov wrote: >> Yes, my mistake. I was thinking `_jvmci_data_offset` was used to compute `jvmci_data_end()`, not `jvmci_data_begin()`. > > Yes we should use 32 bits. Even if we revert back to using _jvmci_data_offset we can **NOT** use uint16_t because size of relocation (after which JVMCI data is placed) data is bigger. 
Thanks for the report. Yes, cast to uint16 is wrong. I am going to fix the issue here: https://github.com/openjdk/jdk/pull/24965 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21276#discussion_r2069328485 From kvn at openjdk.org Wed Apr 30 19:45:44 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 30 Apr 2025 19:45:44 GMT Subject: RFR: 8356000: C1/C2-only modes use 2 compiler threads on 1 CPU machine In-Reply-To: References: Message-ID: On Wed, 30 Apr 2025 19:00:23 GMT, Aleksey Shipilev wrote: > There is an unfortunate limitation with default tiered policy that we would have at least 2 threads on 1 CPU machine: 1 thread for C1, and 1 thread for C2. > > But if we select C1-only or C2-only modes, we _also_ get 2 compiler threads, for which we have no good reason. These threads would just step on each other toes. > > > $ build/linux-x86_64-server-release/images/jdk/bin/java -XX:-TieredCompilation -XX:ActiveProcessorCount=1 -XX:+PrintFlagsFinal 2>&1 | grep CICompilerCount > intx CICompilerCount = 2 {product} {ergonomic} > bool CICompilerCountPerCPU = true {product} {default} > > $ build/linux-x86_64-server-release/images/jdk/bin/java -XX:TieredStopAtLevel=1 -XX:ActiveProcessorCount=1 -XX:+PrintFlagsFinal 2>&1 | grep CICompilerCount > intx CICompilerCount = 2 {product} {ergonomic} > bool CICompilerCountPerCPU = true {product} {default} > > > It is a minor bug in `CompilationPolicy::initialize`, but it gets in the way studying Leyden in tight CPU scenarios. > > Additional testing: > - [x] New regression test passes with the fix, fails without it > - [ ] GHA src/hotspot/share/compiler/compilationPolicy.cpp line 471: > 469: count = MAX2(max_count, min_count); > 470: } > 471: assert((!c1_only && !c2_only) || count <= active_cpus, "Too many threads: %d", count); Should it be the general rule: don't create more compiler threads than available cpus? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24972#discussion_r2069334176 From kvn at openjdk.org Wed Apr 30 19:45:45 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 30 Apr 2025 19:45:45 GMT Subject: RFR: 8356000: C1/C2-only modes use 2 compiler threads on 1 CPU machine In-Reply-To: References: Message-ID: On Wed, 30 Apr 2025 19:40:51 GMT, Vladimir Kozlov wrote: >> There is an unfortunate limitation with default tiered policy that we would have at least 2 threads on 1 CPU machine: 1 thread for C1, and 1 thread for C2. >> >> But if we select C1-only or C2-only modes, we _also_ get 2 compiler threads, for which we have no good reason. These threads would just step on each other toes. >> >> >> $ build/linux-x86_64-server-release/images/jdk/bin/java -XX:-TieredCompilation -XX:ActiveProcessorCount=1 -XX:+PrintFlagsFinal 2>&1 | grep CICompilerCount >> intx CICompilerCount = 2 {product} {ergonomic} >> bool CICompilerCountPerCPU = true {product} {default} >> >> $ build/linux-x86_64-server-release/images/jdk/bin/java -XX:TieredStopAtLevel=1 -XX:ActiveProcessorCount=1 -XX:+PrintFlagsFinal 2>&1 | grep CICompilerCount >> intx CICompilerCount = 2 {product} {ergonomic} >> bool CICompilerCountPerCPU = true {product} {default} >> >> >> It is a minor bug in `CompilationPolicy::initialize`, but it gets in the way studying Leyden in tight CPU scenarios. >> >> Additional testing: >> - [x] New regression test passes with the fix, fails without it >> - [ ] GHA > > src/hotspot/share/compiler/compilationPolicy.cpp line 471: > >> 469: count = MAX2(max_count, min_count); >> 470: } >> 471: assert((!c1_only && !c2_only) || count <= active_cpus, "Too many threads: %d", count); > > Should it be the general rule: don't create more compiler threads than available cpus? Except when specified on command line with `-XX:CICompilerCount=n`. Actually your changes does not take this flag into account. 
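The rule discussed in this thread (cap the compiler thread count at the number of available CPUs, unless the count was given explicitly on the command line) can be sketched roughly as below. This is only an illustration, not the code in `CompilationPolicy::initialize`: the function `choose_compiler_count` and all of its parameters are hypothetical names, and the minimum of two threads in tiered mode is taken from the PR description (one thread for C1, one for C2).

```cpp
#include <algorithm>

// Hypothetical helper, NOT HotSpot code: picks a compiler thread count.
//   computed    - count produced by the ergonomic heuristic
//   active_cpus - number of CPUs available to the VM (assumed >= 1)
//   single_tier - true in C1-only or C2-only mode
//   user_count  - explicit command-line value, or 0 if none was given
int choose_compiler_count(int computed, int active_cpus,
                          bool single_tier, int user_count) {
  if (user_count > 0) {
    return user_count;  // an explicit setting is respected as-is
  }
  // Tiered mode needs one thread per compiler; a single tier needs only one.
  const int min_count = single_tier ? 1 : 2;
  // Otherwise do not exceed the number of available CPUs...
  const int capped = std::min(computed, active_cpus);
  // ...except where the per-compiler minimum forces it.
  return std::max(capped, min_count);
}
```

Under these assumptions, `choose_compiler_count(2, 1, true, 0)` gives 1 on a one-CPU machine in C1-only or C2-only mode, while the tiered case `choose_compiler_count(2, 1, false, 0)` still gives 2.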
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24972#discussion_r2069336376 From kvn at openjdk.org Wed Apr 30 19:50:44 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 30 Apr 2025 19:50:44 GMT Subject: RFR: 8355896: Lossy narrowing cast of JVMCINMethodData::size In-Reply-To: <_NK25ix3znr_ZqssJTXhXQ1-BqHNq_d3toJSGQ9S1mU=.a5279c90-4441-4a70-943e-c9c5f97d9252@github.com> References: <_NK25ix3znr_ZqssJTXhXQ1-BqHNq_d3toJSGQ9S1mU=.a5279c90-4441-4a70-943e-c9c5f97d9252@github.com> Message-ID: On Wed, 30 Apr 2025 13:10:19 GMT, Boris Ulasevich wrote: > In https://github.com/openjdk/jdk/pull/21276 mutable_data, which includes relocations, metadata, and jvmci_data, was moved to a separately malloc'ed blob. The nmethod (a CodeBlob) holds a pointer to the mutable_data blob and stores its internal offsets. > > As part of that change, I reused the former uint16_t offset field to store jvmci_data_size. This turned out to be incorrect, since jvmci_data can exceed 64 KB (as shown in https://github.com/openjdk/jdk/pull/24753). > > The most direct fix would be to change jvmci_data_size to uint, placing it alongside other int fields to avoid padding. However, in fact on my build this increases the size of the nmethod structure from 240 to 248 bytes, which I would prefer to avoid. > > Instead, I propose storing metadata_size in the existing uint16_t field. The average metadata_size is approximately 140 bytes, and the maximum observed in practice is around 4 KB. While, like oops_size, this value is not formally guaranteed to remain below 64 KB, no cases have been observed where this limit is exceeded. A GUARANTEE check is included to immediately catch any overflow if it ever occurs. > > Testing: in progress. Let me test it. 
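The checked narrowing described in the quoted PR text can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual patch: `guarantee_sketch`, `fits_in_u16` and `narrow_to_u16` are hypothetical names, and `guarantee_sketch` merely stands in for a HotSpot-style guarantee that aborts when the condition does not hold.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <limits>

// Hypothetical stand-in for a guarantee-style check: abort with a message.
inline void guarantee_sketch(bool cond, const char* msg) {
  if (!cond) {
    std::fprintf(stderr, "guarantee failed: %s\n", msg);
    std::abort();
  }
}

// True if the value can be stored in a 16-bit unsigned field without loss.
inline bool fits_in_u16(int value) {
  return value >= 0 &&
         value <= static_cast<int>(std::numeric_limits<std::uint16_t>::max());
}

// Narrows to uint16_t, aborting instead of silently truncating.
inline std::uint16_t narrow_to_u16(int value) {
  guarantee_sketch(fits_in_u16(value), "value does not fit in uint16_t");
  return static_cast<std::uint16_t>(value);
}
```

For comparison, a plain `static_cast<std::uint16_t>(70000)` silently yields 4464 (70000 - 65536), which is the kind of lossy narrowing the issue title refers to; `narrow_to_u16(70000)` would abort instead.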
------------- PR Review: https://git.openjdk.org/jdk/pull/24965#pullrequestreview-2808241843 From dlong at openjdk.org Wed Apr 30 19:55:52 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 30 Apr 2025 19:55:52 GMT Subject: RFR: 8352422: [ubsan] Out-of-range reported in ciMethod.cpp:917:20: runtime error: 2.68435e+09 is outside the range of representable values of type 'int' [v2] In-Reply-To: <1rMdWncORHmyw5LXnDdyCC4-6-9ejGhNj2uLXmKI3So=.316e8847-6c33-42bc-ad79-7c0113335850@github.com> References: <1rMdWncORHmyw5LXnDdyCC4-6-9ejGhNj2uLXmKI3So=.316e8847-6c33-42bc-ad79-7c0113335850@github.com> Message-ID: On Tue, 29 Apr 2025 16:37:02 GMT, Marc Chevalier wrote: >> The double `(double)count * prof_factor * method_life / counter_life + 0.5` >> can overflow a 32-bit int, causing UB on casting, but in practice computing >> a wrong scale, probably. >> >> We just need to check that the cast is not going to overflow. This is possible >> because `INT_MAX` is exactly representable in a `double`. It is also good to >> notice that the expression `(double)count * prof_factor * method_life / counter_life + 0.5` >> cannot overflow a `double`: >> - `count` is an int, max value = 2^31-1 < 2.2e9 >> - `method_life` is an int, max value < 2.2e9 >> - `prof_factor` is a float, max value < 3.5e38 >> - `counter_life` is an int, positive at this point, so min value = 1 >> So, the whole expression is bounded by 16.94e56 + 0.5, which is much smaller than the >> max value of a double (about 1.8e308). We probably would have precision issues, but >> it probably doesn't matter a lot. >> >> The semantics I picked here is basically `min(INT_MAX, count_d)`, so it'd always fit. >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > +comment Oops, I saw the added comment but didn't realize the integration was blocked needing a re-review.
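The `min(INT_MAX, count_d)` semantics from the quoted description of 8352422 can be sketched as a saturating conversion. This is an illustration only, not the code in `ciMethod.cpp`: the helper names `saturating_double_to_int` and `scale_count_sketch` are hypothetical, and the sketch assumes a 32-bit `int` and a non-negative, non-NaN input, as in the quoted analysis.

```cpp
#include <climits>

// Hypothetical helper: converts a non-negative, non-NaN double to int,
// saturating at INT_MAX instead of invoking undefined behavior.
inline int saturating_double_to_int(double d) {
  // With a 32-bit int, INT_MAX is exactly representable as a double,
  // so this comparison is exact.
  if (d >= static_cast<double>(INT_MAX)) {
    return INT_MAX;
  }
  return static_cast<int>(d);  // truncates toward zero, now in range
}

// Sketch of the scaling expression quoted above; parameter names follow
// the PR text, and counter_life is assumed to be positive.
inline int scale_count_sketch(int count, float prof_factor,
                              int method_life, int counter_life) {
  // Bounded by roughly 2.2e9 * 3.5e38 * 2.2e9 / 1 = 16.94e56,
  // far below the maximum double (about 1.8e308).
  double count_d =
      (double)count * prof_factor * method_life / counter_life + 0.5;
  return saturating_double_to_int(count_d);
}
```

With these definitions, the value from the ubsan report, 2.68435e+09, is clamped to `INT_MAX` rather than being cast out of range.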
------------- PR Comment: https://git.openjdk.org/jdk/pull/24824#issuecomment-2843118446 From duke at openjdk.org Wed Apr 30 20:44:25 2025 From: duke at openjdk.org (Yuri Gaevsky) Date: Wed, 30 Apr 2025 20:44:25 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v4] In-Reply-To: References: Message-ID: > The patch adds possibility to use RVV instructions for faster vectorizedHashCode calculations on RVV v1.0.0 capable hardware. > > Testing: hotspot/jtreg/compiler/ under QEMU-8.1 with RVV v1.0.0. Yuri Gaevsky has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: - Merge branch 'master' into JDK-8322174 - Merge master - num_8b_elems_in_vec --> nof_vec_elems - Removed checks for (MaxVectorSize >= 16) per @RealFYang suggestion. - 8322174: RISC-V: C2 VectorizedHashCode RVV Version ------------- Changes: https://git.openjdk.org/jdk/pull/17413/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17413&range=03 Stats: 531 lines in 6 files changed: 529 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/17413.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17413/head:pull/17413 PR: https://git.openjdk.org/jdk/pull/17413 From iveresov at openjdk.org Wed Apr 30 20:56:32 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Wed, 30 Apr 2025 20:56:32 GMT Subject: RFR: 8355003: Implement Ahead-of-Time Method Profiling [v6] In-Reply-To: References: Message-ID: > Improve warm-up time by making profile data from a previous run of an application instantly available, when the HotSpot Java Virtual Machine starts. Specifically, enhance the [AOT cache](https://openjdk.org/jeps/483) to store method execution profiles from training runs, reducing profiling delays in subsequent production runs. 
> > More details in the JEP: https://bugs.openjdk.org/browse/JDK-8325147 Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: Remove the proxy class counter ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24886/files - new: https://git.openjdk.org/jdk/pull/24886/files/4514d032..38a156f3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24886&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24886&range=04-05 Stats: 9 lines in 2 files changed: 0 ins; 9 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24886/head:pull/24886 PR: https://git.openjdk.org/jdk/pull/24886 From iveresov at openjdk.org Wed Apr 30 21:00:58 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Wed, 30 Apr 2025 21:00:58 GMT Subject: RFR: 8355003: Implement Ahead-of-Time Method Profiling [v6] In-Reply-To: <3JsetDO0CDSY3TVxn9pC7YfbNq8-BDZ2UwNo38qJuOc=.e5b30111-294e-45ba-a9aa-cf8d09e26d45@github.com> References: <3JsetDO0CDSY3TVxn9pC7YfbNq8-BDZ2UwNo38qJuOc=.e5b30111-294e-45ba-a9aa-cf8d09e26d45@github.com> Message-ID: On Sat, 26 Apr 2025 22:36:11 GMT, Vladimir Kozlov wrote: >> Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove the proxy class counter > > src/hotspot/share/cds/archiveBuilder.cpp line 770: > >> 768: relocate_embedded_pointers(&_rw_src_objs); >> 769: relocate_embedded_pointers(&_ro_src_objs); >> 770: log_info(cds)("Relocating %zu pointers, %zu tagged, %zu nulled", > > `log_info(aot)` if it is Leyden related. This more like a generic cds message. I'll leave this one as is. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24886#discussion_r2069431184 From iveresov at openjdk.org Wed Apr 30 21:14:16 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Wed, 30 Apr 2025 21:14:16 GMT Subject: RFR: 8355003: Implement Ahead-of-Time Method Profiling [v7] In-Reply-To: References: Message-ID: > Improve warm-up time by making profile data from a previous run of an application instantly available, when the HotSpot Java Virtual Machine starts. Specifically, enhance the [AOT cache](https://openjdk.org/jeps/483) to store method execution profiles from training runs, reducing profiling delays in subsequent production runs. > > More details in the JEP: https://bugs.openjdk.org/browse/JDK-8325147 Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: Fix log tags ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24886/files - new: https://git.openjdk.org/jdk/pull/24886/files/38a156f3..ef5dfcca Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24886&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24886&range=05-06 Stats: 10 lines in 3 files changed: 0 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/24886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24886/head:pull/24886 PR: https://git.openjdk.org/jdk/pull/24886 From liach at openjdk.org Wed Apr 30 22:26:31 2025 From: liach at openjdk.org (Chen Liang) Date: Wed, 30 Apr 2025 22:26:31 GMT Subject: RFR: 8355223: Improve documentation on @IntrinsicCandidate [v5] In-Reply-To: <4cfCydBqbJcxVuRmEf7W4ehFuz_MMf_zc4k5dxpQoCU=.cc4aaf74-5a6a-4213-857b-6b5f69fb63d1@github.com> References: <4cfCydBqbJcxVuRmEf7W4ehFuz_MMf_zc4k5dxpQoCU=.cc4aaf74-5a6a-4213-857b-6b5f69fb63d1@github.com> Message-ID: On Wed, 23 Apr 2025 14:12:29 GMT, Chen Liang wrote: >> In offline discussion, we noted that the documentation on this annotation does not recommend minimizing the intrinsified section and moving 
whatever can be done in Java to Java; thus I prepared this documentation update, to shrink a "TLDR" essay to something concise for readers, such as pointing to that list at `vmIntrinsics.hpp` instead of "a list". > > Chen Liang has updated the pull request incrementally with one additional commit since the last revision: > > Update src/java.base/share/classes/jdk/internal/vm/annotation/IntrinsicCandidate.java > > Co-authored-by: Raffaello Giulietti Thanks for the suggestion. In this aspect, intrinsification is just a topic under this annotation - "intrinsic" is mainly a special identifier for hardcoded logic, but the exact purpose is unspecified, with intrinsification being the most common one. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24777#issuecomment-2843500730 From liach at openjdk.org Wed Apr 30 22:26:30 2025 From: liach at openjdk.org (Chen Liang) Date: Wed, 30 Apr 2025 22:26:30 GMT Subject: RFR: 8355223: Improve documentation on @IntrinsicCandidate [v6] In-Reply-To: References: Message-ID: > In offline discussion, we noted that the documentation on this annotation does not recommend minimizing the intrinsified section and moving whatever can be done in Java to Java; thus I prepared this documentation update, to shrink a "TLDR" essay to something concise for readers, such as pointing to that list at `vmIntrinsics.hpp` instead of "a list". Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains eight additional commits since the last revision: - Move intrinsic to be a subsection; just one most common function of the annotation - Merge branch 'master' of https://github.com/openjdk/jdk into doc/intrinsic-candidate - Merge branch 'master' of https://github.com/openjdk/jdk into doc/intrinsic-candidate - Update src/java.base/share/classes/jdk/internal/vm/annotation/IntrinsicCandidate.java Co-authored-by: Raffaello Giulietti - Shorter first sentence - Updates, thanks to John - Refine validation and defensive copying - 8355223: Improve documentation on @IntrinsicCandidate ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24777/files - new: https://git.openjdk.org/jdk/pull/24777/files/24ed1cc1..317dd27a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24777&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24777&range=04-05 Stats: 51812 lines in 1375 files changed: 39564 ins; 7638 del; 4610 mod Patch: https://git.openjdk.org/jdk/pull/24777.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24777/head:pull/24777 PR: https://git.openjdk.org/jdk/pull/24777 From dhanalla at openjdk.org Wed Apr 30 22:40:51 2025 From: dhanalla at openjdk.org (Dhamoder Nalla) Date: Wed, 30 Apr 2025 22:40:51 GMT Subject: RFR: 8315916: assert(C->live_nodes() <= C->max_node_limit()) failed: Live Node limit exceeded [v7] In-Reply-To: References: <5JpWWMlRP-o60KZI9bU5bMq-dJePHvnKdUgigCfwbfo=.c5545951-3f36-43da-b082-79a3a00ac6c0@github.com> Message-ID: On Wed, 23 Apr 2025 17:01:33 GMT, Dhamoder Nalla wrote: >>> @dhanalla I see that you have had a conversation with @chhagedorn here, where you explained more details about what exactly goes wrong. Can you please update the PR description with these details? Generally, that makes it much easier to review, then the reviewers don't need to read through the whole conversation and figure out what is now stale (things you already applied) and what is still an active conversation. 
While you are at it, you can also update the description on JIRA. >> >> You're right that in the current implementation, we begin the scalarization process and only bail out once the live node count has already exceeded the limit. At that point, the graph is indeed partially transformed, which is why we fall back to recompilation without EA to ensure a safe and consistent compilation state. >> Accurately predicting the number of nodes before transformation is difficult due to the variety of types and structures involved: each element can lead to multiple nodes (e.g., phi nodes, loads/stores, etc.), and the graph can grow non-linearly depending on how the array is used. >> However, I agree that giving up entirely on EA just because of one large array seems like an overly conservative fallback, especially if the rest of the method would still benefit from EA. > >> > @dhanalla I see that you have had a conversation with @chhagedorn here, where you explained more details about what exactly goes wrong. Can you please update the PR description with these details? Generally, that makes it much easier to review, then the reviewers don't need to read through the whole conversation and figure out what is now stale (things you already applied) and what is still an active conversation. While you are at it, you can also update the description on JIRA. >> >> You're right that in the current implementation, we begin the scalarization process and only bail out once the live node count has already exceeded the limit. At that point, the graph is indeed partially transformed, which is why we fall back to recompilation without EA to ensure a safe and consistent compilation state. Accurately predicting the number of nodes before transformation is difficult due to the variety of types and structures involved: each element can lead to multiple nodes (e.g., phi nodes, loads/stores, etc.), and the graph can grow non-linearly depending on how the array is used.
However, I agree that giving up entirely on EA just because of one large array seems like an overly conservative fallback, especially if the rest of the method would still benefit from EA. > > @eme64 If this answers your question, this PR is ready for review > @dhanalla I see. @chhagedorn and I quickly looked through the code, and it seems there are other bailouts that use the FudgeFactor. > > It also seems that you need an unreasonably high `EliminateAllocationArraySizeLimit`, and so this failure should never actually happen normally, right? Or is it possible to reproduce the same bug with a lower `EliminateAllocationArraySizeLimit` but just more allocations? If so, it would be good if you added such test cases. > > Is it possible to exceed the node limit with the default `EliminateAllocationArraySizeLimit`, i.e. so that we would hit the assert before your changes, and bailout after your changes? > > I have two worries, and maybe @vnkozlov can say something here: > > * By the time we check the condition and bail out, we may have allocated a lot of nodes, and possibly be far over the node limit. That means we already used a lot of memory and time. How bad can this get? > * And as discussed above: we could have done EA partially, until getting close to the node limit, and then not do allocation elimination on the remaining allocations. That would be a partial benefit, which we do not have if we recompile without EA. @eme64 Yes, that's generally accurate. Under typical usage conditions and with the default EliminateAllocationArraySizeLimit value of 64, this assertion failure is not expected to occur. It appears that the bug is challenging to reproduce with the default values.
------------- PR Comment: https://git.openjdk.org/jdk/pull/20504#issuecomment-2843562602 From vlivanov at openjdk.org Wed Apr 30 23:04:52 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 30 Apr 2025 23:04:52 GMT Subject: RFR: 8350209: Preserve adapters in AOT cache [v7] In-Reply-To: References: <2q_jgLrJim5ntOfr3awHdl1HTgrWtcTdcHWsO_CfnHU=.7958f1ef-9792-416a-9474-33f776e01fb5@github.com> Message-ID: On Wed, 30 Apr 2025 22:55:38 GMT, Vladimir Ivanov wrote: >> I added `log_info()` to `exit_vm_on_*_failure()` methods to produce notification when AbortVMOnAOTCodeFailure flag is off (default value). > > The naming (`exit_vm_on_load_failure` and `exit_vm_on_store_failure`) still look confusing to me. By default, they disable `AOTAdapterCaching` and issue a message, but the name strongly suggests that execution halts there: > > if (!open_cache(is_dumping, is_using)) { > if (is_using) { > exit_vm_on_load_failure(); > } else { > exit_vm_on_store_failure(); > } > return; > } > Setting AOTAdapterCaching to false on failure is simple indication that adapter caching is switched off for someone who will look on final state of flag. But how does it affect execution? What's the intended behavior when a failure happens during a store attempt? What are the consistency guarantees for the AOT code cache during dumping in the presence of store failures? What I see right now is that errors reported by `AOTCodeCache::store_code_blob()` are silently ignored. How does `_cache` notice a store failure during the dumping phase?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24740#discussion_r2069625888 From iveresov at openjdk.org Wed Apr 30 22:58:09 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Wed, 30 Apr 2025 22:58:09 GMT Subject: RFR: 8355003: Implement Ahead-of-Time Method Profiling [v8] In-Reply-To: References: Message-ID: > Improve warm-up time by making profile data from a previous run of an application instantly available, when the HotSpot Java Virtual Machine starts. Specifically, enhance the [AOT cache](https://openjdk.org/jeps/483) to store method execution profiles from training runs, reducing profiling delays in subsequent production runs. > > More details in the JEP: https://bugs.openjdk.org/browse/JDK-8325147 Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: Fix flag behavior ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24886/files - new: https://git.openjdk.org/jdk/pull/24886/files/ef5dfcca..b937681e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24886&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24886&range=06-07 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24886/head:pull/24886 PR: https://git.openjdk.org/jdk/pull/24886 From iveresov at openjdk.org Wed Apr 30 23:00:50 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Wed, 30 Apr 2025 23:00:50 GMT Subject: RFR: 8355003: Implement Ahead-of-Time Method Profiling [v4] In-Reply-To: References: Message-ID: On Mon, 28 Apr 2025 17:35:13 GMT, Vladimir Kozlov wrote: >> Igor Veresov has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 32 commits: >> >> - Merge branch 'master' into pp2 >> - Fix class filtering >> - Remove the workaround of setting AOTRecordTraining during assembly >> - Address some of the review comments >> - Merge branch 'master' into pp >> - Add AOTCompileEagerly flag to control compilation after clinit >> - Port 8355334: [leyden] Missing type profile info in archived training data >> - Port 8355296: [leyden] Some methods are stuck at level=0 with -XX:-TieredCompilation >> - Use ENABLE_IF macro >> - Missing part of the last commit >> - ... and 22 more: https://git.openjdk.org/jdk/compare/2447b981...7fb7ae62 > > Looks better. > There are still places where UL is used specifically for TD processing. Consider using `(aot, training)` there instead of `(cds)`. Vladimir (@vnkozlov), I made the changes that you requested. Could you please do another pass? Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24886#issuecomment-2843621301 From vlivanov at openjdk.org Wed Apr 30 23:04:51 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 30 Apr 2025 23:04:51 GMT Subject: RFR: 8350209: Preserve adapters in AOT cache [v12] In-Reply-To: References: Message-ID: On Wed, 30 Apr 2025 02:05:41 GMT, Vladimir Kozlov wrote: >> [JEP 483](https://bugs.openjdk.org/browse/JDK-8315737) preserves class information in AOT cache which helps Java startup performance. >> >> We should also preserve adapters (i2c, c2i) to further improve performance of class linking where adapters are generated. >> >> Short-running Java applications can see an improvement of several percent.
I got 6% improvement when ran `HelloWorld.java` on Linux-x64 Ice Lake CPU (2.5Ghz): >> >> >> (perf stat -r 100 java -XX:AOTCache=app.aotcache -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed >> 0.0299401 +- 0.0000504 seconds time elapsed ( +- 0.17% ) >> >> (perf stat -r 100 java -XX:AOTCache=app.aotcache -XX:+UnlockDiagnosticVMOptions -XX:-AOTAdapterCaching -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed >> 0.0318654 +- 0.0000535 seconds time elapsed ( +- 0.17% ) >> >> >> New diagnostic flags are introduced (use `-XX:+UnlockDiagnosticVMOptions` to unlock them): >> >> >> -XX:+AOTAdapterCaching - Enable or disable saving and restoring i2c2i adapters >> -XX:AOTCodeMaxSize=10*M - buffer size in bytes for AOT code caching >> -XX:+AbortVMOnAOTCodeFailure - Abort VM on the first occurrence of AOT code caching failure >> >> By default `AOTAdapterCaching` is `false` and enabled ergonomically when `-XX:AOTCache` is specified. >> This flag is ignored when `AOTCache` is not specified. >> >> To use AOT adapters follow process described in JEP 483: >> >> >> java -XX:AOTMode=record -XX:AOTConfiguration=app.aotconf -cp app.jar App >> java -XX:AOTMode=create -XX:AOTConfiguration=app.aotconf -XX:AOTCache=app.aot -cp app.jar >> java -XX:AOTCache=app.aot -cp app.jar App >> >> >> There are several new UL flag combinations to trace the AOT code caching process: >> >> >> -Xlog:aot+codecache+init -Xlog:aot+codecache+exit -Xlog:aot+codecache+stubs >> >> >> @ashu-mehra is main author of changes. He implemented adapters caching. >> I did main framework (`AOTCodeCache` class) for saving and loading AOT code. >> >> Tested tier1-6,10, which includes tests with `AOTClassLinking` enabled. Also Xcomp,stress and JCK. 
> > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > address Ioi's comments src/hotspot/share/code/aotCodeCache.cpp line 60: > 58: vm_exit_during_initialization("Unable to use AOT Code Cache.", nullptr); > 59: } > 60: log_info(aot, codecache, init)("Unable to use AOT Code Cache."); Should it be a warning instead? src/hotspot/share/code/aotCodeCache.cpp line 69: > 67: vm_abort(false); > 68: } > 69: log_info(aot, codecache, exit)("Unable to create AOT Code Cache."); Same here (`log_warning`?). src/hotspot/share/code/aotCodeCache.hpp line 31: > 29: * AOT Code Cache collects code from Code Cache and corresponding metadata > 30: * during application training run. > 31: * In following "production" runs this code and data can me loaded into s/me/be/ src/hotspot/share/code/codeBlob.hpp line 208: > 206: CodeBlob* as_codeblob() const { return (CodeBlob*) this; } > 207: AdapterBlob* as_adapter_blob() const { assert(is_adapter_blob(), "must be adapter blob"); return (AdapterBlob*) this; } > 208: ExceptionBlob* as_exception_blob() const { assert(is_exception_stub(), "must be exception stub"); return (ExceptionBlob*) this; } `ExceptionBlob` is C2-specific, but `as_exception_blob()` is unused. src/hotspot/share/code/relocInfo.hpp line 1292: > 1290: void pack_data_to(CodeSection * dest) override; > 1291: void unpack_data() override; > 1292: #if defined(AARCH64) It's unfortunate to see AArch64-specific code in shared code. But I don't see anything besides`pd_destination()` and `pd_set_destination()` declarations. Where're their bodies? 
src/hotspot/share/runtime/sharedRuntime.cpp line 2780: > 2778: > 2779: #ifndef PRODUCT > 2780: void AdapterHandlerLibrary::print_adapter_handler_info(AdapterHandlerEntry* handler, AdapterBlob* adapter_blob) { Suggestion: pass `tty` explicitly as `outputStream*` void AdapterHandlerLibrary::print_adapter_handler_info_on(outputStream* st, AdapterHandlerEntry* handler, AdapterBlob* adapter_blob) { src/hotspot/share/runtime/sharedRuntime.cpp line 2852: > 2850: entry_offset[2] = handler->get_c2i_unverified_entry() - i2c_entry; > 2851: entry_offset[3] = handler->get_c2i_no_clinit_check_entry() - i2c_entry; > 2852: AOTCodeCache::store_code_blob(*adapter_blob, AOTCodeEntry::Adapter, id, name, AdapterHandlerEntry::ENTRIES_COUNT, entry_offset); What is the intended behavior here when `AOTCodeCache::store_code_blob` fails? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24740#discussion_r2069610692 PR Review Comment: https://git.openjdk.org/jdk/pull/24740#discussion_r2069611080 PR Review Comment: https://git.openjdk.org/jdk/pull/24740#discussion_r2069563790 PR Review Comment: https://git.openjdk.org/jdk/pull/24740#discussion_r2069585108 PR Review Comment: https://git.openjdk.org/jdk/pull/24740#discussion_r2069558227 PR Review Comment: https://git.openjdk.org/jdk/pull/24740#discussion_r2069547808 PR Review Comment: https://git.openjdk.org/jdk/pull/24740#discussion_r2069619002 From vlivanov at openjdk.org Wed Apr 30 23:04:51 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 30 Apr 2025 23:04:51 GMT Subject: RFR: 8350209: Preserve adapters in AOT cache [v7] In-Reply-To: <2q_jgLrJim5ntOfr3awHdl1HTgrWtcTdcHWsO_CfnHU=.7958f1ef-9792-416a-9474-33f776e01fb5@github.com> References: <2q_jgLrJim5ntOfr3awHdl1HTgrWtcTdcHWsO_CfnHU=.7958f1ef-9792-416a-9474-33f776e01fb5@github.com> Message-ID: On Fri, 25 Apr 2025 00:52:04 GMT, Vladimir Kozlov wrote: >> AOT adapters code caching and loading is guarded by these methods not by flag.
>> >> Setting AOTAdapterCaching to false on failure is simple indication that adapter caching is switched off for someone who will look on final state of flag. > > I added `log_info()` to `exit_vm_on_*_failure()` methods to produce notification when AbortVMOnAOTCodeFailure flag is off (default value). The naming (`exit_vm_on_load_failure` and `exit_vm_on_store_failure`) still look confusing to me. By default, they disable `AOTAdapterCaching` and issue a message, but the name strongly suggests that execution halts there: if (!open_cache(is_dumping, is_using)) { if (is_using) { exit_vm_on_load_failure(); } else { exit_vm_on_store_failure(); } return; } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24740#discussion_r2069613390 From kvn at openjdk.org Wed Apr 30 23:40:47 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 30 Apr 2025 23:40:47 GMT Subject: RFR: 8355003: Implement Ahead-of-Time Method Profiling [v8] In-Reply-To: References: Message-ID: On Wed, 30 Apr 2025 22:58:09 GMT, Igor Veresov wrote: >> Improve warm-up time by making profile data from a previous run of an application instantly available, when the HotSpot Java Virtual Machine starts. Specifically, enhance the [AOT cache](https://openjdk.org/jeps/483) to store method execution profiles from training runs, reducing profiling delays in subsequent production runs. >> >> More details in the JEP: https://bugs.openjdk.org/browse/JDK-8325147 > > Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: > > Fix flag behavior Looks good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24886#pullrequestreview-2808829130 From kvn at openjdk.org Wed Apr 30 23:58:48 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 30 Apr 2025 23:58:48 GMT Subject: RFR: 8350209: Preserve adapters in AOT cache [v12] In-Reply-To: References: Message-ID: <4sPSeelY_aPOk87j8XiobBz2MjWX1nzBeHWlhgFHnDs=.c498d25f-acbf-493f-9eaa-0fa2ea43433f@github.com> On Wed, 30 Apr 2025 22:22:18 GMT, Vladimir Ivanov wrote: >> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: >> >> address Ioi's comments > > src/hotspot/share/code/relocInfo.hpp line 1292: > >> 1290: void pack_data_to(CodeSection * dest) override; >> 1291: void unpack_data() override; >> 1292: #if defined(AARCH64) > > It's unfortunate to see AArch64-specific code in shared code. > > But I don't see anything besides`pd_destination()` and `pd_set_destination()` declarations. Where're their bodies? Good catch. I copied it from leyden/premain but it is used only for trampoline jump relocation for calls in nmethods on Aarch64. They are not present in adapters. I remove this code. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24740#discussion_r2069677599 From kvn at openjdk.org Wed Apr 30 23:55:48 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 30 Apr 2025 23:55:48 GMT Subject: RFR: 8350209: Preserve adapters in AOT cache [v12] In-Reply-To: References: Message-ID: On Wed, 30 Apr 2025 22:09:24 GMT, Vladimir Ivanov wrote: >> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: >> >> address Ioi's comments > > src/hotspot/share/runtime/sharedRuntime.cpp line 2780: > >> 2778: >> 2779: #ifndef PRODUCT >> 2780: void AdapterHandlerLibrary::print_adapter_handler_info(AdapterHandlerEntry* handler, AdapterBlob* adapter_blob) { > > Suggestion: pass `tty` explicitly as `outputStream*` > > > void AdapterHandlerLibrary::print_adapter_handler_info_on(outputStream* st, AdapterHandlerEntry* handler, AdapterBlob* adapter_blob) { Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24740#discussion_r2069675872