From kvn at openjdk.org Sun Jun 1 00:11:55 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sun, 1 Jun 2025 00:11:55 GMT Subject: RFR: 8357175: Failure to generate or load AOT code should be handled gracefully In-Reply-To: References: Message-ID: <_dcowgIjM5R9m4Ye0BNclWFRkNW_GisoCsrSAW4b0rI=.fced02a6-16ac-4a56-bd45-3e3b6e764bb5@github.com> On Sat, 31 May 2025 19:16:17 GMT, Ashutosh Mehra wrote: >> By default a failed AOT code should be discarded with UL message about it by request (`-Xlog:aot+codecache+*=debug`) and VM and AOT code processing should continue run. >> >> Unless we hit some catastrophic failure: OOM for example. This is similar how JIT compilers behave. >> >> I reordered VM configuration settings checking (`Config::verify()`) so that we switch off AOT code caching type which depends on these VM settings. For example, AOT adapters do not operate on oops - they are not affected by compressed oops settings/encoding. I removed `_objectAlignment` check because CDS already does this check when open archive. >> >> The AOT relocation processing for a blob will skip this blob when corresponding address is not found instead of bailing out VM in product mode. In debug VM it will issue assert so we know about missing address. These changes are in `AOTCodeAddressTable::id_for_address()` >> >> I kept `fatal()` in `AOTCodeAddressTable::for_address_for_id()` for incorrect ID we read from archive. The archive could be corrupted if ID is wrong. >> >> I did small code cleanup/renaming. >> >> Tested: tier1-10 > > src/hotspot/share/code/aotCodeCache.cpp line 434: > >> 432: >> 433: if (((_flags & enableContendedPadding) != 0) != EnableContended) { >> 434: log_debug(aot, codecache, init)("AOT Code Cache disabled: it was created with EnableContended = %s", EnableContended ? "false" : "true"); > > This check says code cache is disabled, but we still return true. Same with other checks following this. Is that intentional? 
The rest of checks are for nmethods, UseCodeCaching. May be I should remove it to avoid confusion. > src/hotspot/share/code/aotCodeCache.cpp line 1011: > >> 1009: } >> 1010: case relocInfo::runtime_call_w_cp_type: >> 1011: log_debug(aot, codecache, reloc)("runtime_call_w_cp_type relocation is not unimplemented"); > > typo: "relocation is not unimplemented" -> "relocation is unimplemented" fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25525#discussion_r2118436203 PR Review Comment: https://git.openjdk.org/jdk/pull/25525#discussion_r2118436486 From kvn at openjdk.org Sun Jun 1 00:22:58 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sun, 1 Jun 2025 00:22:58 GMT Subject: RFR: 8357175: Failure to generate or load AOT code should be handled gracefully In-Reply-To: References: Message-ID: On Sat, 31 May 2025 19:50:04 GMT, Ashutosh Mehra wrote: >> By default a failed AOT code should be discarded with UL message about it by request (`-Xlog:aot+codecache+*=debug`) and VM and AOT code processing should continue run. >> >> Unless we hit some catastrophic failure: OOM for example. This is similar how JIT compilers behave. >> >> I reordered VM configuration settings checking (`Config::verify()`) so that we switch off AOT code caching type which depends on these VM settings. For example, AOT adapters do not operate on oops - they are not affected by compressed oops settings/encoding. I removed `_objectAlignment` check because CDS already does this check when open archive. >> >> The AOT relocation processing for a blob will skip this blob when corresponding address is not found instead of bailing out VM in product mode. In debug VM it will issue assert so we know about missing address. These changes are in `AOTCodeAddressTable::id_for_address()` >> >> I kept `fatal()` in `AOTCodeAddressTable::for_address_for_id()` for incorrect ID we read from archive. The archive could be corrupted if ID is wrong. >> >> I did small code cleanup/renaming. 
>> 
>> Tested: tier1-10
> 
> src/hotspot/share/code/aotCodeCache.cpp line 985:
> 
>> 983: // ------------ process code and data --------------
>> 984: 
>> 985: #define BAD_ADDRESS_ID -2
> 
> Can you please add a comment to indicate why -1 is not used.
> From the comment in `id_for_address`, I guess it is because -1 is a valid id for representing jump to itself in static call stub. Is that correct?
> 
>     int id = -1;
>     if (addr == (address)-1) { // Static call stub has jump to itself
>       return id;
>     }

Yes, it is correct. I will add the comment:

    // Can't use -1. It is valid value for jump to itself destination
    // used by static call stub: see NativeJump::jump_destination().

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25525#discussion_r2118454632

From kvn at openjdk.org Sun Jun 1 00:28:15 2025
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Sun, 1 Jun 2025 00:28:15 GMT
Subject: RFR: 8357175: Failure to generate or load AOT code should be handled gracefully
In-Reply-To:
References:
Message-ID:

On Thu, 29 May 2025 18:45:11 GMT, Vladimir Kozlov wrote:

> By default a failed AOT code should be discarded with UL message about it by request (`-Xlog:aot+codecache+*=debug`) and VM and AOT code processing should continue run.
> 
> Unless we hit some catastrophic failure: OOM for example. This is similar how JIT compilers behave.
> 
> I reordered VM configuration settings checking (`Config::verify()`) so that we switch off AOT code caching type which depends on these VM settings. For example, AOT adapters do not operate on oops - they are not affected by compressed oops settings/encoding. I removed `_objectAlignment` check because CDS already does this check when open archive.
> 
> The AOT relocation processing for a blob will skip this blob when corresponding address is not found instead of bailing out VM in product mode. In debug VM it will issue assert so we know about missing address.
These changes are in `AOTCodeAddressTable::id_for_address()` > > I kept `fatal()` in `AOTCodeAddressTable::for_address_for_id()` for incorrect ID we read from archive. The archive could be corrupted if ID is wrong. > > I did small code cleanup/renaming. > > Tested: tier1-10 Thank you, @ashu-mehra. I addressed your comments. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25525#issuecomment-2926099925 From kvn at openjdk.org Sun Jun 1 00:28:15 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sun, 1 Jun 2025 00:28:15 GMT Subject: RFR: 8357175: Failure to generate or load AOT code should be handled gracefully [v2] In-Reply-To: References: Message-ID: > By default a failed AOT code should be discarded with UL message about it by request (`-Xlog:aot+codecache+*=debug`) and VM and AOT code processing should continue run. > > Unless we hit some catastrophic failure: OOM for example. This is similar how JIT compilers behave. > > I reordered VM configuration settings checking (`Config::verify()`) so that we switch off AOT code caching type which depends on these VM settings. For example, AOT adapters do not operate on oops - they are not affected by compressed oops settings/encoding. I removed `_objectAlignment` check because CDS already does this check when open archive. > > The AOT relocation processing for a blob will skip this blob when corresponding address is not found instead of bailing out VM in product mode. In debug VM it will issue assert so we know about missing address. These changes are in `AOTCodeAddressTable::id_for_address()` > > I kept `fatal()` in `AOTCodeAddressTable::for_address_for_id()` for incorrect ID we read from archive. The archive could be corrupted if ID is wrong. > > I did small code cleanup/renaming. 
> > Tested: tier1-10 Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: address comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25525/files - new: https://git.openjdk.org/jdk/pull/25525/files/3399e5f9..497c141d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25525&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25525&range=00-01 Stats: 22 lines in 1 file changed: 2 ins; 18 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25525.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25525/head:pull/25525 PR: https://git.openjdk.org/jdk/pull/25525 From kvn at openjdk.org Sun Jun 1 00:29:50 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sun, 1 Jun 2025 00:29:50 GMT Subject: RFR: 8358230: Incorrect location for the assert for blob != nullptr in CodeBlob::create In-Reply-To: References: Message-ID: On Sat, 31 May 2025 20:45:56 GMT, Ashutosh Mehra wrote: > A trivial fix to moves the assert for `blob != nullptr` before any usage of the the `blob` Yes, it is trivial. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25566#pullrequestreview-2884824886 From asmehra at openjdk.org Sun Jun 1 01:05:09 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Sun, 1 Jun 2025 01:05:09 GMT Subject: RFR: 8357175: Failure to generate or load AOT code should be handled gracefully [v2] In-Reply-To: References: Message-ID: On Sun, 1 Jun 2025 00:28:15 GMT, Vladimir Kozlov wrote: >> By default a failed AOT code should be discarded with UL message about it by request (`-Xlog:aot+codecache+*=debug`) and VM and AOT code processing should continue run. >> >> Unless we hit some catastrophic failure: OOM for example. This is similar how JIT compilers behave. >> >> I reordered VM configuration settings checking (`Config::verify()`) so that we switch off AOT code caching type which depends on these VM settings. 
For example, AOT adapters do not operate on oops - they are not affected by compressed oops settings/encoding. I removed `_objectAlignment` check because CDS already does this check when open archive. >> >> The AOT relocation processing for a blob will skip this blob when corresponding address is not found instead of bailing out VM in product mode. In debug VM it will issue assert so we know about missing address. These changes are in `AOTCodeAddressTable::id_for_address()` >> >> I kept `fatal()` in `AOTCodeAddressTable::for_address_for_id()` for incorrect ID we read from archive. The archive could be corrupted if ID is wrong. >> >> I did small code cleanup/renaming. >> >> Tested: tier1-10 > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > address comments Marked as reviewed by asmehra (Committer). Thanks for addressing the comments. Looks good. ------------- PR Review: https://git.openjdk.org/jdk/pull/25525#pullrequestreview-2884902972 PR Comment: https://git.openjdk.org/jdk/pull/25525#issuecomment-2926206541 From asmehra at openjdk.org Sun Jun 1 01:08:01 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Sun, 1 Jun 2025 01:08:01 GMT Subject: Integrated: 8358230: Incorrect location for the assert for blob != nullptr in CodeBlob::create In-Reply-To: References: Message-ID: On Sat, 31 May 2025 20:45:56 GMT, Ashutosh Mehra wrote: > A trivial fix to moves the assert for `blob != nullptr` before any usage of the the `blob` This pull request has now been integrated. 
Changeset: 59dc8499 Author: Ashutosh Mehra URL: https://git.openjdk.org/jdk/commit/59dc849909c1edc892c94a27b0340fcf53db3a98 Stats: 3 lines in 1 file changed: 2 ins; 1 del; 0 mod 8358230: Incorrect location for the assert for blob != nullptr in CodeBlob::create Reviewed-by: kvn ------------- PR: https://git.openjdk.org/jdk/pull/25566 From iveresov at openjdk.org Sun Jun 1 03:03:51 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Sun, 1 Jun 2025 03:03:51 GMT Subject: RFR: 8357175: Failure to generate or load AOT code should be handled gracefully [v2] In-Reply-To: References: Message-ID: On Sun, 1 Jun 2025 00:28:15 GMT, Vladimir Kozlov wrote: >> By default a failed AOT code should be discarded with UL message about it by request (`-Xlog:aot+codecache+*=debug`) and VM and AOT code processing should continue run. >> >> Unless we hit some catastrophic failure: OOM for example. This is similar how JIT compilers behave. >> >> I reordered VM configuration settings checking (`Config::verify()`) so that we switch off AOT code caching type which depends on these VM settings. For example, AOT adapters do not operate on oops - they are not affected by compressed oops settings/encoding. I removed `_objectAlignment` check because CDS already does this check when open archive. >> >> The AOT relocation processing for a blob will skip this blob when corresponding address is not found instead of bailing out VM in product mode. In debug VM it will issue assert so we know about missing address. These changes are in `AOTCodeAddressTable::id_for_address()` >> >> I kept `fatal()` in `AOTCodeAddressTable::for_address_for_id()` for incorrect ID we read from archive. The archive could be corrupted if ID is wrong. >> >> I did small code cleanup/renaming. >> >> Tested: tier1-10 > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > address comments Marked as reviewed by iveresov (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/25525#pullrequestreview-2884989001 From kvn at openjdk.org Sun Jun 1 03:59:55 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sun, 1 Jun 2025 03:59:55 GMT Subject: Integrated: 8357175: Failure to generate or load AOT code should be handled gracefully In-Reply-To: References: Message-ID: On Thu, 29 May 2025 18:45:11 GMT, Vladimir Kozlov wrote: > By default a failed AOT code should be discarded with UL message about it by request (`-Xlog:aot+codecache+*=debug`) and VM and AOT code processing should continue run. > > Unless we hit some catastrophic failure: OOM for example. This is similar how JIT compilers behave. > > I reordered VM configuration settings checking (`Config::verify()`) so that we switch off AOT code caching type which depends on these VM settings. For example, AOT adapters do not operate on oops - they are not affected by compressed oops settings/encoding. I removed `_objectAlignment` check because CDS already does this check when open archive. > > The AOT relocation processing for a blob will skip this blob when corresponding address is not found instead of bailing out VM in product mode. In debug VM it will issue assert so we know about missing address. These changes are in `AOTCodeAddressTable::id_for_address()` > > I kept `fatal()` in `AOTCodeAddressTable::for_address_for_id()` for incorrect ID we read from archive. The archive could be corrupted if ID is wrong. > > I did small code cleanup/renaming. > > Tested: tier1-10 This pull request has now been integrated. 
Changeset: e3eb089d
Author: Vladimir Kozlov
URL: https://git.openjdk.org/jdk/commit/e3eb089d47d62ae6feeba3dc6b3752a025e27bed
Stats: 130 lines in 2 files changed: 41 ins; 46 del; 43 mod

8357175: Failure to generate or load AOT code should be handled gracefully

Reviewed-by: iveresov, asmehra

-------------

PR: https://git.openjdk.org/jdk/pull/25525

From epeter at openjdk.org Sun Jun 1 05:37:08 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Sun, 1 Jun 2025 05:37:08 GMT
Subject: RFR: 8344942: Template-Based Testing Framework [v61]
In-Reply-To:
References:
Message-ID: <2nsC6sfjkW6j7aMI9TwUgOM4qcyqQj03xGQ8WKfd2VU=.46a960b2-941e-40dc-917f-331aea0e6a70@github.com>

On Fri, 30 May 2025 07:54:53 GMT, Christian Hagedorn wrote:

>> Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision:
>> 
>> - Merge branch 'JDK-8344942-TemplateFramework-v3' of https://github.com/eme64/jdk into JDK-8344942-TemplateFramework-v3
>> - move verification
> 
> test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 151:
> 
>> 149: "System.out.println(", arg, ");\n", // capture arg via lambda argument
>> 150: "System.out.println(#arg);\n", // capture arg via hashtag replacement
>> 151: "System.out.println(#{arg});\n", // capture arg via hashtag replacement with brackets
> 
> It's not clear here why one should use brackets. If there is an argument for those further down, then you can cross reference. Otherwise, it might need some explanation here.

I rewrote the whole section a little:

    // It would have been optimal to use Java String Templates to format
    // argument values into Strings. However, since these are not (yet)
    // available, the Template Framework provides two alternative ways of
    // formatting Strings:
    // 1) By appending to the comma-separated list of Tokens passed to body().
    //    Appending as a Token works whenever one has a reference to the Object
    //    in Java code. But often, this is rather cumbersome and looks awkward,
    //    given all the additional quotes and commas required. Hence, it
    //    is encouraged to only use this method when necessary.
    // 2) By hashtag replacements inside a single string. One can either
    //    use "#arg" directly, or use brackets "#{arg}". When possible, one
    //    should prefer avoiding the brackets, as they create additional
    //    noise. However, there are cases where they are useful, for
    //    example "#TYPE_CON" would be parsed as a hashtag replacement
    //    for the hashtag name "TYPE_CON", whereas "#{TYPE}_CON" is
    //    parsed as hashtag name "TYPE", followed by literal string "_CON".
    //    See also: generateWithHashtagAndDollarReplacements2
    //    There are two ways to define the value of a hashtag replacement:
    //    a) Capturing Template arguments as Strings.
    //    b) Using a "let" definition (see examples further down).
    // Which one should be preferred is a code style question. Generally, we
    // prefer the use of hashtag replacements because that allows easy use of
    // multiline strings (i.e. text blocks).

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2118746980

From epeter at openjdk.org Sun Jun 1 05:45:08 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Sun, 1 Jun 2025 05:45:08 GMT
Subject: RFR: 8344942: Template-Based Testing Framework [v61]
In-Reply-To:
References:
Message-ID:

On Fri, 30 May 2025 08:08:20 GMT, Christian Hagedorn wrote:

>> Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision:
>> 
>> - Merge branch 'JDK-8344942-TemplateFramework-v3' of https://github.com/eme64/jdk into JDK-8344942-TemplateFramework-v3
>> - move verification
> 
> test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 258:
> 
>> 256: 
>> 257: // Render templateClass to String.
>> 258: return templateClass.render(); > > When printing this, it starts at `var_2` and not `var_1`. Why is that? The `nextTemplateFrameId` starts at zero, and is incremented for every Template instantiation. The `templateClass` has `nextTemplateFrameId=1`. If there was any use of `$`, it would append `_1`. For `template1.asToken(1)` we have `nextTemplateFrameId=2` -> produces the `var_2`. Generally, the API does not make any guarantees about what id we give, it is just unique. Is that ok for you? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2118752174 From epeter at openjdk.org Sun Jun 1 05:45:08 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Sun, 1 Jun 2025 05:45:08 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v61] In-Reply-To: References: Message-ID: On Sun, 1 Jun 2025 05:41:29 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 258: >> >>> 256: >>> 257: // Render templateClass to String. >>> 258: return templateClass.render(); >> >> When printing this, it starts at `var_2` and not `var_1`. Why is that? > > The `nextTemplateFrameId` starts at zero, and is incremented for every Template instantiation. > The `templateClass` has `nextTemplateFrameId=1`. If there was any use of `$`, it would append `_1`. > For `template1.asToken(1)` we have `nextTemplateFrameId=2` -> produces the `var_2`. > Generally, the API does not make any guarantees about what id we give, it is just unique. > > Is that ok for you? Ah, I guess the comment above talks about `var_1, var_2 ...` hmm. I suppose I can add another comment for that in the test code. 
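
The numbering described above can be illustrated with a minimal sketch. It is not the Template Framework's code: `FrameIdSketch`, `use` and the regular expression are hypothetical names made up for this illustration. The only behaviour taken from the message is that a counter starts at zero, every template use takes the next ID, and `$name` becomes `name_<ID>`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the renderer's ID bookkeeping; not the real framework code.
final class FrameIdSketch {
    // Assumption: a dollar name is "$" followed by word characters.
    private static final Pattern DOLLAR = Pattern.compile("\\$(\\w+)");

    // Starts at zero and is incremented for every template use.
    private int nextTemplateFrameId = 0;

    // Every use consumes an ID, whether or not its body contains a "$name".
    String use(String body) {
        int id = ++nextTemplateFrameId;
        Matcher m = DOLLAR.matcher(body);
        return m.replaceAll(r -> Matcher.quoteReplacement(r.group(1) + "_" + id));
    }

    public static void main(String[] args) {
        FrameIdSketch renderer = new FrameIdSketch();
        // First use (ID=1): no "$" in the body, so no "_1" shows up.
        System.out.println(renderer.use("class C {}"));
        // Second use (ID=2): "$var" is renamed to "var_2".
        System.out.println(renderer.use("int $var = 1;"));
    }
}
```

This is why generated code can start at `var_2`: the outer template consumed ID 1 without ever printing it.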
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2118752394 From epeter at openjdk.org Sun Jun 1 06:02:05 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Sun, 1 Jun 2025 06:02:05 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v67] In-Reply-To: References: Message-ID: <3o8bVN9T_7p1h4miFfyUXDnyESEh4YAMzJhPcmE6XmI=.be1003c1-6867-4971-be12-1aa9389cf25e@github.com> > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. > > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. 
> - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... 
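
As an aside on the `#name` replacements listed above: a review comment earlier in this thread notes that `#TYPE_CON` is read as the single name `TYPE_CON`, while `#{TYPE}_CON` is the name `TYPE` followed by the literal `_CON`. The sketch below reproduces only that parsing rule. `HashtagSketch`, `render` and the assumption that names consist of word characters are hypothetical, not the framework's actual API or grammar.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of "#name" versus "#{name}"; not the real framework code.
final class HashtagSketch {
    // Group 1 matches "#{name}", group 2 matches "#name".
    // "\\w+" is greedy, so "#TYPE_CON" is read as the single name "TYPE_CON".
    private static final Pattern HASHTAG = Pattern.compile("#\\{(\\w+)\\}|#(\\w+)");

    static String render(String text, Map<String, String> values) {
        Matcher m = HASHTAG.matcher(text);
        return m.replaceAll(r -> {
            String name = r.group(1) != null ? r.group(1) : r.group(2);
            String value = values.get(name);
            if (value == null) {
                throw new IllegalArgumentException("no value for hashtag name: " + name);
            }
            return Matcher.quoteReplacement(value);
        });
    }

    public static void main(String[] args) {
        Map<String, String> values = Map.of("TYPE", "int");
        // Brackets end the name early, so "_CON" stays a literal suffix.
        System.out.println(render("#{TYPE}_CON", values));
        // Without brackets the name would be "TYPE_CON", which has no value here.
        try {
            render("#TYPE_CON", values);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```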
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:

  more improvements

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/24217/files
  - new: https://git.openjdk.org/jdk/pull/24217/files/ea2bb65d..68b45b1c

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=66
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=65-66

  Stats: 25 lines in 1 file changed: 21 ins; 1 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/24217.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217

PR: https://git.openjdk.org/jdk/pull/24217

From epeter at openjdk.org Sun Jun 1 06:02:05 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Sun, 1 Jun 2025 06:02:05 GMT
Subject: RFR: 8344942: Template-Based Testing Framework [v61]
In-Reply-To:
References:
Message-ID:

On Sun, 1 Jun 2025 05:42:37 GMT, Emanuel Peter wrote:

>> The `nextTemplateFrameId` starts at zero, and is incremented for every Template instantiation.
>> The `templateClass` has `nextTemplateFrameId=1`. If there was any use of `$`, it would append `_1`.
>> For `template1.asToken(1)` we have `nextTemplateFrameId=2` -> produces the `var_2`.
>> Generally, the API does not make any guarantees about what id we give, it is just unique.
>> 
>> Is that ok for you?
> 
> Ah, I guess the comment above talks about `var_1, var_2 ...` hmm. I suppose I can add another comment for that in the test code.

Wrote this:

    var templateClass = Template.make(() -> body(
        // The Template Framework API only guarantees that every Template use
        // has a unique ID. When using the Templates, all we need is that
        // variables from different Template uses do not conflict. But it can
        // be helpful to understand how the IDs are produced. The implementation
        // simply gives the first Template use the ID=1, and increments from there.
        //
        // In this example, the templateClass is the first Template use, and
        // has ID=1. We never use a dollar replacement here, so the code will
        // not show any "_1".
        """
        package p.xyz;

        public class InnerTest3 {
            public static void main() {
        """,
        // Second Template use: ID=2 -> var_2
        template1.asToken(1),
        // Third Template use: ID=3 -> var_3
        template1.asToken(7),
        // Fourth Template use with template2, no use of dollar, so
        // no "_4" shows up in the generated code. Internally, it
        // calls template1, which is the fifth Template use, with
        // ID = 5 -> var_5
        template2.asToken(2),
        // Sixth and Seventh Template use -> var_7
        template2.asToken(5),
        // Eighth Template use with template4 -> var_8.
        // Ninth Template use with internal call to template3,
        // The local "$var" turns to "var_9", but the Template
        // argument captured value = "var_8" from the outer
        // template use of $("var").
        template4.asToken(),
        """
            }
        }
        """
    ));

    // Render templateClass to String.
    return templateClass.render();

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2118771773

From epeter at openjdk.org Sun Jun 1 06:02:06 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Sun, 1 Jun 2025 06:02:06 GMT
Subject: RFR: 8344942: Template-Based Testing Framework [v61]
In-Reply-To:
References:
Message-ID:

On Fri, 30 May 2025 08:22:00 GMT, Christian Hagedorn wrote:

>> Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision:
>> 
>> - Merge branch 'JDK-8344942-TemplateFramework-v3' of https://github.com/eme64/jdk into JDK-8344942-TemplateFramework-v3
>> - move verification
> 
> test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 306:
> 
>> 304: var myHook = new Hook("MyHook");
>> 305: 
>> 306: var template1 = Template.make("name", "value", (String name, Integer value) -> body(
> 
> One could generally think about using `_` for unused lambda parameters which I think is the common convention. But then I guess we would need to update the documentation about saying "name" and "String name" should be the same and make an exception for unused ones.

I don't know. I think it is better to keep the names duplicated. This gives the reader an easier visual aid to check which name has which type. What do you think?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2118774938

From epeter at openjdk.org Sun Jun 1 16:03:08 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Sun, 1 Jun 2025 16:03:08 GMT
Subject: RFR: 8344942: Template-Based Testing Framework [v61]
In-Reply-To:
References:
Message-ID:

On Fri, 30 May 2025 08:39:14 GMT, Christian Hagedorn wrote:

>> test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 358:
>> 
>>> 356: 
>>> 357: // We saw the use of custom hooks above, but now we look at the use of CLASS_HOOK and METHOD_HOOK
>>> 358: // from the Template Library.
>> 
>> Can you expand here on why it's better to use them instead of creating your own? Is it just readability/convenience?
> 
> Another question which is not evidently clear by following the examples: Can and should (not) you use the same hook inside the hook itself, i.e.:
> 
>     Hooks.CLASS_HOOK.anchor(
>         Hooks.CLASS_HOOK.anchor(
>             // ...
> 
> This is probably not done on purpose but such a situation could arise when nesting more templates and suddenly one anchors the same hook again?

I extended the explanations:

    // We saw the use of custom hooks above, but now we look at the use of CLASS_HOOK and METHOD_HOOK.
    // By convention, we use the CLASS_HOOK for class scopes, and METHOD_HOOK for method scopes.
    // Whenever we open a class scope, we should anchor a CLASS_HOOK for that scope, and whenever we
    // open a method, we should anchor a METHOD_HOOK. Conversely, this allows us to check if we are
    // inside a class or method scope by querying "isAnchored". This convention helps us when building
    // a large library of Templates. But if you are writing your own self-contained set of Templates,
    // you do not have to follow this convention.
    //
    // Hooks are "re-entrant", that is we can anchor the same hook inside a scope that we already
    // anchored it previously. The "Hook.insert" always goes to the innermost anchoring of that
    // hook. There are cases where "re-entrant" Hooks are helpful such as nested classes, where
    // there is a class scope inside another class scope. Similarly, we can nest lambda bodies
    // inside method bodies, so also METHOD_HOOK can be used in such a "re-entrant" way.

We could consider having both "re-entrant" and "non-re-entrant" Hooks. But I'm not yet convinced it is a very useful feature. Sure, there could be some confusion with nested hooks. But I think that confusion is inherent to code generation, because we can also nest class and method/lambda scopes. What do you think?
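
For readers following along, the "re-entrant" behaviour can be pictured as a stack of anchorings where an insert always lands in the innermost one. The sketch below shows only that idea under stated assumptions. `HookSketch` and its `anchor`, `insert` and `isAnchored` methods are hypothetical and much simpler than the framework's `Hook` class; they do not reflect its real signatures.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical model of a re-entrant hook; not the Template Framework's Hook class.
final class HookSketch {
    // One list of inserted code per active anchoring; the innermost is on top.
    private final Deque<List<String>> anchorings = new ArrayDeque<>();

    boolean isAnchored() {
        return !anchorings.isEmpty();
    }

    // Opens a scope, renders the body, and emits the inserted code before the body text.
    String anchor(Supplier<String> body) {
        anchorings.push(new ArrayList<>());
        try {
            String text = body.get();
            return String.join("", anchorings.peek()) + text;
        } finally {
            anchorings.pop();
        }
    }

    // Always targets the innermost anchoring.
    void insert(String code) {
        if (anchorings.isEmpty()) {
            throw new IllegalStateException("hook is not anchored");
        }
        anchorings.peek().add(code);
    }

    public static void main(String[] args) {
        HookSketch classHook = new HookSketch();
        String out = classHook.anchor(() -> {
            classHook.insert("int outerField;\n");
            // Nested class scope: the same hook is anchored again.
            String inner = classHook.anchor(() -> {
                classHook.insert("int innerField;\n");
                return "void m() {}\n";
            });
            return "class Inner {\n" + inner + "}\n";
        });
        System.out.print(out);
    }
}
```

In this model `innerField` ends up inside `Inner` and `outerField` in the enclosing scope, which matches the nested-class case mentioned above.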
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2119274873 From epeter at openjdk.org Sun Jun 1 16:03:10 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Sun, 1 Jun 2025 16:03:10 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v61] In-Reply-To: References: Message-ID: On Fri, 30 May 2025 08:57:44 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: >> >> - Merge branch 'JDK-8344942-TemplateFramework-v3' of https://github.com/eme64/jdk into JDK-8344942-TemplateFramework-v3 >> - move verification > > test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 454: > >> 452: // For every recursion depth, some fuel is automatically subtracted >> 453: // so that the fuel slowly depletes with the depth. >> 454: // We keep the recursion going until the fuel is depleted. > > You can also note here that if we forget to check the `fuel()`, the renderer causes a stack overflow because the recursion never ends. Good idea! Added. > test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 487: > >> 485: // in this scope, and in any nested scope, including nested Templates. This allows us to >> 486: // add some fields and registers in one Template, and later on, in another Template, we >> 487: // can access these fields and registers again with "dataNames()". > > What do you mean by "registers"? Hmm good question. I think I meant "variables". Changed it! > test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 596: > >> 594: @Override >> 595: public boolean isSubtypeOf(DataName.Type other) { >> 596: return other instanceof MyPrimitive(String n) && n == name(); > > Is `==` intended? Should it be `equals()`? Nice catch, fixed. Well it did not matter here, but it is good practice I guess. 
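
The `==` versus `equals()` point can be shown in isolation. This is a small sketch (Java 16 or later, because it uses a record), and `MyPrimitiveSketch` with its two methods is a hypothetical reduction of the reviewed code, not a copy of it. `==` on Strings compares object identity, so it only succeeds when both names happen to be the same object, for example two identical literals, which may be why the original version happened to work.

```java
// Hypothetical reduction of the reviewed type; only the name comparison is modelled.
record MyPrimitiveSketch(String name) {
    // Identity comparison: true only if both names are the very same String object.
    boolean sameNameByIdentity(Object other) {
        return other instanceof MyPrimitiveSketch p && p.name() == name();
    }

    // Content comparison: true whenever the characters match.
    boolean sameNameByEquals(Object other) {
        return other instanceof MyPrimitiveSketch p && p.name().equals(name());
    }

    public static void main(String[] args) {
        MyPrimitiveSketch literal = new MyPrimitiveSketch("int");
        // new String(...) forces a distinct object with the same characters.
        MyPrimitiveSketch copy = new MyPrimitiveSketch(new String("int"));
        System.out.println(literal.sameNameByIdentity(copy));
        System.out.println(literal.sameNameByEquals(copy));
    }
}
```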
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2119278069 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2119276977 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2119278275 From epeter at openjdk.org Sun Jun 1 16:06:53 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Sun, 1 Jun 2025 16:06:53 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v68] In-Reply-To: References: Message-ID: <2xkmqbUmlAvSV6SUym7pUeA_gwTDErFOMPuzTZ86TAI=.4a2da8f6-db82-46a0-b5b7-3f8fa4b30385@github.com> > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. 
> > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: more fixes from Christian ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/68b45b1c..ab20c217 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=67 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=66-67 Stats: 19 lines in 1 file changed: 14 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From epeter at openjdk.org Sun Jun 1 16:10:07 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Sun, 1 Jun 2025 16:10:07 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v61] In-Reply-To: References: Message-ID: On Fri, 30 May 2025 10:39:57 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: >> >> - Merge branch 'JDK-8344942-TemplateFramework-v3' of https://github.com/eme64/jdk into JDK-8344942-TemplateFramework-v3 >> - move verification > > Thanks for all the updates and discussions! I've worked my way through the documentation in `Template` and the examples again in some more detail. It's much better and the new explanations are well done, excellent work! > > I left some comments here and there but mostly minor things. I will have another look at the implementation - probably only finished by Monday. The design now looks great. I'm glad we could find a good solution now after some more iterations :-) @chhagedorn Thanks a lot for all the great suggestions! I now addressed everything except for: Issue with `$$var` and `$1var`. Similarly, we would have issues with `##name` and `#1name`. https://github.com/openjdk/jdk/pull/24217#discussion_r2115232385 (I'll have to do some more experiments with parsing.) 
These are issues where we could continue the conversation, unless you are satisfied with my answers: https://github.com/openjdk/jdk/pull/24217#discussion_r2115388737 https://github.com/openjdk/jdk/pull/24217#discussion_r2115406391 ------------- PR Comment: https://git.openjdk.org/jdk/pull/24217#issuecomment-2927467228 From epeter at openjdk.org Sun Jun 1 16:14:06 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Sun, 1 Jun 2025 16:14:06 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v61] In-Reply-To: References: <4MgAjHfzurYkWqrZ6ah81SwKah7IHR7okOxnq5gapb8=.b7b7bfc8-6dd7-4186-9839-b446c86f21a3@github.com> Message-ID: On Sat, 31 May 2025 11:48:39 GMT, Emanuel Peter wrote: >> @chhagedorn >> The current parsing/regex-ing is relatively simple. We only parse the "valid" cases, so the description above is still relevant. >> Your example `$1var` is not a valid pattern, so the regex does not match, and there is no replacement. Sadly, in Java `$1var` is a valid variable name, so there is some chance that the user makes a mistake and gets tripped up by this. >> >> If the user does a call to `let` or `$` with such a bad string `1var`, then they get a `RendererException`. >> >> The question is this: >> Should I really try to parse these "bad" patterns, just to validate them as well? All solutions I can think of are really complicated. Is it worth it? Or is it just a mistake by the user, and so the matching does not happen, and that is the users problem? > > FYI: `$$var` the first `$` is not a valid pattern, so it is not replaced. But `$var` is, and so that part gets replaced. The result is `$var_1`, which sadly happens to also be valid Java code. I think I just need to rewrite the way I parse and replace the strings. Doing a simple regex with `replaceAll` does not work if we also want to allow "bad" patterns such as `$$var` to be parsed, because of ambiguity. My new idea: split the string by `#` and `$`.
The first string is just a regular string, because it has no `#` or `$` before it. But all others should start with either a `name` or `{name}` pattern. I should also do the `#` and `$` replacement in a single pass, so that we cannot have one replacement influence the other, i.e. that we have no "replacement injection" issues that may be confusing if anybody ever trips over it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2119291366 From epeter at openjdk.org Sun Jun 1 16:57:49 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Sun, 1 Jun 2025 16:57:49 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v69] In-Reply-To: References: Message-ID: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. 
> > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: wip refactor parsing dollar and hashtag ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/ab20c217..ccc132b5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=68 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=67-68 Stats: 52 lines in 1 file changed: 20 ins; 13 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From jbhateja at openjdk.org Sun Jun 1 17:26:07 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sun, 1 Jun 2025 17:26:07 GMT Subject: RFR: 8352635: Improve inferencing of Float16 operations with constant inputs [v5] In-Reply-To: <44nVQBYgzCOB2mAB9xtAPvkUcOMJOITA2VjMdDFgm1g=.48266693-48bf-41db-8871-a7dcafe93509@github.com> References: <44nVQBYgzCOB2mAB9xtAPvkUcOMJOITA2VjMdDFgm1g=.48266693-48bf-41db-8871-a7dcafe93509@github.com> Message-ID: > This is a follow-up PR#22755 to improve Float16 operations inferencing. > > The existing scheme to detect Float16 operations for some operations is based on pattern matching which expects to receive inputs through ConvHF2F IR, this patch extends matching to accept constant floating point inputs within the Float16 value range. 
> > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Extending tests and review resolutions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24179/files - new: https://git.openjdk.org/jdk/pull/24179/files/b44d62dc..4a491bef Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24179&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24179&range=03-04 Stats: 181 lines in 4 files changed: 112 ins; 5 del; 64 mod Patch: https://git.openjdk.org/jdk/pull/24179.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24179/head:pull/24179 PR: https://git.openjdk.org/jdk/pull/24179 From jbhateja at openjdk.org Sun Jun 1 17:26:10 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sun, 1 Jun 2025 17:26:10 GMT Subject: RFR: 8352635: Improve inferencing of Float16 operations with constant inputs [v4] In-Reply-To: <6PFX21b9eT5mQv8Ym7b_RuKNpnuQ5CVqhc8TKxstlYo=.eb7d9f85-5e49-4e8f-b17a-c8e3728e7624@github.com> References: <44nVQBYgzCOB2mAB9xtAPvkUcOMJOITA2VjMdDFgm1g=.48266693-48bf-41db-8871-a7dcafe93509@github.com> <6PFX21b9eT5mQv8Ym7b_RuKNpnuQ5CVqhc8TKxstlYo=.eb7d9f85-5e49-4e8f-b17a-c8e3728e7624@github.com> Message-ID: <4kFfYPljgrRZSDgDmn4XbCB9iwnrETd0eFOxBSV-sVg=.422f1e5d-6182-4ad5-a509-3b1451a71dfc@github.com> On Wed, 28 May 2025 08:56:51 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains six commits: >> >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8352635 >> - Enabling some test points >> - Adding test points and some re-factoring >> - Merge branch 'master' of https://github.com/openjdk/jdk into JDK-8352635 >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8352635 >> - 8352635: Improve inferencing of Float16 operations with constant inputs > > src/hotspot/share/opto/convertnode.cpp line 290: > >> 288: // If constant lie within Float16 value range, convert it to >> 289: // a half-float constant. >> 290: if (StubRoutines::hf2f(StubRoutines::f2hf(conF)) == conF) { > > How does this behave with `NaN` values? Do you have a test for that below? Extended coverage for NaNs; yes, we have new test points for them. > src/hotspot/share/opto/convertnode.cpp line 298: > >> 296: } else { >> 297: f16bOp = phase->transform(Float16NodeFactory::make(f32bOp->Opcode(), f32bOp->in(0), new_var_inp, new_con_inp)); >> 298: } > > Why is the order important here? A comment could help :) Addressed. > src/hotspot/share/opto/subnode.cpp line 566: > >> 564: // applicable to other floating point types. >> 565: // There are no known undefined, unspecified or implimentation specific >> 566: // behaviors w.r.t to floating point non-pointer subtraction. > > That sounds like we are not quite sure "no known" ... problems. Could there be any, or are we sure there are none? C++ follows IEEE 754 semantics for floating-point subtraction, and there is no specified undefined behavior related to it in the C++ standard.
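The reviewer's `NaN` question about the round-trip check can be illustrated with a plain-Java analogue. This is a sketch, not the HotSpot code: it assumes `Float.floatToFloat16` and `Float.float16ToFloat` (available since JDK 20) as stand-ins for `StubRoutines::f2hf` and `StubRoutines::hf2f`, and `roundTripsExactly` is a hypothetical helper name.

```java
final class Float16RoundTripSketch {
    // True if f survives float -> float16 -> float unchanged under ==.
    // Assumption: the Java library conversions stand in for the HotSpot stub routines.
    static boolean roundTripsExactly(float f) {
        return Float.float16ToFloat(Float.floatToFloat16(f)) == f;
    }
}
```

In this analogue, a value such as `1.5f` is exactly representable in binary16 and passes, `0.1f` is rounded and fails, and `Float.NaN` fails because `NaN == NaN` is always false under IEEE 754 comparison. That is why a dedicated `NaN` test point is worth having for any check of this shape.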
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24179#discussion_r2119357586 PR Review Comment: https://git.openjdk.org/jdk/pull/24179#discussion_r2119357694 PR Review Comment: https://git.openjdk.org/jdk/pull/24179#discussion_r2119358354 From jbhateja at openjdk.org Sun Jun 1 17:26:10 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sun, 1 Jun 2025 17:26:10 GMT Subject: RFR: 8352635: Improve inferencing of Float16 operations with constant inputs [v4] In-Reply-To: <-d846uXzYApO-CUq6peUgguY2YLpvG6ioAdVkN1wHG0=.94a09310-9d87-481c-b374-05ae99db0133@github.com> References: <44nVQBYgzCOB2mAB9xtAPvkUcOMJOITA2VjMdDFgm1g=.48266693-48bf-41db-8871-a7dcafe93509@github.com> <6PFX21b9eT5mQv8Ym7b_RuKNpnuQ5CVqhc8TKxstlYo=.eb7d9f85-5e49-4e8f-b17a-c8e3728e7624@github.com> <-d846uXzYApO-CUq6peUgguY2YLpvG6ioAdVkN1wHG0=.94a09310-9d87-481c-b374-05ae99db0133@github.com> Message-ID: On Wed, 28 May 2025 09:09:46 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java line 320: >> >>> 318: res += Float.floatToFloat16(POSITIVE_ZERO_VAR.floatValue() / INEXACT_FP16); >>> 319: assertResult(Float.float16ToFloat(res), 32.125f, "testInexactFP16ConstantPatterns"); >>> 320: } >> >> Alignment is messed up by one space indentation. >> >> Can you add a comment why we are expecting none of the `HF` ops here? >> Are we expecting any other ops, maybe `F` ops? >> It could be good to check for that, so that we are sure that we get anything even close to our expectation. > > Same for the tests below :) Fixed the IR checks and indentation. >> test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java line 363: >> >>> 361: res += Float.floatToFloat16(POSITIVE_ZERO_VAR.floatValue() / EXACT_FP16); >>> 362: assertResult(Float.float16ToFloat(res), 32.125f, "testExactFP16ConstantPatterns"); >>> 363: } >> >> Can we have a test that picks a random `FP16` value, and does result verification on it?
Because currently, you are testing the new pattern only with a few example values. > > And: your pattern matching allows the constant to be lhs or rhs, so you should add corresponding tests. Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24179#discussion_r2119358477 PR Review Comment: https://git.openjdk.org/jdk/pull/24179#discussion_r2119358543 From kvn at openjdk.org Sun Jun 1 21:23:53 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sun, 1 Jun 2025 21:23:53 GMT Subject: RFR: 8358236: [AOT] Graal crashes when trying to use persisted MDOs In-Reply-To: References: Message-ID: On Sun, 1 Jun 2025 19:01:27 GMT, Igor Veresov wrote: > Forgot to null out MethodData::_failed_speculations before snapshotting. As a result it gets restored with a dangling pointer. > Testing looks clean. Trivial. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25570#pullrequestreview-2886119546 From iveresov at openjdk.org Sun Jun 1 21:23:54 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Sun, 1 Jun 2025 21:23:54 GMT Subject: Integrated: 8358236: [AOT] Graal crashes when trying to use persisted MDOs In-Reply-To: References: Message-ID: <2VQGaTWxeSr29uU3Ih3S5kF9l70w3xwlkHNG_pVFr7U=.3279eb7c-5bf8-4df1-8405-61b1678552d5@github.com> On Sun, 1 Jun 2025 19:01:27 GMT, Igor Veresov wrote: > Forgot to null out MethodData::_failed_speculations before snapshotting. As a result it gets restored with a dangling pointer. > Testing looks clean. This pull request has now been integrated. 
Changeset: 85e36d79 Author: Igor Veresov URL: https://git.openjdk.org/jdk/commit/85e36d79246913abb8b85c2be719670655d619ab Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod 8358236: [AOT] Graal crashes when trying to use persisted MDOs Reviewed-by: kvn ------------- PR: https://git.openjdk.org/jdk/pull/25570 From epeter at openjdk.org Mon Jun 2 03:09:09 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 2 Jun 2025 03:09:09 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v70] In-Reply-To: References: Message-ID: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. > > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. 
> - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: dollar and hashtag parsing validatiaon ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/ccc132b5..21d3f507 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=69 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=68-69 Stats: 31 lines in 2 files changed: 26 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From epeter at openjdk.org Mon Jun 2 03:30:24 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 2 Jun 2025 03:30:24 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v71] In-Reply-To: References: Message-ID: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. 
> > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 91 commits: - Merge branch 'master' into JDK-8344942-TemplateFramework-v3 - validation tests - dollar and hashtag parsing validatiaon - wip refactor parsing dollar and hashtag - more fixes from Christian - more improvements - more suggestions applied - good practice - rename template arguments - more from Christian - ... and 81 more: https://git.openjdk.org/jdk/compare/90d6ad01...cb7037e7 ------------- Changes: https://git.openjdk.org/jdk/pull/24217/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=70 Stats: 6683 lines in 27 files changed: 6683 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From epeter at openjdk.org Mon Jun 2 03:30:24 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 2 Jun 2025 03:30:24 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v61] In-Reply-To: References: Message-ID: On Fri, 30 May 2025 10:39:57 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: >> >> - Merge branch 'JDK-8344942-TemplateFramework-v3' of https://github.com/eme64/jdk into JDK-8344942-TemplateFramework-v3 >> - move verification > > Thanks for all the updates and discussions! I've worked my way through the documentation in `Template` and the examples again in some more detail. It's much better and the new explanations are well done, excellent work! > > I left some comments here and there but mostly minor things. I will have another look at the implementation - probably only finished by Monday. The design now looks great. I'm glad we could find a good solution now after some more iterations :-) @chhagedorn Alright, I now have a decent solution for `$$var` and `$1var` etc. I also added tests for it. 
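For readers who want a picture of what a single-pass `$`/`#` replacement with validation might look like, here is a minimal sketch under stated assumptions. It is not the code from the PR: `SinglePassReplaceSketch`, `render`, the `id` suffix and the `hashValues` map are hypothetical, and the sketch throws `IllegalArgumentException` where the framework would presumably use its own exception type.

```java
import java.util.Map;

// Hypothetical sketch of single-pass "$name" / "#name" replacement; not the PR's implementation.
final class SinglePassReplaceSketch {
    static String render(String s, int id, Map<String, String> hashValues) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c != '$' && c != '#') {
                out.append(c);
                i++;
                continue;
            }
            // A marker must be followed by either name or {name}.
            int start = i + 1;
            boolean braces = start < s.length() && s.charAt(start) == '{';
            if (braces) {
                start++;
            }
            int end = start;
            while (end < s.length() && isNamePart(s.charAt(end), end == start)) {
                end++;
            }
            boolean missingClose = braces && (end >= s.length() || s.charAt(end) != '}');
            if (end == start || missingClose) {
                throw new IllegalArgumentException("bad pattern at index " + i + " in: " + s);
            }
            String name = s.substring(start, end);
            if (c == '$') {
                // Make the variable name unique for this template use.
                out.append(name).append('_').append(id);
            } else {
                String value = hashValues.get(name);
                if (value == null) {
                    throw new IllegalArgumentException("no value for #" + name);
                }
                // The value is appended verbatim and never rescanned.
                out.append(value);
            }
            i = braces ? end + 1 : end;
        }
        return out.toString();
    }

    // Names start with a letter or underscore; digits are allowed afterwards.
    private static boolean isNamePart(char ch, boolean first) {
        return Character.isLetter(ch) || ch == '_' || (!first && Character.isDigit(ch));
    }
}
```

Because the input is scanned exactly once and replacement text is written straight to the output, a `#name` value that itself contains `$` cannot trigger a second replacement, and inputs such as `$$var` or `$1var` are rejected instead of being silently left alone.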
These are issues where we could continue the conversation, unless you are satisfied with my answers: https://github.com/openjdk/jdk/pull/24217#discussion_r2115388737 https://github.com/openjdk/jdk/pull/24217#discussion_r2115406391 This is now ready for another review pass. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24217#issuecomment-2928567671 From amitkumar at openjdk.org Mon Jun 2 03:37:57 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 2 Jun 2025 03:37:57 GMT Subject: RFR: 8353500: [s390x] Intrinsify Unsafe::setMemory [v5] In-Reply-To: References: Message-ID: On Fri, 30 May 2025 08:32:30 GMT, Andrew Haley wrote: > What are all those `nopr`s for? Sorry that is old code; nops were inserted for the loop alignment; this is the newer stub code: - - - [BEGIN] - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - StubRoutines::unsafe_setmemory [0x000003ffa84b63c0, 0x000003ffa84b644c] (140 bytes) -------------------------------------------------------------------------------- BFD: unknown S/390 disassembler option: s390 .long 0x00000000 0x000003ffa84b63c0: vlvgb %v0,%r4,0 0x000003ffa84b63c6: vrepb %v0,%v0,0 0x000003ffa84b63cc: aghi %r3,-32 0x000003ffa84b63d0: jl 0x000003ffa84b63ec 0x000003ffa84b63d4: vst %v0,0(%r2) 0x000003ffa84b63da: vst %v0,16(%r2) 0x000003ffa84b63e0: aghi %r2,32 0x000003ffa84b63e4: aghi %r3,-32 0x000003ffa84b63e8: jhe 0x000003ffa84b63d4 0x000003ffa84b63ec: tmll %r3,16 0x000003ffa84b63f0: je 0x000003ffa84b63fe 0x000003ffa84b63f4: vst %v0,0(%r2) 0x000003ffa84b63fa: aghi %r2,16 0x000003ffa84b63fe: tmll %r3,8 0x000003ffa84b6402: je 0x000003ffa84b6410 0x000003ffa84b6406: vsteg %v0,0(%r2),0 0x000003ffa84b640c: aghi %r2,8 0x000003ffa84b6410: tmll %r3,7 0x000003ffa84b6414: je 0x000003ffa84b644a 0x000003ffa84b6418: tmll %r3,4 0x000003ffa84b641c: je 0x000003ffa84b642a 0x000003ffa84b6420: vstef %v0,0(%r2),0 0x000003ffa84b6426: aghi %r2,4 0x000003ffa84b642a: tmll %r3,2 0x000003ffa84b642e: je 0x000003ffa84b643c
0x000003ffa84b6432: vsteh %v0,0(%r2),0 0x000003ffa84b6438: aghi %r2,2 0x000003ffa84b643c: tmll %r3,1 0x000003ffa84b6440: je 0x000003ffa84b644a 0x000003ffa84b6444: vsteb %v0,0(%r2),0 0x000003ffa84b644a: br %r14 -------------------------------------------------------------------------------- - - - [END] - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ------------- PR Comment: https://git.openjdk.org/jdk/pull/24480#issuecomment-2928591294 From epeter at openjdk.org Mon Jun 2 04:54:53 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 2 Jun 2025 04:54:53 GMT Subject: RFR: 8350896: Integer/Long.compress gets wrong type from CompressBitsNode::Value [v8] In-Reply-To: References: Message-ID: On Fri, 30 May 2025 17:43:27 GMT, Jatin Bhateja wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments resolutions > > We can further constrain the value range bounds of bit compression and expansion once PR #17508 gets integrated. For now, I have developed the following draft demonstrates bound constraining with KnownBitLattice. > > > // > // Prototype of bit compress/expand value range computation > // using KnownBits infrastructure. > // > > #include > #include > #include > #include > > template > class KnownBitsLattice { > private: > U zeros; > U ones; > > public: > KnownBitsLattice(U lb, U ub); > > U getKnownZeros() { > return zeros; > } > > U getKnownOnes() { > return ones; > } > > long getKnownZerosCount() { > uint64_t count = 0; > asm volatile ("popcntq %1, %0 \n\t" : "=r"(count) : "r"(zeros) : "cc"); > return count; > } > > long getKnownOnesCount() { > uint64_t count = 0; > asm volatile ("popcntq %1, %0 \n\t" : "=r"(count) : "r"(ones) : "cc"); > return count; > } > > bool check_voilation() { > // A given bit cannot be both zero or one. 
> return (zeros & ones) != 0; > } > > bool is_MSB_KnownOneBitsSet() { > return (ones >> 63) == 1; > } > > bool is_MSB_KnownZeroBitsSet() { > return (zeros >> 63) == 1; > } > }; > > template > KnownBitsLattice::KnownBitsLattice(U lb, U ub) { > // To find KnownBitsLattice from a given value range > // we first find the common prefix b/w upper and lower > // bound, we then concertize known zeros and ones bit > // based on common prefix. > // e.g. > // lb = 00110001 > // ub = 00111111 > // common prefix = 0011XXXX > // knownbits.zeros = 11000000 > // knownbits.ones = 00110000 > // > // conversely, for a give knownbits value we can find > // lower and upper value ranges. > // e.g. > // knownbits.zeros = 0x00010001 > // knownbits.ones = 0x10001100 > // range.lo = knownbits.ones, this is because knownbits.ones are > // guaranteed to be one. > // range.hi = ~knownbits.zeros, this is an optimistic upper bound > // which assumes all unset knownbits.zero > // are ones. > // Thus in above example, > // range.lo = 0x8C > // range.hi = 0xEE > > U lzcnt = 0; > U common_prefix = lb ^ ub; > asm volatile ("lzcntq %1, %0 \n\t" : "=r"(lzcnt) : "r"(common_prefix) : "cc"); > U common_prefix_mask = lzcnt == 0 ? 0xFFFFFFFFFFFFFFFFL : ~((1ULL << (64 - lzcnt)) - 1); > zeros = (~lb) & common_prefix_mask; > ones = (lb) & c... @jatin-bhateja Nice! Yes I'm looking forward to reviewing all the KnownBits extensions! @jatin-bhateja Let me know whenever this is ready for another pass of reviews :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/23947#issuecomment-2928741573 From rehn at openjdk.org Mon Jun 2 05:45:59 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 2 Jun 2025 05:45:59 GMT Subject: RFR: 8357968: RISC-V: Interpreter volatile reference stores with G1 are not sequentially consistent In-Reply-To: References: Message-ID: On Wed, 28 May 2025 16:47:06 GMT, Robbin Ehn wrote: > Hi please consider. 
> > As ref: https://github.com/openjdk/jdk/pull/25483 > As suggested in that PR - I removed these helpers as it's very hard to see that you get registers clobbered. > > Sanity tested, running t1. > > /Robbin Thanks all! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25502#issuecomment-2928896307 From rehn at openjdk.org Mon Jun 2 05:45:59 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 2 Jun 2025 05:45:59 GMT Subject: Integrated: 8357968: RISC-V: Interpreter volatile reference stores with G1 are not sequentially consistent In-Reply-To: References: Message-ID: On Wed, 28 May 2025 16:47:06 GMT, Robbin Ehn wrote: > Hi please consider. > > As ref: https://github.com/openjdk/jdk/pull/25483 > As suggested in that PR - I removed these helpers as it's very hard to see that you get registers clobbered. > > Sanity tested, running t1. > > /Robbin This pull request has now been integrated. Changeset: c5a1543e Author: Robbin Ehn URL: https://git.openjdk.org/jdk/commit/c5a1543ee3e68775f09ca29fb07efd9aebfdb33e Stats: 27 lines in 1 file changed: 0 ins; 18 del; 9 mod 8357968: RISC-V: Interpreter volatile reference stores with G1 are not sequentially consistent Reviewed-by: eosterlund, fbredberg, shade, fyang ------------- PR: https://git.openjdk.org/jdk/pull/25502 From mchevalier at openjdk.org Mon Jun 2 06:53:50 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Mon, 2 Jun 2025 06:53:50 GMT Subject: RFR: 8353266: C2: Wrong execution with Integer.bitCount(int) intrinsic on AArch64 In-Reply-To: References: Message-ID: On Sat, 31 May 2025 02:59:48 GMT, SendaoYan wrote: > Hi, how does this bug was found, seems the original testcase generated by a fuzz tool. Seems so, given what the initial reproducer looks like, but I'm not sure. The ticket was opened 3 years ago, not sure anyone remembers. If you want to know more context, maybe you can ask the initial reporter. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25551#issuecomment-2929087749 From jbhateja at openjdk.org Mon Jun 2 07:44:58 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 2 Jun 2025 07:44:58 GMT Subject: RFR: 8355563: VectorAPI: Refactor current implementation of subword gather load API In-Reply-To: References: Message-ID: On Fri, 9 May 2025 07:35:41 GMT, Xiaohong Gong wrote: > JDK-8318650 introduced hotspot intrinsification of subword gather load APIs for X86 platforms [1]. However, the current implementation is not optimal for AArch64 SVE platform, which natively supports vector instructions for subword gather load operations using an int vector for indices (see [2][3]). > > Two key areas require improvement: > 1. At the Java level, vector indices generated for range validation could be reused for the subsequent gather load operation on architectures with native vector instructions like AArch64 SVE. However, the current implementation prevents compiler reuse of these index vectors due to divergent control flow, potentially impacting performance. > 2. At the compiler IR level, the additional `offset` input for `LoadVectorGather`/`LoadVectorGatherMasked` with subword types increases IR complexity and complicates backend implementation. Furthermore, generating `add` instructions before each memory access negatively impacts performance. > > This patch refactors the implementation at both the Java level and compiler mid-end to improve efficiency and maintainability across different architectures. > > Main changes: > 1. Java-side API refactoring: > - Explicitly passes generated index vectors to hotspot, eliminating duplicate index vectors for gather load instructions on > architectures like AArch64. > 2. C2 compiler IR refactoring: > - Refactors `LoadVectorGather`/`LoadVectorGatherMasked` IR for subword types by removing the memory offset input and incorporating it into the memory base `addr` at the IR level. 
This simplifies backend implementation, reduces add operations, and unifies the IR across all types. > 3. Backend changes: > - Streamlines X86 implementation of subword gather operations following the removal of the offset input from the IR level. > > Performance: > The performance of the relative JMH improves up to 27% on a X86 AVX512 system. Please see the data below: > > Benchmark Mode Cnt Unit SIZE Before After Gain > GatherOperationsBenchmark.microByteGather128 thrpt 30 ops/ms 64 53682.012 52650.325 0.98 > GatherOperationsBenchmark.microByteGather128 thrpt 30 ops/ms 256 14484.252 14255.156 0.98 > GatherOperationsBenchmark.microByteGather128 thrpt 30 ops/ms 1024 3664.900 3595.615 0.98 > GatherOperationsBenchmark.microByteGather128 thrpt 30 ops/ms 4096 908.312 935.269 1.02 > GatherOperationsBenchmark.micr... Hi @XiaohongGong , Looks good to me, thanks again for this re-factor !! Best Regards, Jatin ------------- Marked as reviewed by jbhateja (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25138#pullrequestreview-2887157235 From duke at openjdk.org Mon Jun 2 08:15:36 2025 From: duke at openjdk.org (Tom Shull) Date: Mon, 2 Jun 2025 08:15:36 GMT Subject: RFR: 8357987: [JVMCI] Add support for retrieving all methods of a ResolvedJavaType [v2] In-Reply-To: References: Message-ID: > Currently from ResolvedJavaType one can retrieve all declared methods, static methods, and constructors of the given type. However, internally in HotSpot there are also VM-internal methods, such as overpass methods, associated with a given type which we cannot access via the API. > > To correct this, we should add a new method which enables VM-internal methods, such as overpass methods, to be accessed. 
Tom Shull has updated the pull request incrementally with one additional commit since the last revision: format javadoc and update test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25498/files - new: https://git.openjdk.org/jdk/pull/25498/files/1f42f05f..0de1feae Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25498&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25498&range=00-01 Stats: 16 lines in 4 files changed: 2 ins; 2 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/25498.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25498/head:pull/25498 PR: https://git.openjdk.org/jdk/pull/25498 From shade at openjdk.org Mon Jun 2 08:18:53 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 2 Jun 2025 08:18:53 GMT Subject: RFR: 8358169: Shenandoah/JVMCI: Export GC state constants In-Reply-To: References: Message-ID: On Fri, 30 May 2025 16:09:03 GMT, Roman Kennke wrote: > We need the GC state enum constants available in JVMCI. Marked as reviewed by shade (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25552#pullrequestreview-2887264825 From dbriemann at openjdk.org Mon Jun 2 08:27:56 2025 From: dbriemann at openjdk.org (David Briemann) Date: Mon, 2 Jun 2025 08:27:56 GMT Subject: RFR: 8357793: [PPC64] VM crashes with -XX:-UseSIGTRAP -XX:-ImplicitNullChecks [v2] In-Reply-To: References: <5XqAA3Z2G0uwOBkitUrqkG3Y68xtpRuvBwj_cEIFECs=.18259520-6f73-406f-a46f-fa025c12b303@github.com> Message-ID: On Wed, 28 May 2025 19:12:55 GMT, Martin Doerr wrote: >> In case of -XX:-UseSIGTRAP -XX:-ImplicitNullChecks, we use the manually selected entry. (The same is true for -XX:-TrapBasedNullChecks -XX:-ImplicitNullChecks.) >> We only need to use the correct NullPointerException entry in the compiler case. >> >> With this patch, the manually selected entry matches the one selected by `PosixSignals::pd_hotspot_signal_handler`. 
> > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Fix bastore without ImplicitNullChecks. LGTM. Thanks ------------- Marked as reviewed by dbriemann (Author). PR Review: https://git.openjdk.org/jdk/pull/25504#pullrequestreview-2887294855 From mdoerr at openjdk.org Mon Jun 2 08:33:56 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 2 Jun 2025 08:33:56 GMT Subject: RFR: 8357793: [PPC64] VM crashes with -XX:-UseSIGTRAP -XX:-ImplicitNullChecks [v2] In-Reply-To: References: <5XqAA3Z2G0uwOBkitUrqkG3Y68xtpRuvBwj_cEIFECs=.18259520-6f73-406f-a46f-fa025c12b303@github.com> Message-ID: On Wed, 28 May 2025 19:12:55 GMT, Martin Doerr wrote: >> In case of -XX:-UseSIGTRAP -XX:-ImplicitNullChecks, we use the manually selected entry. (The same is true for -XX:-TrapBasedNullChecks -XX:-ImplicitNullChecks.) >> We only need to use the correct NullPointerException entry in the compiler case. >> >> With this patch, the manually selected entry matches the one selected by `PosixSignals::pd_hotspot_signal_handler`. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Fix bastore without ImplicitNullChecks. Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25504#issuecomment-2929418451 From mdoerr at openjdk.org Mon Jun 2 08:33:57 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 2 Jun 2025 08:33:57 GMT Subject: Integrated: 8357793: [PPC64] VM crashes with -XX:-UseSIGTRAP -XX:-ImplicitNullChecks In-Reply-To: <5XqAA3Z2G0uwOBkitUrqkG3Y68xtpRuvBwj_cEIFECs=.18259520-6f73-406f-a46f-fa025c12b303@github.com> References: <5XqAA3Z2G0uwOBkitUrqkG3Y68xtpRuvBwj_cEIFECs=.18259520-6f73-406f-a46f-fa025c12b303@github.com> Message-ID: On Wed, 28 May 2025 17:00:48 GMT, Martin Doerr wrote: > In case of -XX:-UseSIGTRAP -XX:-ImplicitNullChecks, we use the manually selected entry. 
(The same is true for -XX:-TrapBasedNullChecks -XX:-ImplicitNullChecks.) > We only need to use the correct NullPointerException entry in the compiler case. > > With this patch, the manually selected entry matches the one selected by `PosixSignals::pd_hotspot_signal_handler`. This pull request has now been integrated. Changeset: ba9f44c9 Author: Martin Doerr URL: https://git.openjdk.org/jdk/commit/ba9f44c90fe8da2d97d67b6878ac2c0c14e35bd0 Stats: 4 lines in 2 files changed: 2 ins; 0 del; 2 mod 8357793: [PPC64] VM crashes with -XX:-UseSIGTRAP -XX:-ImplicitNullChecks Reviewed-by: shade, dbriemann ------------- PR: https://git.openjdk.org/jdk/pull/25504 From duke at openjdk.org Mon Jun 2 08:39:31 2025 From: duke at openjdk.org (Tom Shull) Date: Mon, 2 Jun 2025 08:39:31 GMT Subject: RFR: 8357660: [JVMCI] Add support for retrieving all BootstrapMethodInvocations directly from ConstantPool [v2] In-Reply-To: References: Message-ID: > This PR adds support for directly retrieving both all invokedynamic and all condy BootstrapMethodInvocations from a ConstantPool via the new method `List lookupBootstrapMethodInvocations(boolean invokeDynamic)`. > > In addition, two methods are added to the BootstrapMethodInvocations: > 1. `void resolve()` > 2. `JavaConstant lookup()` > > The combination of these two features allows one to directly interact with all BSM information of a given ConstantPool without having to iterate through all of the Classfile's methods to find all invokedynamic bytecodes and/or iterate through all Constant Pool entries. 
Tom Shull has updated the pull request incrementally with one additional commit since the last revision: reviewer feedback and update javadoc formatting ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25420/files - new: https://git.openjdk.org/jdk/pull/25420/files/519be178..60c39b5e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25420&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25420&range=00-01 Stats: 23 lines in 2 files changed: 3 ins; 1 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/25420.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25420/head:pull/25420 PR: https://git.openjdk.org/jdk/pull/25420 From shade at openjdk.org Mon Jun 2 08:42:55 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 2 Jun 2025 08:42:55 GMT Subject: RFR: 8357223: AArch64: Optimize interpreter profile updates [v2] In-Reply-To: <7wo-_Wt-EiVGKgxMxU_MnTA8o1QQxH_LDtNzDShlOIY=.9c8093b7-ed4b-487d-afbe-5227362f1ade@github.com> References: <7wo-_Wt-EiVGKgxMxU_MnTA8o1QQxH_LDtNzDShlOIY=.9c8093b7-ed4b-487d-afbe-5227362f1ade@github.com> Message-ID: <-gNhkdcFda-JXrWH4bpViukhPFnm0EyO71u1o2ZyV68=.0228af79-58ec-4fb5-9ca0-85148cc8365d@github.com> On Thu, 29 May 2025 23:04:25 GMT, Chad Rakoczy wrote: >> [JDK-8357223](https://bugs.openjdk.org/browse/JDK-8357223) >> >> The aarch64 version of [JDK-8356946](https://bugs.openjdk.org/browse/JDK-8356946) >> >> The reasoning for this change is the same as the x86 version's PR: >> >>> First, we carry the implementation for counter decrements without using them. This is dead code, and can be purged. >>> >>> Second, we care about overflows for 64-bit for some reason. I think this is a reminiscent of 32-bit x86 support, where we can plausibly have 32-bit counter overflow in a reasonable timeframe. But for 64-bit counter, we need tens of years of constantly bashing the counter to get it to overflow. No other profile counter update code, e.g. in C1, cares about this. 
>> >> Additional testing: >> >> - [x] Linux aarch64 fastdebug tier 1/2/3/4 > > Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: > > Address comments Marked as reviewed by shade (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25512#pullrequestreview-2887351889 From rkennke at openjdk.org Mon Jun 2 08:59:56 2025 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 2 Jun 2025 08:59:56 GMT Subject: Integrated: 8358169: Shenandoah/JVMCI: Export GC state constants In-Reply-To: References: Message-ID: On Fri, 30 May 2025 16:09:03 GMT, Roman Kennke wrote: > We need the GC state enum constants available in JVMCI. This pull request has now been integrated. Changeset: eb9badd8 Author: Roman Kennke URL: https://git.openjdk.org/jdk/commit/eb9badd8a4ea6dca834525fd49429e2ce771a76c Stats: 8 lines in 1 file changed: 8 ins; 0 del; 0 mod 8358169: Shenandoah/JVMCI: Export GC state constants Reviewed-by: dnsimon, shade ------------- PR: https://git.openjdk.org/jdk/pull/25552 From galder at openjdk.org Mon Jun 2 09:17:52 2025 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Mon, 2 Jun 2025 09:17:52 GMT Subject: RFR: 8357726: C2 fails to recognize the counted loop when induction variable range is changed multiple times In-Reply-To: <-SKyhptjFPhuOPflySOZXJloR_Vgr4sC-xB5dSQXxZU=.fd6922bc-2498-4f4e-873a-999f82cd0a1a@github.com> References: <-SKyhptjFPhuOPflySOZXJloR_Vgr4sC-xB5dSQXxZU=.fd6922bc-2498-4f4e-873a-999f82cd0a1a@github.com> Message-ID: On Fri, 30 May 2025 07:43:29 GMT, Xiaohong Gong wrote: > C2 compiler fails to recognize counted loops when the induction variable is constrained by multiple consecutive `CastII` nodes. > This prevents optimizations like range check elimination, loop unrolling and auto-vectorization for these loops. Please refer > to the detailed discussion for a related performance issue from [1]. 
> > The ideal graph of such a loop typically looks like: > > > /-----------| > | | > | ConI | > loop | / / > | | / / > \ AddI / > RangeCheck \ / | > | \ / | > IfTrue Phi | > \ | | > RangeCheck \ | | > \ CastII / <- Range check #1 > | | / > IfTrue | | > \ | | > CastII | <- Range check #2 > | / > |-------/ > > > > For a counted loop, the loop induction variable (i.e `Phi`) should be the input of `AddI` ideally. However, in above case, it is used > by two consecutive `CastII` nodes generated by two different range check operations. Compiler should skip all such kind of `CastII` when recognizing a counted loop. > > This patch modifies the counted loop recognition code to iteratively uncast the loop `iv` until no `CastII` nodes remain, enabling proper counted loop recognition even when the induction variable undergoes multiple range constraint operations. > > Test: > - Tested tier1, tier2, tier3, and no regressions are found. > - An additional test case is added to verify the fix. > > Performance: > Here is the performance gain on a NVIDIA Grace machine which is an AArch64 architecture: > > > Benchmark Mode Cnt Unit Before After Gain > CountedLoopCastIV.loop_iv_int thrpt 30 ops/s 941482.597 4389292.439 4.66 > CountedLoopCastIV.loop_iv_long thrpt 30 ops/s 884563.232 1441485.455 1.62 > > > We can also observe the similar uplift on a x86_64 machine. > > [1] https://github.com/openjdk/jdk/pull/25138#issuecomment-2892720654 Marked as reviewed by galder (Author). 
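A minimal sketch of the iterative uncasting described in the quoted PR text, on a toy node graph rather than C2's real IR. The names `UncastSketch`, `Node`, `Kind`, `uncastOnce` and `uncastAll` are hypothetical and do not correspond to HotSpot code; the sketch only illustrates why peeling a single `CastII` is not enough when two range checks each add one.

```java
// Toy model (assumption: not HotSpot's Node API) of a loop induction variable
// that is wrapped in two consecutive CastII nodes before it reaches the AddI.
final class UncastSketch {
    enum Kind { PHI, CAST_II, ADD_I, CON_I }

    static final class Node {
        final Kind kind;
        final Node[] in;   // inputs; in[0] is the value input for CAST_II and ADD_I

        Node(Kind kind, Node... in) {
            this.kind = kind;
            this.in = in;
        }
    }

    // Peels at most one cast; this stops too early for the two-cast shape.
    static Node uncastOnce(Node n) {
        return n.kind == Kind.CAST_II ? n.in[0] : n;
    }

    // Peels casts until none remain, so any number of stacked casts is skipped.
    static Node uncastAll(Node n) {
        while (n.kind == Kind.CAST_II) {
            n = n.in[0];
        }
        return n;
    }

    public static void main(String[] args) {
        Node phi = new Node(Kind.PHI);
        Node cast1 = new Node(Kind.CAST_II, phi);    // range check #1
        Node cast2 = new Node(Kind.CAST_II, cast1);  // range check #2
        Node incr = new Node(Kind.ADD_I, cast2, new Node(Kind.CON_I));

        System.out.println(uncastOnce(incr.in[0]) == phi); // prints "false"
        System.out.println(uncastAll(incr.in[0]) == phi);  // prints "true"
    }
}
```

In this model the increment's input is recognized as the `Phi` only by the looping variant, which mirrors the shape shown in the ideal graph above.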
------------- PR Review: https://git.openjdk.org/jdk/pull/25539#pullrequestreview-2887478434 From epeter at openjdk.org Mon Jun 2 10:30:51 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 2 Jun 2025 10:30:51 GMT Subject: RFR: 8357726: C2 fails to recognize the counted loop when induction variable range is changed multiple times In-Reply-To: <-SKyhptjFPhuOPflySOZXJloR_Vgr4sC-xB5dSQXxZU=.fd6922bc-2498-4f4e-873a-999f82cd0a1a@github.com> References: <-SKyhptjFPhuOPflySOZXJloR_Vgr4sC-xB5dSQXxZU=.fd6922bc-2498-4f4e-873a-999f82cd0a1a@github.com> Message-ID: On Fri, 30 May 2025 07:43:29 GMT, Xiaohong Gong wrote: > C2 compiler fails to recognize counted loops when the induction variable is constrained by multiple consecutive `CastII` nodes. > This prevents optimizations like range check elimination, loop unrolling and auto-vectorization for these loops. Please refer > to the detailed discussion for a related performance issue from [1]. > > The ideal graph of such a loop typically looks like: > > > /-----------| > | | > | ConI | > loop | / / > | | / / > \ AddI / > RangeCheck \ / | > | \ / | > IfTrue Phi | > \ | | > RangeCheck \ | | > \ CastII / <- Range check #1 > | | / > IfTrue | | > \ | | > CastII | <- Range check #2 > | / > |-------/ > > > > For a counted loop, the loop induction variable (i.e `Phi`) should be the input of `AddI` ideally. However, in above case, it is used > by two consecutive `CastII` nodes generated by two different range check operations. Compiler should skip all such kind of `CastII` when recognizing a counted loop. > > This patch modifies the counted loop recognition code to iteratively uncast the loop `iv` until no `CastII` nodes remain, enabling proper counted loop recognition even when the induction variable undergoes multiple range constraint operations. > > Test: > - Tested tier1, tier2, tier3, and no regressions are found. > - An additional test case is added to verify the fix. 
> > Performance: > Here is the performance gain on a NVIDIA Grace machine which is an AArch64 architecture: > > > Benchmark Mode Cnt Unit Before After Gain > CountedLoopCastIV.loop_iv_int thrpt 30 ops/s 941482.597 4389292.439 4.66 > CountedLoopCastIV.loop_iv_long thrpt 30 ops/s 884563.232 1441485.455 1.62 > > > We can also observe the similar uplift on a x86_64 machine. > > [1] https://github.com/openjdk/jdk/pull/25138#issuecomment-2892720654 test/hotspot/jtreg/compiler/c2/irTests/TestCountedLoopCastIV.java line 2: > 1: /* > 2: * Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved. Can you please move the test to `test/hotspot/jtreg/compiler/loopopts`? The `irTests` directory was not the best idea, it makes more sense to have tests thematically grouped. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25539#discussion_r2120715051 From mchevalier at openjdk.org Mon Jun 2 10:37:11 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Mon, 2 Jun 2025 10:37:11 GMT Subject: RFR: 8353266: C2: Wrong execution with Integer.bitCount(int) intrinsic on AArch64 [v2] In-Reply-To: References: Message-ID: > ### Problem > > On Aarch64, using `Integer.bitCount` can modify its argument. The problem comes from the implementation of `popCountI` on Aarch64. For instance, that's what we get with the reproducer `Reduced.java` on the related issue: > > ; Load lFld into local x > ldr x11, [x10, #120] > ; popCountI > mov w11, w11 > mov v16.d[0], x11 > cnt v16.8b, v16.8b > addv b16, v16.8b > mov x13, v16.d[0] > ; [...] > ; store local x (which is believed to still contain lFld) into result > str x11, [x10, #128] > > > The instruction `mov w11, w11` is used to cut the 32 higher bits of `x11` since we use `popCountI` (from `Integer.bitCount`): on aarch64 (like other architectures), assigning the 32 lower bits of a register reset the 32 higher bits. 
Short: the input is modified, but the implementation of `popCountI` doesn't declare it: > > instruct popCountI(iRegINoSp dst, iRegIorL2I src, vRegF tmp) %{ > match(Set dst (PopCountI src)); > effect(TEMP tmp); > [...] > %} > > > But then, why resetting the upper word of `x11`? It all starts with vector instructions: > > cnt v16.8b, v16.8b > addv b16, v16.8b > > The `8b` specifies that it operates on the 8 lower bytes of `v16`, it would be nice to simply use `4b`, but that doesn't exist: vector instructions can only work on either the whole 128-bit register, or the 64 lower bits (by blocks of 1, 2, 4, 8 or 16 bytes). There is no suffix (and encoding) for a vector instruction to work only on the 32 lower bits, so not to pollute the bit count, we need to reset the 32 higher bits of `v16.d[0]` (aka `d16`), that is `v16.s[1]`, that is `v16[32:63]` in a more bit-explicit notation. Moreover, unlike with general purpose register doing > > mov v16.s[0], w11 > > would set `v16[0:31]` to `w11`, but not reset `v16[32:63]`. Which makes sense! Otherwise, using vector registers would be impractical if writing any piece would reset the rest... So we indeed need to set all of `v16[0:63]`, which > > mov w11, w11 > mov v16.d[0], x11 > > does, but by destroying `x11`. > > ### Solution > > Simply adding `USE_KILL src` in the effects would be nice, but unfortunately not possible: `iRegIorL2I` is an operand class (either a 32-bit register or a L2I of a 64-bit register) and those cannot be used in effect lists. > > The way I went for is rather not to modify the source, but rather do write the two lower words of `v16` we are interested in separately: > > mov v16.s[1], wzr ; Reset the 1-indexed word of v16, that is v16[32:63] <- 0 > mov v16.s[0], w11 ; Set the 0-ind... 
Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: Apply suggestions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25551/files - new: https://git.openjdk.org/jdk/pull/25551/files/fb8d64d9..8318b50c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25551&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25551&range=00-01 Stats: 6 lines in 2 files changed: 0 ins; 2 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/25551.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25551/head:pull/25551 PR: https://git.openjdk.org/jdk/pull/25551 From mchevalier at openjdk.org Mon Jun 2 10:37:12 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Mon, 2 Jun 2025 10:37:12 GMT Subject: RFR: 8353266: C2: Wrong execution with Integer.bitCount(int) intrinsic on AArch64 In-Reply-To: References: Message-ID: <5KRLt28hn0r2ZL_M0Rdx7LOThZPIymChXhWGP7SVLXI=.0a0bc3f7-b81d-4271-8044-8431edd6196d@github.com> On Fri, 30 May 2025 15:33:14 GMT, Marc Chevalier wrote: > ### Problem > > On Aarch64, using `Integer.bitCount` can modify its argument. The problem comes from the implementation of `popCountI` on Aarch64. For instance, that's what we get with the reproducer `Reduced.java` on the related issue: > > ; Load lFld into local x > ldr x11, [x10, #120] > ; popCountI > mov w11, w11 > mov v16.d[0], x11 > cnt v16.8b, v16.8b > addv b16, v16.8b > mov x13, v16.d[0] > ; [...] > ; store local x (which is believed to still contain lFld) into result > str x11, [x10, #128] > > > The instruction `mov w11, w11` is used to cut the 32 higher bits of `x11` since we use `popCountI` (from `Integer.bitCount`): on aarch64 (like other architectures), assigning the 32 lower bits of a register reset the 32 higher bits. 
Short: the input is modified, but the implementation of `popCountI` doesn't declare it: > > instruct popCountI(iRegINoSp dst, iRegIorL2I src, vRegF tmp) %{ > match(Set dst (PopCountI src)); > effect(TEMP tmp); > [...] > %} > > > But then, why resetting the upper word of `x11`? It all starts with vector instructions: > > cnt v16.8b, v16.8b > addv b16, v16.8b > > The `8b` specifies that it operates on the 8 lower bytes of `v16`, it would be nice to simply use `4b`, but that doesn't exist: vector instructions can only work on either the whole 128-bit register, or the 64 lower bits (by blocks of 1, 2, 4, 8 or 16 bytes). There is no suffix (and encoding) for a vector instruction to work only on the 32 lower bits, so not to pollute the bit count, we need to reset the 32 higher bits of `v16.d[0]` (aka `d16`), that is `v16.s[1]`, that is `v16[32:63]` in a more bit-explicit notation. Moreover, unlike with general purpose register doing > > mov v16.s[0], w11 > > would set `v16[0:31]` to `w11`, but not reset `v16[32:63]`. Which makes sense! Otherwise, using vector registers would be impractical if writing any piece would reset the rest... So we indeed need to set all of `v16[0:63]`, which > > mov w11, w11 > mov v16.d[0], x11 > > does, but by destroying `x11`. > > ### Solution > > Simply adding `USE_KILL src` in the effects would be nice, but unfortunately not possible: `iRegIorL2I` is an operand class (either a 32-bit register or a L2I of a 64-bit register) and those cannot be used in effect lists. > > The way I went for is rather not to modify the source, but rather do write the two lower words of `v16` we are interested in separately: > > mov v16.s[1], wzr ; Reset the 1-indexed word of v16, that is v16[32:63] <- 0 > mov v16.s[0], w11 ; Set the 0-ind... I've changed the two `mov`s into a `fmovs` as suggested and adapted the format part. Tests seem happy. 
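A small Java model of the point made above, offered as a sketch under stated assumptions rather than as the actual intrinsic: `popCount64Lane` stands in for `cnt`/`addv` over the low 64 bits of the vector register, and `popCountI` stands in for the fixed sequence that zero-extends the 32-bit source into a temporary instead of overwriting the source register. Both method names and the class `PopCountSketch` are hypothetical; the constant is the one used in the PR's test.

```java
// Models why the upper 32 bits must be cleared in the temporary, not in the source.
final class PopCountSketch {
    // Stand-in for counting bits over the whole 64-bit lane.
    static int popCount64Lane(long lane) {
        return Long.bitCount(lane);
    }

    // Stand-in for the fixed code shape: copy the low 32 bits of the source
    // into a temporary with the upper 32 bits zeroed, then count the lane.
    static int popCountI(long srcRegister) {
        long tmp = srcRegister & 0xFFFF_FFFFL; // upper half cleared in tmp only
        return popCount64Lane(tmp);
    }

    public static void main(String[] args) {
        long lFld = 0xfedc_ba98_7654_3210L;

        int bits = popCountI(lFld);
        System.out.println(bits);                            // prints "12"
        System.out.println(bits == Integer.bitCount((int) lFld)); // prints "true"

        // Counting the lane without clearing the upper half would be wrong.
        System.out.println(popCount64Lane(lFld));            // prints "32"

        // The source value is untouched, which is what the test checks.
        System.out.println(lFld == 0xfedc_ba98_7654_3210L);  // prints "true"
    }
}
```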
------------- PR Comment: https://git.openjdk.org/jdk/pull/25551#issuecomment-2929946930 From mchevalier at openjdk.org Mon Jun 2 10:37:12 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Mon, 2 Jun 2025 10:37:12 GMT Subject: RFR: 8353266: C2: Wrong execution with Integer.bitCount(int) intrinsic on AArch64 [v2] In-Reply-To: References: Message-ID: On Sat, 31 May 2025 14:29:26 GMT, Andrew Haley wrote: >> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: >> >> Apply suggestions > > src/hotspot/cpu/aarch64/aarch64.ad line 7771: > >> 7769: ins_encode %{ >> 7770: __ mov($tmp$$FloatRegister, __ S, 1, zr); // tmp[32:63] <- 0 >> 7771: __ mov($tmp$$FloatRegister, __ S, 0, $src$$Register); // tmp[ 0:31] <- src > > "Where the entire 128-bit wide register is not fully utilized, the vector or scalar quantity is held in the least significant bits of the register, with the most significant bits being cleared to zero on a write." > > Suggestion: > > __ fmovs($tmp$$FloatRegister, $src$$Register); > > should do it. Yes! Nicer, thanks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25551#discussion_r2120723694 From mchevalier at openjdk.org Mon Jun 2 10:37:12 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Mon, 2 Jun 2025 10:37:12 GMT Subject: RFR: 8353266: C2: Wrong execution with Integer.bitCount(int) intrinsic on AArch64 [v2] In-Reply-To: References: Message-ID: On Sat, 31 May 2025 03:11:28 GMT, SendaoYan wrote: >> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: >> >> Apply suggestions > > test/hotspot/jtreg/compiler/intrinsics/BitCountIAarch64PreservesArgument.java line 58: > >> 56: if (result != 0xfedc_ba98_7654_3210L) { >> 57: // Wrongly outputs the cut input 0x7654_3210 == 1985229328 >> 58: throw new RuntimeException("Wrong result. 
lFld=" + lFld + "; result=" + result); > > How about: > > > throw new RuntimeException("Wrong result. Expected result = " + lFld + "; Actual result = " + result); That looks better indeed. Applied. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25551#discussion_r2120724260 From epeter at openjdk.org Mon Jun 2 10:45:50 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 2 Jun 2025 10:45:50 GMT Subject: RFR: 8357726: C2 fails to recognize the counted loop when induction variable range is changed multiple times In-Reply-To: <-SKyhptjFPhuOPflySOZXJloR_Vgr4sC-xB5dSQXxZU=.fd6922bc-2498-4f4e-873a-999f82cd0a1a@github.com> References: <-SKyhptjFPhuOPflySOZXJloR_Vgr4sC-xB5dSQXxZU=.fd6922bc-2498-4f4e-873a-999f82cd0a1a@github.com> Message-ID: <1Er4nlGWx_yp6RIkqSo0PUk84lX50sTAGmGbnu4jokY=.74dc326e-9038-40d0-9b00-f5eaef1bd504@github.com> On Fri, 30 May 2025 07:43:29 GMT, Xiaohong Gong wrote: > C2 compiler fails to recognize counted loops when the induction variable is constrained by multiple consecutive `CastII` nodes. > This prevents optimizations like range check elimination, loop unrolling and auto-vectorization for these loops. Please refer > to the detailed discussion for a related performance issue from [1]. > > The ideal graph of such a loop typically looks like: > > > /-----------| > | | > | ConI | > loop | / / > | | / / > \ AddI / > RangeCheck \ / | > | \ / | > IfTrue Phi | > \ | | > RangeCheck \ | | > \ CastII / <- Range check #1 > | | / > IfTrue | | > \ | | > CastII | <- Range check #2 > | / > |-------/ > > > > For a counted loop, the loop induction variable (i.e `Phi`) should be the input of `AddI` ideally. However, in above case, it is used > by two consecutive `CastII` nodes generated by two different range check operations. Compiler should skip all such kind of `CastII` when recognizing a counted loop. 
> > This patch modifies the counted loop recognition code to iteratively uncast the loop `iv` until no `CastII` nodes remain, enabling proper counted loop recognition even when the induction variable undergoes multiple range constraint operations. > > Test: > - Tested tier1, tier2, tier3, and no regressions are found. > - An additional test case is added to verify the fix. > > Performance: > Here is the performance gain on a NVIDIA Grace machine which is an AArch64 architecture: > > > Benchmark Mode Cnt Unit Before After Gain > CountedLoopCastIV.loop_iv_int thrpt 30 ops/s 941482.597 4389292.439 4.66 > CountedLoopCastIV.loop_iv_long thrpt 30 ops/s 884563.232 1441485.455 1.62 > > > We can also observe the similar uplift on a x86_64 machine. > > [1] https://github.com/openjdk/jdk/pull/25138#issuecomment-2892720654 @XiaohongGong Nice work! @chhagedorn And I quickly discussed it offline, and we think this is a good approach. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25539#issuecomment-2929984802 From epeter at openjdk.org Mon Jun 2 10:49:51 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 2 Jun 2025 10:49:51 GMT Subject: RFR: 8357726: C2 fails to recognize the counted loop when induction variable range is changed multiple times In-Reply-To: <-SKyhptjFPhuOPflySOZXJloR_Vgr4sC-xB5dSQXxZU=.fd6922bc-2498-4f4e-873a-999f82cd0a1a@github.com> References: <-SKyhptjFPhuOPflySOZXJloR_Vgr4sC-xB5dSQXxZU=.fd6922bc-2498-4f4e-873a-999f82cd0a1a@github.com> Message-ID: On Fri, 30 May 2025 07:43:29 GMT, Xiaohong Gong wrote: > C2 compiler fails to recognize counted loops when the induction variable is constrained by multiple consecutive `CastII` nodes. > This prevents optimizations like range check elimination, loop unrolling and auto-vectorization for these loops. Please refer > to the detailed discussion for a related performance issue from [1]. 
> > The ideal graph of such a loop typically looks like: > > > /-----------| > | | > | ConI | > loop | / / > | | / / > \ AddI / > RangeCheck \ / | > | \ / | > IfTrue Phi | > \ | | > RangeCheck \ | | > \ CastII / <- Range check #1 > | | / > IfTrue | | > \ | | > CastII | <- Range check #2 > | / > |-------/ > > > > For a counted loop, the loop induction variable (i.e `Phi`) should be the input of `AddI` ideally. However, in above case, it is used > by two consecutive `CastII` nodes generated by two different range check operations. Compiler should skip all such kind of `CastII` when recognizing a counted loop. > > This patch modifies the counted loop recognition code to iteratively uncast the loop `iv` until no `CastII` nodes remain, enabling proper counted loop recognition even when the induction variable undergoes multiple range constraint operations. > > Test: > - Tested tier1, tier2, tier3, and no regressions are found. > - An additional test case is added to verify the fix. > > Performance: > Here is the performance gain on a NVIDIA Grace machine which is an AArch64 architecture: > > > Benchmark Mode Cnt Unit Before After Gain > CountedLoopCastIV.loop_iv_int thrpt 30 ops/s 941482.597 4389292.439 4.66 > CountedLoopCastIV.loop_iv_long thrpt 30 ops/s 884563.232 1441485.455 1.62 > > > We can also observe the similar uplift on a x86_64 machine. > > [1] https://github.com/openjdk/jdk/pull/25138#issuecomment-2892720654 test/hotspot/jtreg/compiler/c2/irTests/TestCountedLoopCastIV.java line 57: > 55: out[i] = 0; > 56: } > 57: } You could also just use `Arrays.fill` test/hotspot/jtreg/compiler/c2/irTests/TestCountedLoopCastIV.java line 174: > 172: > 173: public static void main(String[] args) { > 174: TestFramework.runWithFlags("-XX:LoopUnrollLimit=0"); What is the reason for the flag here? Do you really need it? 
test/micro/org/openjdk/bench/vm/compiler/CountedLoopCastIV.java line 54: > 52: Random r = new Random(); > 53: start = r.nextInt(LEN >> 2); > 54: limit = r.nextInt(LEN >> 1, LEN - 3); Does this not mean that we use a different seed every time, and therefore the loop has different lengths, and so the results can be influenced accordingly? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25539#discussion_r2120762941 PR Review Comment: https://git.openjdk.org/jdk/pull/25539#discussion_r2120766290 PR Review Comment: https://git.openjdk.org/jdk/pull/25539#discussion_r2120770394 From epeter at openjdk.org Mon Jun 2 10:50:54 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 2 Jun 2025 10:50:54 GMT Subject: RFR: 8355563: VectorAPI: Refactor current implementation of subword gather load API In-Reply-To: References: Message-ID: On Fri, 30 May 2025 08:15:22 GMT, Xiaohong Gong wrote: >>> @XiaohongGong Thanks for splitting this one out, and for investigating the regressions here. >>> >>> Putting the permalink here, fixed to the current change (the link you pasted will always refer to the newest, which may later on point to the wrong line when lines above are inserted / deleted): >>> >>> https://github.com/openjdk/jdk/blob/7077535c0b0a6ea0a2a167f9135b1504a3d71fb3/src/hotspot/share/opto/loopnode.cpp#L1659-L1661 >>> >>> I wonder if we should just use `Node::uncast` there? But I'm quite unsure about that. >> >> Sounds good to me. I will have a deep investigation for it. Thanks! >> >> >> >>> > Yes, I also observed such regression. >>> > It would be nice if you proactively mentioned regressions, so it does not have to be pointed out by reviewers. >>> >>> For me, it could be ok to fix it in a follow-up patch. I think we are too close to RDP1 for JDK25 now anyway, and so we could push this patch here into JDK26, and then we have enough time in JDK26 to investigate the regression. 
Even better would be if we could do the other patch first, so we never even encounter a regression. >> >> Sounds good to me. Thanks! > >> > @XiaohongGong Thanks for splitting this one out, and for investigating the regressions here. >> > Putting the permalink here, fixed to the current change (the link you pasted will always refer to the newest, which may later on point to the wrong line when lines above are inserted / deleted): >> > https://github.com/openjdk/jdk/blob/7077535c0b0a6ea0a2a167f9135b1504a3d71fb3/src/hotspot/share/opto/loopnode.cpp#L1659-L1661 >> > >> > I wonder if we should just use `Node::uncast` there? But I'm quite unsure about that. >> >> Sounds good to me. I will have a deep investigation for it. Thanks! > > Hi @eme64 @jatin-bhateja, I've created a PR https://github.com/openjdk/jdk/pull/25539 to fix this issue. With this change, the performance regression can be fixed as well. Could you please take a look at that change and help to run the test on different X86 machines? Thanks a lot! @XiaohongGong I reviewed https://github.com/openjdk/jdk/pull/25539. Since it is a relatively simple patch, I suggest that we integrate that one first, and come back to this here later. Is that ok for you? ------------- PR Comment: https://git.openjdk.org/jdk/pull/25138#issuecomment-2930007655 From epeter at openjdk.org Mon Jun 2 10:53:52 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 2 Jun 2025 10:53:52 GMT Subject: RFR: 8357726: C2 fails to recognize the counted loop when induction variable range is changed multiple times In-Reply-To: <-SKyhptjFPhuOPflySOZXJloR_Vgr4sC-xB5dSQXxZU=.fd6922bc-2498-4f4e-873a-999f82cd0a1a@github.com> References: <-SKyhptjFPhuOPflySOZXJloR_Vgr4sC-xB5dSQXxZU=.fd6922bc-2498-4f4e-873a-999f82cd0a1a@github.com> Message-ID: On Fri, 30 May 2025 07:43:29 GMT, Xiaohong Gong wrote: > C2 compiler fails to recognize counted loops when the induction variable is constrained by multiple consecutive `CastII` nodes.
> This prevents optimizations like range check elimination, loop unrolling and auto-vectorization for these loops. Please refer > to the detailed discussion for a related performance issue from [1]. > > The ideal graph of such a loop typically looks like: > > > /-----------| > | | > | ConI | > loop | / / > | | / / > \ AddI / > RangeCheck \ / | > | \ / | > IfTrue Phi | > \ | | > RangeCheck \ | | > \ CastII / <- Range check #1 > | | / > IfTrue | | > \ | | > CastII | <- Range check #2 > | / > |-------/ > > > > For a counted loop, the loop induction variable (i.e `Phi`) should be the input of `AddI` ideally. However, in above case, it is used > by two consecutive `CastII` nodes generated by two different range check operations. Compiler should skip all such kind of `CastII` when recognizing a counted loop. > > This patch modifies the counted loop recognition code to iteratively uncast the loop `iv` until no `CastII` nodes remain, enabling proper counted loop recognition even when the induction variable undergoes multiple range constraint operations. > > Test: > - Tested tier1, tier2, tier3, and no regressions are found. > - An additional test case is added to verify the fix. > > Performance: > Here is the performance gain on a NVIDIA Grace machine which is an AArch64 architecture: > > > Benchmark Mode Cnt Unit Before After Gain > CountedLoopCastIV.loop_iv_int thrpt 30 ops/s 941482.597 4389292.439 4.66 > CountedLoopCastIV.loop_iv_long thrpt 30 ops/s 884563.232 1441485.455 1.62 > > > We can also observe the similar uplift on a x86_64 machine. 
> > [1] https://github.com/openjdk/jdk/pull/25138#issuecomment-2892720654 @XiaohongGong I suggest you change the title from: `8357726: C2 fails to recognize the counted loop when induction variable range is changed multiple times` to `8357726: C2 recognize loops with multiple casts in trip counter` or even: `8357726: C2 recognize loops with multiple casts in trip counter: phi -> CastII* -> AddI -> phi` ------------- PR Comment: https://git.openjdk.org/jdk/pull/25539#issuecomment-2930020530 From epeter at openjdk.org Mon Jun 2 11:06:53 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 2 Jun 2025 11:06:53 GMT Subject: RFR: 8356813: Improve Mod(I|L)Node::Value [v4] In-Reply-To: References: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com> Message-ID: On Fri, 30 May 2025 07:26:13 GMT, Hannes Greule wrote: >> This change improves the precision of the `Mod(I|L)Node::Value()` functions. >> >> I reordered the structure a bit. First, we handle constants, afterwards, we handle ranges. The bottom checks seem to be excessive (`Type::BOTTOM` is covered by using `isa_(int|long)()`, the local bottom is just the full range). Given we can even give reasonable bounds if only one input has any bounds, we don't want to return early. >> The changes after that are commented. Please let me know if the explanations are good, or if you have any suggestions. >> >> ### Monotonicity >> >> Before, a 0 divisor resulted in `Type(Int|Long)::POS`. Initially I wanted to keep it this way, but that violates monotonicity during PhaseCCP. As an example, if we see a 0 divisor first and a 3 afterwards, we might try to go from `>=0` to `-2..2`, but the meet of these would be `>=-2` rather than `-2..2`. Using `Type(Int|Long)::ZERO` instead (zero is always in the resulting value if we cover a range). >> >> ### Testing >> >> I added tests for cases around the relevant bounds. 
I also ran tier1, tier2, and tier3 but didn't see any related failures after addressing the monotonicity problem described above (I'm having a few unrelated failures on my system currently, so separate testing would be appreciated in case I missed something). >> >> Please review and let me know what you think. >> >> ### Other >> >> The `UMod(I|L)Node`s were adjusted to be more in line with its signed variants. This change diverges them again, but similar improvements could be made after #17508. >> >> During experimenting with these changes, I stumbled upon a few things that aren't directly related to this change, but might be worth to further look into: >> - If the divisor is a constant, we will directly replace the `Mod(I|L)Node` with more but less expensive nodes in `::Ideal()`. Type analysis for these nodes combined is less precise, means we miss potential cases were this would help e.g., removing range checks. Would it make sense to delay the replacement? >> - To force non-negative ranges, I'm using `char`. I noticed that method parameters of sub-int integer types all fall back to `TypeInt::INT`. This seems to be an intentional change of https://github.com/openjdk/jdk/commit/200784d505dd98444c48c9ccb7f2e4df36dcbb6a. The bug report is private, so I can't really judge if that part is necessary, but it seems odd. > > Hannes Greule has updated the pull request incrementally with one additional commit since the last revision: > > Add randomized test src/hotspot/share/opto/divnode.cpp line 1206: > 1204: > 1205: //------------------------------Value------------------------------------------ > 1206: static const Type* mod_value(const PhaseGVN* phase, const Node* in1, const Node* in2, const BasicType bt, const Type* bottom) { You did choose the `bt` path here! I would add an assert that we only allow `T_INT` and `T_LONG` src/hotspot/share/opto/divnode.cpp line 1237: > 1235: // We don't need to check for min_jint % '-1' as its result is defined when using jlong. 
> 1236: if (i1->get_con_as_long(bt) == min_jlong && i2->get_con_as_long(bt) == -1) { > 1237: return TypeInteger::zero(bt); Is this correct? For `bt = T_INT` is this really equivalent? `i1->get_con() == min_jint` We might get `min_jint` back here. `i1->get_con_as_long(bt) == min_jlong` Would we not return `min_jint` here, and then the condition is false? Do we have an IR test for this? src/hotspot/share/opto/divnode.cpp line 1241: > 1239: return TypeInteger::make(i1->get_con_as_long(bt) % i2->get_con_as_long(bt), bt); > 1240: } > 1241: // The magnitude of the divisor is in range [1, 2^63]. You should probably also mention the `2^31` variant. src/hotspot/share/opto/divnode.cpp line 1247: > 1245: // JVMS lrem bytecode: "the magnitude of the result is always less than the magnitude of the divisor" > 1246: // "less than" means we can subtract 1 to get an inclusive upper bound in [0, 2^63-1] > 1247: jlong hi = static_cast(divisor_magnitude - 1); Hmm, this also looks confusing for the `T_INT` case. What about `-5`, does that then not become `max_julong - 5`, but it should have been `max_juint - 1`? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25254#discussion_r2120802575 PR Review Comment: https://git.openjdk.org/jdk/pull/25254#discussion_r2120800945 PR Review Comment: https://git.openjdk.org/jdk/pull/25254#discussion_r2120801900 PR Review Comment: https://git.openjdk.org/jdk/pull/25254#discussion_r2120805584 From epeter at openjdk.org Mon Jun 2 11:07:53 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 2 Jun 2025 11:07:53 GMT Subject: RFR: 8252473: [TESTBUG] compiler tests fail with minimal VM: Unrecognized VM option [v2] In-Reply-To: References: Message-ID: On Wed, 28 May 2025 18:45:49 GMT, Zdenek Zambersky wrote: > (I have not changed JIRA as there is no info about fix. Should I add it there?) 
Yes please, that is generally what we should do :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24262#issuecomment-2930075745 From epeter at openjdk.org Mon Jun 2 11:10:53 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 2 Jun 2025 11:10:53 GMT Subject: RFR: 8252473: [TESTBUG] compiler tests fail with minimal VM: Unrecognized VM option [v3] In-Reply-To: References: Message-ID: On Wed, 28 May 2025 18:39:27 GMT, Zdenek Zambersky wrote: >> This change adds ` -XX:-IgnoreUnrecognizedVMOptions` to problematic tests (or `@requires vm.compiler2.enabled` in one case), to prevent failures `Unrecognized VM option` on client VM. > > Zdenek Zambersky has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: > > Fix of compiler tests for client VM Still looks reasonable. I'll run some testing now, please ping me again in 24h :) ------------- PR Review: https://git.openjdk.org/jdk/pull/24262#pullrequestreview-2887893389 From dnsimon at openjdk.org Mon Jun 2 11:11:52 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 2 Jun 2025 11:11:52 GMT Subject: RFR: 8357987: [JVMCI] Add support for retrieving all methods of a ResolvedJavaType [v2] In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 08:15:36 GMT, Tom Shull wrote: >> Currently from ResolvedJavaType one can retrieve all declared methods, static methods, and constructors of the given type. However, internally in HotSpot there are also VM-internal methods, such as overpass methods, associated with a given type which we cannot access via the API. >> >> To correct this, we should add a new method which enables VM-internal methods, such as overpass methods, to be accessed. > > Tom Shull has updated the pull request incrementally with one additional commit since the last revision: > > format javadoc and update test Looks good to me. ------------- Marked as reviewed by dnsimon (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25498#pullrequestreview-2887897833 From dnsimon at openjdk.org Mon Jun 2 11:16:53 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 2 Jun 2025 11:16:53 GMT Subject: RFR: 8357660: [JVMCI] Add support for retrieving all BootstrapMethodInvocations directly from ConstantPool [v2] In-Reply-To: References: Message-ID: <1jDUbEJHRDYuT4RDOHlEeY5C4IWwwcenweFgZcwnUsU=.bc8d84ad-13bb-4b5a-9d02-de020301e3d6@github.com> On Mon, 2 Jun 2025 08:39:31 GMT, Tom Shull wrote: >> This PR adds support for directly retrieving both all invokedynamic and all condy BootstrapMethodInvocations from a ConstantPool via the new method `List lookupBootstrapMethodInvocations(boolean invokeDynamic)`. >> >> In addition, two methods are added to the BootstrapMethodInvocations: >> 1. `void resolve()` >> 2. `JavaConstant lookup()` >> >> The combination of these two features allows one to directly interact with all BSM information of a given ConstantPool without having to iterate through all of the Classfile's methods to find all invokedynamic bytecodes and/or iterate through all Constant Pool entries. > > Tom Shull has updated the pull request incrementally with one additional commit since the last revision: > > reviewer feedback and update javadoc formatting Looks good to me. Please enable GitHub Actions on your JDK fork. ------------- Marked as reviewed by dnsimon (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25420#pullrequestreview-2887908832 From hgreule at openjdk.org Mon Jun 2 11:33:53 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Mon, 2 Jun 2025 11:33:53 GMT Subject: RFR: 8356813: Improve Mod(I|L)Node::Value [v4] In-Reply-To: References: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com> Message-ID: On Mon, 2 Jun 2025 10:58:45 GMT, Emanuel Peter wrote: >> Hannes Greule has updated the pull request incrementally with one additional commit since the last revision: >> >> Add randomized test > > src/hotspot/share/opto/divnode.cpp line 1237: > >> 1235: // We don't need to check for min_jint % '-1' as its result is defined when using jlong. >> 1236: if (i1->get_con_as_long(bt) == min_jlong && i2->get_con_as_long(bt) == -1) { >> 1237: return TypeInteger::zero(bt); > > Is this correct? For `bt = T_INT` is this really equivalent? > > `i1->get_con() == min_jint` > We might get `min_jint` back here. > > `i1->get_con_as_long(bt) == min_jlong` > Would we not return `min_jint` here, and then the condition is false? > > Do we have an IR test for this? This special case is only needed because `min_jlong % -1L` in C++ is UB (afaik) and the idiv instruction triggers a SIGFPE in such a case. But `min_jint % -1L` *using long arithmetic* correctly produces 0. I think it would make sense to expand tests for constant folding, but I'll have to check if that actually gets called, see **Other** in the PR description (copied): > If the divisor is a constant, we will directly replace the Mod(I|L)Node with more but less expensive nodes in ::Ideal(). Type analysis for these nodes combined is less precise, means we miss potential cases where this would help e.g., removing range checks. Would it make sense to delay the replacement? So there's a chance this code was never called before...
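For readers following the `min_jint % -1` discussion, the Java-level semantics that the constant folding has to reproduce can be checked with a small sketch. This is only an illustration of the language rule (the remainder of the minimum value by -1 is 0, for both `int` and `long`), not the HotSpot code; the class and method names are made up for this example, and the widening to `long` merely mirrors the idea behind `get_con_as_long(bt)`.

```java
// Hypothetical illustration class, not part of the PR.
class MinModMinusOne {
    // Widen both int operands to long and take the remainder in 64-bit
    // arithmetic, roughly mirroring how an int constant is handled after
    // being read as a long.
    static long remAsLong(int dividend, int divisor) {
        return (long) dividend % (long) divisor;
    }

    public static void main(String[] args) {
        // An int minimum widened to long is far away from Long.MIN_VALUE,
        // so the 64-bit remainder is an ordinary, well-defined operation.
        System.out.println(remAsLong(Integer.MIN_VALUE, -1)); // 0
        // In Java (unlike C++), these are defined as well and yield 0.
        System.out.println(Integer.MIN_VALUE % -1);           // 0
        System.out.println(Long.MIN_VALUE % -1L);             // 0
    }
}
```

In C++ only the first form is safe to evaluate directly, which appears to be why the `min_jlong` case is singled out in the quoted code.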
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25254#discussion_r2120868235 From hgreule at openjdk.org Mon Jun 2 11:36:54 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Mon, 2 Jun 2025 11:36:54 GMT Subject: RFR: 8356813: Improve Mod(I|L)Node::Value [v4] In-Reply-To: References: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com> Message-ID: <5FnA_gZNzRom3MBShwfbdCffeRGogf1cyKo0nF40c4I=.9db6f973-e6a5-4852-b82e-24ccc198bcb9@github.com> On Mon, 2 Jun 2025 11:01:29 GMT, Emanuel Peter wrote: >> Hannes Greule has updated the pull request incrementally with one additional commit since the last revision: >> >> Add randomized test > > src/hotspot/share/opto/divnode.cpp line 1247: > >> 1245: // JVMS lrem bytecode: "the magnitude of the result is always less than the magnitude of the divisor" >> 1246: // "less than" means we can subtract 1 to get an inclusive upper bound in [0, 2^63-1] >> 1247: jlong hi = static_cast<jlong>(divisor_magnitude - 1); > > Hmm, this also looks confusing for the `T_INT` case. What about `-5`, does that then not become `max_julong - 5`, but it should have been `max_juint - 1`? We use `g_uabs()` to get the absolute value, which shouldn't exceed 2^31 for int values (i.e., `g_uabs(min_jint) == 2^31`). So we should get into the right range here again. But I guess I can expand the comment to better explain that part.
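The magnitude argument above can be sketched in Java, under the assumption that the divisor has already been sign-extended to 64 bits (as it would be when an `int` constant is read as a `long`). The class and method names are hypothetical and the code is not the HotSpot implementation; it only shows why a widened `-5` yields a bound of 4 rather than something near the unsigned maximum.

```java
// Hypothetical illustration class, not part of the PR.
class DivisorMagnitude {
    // Inclusive upper bound on |a % divisor| for a non-zero divisor.
    static long maxRemainderMagnitude(long divisor) {
        // For Long.MIN_VALUE the negation overflows and stays Long.MIN_VALUE;
        // read as an unsigned bit pattern, that value is 2^63.
        long magnitude = divisor < 0 ? -divisor : divisor;
        // "Less than the magnitude" means we subtract 1 for an inclusive bound.
        // For the 2^63 case the subtraction wraps to Long.MAX_VALUE (2^63 - 1).
        return magnitude - 1;
    }

    public static void main(String[] args) {
        System.out.println(maxRemainderMagnitude(-5));                // 4
        // A sign-extended int minimum has magnitude 2^31, so the bound
        // is 2^31 - 1.
        System.out.println(maxRemainderMagnitude(Integer.MIN_VALUE)); // 2147483647
        System.out.println(maxRemainderMagnitude(Long.MIN_VALUE));    // 9223372036854775807
    }
}
```

The key assumption is the sign extension: taking the magnitude of a sign-extended negative `int` gives its ordinary absolute value, so the result stays within the `int` range.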
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25254#discussion_r2120875453 From epeter at openjdk.org Mon Jun 2 11:46:54 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 2 Jun 2025 11:46:54 GMT Subject: RFR: 8356813: Improve Mod(I|L)Node::Value [v4] In-Reply-To: <5FnA_gZNzRom3MBShwfbdCffeRGogf1cyKo0nF40c4I=.9db6f973-e6a5-4852-b82e-24ccc198bcb9@github.com> References: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com> <5FnA_gZNzRom3MBShwfbdCffeRGogf1cyKo0nF40c4I=.9db6f973-e6a5-4852-b82e-24ccc198bcb9@github.com> Message-ID: On Mon, 2 Jun 2025 11:34:22 GMT, Hannes Greule wrote: >> src/hotspot/share/opto/divnode.cpp line 1247: >> >>> 1245: // JVMS lrem bytecode: "the magnitude of the result is always less than the magnitude of the divisor" >>> 1246: // "less than" means we can subtract 1 to get an inclusive upper bound in [0, 2^63-1] >>> 1247: jlong hi = static_cast<jlong>(divisor_magnitude - 1); >> >> Hmm, this also looks confusing for the `T_INT` case. What about `-5`, does that then not become `max_julong - 5`, but it should have been `max_juint - 1`? > > We use `g_uabs()` to get the absolute value, which shouldn't exceed 2^31 for int values (i.e., `g_uabs(min_jint) == 2^31`). So we should get into the right range here again. But I guess I can expand the comment to better explain that part. @SirYwell I'm not 100% sure here, so please correct me if I'm wrong. You are now always passing in a `jlong` value, so you always use `static inline julong g_uabs(jlong n) { return g_uabs((julong)n); }`, even for `T_INT`.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25254#discussion_r2120898799 From epeter at openjdk.org Mon Jun 2 11:53:00 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 2 Jun 2025 11:53:00 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v7] In-Reply-To: References: Message-ID: On Wed, 28 May 2025 14:47:02 GMT, Kangcheng Xu wrote: >> @tabjy Thanks for your patience, this one took me longer than I wanted. I responded like this above: >> >>> Hmm, ok I see. Why don't you remove the asserts for now, and we see how clear the code looks now. I think I asked for the consistency check because I was confused by the previous code structure. Maybe it is ok now as it is. > > Ping @eme64 again for awareness. :) @tabjy > I could, at very least, try to swap LHS and RHS if no match is found I think that would be a good idea, and not very hard. You can just have a function `add_pattern(lhs, rhs)`, and then run it also with `add_pattern(rhs, lhs)` for **swapping**. Personally, I would have preferred a recursive algorithm, but that could have some compile time overhead. @chhagedorn Was a little more skeptical about the recursive algorithm. It seems the motivation for this change is the benchmark from here: ArithmeticCanonicalizationBenchmark https://ionutbalosin.com/2024/02/jvm-performance-comparison-for-jdk-21/#jit-compiler This benchmark is of course somewhat arbitrary, and so are now all of your added patterns. Having a most general solution would be nice, but maybe the recursive algorithm is too much, I'm not 100% sure. Of course we now still have cases that do not optimize/canonicalize, and so someone could write a benchmark for those cases still.. oh well. What I would like to see for **testing**: add some more patterns with IR rules. More that now optimize, and also a few that do not optimize, just so we have a bit of a sense what we are still missing. @rwestrel Filed this issue. 
I wonder: what do you think we should do here? How general should the optimization/canonicalization be? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23506#issuecomment-2930295143 From aph at openjdk.org Mon Jun 2 11:56:52 2025 From: aph at openjdk.org (Andrew Haley) Date: Mon, 2 Jun 2025 11:56:52 GMT Subject: RFR: 8353266: C2: Wrong execution with Integer.bitCount(int) intrinsic on AArch64 [v2] In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 10:37:11 GMT, Marc Chevalier wrote: >> ### Problem >> >> On Aarch64, using `Integer.bitCount` can modify its argument. The problem comes from the implementation of `popCountI` on Aarch64. For instance, that's what we get with the reproducer `Reduced.java` on the related issue: >> >> ; Load lFld into local x >> ldr x11, [x10, #120] >> ; popCountI >> mov w11, w11 >> mov v16.d[0], x11 >> cnt v16.8b, v16.8b >> addv b16, v16.8b >> mov x13, v16.d[0] >> ; [...] >> ; store local x (which is believed to still contain lFld) into result >> str x11, [x10, #128] >> >> >> The instruction `mov w11, w11` is used to cut the 32 higher bits of `x11` since we use `popCountI` (from `Integer.bitCount`): on aarch64 (like other architectures), assigning the 32 lower bits of a register reset the 32 higher bits. Short: the input is modified, but the implementation of `popCountI` doesn't declare it: >> >> instruct popCountI(iRegINoSp dst, iRegIorL2I src, vRegF tmp) %{ >> match(Set dst (PopCountI src)); >> effect(TEMP tmp); >> [...] >> %} >> >> >> But then, why resetting the upper word of `x11`? It all starts with vector instructions: >> >> cnt v16.8b, v16.8b >> addv b16, v16.8b >> >> The `8b` specifies that it operates on the 8 lower bytes of `v16`, it would be nice to simply use `4b`, but that doesn't exist: vector instructions can only work on either the whole 128-bit register, or the 64 lower bits (by blocks of 1, 2, 4, 8 or 16 bytes). 
There is no suffix (and encoding) for a vector instruction to work only on the 32 lower bits, so not to pollute the bit count, we need to reset the 32 higher bits of `v16.d[0]` (aka `d16`), that is `v16.s[1]`, that is `v16[32:63]` in a more bit-explicit notation. Moreover, unlike with general purpose register doing >> >> mov v16.s[0], w11 >> >> would set `v16[0:31]` to `w11`, but not reset `v16[32:63]`. Which makes sense! Otherwise, using vector registers would be impractical if writing any piece would reset the rest... So we indeed need to set all of `v16[0:63]`, which >> >> mov w11, w11 >> mov v16.d[0], x11 >> >> does, but by destroying `x11`. >> >> ### Solution >> >> Simply adding `USE_KILL src` in the effects would be nice, but unfortunately not possible: `iRegIorL2I` is an operand class (either a 32-bit register or a L2I of a 64-bit register) and those cannot be used in effect lists. >> >> The way I went for is rather not to modify the source, but rather do write the two lower words of `v16` we are interested in separately: >> >> mov v16.s[1], wzr ... > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Apply suggestions Marked as reviewed by aph (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25551#pullrequestreview-2888056446 From chagedorn at openjdk.org Mon Jun 2 12:08:10 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 2 Jun 2025 12:08:10 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v61] In-Reply-To: References: Message-ID: On Sun, 1 Jun 2025 05:58:13 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 306: >> >>> 304: var myHook = new Hook("MyHook"); >>> 305: >>> 306: var template1 = Template.make("name", "value", (String name, Integer value) -> body( >> >> One could generally think about using `_` for unused lambda parameters which I think is the common convention. 
But then I guess we would need to update the documentation about saying "name" and "String name" should be the same and make an exception for unused ones. I don't know. > > I think it is better to keep the names duplicated. This gives the reader an easier visual aid to check which name has which type. What do you think? That's totally fine and easy to follow. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2120948254 From jbhateja at openjdk.org Mon Jun 2 12:08:54 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 2 Jun 2025 12:08:54 GMT Subject: RFR: 8352635: Improve inferencing of Float16 operations with constant inputs [v4] In-Reply-To: <6PFX21b9eT5mQv8Ym7b_RuKNpnuQ5CVqhc8TKxstlYo=.eb7d9f85-5e49-4e8f-b17a-c8e3728e7624@github.com> References: <44nVQBYgzCOB2mAB9xtAPvkUcOMJOITA2VjMdDFgm1g=.48266693-48bf-41db-8871-a7dcafe93509@github.com> <6PFX21b9eT5mQv8Ym7b_RuKNpnuQ5CVqhc8TKxstlYo=.eb7d9f85-5e49-4e8f-b17a-c8e3728e7624@github.com> Message-ID: On Wed, 28 May 2025 09:15:31 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: >> >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8352635 >> - Enabling some test points >> - Adding test points and some re-factoring >> - Merge branch 'master' of https://github.com/openjdk/jdk into JDK-8352635 >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8352635 >> - 8352635: Improve inferencing of Float16 operations with constant inputs > > @jatin-bhateja That looks very promising, thanks for working on that! Hi @eme64 , Your comments have been addressed. 
Best Regards ------------- PR Comment: https://git.openjdk.org/jdk/pull/24179#issuecomment-2930355506 From yzheng at openjdk.org Mon Jun 2 12:11:57 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Mon, 2 Jun 2025 12:11:57 GMT Subject: RFR: 8357987: [JVMCI] Add support for retrieving all methods of a ResolvedJavaType [v2] In-Reply-To: References: Message-ID: <0o43MdXkVHVU8JQIoBSQ-46j3jLJjvAEqARhk88aeEw=.a202168b-af3f-4ce4-b274-f1cbbd4295fa@github.com> On Mon, 2 Jun 2025 08:15:36 GMT, Tom Shull wrote: >> Currently from ResolvedJavaType one can retrieve all declared methods, static methods, and constructors of the given type. However, internally in HotSpot there are also VM-internal methods, such as overpass methods, associated with a given type which we cannot access via the API. >> >> To correct this, we should add a new method which enables VM-internal methods, such as overpass methods, to be accessed. > > Tom Shull has updated the pull request incrementally with one additional commit since the last revision: > > format javadoc and update test src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotResolvedObjectTypeImpl.java line 1079: > 1077: return List.of(); > 1078: } > 1079: return Collections.unmodifiableList(Arrays.asList(instanceMethods)); `return List.of(instanceMethods);` should work. 
We can then replace the above with `return List.of(runtime().compilerToVm.getAllMethods(this));` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25498#discussion_r2120952854 From chagedorn at openjdk.org Mon Jun 2 12:12:11 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 2 Jun 2025 12:12:11 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v61] In-Reply-To: References: Message-ID: On Sun, 1 Jun 2025 15:56:18 GMT, Emanuel Peter wrote: >> Another question which is not evidently clear by following the examples: Can and should (not) you use the same hook inside the hook itself, i.e.: >> >> Hooks.CLASS_HOOK.anchor( >> Hooks.CLASS_HOOK.anchor( >> // ... >> >> This is probably not done on purpose but such a situation could arise when nesting more templates and suddenly one anchors the same hook again? > > I extended the explanations: > > ~ 397 // We saw the use of custom hooks above, but now we look at the use of CLASS_HOOK and METHOD_HOOK. > ~ 398 // By convention, we use the CLASS_HOOK for class scopes, and METHOD_HOOK for method scopes. > + 399 // Whenever we open a class scope, we should anchor a CLASS_HOOK for that scope, and whenever we > + 400 // open a method, we should anchor a METHOD_HOOK. Conversely, this allows us to check if we are > + 401 // inside a class or method scope by querying "isAnchored". This convention helps us when building > + 402 // a large library of Templates. But if you are writing your own self-contained set of Templates, > + 403 // you do not have to follow this convention. > + 404 // > + 405 // Hooks are "re-entrant", that is we can anchor the same hook inside a scope that we already > + 406 // anchored it previously. The "Hook.insert" always goes to the innermost anchoring of that > + 407 // hook. There are cases where "re-entrant" Hooks are helpful such as nested classes, where > + 408 // there is a class scope inside another class scope. 
Similarly, we can nest lambda bodies > + 409 // inside method bodies, so also METHOD_HOOK can be used in such a "re-entrant" way. > > > We could consider having both "re-entrant" and "non-re-entrant" Hooks. But I'm not yet convinced it is a very useful feature. Sure, there could be some confusion with nested hooks. But I think that confusion to code generation, because we can also nest class and method/lambda scopes. > > What do you think? The updated explanation is very good of making clear when we could/want to have nested hooks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2120955442 From chagedorn at openjdk.org Mon Jun 2 12:18:10 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 2 Jun 2025 12:18:10 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v61] In-Reply-To: References: Message-ID: On Fri, 30 May 2025 10:39:57 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: >> >> - Merge branch 'JDK-8344942-TemplateFramework-v3' of https://github.com/eme64/jdk into JDK-8344942-TemplateFramework-v3 >> - move verification > > Thanks for all the updates and discussions! I've worked my way through the documentation in `Template` and the examples again in some more detail. It's much better and the new explanations are well done, excellent work! > > I left some comments here and there but mostly minor things. I will have another look at the implementation - probably only finished by Monday. The design now looks great. I'm glad we could find a good solution now after some more iterations :-) > @chhagedorn Alright, I now have a decent solution for `$$var` and `$1var` etc. I also added tests for it. 
>
> These are issues we could continue the conversation, unless you are satisfied with my answers: [#24217 (comment)](https://github.com/openjdk/jdk/pull/24217#discussion_r2115388737) [#24217 (comment)](https://github.com/openjdk/jdk/pull/24217#discussion_r2115406391)
>
> This is now ready for another review pass

Awesome, thanks for spending some more time with these nasty edge-cases and finding a solution! I had a look at your updates for all my comments, they look good, thanks! I'm going to make a pass over the implementation classes now and will have a look at the `Renderer` updates as well :-)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24217#issuecomment-2930394221

From thartmann at openjdk.org Mon Jun 2 12:49:52 2025
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Mon, 2 Jun 2025 12:49:52 GMT
Subject: RFR: 8353266: C2: Wrong execution with Integer.bitCount(int) intrinsic on AArch64 [v2]
In-Reply-To:
References:
Message-ID:

On Mon, 2 Jun 2025 10:37:11 GMT, Marc Chevalier wrote:

>> ### Problem
>>
>> On Aarch64, using `Integer.bitCount` can modify its argument. The problem comes from the implementation of `popCountI` on Aarch64. For instance, that's what we get with the reproducer `Reduced.java` on the related issue:
>>
>>     ; Load lFld into local x
>>     ldr x11, [x10, #120]
>>     ; popCountI
>>     mov w11, w11
>>     mov v16.d[0], x11
>>     cnt v16.8b, v16.8b
>>     addv b16, v16.8b
>>     mov x13, v16.d[0]
>>     ; [...]
>>     ; store local x (which is believed to still contain lFld) into result
>>     str x11, [x10, #128]
>>
>>
>> The instruction `mov w11, w11` is used to cut the 32 higher bits of `x11` since we use `popCountI` (from `Integer.bitCount`): on aarch64 (like other architectures), assigning the 32 lower bits of a register reset the 32 higher bits.
Short: the input is modified, but the implementation of `popCountI` doesn't declare it:
>>
>>     instruct popCountI(iRegINoSp dst, iRegIorL2I src, vRegF tmp) %{
>>       match(Set dst (PopCountI src));
>>       effect(TEMP tmp);
>>       [...]
>>     %}
>>
>>
>> But then, why resetting the upper word of `x11`? It all starts with vector instructions:
>>
>>     cnt v16.8b, v16.8b
>>     addv b16, v16.8b
>>
>> The `8b` specifies that it operates on the 8 lower bytes of `v16`, it would be nice to simply use `4b`, but that doesn't exist: vector instructions can only work on either the whole 128-bit register, or the 64 lower bits (by blocks of 1, 2, 4, 8 or 16 bytes). There is no suffix (and encoding) for a vector instruction to work only on the 32 lower bits, so not to pollute the bit count, we need to reset the 32 higher bits of `v16.d[0]` (aka `d16`), that is `v16.s[1]`, that is `v16[32:63]` in a more bit-explicit notation. Moreover, unlike with general purpose register doing
>>
>>     mov v16.s[0], w11
>>
>> would set `v16[0:31]` to `w11`, but not reset `v16[32:63]`. Which makes sense! Otherwise, using vector registers would be impractical if writing any piece would reset the rest... So we indeed need to set all of `v16[0:63]`, which
>>
>>     mov w11, w11
>>     mov v16.d[0], x11
>>
>> does, but by destroying `x11`.
>>
>> ### Solution
>>
>> Simply adding `USE_KILL src` in the effects would be nice, but unfortunately not possible: `iRegIorL2I` is an operand class (either a 32-bit register or a L2I of a 64-bit register) and those cannot be used in effect lists.
>>
>> The way I went for is rather not to modify the source, but rather do write the two lower words of `v16` we are interested in separately:
>>
>>     mov v16.s[1], wzr ...
>
> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision:
>
>   Apply suggestions

Nice analysis, Marc! The fix looks good to me and I don't have a strong opinion about the print format.
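
The register hazard described in the quoted analysis can be modelled in plain Java. This is only a rough sketch: a 64-bit register is a `long`, a write to its 32-bit view zero-extends, and the old sequence performs that write on the register the caller still expects to hold the full value. The class and method names are made up for illustration; this is not HotSpot code and it does not reproduce the miscompilation itself.

```java
// Toy model of the register-level behaviour discussed above.
// All names are hypothetical; this is not HotSpot code.
final class PopCountClobberSketch {
    // Models `mov w11, w11`: writing the 32-bit view zero-extends into the 64-bit register.
    static long writeLow32(long reg) {
        return reg & 0xFFFF_FFFFL;
    }

    // Models the old sequence: the source register itself is truncated before counting.
    // Returns {bitCount, valueLeftInSourceRegister}.
    static long[] buggyPopCount(long srcReg) {
        srcReg = writeLow32(srcReg);                 // clobbers the caller's register
        return new long[] { Long.bitCount(srcReg), srcReg };
    }

    // Models the idea behind the fix: only the temporary is written
    // (low word copied, high word zeroed); the source register stays intact.
    static long[] fixedPopCount(long srcReg) {
        long tmp = srcReg & 0xFFFF_FFFFL;
        return new long[] { Long.bitCount(tmp), srcReg };
    }

    public static void main(String[] args) {
        long lFld = 0xDEAD_BEEF_0000_00FFL;
        long[] buggy = buggyPopCount(lFld);
        long[] fixed = fixedPopCount(lFld);
        System.out.println(buggy[0] + " " + Long.toHexString(buggy[1]));
        System.out.println(fixed[0] + " " + Long.toHexString(fixed[1]));
    }
}
```

Both variants compute the same count, which is why the bug only shows up when the surrounding code reuses the source register afterwards.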

-------------

Marked as reviewed by thartmann (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/25551#pullrequestreview-2888252754

From hgreule at openjdk.org Mon Jun 2 12:55:50 2025
From: hgreule at openjdk.org (Hannes Greule)
Date: Mon, 2 Jun 2025 12:55:50 GMT
Subject: RFR: 8356813: Improve Mod(I|L)Node::Value [v4]
In-Reply-To:
References: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com> <5FnA_gZNzRom3MBShwfbdCffeRGogf1cyKo0nF40c4I=.9db6f973-e6a5-4852-b82e-24ccc198bcb9@github.com>
Message-ID:

On Mon, 2 Jun 2025 11:44:36 GMT, Emanuel Peter wrote:

>> We use `g_uabs()` to get the absolute value, that shouldn't exceed 2^31 for int values (i.e., `g_uabs(min_jint) == 2^31`). So we should get into the right range here again. But I guess I can expand the comment to better explain that part.
>
> @SirYwell I'm not 100% sure here, so please correct me if I'm wrong.
> You are now always passing in a `jlong` value, so you always use `static inline julong g_uabs(jlong n) { return g_uabs((julong)n); }`, even for `T_INT`.

Yes that's correct, and it should still work due to how negation works for negative inputs.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25254#discussion_r2121069875

From mhaessig at openjdk.org Mon Jun 2 12:58:53 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Mon, 2 Jun 2025 12:58:53 GMT
Subject: RFR: 8354930: IGV: dump C2 graph before and after live range stretching
In-Reply-To:
References:
Message-ID:

On Wed, 28 May 2025 11:54:24 GMT, Manuel Hässig wrote:

> This PR introduces a new phase `LIVE_RANGE_STRETCHING` that prints after live ranges have been stretched, if that happens at all. The phase `INITIAL_LIVENESS` is moved before live range stretching so we can compare the live ranges before and after stretching in IGV, which is useful for debugging why an oop suddenly belongs to an oop map.
>
> ## Testing
>
> - [x] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15299362485)
> - [x] tier1 and tier1, plus additional Oracle internal testing for all Oracle supported platforms and OSs
> - [x] verified that the new phase prints when it should in IGV and with `-XX:PrintPhaseLevel=4`

Thank you for your reviews!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25492#issuecomment-2930572102

From duke at openjdk.org Mon Jun 2 12:58:53 2025
From: duke at openjdk.org (duke)
Date: Mon, 2 Jun 2025 12:58:53 GMT
Subject: RFR: 8354930: IGV: dump C2 graph before and after live range stretching
In-Reply-To:
References:
Message-ID:

On Wed, 28 May 2025 11:54:24 GMT, Manuel Hässig wrote:

> This PR introduces a new phase `LIVE_RANGE_STRETCHING` that prints after live ranges have been stretched, if that happens at all. The phase `INITIAL_LIVENESS` is moved before live range stretching so we can compare the live ranges before and after stretching in IGV, which is useful for debugging why an oop suddenly belongs to an oop map.
>
> ## Testing
>
> - [x] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15299362485)
> - [x] tier1 and tier1, plus additional Oracle internal testing for all Oracle supported platforms and OSs
> - [x] verified that the new phase prints when it should in IGV and with `-XX:PrintPhaseLevel=4`

@mhaessig Your change (at version df3c396f5a26658f6efbaf4f7a153f7214be5e57) is now ready to be sponsored by a Committer.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25492#issuecomment-2930573797

From chagedorn at openjdk.org Mon Jun 2 13:58:22 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Mon, 2 Jun 2025 13:58:22 GMT
Subject: RFR: 8344942: Template-Based Testing Framework [v71]
In-Reply-To:
References:
Message-ID:

On Mon, 2 Jun 2025 03:30:24 GMT, Emanuel Peter wrote:

>> **Goal**
>> We want to generate Java source code:
>> - Make it easy to generate variants of tests. E.g.
for each offset, for each operator, for each type, etc. >> - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). >> >> Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). >> >> **How to get started** >> When reviewing, please start by looking at: >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 >> >> We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. >> >> Second, look at this advanced test: >> https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 >> >> And then for a "tutorial", look at: >> `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` >> >> It shows these features: >> - The `body` of a Template is essentially a list of `Token`s that are concatenated. >> - Templates can be nested: a `TemplateWithArgs` is also a `Token`. >> - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. >> - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. >> - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. 
>> - The use of recursive templates, and `fuel` to limit the recursion. >> - `Name`s: useful to register field and variable names in code scopes. >> >> Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 >> >> For a better experience, you may want... > > Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 91 commits: > > - Merge branch 'master' into JDK-8344942-TemplateFramework-v3 > - validation tests > - dollar and hashtag parsing validatiaon > - wip refactor parsing dollar and hashtag > - more fixes from Christian > - more improvements > - more suggestions applied > - good practice > - rename template arguments > - more from Christian > - ... and 81 more: https://git.openjdk.org/jdk/compare/90d6ad01...cb7037e7 I worked my way through the rest of the implementation. Impressive work Emanuel! I left some more mostly minor comments. But otherwise, this looks great! test/hotspot/jtreg/compiler/lib/template_framework/Code.java line 26: > 24: package compiler.lib.template_framework; > 25: > 26: import java.util.ArrayList; Unused: Suggestion: test/hotspot/jtreg/compiler/lib/template_framework/Code.java line 33: > 31: * All the {@link String}s are later collected in a {@link StringBuilder}. If we used a {@link StringBuilder} > 32: * directly to collect the {@link String}s, we could not as easily insert code at an "earlier" position, i.e. > 33: * reaching out to a {@link Hook#set}. Suggestion: * reaching out to a {@link Hook#anchor}. 
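
The `Code.java` comment quoted above explains why code is collected as separate pieces instead of being appended straight to a `StringBuilder`: a later insertion has to land at an earlier position, where a hook was anchored. The following is a minimal toy model of that idea, including the "innermost anchoring wins" behaviour described earlier in this thread for re-entrant hooks. The `Frame` class and its methods are hypothetical and far simpler than the framework's real `Code`, `CodeFrame` and `Hook` classes.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: names and structure are assumptions, not the framework's API.
final class HookInsertionSketch {
    static final class Frame {
        private final Frame parent;
        private final String anchoredHook;                 // null if no hook is anchored here
        private final List<Object> pieces = new ArrayList<>(); // String or nested Frame
        private int insertPos = 0;                         // where hook insertions go

        Frame(Frame parent, String anchoredHook) {
            this.parent = parent;
            this.anchoredHook = anchoredHook;
        }

        void add(String code) {
            pieces.add(code);
        }

        // Opens a nested frame with the given hook anchored at its start.
        Frame anchor(String hook) {
            Frame inner = new Frame(this, hook);
            pieces.add(inner);
            return inner;
        }

        // Inserts code at the innermost enclosing anchoring of the hook.
        void insert(String hook, String code) {
            for (Frame f = this; f != null; f = f.parent) {
                if (hook.equals(f.anchoredHook)) {
                    f.pieces.add(f.insertPos++, code);
                    return;
                }
            }
            throw new IllegalStateException("hook not anchored: " + hook);
        }

        String render() {
            StringBuilder sb = new StringBuilder();
            renderInto(sb);
            return sb.toString();
        }

        private void renderInto(StringBuilder sb) {
            for (Object p : pieces) {
                if (p instanceof Frame) {
                    ((Frame) p).renderInto(sb);
                } else {
                    sb.append(p);
                }
            }
        }
    }

    static String demo() {
        Frame root = new Frame(null, null);
        root.add("class Outer { ");
        Frame outer = root.anchor("CLASS");
        outer.add("void m() {} ");
        outer.add("class Inner { ");
        Frame inner = outer.anchor("CLASS");   // same hook anchored again, nested
        inner.add("void n() {} ");
        inner.insert("CLASS", "int f; ");      // goes to the innermost anchoring (Inner)
        outer.add("} ");
        outer.insert("CLASS", "int g; ");      // goes to Outer's anchoring
        root.add("}");
        return root.render();
    }
}
```

Because the pieces are only flattened in `render()`, both insertions can be made after the method bodies were already emitted, yet they appear in front of them in the output.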
test/hotspot/jtreg/compiler/lib/template_framework/CodeFrame.java line 37: > 35: * When a {@link Hook} is {@link Hook#set}, this separates the Template into an outer and inner > 36: * {@link CodeFrame}, ensuring that names that are {@link Template#addName}'d inside the inner frame > 37: * are only available inside that frame. Still references old method names. Suggestion: Suggestion: * The {@link CodeFrame} represents a frame (i.e. scope) of code, appending {@link Code} to the {@code 'codeList'} * as {@link Token}s are rendered, and adding names to the {@link NameSet}s with {@link Template#addStructuralName}/ * {@link Template#addDataName}. {@link Hook}s can be added to a frame, which allows code to be inserted at that * location later. When a {@link Hook} is {@link Hook#anchor}ed, it separates the Template into an outer and inner * {@link CodeFrame}, ensuring that names that are added inside the inner frame are only available inside that frame. test/hotspot/jtreg/compiler/lib/template_framework/CodeFrame.java line 52: > 50: class CodeFrame { > 51: public final CodeFrame parent; > 52: private final List codeList = new ArrayList(); Suggestion: private final List codeList = new ArrayList<>(); test/hotspot/jtreg/compiler/lib/template_framework/CodeFrame.java line 58: > 56: * The {@link NameSet} is used for variable and fields etc. > 57: */ > 58: final NameSet names; I think this can also be made private: Suggestion: private final NameSet names; test/hotspot/jtreg/compiler/lib/template_framework/CodeFrame.java line 70: > 68: } else { > 69: // New NameSet, to make sure we have a nested scope for the names. > 70: this.names = new NameSet(parent.names); Indentation is off: Suggestion: this.names = parent.names; } else { // New NameSet, to make sure we have a nested scope for the names. 
this.names = new NameSet(parent.names); test/hotspot/jtreg/compiler/lib/template_framework/CodeFrame.java line 92: > 90: /** > 91: * Creates a special frame, which has a {@link #parent} but uses the {@link NameSet} > 92: * from the parent frame, allowing {@link Template#defineName} to persist in the outer `defineName` -> `addName`? test/hotspot/jtreg/compiler/lib/template_framework/CodeFrame.java line 96: > 94: * where we would possibly want to make field or variable definitions during the insertion > 95: * that are not just local to the insertion but affect the {@link CodeFrame} that we > 96: * {@link Hook#set} earlier and are now {@link Hook#insert}ing into. Suggestion: * {@link Hook#anchor} earlier and are now {@link Hook#insert}ing into. test/hotspot/jtreg/compiler/lib/template_framework/CodeFrame.java line 118: > 116: } > 117: > 118: boolean hasHook(Hook hook) { Can be made private: Suggestion: private boolean hasHook(Hook hook) { test/hotspot/jtreg/compiler/lib/template_framework/DataName.java line 33: > 31: * count, list or even sample random {@link DataName}s. Every {@link DataName} has a {@link DataName.Type}, > 32: * so that sampling can be restricted to these types. > 33: * Suggestion: * *

test/hotspot/jtreg/compiler/lib/template_framework/DataName.java line 123: > 121: if (mutability == Mutability.IMMUTABLE && dn.mutable()) { return false; } > 122: if (subtype != null && !dn.type().isSubtypeOf(subtype)) { return false; } > 123: if (supertype != null && !supertype.isSubtypeOf(dn.type())) { return false; } I suggest to use the full term: Suggestion: if (!(name instanceof DataName dataName)) { return false; } if (mutability == Mutability.MUTABLE && !dataName.mutable()) { return false; } if (mutability == Mutability.IMMUTABLE && dataName.mutable()) { return false; } if (subtype != null && !dataName.type().isSubtypeOf(subtype)) { return false; } if (supertype != null && !supertype.isSubtypeOf(dataName.type())) { return false; } test/hotspot/jtreg/compiler/lib/template_framework/DataName.java line 134: > 132: * @return The filtered {@link View}. > 133: * @throws UnsupportedOperationException If this {@link View} was already filtered with > 134: * {@link subtypeOf} or {@link exactOf}. Also for links at methods below: Suggestion: * {@link #subtypeOf} or {@link #exactOf}. test/hotspot/jtreg/compiler/lib/template_framework/DataName.java line 144: > 142: > 143: /** > 144: * Create a filtered {@link View}, where all {@link DataName}s must be subtypes of {@code type}. Suggestion: * Create a filtered {@link View}, where all {@link DataName}s must be supertypes of {@code type}. test/hotspot/jtreg/compiler/lib/template_framework/DataName.java line 181: > 179: */ > 180: public DataName sample() { > 181: DataName n = (DataName)Renderer.getCurrent().sampleName(predicate()); Do you really need this cast? Can't you just return a `Name`. From the uses it seems that you only call interface methods from `Name` at the use-sites. test/hotspot/jtreg/compiler/lib/template_framework/Hook.java line 34: > 32: * "back" or to some outer scope, e.g. while generating code for a method, one can reach out > 33: * to the class scope to insert fields. > 34: * Suggestion: * *
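
The two type checks in the `DataName.java` predicate above point in opposite directions, which is also what the Javadoc correction for the supertype filter is about. A small self-contained sketch, using `Class.isAssignableFrom` as a stand-in for the framework's `isSubtypeOf` (the `Name` record and both helper methods are hypothetical):

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustration only: Class<?> replaces the framework's DataName.Type.
final class NameFilterSketch {
    record Name(String name, Class<?> type) {}

    // Keeps names whose type is a subtype of `bound`.
    static List<String> subtypeOf(List<Name> names, Class<?> bound) {
        return names.stream()
                    .filter(n -> bound.isAssignableFrom(n.type()))
                    .map(Name::name)
                    .collect(Collectors.toList());
    }

    // Keeps names whose type is a supertype of `bound`.
    static List<String> supertypeOf(List<Name> names, Class<?> bound) {
        return names.stream()
                    .filter(n -> n.type().isAssignableFrom(bound))
                    .map(Name::name)
                    .collect(Collectors.toList());
    }

    static List<Name> sample() {
        return List.of(new Name("o", Object.class),
                       new Name("n", Number.class),
                       new Name("i", Integer.class));
    }
}
```

With `Number` as the bound, the subtype filter keeps `n` and `i`, while the supertype filter keeps `o` and `n`.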

test/hotspot/jtreg/compiler/lib/template_framework/Name.java line 35: > 33: * The name of the name, that can be used in code. > 34: * > 35: * @return The {@String} name of the name, that can be used in code. Suggestion: * @return The {@link String} name of the name, that can be used in code. test/hotspot/jtreg/compiler/lib/template_framework/Name.java line 54: > 52: int weight(); > 53: > 54: public interface Type { Implicitly public: Suggestion: interface Type { test/hotspot/jtreg/compiler/lib/template_framework/NameSet.java line 38: > 36: */ > 37: class NameSet { > 38: static final Random RANDOM = Utils.getRandomInstance(); Suggestion: private static final Random RANDOM = Utils.getRandomInstance(); test/hotspot/jtreg/compiler/lib/template_framework/NameSet.java line 58: > 56: > 57: private long weight(Predicate predicate) { > 58: long w = names.stream().filter(n -> predicate.check(n)).mapToInt(Name::weight).sum(); Suggestion: long w = names.stream().filter(predicate::check).mapToInt(Name::weight).sum(); test/hotspot/jtreg/compiler/lib/template_framework/NameSet.java line 64: > 62: > 63: public int count(Predicate predicate) { > 64: int c = (int)names.stream().filter(n -> predicate.check(n)).count(); Suggestion: int c = (int)names.stream().filter(predicate::check).count(); test/hotspot/jtreg/compiler/lib/template_framework/NameSet.java line 70: > 68: > 69: public boolean hasAny(Predicate predicate) { > 70: return names.stream().anyMatch(n -> predicate.check(n)) || Suggestion: return names.stream().anyMatch(predicate::check) || test/hotspot/jtreg/compiler/lib/template_framework/NameSet.java line 77: > 75: List list = (parent != null) ? 
parent.toList(predicate) > 76: : new ArrayList<>(); > 77: list.addAll(names.stream().filter(n -> predicate.check(n)).toList()); Suggestion: list.addAll(names.stream().filter(predicate::check).toList()); test/hotspot/jtreg/compiler/lib/template_framework/NameSet.java line 88: > 86: if (w <= 0) { > 87: return null; > 88: } Shouldn't the weight always be positive? test/hotspot/jtreg/compiler/lib/template_framework/Renderer.java line 66: > 64: // another non-capturing group. > 65: "(?:\\{" + > 66: // capturing group for "name" inside of "{name}" Suggestion: // capturing group for "name" inside "{name}" test/hotspot/jtreg/compiler/lib/template_framework/Renderer.java line 199: > 197: /** > 198: * Formats values to {@link String} with the goal of using them in Java code. > 199: * By default we use the overrides of {@link Object#toString}. Suggestion: * By default, we use the overrides of {@link Object#toString}. test/hotspot/jtreg/compiler/lib/template_framework/Renderer.java line 266: > 264: case StringToken(String s) -> { > 265: renderStringWithDollarAndHashtagReplacements(s); > 266: } Suggestion: case StringToken(String s) -> renderStringWithDollarAndHashtagReplacements(s); test/hotspot/jtreg/compiler/lib/template_framework/Renderer.java line 321: > 319: callerCodeFrame.addCode(currentCodeFrame.getCode()); > 320: currentCodeFrame = callerCodeFrame; > 321: } For readability: Suggestion: case HookInsertToken(Hook hook, TemplateToken templateToken) -> { // Switch to hook CodeFrame. CodeFrame callerCodeFrame = currentCodeFrame; CodeFrame hookCodeFrame = codeFrameForHook(hook); // Use a transparent nested CodeFrame. We need a CodeFrame so that the code generated // by the TemplateToken can be collected, and hook insertions from it can still // be made to the hookCodeFrame before the code from the TemplateToken is added to // the hookCodeFrame. 
// But the CodeFrame must be transparent, so that its name definitions go out to // the hookCodeFrame, and are not limited to the CodeFrame for the TemplateToken. currentCodeFrame = CodeFrame.makeTransparentForNames(hookCodeFrame); renderTemplateToken(templateToken); hookCodeFrame.addCode(currentCodeFrame.getCode()); // Switch back from hook CodeFrame to caller CodeFrame. currentCodeFrame = callerCodeFrame; } case TemplateToken templateToken -> { // Use a nested CodeFrame. CodeFrame callerCodeFrame = currentCodeFrame; currentCodeFrame = CodeFrame.make(currentCodeFrame); renderTemplateToken(templateToken); callerCodeFrame.addCode(currentCodeFrame.getCode()); currentCodeFrame = callerCodeFrame; } test/hotspot/jtreg/compiler/lib/template_framework/Renderer.java line 324: > 322: case AddNameToken(Name name) -> { > 323: currentCodeFrame.addName(name); > 324: } Suggestion: case AddNameToken(Name name) -> currentCodeFrame.addName(name); test/hotspot/jtreg/compiler/lib/template_framework/Renderer.java line 338: > 336: } > 337: > 338: private void renderStringWithDollarAndHashtagReplacements(String s) { Hard to grasp the logic of that method. But I trust you on that :-) I leave it up to you if you want to improve readability to extract some of the logic to separate methods such that this method becomes easier to understand. test/hotspot/jtreg/compiler/lib/template_framework/StructuralName.java line 33: > 31: * count, list or even sample random {@link StructuralName}s. Every {@link StructuralName} has a {@link StructuralName.Type}, > 32: * so that sampling can be restricted to these types. > 33: * Suggestion: * *

test/hotspot/jtreg/compiler/lib/template_framework/StructuralName.java line 47: > 45: */ > 46: public StructuralName { > 47: } Is this required? Is it not automatically added? Same for `DataName`. test/hotspot/jtreg/compiler/lib/template_framework/StructuralName.java line 68: > 66: */ > 67: boolean isSubtypeOf(StructuralName.Type other); > 68: } This is identical to `DataName.Type`. What is the benefit of having separate interfaces `DataName.Type` and `StructuralName.Type`? Couldn't we just move `isSubtypeOf()` directly to the `Name.Type` interface and use that one below and for the fields and expose that one instead to the user? This would mean that you can update all `DataName/StructuralName.Type` to `Name.Type`. I have not checked if this is fully possible but it just occurred to me when reviewing this duplicated interface now. test/hotspot/jtreg/compiler/lib/template_framework/StructuralName.java line 96: > 94: if (!(name instanceof StructuralName dn)) { return false; } > 95: if (subtype != null && !dn.type().isSubtypeOf(subtype)) { return false; } > 96: if (supertype != null && !supertype.isSubtypeOf(dn.type())) { return false; } Suggestion: if (!(name instanceof StructuralName structuralName)) { return false; } if (subtype != null && !structuralName.type().isSubtypeOf(subtype)) { return false; } if (supertype != null && !supertype.isSubtypeOf(structuralName.type())) { return false; } test/hotspot/jtreg/compiler/lib/template_framework/StructuralName.java line 107: > 105: * @return The filtered {@link View}. > 106: * @throws UnsupportedOperationException If this {@link View} was already filtered with > 107: * {@link subtypeOf} or {@link exactOf}. Same here and in methods below: Suggestion: * {@link #subtypeOf} or {@link #exactOf}. test/hotspot/jtreg/compiler/lib/template_framework/StructuralName.java line 117: > 115: > 116: /** > 117: * Create a filtered {@link View}, where all {@link StructuralName}s must be subtypes of {@code type}. 
Suggestion: * Create a filtered {@link View}, where all {@link StructuralName}s must be supertypes of {@code type}. test/hotspot/jtreg/compiler/lib/template_framework/TemplateBinding.java line 43: > 41: * Creates a new {@link TemplateBinding} that has no Template bound to it yet. > 42: */ > 43: public TemplateBinding() {} Can also be removed since it's the default constructor that is automatically added for you. Suggestion: test/hotspot/jtreg/compiler/lib/template_framework/Token.java line 31: > 29: > 30: /** > 31: * The {@link Template#body} and {@link Hook#set} are given a list of tokens, which are either Suggestion: * The {@link Template#body} and {@link Hook#anchor} are given a list of tokens, which are either test/hotspot/jtreg/compiler/lib/template_framework/Token.java line 74: > 72: case Float s -> outputList.add(new StringToken(Renderer.format(s))); > 73: case Boolean s -> outputList.add(new StringToken(Renderer.format(s))); > 74: case List l -> parseList(l, outputList); Not sure if we should use a raw `List` here. Would `List` work as well? Would then need to update `parseList(List inputList ...)` to `List` as well. test/hotspot/jtreg/compiler/lib/template_framework/library/Hooks.java line 32: > 30: */ > 31: public abstract class Hooks { > 32: private Hooks() {} // Avoid instantiation and need for documentation. With `abstract` you cannot call the constructor. But you could make `Hooks` final instead of abstract and keep the private constructor. 
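
The alternative proposed for `Hooks` (a `final` class with a private constructor instead of an `abstract` one) is the usual shape of a constants holder. A minimal sketch with made-up names and `String` placeholders instead of real `Hook` objects:

```java
// Hypothetical stand-in for a constants-only class; not the framework's Hooks class.
final class HookNamesSketch {
    static final String CLASS_HOOK = "Class";
    static final String METHOD_HOOK = "Method";

    // Private constructor: no instances, and no implicit default constructor.
    private HookNamesSketch() {}
}
```

`final` rules out subclassing, and the private constructor rules out instantiation from outside the class, so the two together cover what `abstract` was meant to prevent.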
------------- PR Review: https://git.openjdk.org/jdk/pull/24217#pullrequestreview-2888138689 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2121190567 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2121189211 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2121045420 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2121041625 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2121043407 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2121047490 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2121166922 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2121167248 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2121170840 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2121066303 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2121094470 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2121065215 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2121100321 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2121117160 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2121184120 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2121002671 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2121005423 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2121050914 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2121052214 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2121054013 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2121054604 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2121054802 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2121074504 PR Review Comment: 
https://git.openjdk.org/jdk/pull/24217#discussion_r2121194420 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2121221834 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2121206168 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2121217757 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2121220490 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2121228275 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2121119033 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2121122026 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2121143144 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2121124204 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2121124996 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2121125964 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2120995640 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2120976577 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2120983233 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2120989645 From epeter at openjdk.org Mon Jun 2 14:08:12 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 2 Jun 2025 14:08:12 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v71] In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 12:27:21 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 91 commits: >> >> - Merge branch 'master' into JDK-8344942-TemplateFramework-v3 >> - validation tests >> - dollar and hashtag parsing validatiaon >> - wip refactor parsing dollar and hashtag >> - more fixes from Christian >> - more improvements >> - more suggestions applied >> - good practice >> - rename template arguments >> - more from Christian >> - ... and 81 more: https://git.openjdk.org/jdk/compare/90d6ad01...cb7037e7 > > test/hotspot/jtreg/compiler/lib/template_framework/TemplateBinding.java line 43: > >> 41: * Creates a new {@link TemplateBinding} that has no Template bound to it yet. >> 42: */ >> 43: public TemplateBinding() {} > > Can also be removed since it's the default constructor that is automatically added for you. > Suggestion: If I do that, then `javadoc` complains: test/hotspot/jtreg/compiler/lib/template_framework/TemplateBinding.java:37: warning: use of default constructor, which does not provide a comment public class TemplateBinding { ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2121254922 From mli at openjdk.org Mon Jun 2 14:13:51 2025 From: mli at openjdk.org (Hamlin Li) Date: Mon, 2 Jun 2025 14:13:51 GMT Subject: RFR: 8357554: Enable vectorization of Bool -> CMove with different type size (on riscv) In-Reply-To: References: Message-ID: On Wed, 28 May 2025 09:20:03 GMT, Emanuel Peter wrote: > @Hamlin-Li Thanks for working on this! @eme64 Sorry for the delayed reply, I've been on vacation. Thank you for having a look! > Can you please provide the the JMH benchmark results for your measurements? Sure, I have the data in https://github.com/openjdk/jdk/pull/25341, I can copy the data here. But it won't impact jmh result until https://github.com/openjdk/jdk/pull/25341 is pushed in. I'll add more jmh test and data for integral types. > It would also be good to have some IR tests, that cover the newly vectorized cases. You're right, will add more IR tests. 
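
For readers unfamiliar with the "different type size" case in the subject line, a loop of roughly the following shape compares 32-bit values but selects between 64-bit values. This is only an illustrative kernel with assumed names; whether C2 turns the ternary into a `CMove` and vectorizes it depends on platform and flags, and nothing here is taken from the PR's tests.

```java
// Illustrative kernel only: float comparison, double selection.
final class CMoveMismatchSketch {
    static void kernel(float[] a, float[] b, double[] c, double[] d, double[] r) {
        for (int i = 0; i < r.length; i++) {
            // The condition works on 4-byte elements, the selected values are 8 bytes wide.
            r[i] = (a[i] > b[i]) ? c[i] : d[i];
        }
    }
}
```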
> src/hotspot/cpu/riscv/matcher_riscv.hpp line 204: > >> 202: static bool supports_vectorize_cmove_bool_unconditionally() { >> 203: return true; >> 204: } > > Does RISCV support the use of any input vector element type, including 8bit, 16bit, 32bit and 64bit masks, and any elements we would be blending, incl `byte, short, char, int, long, HF, F, D`? > > Because it sounds you are promissing this really "unconditionally". Or what exactly do you mean by "unconditionally"? ( In this pr, it should return false for riscv too and be enabled in the riscv pr. I'll modify it. ) > Does RISCV support the use of any input vector element type, including 8bit, 16bit, 32bit and 64bit masks, and any elements we would be blending, incl byte, short, char, int, long, HF, F, D? Good question! I'll add some additional tests to double check and reflect this. I think the answer should be yes, i.e. on riscv all size of source inputs (comparing operands) and all size of dest outputs (blending result) are supported. But for HF, it's a bit special, the underlying payload is a short, so in theory it should be supported too, but it's not supported in this pr and the related riscv pr (https://github.com/openjdk/jdk/pull/25341). > Because it sounds you are promissing this really "unconditionally". Or what exactly do you mean by "unconditionally"? I mean it's really "unconditionally", but if you feel it's better to add an argument, like `supports_vectorize_cmove_bool_unconditionally(BasicType src, BasicType dst)`, I can do it. And I need to modify the `vectornode.cpp` as below too, I'll check it and modify this pr. ``` case Op_CMoveI: return (is_integral_type(bt) && bt != T_LONG ? Op_VectorBlend : 0); > src/hotspot/share/opto/superword.cpp line 2363: > >> 2361: VectorNode::is_vectorize_cmove_bool_unconditionally_supported()) { >> 2362: return true; >> 2363: } > > Can you please list which additional cases this now allows? 
> I suppose `D/F` comparison for the `Bool`, and then `D/F` inputs for `CMove`, but we can mismatch, e.g. compare `F` but blend `D`, right? Sure, I'll add this list. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25336#issuecomment-2930920880 PR Review Comment: https://git.openjdk.org/jdk/pull/25336#discussion_r2121272954 PR Review Comment: https://git.openjdk.org/jdk/pull/25336#discussion_r2121273227 From epeter at openjdk.org Mon Jun 2 14:13:56 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 2 Jun 2025 14:13:56 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v71] In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 12:54:35 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 91 commits: >> >> - Merge branch 'master' into JDK-8344942-TemplateFramework-v3 >> - validation tests >> - dollar and hashtag parsing validatiaon >> - wip refactor parsing dollar and hashtag >> - more fixes from Christian >> - more improvements >> - more suggestions applied >> - good practice >> - rename template arguments >> - more from Christian >> - ... and 81 more: https://git.openjdk.org/jdk/compare/90d6ad01...cb7037e7 > > test/hotspot/jtreg/compiler/lib/template_framework/NameSet.java line 88: > >> 86: if (w <= 0) { >> 87: return null; >> 88: } > > Shouldn't the weight always be positive? Yes. True. I sometimes just also cover negative values to be a bit more robust... but I can also change it if you prefer that. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2121271395 From epeter at openjdk.org Mon Jun 2 14:13:52 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 2 Jun 2025 14:13:52 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v72] In-Reply-To: References: Message-ID: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. 
for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. > > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. 
> - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Apply suggestions from code review Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/cb7037e7..d8f66250 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=71 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=70-71 Stats: 19 lines in 4 files changed: 0 ins; 1 del; 18 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From epeter at openjdk.org Mon Jun 2 14:21:27 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 2 Jun 2025 14:21:27 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v73] In-Reply-To: References: Message-ID: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). 
> > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. > > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. 
> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Apply suggestions from code review Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/d8f66250..30059e66 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=72 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=71-72 Stats: 23 lines in 5 files changed: 3 ins; 1 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From epeter at openjdk.org Mon Jun 2 14:21:32 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 2 Jun 2025 14:21:32 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v71] In-Reply-To: References: Message-ID: <5HBkwD7E-Kr7zr4jRiRrO_uFxE4gWwJYqw1XcKsFCPY=.756231da-cc4b-4db8-85d7-9db17894810e@github.com> On Mon, 2 Jun 2025 13:45:07 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 91 commits: >> >> - Merge branch 'master' into JDK-8344942-TemplateFramework-v3 >> - validation tests >> - dollar and hashtag parsing validatiaon >> - wip refactor parsing dollar and hashtag >> - more fixes from Christian >> - more improvements >> - more suggestions applied >> - good practice >> - rename template arguments >> - more from Christian >> - ... 
and 81 more: https://git.openjdk.org/jdk/compare/90d6ad01...cb7037e7 > > test/hotspot/jtreg/compiler/lib/template_framework/Renderer.java line 266: > >> 264: case StringToken(String s) -> { >> 265: renderStringWithDollarAndHashtagReplacements(s); >> 266: } > > Suggestion: > > case StringToken(String s) -> renderStringWithDollarAndHashtagReplacements(s); I think I prefer the uniformity of the brackets as I have it. Would that be ok for you too? > test/hotspot/jtreg/compiler/lib/template_framework/Renderer.java line 324: > >> 322: case AddNameToken(Name name) -> { >> 323: currentCodeFrame.addName(name); >> 324: } > > Suggestion: > > case AddNameToken(Name name) -> currentCodeFrame.addName(name); Like above: I like the uniformity of the brackets here. Is that ok for you to keep as is? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2121288248 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2121290027 From epeter at openjdk.org Mon Jun 2 14:24:16 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 2 Jun 2025 14:24:16 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v71] In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 12:21:42 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 91 commits: >> >> - Merge branch 'master' into JDK-8344942-TemplateFramework-v3 >> - validation tests >> - dollar and hashtag parsing validatiaon >> - wip refactor parsing dollar and hashtag >> - more fixes from Christian >> - more improvements >> - more suggestions applied >> - good practice >> - rename template arguments >> - more from Christian >> - ... 
and 81 more: https://git.openjdk.org/jdk/compare/90d6ad01...cb7037e7 > > test/hotspot/jtreg/compiler/lib/template_framework/Token.java line 74: > >> 72: case Float s -> outputList.add(new StringToken(Renderer.format(s))); >> 73: case Boolean s -> outputList.add(new StringToken(Renderer.format(s))); >> 74: case List l -> parseList(l, outputList); > > Not sure if we should use a raw `List` here. Would `List` work as well? Would then need to update `parseList(List inputList ...)` to `List` as well. What exactly do you think is the problem here? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2121300491 From shade at openjdk.org Mon Jun 2 15:00:05 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 2 Jun 2025 15:00:05 GMT Subject: RFR: 8357473: Compilation spike leaves many CompileTasks in free list [v3] In-Reply-To: References: Message-ID: > See bug for more discussion. > > This PR implements the "all the way" solution by removing the free list completely. It complements https://github.com/openjdk/jdk/pull/25364, and can go either first, or second. We will remerge the other one once either integrates. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler` > - [ ] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: - Merge branch 'master' into JDK-8357473-compile-task-free-list - Merge branch 'master' into JDK-8357473-compile-task-free-list - Also free the lock! 
- Comments and indenting - Basic deletion ------------- Changes: https://git.openjdk.org/jdk/pull/25409/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25409&range=02 Stats: 105 lines in 4 files changed: 13 ins; 57 del; 35 mod Patch: https://git.openjdk.org/jdk/pull/25409.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25409/head:pull/25409 PR: https://git.openjdk.org/jdk/pull/25409 From shade at openjdk.org Mon Jun 2 15:00:06 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 2 Jun 2025 15:00:06 GMT Subject: RFR: 8357473: Compilation spike leaves many CompileTasks in free list [v2] In-Reply-To: References: Message-ID: On Fri, 23 May 2025 17:34:36 GMT, Aleksey Shipilev wrote: >> See bug for more discussion. >> >> This PR implements the "all the way" solution by removing the free list completely. It complements https://github.com/openjdk/jdk/pull/25364, and can go either first, or second. We will remerge the other one once either integrates. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `compiler` >> - [ ] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: > > - Merge branch 'master' into JDK-8357473-compile-task-free-list > - Also free the lock! > - Comments and indenting > - Basic deletion Re-merged with current master. Running tests now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25409#issuecomment-2931134142 From dfenacci at openjdk.org Mon Jun 2 15:13:29 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 2 Jun 2025 15:13:29 GMT Subject: RFR: 8358129: compiler/startup/StartupOutput.java runs into out of memory on Windows after JDK-8347406 Message-ID: The test `compiler/startup/StartupOutput.java` starts **200 VMs in a loop** , this can lead to resource shortages on some (Windows) machines. 
There is no real need to run those VMs concurrently (their run is short and basically check that the VM doesn't crash giving limited code cache). Running them **sequentially** should be OK and should avoid running out of memory. Testing: Tier1-3+ ------------- Commit messages: - JDK-8358129: remove compiler/startup/StartupOutput.java from ProblemList - Merge branch 'master' into JDK-8358129 - JDK-8358129: compiler/startup/StartupOutput.java runs into out of memory on Windows after JDK-8347406 Changes: https://git.openjdk.org/jdk/pull/25582/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25582&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8358129 Stats: 11 lines in 2 files changed: 2 ins; 8 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25582.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25582/head:pull/25582 PR: https://git.openjdk.org/jdk/pull/25582 From shade at openjdk.org Mon Jun 2 15:39:54 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 2 Jun 2025 15:39:54 GMT Subject: RFR: 8356000: C1/C2-only modes use 2 compiler threads on low CPU count machines [v3] In-Reply-To: References: Message-ID: On Wed, 28 May 2025 18:05:12 GMT, Aleksey Shipilev wrote: >> There is an unfortunate limitation with default tiered policy that we would have at least 2 threads on 1 CPU machine: 1 thread for C1, and 1 thread for C2. >> >> But if we select C1-only or C2-only modes, we _also_ get 2 compiler threads, for which we have no good reason. These threads would just step on each other toes. The fix changes the behavior for 1..3 CPU hosts in C1/C2-only configurations, by using 1 thread instead of 2 threads. The change for 1 CPU config is what we really need. The change in 2..3 CPU configs is an additional effect, but I think it is still good not to use 100%/66% of the CPUs in those configurations as well. 
>> >> >> $ for I in `seq 1 8`; do build/linux-x86_64-server-release/images/jdk/bin/java \ >> -XX:-TieredCompilation -XX:ActiveProcessorCount=${I} \ >> -XX:+PrintFlagsFinal 2>&1 | grep "CICompilerCount "; done >> >> # Before >> intx CICompilerCount = 2 >> intx CICompilerCount = 2 >> intx CICompilerCount = 2 >> intx CICompilerCount = 3 >> intx CICompilerCount = 3 >> intx CICompilerCount = 3 >> intx CICompilerCount = 3 >> intx CICompilerCount = 4 >> >> # After >> intx CICompilerCount = 1 >> intx CICompilerCount = 1 >> intx CICompilerCount = 1 >> intx CICompilerCount = 3 >> intx CICompilerCount = 3 >> intx CICompilerCount = 3 >> intx CICompilerCount = 3 >> intx CICompilerCount = 4 >> >> >> It is a minor bug in `CompilationPolicy::initialize`, but it gets in the way studying Leyden in tight CPU scenarios. >> >> Additional testing: >> - [x] New regression test passes with the fix, fails without it >> - [x] GHA > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - Merge branch 'master' into JDK-8356000-c1-c2-compiler-count > - Better test, patch amendments > - Merge branch 'master' into JDK-8356000-c1-c2-compiler-count > - Merge branch 'master' into JDK-8356000-c1-c2-compiler-count > - Unnecessary arch limitation > - Simplify test > - Adjust test bound > - Fix Any further comments / testing? Thanks. 
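For readers who want to play with the numbers in the quoted before/after table, here is a minimal sketch in Java. It assumes a "log n * log log n" style heuristic with a lower bound, which happens to reproduce both columns of the table for 1 to 8 CPUs. The class name, method names, and the `allowSingleThread` parameter are hypothetical and are not taken from the patch or from `CompilationPolicy::initialize`.

```java
public class CompilerCountSketch {
    // floor(log2(x)) for x >= 1
    static int log2i(int x) {
        return 31 - Integer.numberOfLeadingZeros(x);
    }

    // Assumed heuristic: log(cpus) * log(log(cpus)) * 3 / 2, clamped from below.
    // allowSingleThread = false models the "Before" column (minimum of 2 threads),
    // allowSingleThread = true models the "After" column for C1-only/C2-only modes.
    static int compilerCount(int cpus, boolean allowSingleThread) {
        int logCpu = log2i(cpus);
        int logLogCpu = log2i(Math.max(logCpu, 1));
        int minCount = allowSingleThread ? 1 : 2;
        return Math.max(logCpu * logLogCpu * 3 / 2, minCount);
    }

    public static void main(String[] args) {
        for (int cpus = 1; cpus <= 8; cpus++) {
            System.out.println(cpus + " CPU(s): before=" + compilerCount(cpus, false)
                    + " after=" + compilerCount(cpus, true));
        }
    }
}
```

Under these assumptions only the 1..3 CPU rows change, which matches the behavior described in the quoted PR text.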
------------- PR Comment: https://git.openjdk.org/jdk/pull/24972#issuecomment-2931292775 From yzheng at openjdk.org Mon Jun 2 15:58:25 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Mon, 2 Jun 2025 15:58:25 GMT Subject: RFR: 8358333: Use VEX2 prefix in Assembler::psllq Message-ID: While porting the commit https://github.com/openjdk/jdk/commit/0df8c9684b8782ef830e2bd425217864c3f51784 to Graal, I noticed that the Assembler::psllq instruction is using the VEX3 prefix. This results in the instruction being unrecognizable by my outdated version of hsdis. Currently, HotSpot generates the following bytes for vpsllq xmm7, xmm7, 0x34 https://github.com/openjdk/jdk/blob/0df8c9684b8782ef830e2bd425217864c3f51784/src/hotspot/cpu/x86/stubGenerator_x86_64_cbrt.cpp#L255 c4 e1 c1 73 f7 34 By setting the rex_w to WIG, the emitted bytes are: c5 c1 73 f7 34 ------------- Commit messages: - Use VEX2 prefix in Assembler::psllq Changes: https://git.openjdk.org/jdk/pull/25593/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25593&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8358333 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25593.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25593/head:pull/25593 PR: https://git.openjdk.org/jdk/pull/25593 From yzheng at openjdk.org Mon Jun 2 16:04:57 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Mon, 2 Jun 2025 16:04:57 GMT Subject: RFR: 8358333: Use VEX2 prefix in Assembler::psllq In-Reply-To: References: Message-ID: <6_9L_DiGyVYdZqzwGTLMKyTAUURjRwpwHvQYEgBMZVo=.3e7ac003-9721-43e0-b0cf-4ed89d67d431@github.com> On Mon, 2 Jun 2025 15:53:17 GMT, Yudi Zheng wrote: > While porting the commit https://github.com/openjdk/jdk/commit/0df8c9684b8782ef830e2bd425217864c3f51784 to Graal, I noticed that the Assembler::psllq instruction is using the VEX3 prefix. This results in the instruction being unrecognizable by my outdated version of hsdis. 
Currently, HotSpot generates the following bytes for vpsllq xmm7, xmm7, 0x34 > https://github.com/openjdk/jdk/blob/0df8c9684b8782ef830e2bd425217864c3f51784/src/hotspot/cpu/x86/stubGenerator_x86_64_cbrt.cpp#L255 > > > c4 e1 c1 73 f7 34 > > > By setting the rex_w to WIG, the emitted bytes are: > > > c5 c1 73 f7 34 @jatin-bhateja could you please review this trivial PR? Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25593#issuecomment-2931397163 From galder at openjdk.org Mon Jun 2 16:40:51 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Mon, 2 Jun 2025 16:40:51 GMT Subject: RFR: 8358129: compiler/startup/StartupOutput.java runs into out of memory on Windows after JDK-8347406 In-Reply-To: References: Message-ID: <-9AT4ja1WZHf_xLO6Uzl90zPJKG-KOHTyUG075CTxHE=.be43593a-ff3b-4e33-8a63-f1d02cce8836@github.com> On Mon, 2 Jun 2025 10:57:22 GMT, Damon Fenacci wrote: > The test `compiler/startup/StartupOutput.java` starts **200 VMs in a loop** , this can lead to resource shortages on some (Windows) machines. > > There is no real need to run those VMs concurrently (their run is short and basically check that the VM doesn't crash giving limited code cache). > > Running them **sequentially** should be OK and should avoid running out of memory. > > Testing: Tier1-3+ What impact does this change have on the time the test takes to run? If it turns out to be too slow, maybe the processes could be run in batches?
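The batching idea raised in the question above could look roughly like the following sketch. It is not code from the PR or from `StartupOutput.java`: the class `BatchRunnerSketch` and the method `runInBatches` are hypothetical, and generic `Callable` tasks stand in for whatever launches each VM. At most `batchSize` tasks run at once, and the next batch starts only after the current one has finished.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BatchRunnerSketch {
    // Runs the tasks in consecutive batches of at most batchSize concurrent tasks
    // and returns the results in the original task order.
    static <T> List<T> runInBatches(List<Callable<T>> tasks, int batchSize) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(batchSize);
        List<T> results = new ArrayList<>();
        try {
            for (int start = 0; start < tasks.size(); start += batchSize) {
                int end = Math.min(start + batchSize, tasks.size());
                // invokeAll blocks until every task in this batch has completed.
                List<Future<T>> batch = pool.invokeAll(tasks.subList(start, end));
                for (Future<T> future : batch) {
                    results.add(future.get()); // rethrows a task failure, if any
                }
            }
        } finally {
            pool.shutdown();
        }
        return results;
    }
}
```

A batch size of 1 degenerates to the sequential behavior proposed in the PR, so the batch size would be the only knob trading peak resource use against total run time.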
------------- PR Review: https://git.openjdk.org/jdk/pull/25582#pullrequestreview-2889190186 From dnsimon at openjdk.org Mon Jun 2 16:42:35 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 2 Jun 2025 16:42:35 GMT Subject: RFR: 8357619: [JVMCI] Revisit phantom_ref parameter in JVMCINMethodData::get_nmethod_mirror [v2] In-Reply-To: <37LbN00VRPqAt9LN8jx43xx3QGsF6jnPFS_OQLUa-0U=.687f6afe-d13a-4d03-af0c-ac91a9862b13@github.com> References: <37LbN00VRPqAt9LN8jx43xx3QGsF6jnPFS_OQLUa-0U=.687f6afe-d13a-4d03-af0c-ac91a9862b13@github.com> Message-ID: > The point of the `phantom_ref` parameter (introduced by [JDK-8234359](https://bugs.openjdk.org/browse/JDK-8234359)) of `JVMCINMethodData::get_nmethod_mirror` is to avoid the special resurrection semantics of a phantom read when reading the field during GC, which is when `JVMCINMethodData::invalidate_nmethod_mirror` can be called. > This case can be handled directly in `JVMCINMethodData::invalidate_nmethod_mirror` and so the `phantom_ref` parameter can be removed. 
Doug Simon has updated the pull request incrementally with one additional commit since the last revision: still rely on get_nmethod_mirror in invalidate_nmethod_mirror ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25488/files - new: https://git.openjdk.org/jdk/pull/25488/files/0dcb1c78..7456988a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25488&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25488&range=00-01 Stats: 6 lines in 1 file changed: 3 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25488.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25488/head:pull/25488 PR: https://git.openjdk.org/jdk/pull/25488 From never at openjdk.org Mon Jun 2 16:42:35 2025 From: never at openjdk.org (Tom Rodriguez) Date: Mon, 2 Jun 2025 16:42:35 GMT Subject: RFR: 8357619: [JVMCI] Revisit phantom_ref parameter in JVMCINMethodData::get_nmethod_mirror [v2] In-Reply-To: References: <37LbN00VRPqAt9LN8jx43xx3QGsF6jnPFS_OQLUa-0U=.687f6afe-d13a-4d03-af0c-ac91a9862b13@github.com> Message-ID: On Mon, 2 Jun 2025 16:39:42 GMT, Doug Simon wrote: >> The point of the `phantom_ref` parameter (introduced by [JDK-8234359](https://bugs.openjdk.org/browse/JDK-8234359)) of `JVMCINMethodData::get_nmethod_mirror` is to avoid the special resurrection semantics of a phantom read when reading the field during GC, which is when `JVMCINMethodData::invalidate_nmethod_mirror` can be called. >> This case can be handled directly in `JVMCINMethodData::invalidate_nmethod_mirror` and so the `phantom_ref` parameter can be removed. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > still rely on get_nmethod_mirror in invalidate_nmethod_mirror Marked as reviewed by never (Reviewer). 
src/hotspot/share/jvmci/jvmciRuntime.cpp line 807: > 805: oop nmethod_mirror = get_nmethod_mirror(nm, /* phantom_ref */ false); > 806: if (nmethod_mirror == nullptr) { > 807: return; minor typo ------------- PR Review: https://git.openjdk.org/jdk/pull/25488#pullrequestreview-2889188965 PR Review Comment: https://git.openjdk.org/jdk/pull/25488#discussion_r2121662248 From dnsimon at openjdk.org Mon Jun 2 16:42:36 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 2 Jun 2025 16:42:36 GMT Subject: RFR: 8357619: [JVMCI] Revisit phantom_ref parameter in JVMCINMethodData::get_nmethod_mirror [v2] In-Reply-To: References: <37LbN00VRPqAt9LN8jx43xx3QGsF6jnPFS_OQLUa-0U=.687f6afe-d13a-4d03-af0c-ac91a9862b13@github.com> Message-ID: <_hhDZ4c8Kic5tPWMwNHv_G21vpCK2eKjTFJQOqW_nEw=.1c90159f-ee23-441c-b72f-283bd2b35c80@github.com> On Fri, 30 May 2025 16:05:23 GMT, Tom Rodriguez wrote: >> Doug Simon has updated the pull request incrementally with one additional commit since the last revision: >> >> still rely on get_nmethod_mirror in invalidate_nmethod_mirror > > src/hotspot/share/jvmci/jvmciRuntime.cpp line 801: > >> 799: >> 800: void JVMCINMethodData::invalidate_nmethod_mirror(nmethod* nm) { >> 801: if (_nmethod_mirror_index == -1) { > > This part is actually wrong as that's the first part of `get_nmethod_mirror` and we must always check that `get_nmethod_mirror` doesn't return nullptr. I'd assumed that the mirror was always non-null if `_nmethod_mirror_index != -1` but that's not true. The slot is reserved for all non-default nmethods and must stay around so that `translate` can work. 
Fixed: https://github.com/openjdk/jdk/pull/25488/commits/7456988a6fcab00bf13e602553f9e5a295d75b0f ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25488#discussion_r2121658848 From dnsimon at openjdk.org Mon Jun 2 16:48:16 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 2 Jun 2025 16:48:16 GMT Subject: RFR: 8357619: [JVMCI] Revisit phantom_ref parameter in JVMCINMethodData::get_nmethod_mirror [v3] In-Reply-To: <37LbN00VRPqAt9LN8jx43xx3QGsF6jnPFS_OQLUa-0U=.687f6afe-d13a-4d03-af0c-ac91a9862b13@github.com> References: <37LbN00VRPqAt9LN8jx43xx3QGsF6jnPFS_OQLUa-0U=.687f6afe-d13a-4d03-af0c-ac91a9862b13@github.com> Message-ID: > The point of the `phantom_ref` parameter (introduced by [JDK-8234359](https://bugs.openjdk.org/browse/JDK-8234359)) of `JVMCINMethodData::get_nmethod_mirror` is to avoid the special resurrection semantics of a phantom read when reading the field during GC, which is when `JVMCINMethodData::invalidate_nmethod_mirror` can be called. > This case can be handled directly in `JVMCINMethodData::invalidate_nmethod_mirror` and so the `phantom_ref` parameter can be removed. 
Doug Simon has updated the pull request incrementally with one additional commit since the last revision: fixed typo ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25488/files - new: https://git.openjdk.org/jdk/pull/25488/files/7456988a..e02be82c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25488&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25488&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25488.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25488/head:pull/25488 PR: https://git.openjdk.org/jdk/pull/25488 From eosterlund at openjdk.org Mon Jun 2 16:59:57 2025 From: eosterlund at openjdk.org (Erik Österlund) Date: Mon, 2 Jun 2025 16:59:57 GMT Subject: RFR: 8357619: [JVMCI] Revisit phantom_ref parameter in JVMCINMethodData::get_nmethod_mirror [v3] In-Reply-To: References: <37LbN00VRPqAt9LN8jx43xx3QGsF6jnPFS_OQLUa-0U=.687f6afe-d13a-4d03-af0c-ac91a9862b13@github.com> Message-ID: On Mon, 2 Jun 2025 16:48:16 GMT, Doug Simon wrote: >> The point of the `phantom_ref` parameter (introduced by [JDK-8234359](https://bugs.openjdk.org/browse/JDK-8234359)) of `JVMCINMethodData::get_nmethod_mirror` is to avoid the special resurrection semantics of a phantom read when reading the field during GC, which is when `JVMCINMethodData::invalidate_nmethod_mirror` can be called. >> This case can be handled directly in `JVMCINMethodData::invalidate_nmethod_mirror` and so the `phantom_ref` parameter can be removed. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > fixed typo Marked as reviewed by eosterlund (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/25488#pullrequestreview-2889256264 From eastigeevich at openjdk.org Mon Jun 2 17:04:06 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Mon, 2 Jun 2025 17:04:06 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v19] In-Reply-To: References: <17al0aeFhm0iZHoHHGiqB03RfPeSrIHIoZuapOHPuy4=.a2ff2d67-392b-40f0-b6d9-6e3a7f396e8a@github.com> Message-ID: On Fri, 30 May 2025 19:25:51 GMT, Tom Rodriguez wrote: > So this copying keeps the same compile_id, which sort of makes sense but it's also potentially confusing. What's the plan for how this interacts with flags like PrintNMethods and JVMTI code installation notification? This is done in nmethod::post_compiled_method which doesn't seem to be used on the new nmethod. If the reclamation of the old nmethod is performed in the normal fashion, we now have 2 nmethods alive with the same compile_id which could be confusing. But allocating a new compile_id breaks the connection to the original compile which seems bad too. As we are not compiling, `compile_id` should stay the same. Yes, we need to add some logging: `log_info(codecache)` and `PrintNMethods`. According to https://docs.oracle.com/en/java/javase/24/docs/specs/jvmti.html#CompiledMethodLoad, compiled methods can be moved. We need to generate events when that happens: >If it is moved, the [CompiledMethodUnload](https://docs.oracle.com/en/java/javase/24/docs/specs/jvmti.html#CompiledMethodUnload) event is sent, followed by a new CompiledMethodLoad event. > we now have 2 nmethods alive with the same compile_id which could be confusing. If `compile_id` is interpreted as the id of an nmethod, it is confusing. See the comment on `nmethod::_compile_id`: https://github.com/openjdk/jdk/blob/aea2837143289800cfbb7044de4f105e87e233ff/src/hotspot/share/code/nmethod.hpp#L259 According to it, it is the id of a compilation task. In that case there should be no confusion.
------------- PR Comment: https://git.openjdk.org/jdk/pull/23573#issuecomment-2931612988 From duke at openjdk.org Mon Jun 2 17:41:00 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Mon, 2 Jun 2025 17:41:00 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v18] In-Reply-To: <62uTtu5i-RDdM1Lnk0i_2JXoNdbJzcn4CBXdCGBU3B0=.48748b12-0871-46b3-9754-b42943fdbad5@github.com> References: <2HApmZeeYmB9G5gttb7G9zKLyTMSQXwrXODoYgvYmQM=.743583e2-7918-4900-9dbd-7223917cf310@github.com> <0tmWzYMOS7jyjgoJL0mBMRywf6mCEBkSTQ7jdRE7Xtg=.5857550c-35e0-4cbb-8bd8-0542ae1b70a5@github.com> <62uTtu5i-RDdM1Lnk0i_2JXoNdbJzcn4CBXdCGBU3B0=.48748b12-0871-46b3-9754-b42943fdbad5@github.com> Message-ID: <5iiKPx9fgyc5pvJIzUggaL1XEPeijmRPqsMIC1MGa48=.9f380d7c-9504-4921-8daf-1543fc5b1cba@github.com> On Thu, 29 May 2025 11:24:46 GMT, Evgeny Astigeevich wrote: >> If we want to guarantee that a trampoline exists if `Assembler::reachable_from_branch_at` fails we would need to update Graal to use the check as well > >> The null trampoline check is needed because on debug builds branches of distance >2M will fall into the if (!Assembler::reachable_from_branch_at(addr(), x)) block but Graal would not have generated a trampoline for that call because it is still <128M. It is still safe to use that distance but it is just different than what HotSpot expects > > This logic looks strange to me. > You are saying that a trampoline is only null in case of Graal but dest is always valid in this case. > This is a bug in Graal: it always uses 128M branch range despite Hotspot can change the range to smaller values in debug builds. > When Graal fixes the bug you will have undefined behaviour in this place. > We must handle the situation where no trampoline is available. > Options: > 1. This is a bug in code generation. If the bug can be easy to reproduce with debug builds, use assert. If no, use guarantee. > 2. This is an expected case. We need to generate a trampoline. 
This can be complicated. > > I think it's a bug situation. Updated the code to have a guarantee check here instead. Filed the following which will fix Graal to use the same range check https://github.com/oracle/graal/issues/11291 https://bugs.openjdk.org/browse/JDK-8358096 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2121784842 From duke at openjdk.org Mon Jun 2 17:48:00 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Mon, 2 Jun 2025 17:48:00 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v21] In-Reply-To: References: Message-ID: On Sat, 31 May 2025 20:57:40 GMT, Evgeny Astigeevich wrote: >> Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: >> >> Change to ImmutableDataReferences > > test/lib/jdk/test/whitebox/WhiteBox.java line 498: > >> 496: relocateNMethodFromMethod0(method, type); >> 497: } >> 498: public native void relocateNMethodFromAddr0(long address, int type); > > Why does the name have '0' at the end? I noticed that was a common trend with JNI functions. There is no `relocateNMethodFromAddr` so I suppose the '0' isn't necessary in this case ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2121796964 From jbhateja at openjdk.org Mon Jun 2 17:58:09 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 2 Jun 2025 17:58:09 GMT Subject: RFR: 8350896: Integer/Long.compress gets wrong type from CompressBitsNode::Value [v9] In-Reply-To: References: Message-ID: > Hi All, > > This bugfix patch fixes incorrect value computation for Integer/Long. compress APIs. > > Problems occur with a constant input and variable mask where the input's value is equal to the lower bound of the mask value., In this case, an erroneous value range estimation results in a constant value. 
Existing value routine first attempts to constant fold the compression operation if both input and compression mask are constant values; otherwise, it attempts to constrain the value range of result based on the upper and lower bounds of mask type. > > New IR test covers the issue reported in the bug report along with a case for value range based logic pruning. > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Fix aarch64 failure ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23947/files - new: https://git.openjdk.org/jdk/pull/23947/files/4065fb9c..96ecbac1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23947&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23947&range=07-08 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23947.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23947/head:pull/23947 PR: https://git.openjdk.org/jdk/pull/23947 From asmehra at openjdk.org Mon Jun 2 18:37:02 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Mon, 2 Jun 2025 18:37:02 GMT Subject: RFR: 8358330: AsmRemarks and DbgStrings clear() method may not get called before their destructor Message-ID: This patch fixes a possible assert in debug builds if the allocation of memory for a CodeBlob fails when loading it from the AOT Code Cache. 
------------- Commit messages: - 8358330: AsmRemarks and DbgStrings clear() method may not get called before their destructor Changes: https://git.openjdk.org/jdk/pull/25598/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25598&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8358330 Stats: 17 lines in 3 files changed: 6 ins; 9 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25598.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25598/head:pull/25598 PR: https://git.openjdk.org/jdk/pull/25598 From jbhateja at openjdk.org Mon Jun 2 18:45:31 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 2 Jun 2025 18:45:31 GMT Subject: RFR: 8351645: C2: ExpandBitsNode::Ideal hits assert because of TOP input Message-ID: This bugfix patch adds the missing safe type access checks in the Expand/Compress Ideal transforms. The test mentioned in the bug report has been included along with the patch. Kindly review. Best Regards, Jatin ------------- Commit messages: - 8351645: C2: ExpandBitsNode::Ideal hits assert because of TOP input Changes: https://git.openjdk.org/jdk/pull/25586/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25586&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8351645 Stats: 72 lines in 2 files changed: 70 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25586.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25586/head:pull/25586 PR: https://git.openjdk.org/jdk/pull/25586 From chagedorn at openjdk.org Mon Jun 2 18:54:12 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 2 Jun 2025 18:54:12 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v71] In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 14:10:29 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/lib/template_framework/NameSet.java line 88: >> >>> 86: if (w <= 0) { >>> 87: return null; >>> 88: } >> >> Shouldn't the weight always be positive? > > Yes. True.
I sometimes just also cover negative values to be a bit more robust... but I can also change it if you prefer that. I guess if it's never negative, you can still cover it but maybe throw an exception instead? >> test/hotspot/jtreg/compiler/lib/template_framework/Renderer.java line 266: >> >>> 264: case StringToken(String s) -> { >>> 265: renderStringWithDollarAndHashtagReplacements(s); >>> 266: } >> >> Suggestion: >> >> case StringToken(String s) -> renderStringWithDollarAndHashtagReplacements(s); > > I think I prefer the uniformity of the brackets as I have it. Would that be ok for you too? I don't have a strong preference, so I'm fine with it. >> test/hotspot/jtreg/compiler/lib/template_framework/TemplateBinding.java line 43: >> >>> 41: * Creates a new {@link TemplateBinding} that has no Template bound to it yet. >>> 42: */ >>> 43: public TemplateBinding() {} >> >> Can also be removed since it's the default constructor that is automatically added for you. >> Suggestion: > > If I do that, then `javadoc` complains: > > test/hotspot/jtreg/compiler/lib/template_framework/TemplateBinding.java:37: warning: use of default constructor, which does not provide a comment > public class TemplateBinding { Ah I see. Then you can leave it in. >> test/hotspot/jtreg/compiler/lib/template_framework/Token.java line 74: >> >>> 72: case Float s -> outputList.add(new StringToken(Renderer.format(s))); >>> 73: case Boolean s -> outputList.add(new StringToken(Renderer.format(s))); >>> 74: case List l -> parseList(l, outputList); >> >> Not sure if we should use a raw `List` here. Would `List<?>` work as well? Would then need to update `parseList(List inputList ...)` to `List<?>` as well. > > What exactly do you think is the problem here? My IDE advises against matching on the raw type `List`. As an alternative you can match on `List<?>`.
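For readers following the raw-type remark, here is a minimal standalone sketch of the difference. It is not the Template Framework code: `TokenSketch`, `parse`, `parseList`, and `flatten` are hypothetical names chosen for illustration, and the sketch assumes Java 21 or later for pattern matching in `switch`. Matching on the wildcard type `List<?>` accepts the same objects at run time as matching on raw `List`, but keeps the element type explicit as "unknown" so no raw-type warning is issued.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the Template Framework's Token class.
class TokenSketch {
    // Converts one value into output strings, recursing into nested lists.
    static void parse(Object o, List<String> out) {
        switch (o) {
            case Integer i -> out.add("int:" + i);
            case String s  -> out.add("str:" + s);
            // Wildcard pattern instead of the raw type `List`.
            case List<?> l -> parseList(l, out);
            default -> throw new IllegalArgumentException("unsupported: " + o);
        }
    }

    // Taking List<?> here lets the wildcard-typed binding be passed through as-is.
    static void parseList(List<?> in, List<String> out) {
        for (Object e : in) {
            parse(e, out);
        }
    }

    static List<String> flatten(Object o) {
        List<String> out = new ArrayList<>();
        parse(o, out);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(flatten(List.of(1, "a", List.of(2, List.of("b")))));
    }
}
```

The helper signature changes together with the pattern, which mirrors the point that `parseList` would need the same update.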
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2121921123 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2121924453 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2121918498 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2121917521 From duke at openjdk.org Mon Jun 2 19:12:19 2025 From: duke at openjdk.org (Tom Shull) Date: Mon, 2 Jun 2025 19:12:19 GMT Subject: RFR: 8357660: [JVMCI] Add support for retrieving all BootstrapMethodInvocations directly from ConstantPool [v3] In-Reply-To: References: Message-ID: <23ZmqystynFesmzLG1GXybdGxQSChcwZQt4cM__DbYw=.ae8a7c39-7fc2-4e27-82a3-6f4e879debd9@github.com> > This PR adds support for directly retrieving both all invokedynamic and all condy BootstrapMethodInvocations from a ConstantPool via the new method `List lookupBootstrapMethodInvocations(boolean invokeDynamic)`. > > In addition, two methods are added to the BootstrapMethodInvocations: > 1. `void resolve()` > 2. `JavaConstant lookup()` > > The combination of these two features allows one to directly interact with all BSM information of a given ConstantPool without having to iterate through all of the Classfile's methods to find all invokedynamic bytecodes and/or iterate through all Constant Pool entries. 
Tom Shull has updated the pull request incrementally with two additional commits since the last revision: - commit to trigger testing - commit to trigger testing ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25420/files - new: https://git.openjdk.org/jdk/pull/25420/files/60c39b5e..4d508fc4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25420&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25420&range=01-02 Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25420.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25420/head:pull/25420 PR: https://git.openjdk.org/jdk/pull/25420 From asmehra at openjdk.org Mon Jun 2 19:20:51 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Mon, 2 Jun 2025 19:20:51 GMT Subject: RFR: 8358330: AsmRemarks and DbgStrings clear() method may not get called before their destructor In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 18:32:12 GMT, Ashutosh Mehra wrote: > This patch fixes a possible assert in debug builds if the allocation of memory for a CodeBlob fails when loading it from the AOT Code Cache. See description of [JDK-8358330](https://bugs.openjdk.org/browse/JDK-8358330) for more details. @vnkozlov can you please review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25598#issuecomment-2932112060 From duke at openjdk.org Mon Jun 2 20:36:37 2025 From: duke at openjdk.org (Tom Shull) Date: Mon, 2 Jun 2025 20:36:37 GMT Subject: RFR: 8357987: [JVMCI] Add support for retrieving all methods of a ResolvedJavaType [v3] In-Reply-To: References: Message-ID: <7qRH8PFpSXJTshHNvxEMqEbc34N5wSnpknQaMUbWrCg=.6de4f71a-8c22-492f-b156-b25a07f3b428@github.com> > Currently from ResolvedJavaType one can retrieve all declared methods, static methods, and constructors of the given type. However, internally in HotSpot there are also VM-internal methods, such as overpass methods, associated with a given type which we cannot access via the API. 
> > To correct this, we should add a new method which enables VM-internal methods, such as overpass methods, to be accessed. Tom Shull has updated the pull request incrementally with one additional commit since the last revision: return List.of() from getAllMethods ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25498/files - new: https://git.openjdk.org/jdk/pull/25498/files/0de1feae..ae81d46f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25498&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25498&range=01-02 Stats: 5 lines in 1 file changed: 0 ins; 4 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25498.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25498/head:pull/25498 PR: https://git.openjdk.org/jdk/pull/25498 From duke at openjdk.org Mon Jun 2 21:08:50 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Mon, 2 Jun 2025 21:08:50 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v22] In-Reply-To: References: Message-ID: > This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). > > When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. > > This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. 
Chad Rakoczy has updated the pull request incrementally with six additional commits since the last revision: - Update ImmutableDataReferences - Update nm valid check - Remove 0 from relocateNMethodFromAddr0 - Update DeoptimizeRelocatedNMethod to call relocated function - Fix is_safe - Use ptrdiff_t instead of int ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23573/files - new: https://git.openjdk.org/jdk/pull/23573/files/9f753071..eed3d434 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=21 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=20-21 Stats: 19 lines in 9 files changed: 4 ins; 0 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/23573.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23573/head:pull/23573 PR: https://git.openjdk.org/jdk/pull/23573 From duke at openjdk.org Mon Jun 2 21:47:48 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Mon, 2 Jun 2025 21:47:48 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v23] In-Reply-To: References: Message-ID: > This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). > > When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. > > This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. 
Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: Update immutable_data_references naming ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23573/files - new: https://git.openjdk.org/jdk/pull/23573/files/eed3d434..b0dad665 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=22 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=21-22 Stats: 7 lines in 2 files changed: 0 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/23573.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23573/head:pull/23573 PR: https://git.openjdk.org/jdk/pull/23573 From duke at openjdk.org Mon Jun 2 21:52:02 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Mon, 2 Jun 2025 21:52:02 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v3] In-Reply-To: References: <8l4e6nqzNukJ6st0fEkLwKqlF35stq_W9ph831eo8w4=.6cbb2172-b35a-4d27-bab7-1d104c9f993b@github.com> <3lvuXGbqkCDeGwkzDQtzhkbZGN1XgcTcuFfL0_TUPvA=.4ba152ea-b3ab-4339-a42c-03d78bfcc829@github.com> Message-ID: On Fri, 30 May 2025 23:21:41 GMT, Vladimir Kozlov wrote: >> I have created a RFE to move the immutable data from the nmethod to a separate class. [JDK-8358213](https://bugs.openjdk.org/browse/JDK-8358213) > > Thanks. Let's keep current changes as it is with small comment: > IMMUTABLE_DATA_REFERENCES is used with `sizeof()` in all places - consider using instead > ``` > #define IMMUTABLE_DATA_REFERENCES_SIZE sizeof(int) > ``` I updated it to hold the size based on @vnkozlov's suggestion ([source](https://github.com/chadrako/jdk/blob/b0dad6659047553ebee1387939e54ea817b31cb1/src/hotspot/share/code/nmethod.hpp#L172)) @dean-long Are you good with this change and moving the immutable data to a separate class in a separate issue ([JDK-8358213](https://bugs.openjdk.org/browse/JDK-8358213))?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2122223999 From duke at openjdk.org Mon Jun 2 22:43:50 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Mon, 2 Jun 2025 22:43:50 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v24] In-Reply-To: References: Message-ID: > This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). > > When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. > > This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. 
Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: Fix test copywrite ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23573/files - new: https://git.openjdk.org/jdk/pull/23573/files/b0dad665..4e80e359 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=23 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=22-23 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23573.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23573/head:pull/23573 PR: https://git.openjdk.org/jdk/pull/23573 From duke at openjdk.org Mon Jun 2 22:44:54 2025 From: duke at openjdk.org (duke) Date: Mon, 2 Jun 2025 22:44:54 GMT Subject: RFR: 8357223: AArch64: Optimize interpreter profile updates [v2] In-Reply-To: <7wo-_Wt-EiVGKgxMxU_MnTA8o1QQxH_LDtNzDShlOIY=.9c8093b7-ed4b-487d-afbe-5227362f1ade@github.com> References: <7wo-_Wt-EiVGKgxMxU_MnTA8o1QQxH_LDtNzDShlOIY=.9c8093b7-ed4b-487d-afbe-5227362f1ade@github.com> Message-ID: <0y-3unyZVrC4JTeHDkrRKfQFloudfzacKa99fG689aQ=.ad39a021-2f6f-460b-913a-3868633d35af@github.com> On Thu, 29 May 2025 23:04:25 GMT, Chad Rakoczy wrote: >> [JDK-8357223](https://bugs.openjdk.org/browse/JDK-8357223) >> >> The aarch64 version of [JDK-8356946](https://bugs.openjdk.org/browse/JDK-8356946) >> >> The reasoning for this change is the same as the x86 version's PR: >> >>> First, we carry the implementation for counter decrements without using them. This is dead code, and can be purged. >>> >>> Second, we care about overflows for 64-bit for some reason. I think this is a reminiscent of 32-bit x86 support, where we can plausibly have 32-bit counter overflow in a reasonable timeframe. But for 64-bit counter, we need tens of years of constantly bashing the counter to get it to overflow. No other profile counter update code, e.g. in C1, cares about this. 
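The overflow argument quoted above can be checked with rough arithmetic. The sketch below is only a back-of-the-envelope estimate: the rate of one billion increments per second is an assumption made for illustration (it does not come from the thread), and `CounterOverflowSketch` and `yearsToOverflow` are hypothetical names.

```java
// Hypothetical estimate; the increment rate is an assumed figure, not a measurement.
class CounterOverflowSketch {
    static final double SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60;

    // Years of uninterrupted counting needed to reach maxValue at the given rate.
    static double yearsToOverflow(long maxValue, double incrementsPerSecond) {
        return maxValue / incrementsPerSecond / SECONDS_PER_YEAR;
    }

    public static void main(String[] args) {
        double rate = 1e9; // assumed: one billion counter updates per second
        System.out.printf("32-bit counter: %.1f seconds%n", Integer.MAX_VALUE / rate);
        System.out.printf("64-bit counter: %.0f years%n", yearsToOverflow(Long.MAX_VALUE, rate));
    }
}
```

Under that assumed rate a signed 32-bit counter wraps within seconds, while a signed 64-bit counter needs centuries, which is in line with the reasoning that overflow handling only mattered for the 32-bit case.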
>> >> Additional testing: >> >> - [x] Linux aarch64 fastdebug tier 1/2/3/4 > > Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: > > Address comments @chadrako Your change (at version 0a33652392d445fa0f10650edc5448168f823272) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25512#issuecomment-2932759213 From cslucas at openjdk.org Mon Jun 2 22:49:36 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Mon, 2 Jun 2025 22:49:36 GMT Subject: RFR: 8357600: Patch nmethod flushing message to include more details [v3] In-Reply-To: References: Message-ID: > Please review this patch for adding more details to nmethod flushing message. These details are particularly important when investigating interaction of JVMCI compiled code and code cache flushing heuristics. > > Tested on Linux x64 with JTREG tier1-3 using fastdebug and release builds. Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: Address PR feedback. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/25402/files - new: https://git.openjdk.org/jdk/pull/25402/files/f6c64755..2aabfa72 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25402&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25402&range=01-02 Stats: 7 lines in 1 file changed: 4 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25402.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25402/head:pull/25402 PR: https://git.openjdk.org/jdk/pull/25402 From duke at openjdk.org Mon Jun 2 22:59:59 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Mon, 2 Jun 2025 22:59:59 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v24] In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 22:43:50 GMT, Chad Rakoczy wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). >> >> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. >> >> This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. > > Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: > > Fix test copywrite Thanks for pointing out the missing JVMTI event publication. I'm currently looking into what's required to address that, along with JFR event publication that may also have been missed. I'd appreciate hearing others'
thoughts on how critical this is: should we treat it as a blocker for integration, or would it be acceptable to follow up with a separate issue? We're hoping to get this into JDK 25, as it would simplify both development and backporting of features related to hot code grouping. That said, if the consensus is that JVMTI/JFR support is essential upfront, this can be delayed until JDK 26. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23573#issuecomment-2932801568 From dlong at openjdk.org Mon Jun 2 23:53:59 2025 From: dlong at openjdk.org (Dean Long) Date: Mon, 2 Jun 2025 23:53:59 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v3] In-Reply-To: References: <8l4e6nqzNukJ6st0fEkLwKqlF35stq_W9ph831eo8w4=.6cbb2172-b35a-4d27-bab7-1d104c9f993b@github.com> <3lvuXGbqkCDeGwkzDQtzhkbZGN1XgcTcuFfL0_TUPvA=.4ba152ea-b3ab-4339-a42c-03d78bfcc829@github.com> Message-ID: On Mon, 2 Jun 2025 21:49:36 GMT, Chad Rakoczy wrote: >> Thanks. Let's keep current changes as it is with small comment: >> IMMUTABLE_DATA_REFERENCES is used with `sizeof()` in all places - consider using instead >> ``` >> #define IMMUTABLE_DATA_REFERENCES_SIZE sizeof(int) >> ``` > > I updated it to hold the size based on @vnkozlov's suggestion ([source](https://github.com/chadrako/jdk/blob/b0dad6659047553ebee1387939e54ea817b31cb1/src/hotspot/share/code/nmethod.hpp#L172)) > > @dean-long Are you good with this change and moving the immutable data to a separate class in a separate issue ([JDK-8358213](https://bugs.openjdk.org/browse/JDK-8358213))? Sure, fine with me.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2122349147 From kvn at openjdk.org Tue Jun 3 00:35:00 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 3 Jun 2025 00:35:00 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v24] In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 22:57:19 GMT, Chad Rakoczy wrote: > We're hoping to get this into JDK 25, as it would simplify both development and backporting of features related to hot code grouping. That said, if the consensus is that JVMTI/JFR support is essential upfront, this can be delayed until JDK 26. I don't think this can be put into JDK 25. It is too late and the changes are not simple. And yes, JVMTI/JFR support is essential - you have to support all public functionality of the VM. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23573#issuecomment-2932980633 From kvn at openjdk.org Tue Jun 3 01:31:56 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 3 Jun 2025 01:31:56 GMT Subject: RFR: 8358330: AsmRemarks and DbgStrings clear() method may not get called before their destructor In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 18:32:12 GMT, Ashutosh Mehra wrote: > This patch fixes a possible assert in debug builds if the allocation of memory for a CodeBlob fails when loading it from the AOT Code Cache. See description of [JDK-8358330](https://bugs.openjdk.org/browse/JDK-8358330) for more details. I am not comfortable that you are changing code not used by AOT. Can you consider populating `CodeBlob::_asm_remarks` and `_dbg_strings` after calling `CodeBlob::create()`? Then you don't need the temporary `AsmRemarks` and `DbgStrings`.
------------- PR Review: https://git.openjdk.org/jdk/pull/25598#pullrequestreview-2890312467 From xgong at openjdk.org Tue Jun 3 01:49:07 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 3 Jun 2025 01:49:07 GMT Subject: RFR: 8355563: VectorAPI: Refactor current implementation of subword gather load API In-Reply-To: References: Message-ID: On Fri, 30 May 2025 08:15:22 GMT, Xiaohong Gong wrote: >>> @XiaohongGong Thanks for splitting this one out, and for investigating the regressions here. >>> >>> Putting the permalink here, fixed to the current change (the link you pasted will always refer to the newest, which may later on point to the wrong line when lines above are inserted / deleted): >>> >>> https://github.com/openjdk/jdk/blob/7077535c0b0a6ea0a2a167f9135b1504a3d71fb3/src/hotspot/share/opto/loopnode.cpp#L1659-L1661 >>> >>> I wonder if we should just use `Node::uncast` there? But I'm quite unsure about that. >> >> Sounds good to me. I will have a deep investigation for it. Thanks! >> >> >> >>> > Yes, I also observed such regression. >>> > It would be nice if you proactively mentioned regressions, so it does not have to be pointed out by reviewers. >>> >>> For me, it could be ok to fix it in a follow-up patch. I think we are too close to RDP1 for JDK25 now anyway, and so we could push this patch here into JDK26, and then we have enough time in JDK26 to investigate the regression. Even better would be if we could do the other patch first, so we never even encounter a regression. >> >> Sounds good to me. Thanks! > >> > @XiaohongGong Thanks for splitting this one out, and for investigating the regressions here. 
>> > Putting the permalink here, fixed to the current change (the link you pasted will always refer to the newest, which may later on point to the wrong line when lines above are inserted / deleted): >> > https://github.com/openjdk/jdk/blob/7077535c0b0a6ea0a2a167f9135b1504a3d71fb3/src/hotspot/share/opto/loopnode.cpp#L1659-L1661 >> > >> > I wonder if we should just use `Node::uncast` there? But I'm quite unsure about that. >> >> Sounds good to me. I will have a deep investigation for it. Thanks! > > Hi @eme64 @jatin-bhateja, I'v created a PR https://github.com/openjdk/jdk/pull/25539 to fix this issue. With this change, the performance regression can be fixed as well. Could you please take a look at that change and help to run the test on different X86 machines? Thanks a lot! > @XiaohongGong I reviewed #25539. Since it is a relatively simple patch, I suggest that we integrate that one first, and come back to this here later. Is that ok for you? That's fine to me. Thanks for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25138#issuecomment-2933082670 From xgong at openjdk.org Tue Jun 3 01:49:07 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 3 Jun 2025 01:49:07 GMT Subject: RFR: 8355563: VectorAPI: Refactor current implementation of subword gather load API In-Reply-To: References: Message-ID: On Tue, 3 Jun 2025 01:45:57 GMT, Xiaohong Gong wrote: >>> > @XiaohongGong Thanks for splitting this one out, and for investigating the regressions here. >>> > Putting the permalink here, fixed to the current change (the link you pasted will always refer to the newest, which may later on point to the wrong line when lines above are inserted / deleted): >>> > https://github.com/openjdk/jdk/blob/7077535c0b0a6ea0a2a167f9135b1504a3d71fb3/src/hotspot/share/opto/loopnode.cpp#L1659-L1661 >>> > >>> > I wonder if we should just use `Node::uncast` there? But I'm quite unsure about that. >>> >>> Sounds good to me. I will have a deep investigation for it. 
Thanks! >> >> Hi @eme64 @jatin-bhateja, I'v created a PR https://github.com/openjdk/jdk/pull/25539 to fix this issue. With this change, the performance regression can be fixed as well. Could you please take a look at that change and help to run the test on different X86 machines? Thanks a lot! > >> @XiaohongGong I reviewed #25539. Since it is a relatively simple patch, I suggest that we integrate that one first, and come back to this here later. Is that ok for you? > > That's fine to me. Thanks for your review! > Hi @XiaohongGong , Looks good to me, thanks again for this re-factor !! > > Best Regards, Jatin Thanks so much for your review @jatin-bhateja ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25138#issuecomment-2933083694 From xgong at openjdk.org Tue Jun 3 01:51:54 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 3 Jun 2025 01:51:54 GMT Subject: RFR: 8357726: C2 fails to recognize the counted loop when induction variable range is changed multiple times In-Reply-To: References: <-SKyhptjFPhuOPflySOZXJloR_Vgr4sC-xB5dSQXxZU=.fd6922bc-2498-4f4e-873a-999f82cd0a1a@github.com> Message-ID: On Mon, 2 Jun 2025 10:47:05 GMT, Emanuel Peter wrote: >> C2 compiler fails to recognize counted loops when the induction variable is constrained by multiple consecutive `CastII` nodes. >> This prevents optimizations like range check elimination, loop unrolling and auto-vectorization for these loops. Please refer >> to the detailed discussion for a related performance issue from [1]. >> >> The ideal graph of such a loop typically looks like:
>>
>>               /-----------|
>>               |           |
>>               |   ConI    |
>>    loop       |   /      /
>>     |         |  /      /
>>      \        AddI     /
>>  RangeCheck     \  /   |
>>      |           \ /   |
>>    IfTrue        Phi   |
>>       \           |    |
>>    RangeCheck  \  |    |
>>          \     CastII  /     <- Range check #1
>>          |        |   /
>>       IfTrue      |   |
>>          \        |   |
>>            CastII     |      <- Range check #2
>>               |      /
>>               |-------/
>>
>> For a counted loop, the loop induction variable (i.e `Phi`) should be the input of `AddI` ideally.
However, in above case, it is used >> by two consecutive `CastII` nodes generated by two different range check operations. Compiler should skip all such kind of `CastII` when recognizing a counted loop. >> >> This patch modifies the counted loop recognition code to iteratively uncast the loop `iv` until no `CastII` nodes remain, enabling proper counted loop recognition even when the induction variable undergoes multiple range constraint operations. >> >> Test: >> - Tested tier1, tier2, tier3, and no regressions are found. >> - An additional test case is added to verify the fix. >> >> Performance: >> Here is the performance gain on a NVIDIA Grace machine which is an AArch64 architecture:
>>
>> Benchmark                        Mode   Cnt  Unit   Before       After        Gain
>> CountedLoopCastIV.loop_iv_int    thrpt  30   ops/s  941482.597   4389292.439  4.66
>> CountedLoopCastIV.loop_iv_long   thrpt  30   ops/s  884563.232   1441485.455  1.62
>>
>> We can also observe the similar uplift on a x86_64 machine. >> >> [1] https://github.com/openjdk/jdk/pull/25138#issuecomment-2892720654 > > test/micro/org/openjdk/bench/vm/compiler/CountedLoopCastIV.java line 54: > >> 52: Random r = new Random(); >> 53: start = r.nextInt(LEN >> 2); >> 54: limit = r.nextInt(LEN >> 1, LEN - 3); > > Does this not mean that we use a different seed every time, and therefore the loop has different lengths, and so the results can be influenced accordingly? Yes, I just want to make sure the loop length is different with each time running, so that it will not be influenced by some profiling related optimizations.
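The seed question in this exchange can be made concrete with a small sketch. It is not the benchmark's code: `LoopBoundsSketch` and `bounds` are hypothetical names, `LEN = 1024` is an assumed value, and the two-argument `nextInt(origin, bound)` call assumes Java 17 or later. An unseeded `new Random()` yields different bounds on each run, whereas a fixed seed reproduces the same `start` and `limit`.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical illustration of seeded versus unseeded loop bounds.
class LoopBoundsSketch {
    static final int LEN = 1024; // assumed array length, not taken from the benchmark

    // Draws the loop bounds the same way as the quoted benchmark lines.
    static int[] bounds(Random r) {
        int start = r.nextInt(LEN >> 2);           // in [0, LEN/4)
        int limit = r.nextInt(LEN >> 1, LEN - 3);  // in [LEN/2, LEN-3)
        return new int[] {start, limit};
    }

    public static void main(String[] args) {
        // Unseeded: bounds vary from run to run.
        System.out.println("unseeded: " + Arrays.toString(bounds(new Random())));
        // Seeded: the same bounds are produced on every run.
        System.out.println("seeded:   " + Arrays.toString(bounds(new Random(42L))));
    }
}
```

Which variant suits a benchmark depends on the goal: varying bounds avoid tuning the compiled code to one trip count, while a fixed seed makes runs directly comparable.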
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25539#discussion_r2122445568 From xgong at openjdk.org Tue Jun 3 02:11:51 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 3 Jun 2025 02:11:51 GMT Subject: RFR: 8357726: C2 fails to recognize the counted loop when induction variable range is changed multiple times In-Reply-To: References: <-SKyhptjFPhuOPflySOZXJloR_Vgr4sC-xB5dSQXxZU=.fd6922bc-2498-4f4e-873a-999f82cd0a1a@github.com> Message-ID: On Mon, 2 Jun 2025 10:45:58 GMT, Emanuel Peter wrote: >> C2 compiler fails to recognize counted loops when the induction variable is constrained by multiple consecutive `CastII` nodes. >> This prevents optimizations like range check elimination, loop unrolling and auto-vectorization for these loops. Please refer >> to the detailed discussion for a related performance issue from [1]. >> >> The ideal graph of such a loop typically looks like: >> >> >> /-----------| >> | | >> | ConI | >> loop | / / >> | | / / >> \ AddI / >> RangeCheck \ / | >> | \ / | >> IfTrue Phi | >> \ | | >> RangeCheck \ | | >> \ CastII / <- Range check #1 >> | | / >> IfTrue | | >> \ | | >> CastII | <- Range check #2 >> | / >> |-------/ >> >> >> >> For a counted loop, the loop induction variable (i.e `Phi`) should be the input of `AddI` ideally. However, in above case, it is used >> by two consecutive `CastII` nodes generated by two different range check operations. Compiler should skip all such kind of `CastII` when recognizing a counted loop. >> >> This patch modifies the counted loop recognition code to iteratively uncast the loop `iv` until no `CastII` nodes remain, enabling proper counted loop recognition even when the induction variable undergoes multiple range constraint operations. >> >> Test: >> - Tested tier1, tier2, tier3, and no regressions are found. >> - An additional test case is added to verify the fix. 
>> >> Performance: >> Here is the performance gain on a NVIDIA Grace machine which is an AArch64 architecture: >> >> >> Benchmark Mode Cnt Unit Before After Gain >> CountedLoopCastIV.loop_iv_int thrpt 30 ops/s 941482.597 4389292.439 4.66 >> CountedLoopCastIV.loop_iv_long thrpt 30 ops/s 884563.232 1441485.455 1.62 >> >> >> We can also observe the similar uplift on a x86_64 machine. >> >> [1] https://github.com/openjdk/jdk/pull/25138#issuecomment-2892720654 > > test/hotspot/jtreg/compiler/c2/irTests/TestCountedLoopCastIV.java line 174: > >> 172: >> 173: public static void main(String[] args) { >> 174: TestFramework.runWithFlags("-XX:LoopUnrollLimit=0"); > > What is the reason for the flag here? Do you really need it? Thanks so much for your review! This flag prevents the loop being unrolled and splited (i.e. pre-main-post loop mode) as well. So that the compiler can get just one `CountedLoop` in the case, and we can make sure it is generated by the loop exactly. Checking the count of `CountedLoop` >0 is also fine to me without this flag. I just want to avoid any noise. WDYT? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25539#discussion_r2122468953 From jbhateja at openjdk.org Tue Jun 3 02:41:28 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 3 Jun 2025 02:41:28 GMT Subject: RFR: 8357982: Fix several failing BMI tests with -XX:+UseAPX [v2] In-Reply-To: References: Message-ID: > A) Patch extends the following tests with hard-coded encoding checks for various BMI instructions to cover REX2 or extended EVEX encodings supported by APX. 
> > > compiler/intrinsics/bmi/verifycode/AndnTestI.java > compiler/intrinsics/bmi/verifycode/AndnTestL.java > compiler/intrinsics/bmi/verifycode/BzhiTestI2L.java > compiler/intrinsics/bmi/verifycode/LZcntTestL.java > compiler/intrinsics/bmi/verifycode/TZcntTestL.java > > > B) After integration of JDK-8349582, which added APX NDD support, AndN instruction selection patterns that expect (Xor SRC, -1) as one of its operands were not getting selected because of a lower-cost generic immediate pattern match; patch fixes this issue through strict predicate checks. > > Above tests are now passing, validations were carried out using Intel Software Development emulator. > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolutions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25501/files - new: https://git.openjdk.org/jdk/pull/25501/files/b79d8e35..9c249239 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25501&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25501&range=00-01 Stats: 54 lines in 5 files changed: 54 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25501.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25501/head:pull/25501 PR: https://git.openjdk.org/jdk/pull/25501 From jbhateja at openjdk.org Tue Jun 3 02:47:57 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 3 Jun 2025 02:47:57 GMT Subject: RFR: 8357982: Fix several failing BMI tests with -XX:+UseAPX [v2] In-Reply-To: References: Message-ID: On Tue, 3 Jun 2025 02:41:28 GMT, Jatin Bhateja wrote: >> A) Patch extends the following tests with hard-coded encoding checks for various BMI instructions to cover REX2 or extended EVEX encodings supported by APX. 
>> >> >> compiler/intrinsics/bmi/verifycode/AndnTestI.java >> compiler/intrinsics/bmi/verifycode/AndnTestL.java >> compiler/intrinsics/bmi/verifycode/BzhiTestI2L.java >> compiler/intrinsics/bmi/verifycode/LZcntTestL.java >> compiler/intrinsics/bmi/verifycode/TZcntTestL.java >> >> >> B) After integration of JDK-8349582, which added APX NDD support, AndN instruction selection patterns that expect (Xor SRC, -1) as one of its operands were not getting selected because of a lower-cost generic immediate pattern match; patch fixes this issue through strict predicate checks. >> >> Above tests are now passing, validations were carried out using Intel Software Development emulator. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolutions Thanks, encoding logic is concentrated in integral instruction tests and is shared with corresponding long variants, extended APX coverage for BLS/R/MSK. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25501#issuecomment-2933177986 From mhaessig at openjdk.org Tue Jun 3 03:21:59 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Tue, 3 Jun 2025 03:21:59 GMT Subject: Integrated: 8354930: IGV: dump C2 graph before and after live range stretching In-Reply-To: References: Message-ID: On Wed, 28 May 2025 11:54:24 GMT, Manuel Hässig wrote: > This PR introduces a new phase `LIVE_RANGE_STRETCHING` that prints after live ranges have been stretched, if that happens at all. The phase `INITIAL_LIVENESS` is moved before live range stretching so we can compare the live ranges before and after stretching in IGV, which is useful for debugging why an oop suddenly belongs to an oop map.
> > ## Testing > > - [x] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15299362485) > - [x] tier1 and tier1, plus additional Oracle internal testing for all Oracle supported platforms and OSs > - [x] verified that the new phase prints when it should in IGV and with `-XX:PrintPhaseLevel=4` This pull request has now been integrated. Changeset: 24edd3b2 Author: Manuel Hässig Committer: SendaoYan URL: https://git.openjdk.org/jdk/commit/24edd3b2c1324fd58575a6273e5cae17e3d6fbf5 Stats: 7 lines in 3 files changed: 5 ins; 2 del; 0 mod 8354930: IGV: dump C2 graph before and after live range stretching Reviewed-by: rcastanedalo, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/25492 From galder at openjdk.org Tue Jun 3 04:16:52 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Tue, 3 Jun 2025 04:16:52 GMT Subject: RFR: 8356000: C1/C2-only modes use 2 compiler threads on low CPU count machines [v3] In-Reply-To: References: Message-ID: On Wed, 28 May 2025 18:05:12 GMT, Aleksey Shipilev wrote: >> There is an unfortunate limitation with default tiered policy that we would have at least 2 threads on 1 CPU machine: 1 thread for C1, and 1 thread for C2. >> >> But if we select C1-only or C2-only modes, we _also_ get 2 compiler threads, for which we have no good reason. These threads would just step on each other's toes. The fix changes the behavior for 1..3 CPU hosts in C1/C2-only configurations, by using 1 thread instead of 2 threads. The change for 1 CPU config is what we really need. The change in 2..3 CPU configs is an additional effect, but I think it is still good not to use 100%/66% of the CPUs in those configurations as well.
>> >> >> $ for I in `seq 1 8`; do build/linux-x86_64-server-release/images/jdk/bin/java \ >> -XX:-TieredCompilation -XX:ActiveProcessorCount=${I} \ >> -XX:+PrintFlagsFinal 2>&1 | grep "CICompilerCount "; done >> >> # Before >> intx CICompilerCount = 2 >> intx CICompilerCount = 2 >> intx CICompilerCount = 2 >> intx CICompilerCount = 3 >> intx CICompilerCount = 3 >> intx CICompilerCount = 3 >> intx CICompilerCount = 3 >> intx CICompilerCount = 4 >> >> # After >> intx CICompilerCount = 1 >> intx CICompilerCount = 1 >> intx CICompilerCount = 1 >> intx CICompilerCount = 3 >> intx CICompilerCount = 3 >> intx CICompilerCount = 3 >> intx CICompilerCount = 3 >> intx CICompilerCount = 4 >> >> >> It is a minor bug in `CompilationPolicy::initialize`, but it gets in the way studying Leyden in tight CPU scenarios. >> >> Additional testing: >> - [x] New regression test passes with the fix, fails without it >> - [x] GHA > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - Merge branch 'master' into JDK-8356000-c1-c2-compiler-count > - Better test, patch amendments > - Merge branch 'master' into JDK-8356000-c1-c2-compiler-count > - Merge branch 'master' into JDK-8356000-c1-c2-compiler-count > - Unnecessary arch limitation > - Simplify test > - Adjust test bound > - Fix Thanks for the fix and expanding the test case. ------------- Marked as reviewed by galder (Author). 
PR Review: https://git.openjdk.org/jdk/pull/24972#pullrequestreview-2890550967 From azeller at openjdk.org Tue Jun 3 04:44:53 2025 From: azeller at openjdk.org (Arno Zeller) Date: Tue, 3 Jun 2025 04:44:53 GMT Subject: RFR: 8358129: compiler/startup/StartupOutput.java runs into out of memory on Windows after JDK-8347406 In-Reply-To: <-9AT4ja1WZHf_xLO6Uzl90zPJKG-KOHTyUG075CTxHE=.be43593a-ff3b-4e33-8a63-f1d02cce8836@github.com> References: <-9AT4ja1WZHf_xLO6Uzl90zPJKG-KOHTyUG075CTxHE=.be43593a-ff3b-4e33-8a63-f1d02cce8836@github.com> Message-ID: On Mon, 2 Jun 2025 16:38:36 GMT, Galder Zamarreño wrote: > What impact does this change have on the time the test takes to run? If it turns out to be too slow, maybe the processes could be run in batches? I checked on one of our Windows machines - the test ran in 11 seconds before and took 48 seconds after this change. Looks fine to me. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25582#issuecomment-2933397555 From thartmann at openjdk.org Tue Jun 3 05:38:51 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 3 Jun 2025 05:38:51 GMT Subject: RFR: 8358129: compiler/startup/StartupOutput.java runs into out of memory on Windows after JDK-8347406 In-Reply-To: References: Message-ID: <-WFYyJVFxG0nhBwCJuRcpMZhBUtba6Nf1dHVrhNfFxU=.3b277750-d2ee-495f-8959-4837fcc0354b@github.com> On Mon, 2 Jun 2025 10:57:22 GMT, Damon Fenacci wrote: > The test `compiler/startup/StartupOutput.java` starts **200 VMs in a loop**, which can lead to resource shortages on some (Windows) machines. > > There is no real need to run those VMs concurrently (each run is short and basically checks that the VM doesn't crash when given a limited code cache). > > Running them **sequentially** should be OK and should avoid running out of memory. > > Testing: Tier1-3+ Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/25582#pullrequestreview-2890727368 From xgong at openjdk.org Tue Jun 3 05:39:53 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 3 Jun 2025 05:39:53 GMT Subject: RFR: 8357726: C2 fails to recognize the counted loop when induction variable range is changed multiple times In-Reply-To: References: <-SKyhptjFPhuOPflySOZXJloR_Vgr4sC-xB5dSQXxZU=.fd6922bc-2498-4f4e-873a-999f82cd0a1a@github.com> Message-ID: <698Q9LoBFMdDFBnBVAB8FYiI0U-abyXms26RLoMv5Xc=.f21b9a25-8f64-412c-b37a-553f0a13192e@github.com> On Mon, 2 Jun 2025 10:51:21 GMT, Emanuel Peter wrote: > @XiaohongGong I suggest you change the title from: `8357726: C2 fails to recognize the counted loop when induction variable range is changed multiple times` to `8357726: C2 recognize loops with multiple casts in trip counter` or even: `8357726: C2 recognize loops with multiple casts in trip counter: phi -> CastII* -> AddI -> phi` Thanks for your suggestion! Sounds better to me. How about changing the title to `Improve C2 to recognize counted loops with multiple casts in trip counter` ? ------------- PR Comment: https://git.openjdk.org/jdk/pull/25539#issuecomment-2933530238 From epeter at openjdk.org Tue Jun 3 05:55:12 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 3 Jun 2025 05:55:12 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v71] In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 18:48:56 GMT, Christian Hagedorn wrote: >> Yes. True. I sometimes just also cover negative values to be a bit more robust... but I can also change it if you prefer that. > > I guess if it's never negative, you can still cover it but maybe throw an exception instead? Ok, I'll add a check/exception :) >> What exactly do you think is the problem here? > > My IDE advises against matching on the raw type `List`. As an alternative you can match on `List`. Done, I must have been tired yesterday afternoon ? Thanks! 
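
A small illustration of the raw-type remark above, assuming the suggested alternative was the wildcard form `List<?>` (the archived text only shows `List` in both places). The class `WildcardPatternSketch` and its `describe` helper are hypothetical and not taken from the framework:

```java
import java.util.List;

// Hypothetical helper, for illustration only.
class WildcardPatternSketch {
    static String describe(Object token) {
        // Matching on List<?> binds a properly typed variable without the
        // raw-type warning that `token instanceof List list` would produce.
        if (token instanceof List<?> list) {
            return "list of " + list.size();
        }
        return "other";
    }

    public static void main(String[] args) {
        System.out.println(describe(List.of("a", "b")));
        System.out.println(describe(42));
    }
}
```

The wildcard form makes no claim about the element type, so the check stays valid under erasure while the bound variable is still usable as a list.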
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2122782343 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2122776155 From epeter at openjdk.org Tue Jun 3 05:55:16 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 3 Jun 2025 05:55:16 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v71] In-Reply-To: References: Message-ID: <7jtDESfNoZ3zdEsSrVDsbtDk3nF2p96DlgdIBqfVAXI=.75feb7a9-31da-4f78-b420-f3abb9b2356c@github.com> On Mon, 2 Jun 2025 12:24:32 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 91 commits: >> >> - Merge branch 'master' into JDK-8344942-TemplateFramework-v3 >> - validation tests >> - dollar and hashtag parsing validatiaon >> - wip refactor parsing dollar and hashtag >> - more fixes from Christian >> - more improvements >> - more suggestions applied >> - good practice >> - rename template arguments >> - more from Christian >> - ... and 81 more: https://git.openjdk.org/jdk/compare/90d6ad01...cb7037e7 > > test/hotspot/jtreg/compiler/lib/template_framework/library/Hooks.java line 32: > >> 30: */ >> 31: public abstract class Hooks { >> 32: private Hooks() {} // Avoid instantiation and need for documentation. > > With `abstract` you cannot call the constructor. But you could make `Hooks` final instead of abstract and keep the private constructor. Good idea! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2122777154 From epeter at openjdk.org Tue Jun 3 06:04:19 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 3 Jun 2025 06:04:19 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v71] In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 13:13:24 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 91 commits: >> >> - Merge branch 'master' into JDK-8344942-TemplateFramework-v3 >> - validation tests >> - dollar and hashtag parsing validatiaon >> - wip refactor parsing dollar and hashtag >> - more fixes from Christian >> - more improvements >> - more suggestions applied >> - good practice >> - rename template arguments >> - more from Christian >> - ... and 81 more: https://git.openjdk.org/jdk/compare/90d6ad01...cb7037e7 > > test/hotspot/jtreg/compiler/lib/template_framework/DataName.java line 181: > >> 179: */ >> 180: public DataName sample() { >> 181: DataName n = (DataName)Renderer.getCurrent().sampleName(predicate()); > > Do you really need this cast? Can't you just return a `Name`. From the uses it seems that you only call interface methods from `Name` at the use-sites. That would require `Name` to become public. I wanted to avoid that. I want the user to think that `DataName` and `StructuralName` are separate but parallel. But internally, they have a unified implementation with `Name`. An alternative would have been to use `NameSet` generically, once with `DataName` and `StructuralName`. But that would mean we could have a `DataName` with the same `name` as a `StructuralName`, because they would be in separate `NameSet`s. What do you think? > test/hotspot/jtreg/compiler/lib/template_framework/StructuralName.java line 47: > >> 45: */ >> 46: public StructuralName { >> 47: } > > Is this required? Is it not automatically added? Same for `DataName`. If I remove it, then `javadoc` complains that we are now using the default constructor, and that it does not have a comment for it... kinda strange but ok ? 
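
As a sketch of the pattern described here, a record can carry an otherwise empty compact canonical constructor purely so that there is a place to attach a doc comment. The record `SampleName` below is hypothetical and only mirrors the shape of the quoted `StructuralName` snippet. Whether `javadoc` warns without it depends on the doclint settings in use.

```java
/**
 * Hypothetical record, for illustration only.
 *
 * @param name the name carried by this record
 */
record SampleName(String name) {
    /**
     * Creates a new {@link SampleName}.
     * The body is empty: the constructor exists only to hold this comment.
     */
    public SampleName {
    }

    public static void main(String[] args) {
        // The compact constructor still assigns the component implicitly.
        System.out.println(new SampleName("x").name());
    }
}
```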
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2122789086 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2122796063 From epeter at openjdk.org Tue Jun 3 06:04:19 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 3 Jun 2025 06:04:19 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v71] In-Reply-To: References: Message-ID: On Tue, 3 Jun 2025 06:00:38 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/lib/template_framework/StructuralName.java line 47: >> >>> 45: */ >>> 46: public StructuralName { >>> 47: } >> >> Is this required? Is it not automatically added? Same for `DataName`. > > If I remove it, then `javadoc` complains that we are now using the default constructor, and that it does not have a comment for it... kinda strange but ok. So I'd rather keep it. It's a bit of unnecessary boilerplate, but I like having `javadoc` all happy and quiet. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2122798094 From epeter at openjdk.org Tue Jun 3 06:11:10 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 3 Jun 2025 06:11:10 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v71] In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 13:31:57 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 91 commits: >> >> - Merge branch 'master' into JDK-8344942-TemplateFramework-v3 >> - validation tests >> - dollar and hashtag parsing validatiaon >> - wip refactor parsing dollar and hashtag >> - more fixes from Christian >> - more improvements >> - more suggestions applied >> - good practice >> - rename template arguments >> - more from Christian >> - ...
and 81 more: https://git.openjdk.org/jdk/compare/90d6ad01...cb7037e7 > > test/hotspot/jtreg/compiler/lib/template_framework/CodeFrame.java line 92: > >> 90: /** >> 91: * Creates a special frame, which has a {@link #parent} but uses the {@link NameSet} >> 92: * from the parent frame, allowing {@link Template#defineName} to persist in the outer > > `defineName` -> `addName`? Good catch! Left over from a previous refactoring. `javadoc` does not complain about it, because it seems it does not look at anything that is not `public` ? > I have not checked if this is fully possible but it just occurred to me when reviewing this duplicated interface now. It would require `Name` to be public. As I said above, I'd like to avoid that. We could of course detach `Name.Type` and make it its own interface `NameType`, and make that public. (Just calling it `Type` is a bit too generic, and may lead to name collisions later on). Having `Name.Type` private and `DataName.Type` and `StructuralName.Type` public means they are separate, and the user cannot use one for the other. Hence, the user cannot confuse them as easily. 
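
The design argued for here can be sketched roughly as below. Only the names `Name`, `DataName`, `StructuralName` and their nested `Type` come from the discussion. Every body, the `isSubtypeOf` signature, and the helpers `IntKind`, `ClassKind`, and `NameSketch` are invented for illustration and may differ from the actual framework code.

```java
// Package-private: a shared implementation detail, not visible to API users.
interface Name {
    String name();
    Type type();

    interface Type {
        boolean isSubtypeOf(Type other);
    }
}

// These two would be public (each in its own file) in a real API. They are
// kept non-public here so the sketch fits in a single source file.
record DataName(String name, DataName.Type type) implements Name {
    interface Type extends Name.Type {}
}

record StructuralName(String name, StructuralName.Type type) implements Name {
    interface Type extends Name.Type {}
}

// Hypothetical user-defined types, one per kind of name.
enum IntKind implements DataName.Type {
    INT;
    public boolean isSubtypeOf(Name.Type other) { return this == other; }
}

enum ClassKind implements StructuralName.Type {
    CLASS;
    public boolean isSubtypeOf(Name.Type other) { return this == other; }
}

class NameSketch {
    // One internal code path serves both kinds of name.
    static String describe(Name n) {
        return n.name() + ":" + n.type();
    }

    public static void main(String[] args) {
        System.out.println(describe(new DataName("x", IntKind.INT)));
        System.out.println(describe(new StructuralName("C", ClassKind.CLASS)));
        // new DataName("y", ClassKind.CLASS);  // rejected by the compiler: wrong Type
    }
}
```

The trade-off is the one mentioned in the thread: the two public `Type` interfaces duplicate a little API surface, and in exchange the compiler rejects a `StructuralName.Type` where a `DataName.Type` is expected.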
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2122809426 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2122807687 From dnsimon at openjdk.org Tue Jun 3 06:22:57 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 3 Jun 2025 06:22:57 GMT Subject: RFR: 8357619: [JVMCI] Revisit phantom_ref parameter in JVMCINMethodData::get_nmethod_mirror [v3] In-Reply-To: References: <37LbN00VRPqAt9LN8jx43xx3QGsF6jnPFS_OQLUa-0U=.687f6afe-d13a-4d03-af0c-ac91a9862b13@github.com> Message-ID: <4HicVmT-d5SBUtqg8Q2JqQSDhCXmzZdGMcc-CrCu8Bw=.7483a779-a992-4153-9a55-55dd20c7f029@github.com> On Mon, 2 Jun 2025 16:48:16 GMT, Doug Simon wrote: >> The point of the `phantom_ref` parameter (introduced by [JDK-8234359](https://bugs.openjdk.org/browse/JDK-8234359)) of `JVMCINMethodData::get_nmethod_mirror` is to avoid the special resurrection semantics of a phantom read when reading the field during GC, which is when `JVMCINMethodData::invalidate_nmethod_mirror` can be called. >> This case can be handled directly in `JVMCINMethodData::invalidate_nmethod_mirror` and so the `phantom_ref` parameter can be removed. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > fixed typo Thanks for the reviews. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25488#issuecomment-2933646688 From dnsimon at openjdk.org Tue Jun 3 06:22:57 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 3 Jun 2025 06:22:57 GMT Subject: Integrated: 8357619: [JVMCI] Revisit phantom_ref parameter in JVMCINMethodData::get_nmethod_mirror In-Reply-To: <37LbN00VRPqAt9LN8jx43xx3QGsF6jnPFS_OQLUa-0U=.687f6afe-d13a-4d03-af0c-ac91a9862b13@github.com> References: <37LbN00VRPqAt9LN8jx43xx3QGsF6jnPFS_OQLUa-0U=.687f6afe-d13a-4d03-af0c-ac91a9862b13@github.com> Message-ID: On Wed, 28 May 2025 10:28:38 GMT, Doug Simon wrote: > The point of the `phantom_ref` parameter (introduced by [JDK-8234359](https://bugs.openjdk.org/browse/JDK-8234359)) of `JVMCINMethodData::get_nmethod_mirror` is to avoid the special resurrection semantics of a phantom read when reading the field during GC, which is when `JVMCINMethodData::invalidate_nmethod_mirror` can be called. > This case can be handled directly in `JVMCINMethodData::invalidate_nmethod_mirror` and so the `phantom_ref` parameter can be removed. This pull request has now been integrated. 
Changeset: 6cfd4057 Author: Doug Simon URL: https://git.openjdk.org/jdk/commit/6cfd4057dce9262f54e71a3930e16da84aa0d9f1 Stats: 12 lines in 3 files changed: 0 ins; 4 del; 8 mod 8357619: [JVMCI] Revisit phantom_ref parameter in JVMCINMethodData::get_nmethod_mirror Reviewed-by: eosterlund, never ------------- PR: https://git.openjdk.org/jdk/pull/25488 From dnsimon at openjdk.org Tue Jun 3 06:32:54 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 3 Jun 2025 06:32:54 GMT Subject: RFR: 8357987: [JVMCI] Add support for retrieving all methods of a ResolvedJavaType [v3] In-Reply-To: <7qRH8PFpSXJTshHNvxEMqEbc34N5wSnpknQaMUbWrCg=.6de4f71a-8c22-492f-b156-b25a07f3b428@github.com> References: <7qRH8PFpSXJTshHNvxEMqEbc34N5wSnpknQaMUbWrCg=.6de4f71a-8c22-492f-b156-b25a07f3b428@github.com> Message-ID: On Mon, 2 Jun 2025 20:36:37 GMT, Tom Shull wrote: >> Currently from ResolvedJavaType one can retrieve all declared methods, static methods, and constructors of the given type. However, internally in HotSpot there are also VM-internal methods, such as overpass methods, associated with a given type which we cannot access via the API. >> >> To correct this, we should add a new method which enables VM-internal methods, such as overpass methods, to be accessed. > > Tom Shull has updated the pull request incrementally with one additional commit since the last revision: > > return List.of() from getAllMethods Still looks good to me. ------------- Marked as reviewed by dnsimon (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25498#pullrequestreview-2890872094 From epeter at openjdk.org Tue Jun 3 06:37:53 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 3 Jun 2025 06:37:53 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v74] In-Reply-To: References: Message-ID: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. 
> - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. > > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. 
> - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: more for Christian ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/30059e66..310d7d86 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=73 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=72-73 Stats: 93 lines in 5 files changed: 65 ins; 20 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From epeter at openjdk.org Tue Jun 3 06:37:56 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 3 Jun 2025 06:37:56 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v71] In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 13:54:10 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 91 commits: >> >> - Merge branch 'master' into JDK-8344942-TemplateFramework-v3 >> - validation tests >> - dollar and hashtag parsing validatiaon >> - wip refactor parsing dollar and hashtag >> - more fixes from Christian >> - more improvements >> - more suggestions applied >> - good practice >> - rename template arguments >> - more from Christian >> - ... 
and 81 more: https://git.openjdk.org/jdk/compare/90d6ad01...cb7037e7 > > test/hotspot/jtreg/compiler/lib/template_framework/Renderer.java line 338: > >> 336: } >> 337: >> 338: private void renderStringWithDollarAndHashtagReplacements(String s) { > > Hard to grasp the logic of that method. But I trust you on that :-) I leave it up to you if you want to improve readability to extract some of the logic to separate methods such that this method becomes easier to understand. I split out a part and added some more comments / examples. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2122847550 From epeter at openjdk.org Tue Jun 3 06:41:13 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 3 Jun 2025 06:41:13 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v61] In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 12:14:48 GMT, Christian Hagedorn wrote: >> Thanks for all the updates and discussions! I've worked my way through the documentation in `Template` and the examples again in some more detail. It's much better and the new explanations are well done, excellent work! >> >> I left some comments here and there but mostly minor things. I will have another look at the implementation - probably only finished by Monday. The design now looks great. I'm glad we could find a good solution now after some more iterations :-) > >> @chhagedorn Alright, I now have a decent solution for `$$var` and `$1var` etc. I also added tests for it. >> >> These are issues we could continue the conversation, unless you are satisfied with my answers: [#24217 (comment)](https://github.com/openjdk/jdk/pull/24217#discussion_r2115388737) [#24217 (comment)](https://github.com/openjdk/jdk/pull/24217#discussion_r2115406391) >> >> This is now ready for another review pass ? > > Awesome, thanks for spending some more time with these nasty edge-cases and finding a solution! I had a look at your updates for all my comments, they look good, thanks! 
> > I'm going to make a pass over the implementation classes now and will have a look at the `Renderer` updates as well :-) @chhagedorn Thank you very much for the thorough review! I addressed all your comments. We might still want to have a conversation about the `Name` and `Name.Type`. I like the way I have it because it separates the `DataName` and `StructuralName` in the API (less user confusion), while having a unified implementation. But it does mean some casting and some API duplication. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24217#issuecomment-2933725922 From thartmann at openjdk.org Tue Jun 3 06:42:52 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 3 Jun 2025 06:42:52 GMT Subject: RFR: 8351635: C2 ROR/ROL: assert failed: Long constant expected In-Reply-To: References: Message-ID: On Wed, 28 May 2025 14:19:21 GMT, Jatin Bhateja wrote: > This bug fix patch relaxes the strict assertion check to allow other pattern matches for degenerated long vector ROL/ROR operations with non-constant scalar shift values. > > Kindly review and share feedback. > > Best Regards, > Jatin Looks good to me. I submitted testing and will report back once it passed. Could you please update the affects version accordingly? I assume this is a regression from [JDK-8271589](https://bugs.openjdk.org/browse/JDK-8271589)? test/hotspot/jtreg/compiler/vectorapi/TestVectorRotateScalarCount.java line 114: > 112: } > 113: > 114: Suggestion: ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25493#pullrequestreview-2890886147 PR Review Comment: https://git.openjdk.org/jdk/pull/25493#discussion_r2122854451 From yzheng at openjdk.org Tue Jun 3 06:53:53 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Tue, 3 Jun 2025 06:53:53 GMT Subject: RFR: 8357987: [JVMCI] Add support for retrieving all methods of a ResolvedJavaType [v3] In-Reply-To: <7qRH8PFpSXJTshHNvxEMqEbc34N5wSnpknQaMUbWrCg=.6de4f71a-8c22-492f-b156-b25a07f3b428@github.com> References: <7qRH8PFpSXJTshHNvxEMqEbc34N5wSnpknQaMUbWrCg=.6de4f71a-8c22-492f-b156-b25a07f3b428@github.com> Message-ID: On Mon, 2 Jun 2025 20:36:37 GMT, Tom Shull wrote: >> Currently from ResolvedJavaType one can retrieve all declared methods, static methods, and constructors of the given type. However, internally in HotSpot there are also VM-internal methods, such as overpass methods, associated with a given type which we cannot access via the API. >> >> To correct this, we should add a new method which enables VM-internal methods, such as overpass methods, to be accessed. > > Tom Shull has updated the pull request incrementally with one additional commit since the last revision: > > return List.of() from getAllMethods Marked as reviewed by yzheng (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25498#pullrequestreview-2890926150 From thartmann at openjdk.org Tue Jun 3 07:00:58 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 3 Jun 2025 07:00:58 GMT Subject: RFR: 8357982: Fix several failing BMI tests with -XX:+UseAPX [v2] In-Reply-To: References: Message-ID: On Tue, 3 Jun 2025 02:41:28 GMT, Jatin Bhateja wrote: >> A) Patch extends the following tests with hard-coded encoding checks for various BMI instructions to cover REX2 or extended EVEX encodings supported by APX. 
>> >> >> compiler/intrinsics/bmi/verifycode/AndnTestI.java >> compiler/intrinsics/bmi/verifycode/AndnTestL.java >> compiler/intrinsics/bmi/verifycode/BzhiTestI2L.java >> compiler/intrinsics/bmi/verifycode/LZcntTestL.java >> compiler/intrinsics/bmi/verifycode/TZcntTestL.java >> >> >> B) After integration of JDK-8349582, which added APX NDD support, AndN instruction selection patterns that expect (Xor SRC, -1) as one of its operands were not getting selected because of a lower-cost generic immediate pattern match; patch fixes this issue through strict predicate checks. >> >> Above tests are now passing, validations were carried out using Intel Software Development emulator. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolutions src/hotspot/cpu/x86/x86_64.ad line 10620: > 10618: instruct xorI_rReg_imm(rRegI dst, immI src, rFlagsReg cr) > 10619: %{ > 10620: predicate(!UseAPX && n->in(2)->bottom_type()->is_int()->get_con() != -1); Suggestion: predicate(!UseAPX && n->in(2)->bottom_type()->is_int()->get_con() != -1); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25501#discussion_r2122865710 From epeter at openjdk.org Tue Jun 3 07:12:53 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 3 Jun 2025 07:12:53 GMT Subject: RFR: 8357554: Enable vectorization of Bool -> CMove with different type size (on riscv) In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 14:10:54 GMT, Hamlin Li wrote: > I mean it's really "unconditionally", but if you feel it's better to add an argument, like supports_vectorize_cmove_bool_unconditionally(BasicType src, BasicType dst), I can do it. I think this would be good! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25336#discussion_r2122925805 From epeter at openjdk.org Tue Jun 3 07:19:54 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 3 Jun 2025 07:19:54 GMT Subject: RFR: 8357726: C2 fails to recognize the counted loop when induction variable range is changed multiple times In-Reply-To: <698Q9LoBFMdDFBnBVAB8FYiI0U-abyXms26RLoMv5Xc=.f21b9a25-8f64-412c-b37a-553f0a13192e@github.com> References: <-SKyhptjFPhuOPflySOZXJloR_Vgr4sC-xB5dSQXxZU=.fd6922bc-2498-4f4e-873a-999f82cd0a1a@github.com> <698Q9LoBFMdDFBnBVAB8FYiI0U-abyXms26RLoMv5Xc=.f21b9a25-8f64-412c-b37a-553f0a13192e@github.com> Message-ID: On Tue, 3 Jun 2025 05:36:47 GMT, Xiaohong Gong wrote: > Thanks for your suggestion! Sounds better to me. How about changing the title to Improve C2 to recognize counted loops with multiple casts in trip counter ? @XiaohongGong Sounds good too :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/25539#issuecomment-2933851019 From dskantz at openjdk.org Tue Jun 3 07:22:25 2025 From: dskantz at openjdk.org (Daniel Skantz) Date: Tue, 3 Jun 2025 07:22:25 GMT Subject: RFR: 8357822: C2: Multiple string optimization tests are no longer testing string concatenation optimizations Message-ID: <4GDLAMfeWjgfcGvn4sUSMT2jjG3vsebjcFeJqgHqPQw=.e7dfa9e7-4608-4304-ba00-0b254b6bf2b1@github.com> This PR updates a few tests to reintroduce testing of string concatenation optimizations since a few bugs have recently been identified in this area. Selection criteria: performed a text search on the test suite and identified tests for string concatenations or string optimizations that are not currently compiled with `-XDstringConcat=inline` and are not using StringBuilders explicitly. Testing: T1-4. Extra testing: ran the tests manually with `-XX:+OptimizeStringConcat` and verified that the tests are exercising string optimizations after the fix. 
------------- Commit messages: - fix up tests Changes: https://git.openjdk.org/jdk/pull/25610/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25610&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8357822 Stats: 98 lines in 7 files changed: 92 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/25610.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25610/head:pull/25610 PR: https://git.openjdk.org/jdk/pull/25610 From epeter at openjdk.org Tue Jun 3 07:24:52 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 3 Jun 2025 07:24:52 GMT Subject: RFR: 8357726: C2 fails to recognize the counted loop when induction variable range is changed multiple times In-Reply-To: References: <-SKyhptjFPhuOPflySOZXJloR_Vgr4sC-xB5dSQXxZU=.fd6922bc-2498-4f4e-873a-999f82cd0a1a@github.com> Message-ID: <6JheHO7RO7O4aEUlkwYvMAWra7NFZgdmRz4wBSnzA9c=.f6bfb95d-89a2-437a-9068-ed54099807d2@github.com> On Tue, 3 Jun 2025 01:48:47 GMT, Xiaohong Gong wrote: >> test/micro/org/openjdk/bench/vm/compiler/CountedLoopCastIV.java line 54: >> >>> 52: Random r = new Random(); >>> 53: start = r.nextInt(LEN >> 2); >>> 54: limit = r.nextInt(LEN >> 1, LEN - 3); >> >> Does this not mean that we use a different seed every time, and therefore the loop has different lengths, and so the results can be influenced accordingly? > > Yes, I just want to make sure the loop length is different with each time running, so that it will not be influenced by some profiling related optimizations. I see. Did you have any direct issues with profiling here? I'm worried that the trip count is really very variant here. We could have: start = 0 and limit = LEN - 3 -> count ~ LEN start = LEN/4 and limit = LEN/2 -> count = LEN/4 That is a 4x variance, am I right? Can you run the benchmark a few times, and see what the error term looks like? 
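
To make the variance concern concrete, the trip-count bounds implied by the two `nextInt` calls can be worked out directly. This is only a sketch: the class name `TripCountRange`, the value `LEN = 1024`, and the assumption that the benchmark loop has the shape `for (int i = start; i < limit; i++)` are illustrative and not taken from the benchmark source.

```java
public class TripCountRange {
    // start is drawn from [0, len >> 2), limit from [len >> 1, len - 3).
    // Smallest trip count: largest possible start, smallest possible limit.
    static int minTrip(int len) {
        int maxStart = (len >> 2) - 1; // upper bound of nextInt is exclusive
        int minLimit = len >> 1;
        return minLimit - maxStart;
    }

    // Largest trip count: smallest possible start, largest possible limit.
    static int maxTrip(int len) {
        int minStart = 0;
        int maxLimit = len - 4;        // bound len - 3 is exclusive
        return maxLimit - minStart;
    }

    public static void main(String[] args) {
        int len = 1024;                // hypothetical LEN, for illustration only
        int lo = minTrip(len);
        int hi = maxTrip(len);
        System.out.printf("trip count in [%d, %d], ratio %.2f%n", lo, hi, (double) hi / lo);
    }
}
```

Under these assumptions the ratio between the longest and shortest run comes out a little under 4x, which is in line with the estimate above.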
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25539#discussion_r2122954605 From thartmann at openjdk.org Tue Jun 3 07:25:56 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 3 Jun 2025 07:25:56 GMT Subject: RFR: 8355574: Fatal error in abort_verify_int_in_range due to Invalid CastII [v4] In-Reply-To: References: Message-ID: On Sun, 18 May 2025 07:06:41 GMT, Quan Anh Mai wrote: >> Hi, >> >> The issue here is that the `CastLLNode` is created before the actual check that ensures the range of the input. This patch fixes it by moving the creation to the correct place, which is under `inline_block`. I also noticed that the code there seems incorrect and confusing. `ArrayCopyNode::get_partial_inline_vector_lane_count` takes the length of the array, not the size in bytes. If you look into the method it will multiply `const_len` with `type2aelementbytes(bt)` to get the size in bytes of the array. In the runtime test, we compare `length << log2(type2bytes(bt))` with `ArrayOperationPartialInlineSize`. This seems confusing, why don't we just compare `length` with `ArrayOperationPartialInlineSize / type2bytes(bt)`, it also unifies the test with the actual cast. >> >> Please take a look and leave your reviews, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with two additional commits since the last revision: > > - fix comment > - fix comment Just a reminder that since this is a P4, the fix would need to be integrated until RDP 2 on Thursday (June 5) this week (or we need to raise the priority). 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25284#issuecomment-2933871198 From epeter at openjdk.org Tue Jun 3 07:29:52 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 3 Jun 2025 07:29:52 GMT Subject: RFR: 8357726: C2 fails to recognize the counted loop when induction variable range is changed multiple times In-Reply-To: References: <-SKyhptjFPhuOPflySOZXJloR_Vgr4sC-xB5dSQXxZU=.fd6922bc-2498-4f4e-873a-999f82cd0a1a@github.com> Message-ID: <4JntkYt8QE4lSwWuvEVfqyp_EriMyVV-2YRZKwj6uZk=.e27f92e4-d8f0-49a1-b491-fe19b22141c3@github.com> On Tue, 3 Jun 2025 02:09:10 GMT, Xiaohong Gong wrote: >> test/hotspot/jtreg/compiler/c2/irTests/TestCountedLoopCastIV.java line 174: >> >>> 172: >>> 173: public static void main(String[] args) { >>> 174: TestFramework.runWithFlags("-XX:LoopUnrollLimit=0"); >> >> What is the reason for the flag here? Do you really need it? > > Thanks so much for your review! This flag also prevents the loop from being unrolled and split (i.e. pre-main-post loop mode), so that the compiler gets just one `CountedLoop` in this case and we can be sure it is generated by exactly this loop. Checking that the count of `CountedLoop` is > 0 without this flag is also fine with me. I just want to avoid any noise. WDYT? I would suggest this: Have one run with `-XX:LoopUnrollLimit=0`, and one run without setting the flag. Write some comments about why you are setting the flag. You could then restrict your IR rule to `LoopUnrollLimit=0`. This is where you can most easily reproduce the multiple CastII problem, and it is the easiest case to write an IR rule for and to explain. You should probably leave a comment as to why you set that flag. But maybe you will also succeed in writing an IR rule for `LoopUnrollLimit > 0`, though it could be a little more noisy/complicated. It would just be nice to see that things work fine without having to set special flags.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25539#discussion_r2122963453 From mchevalier at openjdk.org Tue Jun 3 08:08:57 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Tue, 3 Jun 2025 08:08:57 GMT Subject: RFR: 8353266: C2: Wrong execution with Integer.bitCount(int) intrinsic on AArch64 [v2] In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 10:37:11 GMT, Marc Chevalier wrote: >> ### Problem >> >> On Aarch64, using `Integer.bitCount` can modify its argument. The problem comes from the implementation of `popCountI` on Aarch64. For instance, that's what we get with the reproducer `Reduced.java` on the related issue: >> >> ; Load lFld into local x >> ldr x11, [x10, #120] >> ; popCountI >> mov w11, w11 >> mov v16.d[0], x11 >> cnt v16.8b, v16.8b >> addv b16, v16.8b >> mov x13, v16.d[0] >> ; [...] >> ; store local x (which is believed to still contain lFld) into result >> str x11, [x10, #128] >> >> >> The instruction `mov w11, w11` is used to cut the 32 higher bits of `x11` since we use `popCountI` (from `Integer.bitCount`): on aarch64 (like other architectures), assigning the 32 lower bits of a register reset the 32 higher bits. Short: the input is modified, but the implementation of `popCountI` doesn't declare it: >> >> instruct popCountI(iRegINoSp dst, iRegIorL2I src, vRegF tmp) %{ >> match(Set dst (PopCountI src)); >> effect(TEMP tmp); >> [...] >> %} >> >> >> But then, why resetting the upper word of `x11`? It all starts with vector instructions: >> >> cnt v16.8b, v16.8b >> addv b16, v16.8b >> >> The `8b` specifies that it operates on the 8 lower bytes of `v16`, it would be nice to simply use `4b`, but that doesn't exist: vector instructions can only work on either the whole 128-bit register, or the 64 lower bits (by blocks of 1, 2, 4, 8 or 16 bytes). 
There is no suffix (and encoding) for a vector instruction to work only on the 32 lower bits, so not to pollute the bit count, we need to reset the 32 higher bits of `v16.d[0]` (aka `d16`), that is `v16.s[1]`, that is `v16[32:63]` in a more bit-explicit notation. Moreover, unlike with general purpose register doing >> >> mov v16.s[0], w11 >> >> would set `v16[0:31]` to `w11`, but not reset `v16[32:63]`. Which makes sense! Otherwise, using vector registers would be impractical if writing any piece would reset the rest... So we indeed need to set all of `v16[0:63]`, which >> >> mov w11, w11 >> mov v16.d[0], x11 >> >> does, but by destroying `x11`. >> >> ### Solution >> >> Simply adding `USE_KILL src` in the effects would be nice, but unfortunately not possible: `iRegIorL2I` is an operand class (either a 32-bit register or a L2I of a 64-bit register) and those cannot be used in effect lists. >> >> The way I went for is rather not to modify the source, but rather do write the two lower words of `v16` we are interested in separately: >> >> mov v16.s[1], wzr ... > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Apply suggestions Thanks @sendaoYan, @theRealAph and @TobiHartmann for review and nice suggestions! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25551#issuecomment-2934041591 From mchevalier at openjdk.org Tue Jun 3 08:08:59 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Tue, 3 Jun 2025 08:08:59 GMT Subject: Integrated: 8353266: C2: Wrong execution with Integer.bitCount(int) intrinsic on AArch64 In-Reply-To: References: Message-ID: On Fri, 30 May 2025 15:33:14 GMT, Marc Chevalier wrote: > ### Problem > > On Aarch64, using `Integer.bitCount` can modify its argument. The problem comes from the implementation of `popCountI` on Aarch64. 
For instance, that's what we get with the reproducer `Reduced.java` on the related issue: > > ; Load lFld into local x > ldr x11, [x10, #120] > ; popCountI > mov w11, w11 > mov v16.d[0], x11 > cnt v16.8b, v16.8b > addv b16, v16.8b > mov x13, v16.d[0] > ; [...] > ; store local x (which is believed to still contain lFld) into result > str x11, [x10, #128] > > > The instruction `mov w11, w11` is used to cut the 32 higher bits of `x11` since we use `popCountI` (from `Integer.bitCount`): on aarch64 (like other architectures), assigning the 32 lower bits of a register reset the 32 higher bits. Short: the input is modified, but the implementation of `popCountI` doesn't declare it: > > instruct popCountI(iRegINoSp dst, iRegIorL2I src, vRegF tmp) %{ > match(Set dst (PopCountI src)); > effect(TEMP tmp); > [...] > %} > > > But then, why resetting the upper word of `x11`? It all starts with vector instructions: > > cnt v16.8b, v16.8b > addv b16, v16.8b > > The `8b` specifies that it operates on the 8 lower bytes of `v16`, it would be nice to simply use `4b`, but that doesn't exist: vector instructions can only work on either the whole 128-bit register, or the 64 lower bits (by blocks of 1, 2, 4, 8 or 16 bytes). There is no suffix (and encoding) for a vector instruction to work only on the 32 lower bits, so not to pollute the bit count, we need to reset the 32 higher bits of `v16.d[0]` (aka `d16`), that is `v16.s[1]`, that is `v16[32:63]` in a more bit-explicit notation. Moreover, unlike with general purpose register doing > > mov v16.s[0], w11 > > would set `v16[0:31]` to `w11`, but not reset `v16[32:63]`. Which makes sense! Otherwise, using vector registers would be impractical if writing any piece would reset the rest... So we indeed need to set all of `v16[0:63]`, which > > mov w11, w11 > mov v16.d[0], x11 > > does, but by destroying `x11`. 
> > ### Solution > > Simply adding `USE_KILL src` in the effects would be nice, but unfortunately not possible: `iRegIorL2I` is an operand class (either a 32-bit register or a L2I of a 64-bit register) and those cannot be used in effect lists. > > The way I went for is rather not to modify the source, but rather do write the two lower words of `v16` we are interested in separately: > > mov v16.s[1], wzr ; Reset the 1-indexed word of v16, that is v16[32:63] <- 0 > mov v16.s[0], w11 ; Set the 0-ind... This pull request has now been integrated. Changeset: be923a8b Author: Marc Chevalier URL: https://git.openjdk.org/jdk/commit/be923a8b7229cb7a705e72ebbb3046e9f2085048 Stats: 84 lines in 2 files changed: 80 ins; 2 del; 2 mod 8353266: C2: Wrong execution with Integer.bitCount(int) intrinsic on AArch64 Reviewed-by: aph, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/25551 From jbhateja at openjdk.org Tue Jun 3 08:19:06 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 3 Jun 2025 08:19:06 GMT Subject: RFR: 8357982: Fix several failing BMI tests with -XX:+UseAPX [v3] In-Reply-To: References: Message-ID: <1jna58ZtxrGgcqNt9FQf5Tl-rIo6YwTFYzavusVZGyA=.87513e10-77ba-436e-9d9e-b82f5041d368@github.com> > A) Patch extends the following tests with hard-coded encoding checks for various BMI instructions to cover REX2 or extended EVEX encodings supported by APX. > > > compiler/intrinsics/bmi/verifycode/AndnTestI.java > compiler/intrinsics/bmi/verifycode/AndnTestL.java > compiler/intrinsics/bmi/verifycode/BzhiTestI2L.java > compiler/intrinsics/bmi/verifycode/LZcntTestL.java > compiler/intrinsics/bmi/verifycode/TZcntTestL.java > > > B) After integration of JDK-8349582, which added APX NDD support, AndN instruction selection patterns that expect (Xor SRC, -1) as one of its operands were not getting selected because of a lower-cost generic immediate pattern match; patch fixes this issue through strict predicate checks. 
> > Above tests are now passing, validations were carried out using Intel Software Development emulator. > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/cpu/x86/x86_64.ad Thanks :-) Co-authored-by: Tobias Hartmann ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25501/files - new: https://git.openjdk.org/jdk/pull/25501/files/9c249239..b5f69c8d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25501&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25501&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25501.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25501/head:pull/25501 PR: https://git.openjdk.org/jdk/pull/25501 From jbhateja at openjdk.org Tue Jun 3 08:35:20 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 3 Jun 2025 08:35:20 GMT Subject: RFR: 8351635: C2 ROR/ROL: assert failed: Long constant expected [v2] In-Reply-To: References: Message-ID: <5k2J6AUT-a3B006J_ksxccQVxprZa21uqUbKTGkkby0=.5dfc4f2b-ad7a-4393-bf5e-efc246582c83@github.com> > This bug fix patch relaxes the strict assertion check to allow other pattern matches for degenerated long vector ROL/ROR operations with non-constant scalar shift values. > > Kindly review and share feedback. 
> > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Update test/hotspot/jtreg/compiler/vectorapi/TestVectorRotateScalarCount.java Co-authored-by: Tobias Hartmann ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25493/files - new: https://git.openjdk.org/jdk/pull/25493/files/c68b7468..9ec47164 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25493&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25493&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25493.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25493/head:pull/25493 PR: https://git.openjdk.org/jdk/pull/25493 From shade at openjdk.org Tue Jun 3 08:36:56 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 3 Jun 2025 08:36:56 GMT Subject: RFR: 8357223: AArch64: Optimize interpreter profile updates [v2] In-Reply-To: <7wo-_Wt-EiVGKgxMxU_MnTA8o1QQxH_LDtNzDShlOIY=.9c8093b7-ed4b-487d-afbe-5227362f1ade@github.com> References: <7wo-_Wt-EiVGKgxMxU_MnTA8o1QQxH_LDtNzDShlOIY=.9c8093b7-ed4b-487d-afbe-5227362f1ade@github.com> Message-ID: On Thu, 29 May 2025 23:04:25 GMT, Chad Rakoczy wrote: >> [JDK-8357223](https://bugs.openjdk.org/browse/JDK-8357223) >> >> The aarch64 version of [JDK-8356946](https://bugs.openjdk.org/browse/JDK-8356946) >> >> The reasoning for this change is the same as the x86 version's PR: >> >>> First, we carry the implementation for counter decrements without using them. This is dead code, and can be purged. >>> >>> Second, we care about overflows for 64-bit for some reason. I think this is a reminiscent of 32-bit x86 support, where we can plausibly have 32-bit counter overflow in a reasonable timeframe. But for 64-bit counter, we need tens of years of constantly bashing the counter to get it to overflow. No other profile counter update code, e.g. in C1, cares about this. 
>> >> Additional testing: >> >> - [x] Linux aarch64 fastdebug tier 1/2/3/4 > > Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: > > Address comments I would like @theRealAph to ack this before I sponsor. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25512#issuecomment-2934132868 From wenanjian at openjdk.org Tue Jun 3 08:46:02 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Tue, 3 Jun 2025 08:46:02 GMT Subject: RFR: 8358105: RISC-V: Optimize interpreter profile updates Message-ID: The reason of this patch is same as the x86 and aarch64 but for riscv [JDK-8356946](https://bugs.openjdk.org/browse/JDK-8356946) [JDK-8357223](https://bugs.openjdk.org/browse/JDK-8357223) > First, we carry the implementation for counter decrements without using them. This is dead code, and can be purged. Second, we care about overflows for 64-bit for some reason. I think this is a reminiscent of 32-bit x86 support, where we can plausibly have 32-bit counter overflow in a reasonable timeframe. But for 64-bit counter, we need tens of years of constantly bashing the counter to get it to overflow. No other profile counter update code, e.g. in C1, cares about this. 
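
The overflow argument quoted above can be checked with a back-of-the-envelope calculation. The sketch below is illustrative only: the class name and the rate of one increment per nanosecond are assumptions, not measurements from the interpreter.

```java
public class CounterOverflowEstimate {
    static final double SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60;

    // Years until a counter with `bits` usable value bits wraps around,
    // when it is incremented at a constant rate.
    static double yearsToOverflow(int bits, double incrementsPerSecond) {
        return Math.pow(2, bits) / incrementsPerSecond / SECONDS_PER_YEAR;
    }

    public static void main(String[] args) {
        double rate = 1e9; // hypothetical: one increment per nanosecond
        // A signed 32-bit counter has 31 value bits, a signed 64-bit counter has 63.
        System.out.printf("31 bits: %.2f seconds%n", Math.pow(2, 31) / rate);
        System.out.printf("63 bits: %.0f years%n", yearsToOverflow(63, rate));
    }
}
```

At the assumed rate a 32-bit counter wraps within seconds, while a 64-bit counter needs decades to centuries depending on how aggressive the assumed rate is, which is consistent with dropping the overflow handling for 64-bit counters.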
------------- Commit messages: - delete useless func declare and add assert back - RISCV: Optimize interpreter profile updates Changes: https://git.openjdk.org/jdk/pull/25520/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25520&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8358105 Stats: 33 lines in 2 files changed: 0 ins; 21 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/25520.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25520/head:pull/25520 PR: https://git.openjdk.org/jdk/pull/25520 From aph at openjdk.org Tue Jun 3 08:48:52 2025 From: aph at openjdk.org (Andrew Haley) Date: Tue, 3 Jun 2025 08:48:52 GMT Subject: RFR: 8357223: AArch64: Optimize interpreter profile updates [v2] In-Reply-To: <7wo-_Wt-EiVGKgxMxU_MnTA8o1QQxH_LDtNzDShlOIY=.9c8093b7-ed4b-487d-afbe-5227362f1ade@github.com> References: <7wo-_Wt-EiVGKgxMxU_MnTA8o1QQxH_LDtNzDShlOIY=.9c8093b7-ed4b-487d-afbe-5227362f1ade@github.com> Message-ID: <_uKcfKe3J417r7ute1faRzhFqXC5xYVDvRscI8z4e5k=.d82eec06-2dc8-4961-9c40-a5f70bb3be11@github.com> On Thu, 29 May 2025 23:04:25 GMT, Chad Rakoczy wrote: >> [JDK-8357223](https://bugs.openjdk.org/browse/JDK-8357223) >> >> The aarch64 version of [JDK-8356946](https://bugs.openjdk.org/browse/JDK-8356946) >> >> The reasoning for this change is the same as the x86 version's PR: >> >>> First, we carry the implementation for counter decrements without using them. This is dead code, and can be purged. >>> >>> Second, we care about overflows for 64-bit for some reason. I think this is a reminiscent of 32-bit x86 support, where we can plausibly have 32-bit counter overflow in a reasonable timeframe. But for 64-bit counter, we need tens of years of constantly bashing the counter to get it to overflow. No other profile counter update code, e.g. in C1, cares about this. 
>> >> Additional testing: >> >> - [x] Linux aarch64 fastdebug tier 1/2/3/4 > > Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: > > Address comments Yes, that's a nice improvement. Most satisfying. ------------- Marked as reviewed by aph (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25512#pullrequestreview-2891342554 From shade at openjdk.org Tue Jun 3 08:58:05 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 3 Jun 2025 08:58:05 GMT Subject: RFR: 8357223: AArch64: Optimize interpreter profile updates [v2] In-Reply-To: <7wo-_Wt-EiVGKgxMxU_MnTA8o1QQxH_LDtNzDShlOIY=.9c8093b7-ed4b-487d-afbe-5227362f1ade@github.com> References: <7wo-_Wt-EiVGKgxMxU_MnTA8o1QQxH_LDtNzDShlOIY=.9c8093b7-ed4b-487d-afbe-5227362f1ade@github.com> Message-ID: On Thu, 29 May 2025 23:04:25 GMT, Chad Rakoczy wrote: >> [JDK-8357223](https://bugs.openjdk.org/browse/JDK-8357223) >> >> The aarch64 version of [JDK-8356946](https://bugs.openjdk.org/browse/JDK-8356946) >> >> The reasoning for this change is the same as the x86 version's PR: >> >>> First, we carry the implementation for counter decrements without using them. This is dead code, and can be purged. >>> >>> Second, we care about overflows for 64-bit for some reason. I think this is a reminiscent of 32-bit x86 support, where we can plausibly have 32-bit counter overflow in a reasonable timeframe. But for 64-bit counter, we need tens of years of constantly bashing the counter to get it to overflow. No other profile counter update code, e.g. in C1, cares about this. >> >> Additional testing: >> >> - [x] Linux aarch64 fastdebug tier 1/2/3/4 > > Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: > > Address comments Excellent, thanks! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25512#issuecomment-2934211424 From duke at openjdk.org Tue Jun 3 08:58:06 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Tue, 3 Jun 2025 08:58:06 GMT Subject: Integrated: 8357223: AArch64: Optimize interpreter profile updates In-Reply-To: References: Message-ID: On Wed, 28 May 2025 20:21:20 GMT, Chad Rakoczy wrote: > [JDK-8357223](https://bugs.openjdk.org/browse/JDK-8357223) > > The aarch64 version of [JDK-8356946](https://bugs.openjdk.org/browse/JDK-8356946) > > The reasoning for this change is the same as the x86 version's PR: > >> First, we carry the implementation for counter decrements without using them. This is dead code, and can be purged. >> >> Second, we care about overflows for 64-bit for some reason. I think this is a reminiscent of 32-bit x86 support, where we can plausibly have 32-bit counter overflow in a reasonable timeframe. But for 64-bit counter, we need tens of years of constantly bashing the counter to get it to overflow. No other profile counter update code, e.g. in C1, cares about this. > > Additional testing: > > - [x] Linux aarch64 fastdebug tier 1/2/3/4 This pull request has now been integrated. 
Changeset: 44025276 Author: Chad Rakoczy Committer: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/4402527683ed08eebf4953a9d83f72f64a5ff4fa Stats: 47 lines in 2 files changed: 0 ins; 37 del; 10 mod 8357223: AArch64: Optimize interpreter profile updates Reviewed-by: shade, aph ------------- PR: https://git.openjdk.org/jdk/pull/25512 From epeter at openjdk.org Tue Jun 3 09:28:03 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 3 Jun 2025 09:28:03 GMT Subject: RFR: 8350896: Integer/Long.compress gets wrong type from CompressBitsNode::Value [v9] In-Reply-To: References: Message-ID: On Tue, 3 Jun 2025 09:20:57 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix aarch64 failure > > src/hotspot/share/opto/intrinsicnode.cpp line 288: > >> 286: // For constant mask strictly less than zero, maximum result value will be >> 287: // same as mask value with its sign bit flipped, assuming all but last read >> 288: // source bits are set to 1. > > Suggestion: > > // For constant mask strictly less than zero, the maximum result value will be > // the same as the mask value with its sign bit flipped, assuming all source bits but the last > // are set to 1. Honestly, I don't understand the sign flip... hmm > src/hotspot/share/opto/intrinsicnode.cpp line 298: > >> 296: // result.hi = 0xEFFFFFFF ^ 0x80000000 = 0x6FFFFFFF >> 297: // result.lo = 0x80000000 >> 298: // > > Same here: why not do a proper `if-else`, and add the comments to each scope directly? `Result.Hi` -> `result.hi` etc for consistency. 
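
One possible reading of the sign flip can be sketched in plain Java, assuming that `ExpandBits` follows the semantics of `Integer.expand` (available since JDK 19). The class and helper names below are hypothetical and are not the code in `intrinsicnode.cpp`. The idea: the result of an expand can only have bits set where the mask has bits set, so for a negative mask the largest signed result has every mask bit set except the sign bit, and the smallest has only the sign bit set.

```java
public class ExpandBoundsSketch {
    // Largest signed result of Integer.expand(src, mask) over all src.
    // For a negative mask this is the mask with its sign bit flipped (cleared).
    static int expandHi(int mask) {
        return mask >= 0 ? mask : mask ^ Integer.MIN_VALUE;
    }

    // Smallest signed result of Integer.expand(src, mask) over all src.
    // For a negative mask only the sign bit is set; otherwise the result is never negative.
    static int expandLo(int mask) {
        return mask >= 0 ? 0 : Integer.MIN_VALUE;
    }

    public static void main(String[] args) {
        int mask = 0xEFFFFFFF;  // 31 bits set, bit 28 clear (mask from the review comment)
        int hiSrc = 0x3FFFFFFF; // feeds 1s to every mask position except the sign bit
        int loSrc = 1 << 30;    // feeds a 1 only to the sign-bit position
        System.out.printf("hi: 0x%08X (bound 0x%08X)%n", Integer.expand(hiSrc, mask), expandHi(mask));
        System.out.printf("lo: 0x%08X (bound 0x%08X)%n", Integer.expand(loSrc, mask), expandLo(mask));
    }
}
```

Written this way, `mask ^ Integer.MIN_VALUE` makes the intent of the flip visible at the point of use, which is the kind of explicitness the comment about `maskcon ^ lo` asks for.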
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r2123251808 PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r2123233772 From epeter at openjdk.org Tue Jun 3 09:28:03 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 3 Jun 2025 09:28:03 GMT Subject: RFR: 8350896: Integer/Long.compress gets wrong type from CompressBitsNode::Value [v9] In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 17:58:09 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This bugfix patch fixes incorrect value computation for Integer/Long.compress APIs. >> >> Problems occur with a constant input and variable mask where the input's value is equal to the lower bound of the mask value. In this case, an erroneous value range estimation results in a constant value. Existing value routine first attempts to constant fold the compression operation if both input and compression mask are constant values; otherwise, it attempts to constrain the value range of result based on the upper and lower bounds of mask type. >> >> New IR test covers the issue reported in the bug report along with a case for value range based logic pruning. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Fix aarch64 failure Thanks for all the comment updates! I had a few minutes to look into it, and will add more later! src/hotspot/share/opto/intrinsicnode.cpp line 267: > 265: // mask = 0xEFFFFFFF (constant mask) > 266: // result.hi = 0x7FFFFFFF > 267: // result.lo = 0 Should this not go inside the `CompressBits` scope? `Hi` -> `lo` `Result.Hi = popcount(1 << mask_bits - 1)` Does not look right. Is this not the wrong way around? Just repeating code here also does not make sense. Either give a reason in English, or just drop the duplication if it is indeed trivial.
I would also do the case distinction a bit clearer: If mask == -1 -> all ones -> just returns src: result.lo = type_min (happens if src = type_min) Question: does that not mean we could just return the input type of `src`? If mask != -1 -> at least one zero in mask -> result cannot be negative: result.lo = 0 But if we are doing this with the comments, then why not just create an `if-else` block, and add the comments inside each block? src/hotspot/share/opto/intrinsicnode.cpp line 272: > 270: int bitcount = population_count(static_cast(bt == T_INT ? maskcon & 0xFFFFFFFFL : maskcon)); > 271: hi = maskcon == -1L ? hi : (1UL << bitcount) - 1; > 272: lo = maskcon == -1L ? lo : 0L; It could be nice to have a proper `if-else` here, and add the comments to each scope, rather than above. That would allow you to avoid duplicating the code above in the comments. src/hotspot/share/opto/intrinsicnode.cpp line 274: > 272: lo = maskcon == -1L ? lo : 0L; > 273: } else { > 274: // Case A.2 bit expansion:- I would put the assert for `Op_ExpandBits` above, so that it is immediately clear that this matches. src/hotspot/share/opto/intrinsicnode.cpp line 278: > 276: // Result.Hi = mask, optimistically assuming all source bits > 277: // read starting from least significant bit positions are 1. > 278: // Result.Lo = 0 Suggestion: // Result.Lo = 0, because at least one bit in mask is zero. src/hotspot/share/opto/intrinsicnode.cpp line 288: > 286: // For constant mask strictly less than zero, maximum result value will be > 287: // same as mask value with its sign bit flipped, assuming all but last read > 288: // source bits are set to 1. Suggestion: // For constant mask strictly less than zero, the maximum result value will be // the same as the mask value with its sign bit flipped, assuming all source bits but the last // are set to 1. 
src/hotspot/share/opto/intrinsicnode.cpp line 298: > 296: // result.hi = 0xEFFFFFFF ^ 0x80000000 = 0x6FFFFFFF > 297: // result.lo = 0x80000000 > 298: // Same here: why not do a proper `if-else`, and add the comments to each scope directly? src/hotspot/share/opto/intrinsicnode.cpp line 300: > 298: // > 299: assert(opc == Op_ExpandBits, ""); > 300: hi = maskcon >= 0L ? maskcon : maskcon ^ lo; If you are already touching this line: `maskcon ^ lo` is really a bit hairy. It should really be `maskcon ^ type_min(bt)`, and then you add a comment right there that it is a sign flip. ------------- PR Review: https://git.openjdk.org/jdk/pull/23947#pullrequestreview-2891441258 PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r2123225738 PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r2123228042 PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r2123229716 PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r2123235763 PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r2123240806 PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r2123231362 PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r2123248474 From dfenacci at openjdk.org Tue Jun 3 09:29:53 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 3 Jun 2025 09:29:53 GMT Subject: RFR: 8358129: compiler/startup/StartupOutput.java runs into out of memory on Windows after JDK-8347406 In-Reply-To: References: <-9AT4ja1WZHf_xLO6Uzl90zPJKG-KOHTyUG075CTxHE=.be43593a-ff3b-4e33-8a63-f1d02cce8836@github.com> Message-ID: On Tue, 3 Jun 2025 04:42:23 GMT, Arno Zeller wrote: > > What impact has this change in the time the test takes to run? If it turns out to be too slow, maybe the processes could be run batches? > > I checked on one of our windows Machines - the test did run in 11 seconds before and took 48 seconds after this change. 
I quickly checked on our machines (few different architectures) and it went from a range between 2 and 10 seconds to a range between 6 and 23 seconds. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25582#issuecomment-2934342130 From fjiang at openjdk.org Tue Jun 3 09:29:59 2025 From: fjiang at openjdk.org (Feilong Jiang) Date: Tue, 3 Jun 2025 09:29:59 GMT Subject: RFR: 8358105: RISC-V: Optimize interpreter profile updates In-Reply-To: References: Message-ID: On Thu, 29 May 2025 11:05:04 GMT, Anjian Wen wrote: > The reason of this patch is same as the x86 and aarch64 but for riscv > [JDK-8356946](https://bugs.openjdk.org/browse/JDK-8356946) > [JDK-8357223](https://bugs.openjdk.org/browse/JDK-8357223) > >> First, we carry the implementation for counter decrements without using them. This is dead code, and can be purged. Second, we care about overflows for 64-bit for some reason. I think this is a reminiscent of 32-bit x86 support, where we can plausibly have 32-bit counter overflow in a reasonable timeframe. But for 64-bit counter, we need tens of years of constantly bashing the counter to get it to overflow. No other profile counter update code, e.g. in C1, cares about this. Looks good! ------------- Marked as reviewed by fjiang (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/25520#pullrequestreview-2891497710 From epeter at openjdk.org Tue Jun 3 09:31:55 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 3 Jun 2025 09:31:55 GMT Subject: RFR: 8350896: Integer/Long.compress gets wrong type from CompressBitsNode::Value [v9] In-Reply-To: References: Message-ID: <6tkCVQHc4bQQsHHt-VfuZi00vTUuyWwYT3gZGyFAAMA=.3cc07024-2d7f-4eaa-ac62-811532e50a75@github.com> On Tue, 3 Jun 2025 09:25:01 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/intrinsicnode.cpp line 288: >> >>> 286: // For constant mask strictly less than zero, maximum result value will be >>> 287: // same as mask value with its sign bit flipped, assuming all but last read >>> 288: // source bits are set to 1. >> >> Suggestion: >> >> // For constant mask strictly less than zero, the maximum result value will be >> // the same as the mask value with its sign bit flipped, assuming all source bits but the last >> // are set to 1. > > Honestly, I don't understand the sign flip... hmm Ah, you are just masking off the sign bit... right. Makes sense. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r2123265342 From shade at openjdk.org Tue Jun 3 09:38:09 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 3 Jun 2025 09:38:09 GMT Subject: RFR: 8357434: x86: Simplify Interpreter::profile_taken_branch [v3] In-Reply-To: References: Message-ID: > Noticed that `Interpreter::profile_taken_branch` has the same `sbbptr` pattern we have eliminated with [JDK-8356946](https://bugs.openjdk.org/browse/JDK-8356946). The same logic applies here: the counter is 64-bit, never practically overflows, and no other code cares about it. > > Also tidied up some comments around it. > > Additional testing; > - [x] Linux x86_64 server fastdebug, `tier1 tier2` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Merge branch 'master' into JDK-8357434-x86-profile-taken - Stale comment - Merge branch 'master' into JDK-8357434-x86-profile-taken - Fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25343/files - new: https://git.openjdk.org/jdk/pull/25343/files/816b7af7..f17feaa6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25343&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25343&range=01-02 Stats: 49352 lines in 792 files changed: 25581 ins; 15003 del; 8768 mod Patch: https://git.openjdk.org/jdk/pull/25343.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25343/head:pull/25343 PR: https://git.openjdk.org/jdk/pull/25343 From syan at openjdk.org Tue Jun 3 10:02:02 2025 From: syan at openjdk.org (SendaoYan) Date: Tue, 3 Jun 2025 10:02:02 GMT Subject: RFR: 8358129: compiler/startup/StartupOutput.java runs into out of memory on Windows after JDK-8347406 In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 10:57:22 GMT, Damon Fenacci wrote: > The test `compiler/startup/StartupOutput.java` starts **200 VMs in a loop** , this can lead to resource shortages on some (Windows) machines. > > There is no real need to run those VMs concurrently (their run is short and basically check that the VM doesn't crash giving limited code cache). > > Running them **sequentially** should be OK and should avoid running out of memory. 
> > Testing: Tier1-3+ test/hotspot/jtreg/compiler/startup/StartupOutput.java line 67: > 65: int reservedCodeCacheSizeInKb = initialCodeCacheSizeInKb + rand.nextInt(200); > 66: pb = ProcessTools.createLimitedTestJavaProcessBuilder("-XX:InitialCodeCacheSize=" + initialCodeCacheSizeInKb + "K", "-XX:ReservedCodeCacheSize=" + reservedCodeCacheSizeInKb + "k", "-version"); > 67: out = new OutputAnalyzer(pb.start()); Since we now start the VMs serially rather than concurrently, do we still need to start them 200 times? Maybe 20 or 5 times is enough, or even just once? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25582#discussion_r2123333185 From shade at openjdk.org Tue Jun 3 10:38:54 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 3 Jun 2025 10:38:54 GMT Subject: RFR: 8357600: Patch nmethod flushing message to include more details [v3] In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 22:49:36 GMT, Cesar Soares Lucas wrote: >> Please review this patch for adding more details to nmethod flushing message. These details are particularly important when investigating interaction of JVMCI compiled code and code cache flushing heuristics. >> >> Tested on Linux x64 with JTREG tier1-3 using fastdebug and release builds. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Address PR feedback. Looks OK to me. This is diagnostic logging, so we do not have to be extra crisp about it. Let any other compiler folks review as well. ------------- Marked as reviewed by shade (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/25402#pullrequestreview-2891737056 From kwei at openjdk.org Tue Jun 3 10:50:39 2025 From: kwei at openjdk.org (Kuai Wei) Date: Tue, 3 Jun 2025 10:50:39 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v17] In-Reply-To: References: Message-ID: > In this patch, I extent the merge stores optimization to merge adjacents loads. Tier1 tests are passed in my machine. > > The benchmark result of MergeLoadBench.java > AMD EPYC 9T24 96-Core Processor: > > |name | -MergeLoads | +MergeLoads |delta| > |---|---|---|---| > |MergeLoadBench.getCharB |4352.150 |4407.435 | 55.29 | > |MergeLoadBench.getCharBU |4075.320 |4084.663 | 9.34 | > |MergeLoadBench.getCharBV |3221.302 |3221.528 | 0.23 | > |MergeLoadBench.getCharC |2235.433 |2238.796 | 3.36 | > |MergeLoadBench.getCharL |4363.244 |4372.281 | 9.04 | > |MergeLoadBench.getCharLU |4072.550 |4075.744 | 3.19 | > |MergeLoadBench.getCharLV |2227.825 |2231.612 | 3.79 | > |MergeLoadBench.getIntB |11199.935 |6869.030 | -4330.90 | > |MergeLoadBench.getIntBU |6853.862 |2763.923 | -4089.94 | > |MergeLoadBench.getIntBV |306.953 |309.911 | 2.96 | > |MergeLoadBench.getIntL |10426.843 |6523.716 | -3903.13 | > |MergeLoadBench.getIntLU |6740.847 |2602.701 | -4138.15 | > |MergeLoadBench.getIntLV |2233.151 |2231.745 | -1.41 | > |MergeLoadBench.getIntRB |11335.756 |8980.619 | -2355.14 | > |MergeLoadBench.getIntRBU |7439.873 |3190.208 | -4249.66 | > |MergeLoadBench.getIntRL |16323.040 |7786.842 | -8536.20 | > |MergeLoadBench.getIntRLU |7457.745 |3364.140 | -4093.61 | > |MergeLoadBench.getIntRU |2512.621 |2511.668 | -0.95 | > |MergeLoadBench.getIntU |2501.064 |2500.629 | -0.43 | > |MergeLoadBench.getLongB |21175.442 |21103.660 | -71.78 | > |MergeLoadBench.getLongBU |14042.046 |2512.784 | -11529.26 | > |MergeLoadBench.getLongBV |606.448 |606.171 | -0.28 | > |MergeLoadBench.getLongL |23142.178 |23217.785 | 75.61 | > |MergeLoadBench.getLongLU |14112.972 |2237.659 
| -11875.31 | > |MergeLoadBench.getLongLV |2230.416 |2231.224 | 0.81 | > |MergeLoadBench.getLongRB |21152.558 |21140.583 | -11.98 | > |MergeLoadBench.getLongRBU |14031.178 |2520.317 | -11510.86 | > |MergeLoadBench.getLongRL |23248.506 |23136.410 | -112.10 | > |MergeLoadBench.getLongRLU |14125.032 |2240.481 | -11884.55 | > |MergeLoadBench.getLongRU |3071.881 |3066.606 | -5.27 | > |Merg... Kuai Wei has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 24 commits: - Merge remote-tracking branch 'origin/master' into dev/merge_loads - Move _merge_memops_checks into OrI/OrL - Fix test error after merging - Merge remote-tracking branch 'origin/master' into dev/merge_loads - Fix for comments - Fix build error on mac and windows - Add check flag for combine operator - Make MergeLoadInfoList an in-place growable array - Fix for comments - Merge remote-tracking branch 'origin/master' into dev/merge_loads - ... and 14 more: https://git.openjdk.org/jdk/compare/8674f491...bdaae3ee ------------- Changes: https://git.openjdk.org/jdk/pull/24023/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24023&range=16 Stats: 2727 lines in 17 files changed: 2677 ins; 0 del; 50 mod Patch: https://git.openjdk.org/jdk/pull/24023.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24023/head:pull/24023 PR: https://git.openjdk.org/jdk/pull/24023 From jbhateja at openjdk.org Tue Jun 3 10:52:50 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 3 Jun 2025 10:52:50 GMT Subject: RFR: 8358333: Use VEX2 prefix in Assembler::psllq In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 15:53:17 GMT, Yudi Zheng wrote: > While porting the commit https://github.com/openjdk/jdk/commit/0df8c9684b8782ef830e2bd425217864c3f51784 to Graal, I noticed that the Assembler::psllq instruction is using the VEX3 prefix. This results in the instruction being unrecognizable by my outdated version of hsdis. 
Currently, HotSpot generates the following bytes for vpsllq xmm7, xmm7, 0x34 > https://github.com/openjdk/jdk/blob/0df8c9684b8782ef830e2bd425217864c3f51784/src/hotspot/cpu/x86/stubGenerator_x86_64_cbrt.cpp#L255 > > > c4 e1 c1 73 f7 34 > > > By setting the rex_w to WIG, the emitted bytes are: > > > c5 c1 73 f7 34 LGTM. ------------- Marked as reviewed by jbhateja (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25593#pullrequestreview-2891782918 From mli at openjdk.org Tue Jun 3 11:26:52 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 3 Jun 2025 11:26:52 GMT Subject: RFR: 8357554: Enable vectorization of Bool -> CMove with different type size (on riscv) In-Reply-To: References: Message-ID: On Tue, 3 Jun 2025 07:10:19 GMT, Emanuel Peter wrote: >> ( In this pr, it should return false for riscv too and be enabled in the riscv pr. I'll modify it. ) >> >>> Does RISCV support the use of any input vector element type, including 8bit, 16bit, 32bit and 64bit masks, and any elements we would be blending, incl byte, short, char, int, long, HF, F, D? >> >> Good question! I'll add some additional tests to double check and reflect this. >> >> I think the answer should be yes, i.e. on riscv all size of source inputs (comparing operands) and all size of dest outputs (blending result) are supported. >> But for HF, it's a bit special, the underlying payload is a short, so in theory it should be supported too, but it's not supported in this pr and the related riscv pr (https://github.com/openjdk/jdk/pull/25341). >> >>> Because it sounds you are promissing this really "unconditionally". Or what exactly do you mean by "unconditionally"? >> >> I mean it's really "unconditionally", but if you feel it's better to add an argument, like `supports_vectorize_cmove_bool_unconditionally(BasicType src, BasicType dst)`, I can do it. >> And I need to modify the `vectornode.cpp` as below too, I'll check it and modify this pr. 
>> ``` case Op_CMoveI: >> return (is_integral_type(bt) && bt != T_LONG ? Op_VectorBlend : 0); > >> I mean it's really "unconditionally", but if you feel it's better to add an argument, like supports_vectorize_cmove_bool_unconditionally(BasicType src, BasicType dst), I can do it. > > I think this would be good! There is an issue when the comparison is an unsigned one, e.g. `c[i] = Long.compareUnsigned(a[i], b[i]) > 0 ? 1.0 : 2.0;`, or `c[i] = (a[i] > b[i]) ? 1.0 : 2.0;` when a[]/b[] are char[]. It seems unsigned comparison is currently not supported for superword vectorization? The unsigned information is lost, i.e. all the comparisons are just signed ones. I checked the generated code, and it seems that when VectorMaskCmp is matched, `BoolTest::unsigned_compare & cond` is always 0 in the passed-in `cond` parameter. (Vector API supports unsigned ones, as it passes in `cond` with `BoolTest::unsigned_compare` mask explicitly when the operator is in UGE/UGT/ULE/ULT.) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25336#discussion_r2123518238 From epeter at openjdk.org Tue Jun 3 11:31:25 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 3 Jun 2025 11:31:25 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v75] In-Reply-To: References: Message-ID: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC).
> > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. > > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. 
> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: remove unnecessary Type.name() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/310d7d86..455cd434 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=74 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=73-74 Stats: 8 lines in 1 file changed: 0 ins; 7 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From epeter at openjdk.org Tue Jun 3 11:43:48 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 3 Jun 2025 11:43:48 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v61] In-Reply-To: References: Message-ID: <0ML8Y2hLfrZZ_NiRcltx_D4MoFqjwCJguJAxCIvJpHU=.1adc4a92-4f24-4037-8d2f-062b590d23b1@github.com> On Mon, 2 Jun 2025 12:14:48 GMT, Christian Hagedorn wrote: >> Thanks for all the updates and discussions! I've worked my way through the documentation in `Template` and the examples again in some more detail. It's much better and the new explanations are well done, excellent work! >> >> I left some comments here and there but mostly minor things. I will have another look at the implementation - probably only finished by Monday. The design now looks great. I'm glad we could find a good solution now after some more iterations :-) > >> @chhagedorn Alright, I now have a decent solution for `$$var` and `$1var` etc. I also added tests for it. 
>> >> These are issues we could continue the conversation, unless you are satisfied with my answers: [#24217 (comment)](https://github.com/openjdk/jdk/pull/24217#discussion_r2115388737) [#24217 (comment)](https://github.com/openjdk/jdk/pull/24217#discussion_r2115406391) >> >> This is now ready for another review pass ? > > Awesome, thanks for spending some more time with these nasty edge-cases and finding a solution! I had a look at your updates for all my comments, they look good, thanks! > > I'm going to make a pass over the implementation classes now and will have a look at the `Renderer` updates as well :-) @chhagedorn Thanks a lot for taking the time for the offline meeting! I now updated the two little things we talked about. Looking forward to another round of review :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24217#issuecomment-2934840238 From epeter at openjdk.org Tue Jun 3 11:43:48 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 3 Jun 2025 11:43:48 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v76] In-Reply-To: References: Message-ID: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. 
This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. > > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: rename View -> FilteredSet ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/455cd434..fa3d086a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=75 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=74-75 Stats: 51 lines in 3 files changed: 5 ins; 0 del; 46 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From thartmann at openjdk.org Tue Jun 3 11:44:58 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 3 Jun 2025 11:44:58 GMT Subject: RFR: 8351635: C2 ROR/ROL: assert failed: Long constant expected [v2] In-Reply-To: <5k2J6AUT-a3B006J_ksxccQVxprZa21uqUbKTGkkby0=.5dfc4f2b-ad7a-4393-bf5e-efc246582c83@github.com> References: <5k2J6AUT-a3B006J_ksxccQVxprZa21uqUbKTGkkby0=.5dfc4f2b-ad7a-4393-bf5e-efc246582c83@github.com> Message-ID: On Tue, 3 Jun 2025 08:35:20 GMT, Jatin Bhateja wrote: >> This bug fix patch relaxes the strict assertion check to allow other pattern matches for degenerated long vector ROL/ROR operations with non-constant scalar shift values. >> >> Kindly review and share feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/compiler/vectorapi/TestVectorRotateScalarCount.java > > Co-authored-by: Tobias Hartmann Looks good to me. All tests passed. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25493#pullrequestreview-2891957732 From roland at openjdk.org Tue Jun 3 11:46:55 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 3 Jun 2025 11:46:55 GMT Subject: RFR: 8354383: C2: enable sinking of Type nodes out of loop [v3] In-Reply-To: References: Message-ID: On Tue, 27 May 2025 15:24:38 GMT, Christian Hagedorn wrote: > I'm not sure either, we would need to further investigate if we can find cases that benefit from it. Should we file an RFE either way? I filed: https://bugs.openjdk.org/browse/JDK-8358501 ------------- PR Comment: https://git.openjdk.org/jdk/pull/25396#issuecomment-2934848953 From roland at openjdk.org Tue Jun 3 12:03:59 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 3 Jun 2025 12:03:59 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v7] In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 11:50:38 GMT, Emanuel Peter wrote: > What I would like to see for **testing**: add some more patterns with IR rules. More that now optimize, and also a few that do not optimize, just so we have a bit of a sense what we are still missing. > > @rwestrel Filed this issue. I wonder: what do you think we should do here? How general should the optimization/canonicalization be? Having a clearer view of what optimizes and what doesn't, as you suggest (and filing a bug to keep track of what's missing), sounds useful. Beyond that, I don't see why we wouldn't get what we have so far integrated. It can be improved or reworked down the road but it feels useful to me as it is.
------------- PR Comment: https://git.openjdk.org/jdk/pull/23506#issuecomment-2934913822 From aph at openjdk.org Tue Jun 3 12:09:04 2025 From: aph at openjdk.org (Andrew Haley) Date: Tue, 3 Jun 2025 12:09:04 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v24] In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 22:43:50 GMT, Chad Rakoczy wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). >> >> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. >> >> This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. > > Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: > > Fix test copywrite src/hotspot/cpu/aarch64/relocInfo_aarch64.cpp line 89: > 87: x = trampoline; > 88: } > 89: call->set_destination(x); I think I see what you're doing here, but it doesn't look right. At the very least it's a trap for maintainers, who don't expect the destination address to be discarded if the call doesn't reach. When the call doesn't reach, I believe you're fixing up an internal call to point to its target in the new copy of the code. But this isn't right when calls are PC relative, is it? In that case it makes more sense to leave the call instruction alone rather than rewrite it. 
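
One way to read the PC-relative concern above is as plain address arithmetic: when a whole nmethod is copied, a call site and a target inside the same copy move by the same amount, so the encoded displacement does not change, whereas a target outside the copy ends up at a different distance and may fall out of reach. The sketch below only illustrates that arithmetic; the class name, method names, and addresses are made up, and it is not the relocation code in `relocInfo_aarch64.cpp`. The only hardware fact assumed is that an AArch64 `BL` encodes a signed 26-bit word offset, i.e. a reach of roughly +/-128 MiB.

```java
// Illustrative arithmetic only: names and addresses are hypothetical.
final class PcRelativeCallSketch {
    // AArch64 BL: signed 26-bit immediate scaled by 4, so [-128 MiB, +128 MiB).
    static final long BRANCH_RANGE = 128L * 1024 * 1024;

    // Displacement a PC-relative call instruction would have to encode.
    static long displacement(long callSite, long target) {
        return target - callSite;
    }

    // True if a direct branch at callSite can reach target without a trampoline.
    static boolean reaches(long callSite, long target) {
        long disp = displacement(callSite, target);
        return disp >= -BRANCH_RANGE && disp < BRANCH_RANGE;
    }

    public static void main(String[] args) {
        long oldBase = 0x1000_0000L;      // where the code lived before the copy
        long newBase = 0x5000_0000L;      // where the copy lives (1 GiB away)
        long callOffset = 0x40;           // call site offset within the code
        long internalTargetOffset = 0x200; // target inside the same code
        long externalTarget = 0x1100_0000L; // target that does not move

        // Internal call: both ends move together, displacement is unchanged.
        System.out.println(displacement(oldBase + callOffset, oldBase + internalTargetOffset)
                == displacement(newBase + callOffset, newBase + internalTargetOffset));

        // External call: reachable before the copy, out of range afterwards.
        System.out.println(reaches(oldBase + callOffset, externalTarget));
        System.out.println(reaches(newBase + callOffset, externalTarget));
    }
}
```

Under these made-up addresses the internal call keeps its displacement across the copy, while the external call is within range from the old location and out of range from the new one, which is the case where a trampoline would be needed.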
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2123618495 From roland at openjdk.org Tue Jun 3 12:10:54 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 3 Jun 2025 12:10:54 GMT Subject: RFR: 8351889: C2 crash: assertion failed: Base pointers must match (addp 344) In-Reply-To: References: Message-ID: On Tue, 27 May 2025 08:11:27 GMT, Galder Zamarreño wrote: >> The test case has an out of loop `Store` with an `AddP` address >> expression that has other uses and is in the loop body. Schematically, >> only showing the address subgraph and the bases for the `AddP`s: >> >> >> Store#195 -> AddP#133 -> AddP#134 -> CastPP#110 >> -> CastPP#110 >> >> >> Both `AddP`s have the same base, a `CastPP` that's also in the loop >> body. >> >> That loop is a counted loop and only has 3 iterations so is fully >> unrolled. First, one iteration is peeled: >> >> >> /-> CastPP#110 >> Store#195 -> Phi#360 -> AddP#133 -> AddP#134 -> CastPP#110 >> -> AddP#277 -> AddP#278 -> CastPP#283 >> -> CastPP#283 >> >> >> >> The `AddP`s and `CastPP` are cloned (because in the loop body). As >> part of peeling, `PhaseIdealLoop::peeled_dom_test_elim()` is >> called. It finds the test that guards `CastPP#283` in the peeled >> iteration dominates and replaces the test that guards `CastPP#110` >> (the test in the peeled iteration is the clone of the test in the >> loop). That causes `CastPP#110`'s control to be updated to that of the >> test in the peeled iteration and to be yanked from the loop. So now >> `CastPP#283` and `CastPP#110` have the same inputs. >> >> Next unrolling happens: >> >> >> /-> CastPP#110 >> /-> AddP#400 -> AddP#401 -> CastPP#110 >> Store#195 -> Phi#360 -> Phi#477 -> AddP#133 -> AddP#134 -> CastPP#110 >> \ -> CastPP#110 >> -> AddP#277 -> AddP#278 -> CastPP#283 >> -> CastPP#283 >> >> >> >> `AddP`s are cloned once more but not the `CastPP`s because they are >> both in the peeled iteration now. A new `Phi` is added. >> >> Next igvn runs.
It's going to push the `AddP`s through the `Phi`s. >> >> Through `Phi#477`: >> >> >> >> /-> CastPP#110 >> Store#195 -> Phi#360 -> AddP#510 -> Phi#509 -> AddP#401 -> CastPP#110 >> \ -> AddP#134 -> CastPP#110 >> -> AddP#277 -> AddP#278 -> CastPP#283 >> -> CastPP#283 >> >> >> >> Through `Phi#360`: >> >> >> /-> AddP#134 -> CastPP#110 >> /-> Phi#509 -> AddP#401 -> CastPP#110 >> Store#195 -> AddP#516 -> Phi#515 -> AddP#278 -> CastPP#283 >> -> Phi#514 -> CastPP#283 >> ... > > test/hotspot/jtreg/compiler/c2/TestMismatchedAddPAfterMaxUnroll.java line 74: > >> 72: } >> 73: if (flag) { >> 74: if (flag2) { > > It looks a bit odd to have this if statement and the flag one. One would expect these to be dead code eliminated? Or does the DCE only happen after the problematic C2 crash? Might be useful to have some comment explaining the rationale for these if statements. What happens is that if a branch is never taken at runtime (as is the case here for `if (flag2) {`), that's captured by profile data. If C2 sees a never-taken branch, it doesn't parse it. So it is compiled to: if (flag2) { uncommon_trap(unstable_if); } As a result, C2 never sees that the branch is empty and that the while statement is useless. It's a missed optimization opportunity but I find it useful for test cases and use that pattern. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25386#discussion_r2123621503 From dfenacci at openjdk.org Tue Jun 3 12:49:54 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 3 Jun 2025 12:49:54 GMT Subject: RFR: 8358129: compiler/startup/StartupOutput.java runs into out of memory on Windows after JDK-8347406 In-Reply-To: References: Message-ID: <1LnBHbsT5dzl9cE4ooMiSSRV0vGu4Y0qZiZ2rkKjdTY=.c9759e84-cf54-44eb-9922-0b9691822010@github.com> On Tue, 3 Jun 2025 09:57:37 GMT, SendaoYan wrote: >> The test `compiler/startup/StartupOutput.java` starts **200 VMs in a loop** , this can lead to resource shortages on some (Windows) machines.
>> >> There is no real need to run those VMs concurrently (their run is short and basically check that the VM doesn't crash giving limited code cache). >> >> Running them **sequentially** should be OK and should avoid running out of memory. >> >> Testing: Tier1-3+ > > test/hotspot/jtreg/compiler/startup/StartupOutput.java line 67: > >> 65: int reservedCodeCacheSizeInKb = initialCodeCacheSizeInKb + rand.nextInt(200); >> 66: pb = ProcessTools.createLimitedTestJavaProcessBuilder("-XX:InitialCodeCacheSize=" + initialCodeCacheSizeInKb + "K", "-XX:ReservedCodeCacheSize=" + reservedCodeCacheSizeInKb + "k", "-version"); >> 67: out = new OutputAnalyzer(pb.start()); > > SInce we start VMs from concurrently to serially, do we need start VMs 200 times anymore, maybe 20 times or 5 times is enough, or even just 1 time is enough? The goal is actually to test the VM startup with different (randomised) code cache sizes. So, I'm not sure that there is another way to do that other than starting a new VM every time. We could definitely try with a lower number but, as the whole test takes just a few seconds to run, I don't think it would make a big difference. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25582#discussion_r2123709317 From thartmann at openjdk.org Tue Jun 3 13:03:54 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 3 Jun 2025 13:03:54 GMT Subject: RFR: 8358333: Use VEX2 prefix in Assembler::psllq In-Reply-To: References: Message-ID: <6GfLx7sBoN-0ffjlsnL0dZ-q-GKAje-1HNO-FQ__b4o=.7ace0d2f-85c0-42dc-b4a4-3f6749287da2@github.com> On Mon, 2 Jun 2025 15:53:17 GMT, Yudi Zheng wrote: > While porting the commit https://github.com/openjdk/jdk/commit/0df8c9684b8782ef830e2bd425217864c3f51784 to Graal, I noticed that the Assembler::psllq instruction is using the VEX3 prefix. This results in the instruction being unrecognizable by my outdated version of hsdis. 
Currently, HotSpot generates the following bytes for vpsllq xmm7, xmm7, 0x34
> https://github.com/openjdk/jdk/blob/0df8c9684b8782ef830e2bd425217864c3f51784/src/hotspot/cpu/x86/stubGenerator_x86_64_cbrt.cpp#L255
>
>
> c4 e1 c1 73 f7 34
>
>
> By setting the rex_w to WIG, the emitted bytes are:
>
>
> c5 c1 73 f7 34

Looks good to me too (assuming you ran this through Oracle testing).

-------------

Marked as reviewed by thartmann (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/25593#pullrequestreview-2892254436

From epeter at openjdk.org Tue Jun 3 13:05:51 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 3 Jun 2025 13:05:51 GMT
Subject: RFR: 8357554: Enable vectorization of Bool -> CMove with different type size (on riscv)
In-Reply-To:
References:
Message-ID:

On Tue, 3 Jun 2025 11:23:47 GMT, Hamlin Li wrote:

> Seems currently the unsigned comparison is not supported for superword vectorization?

I think that currently only the `float` and `double` cases of CMove are really implemented. Integer types are still to be added, see [JDK-8308841](https://bugs.openjdk.org/browse/JDK-8308841) C2 SuperWord: implement vectorization of integer CMove

I hope we get to it soon, and then we can generally extend the combinations too. Like comparing `int`, but blending between `double`.

Maybe it would be better if for now you focus just on the `D/F` cases that are already supported on x86 and aarch64?
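To make the two kinds of combination concrete, here is a minimal Java sketch of the source shapes being discussed. The class and method names are hypothetical, and nothing here asserts what C2 actually emits: whether a CMove is generated or vectorized depends on the platform, flags, and profile. It only contrasts a same-size case (`float` compare selecting `float`) with a mixed-size case (`int` compare blending `double`).

```java
// Hypothetical illustration only; names are not taken from any OpenJDK test.
final class CMoveShapes {
    // Same-size case: 4-byte float comparison selecting 4-byte float values.
    public static void selectFloat(float[] a, float[] b, float[] c) {
        for (int i = 0; i < c.length; i++) {
            c[i] = (a[i] > b[i]) ? 1.0f : 2.0f;
        }
    }

    // Mixed-size case: 4-byte int comparison blending 8-byte double values.
    public static void selectDoubleFromInt(int[] a, int[] b, double[] c) {
        for (int i = 0; i < c.length; i++) {
            c[i] = (a[i] > b[i]) ? 1.0 : 2.0;
        }
    }

    public static void main(String[] args) {
        float[] cf = new float[2];
        selectFloat(new float[] {3f, 0f}, new float[] {1f, 5f}, cf);
        double[] cd = new double[2];
        selectDoubleFromInt(new int[] {3, 0}, new int[] {1, 5}, cd);
        System.out.println(cf[0] + " " + cf[1] + " " + cd[0] + " " + cd[1]);
    }
}
```

Both loops compute the same selection; the only difference is the element size on the compare side versus the select side, which is the property the type size check in SuperWord cares about.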
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25336#discussion_r2123751783 From epeter at openjdk.org Tue Jun 3 13:07:53 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 3 Jun 2025 13:07:53 GMT Subject: RFR: 8358333: Use VEX2 prefix in Assembler::psllq In-Reply-To: <6_9L_DiGyVYdZqzwGTLMKyTAUURjRwpwHvQYEgBMZVo=.3e7ac003-9721-43e0-b0cf-4ed89d67d431@github.com> References: <6_9L_DiGyVYdZqzwGTLMKyTAUURjRwpwHvQYEgBMZVo=.3e7ac003-9721-43e0-b0cf-4ed89d67d431@github.com> Message-ID: <-d9PNtM_vl1zXmkLMmxLjkp0MexZqaNjlXNmmyG7uTA=.0e5e57a4-101b-4628-86f5-332beb095e88@github.com> On Mon, 2 Jun 2025 16:02:30 GMT, Yudi Zheng wrote: >> While porting the commit https://github.com/openjdk/jdk/commit/0df8c9684b8782ef830e2bd425217864c3f51784 to Graal, I noticed that the Assembler::psllq instruction is using the VEX3 prefix. This results in the instruction being unrecognizable by my outdated version of hsdis. Currently, HotSpot generates the following bytes for vpsllq xmm7, xmm7, 0x34 >> https://github.com/openjdk/jdk/blob/0df8c9684b8782ef830e2bd425217864c3f51784/src/hotspot/cpu/x86/stubGenerator_x86_64_cbrt.cpp#L255 >> >> >> c4 e1 c1 73 f7 34 >> >> >> By setting the rex_w to WIG, the emitted bytes are: >> >> >> c5 c1 73 f7 34 > > @jatin-bhateja could you please review this trivial PR? Thanks! @mur47x111 Testing was only run for tier1-3. It would be better if there was additional stress testing as well. 
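The byte difference quoted above follows from the VEX prefix layout. The sketch below is a hand-written illustration in Java, not HotSpot's `Assembler` code, and its class and method names are hypothetical. It assumes the standard layout: a 3-byte prefix `C4` carries inverted R/X/B plus a 5-bit opcode map in its second byte, and W plus inverted vvvv, L, and pp in its third; the 2-byte prefix `C5` carries only inverted R, inverted vvvv, L, and pp. The short form is therefore only possible when X and B are unused, the map is 0F, and W is clear or ignored (WIG).

```java
// Hypothetical helper; illustrates the encoding rule, not HotSpot's implementation.
final class VexPrefixSketch {
    // Returns the 2-byte-VEX form of insn when encodable that way, else insn unchanged.
    // wIgnored: the caller asserts that this instruction ignores VEX.W (WIG).
    public static byte[] shortenVex(byte[] insn, boolean wIgnored) {
        if (insn.length < 3 || (insn[0] & 0xFF) != 0xC4) {
            return insn;
        }
        int b1 = insn[1] & 0xFF;            // ~R ~X ~B mmmmm
        int b2 = insn[2] & 0xFF;            // W ~vvvv L pp
        boolean xInv = (b1 & 0x40) != 0;    // inverted X bit set means X is unused
        boolean bInv = (b1 & 0x20) != 0;    // inverted B bit set means B is unused
        int map = b1 & 0x1F;                // 1 selects the 0F opcode map
        boolean w = (b2 & 0x80) != 0;
        if (!xInv || !bInv || map != 1 || (w && !wIgnored)) {
            return insn;
        }
        byte[] out = new byte[insn.length - 1];
        out[0] = (byte) 0xC5;
        out[1] = (byte) ((b1 & 0x80) | (b2 & 0x7F)); // ~R followed by ~vvvv L pp
        System.arraycopy(insn, 3, out, 2, insn.length - 3);
        return out;
    }

    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] vex3 = {(byte) 0xC4, (byte) 0xE1, (byte) 0xC1, 0x73, (byte) 0xF7, 0x34};
        System.out.println(hex(shortenVex(vex3, true))); // prints "c5 c1 73 f7 34"
    }
}
```

With `wIgnored` set to false the input is returned unchanged, because the third byte `c1` has W set; that matches the observation in the thread that the 3-byte form was being emitted until the instruction was marked WIG.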
------------- PR Comment: https://git.openjdk.org/jdk/pull/25593#issuecomment-2935143384 From chagedorn at openjdk.org Tue Jun 3 13:19:52 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 3 Jun 2025 13:19:52 GMT Subject: RFR: 8351635: C2 ROR/ROL: assert failed: Long constant expected [v2] In-Reply-To: <5k2J6AUT-a3B006J_ksxccQVxprZa21uqUbKTGkkby0=.5dfc4f2b-ad7a-4393-bf5e-efc246582c83@github.com> References: <5k2J6AUT-a3B006J_ksxccQVxprZa21uqUbKTGkkby0=.5dfc4f2b-ad7a-4393-bf5e-efc246582c83@github.com> Message-ID: On Tue, 3 Jun 2025 08:35:20 GMT, Jatin Bhateja wrote: >> This bug fix patch relaxes the strict assertion check to allow other pattern matches for degenerated long vector ROL/ROR operations with non-constant scalar shift values. >> >> Kindly review and share feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/compiler/vectorapi/TestVectorRotateScalarCount.java > > Co-authored-by: Tobias Hartmann Looks good to me, too. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25493#pullrequestreview-2892322888 From epeter at openjdk.org Tue Jun 3 13:35:08 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 3 Jun 2025 13:35:08 GMT Subject: RFR: 8350896: Integer/Long.compress gets wrong type from CompressBitsNode::Value [v9] In-Reply-To: References: Message-ID: <2YFuLETRIRASPPjocbdhIGklH-45xnIVuY6cYrAdIzU=.84c661ff-faf8-49e8-9c05-056bb9a0fcab@github.com> On Mon, 2 Jun 2025 17:58:09 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This bugfix patch fixes incorrect value computation for Integer/Long. compress APIs. >> >> Problems occur with a constant input and variable mask where the input's value is equal to the lower bound of the mask value., In this case, an erroneous value range estimation results in a constant value. 
Existing value routine first attempts to constant fold the compression operation if both input and compression mask are constant values; otherwise, it attempts to constrain the value range of result based on the upper and lower bounds of mask type. >> >> New IR test covers the issue reported in the bug report along with a case for value range based logic pruning. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Fix aarch64 failure A few more comments about the first part. Will now dig into the case where `!mask_type->is_con()` next... src/hotspot/share/opto/intrinsicnode.cpp line 241: > 239: jlong lo = bt == T_INT ? min_jint : min_jlong; > 240: > 241: if(mask_type->is_con() && mask_type->get_con_as_long(bt) != -1L) { Now you removed the condition `mask_type->get_con_as_long(bt) != -1L`. Do you know why it was there in the first place? It seems to me that if `mask_type->get_con_as_long(bt) == -1L`, then we can just return the type of `src`, right? src/hotspot/share/opto/intrinsicnode.cpp line 292: > 290: // To compute minimum result value we assume all but last read source bit as zero, > 291: // this is because sign bit of result will always be set to 1 while other bit > 292: // corresponding to set mask bit should be zero. I don't understand, are you talking about `lo` if `mask < 0`? Don't we just keep `lo = type_min`, which is always ok? 
------------- PR Review: https://git.openjdk.org/jdk/pull/23947#pullrequestreview-2892367256 PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r2123819264 PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r2123813921 From epeter at openjdk.org Tue Jun 3 13:35:10 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 3 Jun 2025 13:35:10 GMT Subject: RFR: 8350896: Integer/Long.compress gets wrong type from CompressBitsNode::Value [v9] In-Reply-To: <2YFuLETRIRASPPjocbdhIGklH-45xnIVuY6cYrAdIzU=.84c661ff-faf8-49e8-9c05-056bb9a0fcab@github.com> References: <2YFuLETRIRASPPjocbdhIGklH-45xnIVuY6cYrAdIzU=.84c661ff-faf8-49e8-9c05-056bb9a0fcab@github.com> Message-ID: <-qsTG7NyclV8PbQ1CsbHobu0bCwIK-6JvsMhmzmpVtg=.51d21506-9da9-4bf2-93d8-6907a6b54c5b@github.com> On Tue, 3 Jun 2025 13:28:36 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix aarch64 failure > > src/hotspot/share/opto/intrinsicnode.cpp line 241: > >> 239: jlong lo = bt == T_INT ? min_jint : min_jlong; >> 240: >> 241: if(mask_type->is_con() && mask_type->get_con_as_long(bt) != -1L) { > > Now you removed the condition `mask_type->get_con_as_long(bt) != -1L`. Do you know why it was there in the first place? > > It seems to me that if `mask_type->get_con_as_long(bt) == -1L`, then we can just return the type of `src`, right? This is a bug-fix for `CompressBitsNode::Value`, but this change also has an effect on `ExpandBitsNode::Value`, and that makes me a little nervous. For example: do we have enough test coverage for `expand`? It seems we did not have enough tests for `compress`, so probably also not for `expand`... 
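For readers following the `mask == -1` question, a small software model of the two operations may help. This is a hand-written reference in Java with hypothetical names, written from the documented bit semantics of `Integer.compress` and `Integer.expand` as I understand them; it is not the intrinsic and not the code in `intrinsicnode.cpp`. It shows that with an all-ones mask both operations return `src` unchanged, which is why returning the type of `src` looks plausible in that case.

```java
// Hypothetical reference model; not the JDK implementation.
final class BitCompressModel {
    // Gathers the bits of src selected by mask into the low-order bits of the result.
    public static int compress(int src, int mask) {
        int result = 0;
        int outBit = 0;
        for (int i = 0; i < 32; i++) {
            if (((mask >>> i) & 1) != 0) {
                result |= ((src >>> i) & 1) << outBit;
                outBit++;
            }
        }
        return result;
    }

    // Scatters the low-order bits of src to the bit positions selected by mask.
    public static int expand(int src, int mask) {
        int result = 0;
        int inBit = 0;
        for (int i = 0; i < 32; i++) {
            if (((mask >>> i) & 1) != 0) {
                result |= ((src >>> inBit) & 1) << i;
                inBit++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int src = 0x12345678;
        System.out.println(compress(src, -1) == src);              // all-ones mask keeps every bit
        System.out.println(expand(src, -1) == src);                // same for expand
        System.out.println(Integer.toHexString(compress(0xB6, 0xF0))); // bits 4..7 of 0xB6
        System.out.println(Integer.toHexString(expand(0xB, 0xF0)));    // low nibble moved to bits 4..7
    }
}
```

A model like this can also serve as the oracle side of a randomized test for both `compress` and `expand`, since it does not depend on the intrinsic under test.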
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r2123824471

From mli at openjdk.org Tue Jun 3 13:48:57 2025
From: mli at openjdk.org (Hamlin Li)
Date: Tue, 3 Jun 2025 13:48:57 GMT
Subject: RFR: 8357554: Enable vectorization of Bool -> CMove with different type size (on riscv)
In-Reply-To:
References:
Message-ID:

On Tue, 3 Jun 2025 13:03:39 GMT, Emanuel Peter wrote:

>> There is some issue when the comparison is an unsigned one, e.g. `c[i] = Long.compareUnsigned(a[i], b[i]) > 0 ? 1.0 : 2.0;`, or `c[i] = (a[i] > b[i]) ? 1.0 : 2.0;` when a[]/b[] are char[].
>>
>> Seems currently the unsigned comparison is not supported for superword vectorization? The unsigned information is lost, i.e. all the comparisons are just signed ones.
>> I checked the generated code, and it seems that when VectorMaskCmp is matched, `BoolTest::unsigned_compare & cond` is always 0 in the passed in `cond` parameter.
>> (Vector API supports unsigned ones, as it passes in `cond` with `BoolTest::unsigned_compare` mask explicitly when the operator is in UGE/UGT/ULE/ULT.)
>
>> Seems currently the unsigned comparison is not supported for superword vectorization?
>
> I think that currently only the `float` and `double` cases of CMove are really implemented. Integer types are still to be added, see [JDK-8308841](https://bugs.openjdk.org/browse/JDK-8308841)
> C2 SuperWord: implement vectorization of integer CMove
> I hope we get to it soon, and then we can generally extend the combinations too. Like comparing `int`, but blending between `double`.
>
> Maybe it would be better if for now you focus just on the `D/F` cases that are already supported on x86 and aarch64?

Thanks for the information!
I'll hold off these PRs until integer CMove vectorization is fully supported.

At first, I also just planned to implement CMoveF/D on riscv and let it be vectorized automatically by the current C2 implementation.
However, I found performance regressions for some type combinations (see `table 1` below), for two reasons. First, for some type combinations CMoveF/D cannot be vectorized because of the type size check in `SuperWord::is_velt_basic_type_compatible_use_def`. Second, the scalar implementation of CMoveF/D on riscv makes the generated code explode after loop unrolling (because of the complicated implementation on riscv). Together, these two reasons lead to the performance regression in some cases.

table 1

Can be vectorized? | CMoveF | CMoveD
-- | -- | --
CmpI | V | X
CmpU | V | X
CmpL | X | V
CmpUL | X | V
CmpF | V | X
CmpD | X | V
CmpN | V | X
CmpP | X | V

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25336#discussion_r2123879603

From epeter at openjdk.org Tue Jun 3 13:52:56 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 3 Jun 2025 13:52:56 GMT
Subject: RFR: 8357554: Enable vectorization of Bool -> CMove with different type size (on riscv)
In-Reply-To:
References:
Message-ID: <2eJGxfqvohND-ilZR_F1g-Bu4IBfKOHR3myHQVPNFcU=.ce4110a0-4946-4a8a-9eda-d8e54bc69bb6@github.com>

On Tue, 3 Jun 2025 13:46:24 GMT, Hamlin Li wrote:

>>> Seems currently the unsigned comparison is not supported for superword vectorization?
>>
>> I think that currently only the `float` and `double` cases of CMove are really implemented. Integer types are still to be added, see [JDK-8308841](https://bugs.openjdk.org/browse/JDK-8308841)
>> C2 SuperWord: implement vectorization of integer CMove
>> I hope we get to it soon, and then we can generally extend the combinations too. Like comparing `int`, but blending between `double`.
>>
>> Maybe it would be better if for now you focus just on the `D/F` cases that are already supported on x86 and aarch64?
>
> Thanks for the information!
> I'll hold off these PRs until integer CMove vectorization is fully supported.
> > At first, I also just planned to implement the CMoveF/D on riscv and let it automatically vectorized based on current C2 implementation. > But, I found some performance regression in the cases of some type combination (please check the `table 1` below), the reason is that for some type combination cmoveF/D can not be vectorized, because of the type size check in `SuperWord::is_velt_basic_type_compatible_use_def`, on the other hand scalar implementation of CMoveF/D on riscv explode the generated code after loop unroll (because of the complicated implmentation on riscv). These 2 reasons will lead to the performance regression in some cases. > > table 1 > > Can be vectorized? | CMoveF | CMoveD > -- | -- | -- > CmpI | V | X > CmpU | V | X > CmpL | X | V > CmpUL | X | V > CmpF | V | X > CmpD | X | V > CmpN | V | X > CmpP | X | V > > Yes, getting this all right and with optimal performance is tricky... @jaskarth is working on https://github.com/openjdk/jdk/pull/23413, which will make changes to `SuperWord::is_velt_basic_type_compatible_use_def` ... so we also will have to see how this plays together... ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25336#discussion_r2123889184 From mli at openjdk.org Tue Jun 3 13:58:52 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 3 Jun 2025 13:58:52 GMT Subject: RFR: 8357554: Enable vectorization of Bool -> CMove with different type size (on riscv) In-Reply-To: <2eJGxfqvohND-ilZR_F1g-Bu4IBfKOHR3myHQVPNFcU=.ce4110a0-4946-4a8a-9eda-d8e54bc69bb6@github.com> References: <2eJGxfqvohND-ilZR_F1g-Bu4IBfKOHR3myHQVPNFcU=.ce4110a0-4946-4a8a-9eda-d8e54bc69bb6@github.com> Message-ID: On Tue, 3 Jun 2025 13:50:07 GMT, Emanuel Peter wrote: >> Thanks for the information! >> I'll hold off these prs until integer CMove vectorization is fully supported. >> >> At first, I also just planned to implement the CMoveF/D on riscv and let it automatically vectorized based on current C2 implementation. 
>> But, I found some performance regression in the cases of some type combination (please check the `table 1` below), the reason is that for some type combination cmoveF/D can not be vectorized, because of the type size check in `SuperWord::is_velt_basic_type_compatible_use_def`, on the other hand scalar implementation of CMoveF/D on riscv explode the generated code after loop unroll (because of the complicated implmentation on riscv). These 2 reasons will lead to the performance regression in some cases. >> >> table 1 >> >> Can be vectorized? | CMoveF | CMoveD >> -- | -- | -- >> CmpI | V | X >> CmpU | V | X >> CmpL | X | V >> CmpUL | X | V >> CmpF | V | X >> CmpD | X | V >> CmpN | V | X >> CmpP | X | V >> >> > > Yes, getting this all right and with optimal performance is tricky... @jaskarth is working on https://github.com/openjdk/jdk/pull/23413, which will make changes to `SuperWord::is_velt_basic_type_compatible_use_def` ... so we also will have to see how this plays together... Yes, it is. Thank you for discussion! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25336#discussion_r2123908890 From asmehra at openjdk.org Tue Jun 3 14:13:52 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Tue, 3 Jun 2025 14:13:52 GMT Subject: RFR: 8358330: AsmRemarks and DbgStrings clear() method may not get called before their destructor In-Reply-To: References: Message-ID: On Tue, 3 Jun 2025 01:29:04 GMT, Vladimir Kozlov wrote: > Can you consider populate CodeBlob::_asm_remarks and _dbg_strings after calling CodeBlob::create()? Then you don't need temporary AsmRemarks and DbgStrings. That would work as well. CodeBlob::_asm_remarks::_remarks need to be allocated memory explicitly in this case. I will add AsmRemarks::init() to allocate memory for and initialize AsmRemarks::_remarks. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25598#issuecomment-2935464214 From shade at openjdk.org Tue Jun 3 14:15:01 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 3 Jun 2025 14:15:01 GMT Subject: RFR: 8357434: x86: Simplify Interpreter::profile_taken_branch [v3] In-Reply-To: References: Message-ID: On Tue, 3 Jun 2025 09:38:09 GMT, Aleksey Shipilev wrote: >> Noticed that `Interpreter::profile_taken_branch` has the same `sbbptr` pattern we have eliminated with [JDK-8356946](https://bugs.openjdk.org/browse/JDK-8356946). The same logic applies here: the counter is 64-bit, never practically overflows, and no other code cares about it. >> >> Also tidied up some comments around it. >> >> Additional testing; >> - [x] Linux x86_64 server fastdebug, `tier1 tier2` >> - [x] Linux x86_64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'master' into JDK-8357434-x86-profile-taken > - Stale comment > - Merge branch 'master' into JDK-8357434-x86-profile-taken > - Fix Ran Linux x86_64 server fastdebug, `make test TEST=all`, no new failures. I think this is ready for integration. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25343#issuecomment-2935466193 From epeter at openjdk.org Tue Jun 3 14:21:02 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 3 Jun 2025 14:21:02 GMT Subject: RFR: 8350896: Integer/Long.compress gets wrong type from CompressBitsNode::Value [v9] In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 17:58:09 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This bugfix patch fixes incorrect value computation for Integer/Long. compress APIs. 
>> >> Problems occur with a constant input and variable mask where the input's value is equal to the lower bound of the mask value., In this case, an erroneous value range estimation results in a constant value. Existing value routine first attempts to constant fold the compression operation if both input and compression mask are constant values; otherwise, it attempts to constrain the value range of result based on the upper and lower bounds of mask type. >> >> New IR test covers the issue reported in the bug report along with a case for value range based logic pruning. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Fix aarch64 failure src/hotspot/share/opto/intrinsicnode.cpp line 314: > 312: // mask value for which iff all corresponding input bits are set then bit compression > 313: // will result in a -ve value, therefore this case negates the possibility of > 314: // strictly non-negative bit compression result. Grammar was a little off. I think we can say it in fewer words. Suggestion: // Case B.1 The mask value range includes -1, hence we may use all bits, // the result has the whole value range. src/hotspot/share/opto/intrinsicnode.cpp line 328: > 326: // optimistic upper bound of result i.e. all the bits other than leading zero bits > 327: // can be assumed holding 1 value. > 328: assert(mask_type->lo_as_long() >= 0, ""); Suggestion: assert(mask_type->lo_as_long() >= 0, ""); // Case B.3 Mask value range only includes non-negative values. Since all integral // types honours an invariant that TypeInteger._lo <= TypeInteger._hi, thus computing // leading zero bits of upper bound of mask value will allow us to ascertain // optimistic upper bound of result i.e. all the bits other than leading zero bits // can be assumed holding 1 value. Have the assert first, like a condition. 
Then the comments follow from it and make sense immediately. src/hotspot/share/opto/intrinsicnode.cpp line 339: > 337: // compression result will never be a -ve value and we can safely set the > 338: // lower bound of bit compression to zero. > 339: lo = result_bit_width == mask_bit_width ? lo : 0L; Fixed grammar a little. Suggestion: // If the number of bits required to for the mask value range is less than the // full bit width of the integral type, then the MSB bit is guaranteed to be zero, // thus the compression result will never be a -ve value and we can safely set the // lower bound of the bit compression to zero. lo = result_bit_width == mask_bit_width ? lo : 0L; src/hotspot/share/opto/intrinsicnode.cpp line 369: > 367: hi = src_type->hi_as_long() >= 0 ? src_type->hi_as_long() : hi; > 368: // Tightening upper bound of bit compression as per Rule 3. > 369: hi = result_bit_width < mask_bit_width ? MIN2((jlong)((1UL << result_bit_width) - 1L), hi) : hi; // As per Rule 1, bit compression packs the source bits corresponding to // set mask bits This says something, but does not really explain the rest of the sentence: // set mask bits, hence for a non-negative input, result of compression will // always be less that equal to input. Plus: input could be both `mask` or `src`. I would be specific, and just talk about `src`. Also: I don't really see how the conclusion in this sentence follows from its assumption. How exactly does some bits not participating really ensure that the value is not greater? // set. If a mask bit corresponding to set input bit is zero then that input bit will // not take part in bit compression, which means that maximum possible result value // can never be greater than non-negative input. I think I know what you are trying to say, it just sounds a little vague. It also smells like this `Lemma 1` might be easier proved by a proof of contradiction. I need to take a break now, but I'll see if I can come up with something a bit clearer later. 
Here my suggestion: Suggestion: if (src_type->hi_as_long() >= 0) { // Lemma 1: For strictly non-negative src, the result of the compression will never be // greater than src. // Proof: Since src is a non-negative value, its most significant bit is always 0. // Thus even if the corresponding MSB of the mask is one, the result will be a +ve // value. hi = src_type->hi_as_long(); } if (result_bit_width < mask_bit_width) { // Rule 3: // We can further constrain the upper bound of bit compression if the number of bits // which can be set to 1 is less than the maximum number of bits of integral type. hi = MIN2((jlong)((1UL << result_bit_width) - 1L), hi); } test/hotspot/jtreg/compiler/c2/gvn/TestBitCompressValueTransform.java line 307: > 305: } > 306: Asserts.assertEQ(0, res); > 307: } Like I mentioned in the email. I contributed this test, it would be nice if you could give me credits by making me a contributor to this issue. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r2123857262 PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r2123864703 PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r2123881395 PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r2123983125 PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r2123992004 From epeter at openjdk.org Tue Jun 3 14:27:01 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 3 Jun 2025 14:27:01 GMT Subject: RFR: 8350896: Integer/Long.compress gets wrong type from CompressBitsNode::Value [v8] In-Reply-To: References: Message-ID: On Fri, 30 May 2025 17:43:27 GMT, Jatin Bhateja wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments resolutions > > We can further constrain the value range bounds of bit compression and expansion once PR #17508 gets integrated. 
For now, I have developed the following draft demonstrates bound constraining with KnownBitLattice. > > > // > // Prototype of bit compress/expand value range computation > // using KnownBits infrastructure. > // > > #include > #include > #include > #include > > template > class KnownBitsLattice { > private: > U zeros; > U ones; > > public: > KnownBitsLattice(U lb, U ub); > > U getKnownZeros() { > return zeros; > } > > U getKnownOnes() { > return ones; > } > > long getKnownZerosCount() { > uint64_t count = 0; > asm volatile ("popcntq %1, %0 \n\t" : "=r"(count) : "r"(zeros) : "cc"); > return count; > } > > long getKnownOnesCount() { > uint64_t count = 0; > asm volatile ("popcntq %1, %0 \n\t" : "=r"(count) : "r"(ones) : "cc"); > return count; > } > > bool check_voilation() { > // A given bit cannot be both zero or one. > return (zeros & ones) != 0; > } > > bool is_MSB_KnownOneBitsSet() { > return (ones >> 63) == 1; > } > > bool is_MSB_KnownZeroBitsSet() { > return (zeros >> 63) == 1; > } > }; > > template > KnownBitsLattice::KnownBitsLattice(U lb, U ub) { > // To find KnownBitsLattice from a given value range > // we first find the common prefix b/w upper and lower > // bound, we then concertize known zeros and ones bit > // based on common prefix. > // e.g. > // lb = 00110001 > // ub = 00111111 > // common prefix = 0011XXXX > // knownbits.zeros = 11000000 > // knownbits.ones = 00110000 > // > // conversely, for a give knownbits value we can find > // lower and upper value ranges. > // e.g. > // knownbits.zeros = 0x00010001 > // knownbits.ones = 0x10001100 > // range.lo = knownbits.ones, this is because knownbits.ones are > // guaranteed to be one. > // range.hi = ~knownbits.zeros, this is an optimistic upper bound > // which assumes all unset knownbits.zero > // are ones. 
> // Thus in above example,
> // range.lo = 0x8C
> // range.hi = 0xEE
>
> U lzcnt = 0;
> U common_prefix = lb ^ ub;
> asm volatile ("lzcntq %1, %0 \n\t" : "=r"(lzcnt) : "r"(common_prefix) : "cc");
> U common_prefix_mask = lzcnt == 0 ? 0xFFFFFFFFFFFFFFFFL : ~((1ULL << (64 - lzcnt)) - 1);
> zeros = (~lb) & common_prefix_mask;
> ones = (lb) & c...

@jatin-bhateja I think we are making progress. It seems to me now that the VM code is correct, at least as far as I can tell with visual inspection.

We are still missing some additional tests, as I have asked for a few times already:
https://github.com/openjdk/jdk/pull/23947#issuecomment-2853896251

We should do something like this:

    public static int[] test(int mask, int src) {
        // Clamp both inputs so that the compiler knows their value ranges.
        mask = Math.max(CON1, Math.min(CON2, mask));
        src = Math.max(CON3, Math.min(CON4, src));
        int result = Integer.compress(src, mask);
        // Each check may constant fold if the computed range of result allows it.
        int sum = 0;
        if (result > LIMIT_1) { sum += 1; }
        if (result > LIMIT_2) { sum += 2; }
        if (result > LIMIT_3) { sum += 4; }
        if (result > LIMIT_4) { sum += 8; }
        if (result > LIMIT_5) { sum += 16; }
        if (result > LIMIT_6) { sum += 32; }
        if (result > LIMIT_7) { sum += 64; }
        if (result > LIMIT_8) { sum += 128; }
        return new int[] {sum, result};
    }

You could do the same pattern for `expand` too. Then you pick random values using `Generators.java` for all the `CON` and `LIMIT`.

If we somehow produce a bad range, then the limit checks could constant fold wrongly, and then the `sum` would reflect this wrong result.

Optimal would be to duplicate this pattern, and have one method that compiles, and one that runs in the interpreter. That way, you can repeatedly call the methods with various `src` and `mask` values, and compare the output.
------------- PR Comment: https://git.openjdk.org/jdk/pull/23947#issuecomment-2935548411 From mdoerr at openjdk.org Tue Jun 3 14:34:06 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 3 Jun 2025 14:34:06 GMT Subject: RFR: 8354636: [PPC64] Clean up comments regarding frame manager Message-ID: <28IlBh9k0o4RZMbIstYTCl8c0rfIIqVqyPXeXFyx1Ik=.1d4919d2-2437-4c81-8d30-75128b0a0afb@github.com> Trivial comment cleanup: Replace "frame manager" by "template interpreter". ------------- Commit messages: - 8354636: [PPC64] Clean up comments regarding frame manager Changes: https://git.openjdk.org/jdk/pull/25616/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25616&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8354636 Stats: 13 lines in 3 files changed: 0 ins; 2 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/25616.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25616/head:pull/25616 PR: https://git.openjdk.org/jdk/pull/25616 From epeter at openjdk.org Tue Jun 3 14:36:00 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 3 Jun 2025 14:36:00 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v77] In-Reply-To: References: Message-ID: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). 
We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. > > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: add some hashes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/fa3d086a..6ef71270 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=76 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=75-76 Stats: 12 lines in 2 files changed: 0 ins; 0 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From chagedorn at openjdk.org Tue Jun 3 14:36:06 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 3 Jun 2025 14:36:06 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v76] In-Reply-To: References: Message-ID: On Tue, 3 Jun 2025 11:43:48 GMT, Emanuel Peter wrote: >> **Goal** >> We want to generate Java source code: >> - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. >> - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). >> >> Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). >> >> **How to get started** >> When reviewing, please start by looking at: >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 >> >> We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. 
>> >> Second, look at this advanced test: >> https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 >> >> And then for a "tutorial", look at: >> `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` >> >> It shows these features: >> - The `body` of a Template is essentially a list of `Token`s that are concatenated. >> - Templates can be nested: a `TemplateWithArgs` is also a `Token`. >> - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. >> - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. >> - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. >> - The use of recursive templates, and `fuel` to limit the recursion. >> - `Name`s: useful to register field and variable names in code scopes. >> >> Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 >> >> For a better experience, you may want... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > rename View -> FilteredSet Thanks for the update! 
Almost there, some last comments and then we're good to go :-) test/hotspot/jtreg/compiler/lib/template_framework/CodeFrame.java line 95: > 93: * where we would possibly want to make field or variable definitions during the insertion > 94: * that are not just local to the insertion but affect the {@link CodeFrame} that we > 95: * {@link Hook#anchor} earlier and are now {@link Hook#insert}ing into. It complains that `addName` cannot be found. Suggestion to use `{@link Template#addDataName}/ * {@link Template#addStructuralName}` instead: Suggestion: * Creates a special frame, which has a {@link #parent} but uses the {@link NameSet} * from the parent frame, allowing {@link Template#addDataName}/ * {@link Template#addStructuralName} to persist in the outer frame when the current frame * is exited. This is necessary for {@link Hook#insert}, where we would possibly want to * make field or variable definitions during the insertion that are not just local to the * insertion but affect the {@link CodeFrame} that we {@link Hook#anchor} earlier and are * now {@link Hook#insert}ing into. test/hotspot/jtreg/compiler/lib/template_framework/CodeFrame.java line 114: > 112: throw new RuntimeException("Internal error: Duplicate Hook in CodeFrame: " + hook.name()); > 113: } > 114: hookCodeLists.put(hook, new Code.CodeList(new ArrayList())); Suggestion: hookCodeLists.put(hook, new Code.CodeList(new ArrayList<>())); test/hotspot/jtreg/compiler/lib/template_framework/Hook.java line 26: > 24: package compiler.lib.template_framework; > 25: > 26: import java.util.List; Unused: Suggestion: test/hotspot/jtreg/compiler/lib/template_framework/Name.java line 29: > 27: import java.util.Map; > 28: import java.util.ArrayList; > 29: import java.util.List; Unused: Suggestion: test/hotspot/jtreg/compiler/lib/template_framework/NameSet.java line 89: > 87: if (w < 0) { > 88: throw new RuntimeException("Negative weight not allowed: " + w); > 89: } I thought zero is also not allowed? 
test/hotspot/jtreg/compiler/lib/template_framework/Renderer.java line 32: > 30: import java.util.regex.Matcher; > 31: import java.util.regex.Pattern; > 32: import java.util.stream.Stream; Some are unused: Suggestion: import java.util.List; import java.util.regex.MatchResult; import java.util.regex.Matcher; import java.util.regex.Pattern; test/hotspot/jtreg/compiler/lib/template_framework/Renderer.java line 358: > 356: // If the character was not found, we want to have the rest of the > 357: // String s, so instead of "-1" take the end/length of the String. > 358: dollar = (dollar == -1) ? s.length() : dollar; `s.length()` could be called once before the loop and then reused inside the loop. test/hotspot/jtreg/compiler/lib/template_framework/Renderer.java line 384: > 382: > 383: /** > 384: * We are parsing a part now. Befor the part, there was either a "#" or "$": Suggestion: * We are parsing a part now. Before the part, there was either a "#" or "$": ------------- PR Review: https://git.openjdk.org/jdk/pull/24217#pullrequestreview-2892355999 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2123827132 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2123829199 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2123830659 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2123806572 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2123937058 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2123820068 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2124044533 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2124038113 From chagedorn at openjdk.org Tue Jun 3 14:36:07 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 3 Jun 2025 14:36:07 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v71] In-Reply-To: References: Message-ID: 
<8ZmNt-_xwKXhDPfLL6U7EaOD4F0IwDn_2A4KB7DRze4=.182761fc-deb6-417b-948e-2bbf54bf3dab@github.com> On Tue, 3 Jun 2025 05:49:26 GMT, Emanuel Peter wrote: >> My IDE advises against matching on the raw type `List`. As an alternative you can match on `List`. > > Done, I must have been tired yesterday afternoon ? Thanks! No worries! :-) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2123839122 From amitkumar at openjdk.org Tue Jun 3 14:50:51 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 3 Jun 2025 14:50:51 GMT Subject: RFR: 8354636: [PPC64] Clean up comments regarding frame manager In-Reply-To: <28IlBh9k0o4RZMbIstYTCl8c0rfIIqVqyPXeXFyx1Ik=.1d4919d2-2437-4c81-8d30-75128b0a0afb@github.com> References: <28IlBh9k0o4RZMbIstYTCl8c0rfIIqVqyPXeXFyx1Ik=.1d4919d2-2437-4c81-8d30-75128b0a0afb@github.com> Message-ID: <8cSVDOqa_V2X0bF54lQRQTASvGyxuZSYiTf8kwTeb1k=.ad7a9b45-6836-4379-887a-60131e18d98a@github.com> On Tue, 3 Jun 2025 14:29:49 GMT, Martin Doerr wrote: > Trivial comment cleanup: Replace "frame manager" by "template interpreter". Looks good and trivial. ------------- Marked as reviewed by amitkumar (Committer). PR Review: https://git.openjdk.org/jdk/pull/25616#pullrequestreview-2892828163 From roland at openjdk.org Tue Jun 3 15:03:58 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 3 Jun 2025 15:03:58 GMT Subject: RFR: 8351889: C2 crash: assertion failed: Base pointers must match (addp 344) In-Reply-To: References: Message-ID: On Wed, 28 May 2025 08:34:27 GMT, Emanuel Peter wrote: > My (somewhat limited) experience with delaying optimizations is that this can be quite brittle. You need to get the condition just right, otherwise it just happens again in some generalized case again - maybe you check for 1 level, and later it happens with 2 or more layers. I don't disagree with that. > I'm half-understanding the example you present. 
Can you show the IR nodes for your last step: > > ``` > Store#195 -> AddP#516 -> AddP#544 -> CastPP#110 > -> CastPP#529 > ``` > > What exactly are the bases there? Your simplified drawings seem to show the flow of computation, but I cannot see what the bases are in it, right? You could enhance it, for example with `AddP#nnn(base:nnn)`. I think that would help me follow the example. In the example above, the `CastPP`s are the bases. So the simplified drawings mostly only show how the `AddP`s are chained and the bases. > Maybe some more full IR snippets could be helpful, maybe even IGV drawings. But that may be more work for you. I rarely use the IGV so, yeah, that would be more work. > I'm wondering if we could not have some other "cleanup" optimizations that fix up the bases. What are the assumptions about merging AddP's at a Phi? Is the base from before the Phi propagated to after the Phi? I'm missing some base understanding here to see through this ;) There is a cleanup already. It's `ConstraintCastNode::dominating_cast()`. It's run during igvn (but in the case of this failure igvn can't prove domination) and loop opts (but in the case of this failure, we are one pass of loop opts short of cleaning things up). So we would need an extra run of loop opts, which seems to be quite a bit of overhead for this sort of issue. That's why I went with the igvn delay fix even though it's fragile.
------------- PR Comment: https://git.openjdk.org/jdk/pull/25386#issuecomment-2935784074 From roland at openjdk.org Tue Jun 3 15:03:59 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 3 Jun 2025 15:03:59 GMT Subject: RFR: 8351889: C2 crash: assertion failed: Base pointers must match (addp 344) In-Reply-To: References: Message-ID: <2q9RH_3nobpsee8aZzoqkkKPgUkjJuPCoH-LDV4roEs=.56385866-5158-45d2-826d-5ff8448284e9@github.com> On Wed, 28 May 2025 08:23:52 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/cfgnode.cpp line 2107: >> >>> 2105: } >>> 2106: return false; >>> 2107: } >> >> You check for a single level here. Could the same happen over multiple levels? > > If an update should come from further up, but has not propagated down? Right, possibly. I'm not 100% sure. I could check all the `Cast`s along the chain at the `Phi` instead. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25386#discussion_r2124147201 From yzheng at openjdk.org Tue Jun 3 15:14:00 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Tue, 3 Jun 2025 15:14:00 GMT Subject: RFR: 8358333: Use VEX2 prefix in Assembler::psllq In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 15:53:17 GMT, Yudi Zheng wrote: > While porting the commit https://github.com/openjdk/jdk/commit/0df8c9684b8782ef830e2bd425217864c3f51784 to Graal, I noticed that the Assembler::psllq instruction is using the VEX3 prefix. This results in the instruction being unrecognizable by my outdated version of hsdis. Currently, HotSpot generates the following bytes for vpsllq xmm7, xmm7, 0x34 > https://github.com/openjdk/jdk/blob/0df8c9684b8782ef830e2bd425217864c3f51784/src/hotspot/cpu/x86/stubGenerator_x86_64_cbrt.cpp#L255 > > > c4 e1 c1 73 f7 34 > > > By setting the rex_w to WIG, the emitted bytes are: > > > c5 c1 73 f7 34 Thanks for the review! 
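Editorial note on the two byte sequences quoted in the 8358333 description above: the following is a minimal sketch, not HotSpot's `Assembler` code, of how a VEX prefix can be assembled and when the 2-byte `C5` form is available. The class and method names are hypothetical; the bit layout follows the Intel SDM description of the VEX prefix. The 2-byte form has no W bit, which is why an instruction emitted with W set has to use the 3-byte `C4` form.

```java
public class VexPrefixSketch {
    /**
     * Builds a VEX prefix (hypothetical helper, for illustration only).
     * r, x, b: register extension bits (stored inverted in the prefix)
     * map:     opcode map select (1 = 0F)
     * w:       VEX.W bit
     * vvvv:    extra register operand (stored inverted)
     * l256:    vector length bit (false = 128-bit)
     * pp:      implied SIMD prefix (1 = 66)
     */
    static byte[] vexPrefix(boolean r, boolean x, boolean b, int map,
                            boolean w, int vvvv, boolean l256, int pp) {
        int invR = r ? 0 : 0x80;
        // Low 7 bits shared by both forms: ~vvvv, L, pp.
        int tail = ((~vvvv & 0xF) << 3) | (l256 ? 0x04 : 0) | (pp & 0x3);
        // The 2-byte form cannot express X, B, W or any map other than 0F.
        if (!x && !b && !w && map == 1) {
            return new byte[] {(byte) 0xC5, (byte) (invR | tail)};
        }
        int byte1 = invR | (x ? 0 : 0x40) | (b ? 0 : 0x20) | (map & 0x1F);
        int byte2 = (w ? 0x80 : 0) | tail;
        return new byte[] {(byte) 0xC4, (byte) byte1, (byte) byte2};
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte v : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", v & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // vpsllq xmm7, xmm7, 0x34: vvvv = 7, map 0F, pp = 66, 128-bit.
        System.out.println(hex(vexPrefix(false, false, false, 1, true, 7, false, 1)));  // W set
        System.out.println(hex(vexPrefix(false, false, false, 1, false, 7, false, 1))); // W clear
    }
}
```

With W set the sketch yields the `c4 e1 c1` prefix from the description; with W clear it yields `c5 c1`. The remaining bytes `73 f7 34` (opcode, ModRM, immediate) are the same in both encodings.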
Passed tier1-3, hs-precheckin-comp, hs-comp-stress ------------- PR Comment: https://git.openjdk.org/jdk/pull/25593#issuecomment-2935865077 From yzheng at openjdk.org Tue Jun 3 15:14:01 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Tue, 3 Jun 2025 15:14:01 GMT Subject: Integrated: 8358333: Use VEX2 prefix in Assembler::psllq In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 15:53:17 GMT, Yudi Zheng wrote: > While porting the commit https://github.com/openjdk/jdk/commit/0df8c9684b8782ef830e2bd425217864c3f51784 to Graal, I noticed that the Assembler::psllq instruction is using the VEX3 prefix. This results in the instruction being unrecognizable by my outdated version of hsdis. Currently, HotSpot generates the following bytes for vpsllq xmm7, xmm7, 0x34 > https://github.com/openjdk/jdk/blob/0df8c9684b8782ef830e2bd425217864c3f51784/src/hotspot/cpu/x86/stubGenerator_x86_64_cbrt.cpp#L255 > > > c4 e1 c1 73 f7 34 > > > By setting the rex_w to WIG, the emitted bytes are: > > > c5 c1 73 f7 34 This pull request has now been integrated. Changeset: faf19abd Author: Yudi Zheng URL: https://git.openjdk.org/jdk/commit/faf19abd312ac461f9f74035fec61af7d834ffc1 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod 8358333: Use VEX2 prefix in Assembler::psllq Reviewed-by: jbhateja, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/25593 From roland at openjdk.org Tue Jun 3 15:21:52 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 3 Jun 2025 15:21:52 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v9] In-Reply-To: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> Message-ID: > An `Initialize` node for an `Allocate` node is created with a memory > `Proj` of adr type raw memory. 
In order for stores to be captured, the > memory state out of the allocation is a `MergeMem` with slices for the > various object fields/array elements set to the raw memory `Proj` of > the `Initialize` node. If `Phi`s need to be created during later > transformations from this memory state, the `Phi` for a particular > slice gets its adr type from the type of the `Proj` which is raw > memory. If during macro expansion, the `Allocate` is found to have no > use and so can be removed, the `Proj` out of the `Initialize` is > replaced by the memory state on input to the `Allocate`. A `Phi` for > some slice for a field of an object will end up with the raw memory > state on input to the `Allocate` node. As a result, memory state at > the `Phi` is incorrect and incorrect execution can happen. > > The fix I propose is, rather than have a single `Proj` for the memory > state out of the `Initialize` with adr type raw memory, to use one > `Proj` per slice added to the memory state after the `Initialize`. Each > of the `Proj` should return the right adr type for its slice. For that > I propose having a new type of `Proj`: `NarrowMemProj` that captures > the right adr type. > > Logic for the construction of the `Allocate`/`Initialize` subgraph is > tweaked so the right adr type captured in its own `NarrowMemProj` is > added to the memory subgraph. Code that removes an allocation or moves > it also has to be changed so it correctly takes the multiple memory > projections out of the `Initialize` node into account. > > One tricky issue is that when EA splits types for a scalar replaceable > `Allocate` node: > > 1- the adr type captured in the `NarrowMemProj` becomes out of sync > with the type of the slices for the allocation > > 2- before EA, the memory state for one particular field out of the > `Initialize` node can be used for a `Store` to the just allocated > object or some other.
So we can have a chain of `Store`s, some to > the newly allocated object, some to some other objects, all of them > using the state of `NarrowMemProj` out of the `Initialize`. After > split unique types, the `NarrowMemProj` is for the slice of a > particular allocation. So `Store`s to some other objects shouldn't > use that memory state but the memory state before the `Allocate`. > > For that, I added logic to update the adr type of `NarrowMemProj` > during split unique types and update the memory input of `Store`s that > don't depend on the memory state ... Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/opto/library_call.cpp Co-authored-by: Emanuel Peter ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24570/files - new: https://git.openjdk.org/jdk/pull/24570/files/43c6f822..c0a8ad21 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24570&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24570&range=07-08 Stats: 4 lines in 1 file changed: 3 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24570.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24570/head:pull/24570 PR: https://git.openjdk.org/jdk/pull/24570 From roland at openjdk.org Tue Jun 3 15:21:54 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 3 Jun 2025 15:21:54 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v8] In-Reply-To: <4ShW7VcaJrO0v0cHwUN1vccOH8tNPlJSIh_K0W2RdS0=.14954a26-c962-41a1-9088-2e1a1bc01eb4@github.com> References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> <1gdeBnZ7YuIf9CgQW2bCXkDDBWPjUgRnickHts-fvzE=.e6e901ba-3e9f-41a2-9c68-167a879e9655@github.com> <4ShW7VcaJrO0v0cHwUN1vccOH8tNPlJSIh_K0W2RdS0=.14954a26-c962-41a1-9088-2e1a1bc01eb4@github.com> Message-ID: On Tue, 27 May 2025 09:08:02 GMT, Emanuel 
Peter wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > test/hotspot/jtreg/compiler/macronodes/TestEliminationOfAllocationWithoutUse.java line 2: > >> 1: /* >> 2: * Copyright (c) 2024, Oracle and/or its affiliates. All rights reserved. > > Is the copyright year accurate? It's your test that I took over and updated so you tell me: do you want the copyright updated? > test/hotspot/jtreg/compiler/macronodes/TestInitializingStoreCapturing.java line 2: > >> 1: /* >> 2: * Copyright (c) 2024, Oracle and/or its affiliates. All rights reserved. > > Is the copyright year accurate? Same comment as above. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2124206114 PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2124207503 From epeter at openjdk.org Tue Jun 3 15:27:23 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 3 Jun 2025 15:27:23 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v76] In-Reply-To: References: Message-ID: <0YLlUg2GzBB8eo4d7V8-NZz_J7ZjGLqpOrNzxpiqVd0=.3cda28b0-5de0-4872-a9e4-4c10dba056a9@github.com> On Tue, 3 Jun 2025 13:30:47 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> rename View -> FilteredSet > > test/hotspot/jtreg/compiler/lib/template_framework/CodeFrame.java line 95: > >> 93: * where we would possibly want to make field or variable definitions during the insertion >> 94: * that are not just local to the insertion but affect the {@link CodeFrame} that we >> 95: * {@link Hook#anchor} earlier and are now {@link Hook#insert}ing into. > > It complains that `addName` cannot be found. 
Suggestion to use `{@link Template#addDataName}/ > * {@link Template#addStructuralName}` instead: > > Suggestion: > > * Creates a special frame, which has a {@link #parent} but uses the {@link NameSet} > * from the parent frame, allowing {@link Template#addDataName}/ > * {@link Template#addStructuralName} to persist in the outer frame when the current frame > * is exited. This is necessary for {@link Hook#insert}, where we would possibly want to > * make field or variable definitions during the insertion that are not just local to the > * insertion but affect the {@link CodeFrame} that we {@link Hook#anchor} earlier and are > * now {@link Hook#insert}ing into. Good catch! I got no complaints because `javadoc` does not look at private classes. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2124230818 From epeter at openjdk.org Tue Jun 3 15:36:03 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 3 Jun 2025 15:36:03 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v78] In-Reply-To: References: Message-ID: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String.
And then that can be compiled and executed with the CompileFramework. > > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Apply suggestions from code review Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/6ef71270..3efb9fc6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=77 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=76-77 Stats: 18 lines in 4 files changed: 1 ins; 10 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From epeter at openjdk.org Tue Jun 3 15:36:05 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 3 Jun 2025 15:36:05 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v76] In-Reply-To: References: Message-ID: On Tue, 3 Jun 2025 14:05:14 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> rename View -> FilteredSet > > test/hotspot/jtreg/compiler/lib/template_framework/NameSet.java line 89: > >> 87: if (w < 0) { >> 88: throw new RuntimeException("Negative weight not allowed: " + w); >> 89: } > > I thought zero is also not allowed? Well that should have been filtered out already earlier, when we added the individual names. Now we could have a total weight of zero if we have no names. Then we just return `null` here, and then throw an exception a little further up the use case chain, e.g. `DataName.sample` -> `throw new RendererException("No variable: " + mutability + msg1 + msg2 + ".");` This here is just a sanity check, hence I throw a `RuntimeException`, and not a `RendererException`. 
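Editorial note on the weight check discussed above: the reply describes three layers (per-name weights validated when names are added, a total of zero meaning "no names" and yielding `null`, and a negative total treated as an internal sanity failure). A minimal sketch of that layering might look as follows. This is not the actual `NameSet` code; `Entry`, `sample`, and the exception messages other than the quoted one are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class WeightedSampleSketch {
    // Hypothetical stand-in for a name with a sampling weight.
    static final class Entry {
        final String name;
        final int weight;

        Entry(String name, int weight) {
            // Zero and negative weights are rejected here, when the entry is created.
            if (weight <= 0) {
                throw new IllegalArgumentException("Weight must be positive: " + weight);
            }
            this.name = name;
            this.weight = weight;
        }
    }

    // Returns a name chosen with probability proportional to its weight,
    // or null if there is nothing to sample from.
    static String sample(List<Entry> entries, Random random) {
        int total = 0;
        for (Entry e : entries) {
            total += e.weight;
        }
        if (total < 0) {
            // Sanity check only: given the constructor check, this can only
            // happen through int overflow, so it is not a user-facing error.
            throw new RuntimeException("Negative weight not allowed: " + total);
        }
        if (total == 0) {
            return null; // no entries; the caller decides how to report this
        }
        int r = random.nextInt(total); // uniform in [0, total)
        for (Entry e : entries) {
            r -= e.weight;
            if (r < 0) {
                return e.name;
            }
        }
        throw new RuntimeException("Internal error: sampling fell through");
    }

    public static void main(String[] args) {
        List<Entry> entries = new ArrayList<>();
        System.out.println(sample(entries, new Random(42))); // empty list
        entries.add(new Entry("a", 1));
        entries.add(new Entry("b", 3));
        System.out.println(sample(entries, new Random(42)));
    }
}
```

In this sketch the caller is expected to turn a `null` result into its own, more descriptive error, which mirrors the split between `RuntimeException` and `RendererException` described in the reply.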
> test/hotspot/jtreg/compiler/lib/template_framework/Renderer.java line 358: > >> 356: // If the character was not found, we want to have the rest of the >> 357: // String s, so instead of "-1" take the end/length of the String. >> 358: dollar = (dollar == -1) ? s.length() : dollar; > > `s.length()` could be called once before the loop and then reused inside the loop. You mean as a performance optimization? Is that not something we let the compiler do? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2124241074 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2124247590 From asmehra at openjdk.org Tue Jun 3 15:36:22 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Tue, 3 Jun 2025 15:36:22 GMT Subject: RFR: 8358330: AsmRemarks and DbgStrings clear() method may not get called before their destructor [v2] In-Reply-To: References: Message-ID: <6yLBtrUKBPgV63susOsKKhAPYCofyOI_Yd0wqbSqrCU=.12d4c0ca-0fa6-4000-a5e1-3ffd0f2ea6cc@github.com> > This patch fixes a possible assert in debug builds if the allocation of memory for a CodeBlob fails when loading it from the AOT Code Cache. See description of [JDK-8358330](https://bugs.openjdk.org/browse/JDK-8358330) for more details. Ashutosh Mehra has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: - Merge branch 'master' into JDK-8358330 - Address review comments Signed-off-by: Ashutosh Mehra - 8358330: AsmRemarks and DbgStrings clear() method may not get called before their destructor Signed-off-by: Ashutosh Mehra ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25598/files - new: https://git.openjdk.org/jdk/pull/25598/files/6869d630..889286b9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25598&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25598&range=00-01 Stats: 7988 lines in 205 files changed: 4294 ins; 1061 del; 2633 mod Patch: https://git.openjdk.org/jdk/pull/25598.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25598/head:pull/25598 PR: https://git.openjdk.org/jdk/pull/25598 From asmehra at openjdk.org Tue Jun 3 15:36:23 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Tue, 3 Jun 2025 15:36:23 GMT Subject: RFR: 8358330: AsmRemarks and DbgStrings clear() method may not get called before their destructor [v2] In-Reply-To: References: Message-ID: On Tue, 3 Jun 2025 01:29:04 GMT, Vladimir Kozlov wrote: >> Ashutosh Mehra has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8358330 >> - Address review comments >> >> Signed-off-by: Ashutosh Mehra >> - 8358330: AsmRemarks and DbgStrings clear() method may not get called before their destructor >> >> Signed-off-by: Ashutosh Mehra > > I am not comfortable that you are changing code not used by AOT. > Can you consider populate `CodeBlob::_asm_remarks` and `_dbg_strings` after calling `CodeBlob::create()`? Then you don't need temporary `AsmRemarks` and `DbgStrings`. @vnkozlov I have updated the changes. Can you please review again. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25598#issuecomment-2936000546 From never at openjdk.org Tue Jun 3 15:52:03 2025 From: never at openjdk.org (Tom Rodriguez) Date: Tue, 3 Jun 2025 15:52:03 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v19] In-Reply-To: References: <17al0aeFhm0iZHoHHGiqB03RfPeSrIHIoZuapOHPuy4=.a2ff2d67-392b-40f0-b6d9-6e3a7f396e8a@github.com> Message-ID: On Mon, 2 Jun 2025 17:01:26 GMT, Evgeny Astigeevich wrote: > If it is moved, the [CompiledMethodUnload](https://docs.oracle.com/en/java/javase/24/docs/specs/jvmti.html#CompiledMethodUnload) event is sent, followed by a new CompiledMethodLoad event. > we now have 2 nmethods alive with the same compile_id which could be confusing. It's nice that the JVMTI docs considered this problem but the notifications will be sent in the reverse order given our current implementation. We will create a new nmethod while the old nmethod might still be alive, at least for the purposes of deopt. Since this PR doesn't actually perform any relocation, I'm not sure what the plan is here. The most aggressive thing that could be done is to invalidate all frames which have the old nmethod on stack, but that still leaves the nmethod live for the purposes of deopt. It would probably be ok to synthesize an unload after the deopt since there should be no actual execution in those nmethods, but you will then have to suppress the one that's normally done during nmethod::unlink. I agree that the docs are fairly clear that all of this is ok, but that doesn't mean that assumptions haven't been made about the current implementation. We just need to make sure we do something rational and that it's possible to understand from our output what was done. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23573#issuecomment-2936067701 From chagedorn at openjdk.org Tue Jun 3 15:53:00 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 3 Jun 2025 15:53:00 GMT Subject: RFR: 8350177: C2 SuperWord: Integer.numberOfLeadingZeros, numberOfTrailingZeros, reverse and bitCount have input types wrongly turncated for byte and short In-Reply-To: References: Message-ID: On Mon, 26 May 2025 07:15:31 GMT, Jasmine Karthikeyan wrote: > Hi all, > This patch fixes cases in SuperWord when compiling subword types where vectorized code would be given a narrower type than expected, leading to miscompilation due to truncation. This fix is a generalization of the same fix applied for `Integer.reverseBytes` in [JDK-8305324](https://bugs.openjdk.org/browse/JDK-8305324). The patch introduces a check for nodes that are known to tolerate truncation, so that any future cases of subword truncation will avoid creating miscompiled code. > > The patch reuses the existing logic to set the type of the vectors to int, which currently disables vectorization for the affected patterns entirely. Once [JDK-8342095](https://bugs.openjdk.org/browse/JDK-8342095) is merged and automatic casting support is added the autovectorizer should automatically insert casts to and from int, maintaining correctness. > > I've added an IR test that checks for correctly compiled outputs. Thoughts and reviews would be appreciated! @jaskarth Just to let you know, the fork is coming up this Thursday. But since this is a P3, we still got some time left in RDP 1 to get this fixed in JDK 25. 
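Editorial note on the 8350177 description quoted above: a small sketch of why these operations do not tolerate truncation of their input to a subword type. In Java, a `byte` argument is promoted to a 32-bit `int` before `Integer.numberOfLeadingZeros` runs. The second method is a hypothetical model of what counting within an 8-bit lane would give; it is an assumption for illustration and not taken from the SuperWord code.

```java
public class SubwordTruncationSketch {
    // Java semantics: b is sign-extended to int, then the 32-bit count is taken.
    static int intLeadingZeros(byte b) {
        return Integer.numberOfLeadingZeros(b);
    }

    // Hypothetical model: count leading zeros looking only at the low 8 bits.
    static int eightBitLaneLeadingZeros(byte b) {
        int v = b & 0xFF;
        return (v == 0) ? 8 : Integer.numberOfLeadingZeros(v) - 24;
    }

    public static void main(String[] args) {
        byte[] inputs = {0, 1, 64, -1};
        for (byte b : inputs) {
            System.out.println(b + ": int=" + intLeadingZeros(b)
                    + " lane=" + eightBitLaneLeadingZeros(b));
        }
    }
}
```

For the input `1`, the 32-bit count is 31 while the 8-bit model gives 7, so the two disagree even after the result is narrowed back to `byte`. An operation such as addition does not have this problem, because its low 8 result bits depend only on the low 8 input bits.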
------------- PR Comment: https://git.openjdk.org/jdk/pull/25440#issuecomment-2936069655 From epeter at openjdk.org Tue Jun 3 15:53:34 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 3 Jun 2025 15:53:34 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v79] In-Reply-To: References: Message-ID: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. > > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. 
> - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: more for Christian ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/3efb9fc6..3f0beb4a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=78 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=77-78 Stats: 6 lines in 2 files changed: 4 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From epeter at openjdk.org Tue Jun 3 15:53:34 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 3 Jun 2025 15:53:34 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v61] In-Reply-To: References: Message-ID: <63rjQWH6SMsAL2gNeZEmKADrHdr1BCf27oToad-qn2c=.32494c95-4bc4-4e34-bbb1-18c6c5020ce7@github.com> On Mon, 2 Jun 2025 12:14:48 GMT, Christian Hagedorn wrote: >> Thanks for all the updates and discussions! 
I've worked my way through the documentation in `Template` and the examples again in some more detail. It's much better and the new explanations are well done, excellent work! >> >> I left some comments here and there but mostly minor things. I will have another look at the implementation - probably only finished by Monday. The design now looks great. I'm glad we could find a good solution now after some more iterations :-) > >> @chhagedorn Alright, I now have a decent solution for `$$var` and `$1var` etc. I also added tests for it. >> >> These are the issues where we could continue the conversation, unless you are satisfied with my answers: [#24217 (comment)](https://github.com/openjdk/jdk/pull/24217#discussion_r2115388737) [#24217 (comment)](https://github.com/openjdk/jdk/pull/24217#discussion_r2115406391) >> >> This is now ready for another review pass > > Awesome, thanks for spending some more time with these nasty edge-cases and finding a solution! I had a look at your updates for all my comments, they look good, thanks! > > I'm going to make a pass over the implementation classes now and will have a look at the `Renderer` updates as well :-) @chhagedorn Thanks for this batch of comments, they are all addressed! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24217#issuecomment-2936069153 From epeter at openjdk.org Tue Jun 3 15:53:34 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 3 Jun 2025 15:53:34 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v76] In-Reply-To: References: Message-ID: On Tue, 3 Jun 2025 15:27:23 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/lib/template_framework/NameSet.java line 89: >> >>> 87: if (w < 0) { >>> 88: throw new RuntimeException("Negative weight not allowed: " + w); >>> 89: } >> >> I thought zero is also not allowed? > > Well that should have been filtered out already earlier, when we added the individual names. Now we could have a total weight of zero if we have no names.
Then we just return `null` here, and then throw an exception a little further up the use case chain, e.g. `DataName.sample` -> `throw new RendererException("No variable: " + mutability + msg1 + msg2 + ".");` > > This here is just a sanity check, hence I throw a `RuntimeException`, and not a `RendererException`. As asked for offline: added some more comments here. >> test/hotspot/jtreg/compiler/lib/template_framework/Renderer.java line 358: >> >>> 356: // If the character was not found, we want to have the rest of the >>> 357: // String s, so instead of "-1" take the end/length of the String. >>> 358: dollar = (dollar == -1) ? s.length() : dollar; >> >> `s.length()` could be called once before the loop and then reused inside the loop. > > You mean as a performance optimization? Is that not something we let the compiler do? As discussed offline: I made `s` final, so we are sure that it is not mutated and the length should be moved out of the loop by the compiler. It would also only be a very small performance impact, as we are doing things like `indexOf` here, which are much more expensive anyway. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2124296212 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2124298706 From kvn at openjdk.org Tue Jun 3 15:56:54 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 3 Jun 2025 15:56:54 GMT Subject: RFR: 8356000: C1/C2-only modes use 2 compiler threads on low CPU count machines [v3] In-Reply-To: References: Message-ID: On Wed, 28 May 2025 18:05:12 GMT, Aleksey Shipilev wrote: >> There is an unfortunate limitation with default tiered policy that we would have at least 2 threads on 1 CPU machine: 1 thread for C1, and 1 thread for C2. >> >> But if we select C1-only or C2-only modes, we _also_ get 2 compiler threads, for which we have no good reason. These threads would just step on each other toes. 
The fix changes the behavior for 1..3 CPU hosts in C1/C2-only configurations, by using 1 thread instead of 2 threads. The change for 1 CPU config is what we really need. The change in 2..3 CPU configs is an additional effect, but I think it is still good not to use 100%/66% of the CPUs in those configurations as well. >> >> >> $ for I in `seq 1 8`; do build/linux-x86_64-server-release/images/jdk/bin/java \ >> -XX:-TieredCompilation -XX:ActiveProcessorCount=${I} \ >> -XX:+PrintFlagsFinal 2>&1 | grep "CICompilerCount "; done >> >> # Before >> intx CICompilerCount = 2 >> intx CICompilerCount = 2 >> intx CICompilerCount = 2 >> intx CICompilerCount = 3 >> intx CICompilerCount = 3 >> intx CICompilerCount = 3 >> intx CICompilerCount = 3 >> intx CICompilerCount = 4 >> >> # After >> intx CICompilerCount = 1 >> intx CICompilerCount = 1 >> intx CICompilerCount = 1 >> intx CICompilerCount = 3 >> intx CICompilerCount = 3 >> intx CICompilerCount = 3 >> intx CICompilerCount = 3 >> intx CICompilerCount = 4 >> >> >> It is a minor bug in `CompilationPolicy::initialize`, but it gets in the way studying Leyden in tight CPU scenarios. >> >> Additional testing: >> - [x] New regression test passes with the fix, fails without it >> - [x] GHA >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - Merge branch 'master' into JDK-8356000-c1-c2-compiler-count > - Better test, patch amendments > - Merge branch 'master' into JDK-8356000-c1-c2-compiler-count > - Merge branch 'master' into JDK-8356000-c1-c2-compiler-count > - Unnecessary arch limitation > - Simplify test > - Adjust test bound > - Fix Looks fine. I submitted testing. 
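For readers who want to check the quoted before/after `CICompilerCount` tables, the following sketch reproduces them. The formula is an assumption modeled on the numbers in the message (a `log n * log log n * 3/2` curve with a lower bound); it is not copied from `CompilationPolicy::initialize`, and the class, method, and parameter names are hypothetical. The only thing that changes between the two tables is the lower bound: 2 before the fix, 1 after it in C1-only or C2-only mode.

```java
public class CompilerCountSketch {
    // floor(log2(x)) for x >= 1
    static int log2(int x) {
        return 31 - Integer.numberOfLeadingZeros(x);
    }

    // Assumed sizing formula; minCount is the lower bound on the result.
    static int compilerCount(int cpus, int minCount) {
        int logCpu = log2(cpus);
        int logLogCpu = log2(Math.max(logCpu, 1));
        return Math.max(logCpu * logLogCpu * 3 / 2, minCount);
    }

    public static void main(String[] args) {
        for (int cpus = 1; cpus <= 8; cpus++) {
            System.out.println("cpus=" + cpus
                    + " before=" + compilerCount(cpus, 2)
                    + " after=" + compilerCount(cpus, 1));
        }
    }
}
```

Under these assumptions only the 1..3 CPU rows differ between the two columns, which matches the behavior change described in the PR text.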
------------- PR Review: https://git.openjdk.org/jdk/pull/24972#pullrequestreview-2893164969 From epeter at openjdk.org Tue Jun 3 15:57:32 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 3 Jun 2025 15:57:32 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v80] In-Reply-To: References: Message-ID: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. > > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. 
> - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: fix whitespaces from applied suggestion ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/3f0beb4a..72923879 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=79 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=78-79 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From kvn at openjdk.org Tue Jun 3 16:17:56 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 3 Jun 2025 16:17:56 GMT Subject: RFR: 8357600: Patch nmethod flushing message to include more details [v3] In-Reply-To: References: Message-ID: <7O4QTc1_uAcjWhyauZKWf0E1nwun5aq64sRDOFpB_YY=.1a0e7491-3d0a-458a-9ee0-caaf8c0217ee@github.com> On Mon, 2 Jun 2025 22:49:36 GMT, Cesar Soares Lucas wrote: >> Please review this patch for adding 
more details to nmethod flushing message. These details are particularly important when investigating interaction of JVMCI compiled code and code cache flushing heuristics. >> >> Tested on Linux x64 with JTREG tier1-3 using fastdebug and release builds. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Address PR feedback. src/hotspot/share/code/nmethod.cpp line 2131: > 2129: ResourceMark rm; > 2130: const char* name = method()->name()->as_C_string(); > 2131: const char* is_jvmci = ""; Please use `compiler_name()` instead. src/hotspot/share/code/nmethod.cpp line 2142: > 2140: } > 2141: #endif > 2142: log_debug(codecache)("Flushing nmethod %3d/" INTPTR_FORMAT ", level=%d, osr=%d, cold=%d, epoch=" UINT64_FORMAT ", cold_count=" UINT64_FORMAT ". " You can use `lt` here. src/hotspot/share/code/nmethod.cpp line 2145: > 2143: "Cache capacity: %zuKb, free space: %zuKb. %smethod %s", > 2144: _compile_id, p2i(this), _comp_level, is_osr_method(), is_cold(), _gc_epoch, CodeCache::cold_gc_count(), > 2145: codecache_capacity, codecache_free_space, is_jvmci, name); Swap `is_jvmci` and `name` so that method's names in output for nmethod not compiled by JVMCI (by C1) will be aligned. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25402#discussion_r2124355185 PR Review Comment: https://git.openjdk.org/jdk/pull/25402#discussion_r2124324589 PR Review Comment: https://git.openjdk.org/jdk/pull/25402#discussion_r2124331010 From kvn at openjdk.org Tue Jun 3 16:28:56 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 3 Jun 2025 16:28:56 GMT Subject: RFR: 8357434: x86: Simplify Interpreter::profile_taken_branch [v3] In-Reply-To: References: Message-ID: On Tue, 3 Jun 2025 09:38:09 GMT, Aleksey Shipilev wrote: >> Noticed that `Interpreter::profile_taken_branch` has the same `sbbptr` pattern we have eliminated with [JDK-8356946](https://bugs.openjdk.org/browse/JDK-8356946). 
The same logic applies here: the counter is 64-bit, never practically overflows, and no other code cares about it. >> >> Also tidied up some comments around it. >> >> Additional testing; >> - [x] Linux x86_64 server fastdebug, `tier1 tier2` >> - [x] Linux x86_64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'master' into JDK-8357434-x86-profile-taken > - Stale comment > - Merge branch 'master' into JDK-8357434-x86-profile-taken > - Fix Re-approved. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25343#pullrequestreview-2893267613 From kvn at openjdk.org Tue Jun 3 16:41:51 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 3 Jun 2025 16:41:51 GMT Subject: RFR: 8358330: AsmRemarks and DbgStrings clear() method may not get called before their destructor [v2] In-Reply-To: <6yLBtrUKBPgV63susOsKKhAPYCofyOI_Yd0wqbSqrCU=.12d4c0ca-0fa6-4000-a5e1-3ffd0f2ea6cc@github.com> References: <6yLBtrUKBPgV63susOsKKhAPYCofyOI_Yd0wqbSqrCU=.12d4c0ca-0fa6-4000-a5e1-3ffd0f2ea6cc@github.com> Message-ID: On Tue, 3 Jun 2025 15:36:22 GMT, Ashutosh Mehra wrote: >> This patch fixes a possible assert in debug builds if the allocation of memory for a CodeBlob fails when loading it from the AOT Code Cache. See description of [JDK-8358330](https://bugs.openjdk.org/browse/JDK-8358330) for more details. > > Ashutosh Mehra has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: > > - Merge branch 'master' into JDK-8358330 > - Address review comments > > Signed-off-by: Ashutosh Mehra > - 8358330: AsmRemarks and DbgStrings clear() method may not get called before their destructor > > Signed-off-by: Ashutosh Mehra Good. Let me test it. ------------- PR Review: https://git.openjdk.org/jdk/pull/25598#pullrequestreview-2893307151 From shade at openjdk.org Tue Jun 3 16:43:36 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 3 Jun 2025 16:43:36 GMT Subject: RFR: 8357434: x86: Simplify Interpreter::profile_taken_branch [v3] In-Reply-To: References: Message-ID: On Tue, 3 Jun 2025 09:38:09 GMT, Aleksey Shipilev wrote: >> Noticed that `Interpreter::profile_taken_branch` has the same `sbbptr` pattern we have eliminated with [JDK-8356946](https://bugs.openjdk.org/browse/JDK-8356946). The same logic applies here: the counter is 64-bit, never practically overflows, and no other code cares about it. >> >> Also tidied up some comments around it. >> >> Additional testing; >> - [x] Linux x86_64 server fastdebug, `tier1 tier2` >> - [x] Linux x86_64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'master' into JDK-8357434-x86-profile-taken > - Stale comment > - Merge branch 'master' into JDK-8357434-x86-profile-taken > - Fix Still fine with it, @iwanowww? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25343#issuecomment-2936257590 From jbhateja at openjdk.org Tue Jun 3 17:03:58 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 3 Jun 2025 17:03:58 GMT Subject: RFR: 8351635: C2 ROR/ROL: assert failed: Long constant expected [v2] In-Reply-To: References: <5k2J6AUT-a3B006J_ksxccQVxprZa21uqUbKTGkkby0=.5dfc4f2b-ad7a-4393-bf5e-efc246582c83@github.com> Message-ID: On Tue, 3 Jun 2025 11:42:07 GMT, Tobias Hartmann wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Update test/hotspot/jtreg/compiler/vectorapi/TestVectorRotateScalarCount.java >> >> Co-authored-by: Tobias Hartmann > > Looks good to me. All tests passed. Thanks @TobiHartmann , @chhagedorn ------------- PR Comment: https://git.openjdk.org/jdk/pull/25493#issuecomment-2936325533 From jbhateja at openjdk.org Tue Jun 3 17:04:00 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 3 Jun 2025 17:04:00 GMT Subject: Integrated: 8351635: C2 ROR/ROL: assert failed: Long constant expected In-Reply-To: References: Message-ID: <5VS9_aH-pNteS0lAJK8NdwbhEqbguLwAnkjQwgX0dRg=.c7692617-7b17-45a4-866e-73eeab9887c8@github.com> On Wed, 28 May 2025 14:19:21 GMT, Jatin Bhateja wrote: > This bug fix patch relaxes the strict assertion check to allow other pattern matches for degenerated long vector ROL/ROR operations with non-constant scalar shift values. > > Kindly review and share feedback. > > Best Regards, > Jatin This pull request has now been integrated. 
Changeset: d7e58ac4 Author: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/d7e58ac480b06c6340a65e67731d8f6dc179acfb Stats: 129 lines in 2 files changed: 127 ins; 1 del; 1 mod 8351635: C2 ROR/ROL: assert failed: Long constant expected Reviewed-by: thartmann, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/25493 From epeter at openjdk.org Tue Jun 3 17:05:02 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 3 Jun 2025 17:05:02 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v8] In-Reply-To: References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> <1gdeBnZ7YuIf9CgQW2bCXkDDBWPjUgRnickHts-fvzE=.e6e901ba-3e9f-41a2-9c68-167a879e9655@github.com> <4ShW7VcaJrO0v0cHwUN1vccOH8tNPlJSIh_K0W2RdS0=.14954a26-c962-41a1-9088-2e1a1bc01eb4@github.com> Message-ID: On Tue, 3 Jun 2025 15:17:35 GMT, Roland Westrelin wrote: >> test/hotspot/jtreg/compiler/macronodes/TestEliminationOfAllocationWithoutUse.java line 2: >> >>> 1: /* >>> 2: * Copyright (c) 2024, Oracle and/or its affiliates. All rights reserved. >> >> Is the copyright year accurate? > > It's your test that I took over and updated so you tell me: do you want the copyright updated? Ah right. Well,
I guess you could write `2024, 2025` then :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2124444173 From epeter at openjdk.org Tue Jun 3 17:13:55 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 3 Jun 2025 17:13:55 GMT Subject: RFR: 8357982: Fix several failing BMI tests with -XX:+UseAPX [v3] In-Reply-To: <1jna58ZtxrGgcqNt9FQf5Tl-rIo6YwTFYzavusVZGyA=.87513e10-77ba-436e-9d9e-b82f5041d368@github.com> References: <1jna58ZtxrGgcqNt9FQf5Tl-rIo6YwTFYzavusVZGyA=.87513e10-77ba-436e-9d9e-b82f5041d368@github.com> Message-ID: On Tue, 3 Jun 2025 08:19:06 GMT, Jatin Bhateja wrote: >> A) Patch extends the following tests with hard-coded encoding checks for various BMI instructions to cover REX2 or extended EVEX encodings supported by APX. >> >> >> compiler/intrinsics/bmi/verifycode/AndnTestI.java >> compiler/intrinsics/bmi/verifycode/AndnTestL.java >> compiler/intrinsics/bmi/verifycode/BzhiTestI2L.java >> compiler/intrinsics/bmi/verifycode/LZcntTestL.java >> compiler/intrinsics/bmi/verifycode/TZcntTestL.java >> >> >> B) After integration of JDK-8349582, which added APX NDD support, AndN instruction selection patterns that expect (Xor SRC, -1) as one of its operands were not getting selected because of a lower-cost generic immediate pattern match; patch fixes this issue through strict predicate checks. >> >> Above tests are now passing, validations were carried out using Intel Software Development emulator. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/cpu/x86/x86_64.ad > > Thanks :-) > > Co-authored-by: Tobias Hartmann @jatin-bhateja Thanks for looking into this! `predicate(!UseAPX && n->in(2)->bottom_type()->is_int()->get_con() != -1);` The PR title seems to suggest the bug is only about -XX:+UseAPX. Why are you changing things for the case !UseAPX? 
Are these not cases like a ^ -1, which basically flips all bits. What alternative does this end up using now? A code comment would be helpful. src/hotspot/cpu/x86/x86_64.ad line 10620: > 10618: instruct xorI_rReg_imm(rRegI dst, immI src, rFlagsReg cr) > 10619: %{ > 10620: predicate(!UseAPX && n->in(2)->bottom_type()->is_int()->get_con() != -1); The PR title seems to suggest the bug is only about -XX:+UseAPX. Why are you changing things for the case !UseAPX? ------------- PR Review: https://git.openjdk.org/jdk/pull/25501#pullrequestreview-2893385310 PR Review Comment: https://git.openjdk.org/jdk/pull/25501#discussion_r2124452416 From epeter at openjdk.org Tue Jun 3 17:23:57 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 3 Jun 2025 17:23:57 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v3] In-Reply-To: References: Message-ID: On Fri, 23 May 2025 22:43:35 GMT, Vladimir Ivanov wrote: >> This PR introduces C2 support for `Reference.reachabilityFence()`. >> >> After [JDK-8199462](https://bugs.openjdk.org/browse/JDK-8199462) went in, it was discovered that C2 may break the invariant the fix relied upon [1]. So, this is an attempt to introduce proper support for `Reference.reachabilityFence()` in C2. C1 is left intact for now, because there are no signs yet it is affected. >> >> `Reference.reachabilityFence()` can be used in performance critical code, so the primary goal for C2 is to reduce its runtime overhead as much as possible. The ultimate goal is to ensure liveness information is attached to interfering safepoints, but it takes multiple steps to properly propagate the information through compilation pipeline without negatively affecting generated code quality. >> >> Also, I don't consider this fix as complete. It does fix the reported problem, but it doesn't provide any strong guarantees yet. 
In particular, since `ReachabilityFence` is CFG-only node, nothing explicitly forbids memory operations to float past `Reference.reachabilityFence()` and potentially reaching some other safepoints current analysis treats as non-interfering. Representing `ReachabilityFence` as memory barrier (e.g., `MemBarCPUOrder`) would solve the issue, but performance costs are prohibitively high. Alternatively, the optimization proposed in this PR can be improved to conservatively extend referent's live range beyond `ReachabilityFence` nodes associated with it. It would meet performance criteria, but I prefer to implement it as a followup fix. >> >> Another known issue relates to reachability fences on constant oops. If such constant is GCed (most likely, due to a bug in Java code), similar reachability issues may arise. For now, RFs on constants are treated as no-ops, but there's a diagnostic flag `PreserveReachabilityFencesOnConstants` to keep the fences. I plan to address it separately. >> >> [1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ref/Reference.java#L667 >> "HotSpot JVM retains the ref and does not GC it before a call to this method, because the JIT-compilers do not have GC-only safepoints." >> >> Testing: >> - [x] hs-tier1 - hs-tier8 >> - [x] hs-tier1 - hs-tier6 w/ -XX:+StressReachabilityFences -XX:+VerifyLoopOptimizations >> - [x] java/lang/foreign microbenchmarks > > Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: > > renaming src/hotspot/share/opto/c2_globals.hpp line 83: > 81: \ > 82: product(bool, StressReachabilityFences, false, DIAGNOSTIC, \ > 83: "Randomly insert ReachabilityFence nodes") \ Drive-by sniping: what about a hello-world test where you test out these flags? test/hotspot/jtreg/compiler/c2/TestReachabilityFence.java line 38: > 36: * @summary Tests to ensure that reachabilityFence() correctly keeps objects from being collected prematurely. 
> 37: * @modules java.base/jdk.internal.misc > 38: * @run main/othervm -Xbatch compiler.c2.TestReachabilityFence What about some extra runs where you use your new flags? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2124474939 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2124476770 From jbhateja at openjdk.org Tue Jun 3 17:24:52 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 3 Jun 2025 17:24:52 GMT Subject: RFR: 8357982: Fix several failing BMI tests with -XX:+UseAPX [v3] In-Reply-To: References: <1jna58ZtxrGgcqNt9FQf5Tl-rIo6YwTFYzavusVZGyA=.87513e10-77ba-436e-9d9e-b82f5041d368@github.com> Message-ID: On Tue, 3 Jun 2025 17:07:30 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Update src/hotspot/cpu/x86/x86_64.ad >> >> Thanks :-) >> >> Co-authored-by: Tobias Hartmann > > src/hotspot/cpu/x86/x86_64.ad line 10620: > >> 10618: instruct xorI_rReg_imm(rRegI dst, immI src, rFlagsReg cr) >> 10619: %{ >> 10620: predicate(!UseAPX && n->in(2)->bottom_type()->is_int()->get_con() != -1); > > The PR title seems to suggest the bug is only about -XX:+UseAPX. Why are you changing things for the case !UseAPX? Hey, the AD file change enables AndN inferencing; the test expects the compiler to emit that instruction and has hardcoded encoding checks in place to verify it. So both the encoding and the AD file changes are necessary to fix this failing test. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25501#discussion_r2124477130 From chagedorn at openjdk.org Tue Jun 3 17:39:31 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 3 Jun 2025 17:39:31 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v80] In-Reply-To: References: Message-ID: On Tue, 3 Jun 2025 15:57:32 GMT, Emanuel Peter wrote: >> **Goal** >> We want to generate Java source code: >> - Make it easy to generate variants of tests.
E.g. for each offset, for each operator, for each type, etc. >> - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). >> >> Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). >> >> **How to get started** >> When reviewing, please start by looking at: >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 >> >> We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. >> >> Second, look at this advanced test: >> https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 >> >> And then for a "tutorial", look at: >> `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` >> >> It shows these features: >> - The `body` of a Template is essentially a list of `Token`s that are concatenated. >> - Templates can be nested: a `TemplateWithArgs` is also a `Token`. >> - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. >> - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. >> - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. 
>> - The use of recursive templates, and `fuel` to limit the recursion. >> - `Name`s: useful to register field and variable names in code scopes. >> >> Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 >> >> For a better experience, you may want... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > fix whitespaces from applied suggestion Thanks a lot for addressing everything and for all the interesting and insightful discussions - I also learnt quite a lot :-) There is nothing left to say other than: Ship it! (if the other reviewers also agree with the latest changes) ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24217#pullrequestreview-2893451928 From sviswanathan at openjdk.org Tue Jun 3 17:49:34 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 3 Jun 2025 17:49:34 GMT Subject: RFR: 8351645: C2: ExpandBitsNode::Ideal hits assert because of TOP input In-Reply-To: References: Message-ID: <2EnHipwji6WU4sYmsbJAZSGpSmbhREXUq9f7V-ka6AI=.e5f13037-a796-41b9-8feb-a5ffcb9bc1b6@github.com> On Mon, 2 Jun 2025 11:53:23 GMT, Jatin Bhateja wrote: > Bugfix patch adds missing safe type access checks in Expand/Compress Ideal transforms. > Test mentioned in the bug report has been included along with the patch. > > Kindly review. > > Best Regards, > Jatin Looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/25586#pullrequestreview-2893459874 From jbhateja at openjdk.org Tue Jun 3 17:49:37 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 3 Jun 2025 17:49:37 GMT Subject: RFR: 8357982: Fix several failing BMI tests with -XX:+UseAPX [v2] In-Reply-To: References: Message-ID: <8mE0O0QjyMJMK7UWtfMiFc5ZjIxFYqVNUeu0qYbzaz8=.75e13abf-a2c9-407b-898d-1174a85a06cf@github.com> On Tue, 3 Jun 2025 02:43:42 GMT, Jatin Bhateja wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments resolutions > > Thanks, encoding logic is concentrated in integral instruction tests and is shared with corresponding long variants, extended APX coverage for BLS/R/MSK. > @jatin-bhateja Thanks for looking into this! > > `predicate(!UseAPX && n->in(2)->bottom_type()->is_int()->get_con() != -1);` > > The PR title seems to suggest the bug is only about -XX:+UseAPX. Why are you changing things for the case !UseAPX? > > Are these not cases like a ^ -1, which basically flips all bits. What alternative does this end up using now? > > A code comment would be helpful. We are tightening the predicate check so that under no circumstances we pick this pattern during the reduction phase of instruction selection on account of having lower cost. There is a generic pattern (xorI_rReg_imm) for all integral immediate values, and then there is a special pattern for Xor with -1 (fxorI_rReg_im1), which is needed for AndN inferencing. 
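As a side note on the Xor-with-minus-one case discussed above, here is a minimal Java sketch of the bit identity involved. It only illustrates that `x ^ -1` equals `~x`, and that `(a ^ -1) & b` is the source-level shape an AndN match would look for. The class and method names (`AndnSketch`, `notViaXor`, `andn`) are hypothetical and not taken from the patch, and nothing here shows how the matcher actually selects instructions.

```java
// Hypothetical illustration only; these names do not appear in the PR.
class AndnSketch {
    // XOR with -1 (all bits set) flips every bit of x, which is the same as ~x.
    public static int notViaXor(int x) {
        return x ^ -1;
    }

    // Source-level shape of an and-not: invert the first operand, then AND it
    // with the second.
    public static int andn(int src1, int src2) {
        return (src1 ^ -1) & src2;
    }

    public static void main(String[] args) {
        int a = 0b1100;
        int b = 0b1010;
        // Compare the XOR form against the plain bitwise complement.
        System.out.println(notViaXor(a) == ~a);
        // Bits that are set in b but clear in a.
        System.out.println(Integer.toBinaryString(andn(a, b)));
    }
}
```

Whether a given JIT build actually emits an `andn` instruction for this shape depends on CPU features and the selected match rules, so the sketch should be read as arithmetic background rather than a statement about generated code.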
------------- PR Comment: https://git.openjdk.org/jdk/pull/25501#issuecomment-2936412986 From duke at openjdk.org Tue Jun 3 17:52:30 2025 From: duke at openjdk.org (Tom Shull) Date: Tue, 3 Jun 2025 17:52:30 GMT Subject: RFR: 8357660: [JVMCI] Add support for retrieving all BootstrapMethodInvocations directly from ConstantPool [v4] In-Reply-To: References: Message-ID: <5OVd27HKtqnWu4vn0VnDAWLdWk0iTEstxqhnt9XJ5xU=.efb8b1eb-9998-4caa-844d-e8af7765d3b2@github.com> > This PR adds support for directly retrieving both all invokedynamic and all condy BootstrapMethodInvocations from a ConstantPool via the new method `List lookupBootstrapMethodInvocations(boolean invokeDynamic)`. > > In addition, two methods are added to the BootstrapMethodInvocations: > 1. `void resolve()` > 2. `JavaConstant lookup()` > > The combination of these two features allows one to directly interact with all BSM information of a given ConstantPool without having to iterate through all of the Classfile's methods to find all invokedynamic bytecodes and/or iterate through all Constant Pool entries. Tom Shull has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: - Merge remote-tracking branch 'origin/master' into JDK-8357660 - commit to trigger testing - commit to trigger testing - reviewer feedback and update javadoc formatting - complete changes - commit review suggestion Co-authored-by: Douglas Simon - commit review suggestion Co-authored-by: Douglas Simon - change to allow both indys and condys to be looked up all at once - address reviewer feedback - style fixes and add testing to TestDynamicConstants. - ... 
and 1 more: https://git.openjdk.org/jdk/compare/7bf6d3ed...e0707fb8 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25420/files - new: https://git.openjdk.org/jdk/pull/25420/files/4d508fc4..e0707fb8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25420&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25420&range=02-03 Stats: 63637 lines in 1081 files changed: 34781 ins; 18003 del; 10853 mod Patch: https://git.openjdk.org/jdk/pull/25420.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25420/head:pull/25420 PR: https://git.openjdk.org/jdk/pull/25420 From duke at openjdk.org Tue Jun 3 17:52:39 2025 From: duke at openjdk.org (Tom Shull) Date: Tue, 3 Jun 2025 17:52:39 GMT Subject: RFR: 8357987: [JVMCI] Add support for retrieving all methods of a ResolvedJavaType [v4] In-Reply-To: References: Message-ID: > Currently from ResolvedJavaType one can retrieve all declared methods, static methods, and constructors of the given type. However, internally in HotSpot there are also VM-internal methods, such as overpass methods, associated with a given type which we cannot access via the API. > > To correct this, we should add a new method which enables VM-internal methods, such as overpass methods, to be accessed. Tom Shull has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: - Merge remote-tracking branch 'origin/master' into JDK-8357987 - return List.of() from getAllMethods - format javadoc and update test - implement getAllMethods - address reviewer feedback - Add Support for Retrieving All Non-Static Methods of a ResolvedJavaType. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/25498/files - new: https://git.openjdk.org/jdk/pull/25498/files/ae81d46f..606f3619 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25498&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25498&range=02-03 Stats: 63826 lines in 1089 files changed: 34970 ins; 18003 del; 10853 mod Patch: https://git.openjdk.org/jdk/pull/25498.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25498/head:pull/25498 PR: https://git.openjdk.org/jdk/pull/25498 From kvn at openjdk.org Tue Jun 3 17:52:47 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 3 Jun 2025 17:52:47 GMT Subject: RFR: 8020282: Generated code quality: redundant LEAs in the chained dereferences In-Reply-To: References: Message-ID: <-PFiiMlUghbFgg2fuU86vuEXKaexylDuk3kBdcBn9N8=.2c272bf1-10a7-4110-8919-f33ee0d491ba@github.com> On Tue, 27 May 2025 17:26:59 GMT, Manuel Hässig wrote: > ## Summary > > On x86, chained dereferences of narrow oops at a constant offset from the base oop can use a `lea` instruction to perform the address computation in one go using the `leaP8Narrow`, `leaP32Narrow`, and `leaPCompressedOopOffset` matching rules. However, the generated code contains an additional `lea` with an unused result: > > ; OptoAssembly > 03d decode_heap_oop_not_null R8,R10 > 041 leaq R10, [R12 + R10 << 3 + #12] (compressed oop addressing) ; ptr compressedoopoff32 > > ; x86 > 0x00007f1f210625bd: lea (%r12,%r10,8),%r8 ; result is unused > 0x00007f1f210625c1: lea 0xc(%r12,%r10,8),%r10 ; the same computation as decode, but with offset > > > This PR adds a peephole optimization to remove such redundant `lea`s. > > ## The Issue in Detail > > The ideal subgraph producing redundant `lea`s, or rather redundant `decodeHeapOop_not_null`s, is `LoadN -> DecodeN -> AddP`, where both the address and base edge of the `AddP` originate from the `DecodeN`.
After matching, this becomes > > LoadN -> decodeHeapOop_not_null -> leaP* > ______________________________↑ > > where `leaP*` is either of `leaP8Narrow`, `leaP32Narrow`, or `leaPCompressedOopOffset` (depending on the heap location and size). Here, the base input of `leaP*` comes from the decode. Looking at the matching code path, we find that the `leaP*` rules match both the `AddP` and the `DecodeN`, since x86 can fold this, but the following code adds the decode back as the base input to `leaP*`: > > https://github.com/openjdk/jdk/blob/c29537740efb04e061732a700582d43b1956cff4/src/hotspot/share/opto/matcher.cpp#L1894-L1897 > > On its face, this is completely unnecessary if we matched a `leaP*`, since it already computes the result of the decode, so adding the `LoadN` node as base seems like the logical choice. However, if the derived oop computed by the `leaP*` gets added to an oop map, this `DecodeN` is needed as the base for the derived oop, because as of now, derived oops in oop maps cannot have narrow base pointers. > > This leaves us with a handful of possible solutions: > 1. implement narrow bases for derived oops in oop maps, > 2. perform some dead code elimination after we know which oops are part of oop maps, > 3. add a peephole optimization to simply remove unused `lea`s. > > Option 1 would have been ideal in the sense that it is the earliest possible point to remove the decode, which would simplify the graph and reduce pressure on the register allocator. However, rewriting the oop map machinery to remove a... src/hotspot/cpu/x86/peephole_x86_64.cpp line 244: > 242: // the DecodeN. However, after matching the DecodeN is added back as the base for the leaP*, > 243: // which is nessecary if the oop derived by the leaP* gets added to an OopMap, because OopMaps > 244: // cannot contain derived oops with narrow oops as a base. Am I correct to assume that if it is referenced in an OopMap (which is a side table) it will be referenced by some Safepoint node in the graph?
src/hotspot/cpu/x86/peephole_x86_64.cpp line 255: > 253: // This peephole recognizes graphs of the shape as shown above, ensures that the result of the > 254: // decode is only used by the derived oop and removes that decode if this is the case. Futher, > 255: // multipe leaP*s can have the same decode as their base. This peephole will remove the decode Typo `multipe` src/hotspot/cpu/x86/peephole_x86_64.cpp line 267: > 265: // | / \ > 266: // leaP* MachProj (leaf) > 267: // In this case where te common parent of the leaP* and the decode is one MemToRegSpill Copy Typo: `te` src/hotspot/cpu/x86/peephole_x86_64.cpp line 268: > 266: // leaP* MachProj (leaf) > 267: // In this case where te common parent of the leaP* and the decode is one MemToRegSpill Copy > 268: // away, this peephole can als recognize the decode as redundant and also remove the spill copy Typo: `als` src/hotspot/cpu/x86/peephole_x86_64.cpp line 270: > 268: // away, this peephole can als recognize the decode as redundant and also remove the spill copy > 269: // if that is only used by the decode. > 270: bool Peephole::lea_remove_redundant(Block* block, int block_index, PhaseCFG* cfg_, PhaseRegAlloc* ra_, Why do you need `_` suffix? src/hotspot/cpu/x86/peephole_x86_64.cpp line 324: > 322: > 323: // Ensure the MachProj is in the same block as the decode and the lea. > 324: if (!block->contains(proj)) { Should we check `proj == nullptr` ? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25471#discussion_r2124504924 PR Review Comment: https://git.openjdk.org/jdk/pull/25471#discussion_r2124512685 PR Review Comment: https://git.openjdk.org/jdk/pull/25471#discussion_r2124513810 PR Review Comment: https://git.openjdk.org/jdk/pull/25471#discussion_r2124514534 PR Review Comment: https://git.openjdk.org/jdk/pull/25471#discussion_r2124516088 PR Review Comment: https://git.openjdk.org/jdk/pull/25471#discussion_r2124528016 From jbhateja at openjdk.org Tue Jun 3 18:07:34 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 3 Jun 2025 18:07:34 GMT Subject: RFR: 8357982: Fix several failing BMI tests with -XX:+UseAPX [v4] In-Reply-To: References: Message-ID: > A) Patch extends the following tests with hard-coded encoding checks for various BMI instructions to cover REX2 or extended EVEX encodings supported by APX. > > > compiler/intrinsics/bmi/verifycode/AndnTestI.java > compiler/intrinsics/bmi/verifycode/AndnTestL.java > compiler/intrinsics/bmi/verifycode/BzhiTestI2L.java > compiler/intrinsics/bmi/verifycode/LZcntTestL.java > compiler/intrinsics/bmi/verifycode/TZcntTestL.java > > > B) After integration of JDK-8349582, which added APX NDD support, AndN instruction selection patterns that expect (Xor SRC, -1) as one of its operands were not getting selected because of a lower-cost generic immediate pattern match; patch fixes this issue through strict predicate checks. > > Above tests are now passing, validations were carried out using Intel Software Development emulator. > > Kindly review and share your feedback. 
> > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolutions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25501/files - new: https://git.openjdk.org/jdk/pull/25501/files/b5f69c8d..e332f191 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25501&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25501&range=02-03 Stats: 5 lines in 1 file changed: 3 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25501.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25501/head:pull/25501 PR: https://git.openjdk.org/jdk/pull/25501 From cslucas at openjdk.org Tue Jun 3 18:52:45 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Tue, 3 Jun 2025 18:52:45 GMT Subject: RFR: 8357600: Patch nmethod flushing message to include more details [v4] In-Reply-To: References: Message-ID: > Please review this patch for adding more details to nmethod flushing message. These details are particularly important when investigating interaction of JVMCI compiled code and code cache flushing heuristics. > > Tested on Linux x64 with JTREG tier1-3 using fastdebug and release builds. 
Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: Refactoring: use compiler_name(), LogStream ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25402/files - new: https://git.openjdk.org/jdk/pull/25402/files/2aabfa72..489d8eee Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25402&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25402&range=02-03 Stats: 14 lines in 1 file changed: 0 ins; 8 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/25402.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25402/head:pull/25402 PR: https://git.openjdk.org/jdk/pull/25402 From shade at openjdk.org Tue Jun 3 18:52:45 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 3 Jun 2025 18:52:45 GMT Subject: RFR: 8357600: Patch nmethod flushing message to include more details [v4] In-Reply-To: References: Message-ID: On Tue, 3 Jun 2025 18:47:37 GMT, Cesar Soares Lucas wrote: >> Please review this patch for adding more details to nmethod flushing message. These details are particularly important when investigating interaction of JVMCI compiled code and code cache flushing heuristics. >> >> Tested on Linux x64 with JTREG tier1-3 using fastdebug and release builds. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Refactoring: use compiler_name(), LogStream Marked as reviewed by shade (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/25402#pullrequestreview-2893698653 From never at openjdk.org Tue Jun 3 19:22:22 2025 From: never at openjdk.org (Tom Rodriguez) Date: Tue, 3 Jun 2025 19:22:22 GMT Subject: RFR: 8357987: [JVMCI] Add support for retrieving all methods of a ResolvedJavaType [v4] In-Reply-To: References: Message-ID: On Tue, 3 Jun 2025 17:52:39 GMT, Tom Shull wrote: >> Currently from ResolvedJavaType one can retrieve all declared methods, static methods, and constructors of the given type. However, internally in HotSpot there are also VM-internal methods, such as overpass methods, associated with a given type which we cannot access via the API. >> >> To correct this, we should add a new method which enables VM-internal methods, such as overpass methods, to be accessed. > > Tom Shull has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - Merge remote-tracking branch 'origin/master' into JDK-8357987 > - return List.of() from getAllMethods > - format javadoc and update test > - implement getAllMethods > - address reviewer feedback > - Add Support for Retrieving All Non-Static Methods of a ResolvedJavaType. Looks good. ------------- Marked as reviewed by never (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25498#pullrequestreview-2893798998 From duke at openjdk.org Tue Jun 3 19:38:19 2025 From: duke at openjdk.org (duke) Date: Tue, 3 Jun 2025 19:38:19 GMT Subject: RFR: 8357987: [JVMCI] Add support for retrieving all methods of a ResolvedJavaType [v4] In-Reply-To: References: Message-ID: On Tue, 3 Jun 2025 17:52:39 GMT, Tom Shull wrote: >> Currently from ResolvedJavaType one can retrieve all declared methods, static methods, and constructors of the given type. 
However, internally in HotSpot there are also VM-internal methods, such as overpass methods, associated with a given type which we cannot access via the API. >> >> To correct this, we should add a new method which enables VM-internal methods, such as overpass methods, to be accessed. > > Tom Shull has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - Merge remote-tracking branch 'origin/master' into JDK-8357987 > - return List.of() from getAllMethods > - format javadoc and update test > - implement getAllMethods > - address reviewer feedback > - Add Support for Retrieving All Non-Static Methods of a ResolvedJavaType. @teshull Your change (at version 606f36196a7bd12abfc76c55b141d712cc613f42) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25498#issuecomment-2936879630 From kvn at openjdk.org Tue Jun 3 19:39:20 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 3 Jun 2025 19:39:20 GMT Subject: RFR: 8357600: Patch nmethod flushing message to include more details [v4] In-Reply-To: References: Message-ID: On Tue, 3 Jun 2025 18:52:45 GMT, Cesar Soares Lucas wrote: >> Please review this patch for adding more details to nmethod flushing message. These details are particularly important when investigating interaction of JVMCI compiled code and code cache flushing heuristics. >> >> Tested on Linux x64 with JTREG tier1-3 using fastdebug and release builds. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Refactoring: use compiler_name(), LogStream Good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25402#pullrequestreview-2893842755 From duke at openjdk.org Tue Jun 3 19:41:22 2025 From: duke at openjdk.org (Tom Shull) Date: Tue, 3 Jun 2025 19:41:22 GMT Subject: Integrated: 8357987: [JVMCI] Add support for retrieving all methods of a ResolvedJavaType In-Reply-To: References: Message-ID: On Wed, 28 May 2025 15:55:39 GMT, Tom Shull wrote: > Currently from ResolvedJavaType one can retrieve all declared methods, static methods, and constructors of the given type. However, internally in HotSpot there are also VM-internal methods, such as overpass methods, associated with a given type which we cannot access via the API. > > To correct this, we should add a new method which enables VM-internal methods, such as overpass methods, to be accessed. This pull request has now been integrated. Changeset: e235b61a Author: Tom Shull Committer: Doug Simon URL: https://git.openjdk.org/jdk/commit/e235b61a8bb70462921c09d197adc4b60267d327 Stats: 103 lines in 11 files changed: 102 ins; 0 del; 1 mod 8357987: [JVMCI] Add support for retrieving all methods of a ResolvedJavaType Reviewed-by: dnsimon, yzheng, never ------------- PR: https://git.openjdk.org/jdk/pull/25498 From cslucas at openjdk.org Tue Jun 3 19:53:32 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Tue, 3 Jun 2025 19:53:32 GMT Subject: RFR: 8358534: Bailout in Conv2B::Ideal when type of cmp input is not supported Message-ID: <2kB23xVQDRb7YT6aMt1SbIfPwSG1ummK29A1Hs3FD0Y=.59ea7dd7-c6fd-486c-a996-f839c9a15718@github.com> `Conv2BNode::ideal` segfaults in release builds when the type of `in(1)` is not INT or PTR. Creating a small test case to reproduce the issue is being a bit challenging so this PR only address the issue by bailing out of the method if the input type is unsupported. This other ticket https://bugs.openjdk.org/browse/JDK-8357885 will address creating a regression test for the problem. Tested with JTREG tier1-3 and Renaissance on Linux x64. 
------------- Commit messages: - Bailout of Conv2BNode::ideal on unknown input type. Changes: https://git.openjdk.org/jdk/pull/25627/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25627&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8358534 Stats: 5 lines in 1 file changed: 5 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25627.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25627/head:pull/25627 PR: https://git.openjdk.org/jdk/pull/25627 From shade at openjdk.org Tue Jun 3 20:02:16 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 3 Jun 2025 20:02:16 GMT Subject: RFR: 8358534: Bailout in Conv2B::Ideal when type of cmp input is not supported In-Reply-To: <2kB23xVQDRb7YT6aMt1SbIfPwSG1ummK29A1Hs3FD0Y=.59ea7dd7-c6fd-486c-a996-f839c9a15718@github.com> References: <2kB23xVQDRb7YT6aMt1SbIfPwSG1ummK29A1Hs3FD0Y=.59ea7dd7-c6fd-486c-a996-f839c9a15718@github.com> Message-ID: On Tue, 3 Jun 2025 19:48:37 GMT, Cesar Soares Lucas wrote: > `Conv2BNode::ideal` segfaults in release builds when the type of `in(1)` is not INT or PTR. Creating a small test case to reproduce the issue is being a bit challenging so this PR only address the issue by bailing out of the method if the input type is unsupported. This other ticket https://bugs.openjdk.org/browse/JDK-8357885 will address creating a regression test for the problem. > > Tested with JTREG tier1-3 and Renaissance on Linux x64. Looks good to me. ------------- Marked as reviewed by shade (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25627#pullrequestreview-2893932492 From shade at openjdk.org Tue Jun 3 20:21:24 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 3 Jun 2025 20:21:24 GMT Subject: RFR: 8358534: Bailout in Conv2B::Ideal when type of cmp input is not supported In-Reply-To: <2kB23xVQDRb7YT6aMt1SbIfPwSG1ummK29A1Hs3FD0Y=.59ea7dd7-c6fd-486c-a996-f839c9a15718@github.com> References: <2kB23xVQDRb7YT6aMt1SbIfPwSG1ummK29A1Hs3FD0Y=.59ea7dd7-c6fd-486c-a996-f839c9a15718@github.com> Message-ID: On Tue, 3 Jun 2025 19:48:37 GMT, Cesar Soares Lucas wrote: > `Conv2BNode::ideal` segfaults in release builds when the type of `in(1)` is not INT or PTR. Creating a small test case to reproduce the issue is being a bit challenging so this PR only address the issue by bailing out of the method if the input type is unsupported. This other ticket https://bugs.openjdk.org/browse/JDK-8357885 will address creating a regression test for the problem. > > Tested with JTREG tier1-3 and Renaissance on Linux x64. Wait, hold on. The rule is to wait for 24 hours for Hotspot changes and have at least 2 Reviewers. Unless the change is trivial. This one is simple, but not trivial. So stand by if anyone would ask to back it out. @vnkozlov, @TobiHartmann -- FYI, there was a process snag, keep an eye on testing. 
^^^ ------------- PR Comment: https://git.openjdk.org/jdk/pull/25627#issuecomment-2937056595 PR Comment: https://git.openjdk.org/jdk/pull/25627#issuecomment-2937060106 From cslucas at openjdk.org Tue Jun 3 20:21:24 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Tue, 3 Jun 2025 20:21:24 GMT Subject: Integrated: 8358534: Bailout in Conv2B::Ideal when type of cmp input is not supported In-Reply-To: <2kB23xVQDRb7YT6aMt1SbIfPwSG1ummK29A1Hs3FD0Y=.59ea7dd7-c6fd-486c-a996-f839c9a15718@github.com> References: <2kB23xVQDRb7YT6aMt1SbIfPwSG1ummK29A1Hs3FD0Y=.59ea7dd7-c6fd-486c-a996-f839c9a15718@github.com> Message-ID: On Tue, 3 Jun 2025 19:48:37 GMT, Cesar Soares Lucas wrote: > `Conv2BNode::ideal` segfaults in release builds when the type of `in(1)` is not INT or PTR. Creating a small test case to reproduce the issue is being a bit challenging so this PR only address the issue by bailing out of the method if the input type is unsupported. This other ticket https://bugs.openjdk.org/browse/JDK-8357885 will address creating a regression test for the problem. > > Tested with JTREG tier1-3 and Renaissance on Linux x64. This pull request has now been integrated. 
Changeset: 704b5990 Author: Cesar Soares Lucas URL: https://git.openjdk.org/jdk/commit/704b5990a750719ca927e156553db7982637e590 Stats: 5 lines in 1 file changed: 5 ins; 0 del; 0 mod 8358534: Bailout in Conv2B::Ideal when type of cmp input is not supported Reviewed-by: shade ------------- PR: https://git.openjdk.org/jdk/pull/25627 From mdoerr at openjdk.org Tue Jun 3 20:53:21 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 3 Jun 2025 20:53:21 GMT Subject: RFR: 8354636: [PPC64] Clean up comments regarding frame manager In-Reply-To: <28IlBh9k0o4RZMbIstYTCl8c0rfIIqVqyPXeXFyx1Ik=.1d4919d2-2437-4c81-8d30-75128b0a0afb@github.com> References: <28IlBh9k0o4RZMbIstYTCl8c0rfIIqVqyPXeXFyx1Ik=.1d4919d2-2437-4c81-8d30-75128b0a0afb@github.com> Message-ID: On Tue, 3 Jun 2025 14:29:49 GMT, Martin Doerr wrote: > Trivial comment cleanup: Replace "frame manager" by "template interpreter". Thanks for the review! GHA for Windows needs update. (Known issue.) ------------- PR Comment: https://git.openjdk.org/jdk/pull/25616#issuecomment-2937154838 From epeter at openjdk.org Tue Jun 3 21:10:38 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 3 Jun 2025 21:10:38 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v61] In-Reply-To: References: Message-ID: <2d4uleP_nUgeF02l9KHzJMsoBNLfp0IrXyoZnTm4CXY=.274f4ead-4028-458e-ade8-148a79d2f8c8@github.com> On Mon, 2 Jun 2025 12:14:48 GMT, Christian Hagedorn wrote: >> Thanks for all the updates and discussions! I've worked my way through the documentation in `Template` and the examples again in some more detail. It's much better and the new explanations are well done, excellent work! >> >> I left some comments here and there but mostly minor things. I will have another look at the implementation - probably only finished by Monday. The design now looks great. I'm glad we could find a good solution now after some more iterations :-) > >> @chhagedorn Alright, I now have a decent solution for `$$var` and `$1var` etc. 
I also added tests for it. >> >> These are issues where we could continue the conversation, unless you are satisfied with my answers: [#24217 (comment)](https://github.com/openjdk/jdk/pull/24217#discussion_r2115388737) [#24217 (comment)](https://github.com/openjdk/jdk/pull/24217#discussion_r2115406391) >> >> This is now ready for another review pass > > Awesome, thanks for spending some more time with these nasty edge-cases and finding a solution! I had a look at your updates for all my comments, they look good, thanks! > > I'm going to make a pass over the implementation classes now and will have a look at the `Renderer` updates as well :-) @chhagedorn Thank you very much! This was surely my biggest patch, and most deeply reviewed. An intense but rewarding experience. And I really learned so much from so many of the contributors here :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24217#issuecomment-2937209219 From duke at openjdk.org Tue Jun 3 21:52:22 2025 From: duke at openjdk.org (duke) Date: Tue, 3 Jun 2025 21:52:22 GMT Subject: Withdrawn: 8344116: C2: remove slice parameter from LoadNode::make In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 15:18:25 GMT, Zihao Lin wrote: > This patch removes the slice parameter from LoadNode::make > > Mentioned in https://github.com/openjdk/jdk/pull/21834#pullrequestreview-2429164805 > > Hi team, I am new, I'd appreciate any guidance. Thanks a lot! This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/24258 From cslucas at openjdk.org Tue Jun 3 23:42:21 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Tue, 3 Jun 2025 23:42:21 GMT Subject: Integrated: 8357600: Patch nmethod flushing message to include more details In-Reply-To: References: Message-ID: On Thu, 22 May 2025 22:40:51 GMT, Cesar Soares Lucas wrote: > Please review this patch for adding more details to nmethod flushing message.
These details are particularly important when investigating interaction of JVMCI compiled code and code cache flushing heuristics. > > Tested on Linux x64 with JTREG tier1-3 using fastdebug and release builds. This pull request has now been integrated. Changeset: 23450651 Author: Cesar Soares Lucas URL: https://git.openjdk.org/jdk/commit/2345065166c56a958365a6362af356e7c95fcaff Stats: 13 lines in 1 file changed: 9 ins; 0 del; 4 mod 8357600: Patch nmethod flushing message to include more details Reviewed-by: shade, kvn ------------- PR: https://git.openjdk.org/jdk/pull/25402 From kvn at openjdk.org Wed Jun 4 00:57:20 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 4 Jun 2025 00:57:20 GMT Subject: RFR: 8356000: C1/C2-only modes use 2 compiler threads on low CPU count machines [v3] In-Reply-To: References: Message-ID: On Wed, 28 May 2025 18:05:12 GMT, Aleksey Shipilev wrote: >> There is an unfortunate limitation with default tiered policy that we would have at least 2 threads on 1 CPU machine: 1 thread for C1, and 1 thread for C2. >> >> But if we select C1-only or C2-only modes, we _also_ get 2 compiler threads, for which we have no good reason. These threads would just step on each other toes. The fix changes the behavior for 1..3 CPU hosts in C1/C2-only configurations, by using 1 thread instead of 2 threads. The change for 1 CPU config is what we really need. The change in 2..3 CPU configs is an additional effect, but I think it is still good not to use 100%/66% of the CPUs in those configurations as well. 
>> >> >> $ for I in `seq 1 8`; do build/linux-x86_64-server-release/images/jdk/bin/java \ >> -XX:-TieredCompilation -XX:ActiveProcessorCount=${I} \ >> -XX:+PrintFlagsFinal 2>&1 | grep "CICompilerCount "; done >> >> # Before >> intx CICompilerCount = 2 >> intx CICompilerCount = 2 >> intx CICompilerCount = 2 >> intx CICompilerCount = 3 >> intx CICompilerCount = 3 >> intx CICompilerCount = 3 >> intx CICompilerCount = 3 >> intx CICompilerCount = 4 >> >> # After >> intx CICompilerCount = 1 >> intx CICompilerCount = 1 >> intx CICompilerCount = 1 >> intx CICompilerCount = 3 >> intx CICompilerCount = 3 >> intx CICompilerCount = 3 >> intx CICompilerCount = 3 >> intx CICompilerCount = 4 >> >> >> It is a minor bug in `CompilationPolicy::initialize`, but it gets in the way studying Leyden in tight CPU scenarios. >> >> Additional testing: >> - [x] New regression test passes with the fix, fails without it >> - [x] GHA >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - Merge branch 'master' into JDK-8356000-c1-c2-compiler-count > - Better test, patch amendments > - Merge branch 'master' into JDK-8356000-c1-c2-compiler-count > - Merge branch 'master' into JDK-8356000-c1-c2-compiler-count > - Unnecessary arch limitation > - Simplify test > - Adjust test bound > - Fix Testing mostly passed. Only macOS-aarch64 left (linux-aarch64 passed). I think it is fine to integrate without waiting. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24972#pullrequestreview-2894516007 From fyang at openjdk.org Wed Jun 4 01:40:22 2025 From: fyang at openjdk.org (Fei Yang) Date: Wed, 4 Jun 2025 01:40:22 GMT Subject: RFR: 8358105: RISC-V: Optimize interpreter profile updates In-Reply-To: References: Message-ID: On Thu, 29 May 2025 11:05:04 GMT, Anjian Wen wrote: > The reason of this patch is same as the x86 and aarch64 but for riscv > [JDK-8356946](https://bugs.openjdk.org/browse/JDK-8356946) > [JDK-8357223](https://bugs.openjdk.org/browse/JDK-8357223) > >> First, we carry the implementation for counter decrements without using them. This is dead code, and can be purged. Second, we care about overflows for 64-bit for some reason. I think this is a reminiscent of 32-bit x86 support, where we can plausibly have 32-bit counter overflow in a reasonable timeframe. But for 64-bit counter, we need tens of years of constantly bashing the counter to get it to overflow. No other profile counter update code, e.g. in C1, cares about this. Thanks. My local tier1-2 test on linux-riscv64 is clean. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25520#pullrequestreview-2894576442 From wenanjian at openjdk.org Wed Jun 4 01:56:22 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Wed, 4 Jun 2025 01:56:22 GMT Subject: RFR: 8358105: RISC-V: Optimize interpreter profile updates In-Reply-To: References: Message-ID: On Tue, 3 Jun 2025 09:27:39 GMT, Feilong Jiang wrote: >> The reason of this patch is same as the x86 and aarch64 but for riscv >> [JDK-8356946](https://bugs.openjdk.org/browse/JDK-8356946) >> [JDK-8357223](https://bugs.openjdk.org/browse/JDK-8357223) >> >>> First, we carry the implementation for counter decrements without using them. This is dead code, and can be purged. Second, we care about overflows for 64-bit for some reason. 
I think this is a reminiscent of 32-bit x86 support, where we can plausibly have 32-bit counter overflow in a reasonable timeframe. But for 64-bit counter, we need tens of years of constantly bashing the counter to get it to overflow. No other profile counter update code, e.g. in C1, cares about this. > > Looks good! @feilongjiang @RealFYang Thanks for your review and comments! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25520#issuecomment-2938043407 From duke at openjdk.org Wed Jun 4 01:56:22 2025 From: duke at openjdk.org (duke) Date: Wed, 4 Jun 2025 01:56:22 GMT Subject: RFR: 8358105: RISC-V: Optimize interpreter profile updates In-Reply-To: References: Message-ID: On Thu, 29 May 2025 11:05:04 GMT, Anjian Wen wrote: > The reason of this patch is same as the x86 and aarch64 but for riscv > [JDK-8356946](https://bugs.openjdk.org/browse/JDK-8356946) > [JDK-8357223](https://bugs.openjdk.org/browse/JDK-8357223) > >> First, we carry the implementation for counter decrements without using them. This is dead code, and can be purged. Second, we care about overflows for 64-bit for some reason. I think this is a reminiscent of 32-bit x86 support, where we can plausibly have 32-bit counter overflow in a reasonable timeframe. But for 64-bit counter, we need tens of years of constantly bashing the counter to get it to overflow. No other profile counter update code, e.g. in C1, cares about this. @Anjian-Wen Your change (at version 3dcba0d22bb3747b6ab3590c42a0b07e80a3555b) is now ready to be sponsored by a Committer.
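For readers who want to sanity-check the overflow argument quoted in this thread, here is a rough back-of-the-envelope sketch. The update rate of one billion increments per second is an assumption picked purely for illustration (it is not a measurement from the PR), and the class and method names are hypothetical.

```java
final class CounterOverflowEstimate {
    // Assumed, illustrative update rate; not a figure taken from the PR.
    static final double INCREMENTS_PER_SECOND = 1e9;
    static final double SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60;

    // Seconds until a counter of the given width wraps when bumped at the assumed rate.
    static double secondsToOverflow(int counterBits) {
        return Math.pow(2, counterBits) / INCREMENTS_PER_SECOND;
    }

    static double yearsToOverflow(int counterBits) {
        return secondsToOverflow(counterBits) / SECONDS_PER_YEAR;
    }

    public static void main(String[] args) {
        System.out.printf("32-bit counter: %.2f seconds%n", secondsToOverflow(32));
        System.out.printf("64-bit counter: %.1f years%n", yearsToOverflow(64));
    }
}
```

Under that assumed rate a 32-bit counter wraps within seconds, while a 64-bit counter needs centuries, which is in line with the reasoning given for dropping the overflow handling.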
------------- PR Comment: https://git.openjdk.org/jdk/pull/25520#issuecomment-2938044093 From wenanjian at openjdk.org Wed Jun 4 02:06:21 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Wed, 4 Jun 2025 02:06:21 GMT Subject: Integrated: 8358105: RISC-V: Optimize interpreter profile updates In-Reply-To: References: Message-ID: On Thu, 29 May 2025 11:05:04 GMT, Anjian Wen wrote: > The reason of this patch is same as the x86 and aarch64 but for riscv > [JDK-8356946](https://bugs.openjdk.org/browse/JDK-8356946) > [JDK-8357223](https://bugs.openjdk.org/browse/JDK-8357223) > >> First, we carry the implementation for counter decrements without using them. This is dead code, and can be purged. Second, we care about overflows for 64-bit for some reason. I think this is a reminiscent of 32-bit x86 support, where we can plausibly have 32-bit counter overflow in a reasonable timeframe. But for 64-bit counter, we need tens of years of constantly bashing the counter to get it to overflow. No other profile counter update code, e.g. in C1, cares about this. This pull request has now been integrated. Changeset: 939521b8 Author: Anjian Wen Committer: Feilong Jiang URL: https://git.openjdk.org/jdk/commit/939521b8e4120357108220d177228b683af3334f Stats: 33 lines in 2 files changed: 0 ins; 21 del; 12 mod 8358105: RISC-V: Optimize interpreter profile updates Reviewed-by: fjiang, fyang ------------- PR: https://git.openjdk.org/jdk/pull/25520 From vlivanov at openjdk.org Wed Jun 4 03:25:22 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 4 Jun 2025 03:25:22 GMT Subject: RFR: 8357434: x86: Simplify Interpreter::profile_taken_branch [v3] In-Reply-To: References: Message-ID: On Tue, 3 Jun 2025 09:38:09 GMT, Aleksey Shipilev wrote: >> Noticed that `Interpreter::profile_taken_branch` has the same `sbbptr` pattern we have eliminated with [JDK-8356946](https://bugs.openjdk.org/browse/JDK-8356946). 
The same logic applies here: the counter is 64-bit, never practically overflows, and no other code cares about it. >> >> Also tidied up some comments around it. >> >> Additional testing; >> - [x] Linux x86_64 server fastdebug, `tier1 tier2` >> - [x] Linux x86_64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'master' into JDK-8357434-x86-profile-taken > - Stale comment > - Merge branch 'master' into JDK-8357434-x86-profile-taken > - Fix Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25343#pullrequestreview-2894916789 From kvn at openjdk.org Wed Jun 4 04:08:24 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 4 Jun 2025 04:08:24 GMT Subject: RFR: 8356000: C1/C2-only modes use 2 compiler threads on low CPU count machines [v3] In-Reply-To: References: Message-ID: <7wXlPpo5ZSbeh3RmNIKUQaum4UIAg-pINoVUixEFRvw=.53242223-85cb-426d-bd45-9b846ce472aa@github.com> On Wed, 28 May 2025 18:05:12 GMT, Aleksey Shipilev wrote: >> There is an unfortunate limitation with default tiered policy that we would have at least 2 threads on 1 CPU machine: 1 thread for C1, and 1 thread for C2. >> >> But if we select C1-only or C2-only modes, we _also_ get 2 compiler threads, for which we have no good reason. These threads would just step on each other toes. The fix changes the behavior for 1..3 CPU hosts in C1/C2-only configurations, by using 1 thread instead of 2 threads. The change for 1 CPU config is what we really need. The change in 2..3 CPU configs is an additional effect, but I think it is still good not to use 100%/66% of the CPUs in those configurations as well. 
>> >> >> $ for I in `seq 1 8`; do build/linux-x86_64-server-release/images/jdk/bin/java \ >> -XX:-TieredCompilation -XX:ActiveProcessorCount=${I} \ >> -XX:+PrintFlagsFinal 2>&1 | grep "CICompilerCount "; done >> >> # Before >> intx CICompilerCount = 2 >> intx CICompilerCount = 2 >> intx CICompilerCount = 2 >> intx CICompilerCount = 3 >> intx CICompilerCount = 3 >> intx CICompilerCount = 3 >> intx CICompilerCount = 3 >> intx CICompilerCount = 4 >> >> # After >> intx CICompilerCount = 1 >> intx CICompilerCount = 1 >> intx CICompilerCount = 1 >> intx CICompilerCount = 3 >> intx CICompilerCount = 3 >> intx CICompilerCount = 3 >> intx CICompilerCount = 3 >> intx CICompilerCount = 4 >> >> >> It is a minor bug in `CompilationPolicy::initialize`, but it gets in the way studying Leyden in tight CPU scenarios. >> >> Additional testing: >> - [x] New regression test passes with the fix, fails without it >> - [x] GHA >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - Merge branch 'master' into JDK-8356000-c1-c2-compiler-count > - Better test, patch amendments > - Merge branch 'master' into JDK-8356000-c1-c2-compiler-count > - Merge branch 'master' into JDK-8356000-c1-c2-compiler-count > - Unnecessary arch limitation > - Simplify test > - Adjust test bound > - Fix All my testing passed. 
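The before/after tables quoted above can be reproduced with a small sketch. Note the assumptions: the formula below is my recollection of the CPU-scaled heuristic (log2 of the CPU count times log2 of that, times 3/2), it is not quoted anywhere in this thread, and the `minThreads` parameter is only a hypothetical way to model the described change (a floor of 2 before, a floor of 1 in C1/C2-only modes after). It is not the HotSpot code.

```java
final class CompilerCountSketch {
    // floor(log2(x)) for x >= 1
    static int log2i(int x) {
        return 31 - Integer.numberOfLeadingZeros(x);
    }

    // Assumed heuristic: scale with the CPU count, but never go below minThreads.
    static int compilerCount(int cpus, int minThreads) {
        int logCpu = log2i(cpus);
        int logLogCpu = log2i(Math.max(logCpu, 1));
        return Math.max(logCpu * logLogCpu * 3 / 2, minThreads);
    }

    // Counts for 1..8 active processors, mirroring the shell loop in the PR description.
    static int[] counts(int minThreads) {
        int[] result = new int[8];
        for (int cpus = 1; cpus <= 8; cpus++) {
            result[cpus - 1] = compilerCount(cpus, minThreads);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("floor 2: " + java.util.Arrays.toString(counts(2)));
        System.out.println("floor 1: " + java.util.Arrays.toString(counts(1)));
    }
}
```

With a floor of 2 this yields the "Before" column, and with a floor of 1 the "After" column; only the 1..3 CPU rows differ, which matches the scope of the change as described.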
------------- PR Comment: https://git.openjdk.org/jdk/pull/24972#issuecomment-2938390939 From jkarthikeyan at openjdk.org Wed Jun 4 04:28:38 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Wed, 4 Jun 2025 04:28:38 GMT Subject: RFR: 8350177: C2 SuperWord: Integer.numberOfLeadingZeros, numberOfTrailingZeros, reverse and bitCount have input types wrongly turncated for byte and short [v2] In-Reply-To: References: Message-ID: > Hi all, > This patch fixes cases in SuperWord when compiling subword types where vectorized code would be given a narrower type than expected, leading to miscompilation due to truncation. This fix is a generalization of the same fix applied for `Integer.reverseBytes` in [JDK-8305324](https://bugs.openjdk.org/browse/JDK-8305324). The patch introduces a check for nodes that are known to tolerate truncation, so that any future cases of subword truncation will avoid creating miscompiled code. > > The patch reuses the existing logic to set the type of the vectors to int, which currently disables vectorization for the affected patterns entirely. Once [JDK-8342095](https://bugs.openjdk.org/browse/JDK-8342095) is merged and automatic casting support is added the autovectorizer should automatically insert casts to and from int, maintaining correctness. > > I've added an IR test that checks for correctly compiled outputs. Thoughts and reviews would be appreciated! 
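To illustrate why truncating the input type matters for these intrinsics, here is a minimal scalar sketch. The `lane16*` methods are a hypothetical model of what a 16-bit vector lane would compute; they are not SuperWord code, and the class and method names are made up for this example.

```java
final class SubwordTruncationDemo {
    // Scalar Java semantics: the short is sign-extended to int before the call.
    static short scalarNlz(short s) {
        return (short) Integer.numberOfLeadingZeros(s);
    }

    // Hypothetical 16-bit lane: only the low 16 bits of the input are visible.
    static short lane16Nlz(short s) {
        int bits = s & 0xFFFF;
        return (short) (bits == 0 ? 16 : Integer.numberOfLeadingZeros(bits) - 16);
    }

    // Addition tolerates truncation: the low 16 bits of the sum depend only
    // on the low 16 bits of the inputs.
    static short scalarAdd(short a, short b) {
        return (short) (a + b);
    }

    static short lane16Add(short a, short b) {
        return (short) ((a & 0xFFFF) + (b & 0xFFFF));
    }

    public static void main(String[] args) {
        short one = 1;
        System.out.println("scalar nlz(1) = " + scalarNlz(one));
        System.out.println("lane16 nlz(1) = " + lane16Nlz(one));
        System.out.println("scalar add    = " + scalarAdd((short) 30000, (short) 30000));
        System.out.println("lane16 add    = " + lane16Add((short) 30000, (short) 30000));
    }
}
```

The leading-zero count differs between the two models for non-negative inputs (31 versus 15 for the value 1), whereas the addition agrees in both. That is the distinction the truncation whitelist in the patch is meant to capture.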
Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: Reformat, add comments and char tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25440/files - new: https://git.openjdk.org/jdk/pull/25440/files/8d1a8174..da692994 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25440&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25440&range=00-01 Stats: 144 lines in 2 files changed: 136 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/25440.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25440/head:pull/25440 PR: https://git.openjdk.org/jdk/pull/25440 From jkarthikeyan at openjdk.org Wed Jun 4 04:28:39 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Wed, 4 Jun 2025 04:28:39 GMT Subject: RFR: 8350177: C2 SuperWord: Integer.numberOfLeadingZeros, numberOfTrailingZeros, reverse and bitCount have input types wrongly turncated for byte and short [v2] In-Reply-To: References: Message-ID: On Mon, 26 May 2025 07:56:17 GMT, Quan Anh Mai wrote: >> Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: >> >> Reformat, add comments and char tests > > src/hotspot/share/opto/superword.cpp line 2496: > >> 2494: int opc = in->Opcode(); >> 2495: return opc == Op_AddI || opc == Op_SubI || opc == Op_MulI || opc == Op_AndI || opc == Op_OrI || opc == Op_XorI >> 2496: || opc == Op_ReverseBytesS || opc == Op_ReverseBytesUS; > > Are you sure? I don't think you can truncate a `ReverseByteS` to a `byte`. This is a good observation, thank you! I've fixed it so that it checks for short/char in this case. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25440#discussion_r2125605047 From jkarthikeyan at openjdk.org Wed Jun 4 04:28:39 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Wed, 4 Jun 2025 04:28:39 GMT Subject: RFR: 8350177: C2 SuperWord: Integer.numberOfLeadingZeros, numberOfTrailingZeros, reverse and bitCount have input types wrongly turncated for byte and short [v2] In-Reply-To: References: Message-ID: On Wed, 28 May 2025 07:33:36 GMT, Emanuel Peter wrote: >> Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: >> >> Reformat, add comments and char tests > > src/hotspot/share/opto/superword.cpp line 2553: > >> 2551: const Type* vt = vtn; >> 2552: int op = in->Opcode(); >> 2553: if (!can_subword_truncate(in)) { > > It seems `can_subword_truncate` does not cover `VectorNode::is_shift_opcode`, is that correct? Maybe we are missing IR tests that catch this, scary! In this case since the condition is negated, the old condition should still be true since it falls through to the `return false` at the end of the function. Previously, it checked for a small list of nodes as requiring truncation handling which allows nodes that were not whitelisted to slip through and produce incorrect code. Now, we check for a larger group of nodes that we know do not need handling for truncation, so that any nodes not whitelisted will still produce valid code, but will not vectorize (until #23413). > test/hotspot/jtreg/compiler/vectorization/TestSubwordTruncation.java line 64: > >> 62: >> 63: // Shorts >> 64: > > Suggestion: > > > Nit: you don't have a similar comment for other types, so just drop it here too ;) Further in the file I used `// Bytes` to separate the byte methods, as well as the new char methods I added in the new commit. 
I think it helps navigating the file at a glance, at least in my emacs editor :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25440#discussion_r2125606868 PR Review Comment: https://git.openjdk.org/jdk/pull/25440#discussion_r2125607475 From jkarthikeyan at openjdk.org Wed Jun 4 04:34:15 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Wed, 4 Jun 2025 04:34:15 GMT Subject: RFR: 8350177: C2 SuperWord: Integer.numberOfLeadingZeros, numberOfTrailingZeros, reverse and bitCount have input types wrongly turncated for byte and short In-Reply-To: References: Message-ID: On Wed, 28 May 2025 07:46:12 GMT, Emanuel Peter wrote: >> Hi all, >> This patch fixes cases in SuperWord when compiling subword types where vectorized code would be given a narrower type than expected, leading to miscompilation due to truncation. This fix is a generalization of the same fix applied for `Integer.reverseBytes` in [JDK-8305324](https://bugs.openjdk.org/browse/JDK-8305324). The patch introduces a check for nodes that are known to tolerate truncation, so that any future cases of subword truncation will avoid creating miscompiled code. >> >> The patch reuses the existing logic to set the type of the vectors to int, which currently disables vectorization for the affected patterns entirely. Once [JDK-8342095](https://bugs.openjdk.org/browse/JDK-8342095) is merged and automatic casting support is added the autovectorizer should automatically insert casts to and from int, maintaining correctness. >> >> I've added an IR test that checks for correctly compiled outputs. Thoughts and reviews would be appreciated! > > And just for good measure: should we also add tests for `char`? Thanks a lot for your review @eme64! I've pushed a commit that should address the reviews, and fix the GHA failures. I've added char tests as well. Regarding long operations, I don't think it's possible for this code path to encounter them. 
Earlier in the function, this logic is guarded with `vtn->basic_type() == T_INT` so I think only integer nodes need to be added to the list. Let me know what you think! @chhagedorn Thanks for the reminder! It might be good to run some testing so that we can get it tested and reviewed before the RDP1 cutoff. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25440#issuecomment-2938450410 From jkarthikeyan at openjdk.org Wed Jun 4 04:34:16 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Wed, 4 Jun 2025 04:34:16 GMT Subject: RFR: 8350177: C2 SuperWord: Integer.numberOfLeadingZeros, numberOfTrailingZeros, reverse and bitCount have input types wrongly turncated for byte and short [v2] In-Reply-To: <9d8EW1k2YAwyyeLvIG5Fnqpjx-3PdrnBq_bildM8jsE=.a3fd55f7-5586-4302-8a14-c2d251cf6fe4@github.com> References: <9d8EW1k2YAwyyeLvIG5Fnqpjx-3PdrnBq_bildM8jsE=.a3fd55f7-5586-4302-8a14-c2d251cf6fe4@github.com> Message-ID: On Wed, 28 May 2025 07:41:09 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/superword.cpp line 2496: >> >>> 2494: int opc = in->Opcode(); >>> 2495: return opc == Op_AddI || opc == Op_SubI || opc == Op_MulI || opc == Op_AndI || opc == Op_OrI || opc == Op_XorI >>> 2496: || opc == Op_ReverseBytesS || opc == Op_ReverseBytesUS; >> >> A switch might look nicer here, and be easier to extend later on ;) > > This list is a little scary... how do we know that we have all cases in it, and we are not getting regressions because we are missing some? A switch is a good idea, it'll definitely make the code easier to read. I've made the change in the recent commit. As for the cases, I ended up running the compiler unit tests and modifying the list until there were no test failures. Since we check for nodes that do not need truncation handling, any other nodes will automatically default to require truncation handling and fall back to not vectorizing. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25440#discussion_r2125610158 From bulasevich at openjdk.org Wed Jun 4 04:40:08 2025 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Wed, 4 Jun 2025 04:40:08 GMT Subject: RFR: 8358183: [JVMCI] crash accessing nmethod::jvmci_name in CodeCache::aggregate Message-ID: Zero out _mutable_data_size, _relocation_size and _metadata_size in purge() so that after purge jvmci_data_size() returns 0 and print_heapinfo() won't touch an invalid _metadata. ------------- Commit messages: - 8358183: [JVMCI] crash accessing nmethod::jvmci_name in CodeCache::aggregate Changes: https://git.openjdk.org/jdk/pull/25608/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25608&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8358183 Stats: 3 lines in 2 files changed: 3 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25608.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25608/head:pull/25608 PR: https://git.openjdk.org/jdk/pull/25608 From epeter at openjdk.org Wed Jun 4 05:46:22 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 4 Jun 2025 05:46:22 GMT Subject: RFR: 8350177: C2 SuperWord: Integer.numberOfLeadingZeros, numberOfTrailingZeros, reverse and bitCount have input types wrongly turncated for byte and short [v2] In-Reply-To: References: Message-ID: On Wed, 4 Jun 2025 04:28:38 GMT, Jasmine Karthikeyan wrote: >> Hi all, >> This patch fixes cases in SuperWord when compiling subword types where vectorized code would be given a narrower type than expected, leading to miscompilation due to truncation. This fix is a generalization of the same fix applied for `Integer.reverseBytes` in [JDK-8305324](https://bugs.openjdk.org/browse/JDK-8305324). The patch introduces a check for nodes that are known to tolerate truncation, so that any future cases of subword truncation will avoid creating miscompiled code.
>> >> The patch reuses the existing logic to set the type of the vectors to int, which currently disables vectorization for the affected patterns entirely. Once [JDK-8342095](https://bugs.openjdk.org/browse/JDK-8342095) is merged and automatic casting support is added the autovectorizer should automatically insert casts to and from int, maintaining correctness. >> >> I've added an IR test that checks for correctly compiled outputs. Thoughts and reviews would be appreciated! > > Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: > > Reformat, add comments and char tests @jaskarth Thanks for the updates! I'll run some testing now :) src/hotspot/share/opto/superword.cpp line 2519: > 2517: > 2518: // Default to disallowing vector truncation > 2519: return false; I was wondering: We could have an assert here that lists all operations that cannot be truncated? So if a new operation is added, then we will catch that it is not handled here yet, and we can add tests, and either allow it to truncate, or add it to the list of non-truncatable operations. src/hotspot/share/opto/superword.cpp line 2579: > 2577: Node* load = in->in(1); > 2578: // For certain operations such as shifts and abs(), use the size of the load if it exists > 2579: if ((VectorNode::is_shift_opcode(op) || op == Op_AbsI) && load->is_Load() && Can you say a little more about this? What about `Op_ReverseBytesI`, did that not previously also get through here? 
------------- PR Review: https://git.openjdk.org/jdk/pull/25440#pullrequestreview-2895284484 PR Review Comment: https://git.openjdk.org/jdk/pull/25440#discussion_r2125689822 PR Review Comment: https://git.openjdk.org/jdk/pull/25440#discussion_r2125694632 From epeter at openjdk.org Wed Jun 4 05:46:23 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 4 Jun 2025 05:46:23 GMT Subject: RFR: 8350177: C2 SuperWord: Integer.numberOfLeadingZeros, numberOfTrailingZeros, reverse and bitCount have input types wrongly turncated for byte and short [v2] In-Reply-To: References: Message-ID: On Wed, 4 Jun 2025 05:37:09 GMT, Emanuel Peter wrote: >> Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: >> >> Reformat, add comments and char tests > > src/hotspot/share/opto/superword.cpp line 2519: > >> 2517: >> 2518: // Default to disallowing vector truncation >> 2519: return false; > > I was wondering: > We could have an assert here that lists all operations that cannot be truncated? > So if a new operation is added, then we will catch that it is not handled here yet, and we can add tests, and either allow it to truncate, or add it to the list of non-truncatable operations. > Earlier in the function, this logic is guarded with vtn->basic_type() == T_INT so I think only integer nodes need to be added to the list. Let me know what you think! That sounds reasonable. And that would mean we only have to add `int` operation to that assert. And if anybody ever relaxes that `vtn->basic_type() == T_INT` check, then they would immediately run into that assert. Would be nice. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25440#discussion_r2125696763 From shade at openjdk.org Wed Jun 4 06:05:26 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 4 Jun 2025 06:05:26 GMT Subject: RFR: 8357434: x86: Simplify Interpreter::profile_taken_branch [v3] In-Reply-To: References: Message-ID: On Tue, 3 Jun 2025 09:38:09 GMT, Aleksey Shipilev wrote: >> Noticed that `Interpreter::profile_taken_branch` has the same `sbbptr` pattern we have eliminated with [JDK-8356946](https://bugs.openjdk.org/browse/JDK-8356946). The same logic applies here: the counter is 64-bit, never practically overflows, and no other code cares about it. >> >> Also tidied up some comments around it. >> >> Additional testing; >> - [x] Linux x86_64 server fastdebug, `tier1 tier2` >> - [x] Linux x86_64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'master' into JDK-8357434-x86-profile-taken > - Stale comment > - Merge branch 'master' into JDK-8357434-x86-profile-taken > - Fix Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25343#issuecomment-2938682685 From shade at openjdk.org Wed Jun 4 06:05:27 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 4 Jun 2025 06:05:27 GMT Subject: Integrated: 8357434: x86: Simplify Interpreter::profile_taken_branch In-Reply-To: References: Message-ID: On Wed, 21 May 2025 08:23:14 GMT, Aleksey Shipilev wrote: > Noticed that `Interpreter::profile_taken_branch` has the same `sbbptr` pattern we have eliminated with [JDK-8356946](https://bugs.openjdk.org/browse/JDK-8356946). The same logic applies here: the counter is 64-bit, never practically overflows, and no other code cares about it. > > Also tidied up some comments around it. 
> > Additional testing; > - [x] Linux x86_64 server fastdebug, `tier1 tier2` > - [x] Linux x86_64 server fastdebug, `all` This pull request has now been integrated. Changeset: b918dc84 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/b918dc84ec8364321a5a6d9f6835edcb1d9ad62f Stats: 17 lines in 3 files changed: 0 ins; 12 del; 5 mod 8357434: x86: Simplify Interpreter::profile_taken_branch Reviewed-by: kvn, vlivanov ------------- PR: https://git.openjdk.org/jdk/pull/25343 From rehn at openjdk.org Wed Jun 4 06:07:26 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 4 Jun 2025 06:07:26 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v8] In-Reply-To: References: <5e1o1xtN0ZdQZGJi2aVmgCEApW625koeE9F53VhDi5E=.2390045d-844e-4800-8d4b-075a2a3a8793@github.com> Message-ID: On Mon, 5 May 2025 18:10:02 GMT, Yuri Gaevsky wrote: >> Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision: >> >> change slli+add sequence to shadd > > As you can expect I am trying to implement the following code with RVV: > > for (; i + (N-1) < cnt; i += N) { > h = 31^^N * h > + 31^^(N-1) * val[i + 0] > + 31^^(N-2) * val[i + 1] > ... > + 31^^1 * val[i + (N-2)] > + 31^^0 * val[i + (N-1)]; > } > for (; i < cnt; i++) { > h = 31 * h + val[i]; > } > > where `N` is a number of processing array elements in "chunk". > IIUC, the main issue with your approach is "reverse" order of array elements versus preloaded `31^^X` coeffs WHEN the remaining number of elems is less than `N`, say `M=N-1`. > > h = 31^^M * h > + 31^^(M-1) * val[i + 0] > + 31^^(M-2) * val[i + 1] > ... > + 31^^1 * val[i + (M-2)] > + 32^^0 * val[i + (M-1)]; > > or returning to our `N` for clarity > > h = 31^^(N-1) * h > + 31^^(N-2) * val[i + 0] > + 31^^(N-3) * val[i + 1] > ... 
> + 31^^1 * val[i + (N-3)] > + 31^^0 * val[i + (N-2)]; > > Now we need to "slide down" preloaded multiplier coeffs in designated vector register by one (as `M=N-1`) to be in "sync" with `val[i + X]` (may be move them into temporary VR in the process), and moreover, DO this operation IFF the remaining `cnt` is less than `N` (==>an additional check on every iteration). That's probably acceptable only at tail phase as one-time operation but NOT inside of main loop... @ygaevsky @RealFYang how can we proceed? ------------- PR Comment: https://git.openjdk.org/jdk/pull/17413#issuecomment-2938689119 From epeter at openjdk.org Wed Jun 4 06:09:21 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 4 Jun 2025 06:09:21 GMT Subject: RFR: 8352635: Improve inferencing of Float16 operations with constant inputs [v5] In-Reply-To: References: <44nVQBYgzCOB2mAB9xtAPvkUcOMJOITA2VjMdDFgm1g=.48266693-48bf-41db-8871-a7dcafe93509@github.com> Message-ID: On Sun, 1 Jun 2025 17:26:07 GMT, Jatin Bhateja wrote: >> This is a follow-up PR#22755 to improve Float16 operations inferencing. >> >> The existing scheme to detect Float16 operations for some operations is based on pattern matching which expects to receive inputs through ConvHF2F IR, this patch extends matching to accept constant floating point inputs within the Float16 value range. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Extending tests and review resolutions @jatin-bhateja Thanks for the updates! I have a first batch of comments about the test :) https://github.com/openjdk/jdk/pull/24179#discussion_r2111355331 Here I asked for this: > And: your pattern matching allows the constant to be lhs or rhs, so you should add corresponding tests. You commented "Done." underneath. Where did you add these tests exactly?
Because I only see these patterns: res += Float.floatToFloat16(RANDOM1_VAR.floatValue() + RANDOM2.floatValue()); and not these res += Float.floatToFloat16(RANDOM2.floatValue() + RANDOM1_VAR.floatValue()); (except for maybe a single case where one was flipped, see question below) test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java line 63: > 61: > 62: private static Generator genF = G.uniformFloats(0.0f, 70000.0f); > 63: private static Generator genHF = G.uniformFloat16s(Float.floatToFloat16(-2000.0f), Float.floatToFloat16(2000.0f)); Is there a good reason to only take the uniform distribution? https://github.com/openjdk/jdk/blob/4a491bef6636441f14fc8bbdedf65063fce038bd/test/hotspot/jtreg/compiler/lib/generators/Generators.java#L102-L105 test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java line 335: > 333: res += Float.floatToFloat16(POSITIVE_ZERO_VAR.floatValue() - INEXACT_FP16); > 334: res += Float.floatToFloat16(INEXACT_FP16 * POSITIVE_ZERO_VAR.floatValue()); > 335: res += Float.floatToFloat16(POSITIVE_ZERO_VAR.floatValue() / INEXACT_FP16); Why is the mul case flipped here? test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java line 363: > 361: @Check(test="testSNaNFP16ConstantPatterns") > 362: public void checkSNaNFP16ConstantPatterns(short actual) throws Exception { > 363: TestFramework.deoptimize(TestFloat16ScalarOperations.class.getMethod("testSNaNFP16ConstantPatterns")); Oh wow, I have never seen this pattern used. Cool idea! Do you know what impact this has on test runtime? test/hotspot/jtreg/compiler/lib/generators/Generators.java line 622: > 620: > 621: /** > 622: * Fill the array with shorts using the distribution of nextDouble. Suggestion: * Fill the array with shorts using the distribution of the generator. There are actually a few other cases that seem to be wrong in this file. Would you mind changing them? 
------------- PR Review: https://git.openjdk.org/jdk/pull/24179#pullrequestreview-2895308949 PR Review Comment: https://git.openjdk.org/jdk/pull/24179#discussion_r2125709341 PR Review Comment: https://git.openjdk.org/jdk/pull/24179#discussion_r2125713317 PR Review Comment: https://git.openjdk.org/jdk/pull/24179#discussion_r2125718345 PR Review Comment: https://git.openjdk.org/jdk/pull/24179#discussion_r2125706470 From epeter at openjdk.org Wed Jun 4 06:09:21 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 4 Jun 2025 06:09:21 GMT Subject: RFR: 8352635: Improve inferencing of Float16 operations with constant inputs [v5] In-Reply-To: References: <44nVQBYgzCOB2mAB9xtAPvkUcOMJOITA2VjMdDFgm1g=.48266693-48bf-41db-8871-a7dcafe93509@github.com> Message-ID: On Wed, 4 Jun 2025 05:54:36 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Extending tests and review resolutions > > test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java line 63: > >> 61: >> 62: private static Generator genF = G.uniformFloats(0.0f, 70000.0f); >> 63: private static Generator genHF = G.uniformFloat16s(Float.floatToFloat16(-2000.0f), Float.floatToFloat16(2000.0f)); > > Is there a good reason to only take the uniform distribution? > > https://github.com/openjdk/jdk/blob/4a491bef6636441f14fc8bbdedf65063fce038bd/test/hotspot/jtreg/compiler/lib/generators/Generators.java#L102-L105 What about `NaN` and `infty` etc? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24179#discussion_r2125711380 From epeter at openjdk.org Wed Jun 4 06:33:23 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 4 Jun 2025 06:33:23 GMT Subject: RFR: 8352635: Improve inferencing of Float16 operations with constant inputs [v5] In-Reply-To: References: <44nVQBYgzCOB2mAB9xtAPvkUcOMJOITA2VjMdDFgm1g=.48266693-48bf-41db-8871-a7dcafe93509@github.com> Message-ID: On Sun, 1 Jun 2025 17:26:07 GMT, Jatin Bhateja wrote: >> This is a follow-up PR#22755 to improve Float16 operations inferencing. >> >> The existing scheme to detect Float16 operations for some operations is based on pattern matching which expects to receive inputs through ConvHF2F IR, this patch extends matching to accept constant floating point inputs within the Float16 value range. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Extending tests and review resolutions And some comments about the VM code :) Looks like we are making good progress here, thanks again for all the work you put in! src/hotspot/share/opto/convertnode.cpp line 281: > 279: conF = binopF->in(2); > 280: varS = binopF->in(1)->in(1); > 281: } Suggestion: if (Float16NodeFactory::is_float32_binary_oper(in(1)->Opcode())) { Node* binopF = in(1); // Check if the incoming binary operation has one floating point constant // input and the other input is a half precision to single precision upcasting node. // We land here because a prior HalfFloat to Float conversion promotes // an integral constant holding Float16 value to a floating point constant. // i.e. 
ConvHF2F ConI(short) => ConF Node* conF = nullptr; Node* varS = nullptr; if (binopF->in(1)->is_Con() && binopF->in(2)->Opcode() == Op_ConvHF2F) { conF = binopF->in(1); varS = binopF->in(2)->in(1); } else if (binopF->in(2)->is_Con() && binopF->in(1)->Opcode() == Op_ConvHF2F) { conF = binopF->in(2); varS = binopF->in(1)->in(1); } I think it is better to have the variables just before they are assigned. They are not needed in the scope outside the if at the top here anyway. src/hotspot/share/opto/convertnode.cpp line 294: > 292: // Conditions under which floating point constant can be considered for a pattern match. > 293: // 1. Constant must lie within Float16 value range, this will ensure that > 294: // we don't unintentially round off float constant to enforce a pattern match. What do you mean by `enforce a pattern match`? Are you just trying to say that we have to be careful with the pattern matching here, and we cannot just round off the float constant? Do you have an example where that rounding would lead to issues? src/hotspot/share/opto/convertnode.cpp line 302: > 300: // results into a quiet NaN but preserves the significand bits of signaling NaN. > 301: // c. Pattern being matched includes a Float to Float16 conversion after binary > 302: // expression, this downcast will still preserve significand bits of binary32 NaN. Suggestion: // 2. If a constant value is one of the valid IEEE 754 binary32 NaN bit patterns // then it's safe to consider it for pattern match because of the following reasons: // a. As per section 2.8 of JVMS, Java Virtual Machine does not support // signaling NaN value. // b. Any signaling NaN which takes part in a non-comparison expression // results in a quiet NaN but preserves the significand bits of signaling NaN. // c. The pattern being matched includes a Float to Float16 conversion after binary // expression, this downcast will still preserve the significand bits of binary32 NaN. 
src/hotspot/share/opto/convertnode.cpp line 304: > 302: // expression, this downcast will still preserve significand bits of binary32 NaN. > 303: bool isnan = ((*reinterpret_cast(&con) & 0x7F800000) == 0x7F800000) && > 304: ((*reinterpret_cast(&con) & 0x7FFFFF) != 0); Why are you hand-crafting this check here? Is there not some predefined function to do this check? ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24179#pullrequestreview-2895350075 PR Review Comment: https://git.openjdk.org/jdk/pull/24179#discussion_r2125731408 PR Review Comment: https://git.openjdk.org/jdk/pull/24179#discussion_r2125737423 PR Review Comment: https://git.openjdk.org/jdk/pull/24179#discussion_r2125741224 PR Review Comment: https://git.openjdk.org/jdk/pull/24179#discussion_r2125743503 From epeter at openjdk.org Wed Jun 4 06:33:26 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 4 Jun 2025 06:33:26 GMT Subject: RFR: 8352635: Improve inferencing of Float16 operations with constant inputs [v5] In-Reply-To: References: <44nVQBYgzCOB2mAB9xtAPvkUcOMJOITA2VjMdDFgm1g=.48266693-48bf-41db-8871-a7dcafe93509@github.com> Message-ID: On Wed, 4 Jun 2025 06:12:41 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Extending tests and review resolutions > > src/hotspot/share/opto/convertnode.cpp line 281: > >> 279: conF = binopF->in(2); >> 280: varS = binopF->in(1)->in(1); >> 281: } > > Suggestion: > > if (Float16NodeFactory::is_float32_binary_oper(in(1)->Opcode())) { > Node* binopF = in(1); > // Check if the incoming binary operation has one floating point constant > // input and the other input is a half precision to single precision upcasting node. > // We land here because a prior HalfFloat to Float conversion promotes > // an integral constant holding Float16 value to a floating point constant. > // i.e. 
ConvHF2F ConI(short) => ConF > Node* conF = nullptr; > Node* varS = nullptr; > if (binopF->in(1)->is_Con() && binopF->in(2)->Opcode() == Op_ConvHF2F) { > conF = binopF->in(1); > varS = binopF->in(2)->in(1); > } else if (binopF->in(2)->is_Con() && binopF->in(1)->Opcode() == Op_ConvHF2F) { > conF = binopF->in(2); > varS = binopF->in(1)->in(1); > } > > I think it is better to have the variables just before they are assigned. They are not needed in the scope outside the if at the top here anyway. You make it sound like this is the only way we get here: // We land here because a prior HalfFloat to Float conversion promotes // an integral constant holding Float16 value to a floating point constant. // i.e. ConvHF2F ConI(short) => ConF Could this pattern not be created directly with Java code? So maybe rephrase it to "For example, e.g."? > src/hotspot/share/opto/convertnode.cpp line 304: > >> 302: // expression, this downcast will still preserve significand bits of binary32 NaN. >> 303: bool isnan = ((*reinterpret_cast(&con) & 0x7F800000) == 0x7F800000) && >> 304: ((*reinterpret_cast(&con) & 0x7FFFFF) != 0); > > Why are you hand-crafting this check here? Is there not some predefined function to do this check? Does `g_isnan` not work here? 
If not, add a comment why :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24179#discussion_r2125733195 PR Review Comment: https://git.openjdk.org/jdk/pull/24179#discussion_r2125753763 From epeter at openjdk.org Wed Jun 4 06:38:22 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 4 Jun 2025 06:38:22 GMT Subject: RFR: 8357982: Fix several failing BMI tests with -XX:+UseAPX [v2] In-Reply-To: <8mE0O0QjyMJMK7UWtfMiFc5ZjIxFYqVNUeu0qYbzaz8=.75e13abf-a2c9-407b-898d-1174a85a06cf@github.com> References: <8mE0O0QjyMJMK7UWtfMiFc5ZjIxFYqVNUeu0qYbzaz8=.75e13abf-a2c9-407b-898d-1174a85a06cf@github.com> Message-ID: On Tue, 3 Jun 2025 17:28:07 GMT, Jatin Bhateja wrote: >> Thanks, encoding logic is concentrated in integral instruction tests and is shared with corresponding long variants, extended APX coverage for BLS/R/MSK. > >> @jatin-bhateja Thanks for looking into this! >> >> `predicate(!UseAPX && n->in(2)->bottom_type()->is_int()->get_con() != -1);` >> >> The PR title seems to suggest the bug is only about -XX:+UseAPX. Why are you changing things for the case !UseAPX? >> >> Are these not cases like a ^ -1, which basically flips all bits. What alternative does this end up using now? >> >> A code comment would be helpful. > > We are tightening the predicate check so that under no circumstances we pick this pattern during the reduction phase of instruction selection on account of having lower cost. There is a generic pattern (xorI_rReg_imm) for all integral immediate values, and then there is a special pattern for Xor with -1 (fxorI_rReg_im1), which is needed for AndN inferencing. @jatin-bhateja I'll wait with testing, until someone from Intel gives this the approval. 
Feel free to ping me for that once we are there :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/25501#issuecomment-2938775593 From epeter at openjdk.org Wed Jun 4 06:38:25 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 4 Jun 2025 06:38:25 GMT Subject: RFR: 8357982: Fix several failing BMI tests with -XX:+UseAPX [v4] In-Reply-To: References: Message-ID: On Tue, 3 Jun 2025 18:07:34 GMT, Jatin Bhateja wrote: >> A) Patch extends the following tests with hard-coded encoding checks for various BMI instructions to cover REX2 or extended EVEX encodings supported by APX. >> >> >> compiler/intrinsics/bmi/verifycode/AndnTestI.java >> compiler/intrinsics/bmi/verifycode/AndnTestL.java >> compiler/intrinsics/bmi/verifycode/BzhiTestI2L.java >> compiler/intrinsics/bmi/verifycode/LZcntTestL.java >> compiler/intrinsics/bmi/verifycode/TZcntTestL.java >> >> >> B) After integration of JDK-8349582, which added APX NDD support, AndN instruction selection patterns that expect (Xor SRC, -1) as one of its operands were not getting selected because of a lower-cost generic immediate pattern match; patch fixes this issue through strict predicate checks. >> >> Above tests are now passing, validations were carried out using Intel Software Development emulator. >> >> Kindly review and share your feedback. 
>> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolutions src/hotspot/cpu/x86/x86_64.ad line 11326: > 11324: instruct xorL_rReg_imm(rRegL dst, immL32 src, rFlagsReg cr) > 11325: %{ > 11326: predicate(!UseAPX && n->in(2)->bottom_type()->is_long()->get_con() != -1L); Could you add a similar comment here, like for all the others, please :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25501#discussion_r2125760840 From rrich at openjdk.org Wed Jun 4 06:44:25 2025 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 4 Jun 2025 06:44:25 GMT Subject: RFR: 8354636: [PPC64] Clean up comments regarding frame manager In-Reply-To: <28IlBh9k0o4RZMbIstYTCl8c0rfIIqVqyPXeXFyx1Ik=.1d4919d2-2437-4c81-8d30-75128b0a0afb@github.com> References: <28IlBh9k0o4RZMbIstYTCl8c0rfIIqVqyPXeXFyx1Ik=.1d4919d2-2437-4c81-8d30-75128b0a0afb@github.com> Message-ID: On Tue, 3 Jun 2025 14:29:49 GMT, Martin Doerr wrote: > Trivial comment cleanup: Replace "frame manager" by "template interpreter". Looks good. Thanks, Richard. ------------- Marked as reviewed by rrich (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25616#pullrequestreview-2895426967 From epeter at openjdk.org Wed Jun 4 06:49:25 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 4 Jun 2025 06:49:25 GMT Subject: RFR: 8358534: Bailout in Conv2B::Ideal when type of cmp input is not supported In-Reply-To: <2kB23xVQDRb7YT6aMt1SbIfPwSG1ummK29A1Hs3FD0Y=.59ea7dd7-c6fd-486c-a996-f839c9a15718@github.com> References: <2kB23xVQDRb7YT6aMt1SbIfPwSG1ummK29A1Hs3FD0Y=.59ea7dd7-c6fd-486c-a996-f839c9a15718@github.com> Message-ID: <6F6gsbyRSrJ7_XHXoMh8j15Mog2DMec5DaOCVAdcdFQ=.0799c52f-275c-4303-8bd8-05f341c20ae0@github.com> On Tue, 3 Jun 2025 19:48:37 GMT, Cesar Soares Lucas wrote: > `Conv2BNode::ideal` segfaults in release builds when the type of `in(1)` is not INT or PTR. 
Creating a small test case to reproduce the issue is being a bit challenging so this PR only address the issue by bailing out of the method if the input type is unsupported. This other ticket https://bugs.openjdk.org/browse/JDK-8357885 will address creating a regression test for the problem. > > Tested with JTREG tier1-3 and Renaissance on Linux x64. @JohnTortugo If you integrate before 24h, you need to explicitly say that it is trivial, and the reviewer needs to agree. We in europe were sleeping and did not even have a chance to look at it. src/hotspot/share/opto/convertnode.cpp line 86: > 84: return nullptr; > 85: } > 86: @JohnTortugo Would it not have been better to put this check inside the `else` branch? ------------- PR Review: https://git.openjdk.org/jdk/pull/25627#pullrequestreview-2895442512 PR Review Comment: https://git.openjdk.org/jdk/pull/25627#discussion_r2125783037 From thartmann at openjdk.org Wed Jun 4 06:58:27 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 4 Jun 2025 06:58:27 GMT Subject: RFR: 8358534: Bailout in Conv2B::Ideal when type of cmp input is not supported In-Reply-To: <6F6gsbyRSrJ7_XHXoMh8j15Mog2DMec5DaOCVAdcdFQ=.0799c52f-275c-4303-8bd8-05f341c20ae0@github.com> References: <2kB23xVQDRb7YT6aMt1SbIfPwSG1ummK29A1Hs3FD0Y=.59ea7dd7-c6fd-486c-a996-f839c9a15718@github.com> <6F6gsbyRSrJ7_XHXoMh8j15Mog2DMec5DaOCVAdcdFQ=.0799c52f-275c-4303-8bd8-05f341c20ae0@github.com> Message-ID: On Wed, 4 Jun 2025 06:45:50 GMT, Emanuel Peter wrote: >> `Conv2BNode::ideal` segfaults in release builds when the type of `in(1)` is not INT or PTR. Creating a small test case to reproduce the issue is being a bit challenging so this PR only address the issue by bailing out of the method if the input type is unsupported. This other ticket https://bugs.openjdk.org/browse/JDK-8357885 will address creating a regression test for the problem. >> >> Tested with JTREG tier1-3 and Renaissance on Linux x64. 
> > src/hotspot/share/opto/convertnode.cpp line 86: > >> 84: return nullptr; >> 85: } >> 86: > > @JohnTortugo Would it not have been better to put this check inside the `else` branch? I agree, the `return nullptr;` should have been added below the assert in the else branch, no check required. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25627#discussion_r2125814466 From thartmann at openjdk.org Wed Jun 4 06:58:24 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 4 Jun 2025 06:58:24 GMT Subject: RFR: 8358534: Bailout in Conv2B::Ideal when type of cmp input is not supported In-Reply-To: <2kB23xVQDRb7YT6aMt1SbIfPwSG1ummK29A1Hs3FD0Y=.59ea7dd7-c6fd-486c-a996-f839c9a15718@github.com> References: <2kB23xVQDRb7YT6aMt1SbIfPwSG1ummK29A1Hs3FD0Y=.59ea7dd7-c6fd-486c-a996-f839c9a15718@github.com> Message-ID: On Tue, 3 Jun 2025 19:48:37 GMT, Cesar Soares Lucas wrote: > `Conv2BNode::ideal` segfaults in release builds when the type of `in(1)` is not INT or PTR. Creating a small test case to reproduce the issue is being a bit challenging so this PR only address the issue by bailing out of the method if the input type is unsupported. This other ticket https://bugs.openjdk.org/browse/JDK-8357885 will address creating a regression test for the problem. > > Tested with JTREG tier1-3 and Renaissance on Linux x64. Also, please don't file JBS issues (without a subcomponent) and integrate them directly. The component triaging teams should at least get a chance to properly triage the issue and set priority etc. Especially when getting close to the rampdown phases, this is required to determine if an issue is even eligible, potentially only with approval, to be fixed in the current release or needs to be deferred. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25627#issuecomment-2938828992 From thartmann at openjdk.org Wed Jun 4 07:03:20 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 4 Jun 2025 07:03:20 GMT Subject: RFR: 8358534: Bailout in Conv2B::Ideal when type of cmp input is not supported In-Reply-To: <2kB23xVQDRb7YT6aMt1SbIfPwSG1ummK29A1Hs3FD0Y=.59ea7dd7-c6fd-486c-a996-f839c9a15718@github.com> References: <2kB23xVQDRb7YT6aMt1SbIfPwSG1ummK29A1Hs3FD0Y=.59ea7dd7-c6fd-486c-a996-f839c9a15718@github.com> Message-ID: On Tue, 3 Jun 2025 19:48:37 GMT, Cesar Soares Lucas wrote: > `Conv2BNode::ideal` segfaults in release builds when the type of `in(1)` is not INT or PTR. Creating a small test case to reproduce the issue is being a bit challenging so this PR only address the issue by bailing out of the method if the input type is unsupported. This other ticket https://bugs.openjdk.org/browse/JDK-8357885 will address creating a regression test for the problem. > > Tested with JTREG tier1-3 and Renaissance on Linux x64. I think this is ok for now, assuming that [JDK-8357885](https://bugs.openjdk.org/browse/JDK-8357885) will clean this up with a full fix and a regression test. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25627#issuecomment-2938849202 From jbhateja at openjdk.org Wed Jun 4 07:10:16 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 4 Jun 2025 07:10:16 GMT Subject: RFR: 8357982: Fix several failing BMI tests with -XX:+UseAPX [v2] In-Reply-To: <8mE0O0QjyMJMK7UWtfMiFc5ZjIxFYqVNUeu0qYbzaz8=.75e13abf-a2c9-407b-898d-1174a85a06cf@github.com> References: <8mE0O0QjyMJMK7UWtfMiFc5ZjIxFYqVNUeu0qYbzaz8=.75e13abf-a2c9-407b-898d-1174a85a06cf@github.com> Message-ID: On Tue, 3 Jun 2025 17:28:07 GMT, Jatin Bhateja wrote: >> Thanks, encoding logic is concentrated in integral instruction tests and is shared with corresponding long variants, extended APX coverage for BLS/R/MSK. > >> @jatin-bhateja Thanks for looking into this! 
>> >> `predicate(!UseAPX && n->in(2)->bottom_type()->is_int()->get_con() != -1);` >> >> The PR title seems to suggest the bug is only about -XX:+UseAPX. Why are you changing things for the case !UseAPX? >> >> Are these not cases like a ^ -1, which basically flips all bits. What alternative does this end up using now? >> >> A code comment would be helpful. > > We are tightening the predicate check so that under no circumstances we pick this pattern during the reduction phase of instruction selection on account of having lower cost. There is a generic pattern (xorI_rReg_imm) for all integral immediate values, and then there is a special pattern for Xor with -1 (fxorI_rReg_im1), which is needed for AndN inferencing. > @jatin-bhateja I'll wait with testing, until someone from Intel gives this the approval. Feel free to ping me for that once we are there :) Hi @eme64 , I am process of updating this version with some more changes, please hold on your test runs for a while :-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/25501#issuecomment-2938870073 From epeter at openjdk.org Wed Jun 4 07:28:16 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 4 Jun 2025 07:28:16 GMT Subject: RFR: 8351889: C2 crash: assertion failed: Base pointers must match (addp 344) In-Reply-To: References: Message-ID: On Tue, 3 Jun 2025 14:59:23 GMT, Roland Westrelin wrote: > In the example above, the CastPPs are the bases. Aaaah, ok now it makes a little more sense to me :) > > Maybe some more full IR snippets could be helpful, maybe even IGV drawings. But that may be more work for you. > > I rarely use the IGV so, yeah, that would be more work. Then what about just the dump of the relevant IR nodes in text form? That is what I meant by `full IR snippets` ;) Is there any (reasonable) way to push the `CastPP` through the `AddP` here? I guess that may mean duplicating some `AddP` in some cases... 
But it could also give an opportunity for the `CastPP` to common further up that way. What do you think? It is hard for me to see through it without looking at some examples of the IR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25386#issuecomment-2938917626 From roland at openjdk.org Wed Jun 4 07:32:20 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 4 Jun 2025 07:32:20 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v8] In-Reply-To: <4ShW7VcaJrO0v0cHwUN1vccOH8tNPlJSIh_K0W2RdS0=.14954a26-c962-41a1-9088-2e1a1bc01eb4@github.com> References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> <1gdeBnZ7YuIf9CgQW2bCXkDDBWPjUgRnickHts-fvzE=.e6e901ba-3e9f-41a2-9c68-167a879e9655@github.com> <4ShW7VcaJrO0v0cHwUN1vccOH8tNPlJSIh_K0W2RdS0=.14954a26-c962-41a1-9088-2e1a1bc01eb4@github.com> Message-ID: On Tue, 27 May 2025 08:17:38 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > src/hotspot/share/opto/escape.cpp line 4804: > >> 4802: assert(n->is_Initialize(), "We only push projections of Initialize"); >> 4803: if (use->as_Proj()->_con == TypeFunc::Memory) { // Ignore precedent edge >> 4804: memnode_worklist.append_if_missing(use); > > Do you know why we are using a `GrowableArray` here? Would a `UnikeNodeList` not serve us better since we are always doing `append_if_missing`, which essentially has to scan the whole `GrowableArray`? It's not clear to me. I filed: https://bugs.openjdk.org/browse/JDK-8358560 as a follow up. 
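
The cost difference behind the quoted question can be sketched in standalone C++. This is not HotSpot code: `Node`, `LinearWorklist` and `UniqueWorklist` are hypothetical stand-ins, the first worklist modelling a growable array used with `append_if_missing` and the second a list that tracks membership by node index. The only assumption is that the former needs a linear scan per insert, while the latter answers the membership test in constant time. Whether that matters in practice depends on how large the worklist gets.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a C2 node: only a dense index is needed here.
struct Node {
  std::size_t idx;
};

// Models a growable array with append_if_missing:
// every insert scans all elements already in the list.
class LinearWorklist {
  std::vector<const Node*> _nodes;
public:
  bool append_if_missing(const Node* n) {
    for (const Node* m : _nodes) {
      if (m == n) {
        return false;  // already present; finding that out costs O(size)
      }
    }
    _nodes.push_back(n);
    return true;
  }
  std::size_t size() const { return _nodes.size(); }
};

// Models a unique list: one membership bit per node index,
// so the duplicate test is O(1) per insert.
class UniqueWorklist {
  std::vector<const Node*> _nodes;
  std::vector<bool> _member;
public:
  bool push(const Node* n) {
    assert(n != nullptr);
    if (n->idx >= _member.size()) {
      _member.resize(n->idx + 1, false);
    }
    if (_member[n->idx]) {
      return false;  // already present
    }
    _member[n->idx] = true;
    _nodes.push_back(n);
    return true;
  }
  std::size_t size() const { return _nodes.size(); }
};
```

Both classes give the same observable result (each node appears at most once). They differ only in how much work a repeated insert costs.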
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2125886722 From epeter at openjdk.org Wed Jun 4 07:39:19 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 4 Jun 2025 07:39:19 GMT Subject: RFR: 8357822: C2: Multiple string optimization tests are no longer testing string concatenation optimizations In-Reply-To: <4GDLAMfeWjgfcGvn4sUSMT2jjG3vsebjcFeJqgHqPQw=.e7dfa9e7-4608-4304-ba00-0b254b6bf2b1@github.com> References: <4GDLAMfeWjgfcGvn4sUSMT2jjG3vsebjcFeJqgHqPQw=.e7dfa9e7-4608-4304-ba00-0b254b6bf2b1@github.com> Message-ID: On Tue, 3 Jun 2025 07:17:47 GMT, Daniel Skantz wrote: > This PR updates a few tests to reintroduce testing of string concatenation optimizations since a few bugs have recently been identified in this area. > > Selection criteria: performed a text search on the test suite and identified tests for string concatenations or string optimizations that are not currently compiled with `-XDstringConcat=inline` and are not using StringBuilders explicitly. > > Testing: T1-4. > > Extra testing: ran the tests manually with `-XX:+PrintOptimizeStringConcat` and verified that the tests are exercising string optimizations after the fix. @danielogh Thanks for looking into this and finding more tests! Looks reasonable to me. I'm not super familiar with string optimizations, so it would be good if a second reviewer knew a little more. But it looks at least like a good step in the right direction from what I can see :) test/hotspot/jtreg/compiler/c2/Test7046096.java line 36: > 34: /* > 35: * @test id=stringConcatInline > 36: * @bug 7046096 Suggestion: * @bug 7046096 8357822 I'd at the new number here. But probably optional. ------------- Marked as reviewed by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25610#pullrequestreview-2895609510 PR Review Comment: https://git.openjdk.org/jdk/pull/25610#discussion_r2125886046 From mhaessig at openjdk.org Wed Jun 4 07:47:37 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Wed, 4 Jun 2025 07:47:37 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v80] In-Reply-To: References: Message-ID: <18z8xy6zbC5dWMAzveQOankso6vWI2yj4b4EpsCS3lg=.f176a82c-4660-4500-8369-8976088d3758@github.com> On Tue, 3 Jun 2025 15:57:32 GMT, Emanuel Peter wrote: >> **Goal** >> We want to generate Java source code: >> - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. >> - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). >> >> Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). >> >> **How to get started** >> When reviewing, please start by looking at: >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 >> >> We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. 
>> >> Second, look at this advanced test: >> https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 >> >> And then for a "tutorial", look at: >> `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` >> >> It shows these features: >> - The `body` of a Template is essentially a list of `Token`s that are concatenated. >> - Templates can be nested: a `TemplateWithArgs` is also a `Token`. >> - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. >> - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. >> - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. >> - The use of recursive templates, and `fuel` to limit the recursion. >> - `Name`s: useful to register field and variable names in code scopes. >> >> Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 >> >> For a better experience, you may want... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > fix whitespaces from applied suggestion I had a look at the changes since my last review. They look excellent. Especially good to see the tutorial improving even further. ------------- Marked as reviewed by mhaessig (Author). 
PR Review: https://git.openjdk.org/jdk/pull/24217#pullrequestreview-2895650620 From roland at openjdk.org Wed Jun 4 07:48:22 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 4 Jun 2025 07:48:22 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v8] In-Reply-To: <4ShW7VcaJrO0v0cHwUN1vccOH8tNPlJSIh_K0W2RdS0=.14954a26-c962-41a1-9088-2e1a1bc01eb4@github.com> References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> <1gdeBnZ7YuIf9CgQW2bCXkDDBWPjUgRnickHts-fvzE=.e6e901ba-3e9f-41a2-9c68-167a879e9655@github.com> <4ShW7VcaJrO0v0cHwUN1vccOH8tNPlJSIh_K0W2RdS0=.14954a26-c962-41a1-9088-2e1a1bc01eb4@github.com> Message-ID: On Tue, 27 May 2025 08:50:51 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > src/hotspot/share/opto/multnode.cpp line 48: > >> 46: ProjNode* MultiNode::proj_out_or_null(uint which_proj) const { >> 47: assert((Opcode() != Op_If && Opcode() != Op_RangeCheck) || which_proj == (uint)true || which_proj == (uint)false, "must be 1 or 0"); >> 48: assert(number_of_projs(which_proj) <= 1, "only when there's a single projection"); > > Does this hold for all `MultiNode`s under all circumstances? Or should we consider returning `nullptr` in this case? So you're suggesting that this could return `nullptr` so the caller could then test for `nullptr` and have some fallback logic? I would stick with the assert: if C2 crashes because of this, I think will be easier to diagnose an assert than an unexpected `nullptr` return. 
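
The trade-off described above can be illustrated with a small standalone C++ sketch. It is not the HotSpot implementation: `Proj` and `Multi` are hypothetical stand-ins, and the standard `assert` (compiled out under `NDEBUG`) only approximates the debug-versus-product split. The point is that "no projection" is reported to the caller as `nullptr`, while "more than one projection" is treated as a bug and caught at the point where the assumption is violated.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a projection: 'con' selects which output it is.
struct Proj {
  unsigned con;
};

// Hypothetical stand-in for a node with several projection outputs.
struct Multi {
  std::vector<Proj> outs;

  // Counts the projections that select output 'which'.
  std::size_t number_of_projs(unsigned which) const {
    std::size_t count = 0;
    for (const Proj& p : outs) {
      if (p.con == which) {
        count++;
      }
    }
    return count;
  }

  // Returns the projection for 'which', or nullptr if there is none.
  // More than one match trips the assert instead of being folded
  // into the nullptr result.
  const Proj* proj_out_or_null(unsigned which) const {
    assert(number_of_projs(which) <= 1 && "only when there's a single projection");
    for (const Proj& p : outs) {
      if (p.con == which) {
        return &p;
      }
    }
    return nullptr;
  }
};
```

With this shape a caller only has to handle the "missing" case. A duplicated projection stops the debug build right at the query, rather than surfacing later as an unexplained `nullptr`.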
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2125914278 From kwei at openjdk.org Wed Jun 4 07:53:20 2025 From: kwei at openjdk.org (Kuai Wei) Date: Wed, 4 Jun 2025 07:53:20 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v4] In-Reply-To: <9ABhENoZtR76wmsgRmzeEceDvCvoflfCcbDbK8H2rso=.e351f63f-1331-4e2e-8a02-763a8c0c4f70@github.com> References: <96Ny_BPjRCbNlD14DNDUOuQ0IX-F8hx21gxQKVfim9M=.d502019a-27ed-4a35-81ef-bc2aec5e7557@github.com> <_IhK2U23lIUOtBKOt-WMxQ3L7b2t26RzclJRdqbIgms=.3ef9a630-f99c-4de7-994a-bcabf912230b@github.com> <9ABhENoZtR76wmsgRmzeEceDvCvoflfCcbDbK8H2rso=.e351f63f-1331-4e2e-8a02-763a8c0c4f70@github.com> Message-ID: On Thu, 22 May 2025 07:03:13 GMT, Emanuel Peter wrote: >> @eme64 @wenshao I have a little change to this PR. I will send it soon. Thanks for your patience. > > @kuaiwei I'm not in a rush with this one. I'd rather we have a good design and be reasonably sure that it is correct, rather than rush it now and having to do extra cycles fixing things later ;) Hi @eme64 , I tried to use match pattern for `MergePrimitiveLoads::has_no_merge_load_combine_below()` . But I think it has some difficulty. For mergeable operators, they can be linked in different way, like: 1) (((item1 Or item2) Or item3) Or item4) 2) ((item1 Or item2) Or (item3 Or item4)) ... To check the next `Or` operator is a valid last one of combine operator chain. We may check its all input recursively. I didn't find a good way to revolve it. If you have better idea, I will check it. I think it's more easy to mark the combine operator checked. It works in this way: * If the checking combine operator has successor combine operator , which is not checked before, we do not optimize it and let the next one has chance to be optimized. * If we try to merge but failed, so we mark it as a `checked` and add its input into GVN worklist. So its input operators can be checked. 
I added comments of MergePrimitiveLoads::has_no_merge_load_combine_below() to describe the design. To reduce the memory size of `AddNode`. I removed the flag from `AddNode` and add 2 virtual fucntions ```c++ // Check if this node is checked by merge_memops phase virtual bool is_merge_memops_checked() const { return false; }; virtual void set_merge_memops_checked(bool v) { ShouldNotReachHere(); }; The flag , `_merge_memops_checked`, is only added in OrINode and OrLNode. Could you help to check the design and code? Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24023#issuecomment-2938986831 From xgong at openjdk.org Wed Jun 4 08:00:07 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 4 Jun 2025 08:00:07 GMT Subject: RFR: 8357726: Improve C2 to recognize counted loops with multiple casts in trip counter [v2] In-Reply-To: <-SKyhptjFPhuOPflySOZXJloR_Vgr4sC-xB5dSQXxZU=.fd6922bc-2498-4f4e-873a-999f82cd0a1a@github.com> References: <-SKyhptjFPhuOPflySOZXJloR_Vgr4sC-xB5dSQXxZU=.fd6922bc-2498-4f4e-873a-999f82cd0a1a@github.com> Message-ID: > C2 compiler fails to recognize counted loops when the induction variable is constrained by multiple consecutive `CastII` nodes. > This prevents optimizations like range check elimination, loop unrolling and auto-vectorization for these loops. Please refer > to the detailed discussion for a related performance issue from [1]. > > The ideal graph of such a loop typically looks like: > > > /-----------| > | | > | ConI | > loop | / / > | | / / > \ AddI / > RangeCheck \ / | > | \ / | > IfTrue Phi | > \ | | > RangeCheck \ | | > \ CastII / <- Range check #1 > | | / > IfTrue | | > \ | | > CastII | <- Range check #2 > | / > |-------/ > > > > For a counted loop, the loop induction variable (i.e `Phi`) should be the input of `AddI` ideally. However, in above case, it is used > by two consecutive `CastII` nodes generated by two different range check operations. 
Compiler should skip all such kind of `CastII` when recognizing a counted loop. > > This patch modifies the counted loop recognition code to iteratively uncast the loop `iv` until no `CastII` nodes remain, enabling proper counted loop recognition even when the induction variable undergoes multiple range constraint operations. > > Test: > - Tested tier1, tier2, tier3, and no regressions are found. > - An additional test case is added to verify the fix. > > Performance: > Here is the performance gain on a NVIDIA Grace machine which is an AArch64 architecture: > > > Benchmark Mode Cnt Unit Before After Gain > CountedLoopCastIV.loop_iv_int thrpt 30 ops/s 941482.597 4389292.439 4.66 > CountedLoopCastIV.loop_iv_long thrpt 30 ops/s 884563.232 1441485.455 1.62 > > > We can also observe the similar uplift on a x86_64 machine. > > [1] https://github.com/openjdk/jdk/pull/25138#issuecomment-2892720654 Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: Address review comments on jtreg and jmh tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25539/files - new: https://git.openjdk.org/jdk/pull/25539/files/796e96f7..afe6b2df Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25539&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25539&range=00-01 Stats: 376 lines in 3 files changed: 194 ins; 179 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/25539.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25539/head:pull/25539 PR: https://git.openjdk.org/jdk/pull/25539 From epeter at openjdk.org Wed Jun 4 08:01:25 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 4 Jun 2025 08:01:25 GMT Subject: RFR: 8351645: C2: ExpandBitsNode::Ideal hits assert because of TOP input In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 11:53:23 GMT, Jatin Bhateja wrote: > Bugfix patch adds missing safe type access checks in Expand/Compress Ideal transforms. 
> Test mentioned in the bug report has been included along with the patch.
>
> Kindly review.
>
> Best Regards,
> Jatin

@jatin-bhateja Thanks for looking into this.

I just sanity checked the implementation of `compress` ... and the code there is exactly the same! I just took the reproducer and replaced `expand` with `compress` ... and got another assert.

    public class Test {
        public static long[] array_0 = fill(new long[10000]);
        public static long[] array_2 = fill(new long[10000]);

        public static long[] fill(long[] a) {
            for (int i = 0; i < a.length; i++) {
                a[i] = 1;
            }
            return a;
        }

        public static long one = 1L;

        static final long[] GOLD = test();

        public static long[] test() {
            long[] out = new long[10000];
            for (int i = 0; i < out.length; i++) {
                long y = array_0[i] % one;
                long x = (array_2[i] | 4294967298L) << -7640610671680100954L;
                out[i] = Long.compress(y, x);
            }
            return out;
        }

        public static void main(String[] args) {
            for (int i = 0; i < 10_000; i++) {
                test();
            }
            long[] res = test();
            for (int i = 0; i < 10_000; i++) {
                if (res[i] != GOLD[i]) {
                    throw new RuntimeException("value mismatch: " + res[i] + " vs " + GOLD[i]);
                }
            }
        }
    }

With:
`java -Xbatch -XX:CompileCommand=compileonly,Test::test* -XX:+StressIGVN -XX:RepeatCompilation=100 Test.java`

Can you please also fix that here, and add regression tests for `Integer/Long.compress`? You should also update the PR title accordingly.

src/hotspot/share/opto/intrinsicnode.cpp line 196:

> 194:   Node* mask = in(2);
> 195:   if (bottom_type()->isa_int()) {
> 196:     if (mask->Opcode() == Op_LShiftI && phase->type(mask->in(1))->isa_int() && phase->type(mask->in(1))->is_int()->is_con()) {

Why not just check for `top` at the beginning of the function?
Just like here:

    const Type* CompressBitsNode::Value(PhaseGVN* phase) const {
      const Type* t1 = phase->type(in(1));
      const Type* t2 = phase->type(in(2));
      if (t1 == Type::TOP || t2 == Type::TOP) {
        return Type::TOP;
      }

Of course in `Ideal` you would have to return `nullptr` instead, and wait for `Value` to clean it up. That has the benefit that you only need to check it in one place, and then any new optimization we might add in the future does not also have to deal with `top`.

test/hotspot/jtreg/compiler/intrinsics/Test8351645.java line 1:

> 1: /*

I would put the test under `test/hotspot/jtreg/compiler/c2/gvn/TestExpandTopInput.java`
Because this is not per se about an intrinsic, more about the `gvn` optimization failing.

test/hotspot/jtreg/compiler/intrinsics/Test8351645.java line 28:

> 26:  * @bug 8351645
> 27:  * @summary C2: ExpandBitsNode::Ideal hits assert because of TOP input
> 28:  * @run main/othervm -Xbatch -Xmx128m compiler.intrinsics.Test8351645

What is the reason for the flags here? Do you really need them? I guess `-Xbatch` could make sense, just to make sure the method is compiled. And you did not need `-XX:CompileCommand=compileonly,Test::test*` to reproduce this, right? Just want to be sure that inlining is not somehow creating issues here.

test/hotspot/jtreg/compiler/intrinsics/Test8351645.java line 53:

> 51:             long y = array_0[i] % one;
> 52:             long x = (array_2[i] | 4294967298L) << -7640610671680100954L;
> 53:             out[i] = Long.expand(y, x);

Can you please also add a test for `Integer.expand`? Because it seems that your fix addresses not just the `long` but also the `int` case, right?

-------------

Changes requested by epeter (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/25586#pullrequestreview-2895648186
PR Review Comment: https://git.openjdk.org/jdk/pull/25586#discussion_r2125911301
PR Review Comment: https://git.openjdk.org/jdk/pull/25586#discussion_r2125935617
PR Review Comment: https://git.openjdk.org/jdk/pull/25586#discussion_r2125916907
PR Review Comment: https://git.openjdk.org/jdk/pull/25586#discussion_r2125912495

From xgong at openjdk.org Wed Jun 4 08:06:17 2025
From: xgong at openjdk.org (Xiaohong Gong)
Date: Wed, 4 Jun 2025 08:06:17 GMT
Subject: RFR: 8357726: Improve C2 to recognize counted loops with multiple casts in trip counter
In-Reply-To:
References: <-SKyhptjFPhuOPflySOZXJloR_Vgr4sC-xB5dSQXxZU=.fd6922bc-2498-4f4e-873a-999f82cd0a1a@github.com> <698Q9LoBFMdDFBnBVAB8FYiI0U-abyXms26RLoMv5Xc=.f21b9a25-8f64-412c-b37a-553f0a13192e@github.com>
Message-ID:

On Tue, 3 Jun 2025 07:17:32 GMT, Emanuel Peter wrote:

>>> @XiaohongGong I suggest you change the title from: `8357726: C2 fails to recognize the counted loop when induction variable range is changed multiple times` to `8357726: C2 recognize loops with multiple casts in trip counter` or even: `8357726: C2 recognize loops with multiple casts in trip counter: phi -> CastII* -> AddI -> phi`
>>
>> Thanks for your suggestion! Sounds better to me. How about changing the title to `Improve C2 to recognize counted loops with multiple casts in trip counter` ?
>
>> Thanks for your suggestion! Sounds better to me. How about changing the title to Improve C2 to recognize counted loops with multiple casts in trip counter ?
>
> @XiaohongGong Sounds good too :)

Hi @eme64 , I've updated the IR test and JMH based on your comments. Could you please help review whether it's fine to you. Thanks for all your suggestions!
Following shows the performance data of the new JMH test on Grace (the performance gain is almost the same on my x64 machine):

Benchmark                       Mode   Cnt  limit  Unit   Before       Error (99.9%)  After        Error (99.9%)  Gain
CountedLoopCastIV.loop_iv_int   thrpt  30   1024   ops/s  1225620.536  39505.158362   5778120.132  4781.602088    4.71
CountedLoopCastIV.loop_iv_int   thrpt  30   1536   ops/s  830600.832   14758.561182   3839404.338  3362.727083    4.62
CountedLoopCastIV.loop_iv_int   thrpt  30   2048   ops/s  618114.174   36999.511727   2890853.495  416.969862     4.67
CountedLoopCastIV.loop_iv_long  thrpt  30   1024   ops/s  1063902.078  4616.608855    1314828.963  1267.470199    1.23
CountedLoopCastIV.loop_iv_long  thrpt  30   1536   ops/s  714538.178   630.085477     870801.472   753.347684     1.21
CountedLoopCastIV.loop_iv_long  thrpt  30   2048   ops/s  536724.086   131.313178     652775.363   539.107806     1.21

The error term is larger as before. But I don't think this is caused by the large variance of loop iterations. Does the new benchmark look fine to you? Thanks!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25539#issuecomment-2939030428

From epeter at openjdk.org Wed Jun 4 08:06:20 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 4 Jun 2025 08:06:20 GMT
Subject: RFR: 8252473: [TESTBUG] compiler tests fail with minimal VM: Unrecognized VM option [v3]
In-Reply-To:
References:
Message-ID:

On Wed, 28 May 2025 18:39:27 GMT, Zdenek Zambersky wrote:

>> This change adds ` -XX:-IgnoreUnrecognizedVMOptions` to problematic tests (or `@requires vm.compiler2.enabled` in one case), to prevent failures `Unrecognized VM option` on client VM.
>
> Zdenek Zambersky has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit:
>
>   Fix of compiler tests for client VM

@zzambers Thanks for doing this work!

The tests pass :green_circle:

-------------

Marked as reviewed by epeter (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/24262#pullrequestreview-2895714887

From epeter at openjdk.org Wed Jun 4 08:07:39 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 4 Jun 2025 08:07:39 GMT
Subject: RFR: 8344942: Template-Based Testing Framework [v80]
In-Reply-To: <18z8xy6zbC5dWMAzveQOankso6vWI2yj4b4EpsCS3lg=.f176a82c-4660-4500-8369-8976088d3758@github.com>
References: <18z8xy6zbC5dWMAzveQOankso6vWI2yj4b4EpsCS3lg=.f176a82c-4660-4500-8369-8976088d3758@github.com>
Message-ID:

On Wed, 4 Jun 2025 07:44:35 GMT, Manuel Hässig wrote:

>> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   fix whitespaces from applied suggestion
>
> I had a look at the changes since my last review. They look excellent.
> Especially good to see the tutorial improving even further.

@mhaessig Thank you very much for having another look!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24217#issuecomment-2939033431

From duke at openjdk.org Wed Jun 4 08:07:40 2025
From: duke at openjdk.org (Tom Shull)
Date: Wed, 4 Jun 2025 08:07:40 GMT
Subject: RFR: 8357660: [JVMCI] Add support for retrieving all BootstrapMethodInvocations directly from ConstantPool [v5]
In-Reply-To:
References:
Message-ID:

> This PR adds support for directly retrieving both all invokedynamic and all condy BootstrapMethodInvocations from a ConstantPool via the new method `List lookupBootstrapMethodInvocations(boolean invokeDynamic)`.
>
> In addition, two methods are added to the BootstrapMethodInvocations:
> 1. `void resolve()`
> 2. `JavaConstant lookup()`
>
> The combination of these two features allows one to directly interact with all BSM information of a given ConstantPool without having to iterate through all of the Classfile's methods to find all invokedynamic bytecodes and/or iterate through all Constant Pool entries.

Tom Shull has updated the pull request with a new target base due to a merge or a rebase.
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision: - Merge remote-tracking branch 'origin/master' into JDK-8357660 - Merge remote-tracking branch 'origin/master' into JDK-8357660 - commit to trigger testing - commit to trigger testing - reviewer feedback and update javadoc formatting - complete changes - commit review suggestion Co-authored-by: Douglas Simon - commit review suggestion Co-authored-by: Douglas Simon - change to allow both indys and condys to be looked up all at once - address reviewer feedback - ... and 2 more: https://git.openjdk.org/jdk/compare/826fea84...c7f5c1a7 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25420/files - new: https://git.openjdk.org/jdk/pull/25420/files/e0707fb8..c7f5c1a7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25420&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25420&range=03-04 Stats: 3303 lines in 64 files changed: 2485 ins; 442 del; 376 mod Patch: https://git.openjdk.org/jdk/pull/25420.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25420/head:pull/25420 PR: https://git.openjdk.org/jdk/pull/25420 From epeter at openjdk.org Wed Jun 4 08:12:18 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 4 Jun 2025 08:12:18 GMT Subject: RFR: 8357726: Improve C2 to recognize counted loops with multiple casts in trip counter [v2] In-Reply-To: References: <-SKyhptjFPhuOPflySOZXJloR_Vgr4sC-xB5dSQXxZU=.fd6922bc-2498-4f4e-873a-999f82cd0a1a@github.com> Message-ID: On Wed, 4 Jun 2025 08:00:07 GMT, Xiaohong Gong wrote: >> C2 compiler fails to recognize counted loops when the induction variable is constrained by multiple consecutive `CastII` nodes. >> This prevents optimizations like range check elimination, loop unrolling and auto-vectorization for these loops. Please refer >> to the detailed discussion for a related performance issue from [1]. 
>> >> The ideal graph of such a loop typically looks like: >> >> >> /-----------| >> | | >> | ConI | >> loop | / / >> | | / / >> \ AddI / >> RangeCheck \ / | >> | \ / | >> IfTrue Phi | >> \ | | >> RangeCheck \ | | >> \ CastII / <- Range check #1 >> | | / >> IfTrue | | >> \ | | >> CastII | <- Range check #2 >> | / >> |-------/ >> >> >> >> For a counted loop, the loop induction variable (i.e `Phi`) should be the input of `AddI` ideally. However, in above case, it is used >> by two consecutive `CastII` nodes generated by two different range check operations. Compiler should skip all such kind of `CastII` when recognizing a counted loop. >> >> This patch modifies the counted loop recognition code to iteratively uncast the loop `iv` until no `CastII` nodes remain, enabling proper counted loop recognition even when the induction variable undergoes multiple range constraint operations. >> >> Test: >> - Tested tier1, tier2, tier3, and no regressions are found. >> - An additional test case is added to verify the fix. >> >> Performance: >> Here is the performance gain on a NVIDIA Grace machine which is an AArch64 architecture: >> >> >> Benchmark Mode Cnt Unit Before After Gain >> CountedLoopCastIV.loop_iv_int thrpt 30 ops/s 941482.597 4389292.439 4.66 >> CountedLoopCastIV.loop_iv_long thrpt 30 ops/s 884563.232 1441485.455 1.62 >> >> >> We can also observe the similar uplift on a x86_64 machine. >> >> [1] https://github.com/openjdk/jdk/pull/25138#issuecomment-2892720654 > > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments on jtreg and jmh tests test/hotspot/jtreg/compiler/loopopts/TestCountedLoopCastIV.java line 190: > 188: } else { > 189: TestFramework.run(); > 190: } I would recommend checking that there is no "unexpected" input here. 
Suggestion:

        if (args != null && args.length > 0 && args[0].equals("DisableUnroll")) {
            TestFramework.runWithFlags("-XX:LoopUnrollLimit=0");
        } else {
            if (args.length != 0) {
                throw new RuntimeException("Unexpected args");
            }
            TestFramework.run();
        }

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25539#discussion_r2125967112

From epeter at openjdk.org Wed Jun 4 08:15:18 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 4 Jun 2025 08:15:18 GMT
Subject: RFR: 8357726: Improve C2 to recognize counted loops with multiple casts in trip counter
In-Reply-To:
References: <-SKyhptjFPhuOPflySOZXJloR_Vgr4sC-xB5dSQXxZU=.fd6922bc-2498-4f4e-873a-999f82cd0a1a@github.com> <698Q9LoBFMdDFBnBVAB8FYiI0U-abyXms26RLoMv5Xc=.f21b9a25-8f64-412c-b37a-553f0a13192e@github.com>
Message-ID:

On Wed, 4 Jun 2025 08:04:00 GMT, Xiaohong Gong wrote:

>>> Thanks for your suggestion! Sounds better to me. How about changing the title to Improve C2 to recognize counted loops with multiple casts in trip counter ?
>>
>> @XiaohongGong Sounds good too :)
>
> Hi @eme64 , I've updated the IR test and JMH based on your comments. Could you please help review whether it's fine to you. Thanks for all your suggestions!
>
> Following shows the performance data of the new JMH test on Grace (the performance gain is almost the same on my x64 machine):
>
> Benchmark                       Mode   Cnt  limit  Unit   Before       Error (99.9%)  After        Error (99.9%)  Gain
> CountedLoopCastIV.loop_iv_int   thrpt  30   1024   ops/s  1225620.536  39505.158362   5778120.132  4781.602088    4.71
> CountedLoopCastIV.loop_iv_int   thrpt  30   1536   ops/s  830600.832   14758.561182   3839404.338  3362.727083    4.62
> CountedLoopCastIV.loop_iv_int   thrpt  30   2048   ops/s  618114.174   36999.511727   2890853.495  416.969862     4.67
> CountedLoopCastIV.loop_iv_long  thrpt  30   1024   ops/s  1063902.078  4616.608855    1314828.963  1267.470199    1.23
> CountedLoopCastIV.loop_iv_long  thrpt  30   1536   ops/s  714538.178   630.085477     870801.472   753.347684     1.21
> CountedLoopCastIV.loop_iv_long  thrpt  30   2048   ops/s  536724.086   131.313178     652775.363   539.107806     1.21
>
>
> The error term is larger as before. But I don't think this is caused by the large variance of loop iterations. Does the new benchmark look fine to you? Thanks!

@XiaohongGong Nice, thanks for the updates! Especially the IR rules and reduction in JMH benchmark variance, excellent :)

Please ping me again once you have addressed my comment above, and then I can run some internal testing for you!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25539#issuecomment-2939057374

From dfenacci at openjdk.org Wed Jun 4 08:18:19 2025
From: dfenacci at openjdk.org (Damon Fenacci)
Date: Wed, 4 Jun 2025 08:18:19 GMT
Subject: RFR: 8356000: C1/C2-only modes use 2 compiler threads on low CPU count machines [v3]
In-Reply-To:
References:
Message-ID: <0AIqaIvewAyRN2mTf8rkMpl1m7Wcm6BJRzNQq84C9j4=.71353c09-76a7-41e2-9be2-30eaaa9eff29@github.com>

On Wed, 28 May 2025 18:05:12 GMT, Aleksey Shipilev wrote:

>> There is an unfortunate limitation with default tiered policy that we would have at least 2 threads on 1 CPU machine: 1 thread for C1, and 1 thread for C2.
>>
>> But if we select C1-only or C2-only modes, we _also_ get 2 compiler threads, for which we have no good reason. These threads would just step on each other's toes. The fix changes the behavior for 1..3 CPU hosts in C1/C2-only configurations, by using 1 thread instead of 2 threads. The change for 1 CPU config is what we really need. The change in 2..3 CPU configs is an additional effect, but I think it is still good not to use 100%/66% of the CPUs in those configurations as well.
>>
>>
>> $ for I in `seq 1 8`; do build/linux-x86_64-server-release/images/jdk/bin/java \
>>     -XX:-TieredCompilation -XX:ActiveProcessorCount=${I} \
>>     -XX:+PrintFlagsFinal 2>&1 | grep "CICompilerCount "; done
>>
>> # Before
>> intx CICompilerCount = 2
>> intx CICompilerCount = 2
>> intx CICompilerCount = 2
>> intx CICompilerCount = 3
>> intx CICompilerCount = 3
>> intx CICompilerCount = 3
>> intx CICompilerCount = 3
>> intx CICompilerCount = 4
>>
>> # After
>> intx CICompilerCount = 1
>> intx CICompilerCount = 1
>> intx CICompilerCount = 1
>> intx CICompilerCount = 3
>> intx CICompilerCount = 3
>> intx CICompilerCount = 3
>> intx CICompilerCount = 3
>> intx CICompilerCount = 4
>>
>>
>> It is a minor bug in `CompilationPolicy::initialize`, but it gets in the way studying Leyden in tight CPU scenarios.
>>
>> Additional testing:
>>  - [x] New regression test passes with the fix, fails without it
>>  - [x] GHA
>>  - [x] Linux AArch64 server fastdebug, `all`
>
> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains eight additional commits since the last revision: > > - Merge branch 'master' into JDK-8356000-c1-c2-compiler-count > - Better test, patch amendments > - Merge branch 'master' into JDK-8356000-c1-c2-compiler-count > - Merge branch 'master' into JDK-8356000-c1-c2-compiler-count > - Unnecessary arch limitation > - Simplify test > - Adjust test bound > - Fix Very thorough test! Thanks a lot @shipilev! ------------- Marked as reviewed by dfenacci (Committer). PR Review: https://git.openjdk.org/jdk/pull/24972#pullrequestreview-2895749955 From mdoerr at openjdk.org Wed Jun 4 08:35:20 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 4 Jun 2025 08:35:20 GMT Subject: RFR: 8354636: [PPC64] Clean up comments regarding frame manager In-Reply-To: <28IlBh9k0o4RZMbIstYTCl8c0rfIIqVqyPXeXFyx1Ik=.1d4919d2-2437-4c81-8d30-75128b0a0afb@github.com> References: <28IlBh9k0o4RZMbIstYTCl8c0rfIIqVqyPXeXFyx1Ik=.1d4919d2-2437-4c81-8d30-75128b0a0afb@github.com> Message-ID: On Tue, 3 Jun 2025 14:29:49 GMT, Martin Doerr wrote: > Trivial comment cleanup: Replace "frame manager" by "template interpreter". Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25616#issuecomment-2939112946 From mdoerr at openjdk.org Wed Jun 4 08:35:20 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 4 Jun 2025 08:35:20 GMT Subject: Integrated: 8354636: [PPC64] Clean up comments regarding frame manager In-Reply-To: <28IlBh9k0o4RZMbIstYTCl8c0rfIIqVqyPXeXFyx1Ik=.1d4919d2-2437-4c81-8d30-75128b0a0afb@github.com> References: <28IlBh9k0o4RZMbIstYTCl8c0rfIIqVqyPXeXFyx1Ik=.1d4919d2-2437-4c81-8d30-75128b0a0afb@github.com> Message-ID: On Tue, 3 Jun 2025 14:29:49 GMT, Martin Doerr wrote: > Trivial comment cleanup: Replace "frame manager" by "template interpreter". This pull request has now been integrated. 
Changeset: ab235000
Author:    Martin Doerr
URL:       https://git.openjdk.org/jdk/commit/ab235000349bfd268e80a7cb99bf07a229406119
Stats:     13 lines in 3 files changed: 0 ins; 2 del; 11 mod

8354636: [PPC64] Clean up comments regarding frame manager

Reviewed-by: amitkumar, rrich

-------------

PR: https://git.openjdk.org/jdk/pull/25616

From epeter at openjdk.org Wed Jun 4 08:44:19 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 4 Jun 2025 08:44:19 GMT
Subject: RFR: 8020282: Generated code quality: redundant LEAs in the chained dereferences
In-Reply-To:
References:
Message-ID:

On Tue, 27 May 2025 17:26:59 GMT, Manuel Hässig wrote:

> ## Summary
>
> On x86, chained dereferences of narrow oops at a constant offset from the base oop can use a `lea` instruction to perform the address computation in one go using the `leaP8Narrow`, `leaP32Narrow`, and `leaPCompressedOopOffset` matching rules. However, the generated code contains an additional `lea` with an unused result:
>
> ; OptoAssembly
> 03d     decode_heap_oop_not_null R8,R10
> 041     leaq    R10, [R12 + R10 << 3 + #12] (compressed oop addressing) ; ptr compressedoopoff32
>
> ; x86
> 0x00007f1f210625bd:   lea    (%r12,%r10,8),%r8          ; result is unused
> 0x00007f1f210625c1:   lea    0xc(%r12,%r10,8),%r10      ; the same computation as decode, but with offset
>
>
> This PR adds a peephole optimization to remove such redundant `lea`s.
>
> ## The Issue in Detail
>
> The ideal subgraph producing redundant `lea`s, or rather redundant `decodeHeapOop_not_null`s, is `LoadN -> DecodeN -> AddP`, where both the address and base edge of the `AddP` originate from the `DecodeN`. After matching, this becomes
>
> LoadN -> decodeHeapOop_not_null -> leaP*
> ______________________________?
>
> where `leaP*` is either of `leaP8Narrow`, `leaP32Narrow`, or `leaPCompressedOopOffset` (depending on the heap location and size). Here, the base input of `leaP*` comes from the decode.
Looking at the matching code path, we find that the `leaP*` rules match both the `AddP` and the `DecodeN`, since x86 can fold this, but the following code adds the decode back as the base input to `leaP*`: > > https://github.com/openjdk/jdk/blob/c29537740efb04e061732a700582d43b1956cff4/src/hotspot/share/opto/matcher.cpp#L1894-L1897 > > On its face, this is completely unnecessary if we matched a `leaP*`, since it already computes the result of the decode, so adding the `LoadN` node as base seems like the logical choice. However, if the derived oop computed by the `leaP*` gets added to an oop map, this `DecodeN` is needed as the base for the derived oop. Because as of now, derived oops in oop maps cannot have narrow base pointers. > > This leaves us with a handful of possible solutions: > 1. implement narrow bases for derived oops in oop maps, > 2. perform some dead code elimination after we know which oops are part of oop maps, > 3. add a peephole optimization to simply remove unused `lea`s. > > Option 1 would have been ideal in the sense, that it is the earliest possible point to remove the decode, which would simplify the graph and reduce pressure on the register allocator. However, rewriting the oop map machinery to remove a... Drive by comment ;) test/micro/org/openjdk/bench/vm/compiler/x86/RedundantLeaPeephole.java line 33: > 31: @Warmup(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS) > 32: @Measurement(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS) > 33: @Fork(value = 3, jvmArgsAppend = {"-Xms1g", "-Xmx1g"}) Ha, what did you need these args for? Could be nice to have a little comment in the code. 
------------- PR Review: https://git.openjdk.org/jdk/pull/25471#pullrequestreview-2895825725 PR Review Comment: https://git.openjdk.org/jdk/pull/25471#discussion_r2126028316 From chagedorn at openjdk.org Wed Jun 4 09:02:18 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 4 Jun 2025 09:02:18 GMT Subject: RFR: 8357726: Improve C2 to recognize counted loops with multiple casts in trip counter [v2] In-Reply-To: References: <-SKyhptjFPhuOPflySOZXJloR_Vgr4sC-xB5dSQXxZU=.fd6922bc-2498-4f4e-873a-999f82cd0a1a@github.com> Message-ID: On Wed, 4 Jun 2025 08:00:07 GMT, Xiaohong Gong wrote: >> C2 compiler fails to recognize counted loops when the induction variable is constrained by multiple consecutive `CastII` nodes. >> This prevents optimizations like range check elimination, loop unrolling and auto-vectorization for these loops. Please refer >> to the detailed discussion for a related performance issue from [1]. >> >> The ideal graph of such a loop typically looks like: >> >> >> /-----------| >> | | >> | ConI | >> loop | / / >> | | / / >> \ AddI / >> RangeCheck \ / | >> | \ / | >> IfTrue Phi | >> \ | | >> RangeCheck \ | | >> \ CastII / <- Range check #1 >> | | / >> IfTrue | | >> \ | | >> CastII | <- Range check #2 >> | / >> |-------/ >> >> >> >> For a counted loop, the loop induction variable (i.e `Phi`) should be the input of `AddI` ideally. However, in above case, it is used >> by two consecutive `CastII` nodes generated by two different range check operations. Compiler should skip all such kind of `CastII` when recognizing a counted loop. >> >> This patch modifies the counted loop recognition code to iteratively uncast the loop `iv` until no `CastII` nodes remain, enabling proper counted loop recognition even when the induction variable undergoes multiple range constraint operations. >> >> Test: >> - Tested tier1, tier2, tier3, and no regressions are found. >> - An additional test case is added to verify the fix. 
>> >> Performance: >> Here is the performance gain on a NVIDIA Grace machine which is an AArch64 architecture: >> >> >> Benchmark Mode Cnt Unit Before After Gain >> CountedLoopCastIV.loop_iv_int thrpt 30 ops/s 941482.597 4389292.439 4.66 >> CountedLoopCastIV.loop_iv_long thrpt 30 ops/s 884563.232 1441485.455 1.62 >> >> >> We can also observe the similar uplift on a x86_64 machine. >> >> [1] https://github.com/openjdk/jdk/pull/25138#issuecomment-2892720654 > > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments on jtreg and jmh tests Looks good to me, too! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25539#pullrequestreview-2895901032 From epeter at openjdk.org Wed Jun 4 09:06:17 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 4 Jun 2025 09:06:17 GMT Subject: RFR: 8357726: Improve C2 to recognize counted loops with multiple casts in trip counter In-Reply-To: References: <-SKyhptjFPhuOPflySOZXJloR_Vgr4sC-xB5dSQXxZU=.fd6922bc-2498-4f4e-873a-999f82cd0a1a@github.com> <698Q9LoBFMdDFBnBVAB8FYiI0U-abyXms26RLoMv5Xc=.f21b9a25-8f64-412c-b37a-553f0a13192e@github.com> Message-ID: On Wed, 4 Jun 2025 08:04:00 GMT, Xiaohong Gong wrote: >>> Thanks for your suggestion! Sounds better to me. How about changing the title to Improve C2 to recognize counted loops with multiple casts in trip counter ? >> >> @XiaohongGong Sounds good too :) > > Hi @eme64 , I'v updated the IR test and JMH based on your comments. Could you please help review whether it's fine to you. Thanks for all your suggestion! 
>
> Following shows the performance data of the new JMH test on Grace (the performance gain is almost the same on my x64 machine):
>
> Benchmark                       Mode   Cnt  limit  Unit   Before       Error (99.9%)  After        Error (99.9%)  Gain
> CountedLoopCastIV.loop_iv_int   thrpt  30   1024   ops/s  1225620.536  39505.158362   5778120.132  4781.602088    4.71
> CountedLoopCastIV.loop_iv_int   thrpt  30   1536   ops/s  830600.832   14758.561182   3839404.338  3362.727083    4.62
> CountedLoopCastIV.loop_iv_int   thrpt  30   2048   ops/s  618114.174   36999.511727   2890853.495  416.969862     4.67
> CountedLoopCastIV.loop_iv_long  thrpt  30   1024   ops/s  1063902.078  4616.608855    1314828.963  1267.470199    1.23
> CountedLoopCastIV.loop_iv_long  thrpt  30   1536   ops/s  714538.178   630.085477     870801.472   753.347684     1.21
> CountedLoopCastIV.loop_iv_long  thrpt  30   2048   ops/s  536724.086   131.313178     652775.363   539.107806     1.21
>
>
> The error term is larger as before. But I don't think this is caused by the large variance of loop iterations. Does the new benchmark look fine to you? Thanks!

@XiaohongGong Let's please delay this until after Thursday, so that this does not go into JDK25 yet, and we have more time to fix it if something goes wrong down the line.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25539#issuecomment-2939218743

From xgong at openjdk.org Wed Jun 4 09:16:53 2025
From: xgong at openjdk.org (Xiaohong Gong)
Date: Wed, 4 Jun 2025 09:16:53 GMT
Subject: RFR: 8357726: Improve C2 to recognize counted loops with multiple casts in trip counter [v3]
In-Reply-To: <-SKyhptjFPhuOPflySOZXJloR_Vgr4sC-xB5dSQXxZU=.fd6922bc-2498-4f4e-873a-999f82cd0a1a@github.com>
References: <-SKyhptjFPhuOPflySOZXJloR_Vgr4sC-xB5dSQXxZU=.fd6922bc-2498-4f4e-873a-999f82cd0a1a@github.com>
Message-ID:

> C2 compiler fails to recognize counted loops when the induction variable is constrained by multiple consecutive `CastII` nodes.
> This prevents optimizations like range check elimination, loop unrolling and auto-vectorization for these loops.
Please refer > to the detailed discussion for a related performance issue from [1]. > > The ideal graph of such a loop typically looks like: > > > /-----------| > | | > | ConI | > loop | / / > | | / / > \ AddI / > RangeCheck \ / | > | \ / | > IfTrue Phi | > \ | | > RangeCheck \ | | > \ CastII / <- Range check #1 > | | / > IfTrue | | > \ | | > CastII | <- Range check #2 > | / > |-------/ > > > > For a counted loop, the loop induction variable (i.e `Phi`) should be the input of `AddI` ideally. However, in above case, it is used > by two consecutive `CastII` nodes generated by two different range check operations. Compiler should skip all such kind of `CastII` when recognizing a counted loop. > > This patch modifies the counted loop recognition code to iteratively uncast the loop `iv` until no `CastII` nodes remain, enabling proper counted loop recognition even when the induction variable undergoes multiple range constraint operations. > > Test: > - Tested tier1, tier2, tier3, and no regressions are found. > - An additional test case is added to verify the fix. > > Performance: > Here is the performance gain on a NVIDIA Grace machine which is an AArch64 architecture: > > > Benchmark Mode Cnt Unit Before After Gain > CountedLoopCastIV.loop_iv_int thrpt 30 ops/s 941482.597 4389292.439 4.66 > CountedLoopCastIV.loop_iv_long thrpt 30 ops/s 884563.232 1441485.455 1.62 > > > We can also observe the similar uplift on a x86_64 machine. 
>
> [1] https://github.com/openjdk/jdk/pull/25138#issuecomment-2892720654

Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision:

  Address reivew comments on IR test

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/25539/files
  - new: https://git.openjdk.org/jdk/pull/25539/files/afe6b2df..08538543

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=25539&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25539&range=01-02

  Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/25539.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/25539/head:pull/25539

PR: https://git.openjdk.org/jdk/pull/25539

From xgong at openjdk.org Wed Jun 4 09:17:00 2025
From: xgong at openjdk.org (Xiaohong Gong)
Date: Wed, 4 Jun 2025 09:17:00 GMT
Subject: RFR: 8357726: Improve C2 to recognize counted loops with multiple casts in trip counter
In-Reply-To:
References: <-SKyhptjFPhuOPflySOZXJloR_Vgr4sC-xB5dSQXxZU=.fd6922bc-2498-4f4e-873a-999f82cd0a1a@github.com> <698Q9LoBFMdDFBnBVAB8FYiI0U-abyXms26RLoMv5Xc=.f21b9a25-8f64-412c-b37a-553f0a13192e@github.com>
Message-ID:

On Wed, 4 Jun 2025 09:03:21 GMT, Emanuel Peter wrote:

> @XiaohongGong Let's please delay this until after Thursday, so that this does not go into JDK25 yet, and we have more time to fix it if something goes wrong down the line.

Sure. That makes sense to me. Thanks!

BTW, I've updated the test according to your comment. So could you please help run all the tests? Thanks again!
------------- PR Comment: https://git.openjdk.org/jdk/pull/25539#issuecomment-2939250321 From mhaessig at openjdk.org Wed Jun 4 09:18:22 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Wed, 4 Jun 2025 09:18:22 GMT Subject: RFR: 8020282: Generated code quality: redundant LEAs in the chained dereferences In-Reply-To: <-PFiiMlUghbFgg2fuU86vuEXKaexylDuk3kBdcBn9N8=.2c272bf1-10a7-4110-8919-f33ee0d491ba@github.com> References: <-PFiiMlUghbFgg2fuU86vuEXKaexylDuk3kBdcBn9N8=.2c272bf1-10a7-4110-8919-f33ee0d491ba@github.com> Message-ID: On Tue, 3 Jun 2025 17:44:07 GMT, Vladimir Kozlov wrote: >> ## Summary >> >> On x86, chained dereferences of narrow oops at a constant offset from the base oop can use a `lea` instruction to perform the address computation in one go using the `leaP8Narrow`, `leaP32Narrow`, and `leaPCompressedOopOffset` matching rules. However, the generated code contains an additional `lea` with an unused result: >> >> ; OptoAssembly >> 03d decode_heap_oop_not_null R8,R10 >> 041 leaq R10, [R12 + R10 << 3 + #12] (compressed oop addressing) ; ptr compressedoopoff32 >> >> ; x86 >> 0x00007f1f210625bd: lea (%r12,%r10,8),%r8 ; result is unused >> 0x00007f1f210625c1: lea 0xc(%r12,%r10,8),%r10 ; the same computation as decode, but with offset >> >> >> This PR adds a peephole optimization to remove such redundant `lea`s. >> >> ## The Issue in Detail >> >> The ideal subgraph producing redundant `lea`s, or rather redundant `decodeHeapOop_not_null`s, is `LoadN -> DecodeN -> AddP`, where both the address and base edge of the `AddP` originate from the `DecodeN`. After matching, this becomes >> >> LoadN -> decodeHeapOop_not_null -> leaP* >> ______________________________? >> >> where `leaP*` is either of `leaP8Narrow`, `leaP32Narrow`, or `leaPCompressedOopOffset` (depending on the heap location and size). Here, the base input of `leaP*` comes from the decode. 
Looking at the matching code path, we find that the `leaP*` rules match both the `AddP` and the `DecodeN`, since x86 can fold this, but the following code adds the decode back as the base input to `leaP*`: >> >> https://github.com/openjdk/jdk/blob/c29537740efb04e061732a700582d43b1956cff4/src/hotspot/share/opto/matcher.cpp#L1894-L1897 >> >> On its face, this is completely unnecessary if we matched a `leaP*`, since it already computes the result of the decode, so adding the `LoadN` node as base seems like the logical choice. However, if the derived oop computed by the `leaP*` gets added to an oop map, this `DecodeN` is needed as the base for the derived oop. Because as of now, derived oops in oop maps cannot have narrow base pointers. >> >> This leaves us with a handful of possible solutions: >> 1. implement narrow bases for derived oops in oop maps, >> 2. perform some dead code elimination after we know which oops are part of oop maps, >> 3. add a peephole optimization to simply remove unused `lea`s. >> >> Option 1 would have been ideal in the sense, that it is the earliest possible point to remove the decode, which would simplify the graph and reduce pressure on the regi... > > src/hotspot/cpu/x86/peephole_x86_64.cpp line 270: > >> 268: // away, this peephole can als recognize the decode as redundant and also remove the spill copy >> 269: // if that is only used by the decode. >> 270: bool Peephole::lea_remove_redundant(Block* block, int block_index, PhaseCFG* cfg_, PhaseRegAlloc* ra_, > > Why do you need `_` suffix? I don't really need them. I only matched the signature of the other peephole functions. We could remove the underline for all peepholes. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25471#discussion_r2126104773 From mhaessig at openjdk.org Wed Jun 4 09:18:23 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Wed, 4 Jun 2025 09:18:23 GMT Subject: RFR: 8020282: Generated code quality: redundant LEAs in the chained dereferences In-Reply-To: References: Message-ID: On Wed, 4 Jun 2025 08:39:57 GMT, Emanuel Peter wrote: >> ## Summary >> >> On x86, chained dereferences of narrow oops at a constant offset from the base oop can use a `lea` instruction to perform the address computation in one go using the `leaP8Narrow`, `leaP32Narrow`, and `leaPCompressedOopOffset` matching rules. However, the generated code contains an additional `lea` with an unused result: >> >> ; OptoAssembly >> 03d decode_heap_oop_not_null R8,R10 >> 041 leaq R10, [R12 + R10 << 3 + #12] (compressed oop addressing) ; ptr compressedoopoff32 >> >> ; x86 >> 0x00007f1f210625bd: lea (%r12,%r10,8),%r8 ; result is unused >> 0x00007f1f210625c1: lea 0xc(%r12,%r10,8),%r10 ; the same computation as decode, but with offset >> >> >> This PR adds a peephole optimization to remove such redundant `lea`s. >> >> ## The Issue in Detail >> >> The ideal subgraph producing redundant `lea`s, or rather redundant `decodeHeapOop_not_null`s, is `LoadN -> DecodeN -> AddP`, where both the address and base edge of the `AddP` originate from the `DecodeN`. After matching, this becomes >> >> LoadN -> decodeHeapOop_not_null -> leaP* >> ______________________________? >> >> where `leaP*` is either of `leaP8Narrow`, `leaP32Narrow`, or `leaPCompressedOopOffset` (depending on the heap location and size). Here, the base input of `leaP*` comes from the decode. 
Looking at the matching code path, we find that the `leaP*` rules match both the `AddP` and the `DecodeN`, since x86 can fold this, but the following code adds the decode back as the base input to `leaP*`: >> >> https://github.com/openjdk/jdk/blob/c29537740efb04e061732a700582d43b1956cff4/src/hotspot/share/opto/matcher.cpp#L1894-L1897 >> >> On its face, this is completely unnecessary if we matched a `leaP*`, since it already computes the result of the decode, so adding the `LoadN` node as base seems like the logical choice. However, if the derived oop computed by the `leaP*` gets added to an oop map, this `DecodeN` is needed as the base for the derived oop. Because as of now, derived oops in oop maps cannot have narrow base pointers. >> >> This leaves us with a handful of possible solutions: >> 1. implement narrow bases for derived oops in oop maps, >> 2. perform some dead code elimination after we know which oops are part of oop maps, >> 3. add a peephole optimization to simply remove unused `lea`s. >> >> Option 1 would have been ideal in the sense, that it is the earliest possible point to remove the decode, which would simplify the graph and reduce pressure on the regi... > > test/micro/org/openjdk/bench/vm/compiler/x86/RedundantLeaPeephole.java line 33: > >> 31: @Warmup(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS) >> 32: @Measurement(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS) >> 33: @Fork(value = 3, jvmArgsAppend = {"-Xms1g", "-Xmx1g"}) > > Ha, what did you need these args for? Could be nice to have a little comment in the code. This is what I gather to be good practice from @shipilev's [blog post about JMH benchmarks](https://shipilev.net/blog/2016/arrays-wisdom-ancients/#_benchmark). It ensures a consistent heap size across machines and runs because the `StoreN` benchmarks are sensitive to different GCs and heap layouts. But these are my first JMH benchmarks, so I appreciate any input.
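To check which heap bounds a JVM started with such arguments actually reports, a small standalone probe can help. This is only a sketch and not part of the PR: the class name `HeapSettingsProbe` and its helper methods are made up for illustration, and reading `UseCompressedOops` through `HotSpotDiagnosticMXBean` assumes a 64-bit HotSpot JVM.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of the PR: prints the heap bounds and the
// compressed oops setting that the running JVM reports about itself.
public class HeapSettingsProbe {

    // Initial heap size in bytes as reported by the VM (-1 if undefined).
    static long initialHeapBytes() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getInit();
    }

    // Maximum heap size in bytes as reported by the VM (-1 if undefined).
    static long maxHeapBytes() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getMax();
    }

    // Value of a VM flag, or "unavailable" if this JVM does not expose it.
    static String vmOption(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        if (bean == null) {
            return "unavailable"; // assumption: not a HotSpot-based JVM
        }
        try {
            return bean.getVMOption(name).getValue();
        } catch (IllegalArgumentException e) {
            return "unavailable"; // the flag does not exist in this VM build
        }
    }

    public static void main(String[] args) {
        System.out.println("initial heap bytes = " + initialHeapBytes());
        System.out.println("max heap bytes     = " + maxHeapBytes());
        System.out.println("UseCompressedOops  = " + vmOption("UseCompressedOops"));
    }
}
```

Running it as `java -Xms1g -Xmx1g HeapSettingsProbe` and again without the flags shows how much the defaults depend on the machine. Some collectors may report a maximum slightly below `-Xmx`, so the numbers are best compared between runs rather than against the flag value.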
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25471#discussion_r2126107900 From roland at openjdk.org Wed Jun 4 09:21:23 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 4 Jun 2025 09:21:23 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v10] In-Reply-To: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> Message-ID: > An `Initialize` node for an `Allocate` node is created with a memory > `Proj` of adr type raw memory. In order for stores to be captured, the > memory state out of the allocation is a `MergeMem` with slices for the > various object fields/array elements set to the raw memory `Proj` of > the `Initialize` node. If `Phi`s need to be created during later > transformations from this memory state, the `Phi` for a particular > slice gets its adr type from the type of the `Proj` which is raw > memory. If during macro expansion, the `Allocate` is found to have no > use and so can be removed, the `Proj` out of the `Initialize` is > replaced by the memory state on input to the `Allocate`. A `Phi` for > some slice for a field of an object will end up with the raw memory > state on input to the `Allocate` node. As a result, memory state at > the `Phi` is incorrect and incorrect execution can happen. > > The fix I propose is, rather than have a single `Proj` for the memory > state out of the `Initialize` with adr type raw memory, to use one > `Proj` per slice added to the memory state after the `Initialize`. Each > of the `Proj` should return the right adr type for its slice. For that > I propose having a new type of `Proj`: `NarrowMemProj` that captures > the right adr type.
> > Logic for the construction of the `Allocate`/`Initialize` subgraph is > tweaked so the right adr type captured in its own `NarrowMemProj` is > added to the memory subgraph. Code that removes an allocation or moves > it also has to be changed so it correctly takes the multiple memory > projections out of the `Initialize` node into account. > > One tricky issue is that when EA splits types for a scalar replaceable > `Allocate` node: > > 1- the adr type captured in the `NarrowMemProj` becomes out of sync > with the type of the slices for the allocation > > 2- before EA, the memory state for one particular field out of the > `Initialize` node can be used for a `Store` to the just allocated > object or some other. So we can have a chain of `Store`s, some to > the newly allocated object, some to some other objects, all of them > using the state of `NarrowMemProj` out of the `Initialize`. After > split unique types, the `NarrowMemProj` is for the slice of a > particular allocation. So `Store`s to some other objects shouldn't > use that memory state but the memory state before the `Allocate`. > > For that, I added logic to update the adr type of `NarrowMemProj` > during split unique types and update the memory input of `Store`s that > don't depend on the memory state ... Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 39 additional commits since the last revision: - more - lambda return - lambda clean up - Merge branch 'master' into JDK-8327963 - Update src/hotspot/share/opto/library_call.cpp Co-authored-by: Emanuel Peter - review - new test tweak - new test - Merge branch 'master' into JDK-8327963 - Merge branch 'master' into JDK-8327963 - ...
and 29 more: https://git.openjdk.org/jdk/compare/3f54e74e...69c6e50b ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24570/files - new: https://git.openjdk.org/jdk/pull/24570/files/c0a8ad21..69c6e50b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24570&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24570&range=08-09 Stats: 98759 lines in 1463 files changed: 60579 ins; 25167 del; 13013 mod Patch: https://git.openjdk.org/jdk/pull/24570.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24570/head:pull/24570 PR: https://git.openjdk.org/jdk/pull/24570 From shade at openjdk.org Wed Jun 4 09:25:17 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 4 Jun 2025 09:25:17 GMT Subject: RFR: 8020282: Generated code quality: redundant LEAs in the chained dereferences In-Reply-To: References: Message-ID: On Wed, 4 Jun 2025 09:15:12 GMT, Manuel Hässig wrote: >> test/micro/org/openjdk/bench/vm/compiler/x86/RedundantLeaPeephole.java line 33: >> >>> 31: @Warmup(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS) >>> 32: @Measurement(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS) >>> 33: @Fork(value = 3, jvmArgsAppend = {"-Xms1g", "-Xmx1g"}) >> >> Ha, what did you need these args for? Could be nice to have a little comment in the code. > > This is what I gather to be good practice from @shipilev's [blog post about JMH benchmarks](https://shipilev.net/blog/2016/arrays-wisdom-ancients/#_benchmark). It ensures a consistent heap size across machines and runs because the `StoreN` benchmarks are sensitive to different GCs and heap layouts. But these are my first JMH benchmarks, so I appreciate any input. Yes, exactly. We often do this for allocation-heavy benchmarks to put GC in more consistent conditions. GC often tries to decide whether to do a GC cycle or expand the heap, and this decision might change with minor externalities. So it sometimes contributes to run-to-run and intra-run variance.
This benchmark does allocate pretty hard, so setting a heap size makes sense. (Additionally, this forces a selection of a particular compressed oops mode, 32-bit in this case, which is also nice for reproducibility.) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25471#discussion_r2126122739 From roland at openjdk.org Wed Jun 4 09:26:19 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 4 Jun 2025 09:26:19 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v8] In-Reply-To: <4ShW7VcaJrO0v0cHwUN1vccOH8tNPlJSIh_K0W2RdS0=.14954a26-c962-41a1-9088-2e1a1bc01eb4@github.com> References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> <1gdeBnZ7YuIf9CgQW2bCXkDDBWPjUgRnickHts-fvzE=.e6e901ba-3e9f-41a2-9c68-167a879e9655@github.com> <4ShW7VcaJrO0v0cHwUN1vccOH8tNPlJSIh_K0W2RdS0=.14954a26-c962-41a1-9088-2e1a1bc01eb4@github.com> Message-ID: On Tue, 27 May 2025 09:12:18 GMT, Emanuel Peter wrote: > I was a little confused about this apply_to_proj construct. Is this something we already use, a familiar concept, the apply_to? Thanks for the pointer to the style guide section. I followed the recommendation there. I also followed your suggestion for the return value of the callback. All of your other comments should be addressed in the new commit as well. @eme64 please have another look when you find the time.
------------- PR Comment: https://git.openjdk.org/jdk/pull/24570#issuecomment-2939274746 From roland at openjdk.org Wed Jun 4 09:26:20 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 4 Jun 2025 09:26:20 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v8] In-Reply-To: References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> <1gdeBnZ7YuIf9CgQW2bCXkDDBWPjUgRnickHts-fvzE=.e6e901ba-3e9f-41a2-9c68-167a879e9655@github.com> Message-ID: On Wed, 28 May 2025 07:34:36 GMT, Roberto Castañeda Lozano wrote: > In the common case where allocations are not eliminated, matching transforms the introduced `NarrowMemProj` nodes into a sequence of redundant, raw `MemProj` nodes, see e.g. B6 here: [after-gcm.pdf](https://github.com/user-attachments/files/20477560/after-gcm.pdf). Would it be possible to clean them up during matching (or perhaps already during, or right after, macro expansion)? Thanks for looking at this, @robcasloz. I made the change you requested. > src/hotspot/share/opto/multnode.cpp line 49: > >> 47: assert((Opcode() != Op_If && Opcode() != Op_RangeCheck) || which_proj == (uint)true || which_proj == (uint)false, "must be 1 or 0"); >> 48: assert(number_of_projs(which_proj) <= 1, "only when there's a single projection"); >> 49: auto find_proj = [which_proj, this](ProjNode* proj) { > > This does not build on macosx-aarch64: > > > src/hotspot/share/opto/multnode.cpp:49:21: error: lambda capture 'which_proj' is not used [-Werror,-Wunused-lambda-capture] > auto find_proj = [which_proj, this](ProjNode* proj) { Thanks for the report. This should be fixed now.
------------- PR Comment: https://git.openjdk.org/jdk/pull/24570#issuecomment-2939277421 PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2126123810 From jbhateja at openjdk.org Wed Jun 4 09:43:44 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 4 Jun 2025 09:43:44 GMT Subject: RFR: 8357982: Fix several failing BMI tests with -XX:+UseAPX [v5] In-Reply-To: References: Message-ID: > A) Patch extends the following tests with hard-coded encoding checks for various BMI instructions to cover REX2 or extended EVEX encodings supported by APX. > > > compiler/intrinsics/bmi/verifycode/AndnTestI.java > compiler/intrinsics/bmi/verifycode/AndnTestL.java > compiler/intrinsics/bmi/verifycode/BzhiTestI2L.java > compiler/intrinsics/bmi/verifycode/LZcntTestL.java > compiler/intrinsics/bmi/verifycode/TZcntTestL.java > > > B) After integration of JDK-8349582, which added APX NDD support, AndN instruction selection patterns that expect (Xor SRC, -1) as one of its operands were not getting selected because of a lower-cost generic immediate pattern match; patch fixes this issue through strict predicate checks. > > Above tests are now passing, validations were carried out using Intel Software Development emulator. > > Kindly review and share your feedback. 
> > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Adding cost to memory patterns ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25501/files - new: https://git.openjdk.org/jdk/pull/25501/files/e332f191..a67a7d0a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25501&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25501&range=03-04 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25501.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25501/head:pull/25501 PR: https://git.openjdk.org/jdk/pull/25501 From jbhateja at openjdk.org Wed Jun 4 09:51:03 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 4 Jun 2025 09:51:03 GMT Subject: RFR: 8357982: Fix several failing BMI tests with -XX:+UseAPX [v6] In-Reply-To: References: Message-ID: > A) Patch extends the following tests with hard-coded encoding checks for various BMI instructions to cover REX2 or extended EVEX encodings supported by APX. > > > compiler/intrinsics/bmi/verifycode/AndnTestI.java > compiler/intrinsics/bmi/verifycode/AndnTestL.java > compiler/intrinsics/bmi/verifycode/BzhiTestI2L.java > compiler/intrinsics/bmi/verifycode/LZcntTestL.java > compiler/intrinsics/bmi/verifycode/TZcntTestL.java > > > B) After integration of JDK-8349582, which added APX NDD support, AndN instruction selection patterns that expect (Xor SRC, -1) as one of its operands were not getting selected because of a lower-cost generic immediate pattern match; patch fixes this issue through strict predicate checks. > > Above tests are now passing, validations were carried out using Intel Software Development emulator. > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Adding comments. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/25501/files - new: https://git.openjdk.org/jdk/pull/25501/files/a67a7d0a..38bf655e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25501&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25501&range=04-05 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25501.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25501/head:pull/25501 PR: https://git.openjdk.org/jdk/pull/25501 From jbhateja at openjdk.org Wed Jun 4 09:51:03 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 4 Jun 2025 09:51:03 GMT Subject: RFR: 8357982: Fix several failing BMI tests with -XX:+UseAPX [v2] In-Reply-To: References: <8mE0O0QjyMJMK7UWtfMiFc5ZjIxFYqVNUeu0qYbzaz8=.75e13abf-a2c9-407b-898d-1174a85a06cf@github.com> Message-ID: <7fnB8ubV2eSkP88UkrVQ6qmNZcomS5Zby6mAUukJP4Y=.c8048c62-404b-4985-a614-9637f3fd03e9@github.com> On Wed, 4 Jun 2025 06:35:50 GMT, Emanuel Peter wrote: >>> @jatin-bhateja Thanks for looking into this! >>> >>> `predicate(!UseAPX && n->in(2)->bottom_type()->is_int()->get_con() != -1);` >>> >>> The PR title seems to suggest the bug is only about -XX:+UseAPX. Why are you changing things for the case !UseAPX? >>> >>> Are these not cases like a ^ -1, which basically flips all bits. What alternative does this end up using now? >>> >>> A code comment would be helpful. >> >> We are tightening the predicate check so that under no circumstances we pick this pattern during the reduction phase of instruction selection on account of having lower cost. There is a generic pattern (xorI_rReg_imm) for all integral immediate values, and then there is a special pattern for Xor with -1 (fxorI_rReg_im1), which is needed for AndN inferencing. > > @jatin-bhateja I'll wait with testing, until someone from Intel gives this the approval. 
Feel free to ping me for that once we are there :) Hi @eme64, please initiate your test runs, we can have a second review from @sviswa7 once she is online. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25501#issuecomment-2939356138 From shade at openjdk.org Wed Jun 4 10:31:26 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 4 Jun 2025 10:31:26 GMT Subject: RFR: 8358534: Bailout in Conv2B::Ideal when type of cmp input is not supported In-Reply-To: References: <2kB23xVQDRb7YT6aMt1SbIfPwSG1ummK29A1Hs3FD0Y=.59ea7dd7-c6fd-486c-a996-f839c9a15718@github.com> <6F6gsbyRSrJ7_XHXoMh8j15Mog2DMec5DaOCVAdcdFQ=.0799c52f-275c-4303-8bd8-05f341c20ae0@github.com> Message-ID: On Wed, 4 Jun 2025 06:55:25 GMT, Tobias Hartmann wrote: >> src/hotspot/share/opto/convertnode.cpp line 86: >> >>> 84: return nullptr; >>> 85: } >>> 86: >> >> @JohnTortugo Would it not have been better to put this check inside the `else` branch? > > I agree, the `return nullptr;` should have been added below the assert in the else branch, no check required. Yes, putting `return nullptr;` into existing branch would have been cleaner. One more reason to wait for reviews! We can do a quick follow-up that sweeps this return to its better place, if you feel strongly about it. Note this would likely get backported, so being extra-clean pays off the process hassle. 
It is about a 15-minute deal for me; I am happy to do it as a penance for not coming up with it myself :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25627#discussion_r2126256450 From mhaessig at openjdk.org Wed Jun 4 10:46:17 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Wed, 4 Jun 2025 10:46:17 GMT Subject: RFR: 8020282: Generated code quality: redundant LEAs in the chained dereferences In-Reply-To: References: <-PFiiMlUghbFgg2fuU86vuEXKaexylDuk3kBdcBn9N8=.2c272bf1-10a7-4110-8919-f33ee0d491ba@github.com> Message-ID: <17Sqc6IAzWE8E8JFrFCEgpi15ox-L9_IG8cxVuyc_A8=.709fb4de-5d33-4a50-9188-e3d306ebdb23@github.com> On Wed, 4 Jun 2025 09:13:34 GMT, Manuel Hässig wrote: >> src/hotspot/cpu/x86/peephole_x86_64.cpp line 270: >> >>> 268: // away, this peephole can als recognize the decode as redundant and also remove the spill copy >>> 269: // if that is only used by the decode. >>> 270: bool Peephole::lea_remove_redundant(Block* block, int block_index, PhaseCFG* cfg_, PhaseRegAlloc* ra_, >> >> Why do you need `_` suffix? > > I don't really need them. I only matched the signature of the other peephole functions. > > We could remove the underline for all peepholes.
This probably comes from the signature in `MachNode`: https://github.com/openjdk/jdk/blob/7838321b74276e45b92c54904ea31ef70ed9e33f/src/hotspot/share/opto/machnode.hpp#L368-L369 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25471#discussion_r2126287499 From jbhateja at openjdk.org Wed Jun 4 10:52:10 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 4 Jun 2025 10:52:10 GMT Subject: RFR: 8352635: Improve inferencing of Float16 operations with constant inputs [v6] In-Reply-To: <44nVQBYgzCOB2mAB9xtAPvkUcOMJOITA2VjMdDFgm1g=.48266693-48bf-41db-8871-a7dcafe93509@github.com> References: <44nVQBYgzCOB2mAB9xtAPvkUcOMJOITA2VjMdDFgm1g=.48266693-48bf-41db-8871-a7dcafe93509@github.com> Message-ID: > This is a follow-up PR#22755 to improve Float16 operations inferencing. > > The existing scheme to detect Float16 operations for some operations is based on pattern matching which expects to receive inputs through ConvHF2F IR, this patch extends matching to accept constant floating point inputs within the Float16 value range. 
> > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with three additional commits since the last revision: - Update test/hotspot/jtreg/compiler/lib/generators/Generators.java Co-authored-by: Emanuel Peter - Update src/hotspot/share/opto/convertnode.cpp Co-authored-by: Emanuel Peter - Update src/hotspot/share/opto/convertnode.cpp Co-authored-by: Emanuel Peter ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24179/files - new: https://git.openjdk.org/jdk/pull/24179/files/4a491bef..b95c51cb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24179&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24179&range=04-05 Stats: 14 lines in 2 files changed: 2 ins; 2 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/24179.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24179/head:pull/24179 PR: https://git.openjdk.org/jdk/pull/24179 From jbhateja at openjdk.org Wed Jun 4 10:52:11 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 4 Jun 2025 10:52:11 GMT Subject: RFR: 8352635: Improve inferencing of Float16 operations with constant inputs [v5] In-Reply-To: References: <44nVQBYgzCOB2mAB9xtAPvkUcOMJOITA2VjMdDFgm1g=.48266693-48bf-41db-8871-a7dcafe93509@github.com> Message-ID: <_IdYz769mq7-kTO802umUJX7Bmaz3Ds4GWLb75lAW8I=.0394a525-2288-407e-9201-7fb6b5f92353@github.com> On Wed, 4 Jun 2025 06:17:23 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Extending tests and review resolutions > > src/hotspot/share/opto/convertnode.cpp line 294: > >> 292: // Conditions under which floating point constant can be considered for a pattern match. >> 293: // 1. Constant must lie within Float16 value range, this will ensure that >> 294: // we don't unintentially round off float constant to enforce a pattern match. > > What do you mean by `enforce a pattern match`? 
> > Are you just trying to say that we have to be careful with the pattern matching here, and we cannot just round off the float constant? Do you have an example where that rounding would lead to issues?

    import jdk.incubator.vector.*;

    public class verify_rounding {
        public static void check() {
            for (int i = 0; i < 65550; i++) {
                // Constant stays a float; only the final product is rounded to Float16.
                short post_rounding = Float.floatToFloat16(Float.float16ToFloat(Float.floatToFloat16((float)i)) * 2049.0f);
                // Both operands are rounded to Float16 before the multiplication.
                short pre_rounding = Float16.float16ToRawShortBits(Float16.multiply(Float16.valueOf((float)i), Float16.valueOf((float)2049.0f)));
                if (pre_rounding != post_rounding) {
                    System.out.println("Mismatch at val = " + (float)i);
                    System.out.println("post_rounding val = " + post_rounding);
                    System.out.println("pre_rounding val = " + pre_rounding);
                    break;
                }
            }
        }

        public static void main(String [] args) {
            check();
        }
    }

    CPROMPT>java --add-modules=jdk.incubator.vector -cp . verify_rounding
    WARNING: Using incubator modules: jdk.incubator.vector
    Mismatch at val = 3.0
    post_rounding val = 28161
    pre_rounding val = 28160

The constant 2049.0f is not exactly representable in Float16 and rounds to 2048.0, which is where the two results diverge. Since we intend to infer Float16 IR using pattern matching, it may be incorrect to transform the post_rounding pattern into the pre_rounding one.

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24179#discussion_r2126295150 From shade at openjdk.org Wed Jun 4 11:01:39 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 4 Jun 2025 11:01:39 GMT Subject: RFR: 8356000: C1/C2-only modes use 2 compiler threads on low CPU count machines [v3] In-Reply-To: References: Message-ID: On Wed, 28 May 2025 18:05:12 GMT, Aleksey Shipilev wrote: >> There is an unfortunate limitation with default tiered policy that we would have at least 2 threads on 1 CPU machine: 1 thread for C1, and 1 thread for C2. >> >> But if we select C1-only or C2-only modes, we _also_ get 2 compiler threads, for which we have no good reason. These threads would just step on each other's toes.
The fix changes the behavior for 1..3 CPU hosts in C1/C2-only configurations, by using 1 thread instead of 2 threads. The change for 1 CPU config is what we really need. The change in 2..3 CPU configs is an additional effect, but I think it is still good not to use 100%/66% of the CPUs in those configurations as well. >> >> >> $ for I in `seq 1 8`; do build/linux-x86_64-server-release/images/jdk/bin/java \ >> -XX:-TieredCompilation -XX:ActiveProcessorCount=${I} \ >> -XX:+PrintFlagsFinal 2>&1 | grep "CICompilerCount "; done >> >> # Before >> intx CICompilerCount = 2 >> intx CICompilerCount = 2 >> intx CICompilerCount = 2 >> intx CICompilerCount = 3 >> intx CICompilerCount = 3 >> intx CICompilerCount = 3 >> intx CICompilerCount = 3 >> intx CICompilerCount = 4 >> >> # After >> intx CICompilerCount = 1 >> intx CICompilerCount = 1 >> intx CICompilerCount = 1 >> intx CICompilerCount = 3 >> intx CICompilerCount = 3 >> intx CICompilerCount = 3 >> intx CICompilerCount = 3 >> intx CICompilerCount = 4 >> >> >> It is a minor bug in `CompilationPolicy::initialize`, but it gets in the way studying Leyden in tight CPU scenarios. >> >> Additional testing: >> - [x] New regression test passes with the fix, fails without it >> - [x] GHA >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - Merge branch 'master' into JDK-8356000-c1-c2-compiler-count > - Better test, patch amendments > - Merge branch 'master' into JDK-8356000-c1-c2-compiler-count > - Merge branch 'master' into JDK-8356000-c1-c2-compiler-count > - Unnecessary arch limitation > - Simplify test > - Adjust test bound > - Fix During the final pre-integration checks, I noticed that this PR still has buffer adjustments for `c2_only` case. 
That adjustment is, strictly speaking, outside the scope of this improvement. So I reverted that hunk. Tests still pass. I see [JDK-8354727](https://bugs.openjdk.org/browse/JDK-8354727) was filed to figure out what happens when we are scarce on code cache, so I would feel better leaving it to @mhaessig to mix https://github.com/openjdk/jdk/pull/24972/commits/c43b18a6681acb541be1b3bdadbd635070a2d58d into his work :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24972#issuecomment-2939570654 From shade at openjdk.org Wed Jun 4 11:01:38 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 4 Jun 2025 11:01:38 GMT Subject: RFR: 8356000: C1/C2-only modes use 2 compiler threads on low CPU count machines [v4] In-Reply-To: References: Message-ID: <2KMWJk5gHoc3t7lOHQRIyLViPdWxuQJpGsYFgma-Sic=.007d743c-8951-4719-b6b2-33dff63860e4@github.com> > There is an unfortunate limitation with default tiered policy that we would have at least 2 threads on 1 CPU machine: 1 thread for C1, and 1 thread for C2. > > But if we select C1-only or C2-only modes, we _also_ get 2 compiler threads, for which we have no good reason. These threads would just step on each other's toes. The fix changes the behavior for 1..3 CPU hosts in C1/C2-only configurations, by using 1 thread instead of 2 threads. The change for 1 CPU config is what we really need. The change in 2..3 CPU configs is an additional effect, but I think it is still good not to use 100%/66% of the CPUs in those configurations as well.
> > > $ for I in `seq 1 8`; do build/linux-x86_64-server-release/images/jdk/bin/java \ > -XX:-TieredCompilation -XX:ActiveProcessorCount=${I} \ > -XX:+PrintFlagsFinal 2>&1 | grep "CICompilerCount "; done > > # Before > intx CICompilerCount = 2 > intx CICompilerCount = 2 > intx CICompilerCount = 2 > intx CICompilerCount = 3 > intx CICompilerCount = 3 > intx CICompilerCount = 3 > intx CICompilerCount = 3 > intx CICompilerCount = 4 > > # After > intx CICompilerCount = 1 > intx CICompilerCount = 1 > intx CICompilerCount = 1 > intx CICompilerCount = 3 > intx CICompilerCount = 3 > intx CICompilerCount = 3 > intx CICompilerCount = 3 > intx CICompilerCount = 4 > > > It is a minor bug in `CompilationPolicy::initialize`, but it gets in the way studying Leyden in tight CPU scenarios. > > Additional testing: > - [x] New regression test passes with the fix, fails without it > - [x] GHA > - [x] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 10 additional commits since the last revision: - Revert buffer size change - Merge branch 'master' into JDK-8356000-c1-c2-compiler-count - Merge branch 'master' into JDK-8356000-c1-c2-compiler-count - Better test, patch amendments - Merge branch 'master' into JDK-8356000-c1-c2-compiler-count - Merge branch 'master' into JDK-8356000-c1-c2-compiler-count - Unnecessary arch limitation - Simplify test - Adjust test bound - Fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24972/files - new: https://git.openjdk.org/jdk/pull/24972/files/f8519b46..c43b18a6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24972&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24972&range=02-03 Stats: 47592 lines in 764 files changed: 24205 ins; 14405 del; 8982 mod Patch: https://git.openjdk.org/jdk/pull/24972.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24972/head:pull/24972 PR: https://git.openjdk.org/jdk/pull/24972 From zzambers at openjdk.org Wed Jun 4 11:03:17 2025 From: zzambers at openjdk.org (Zdenek Zambersky) Date: Wed, 4 Jun 2025 11:03:17 GMT Subject: RFR: 8252473: [TESTBUG] compiler tests fail with minimal VM: Unrecognized VM option [v2] In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 11:05:22 GMT, Emanuel Peter wrote: >> @eme64 I have rebased my changes on master and fixed conflicts. (caused by integration of [JDK-8350457](https://github.com/openjdk/jdk/pull/24522)) >> >> I have also updated PR description. >> (I have not changed JIRA as there is no info about fix. Should I add it there?) > >> (I have not changed JIRA as there is no info about fix. Should I add it there?) 
> > Yes please, that is generally what we should do :) @eme64 thank you for the review ------------- PR Comment: https://git.openjdk.org/jdk/pull/24262#issuecomment-2939581284 From duke at openjdk.org Wed Jun 4 11:03:18 2025 From: duke at openjdk.org (duke) Date: Wed, 4 Jun 2025 11:03:18 GMT Subject: RFR: 8252473: [TESTBUG] compiler tests fail with minimal VM: Unrecognized VM option [v3] In-Reply-To: References: Message-ID: On Wed, 28 May 2025 18:39:27 GMT, Zdenek Zambersky wrote: >> This change adds ` -XX:-IgnoreUnrecognizedVMOptions` to problematic tests (or `@requires vm.compiler2.enabled` in one case), to prevent failures `Unrecognized VM option` on client VM. > > Zdenek Zambersky has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: > > Fix of compiler tests for client VM @zzambers Your change (at version d6196a9a6c5c9797bb13e4629757aaf6d550a6b1) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24262#issuecomment-2939583470 From mhaessig at openjdk.org Wed Jun 4 11:13:35 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Wed, 4 Jun 2025 11:13:35 GMT Subject: RFR: 8020282: Generated code quality: redundant LEAs in the chained dereferences [v2] In-Reply-To: References: Message-ID: > ## Summary > > On x86, chained dereferences of narrow oops at a constant offset from the base oop can use a `lea` instruction to perform the address computation in one go using the `leaP8Narrow`, `leaP32Narrow`, and `leaPCompressedOopOffset` matching rules. 
However, the generated code contains an additional `lea` with an unused result: > > ; OptoAssembly > 03d decode_heap_oop_not_null R8,R10 > 041 leaq R10, [R12 + R10 << 3 + #12] (compressed oop addressing) ; ptr compressedoopoff32 > > ; x86 > 0x00007f1f210625bd: lea (%r12,%r10,8),%r8 ; result is unused > 0x00007f1f210625c1: lea 0xc(%r12,%r10,8),%r10 ; the same computation as decode, but with offset > > > This PR adds a peephole optimization to remove such redundant `lea`s. > > ## The Issue in Detail > > The ideal subgraph producing redundant `lea`s, or rather redundant `decodeHeapOop_not_null`s, is `LoadN -> DecodeN -> AddP`, where both the address and base edge of the `AddP` originate from the `DecodeN`. After matching, this becomes > > LoadN -> decodeHeapOop_not_null -> leaP* > ______________________________? > > where `leaP*` is either of `leaP8Narrow`, `leaP32Narrow`, or `leaPCompressedOopOffset` (depending on the heap location and size). Here, the base input of `leaP*` comes from the decode. Looking at the matching code path, we find that the `leaP*` rules match both the `AddP` and the `DecodeN`, since x86 can fold this, but the following code adds the decode back as the base input to `leaP*`: > > https://github.com/openjdk/jdk/blob/c29537740efb04e061732a700582d43b1956cff4/src/hotspot/share/opto/matcher.cpp#L1894-L1897 > > On its face, this is completely unnecessary if we matched a `leaP*`, since it already computes the result of the decode, so adding the `LoadN` node as base seems like the logical choice. However, if the derived oop computed by the `leaP*` gets added to an oop map, this `DecodeN` is needed as the base for the derived oop. Because as of now, derived oops in oop maps cannot have narrow base pointers. > > This leaves us with a handful of possible solutions: > 1. implement narrow bases for derived oops in oop maps, > 2. perform some dead code elimination after we know which oops are part of oop maps, > 3. 
add a peephole optimization to simply remove unused `lea`s. > > Option 1 would have been ideal in the sense that it is the earliest possible point to remove the decode, which would simplify the graph and reduce pressure on the register allocator. However, rewriting the oop map machinery to remove a... Manuel Hässig has updated the pull request incrementally with three additional commits since the last revision: - Add comment to benchmark as to why we fix the heap size - Add missing null chec - Fix typos ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25471/files - new: https://git.openjdk.org/jdk/pull/25471/files/2d8110b0..67afb3ca Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25471&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25471&range=00-01 Stats: 9 lines in 3 files changed: 1 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/25471.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25471/head:pull/25471 PR: https://git.openjdk.org/jdk/pull/25471 From mhaessig at openjdk.org Wed Jun 4 11:18:21 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Wed, 4 Jun 2025 11:18:21 GMT Subject: RFR: 8020282: Generated code quality: redundant LEAs in the chained dereferences [v2] In-Reply-To: <-PFiiMlUghbFgg2fuU86vuEXKaexylDuk3kBdcBn9N8=.2c272bf1-10a7-4110-8919-f33ee0d491ba@github.com> References: <-PFiiMlUghbFgg2fuU86vuEXKaexylDuk3kBdcBn9N8=.2c272bf1-10a7-4110-8919-f33ee0d491ba@github.com> Message-ID: <_J9vzkD1D2SWccO3uHIa7sdyhA8AEKg7sM0wBXeTegE=.97c69a7b-4d3d-4574-8130-54294f93e5cb@github.com> On Tue, 3 Jun 2025 17:41:57 GMT, Vladimir Kozlov wrote: >> Manuel Hässig has updated the pull request incrementally with three additional commits since the last revision: >> >> - Add comment to benchmark as to why we fix the heap size >> - Add missing null chec >> - Fix typos > > src/hotspot/cpu/x86/peephole_x86_64.cpp line 255: > >> 253: // This peephole recognizes graphs of the shape as shown above,
ensures that the result of the >> 254: // decode is only used by the derived oop and removes that decode if this is the case. Futher, >> 255: // multipe leaP*s can have the same decode as their base. This peephole will remove the decode > > Typo `multipe` Fixed in [fb728f9](https://github.com/openjdk/jdk/pull/25471/commits/fb728f925442729b111fbe2b2c3cd57e3ed659c0) > src/hotspot/cpu/x86/peephole_x86_64.cpp line 267: > >> 265: // | / \ >> 266: // leaP* MachProj (leaf) >> 267: // In this case where te common parent of the leaP* and the decode is one MemToRegSpill Copy > > Typo: `te` Fixed in [fb728f9](https://github.com/openjdk/jdk/pull/25471/commits/fb728f925442729b111fbe2b2c3cd57e3ed659c0) > src/hotspot/cpu/x86/peephole_x86_64.cpp line 268: > >> 266: // leaP* MachProj (leaf) >> 267: // In this case where te common parent of the leaP* and the decode is one MemToRegSpill Copy >> 268: // away, this peephole can als recognize the decode as redundant and also remove the spill copy > > Typo: `als` Fixed in [fb728f9](https://github.com/openjdk/jdk/pull/25471/commits/fb728f925442729b111fbe2b2c3cd57e3ed659c0) > src/hotspot/cpu/x86/peephole_x86_64.cpp line 324: > >> 322: >> 323: // Ensure the MachProj is in the same block as the decode and the lea. >> 324: if (!block->contains(proj)) { > > Should we check `proj == nullptr` ? Indeed, we should. Thank you for pointing it out. I fixed it in [bf75c0d](https://github.com/openjdk/jdk/pull/25471/commits/bf75c0da751def40149b5548fd0c89595318fc11). 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25471#discussion_r2126340673 PR Review Comment: https://git.openjdk.org/jdk/pull/25471#discussion_r2126341088 PR Review Comment: https://git.openjdk.org/jdk/pull/25471#discussion_r2126341424 PR Review Comment: https://git.openjdk.org/jdk/pull/25471#discussion_r2126342950 From mhaessig at openjdk.org Wed Jun 4 11:18:23 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Wed, 4 Jun 2025 11:18:23 GMT Subject: RFR: 8020282: Generated code quality: redundant LEAs in the chained dereferences [v2] In-Reply-To: References: Message-ID: On Wed, 4 Jun 2025 09:22:24 GMT, Aleksey Shipilev wrote: >> This is what I gather to be good practice from @shipilev's [blog post about JMH benchmarks](https://shipilev.net/blog/2016/arrays-wisdom-ancients/#_benchmark). It ensures a consistent heap size across machines and runs because the `StoreN` benchmarks are sensitive to different GCs and heap layouts. But these are my first JMH benchmarks, so I appreciate any input. > > Yes, exactly. We often do this for allocation-heavy benchmarks to put GC in more consistent conditions. GC often tries to decide whether to do a GC cycle or expand the heap, and this decision might change with minor externalities. So it sometimes contributes to run-to-run and intra-run variance. This benchmark does allocate pretty hard, so setting a heap size makes sense. > > (Additionally, this forces a selection of a particular compressed oops mode, 32-bit in this case, which is also nice for reproducibility.) Added a comment in [67afb3c](https://github.com/openjdk/jdk/pull/25471/commits/67afb3ca570a5d0edc7b157f3403fde9d8829f2f) to explain this.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25471#discussion_r2126340154 From mhaessig at openjdk.org Wed Jun 4 11:26:21 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Wed, 4 Jun 2025 11:26:21 GMT Subject: RFR: 8356000: C1/C2-only modes use 2 compiler threads on low CPU count machines [v4] In-Reply-To: <2KMWJk5gHoc3t7lOHQRIyLViPdWxuQJpGsYFgma-Sic=.007d743c-8951-4719-b6b2-33dff63860e4@github.com> References: <2KMWJk5gHoc3t7lOHQRIyLViPdWxuQJpGsYFgma-Sic=.007d743c-8951-4719-b6b2-33dff63860e4@github.com> Message-ID: On Wed, 4 Jun 2025 11:01:38 GMT, Aleksey Shipilev wrote: >> There is an unfortunate limitation with default tiered policy that we would have at least 2 threads on 1 CPU machine: 1 thread for C1, and 1 thread for C2. >> >> But if we select C1-only or C2-only modes, we _also_ get 2 compiler threads, for which we have no good reason. These threads would just step on each other toes. The fix changes the behavior for 1..3 CPU hosts in C1/C2-only configurations, by using 1 thread instead of 2 threads. The change for 1 CPU config is what we really need. The change in 2..3 CPU configs is an additional effect, but I think it is still good not to use 100%/66% of the CPUs in those configurations as well. 
>> >> >> $ for I in `seq 1 8`; do build/linux-x86_64-server-release/images/jdk/bin/java \ >> -XX:-TieredCompilation -XX:ActiveProcessorCount=${I} \ >> -XX:+PrintFlagsFinal 2>&1 | grep "CICompilerCount "; done >> >> # Before >> intx CICompilerCount = 2 >> intx CICompilerCount = 2 >> intx CICompilerCount = 2 >> intx CICompilerCount = 3 >> intx CICompilerCount = 3 >> intx CICompilerCount = 3 >> intx CICompilerCount = 3 >> intx CICompilerCount = 4 >> >> # After >> intx CICompilerCount = 1 >> intx CICompilerCount = 1 >> intx CICompilerCount = 1 >> intx CICompilerCount = 3 >> intx CICompilerCount = 3 >> intx CICompilerCount = 3 >> intx CICompilerCount = 3 >> intx CICompilerCount = 4 >> >> >> It is a minor bug in `CompilationPolicy::initialize`, but it gets in the way studying Leyden in tight CPU scenarios. >> >> Additional testing: >> - [x] New regression test passes with the fix, fails without it >> - [x] GHA >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: > > - Revert buffer size change > - Merge branch 'master' into JDK-8356000-c1-c2-compiler-count > - Merge branch 'master' into JDK-8356000-c1-c2-compiler-count > - Better test, patch amendments > - Merge branch 'master' into JDK-8356000-c1-c2-compiler-count > - Merge branch 'master' into JDK-8356000-c1-c2-compiler-count > - Unnecessary arch limitation > - Simplify test > - Adjust test bound > - Fix > I see [JDK-8354727](https://bugs.openjdk.org/browse/JDK-8354727) was filed to figure out what happens when we are scarce on code cache, so I would feel better to put it on @mhaessig to mix https://github.com/openjdk/jdk/commit/c43b18a6681acb541be1b3bdadbd635070a2d58d into his work :) Will do. Thanks for the heads up! 
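For readers who want to repeat the quoted `CICompilerCount` measurement from Java rather than from a shell loop, here is a minimal sketch. It assumes a `java` launcher is on the `PATH` (or is passed as the first argument) and reuses only the flags shown in the quoted command, plus `-version` so the child JVM exits cleanly. The class and method names (`CompilerCountProbe`, `parseCompilerCount`, `probe`) are made up for this illustration and are not part of the PR.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.List;
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the PR: re-runs the quoted experiment.
public class CompilerCountProbe {
    // Matches a PrintFlagsFinal line such as "intx CICompilerCount = 3 {product}".
    // Requiring "=" right after the name skips the related CICompilerCountPerCPU flag.
    private static final Pattern COUNT =
        Pattern.compile("\\bCICompilerCount\\s*=\\s*(\\d+)");

    // Extracts the flag value from one output line, if the line carries it.
    static OptionalInt parseCompilerCount(String line) {
        Matcher m = COUNT.matcher(line);
        return m.find() ? OptionalInt.of(Integer.parseInt(m.group(1)))
                        : OptionalInt.empty();
    }

    // Starts a child JVM with the flags from the quoted command and scans its output.
    static OptionalInt probe(String javaBinary, int cpus) throws Exception {
        List<String> cmd = List.of(javaBinary,
            "-XX:-TieredCompilation",
            "-XX:ActiveProcessorCount=" + cpus,
            "-XX:+PrintFlagsFinal",
            "-version");
        Process p = new ProcessBuilder(cmd).redirectErrorStream(true).start();
        OptionalInt result = OptionalInt.empty();
        try (BufferedReader r =
                 new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                OptionalInt c = parseCompilerCount(line);
                if (c.isPresent()) {
                    result = c;
                }
            }
        }
        p.waitFor();
        return result;
    }

    public static void main(String[] args) throws Exception {
        String javaBinary = args.length > 0 ? args[0] : "java";
        for (int cpus = 1; cpus <= 8; cpus++) {
            System.out.println(cpus + " CPU(s): CICompilerCount = " + probe(javaBinary, cpus));
        }
    }
}
```

The values printed depend on the JDK build being probed, so no expected output is given here; the "Before" and "After" tables quoted above are the reference.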
------------- PR Comment: https://git.openjdk.org/jdk/pull/24972#issuecomment-2939648782 From epeter at openjdk.org Wed Jun 4 11:45:18 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 4 Jun 2025 11:45:18 GMT Subject: RFR: 8252473: [TESTBUG] compiler tests fail with minimal VM: Unrecognized VM option [v2] In-Reply-To: References: Message-ID: <7A34dFa6iEzgYMzuQgRjl9LP5nbWZOfUe-42_i25wYI=.73856c54-c66e-4a3c-9451-d4f293fa9e61@github.com> On Wed, 4 Jun 2025 10:59:57 GMT, Zdenek Zambersky wrote: >>> (I have not changed JIRA as there is no info about fix. Should I add it there?) >> >> Yes please, that is generally what we should do :) > > @eme64 thank you for the review @zzambers Before we integrate, I'd like to hear if @vnkozlov agrees with the changes too! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24262#issuecomment-2939707225 From epeter at openjdk.org Wed Jun 4 12:27:37 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 4 Jun 2025 12:27:37 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v81] In-Reply-To: References: Message-ID: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. 
This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. > > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Apply suggestions from code review Co-authored-by: Roberto Castañeda Lozano ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/72923879..0ec0949a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=80 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=79-80 Stats: 6 lines in 2 files changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From rcastanedalo at openjdk.org Wed Jun 4 12:28:50 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 4 Jun 2025 12:28:50 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v80] In-Reply-To: References: Message-ID: On Tue, 3 Jun 2025 15:57:32 GMT, Emanuel Peter wrote: >> **Goal** >> We want to generate Java source code: >> - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. >> - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). >> >> Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). >> >> **How to get started** >> When reviewing, please start by looking at: >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 >> >> We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String.
And then that can be compiled and executed with the CompileFramework. >> >> Second, look at this advanced test: >> https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 >> >> And then for a "tutorial", look at: >> `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` >> >> It shows these features: >> - The `body` of a Template is essentially a list of `Token`s that are concatenated. >> - Templates can be nested: a `TemplateWithArgs` is also a `Token`. >> - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. >> - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. >> - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. >> - The use of recursive templates, and `fuel` to limit the recursion. >> - `Name`s: useful to register field and variable names in code scopes. >> >> Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 >> >> For a better experience, you may want... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > fix whitespaces from applied suggestion Looks good, I just have a few language suggestions. Thanks for driving this Emanuel! test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 248: > 246: * // The count is still not different to "c1". 
> 247: * let("c3", dataNames(MUTABLE).exactOf(someType).count()), > 248: * // We nest a Template. This creats a TemplateToken, which is later evaluated. Suggestion: * // We nest a Template. This creates a TemplateToken, which is later evaluated. test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 552: > 550: * > 551: *

> 552: * Here an example with template arguments {@code 'a'} and {@code 'b'}, captured once as string names Suggestion: * Here is an example with template arguments {@code 'a'} and {@code 'b'}, captured once as string names test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 596: > 594: /** > 595: * Creates a {@link TemplateBody} from a list of tokens, which can be {@link String}s, > 596: * boxed primitive types (e.g. {@link Integer} or auto-boxed {@code int}), any {@link Token}, For correct javadoc description generation in `Template.html`: Suggestion: * boxed primitive types (for example {@link Integer} or auto-boxed {@code int}), any {@link Token}, test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 627: > 625: * > 626: *

> 627: * Here an example where a Template creates a local variable {@code 'var'}, Suggestion: * Here is an example where a Template creates a local variable {@code 'var'}, test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 277: > 275: // Fourth Template use with template2, no use of dollar, so > 276: // no "_4" shows up in the generated code. Internally, it > 277: // calls template1, shich is the fifth Template use, with Suggestion: // calls template1, which is the fifth Template use, with test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 832: > 830: > 831: // Having defined these helper methods, let us start with the first example. > 832: // You should start reading this example bottum-up, starting at Suggestion: // You should start reading this example bottom-up, starting at ------------- Marked as reviewed by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24217#pullrequestreview-2896428767 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2126413316 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2126436853 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2126427705 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2126435300 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2126407483 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2126410019 From epeter at openjdk.org Wed Jun 4 12:28:58 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 4 Jun 2025 12:28:58 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v16] In-Reply-To: References: Message-ID: On Tue, 20 May 2025 17:06:34 GMT, Roberto Castañeda Lozano wrote: >> A few more documentation suggestions, will continue reviewing this changeset over the next days. > >> @robcasloz I addressed all your comments :) > > Thanks @eme64!
@robcasloz Thanks for making another pass and for the suggestions! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24217#issuecomment-2939808487 From epeter at openjdk.org Wed Jun 4 12:29:09 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 4 Jun 2025 12:29:09 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v61] In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 12:14:48 GMT, Christian Hagedorn wrote: >> Thanks for all the updates and discussions! I've worked my way through the documentation in `Template` and the examples again in some more detail. It's much better and the new explanations are well done, excellent work! >> >> I left some comments here and there but mostly minor things. I will have another look at the implementation - probably only finished by Monday. The design now looks great. I'm glad we could find a good solution now after some more iterations :-) > >> @chhagedorn Alright, I now have a decent solution for `$$var` and `$1var` etc. I also added tests for it. >> >> These are issues we could continue the conversation, unless you are satisfied with my answers: [#24217 (comment)](https://github.com/openjdk/jdk/pull/24217#discussion_r2115388737) [#24217 (comment)](https://github.com/openjdk/jdk/pull/24217#discussion_r2115406391) >> >> This is now ready for another review pass > > Awesome, thanks for spending some more time with these nasty edge-cases and finding a solution! I had a look at your updates for all my comments, they look good, thanks! > > I'm going to make a pass over the implementation classes now and will have a look at the `Renderer` updates as well :-) @chhagedorn @robcasloz @mhaessig Thanks a lot for all the time you invested to see this through, I know it took a lot of effort to review this, so I am very thankful
------------- PR Comment: https://git.openjdk.org/jdk/pull/24217#issuecomment-2939819766 From roland at openjdk.org Wed Jun 4 12:37:33 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 4 Jun 2025 12:37:33 GMT Subject: RFR: 8351889: C2 crash: assertion failed: Base pointers must match (addp 344) In-Reply-To: References: Message-ID: On Wed, 4 Jun 2025 07:25:54 GMT, Emanuel Peter wrote: > Then what about just the dump of the relevant IR nodes in text form? That is what I meant by `full IR snippets` ;) Are the omitted inputs to `AddP`s that you'd like to see? Anything else? Do you want to see them added to: /-> CastPP#110 Store#195 -> Phi#360 -> AddP#133 -> AddP#134 -> CastPP#110 -> AddP#277 -> AddP#278 -> CastPP#283 -> CastPP#283 ? > Is there any (reasonable) way to push the `CastPP` through the `AddP` here? I guess that may mean duplicating some `AddP` in some cases... But it could also give an opportunity for the `CastPP` to common further up that way. What do you think? It is hard for me to see through it without looking at some examples of the IR. That's not where C2 expects the `CastPP`s to be so I suppose it could be quite disruptive but hard for me to tell how much. Beyond that, wouldn't we need to know if one `CastPP` dominates the other `CastPP` before we can common them and would have the same issue we have here? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25386#issuecomment-2939858255 From mhaessig at openjdk.org Wed Jun 4 12:37:42 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Wed, 4 Jun 2025 12:37:42 GMT Subject: RFR: 8356000: C1/C2-only modes use 2 compiler threads on low CPU count machines [v4] In-Reply-To: <2KMWJk5gHoc3t7lOHQRIyLViPdWxuQJpGsYFgma-Sic=.007d743c-8951-4719-b6b2-33dff63860e4@github.com> References: <2KMWJk5gHoc3t7lOHQRIyLViPdWxuQJpGsYFgma-Sic=.007d743c-8951-4719-b6b2-33dff63860e4@github.com> Message-ID: On Wed, 4 Jun 2025 11:01:38 GMT, Aleksey Shipilev wrote: >> There is an unfortunate limitation with default tiered policy that we would have at least 2 threads on 1 CPU machine: 1 thread for C1, and 1 thread for C2. >> >> But if we select C1-only or C2-only modes, we _also_ get 2 compiler threads, for which we have no good reason. These threads would just step on each other toes. The fix changes the behavior for 1..3 CPU hosts in C1/C2-only configurations, by using 1 thread instead of 2 threads. The change for 1 CPU config is what we really need. The change in 2..3 CPU configs is an additional effect, but I think it is still good not to use 100%/66% of the CPUs in those configurations as well. 
>> >> >> $ for I in `seq 1 8`; do build/linux-x86_64-server-release/images/jdk/bin/java \ >> -XX:-TieredCompilation -XX:ActiveProcessorCount=${I} \ >> -XX:+PrintFlagsFinal 2>&1 | grep "CICompilerCount "; done >> >> # Before >> intx CICompilerCount = 2 >> intx CICompilerCount = 2 >> intx CICompilerCount = 2 >> intx CICompilerCount = 3 >> intx CICompilerCount = 3 >> intx CICompilerCount = 3 >> intx CICompilerCount = 3 >> intx CICompilerCount = 4 >> >> # After >> intx CICompilerCount = 1 >> intx CICompilerCount = 1 >> intx CICompilerCount = 1 >> intx CICompilerCount = 3 >> intx CICompilerCount = 3 >> intx CICompilerCount = 3 >> intx CICompilerCount = 3 >> intx CICompilerCount = 4 >> >> >> It is a minor bug in `CompilationPolicy::initialize`, but it gets in the way studying Leyden in tight CPU scenarios. >> >> Additional testing: >> - [x] New regression test passes with the fix, fails without it >> - [x] GHA >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: > > - Revert buffer size change > - Merge branch 'master' into JDK-8356000-c1-c2-compiler-count > - Merge branch 'master' into JDK-8356000-c1-c2-compiler-count > - Better test, patch amendments > - Merge branch 'master' into JDK-8356000-c1-c2-compiler-count > - Merge branch 'master' into JDK-8356000-c1-c2-compiler-count > - Unnecessary arch limitation > - Simplify test > - Adjust test bound > - Fix test/hotspot/jtreg/compiler/arguments/TestCompilerCounts.java line 2: > 1: /* > 2: * Copyright Amazon.com Inc. or its affiliates. All Rights Reserved. IANAL, but shouldn't this include the year? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24972#discussion_r2126479501 From mhaessig at openjdk.org Wed Jun 4 12:38:26 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Wed, 4 Jun 2025 12:38:26 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v81] In-Reply-To: References: Message-ID: On Wed, 4 Jun 2025 12:27:37 GMT, Emanuel Peter wrote: >> **Goal** >> We want to generate Java source code: >> - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. >> - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). >> >> Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). >> >> **How to get started** >> When reviewing, please start by looking at: >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 >> >> We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. >> >> Second, look at this advanced test: >> https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 >> >> And then for a "tutorial", look at: >> `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` >> >> It shows these features: >> - The `body` of a Template is essentially a list of `Token`s that are concatenated. >> - Templates can be nested: a `TemplateWithArgs` is also a `Token`. 
>> - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. >> - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. >> - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. >> - The use of recursive templates, and `fuel` to limit the recursion. >> - `Name`s: useful to register field and variable names in code scopes. >> >> Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 >> >> For a better experience, you may want... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > Apply suggestions from code review > > Co-authored-by: Roberto Castañeda Lozano Marked as reviewed by mhaessig (Author). ------------- PR Review: https://git.openjdk.org/jdk/pull/24217#pullrequestreview-2896547360 From epeter at openjdk.org Wed Jun 4 12:38:27 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 4 Jun 2025 12:38:27 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v81] In-Reply-To: References: Message-ID: On Wed, 4 Jun 2025 12:27:37 GMT, Emanuel Peter wrote: >> **Goal** >> We want to generate Java source code: >> - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. >> - Enable the generation of domain specific fuzzers (e.g. random expressions and statements).
>> >> Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). >> >> **How to get started** >> When reviewing, please start by looking at: >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 >> >> We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. >> >> Second, look at this advanced test: >> https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 >> >> And then for a "tutorial", look at: >> `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` >> >> It shows these features: >> - The `body` of a Template is essentially a list of `Token`s that are concatenated. >> - Templates can be nested: a `TemplateWithArgs` is also a `Token`. >> - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. >> - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. >> - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. >> - The use of recursive templates, and `fuel` to limit the recursion. >> - `Name`s: useful to register field and variable names in code scopes. >> >> Next, look at the documentation in. 
This file is the heart of the Template Framework, and describes all the important features. >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 >> >> For a better experience, you may want... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > Apply suggestions from code review > > Co-authored-by: Roberto Castañeda Lozano I want to thank all the contributors here again before integration: @TobiHartmann For the early experiments and getting this project started. @tobiasholenstein For taking on the project, and mentoring Said. They came up with the `$` idea for de-duplication of variable names. @theoweidmannoracle For the fantastic idea using Generics and making it more functional. His prototype is what I then ran with. @chhagedorn We had a lot of enlightening conversations, he was the one who invested the most effort reviewing this patch. He pushed me to improve a lot of API parts, and I learned a lot from his efforts on the Testing Framework, especially that even such frameworks should be tested thoroughly. @robcasloz Also invested a lot of time, and played with it hands on. One of his ideas was to allow the curly brackets for `${name}` and `#{name}`. @mhaessig Even though he only just joined us, he already jumped on board quickly and is already reviewing, just fantastic!
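As an aside on the `$` de-duplication idea credited above: the quoted PR description says `$var` becomes e.g. `var_7` in one template use and `var_42` in another. The following is only a toy sketch of that renaming with a regular expression, assuming each template use is handed a numeric id. It is not the framework's `Renderer`, the names `DollarRenameSketch` and `rename` are invented for this illustration, and it deliberately ignores the `$$var` and `$1var` edge cases discussed earlier in the thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy illustration only; the real Template Framework works differently.
public class DollarRenameSketch {
    // Matches "$name" or "${name}", where name is a run of word characters.
    private static final Pattern DOLLAR =
        Pattern.compile("\\$(?:\\{(\\w+)\\}|(\\w+))");

    // Replaces every $name / ${name} in the body with name_<useId>.
    static String rename(String body, int useId) {
        Matcher m = DOLLAR.matcher(body);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            // Group 1 is set for the "${name}" form, group 2 for the "$name" form.
            String name = m.group(1) != null ? m.group(1) : m.group(2);
            m.appendReplacement(out, Matcher.quoteReplacement(name + "_" + useId));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String body = "int $var = 0; $var++;";
        System.out.println(rename(body, 7));   // prints "int var_7 = 0; var_7++;"
        System.out.println(rename(body, 42));  // prints "int var_42 = 0; var_42++;"
    }
}
```

Rendering the same body twice with different ids yields two snippets whose local variables cannot collide, which is the property the `$` syntax is meant to provide.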
------------- PR Comment: https://git.openjdk.org/jdk/pull/24217#issuecomment-2939859277 From jbhateja at openjdk.org Wed Jun 4 12:39:59 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 4 Jun 2025 12:39:59 GMT Subject: RFR: 8352635: Improve inferencing of Float16 operations with constant inputs [v7] In-Reply-To: <44nVQBYgzCOB2mAB9xtAPvkUcOMJOITA2VjMdDFgm1g=.48266693-48bf-41db-8871-a7dcafe93509@github.com> References: <44nVQBYgzCOB2mAB9xtAPvkUcOMJOITA2VjMdDFgm1g=.48266693-48bf-41db-8871-a7dcafe93509@github.com> Message-ID: > This is a follow-up PR#22755 to improve Float16 operations inferencing. > > The existing scheme to detect Float16 operations for some operations is based on pattern matching which expects to receive inputs through ConvHF2F IR, this patch extends matching to accept constant floating point inputs within the Float16 value range. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolutions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24179/files - new: https://git.openjdk.org/jdk/pull/24179/files/b95c51cb..18fb6dcb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24179&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24179&range=05-06 Stats: 31 lines in 2 files changed: 22 ins; 1 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/24179.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24179/head:pull/24179 PR: https://git.openjdk.org/jdk/pull/24179 From jbhateja at openjdk.org Wed Jun 4 12:40:00 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 4 Jun 2025 12:40:00 GMT Subject: RFR: 8352635: Improve inferencing of Float16 operations with constant inputs [v5] In-Reply-To: References: <44nVQBYgzCOB2mAB9xtAPvkUcOMJOITA2VjMdDFgm1g=.48266693-48bf-41db-8871-a7dcafe93509@github.com> Message-ID: <5-8MuNe9k6w-QscevXWn50i-ve5wte7b-QO6Js96ASc=.abc645b4-56cf-49a5-9bd0-afdd43454d99@github.com> 
On Wed, 4 Jun 2025 06:28:44 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/convertnode.cpp line 304: >> >>> 302: // expression, this downcast will still preserve significand bits of binary32 NaN. >>> 303: bool isnan = ((*reinterpret_cast(&con) & 0x7F800000) == 0x7F800000) && >>> 304: ((*reinterpret_cast(&con) & 0x7FFFFF) != 0); >> >> Why are you hand-crafting this check here? Is there not some predefined function to do this check? > > Does `g_isnan` not work here? If not, add a comment why :) Nice suggestion! Fixed. >> test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java line 63: >> >>> 61: >>> 62: private static Generator genF = G.uniformFloats(0.0f, 70000.0f); >>> 63: private static Generator genHF = G.uniformFloat16s(Float.floatToFloat16(-2000.0f), Float.floatToFloat16(2000.0f)); >> >> Is there a good reason to only take the uniform distribution? >> >> https://github.com/openjdk/jdk/blob/4a491bef6636441f14fc8bbdedf65063fce038bd/test/hotspot/jtreg/compiler/lib/generators/Generators.java#L102-L105 > > What about `NaN` and `infty` etc? There are some value transforms that are sensitive to specific value ranges, e.g. https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/subnode.cpp#L2020 https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/divnode.cpp#L897 Choosing arbitrary random values would make it tricky to put hard IR checks in place; the uniformFloat range hits the right sweet spot for us.
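For readers following the `isnan` discussion, here is a minimal sketch of what the hand-written check quoted above tests, written in Java so it can be run standalone. It is not the HotSpot code: `NanBitCheck` and `isNaNByBits` are names made up for this illustration, and `Float.isNaN` merely stands in for a predefined helper such as the `g_isnan` mentioned in the review.

```java
final class NanBitCheck {
    // Hypothetical helper mirroring the quoted masks: a binary32 value is NaN
    // when all exponent bits are set and the significand is non-zero.
    static boolean isNaNByBits(float f) {
        int bits = Float.floatToRawIntBits(f);
        boolean exponentAllOnes = (bits & 0x7F800000) == 0x7F800000;
        boolean significandNonZero = (bits & 0x007FFFFF) != 0;
        return exponentAllOnes && significandNonZero;
    }

    public static void main(String[] args) {
        float[] samples = {
            0.0f, -0.0f, 1.0f, Float.MAX_VALUE,
            Float.POSITIVE_INFINITY, Float.NEGATIVE_INFINITY, Float.NaN
        };
        for (float f : samples) {
            // The bit test and the library predicate should agree on every sample.
            System.out.println(f + ": byBits=" + isNaNByBits(f)
                    + " library=" + Float.isNaN(f));
        }
    }
}
```

Infinities share the all-ones exponent with NaN but have a zero significand, which is why the second mask is needed.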
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24179#discussion_r2126466898 PR Review Comment: https://git.openjdk.org/jdk/pull/24179#discussion_r2126466086 From jbhateja at openjdk.org Wed Jun 4 12:40:02 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 4 Jun 2025 12:40:02 GMT Subject: RFR: 8352635: Improve inferencing of Float16 operations with constant inputs [v5] In-Reply-To: References: <44nVQBYgzCOB2mAB9xtAPvkUcOMJOITA2VjMdDFgm1g=.48266693-48bf-41db-8871-a7dcafe93509@github.com> Message-ID: On Wed, 4 Jun 2025 05:57:53 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Extending tests and review resolutions > > test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java line 335: > >> 333: res += Float.floatToFloat16(POSITIVE_ZERO_VAR.floatValue() - INEXACT_FP16); >> 334: res += Float.floatToFloat16(INEXACT_FP16 * POSITIVE_ZERO_VAR.floatValue()); >> 335: res += Float.floatToFloat16(POSITIVE_ZERO_VAR.floatValue() / INEXACT_FP16); > > Why is the mul case flipped here? To check for a constant on either side of an expression. > test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java line 363: > >> 361: @Check(test="testSNaNFP16ConstantPatterns") >> 362: public void checkSNaNFP16ConstantPatterns(short actual) throws Exception { >> 363: TestFramework.deoptimize(TestFloat16ScalarOperations.class.getMethod("testSNaNFP16ConstantPatterns")); > > Oh wow, I have never seen this pattern used. Cool idea! Do you know what impact this has on test runtime? IIUC, since the entire framework is based on whitebox APIs, the @Check annotated method is only invoked once after each @Test annotated method execution, so I don't see much impact on test execution time here; we are just making sure that the expected value gets computed by the interpreter.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24179#discussion_r2126466180 PR Review Comment: https://git.openjdk.org/jdk/pull/24179#discussion_r2126466229 From chagedorn at openjdk.org Wed Jun 4 12:45:44 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 4 Jun 2025 12:45:44 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v81] In-Reply-To: References: Message-ID: On Wed, 4 Jun 2025 12:27:37 GMT, Emanuel Peter wrote: >> **Goal** >> We want to generate Java source code: >> - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. >> - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). >> >> Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). >> >> **How to get started** >> When reviewing, please start by looking at: >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 >> >> We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. >> >> Second, look at this advanced test: >> https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 >> >> And then for a "tutorial", look at: >> `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` >> >> It shows these features: >> - The `body` of a Template is essentially a list of `Token`s that are concatenated. 
>> - Templates can be nested: a `TemplateWithArgs` is also a `Token`. >> - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. >> - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. >> - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. >> - The use of recursive templates, and `fuel` to limit the recursion. >> - `Name`s: useful to register field and variable names in code scopes. >> >> Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 >> >> For a better experience, you may want... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > Apply suggestions from code review > > Co-authored-by: Roberto Castañeda Lozano Marked as reviewed by chagedorn (Reviewer).
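As a rough illustration of the `$var` renaming described in the quoted PR text, here is a standalone sketch. It does not use the Template Framework API: `DollarNameSketch`, `instantiate`, and the explicit `id` parameter are invented for this example, and the regex-based replacement is only an assumption about how such renaming could be done, not a description of the framework's implementation. It also ignores `#name` replacements and the `${name}` form.

```java
import java.util.regex.Pattern;

final class DollarNameSketch {
    // Matches "$" followed by an identifier-like name, e.g. "$var".
    private static final Pattern DOLLAR_NAME = Pattern.compile("\\$(\\w+)");

    // Hypothetical helper: rewrites every "$name" to "name_<id>", where id
    // is assumed to be unique per use of the template text.
    static String instantiate(String templateText, int id) {
        // "$1" in the replacement refers to the captured name.
        return DOLLAR_NAME.matcher(templateText).replaceAll("$1_" + id);
    }

    public static void main(String[] args) {
        String text = "int $var = 0; $var++;";
        // Two uses of the same text get distinct variable names,
        // so pasting both into one scope does not collide.
        System.out.println(instantiate(text, 7));
        System.out.println(instantiate(text, 42));
    }
}
```

With ids 7 and 42 the two results use `var_7` and `var_42` respectively, matching the example names in the description.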
------------- PR Review: https://git.openjdk.org/jdk/pull/24217#pullrequestreview-2896583145 From epeter at openjdk.org Wed Jun 4 12:46:27 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 4 Jun 2025 12:46:27 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v4] In-Reply-To: References: <96Ny_BPjRCbNlD14DNDUOuQ0IX-F8hx21gxQKVfim9M=.d502019a-27ed-4a35-81ef-bc2aec5e7557@github.com> <_IhK2U23lIUOtBKOt-WMxQ3L7b2t26RzclJRdqbIgms=.3ef9a630-f99c-4de7-994a-bcabf912230b@github.com> <9ABhENoZtR76wmsgRmzeEceDvCvoflfCcbDbK8H2rso=.e351f63f-1331-4e2e-8a02-763a8c0c4f70@github.com> Message-ID: On Wed, 4 Jun 2025 07:50:24 GMT, Kuai Wei wrote: >> @kuaiwei I'm not in a rush with this one. I'd rather we have a good design and be reasonably sure that it is correct, rather than rush it now and having to do extra cycles fixing things later ;) > > Hi @eme64 , I tried to use match pattern for `MergePrimitiveLoads::has_no_merge_load_combine_below()` . But I think it has some difficulty. For mergeable operators, they can be linked in different way, like: > 1) (((item1 Or item2) Or item3) Or item4) > 2) ((item1 Or item2) Or (item3 Or item4)) > ... > To check the next `Or` operator is a valid last one of combine operator chain. We may check its all input recursively. I didn't find a good way to revolve it. If you have better idea, I will check it. > > I think it's more easy to mark the combine operator checked. It works in this way: > * If the checking combine operator has successor combine operator , which is not checked before, we do not optimize it and let the next one has chance to be optimized. > * If we try to merge but failed, so we mark it as a `checked` and add its input into GVN worklist. So its input operators can be checked. > > I added comments of MergePrimitiveLoads::has_no_merge_load_combine_below() to describe the design. > > To reduce the memory size of `AddNode`. 
I removed the flag from `AddNode` and added 2 virtual functions > ```c++ > // Check if this node is checked by merge_memops phase > virtual bool is_merge_memops_checked() const { return false; }; > virtual void set_merge_memops_checked(bool v) { ShouldNotReachHere(); }; > ``` > > The flag, `_merge_memops_checked`, is only added in OrINode and OrLNode. > > Could you help to check the design and code? > > Thanks. @kuaiwei Thanks for your reply! > I think it's more easy to mark the combine operator checked. It may seem easier now. But over time, if multiple operations had such flags, things would become very messy. And now every node that can be such a `combine operator` has to have an additional flag, and consumes more memory. > I tried to use match pattern for MergePrimitiveLoads::has_no_merge_load_combine_below() . But I think it has some difficulty. For mergeable operators, they can be linked in different way, like: > (((item1 Or item2) Or item3) Or item4) > ((item1 Or item2) Or (item3 Or item4)) > ... Yes, we may have to deal with inputs being permuted. But I think we should be able to deal with the permutations; we do that in other places too. > To check the next Or operator is a valid last one of combine operator chain. We may check its all input recursively. I didn't find a good way to revolve it. If you have better idea, I will check it. I'm not sure I understood what you said here. > We may check its all input recursively You probably mean we could check all outputs? So if you are looking at the `OrINode`, and the pattern above it is already a `MergeLoad` pattern, then we should also look down, and see if we find other `OrINode`s. For each of these output nodes, we should check if their other input could also be merged with what we already have. Do you not think this is possible? What exactly makes it difficult or impossible?
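To make the point about differently linked `Or` chains concrete, here is a toy sketch, not C2 code. The `Node`, `Item`, and `Or` types and the `leaves` helper are hypothetical stand-ins for IR nodes, assumed only for this illustration. It shows that the left-deep and the balanced shape quoted above flatten to the same list of items, so a check that works on the flattened inputs does not need to care how the `Or`s are nested. It does not model the "look down at the outputs" part of the discussion.

```java
import java.util.ArrayList;
import java.util.List;

final class OrChainLeaves {
    interface Node {}
    record Item(String name) implements Node {}
    record Or(Node left, Node right) implements Node {}

    // Collects the non-Or inputs of an Or tree, left to right.
    static List<String> leaves(Node root) {
        List<String> out = new ArrayList<>();
        collect(root, out);
        return out;
    }

    private static void collect(Node n, List<String> out) {
        if (n instanceof Or or) {
            collect(or.left(), out);
            collect(or.right(), out);
        } else if (n instanceof Item item) {
            out.add(item.name());
        }
    }

    // (((item1 Or item2) Or item3) Or item4)
    static Node leftDeep() {
        return new Or(new Or(new Or(new Item("item1"), new Item("item2")),
                             new Item("item3")), new Item("item4"));
    }

    // ((item1 Or item2) Or (item3 Or item4))
    static Node balanced() {
        return new Or(new Or(new Item("item1"), new Item("item2")),
                      new Or(new Item("item3"), new Item("item4")));
    }

    public static void main(String[] args) {
        System.out.println(leaves(leftDeep()));
        System.out.println(leaves(balanced()));
    }
}
```

Permuted inputs would additionally need the collected items to be sorted or compared as a set; the sketch leaves that out.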
------------- PR Comment: https://git.openjdk.org/jdk/pull/24023#issuecomment-2939903492 From epeter at openjdk.org Wed Jun 4 13:20:33 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 4 Jun 2025 13:20:33 GMT Subject: Integrated: 8344942: Template-Based Testing Framework In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 08:31:36 GMT, Emanuel Peter wrote: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. > > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. 
If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... This pull request has now been integrated. 
Changeset: 248341d3 Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/248341d372ba9c1031729a65eb10d8def52de641 Stats: 6722 lines in 27 files changed: 6722 ins; 0 del; 0 mod 8344942: Template-Based Testing Framework Co-authored-by: Tobias Hartmann Co-authored-by: Tobias Holenstein Co-authored-by: Theo Weidmann Co-authored-by: Roberto Castañeda Lozano Co-authored-by: Christian Hagedorn Co-authored-by: Manuel Hässig Reviewed-by: chagedorn, mhaessig, rcastanedalo ------------- PR: https://git.openjdk.org/jdk/pull/24217 From mhaessig at openjdk.org Wed Jun 4 13:24:58 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Wed, 4 Jun 2025 13:24:58 GMT Subject: RFR: 8020282: Generated code quality: redundant LEAs in the chained dereferences [v2] In-Reply-To: <-PFiiMlUghbFgg2fuU86vuEXKaexylDuk3kBdcBn9N8=.2c272bf1-10a7-4110-8919-f33ee0d491ba@github.com> References: <-PFiiMlUghbFgg2fuU86vuEXKaexylDuk3kBdcBn9N8=.2c272bf1-10a7-4110-8919-f33ee0d491ba@github.com> Message-ID: On Tue, 3 Jun 2025 17:37:41 GMT, Vladimir Kozlov wrote: >> Manuel Hässig has updated the pull request incrementally with three additional commits since the last revision: >> >> - Add comment to benchmark as to why we fix the heap size >> - Add missing null chec >> - Fix typos > > src/hotspot/cpu/x86/peephole_x86_64.cpp line 244: > >> 242: // the DecodeN. However, after matching the DecodeN is added back as the base for the leaP*, >> 243: // which is nessecary if the oop derived by the leaP* gets added to an OopMap, because OopMaps >> 244: // cannot contain derived oops with narrow oops as a base. > > Am I correct to assume that if it is referenced in OopMap (which is side table) it will be referenced by some Safepoint node in graph? Exactly. This is why I can get away with only checking the usages of the decode.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25471#discussion_r2126592531 From yzheng at openjdk.org Wed Jun 4 13:23:58 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 4 Jun 2025 13:23:58 GMT Subject: RFR: 8357660: [JVMCI] Add support for retrieving all BootstrapMethodInvocations directly from ConstantPool [v5] In-Reply-To: References: Message-ID: On Wed, 4 Jun 2025 08:07:40 GMT, Tom Shull wrote: >> This PR adds support for directly retrieving both all invokedynamic and all condy BootstrapMethodInvocations from a ConstantPool via the new method `List lookupBootstrapMethodInvocations(boolean invokeDynamic)`. >> >> In addition, two methods are added to the BootstrapMethodInvocations: >> 1. `void resolve()` >> 2. `JavaConstant lookup()` >> >> The combination of these two features allows one to directly interact with all BSM information of a given ConstantPool without having to iterate through all of the Classfile's methods to find all invokedynamic bytecodes and/or iterate through all Constant Pool entries. > > Tom Shull has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision: > > - Merge remote-tracking branch 'origin/master' into JDK-8357660 > - Merge remote-tracking branch 'origin/master' into JDK-8357660 > - commit to trigger testing > - commit to trigger testing > - reviewer feedback and update javadoc formatting > - complete changes > - commit review suggestion > > Co-authored-by: Douglas Simon > - commit review suggestion > > Co-authored-by: Douglas Simon > - change to allow both indys and condys to be looked up all at once > - address reviewer feedback > - ... and 2 more: https://git.openjdk.org/jdk/compare/2eb99b1a...c7f5c1a7 LGTM ------------- Marked as reviewed by yzheng (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/25420#pullrequestreview-2896717439 From shade at openjdk.org Wed Jun 4 13:37:55 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 4 Jun 2025 13:37:55 GMT Subject: RFR: 8356000: C1/C2-only modes use 2 compiler threads on low CPU count machines [v4] In-Reply-To: References: <2KMWJk5gHoc3t7lOHQRIyLViPdWxuQJpGsYFgma-Sic=.007d743c-8951-4719-b6b2-33dff63860e4@github.com> Message-ID: On Wed, 4 Jun 2025 12:28:45 GMT, Manuel Hässig wrote: >> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: >> >> - Revert buffer size change >> - Merge branch 'master' into JDK-8356000-c1-c2-compiler-count >> - Merge branch 'master' into JDK-8356000-c1-c2-compiler-count >> - Better test, patch amendments >> - Merge branch 'master' into JDK-8356000-c1-c2-compiler-count >> - Merge branch 'master' into JDK-8356000-c1-c2-compiler-count >> - Unnecessary arch limitation >> - Simplify test >> - Adjust test bound >> - Fix > > test/hotspot/jtreg/compiler/arguments/TestCompilerCounts.java line 2: > >> 1: /* >> 2: * Copyright Amazon.com Inc. or its affiliates. All Rights Reserved. > > IANAL, but shouldn't this include the year? Nope, that's our standard header.
Saves us a hassle of updating the header every 365 days ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24972#discussion_r2126623649 From duke at openjdk.org Wed Jun 4 13:49:59 2025 From: duke at openjdk.org (duke) Date: Wed, 4 Jun 2025 13:49:59 GMT Subject: RFR: 8357660: [JVMCI] Add support for retrieving all BootstrapMethodInvocations directly from ConstantPool [v5] In-Reply-To: References: Message-ID: <4wnk2wbbgb3jbkjwVVvDH6JZH-EkT55XT01HhtXmVAI=.8d87c14e-770f-4a24-99af-5bff3c3fb89b@github.com> On Wed, 4 Jun 2025 08:07:40 GMT, Tom Shull wrote: >> This PR adds support for directly retrieving both all invokedynamic and all condy BootstrapMethodInvocations from a ConstantPool via the new method `List lookupBootstrapMethodInvocations(boolean invokeDynamic)`. >> >> In addition, two methods are added to the BootstrapMethodInvocations: >> 1. `void resolve()` >> 2. `JavaConstant lookup()` >> >> The combination of these two features allows one to directly interact with all BSM information of a given ConstantPool without having to iterate through all of the Classfile's methods to find all invokedynamic bytecodes and/or iterate through all Constant Pool entries. > > Tom Shull has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision: > > - Merge remote-tracking branch 'origin/master' into JDK-8357660 > - Merge remote-tracking branch 'origin/master' into JDK-8357660 > - commit to trigger testing > - commit to trigger testing > - reviewer feedback and update javadoc formatting > - complete changes > - commit review suggestion > > Co-authored-by: Douglas Simon > - commit review suggestion > > Co-authored-by: Douglas Simon > - change to allow both indys and condys to be looked up all at once > - address reviewer feedback > - ... 
and 2 more: https://git.openjdk.org/jdk/compare/865473d8...c7f5c1a7 @teshull Your change (at version c7f5c1a79a8ef8fdc7d50ee03b78ebc62b53fc83) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25420#issuecomment-2940109500 From kvn at openjdk.org Wed Jun 4 13:53:54 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 4 Jun 2025 13:53:54 GMT Subject: RFR: 8358330: AsmRemarks and DbgStrings clear() method may not get called before their destructor [v2] In-Reply-To: <6yLBtrUKBPgV63susOsKKhAPYCofyOI_Yd0wqbSqrCU=.12d4c0ca-0fa6-4000-a5e1-3ffd0f2ea6cc@github.com> References: <6yLBtrUKBPgV63susOsKKhAPYCofyOI_Yd0wqbSqrCU=.12d4c0ca-0fa6-4000-a5e1-3ffd0f2ea6cc@github.com> Message-ID: On Tue, 3 Jun 2025 15:36:22 GMT, Ashutosh Mehra wrote: >> This patch fixes a possible assert in debug builds if the allocation of memory for a CodeBlob fails when loading it from the AOT Code Cache. See description of [JDK-8358330](https://bugs.openjdk.org/browse/JDK-8358330) for more details. > > Ashutosh Mehra has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge branch 'master' into JDK-8358330 > - Address review comments > > Signed-off-by: Ashutosh Mehra > - 8358330: AsmRemarks and DbgStrings clear() method may not get called before their destructor > > Signed-off-by: Ashutosh Mehra My testing passed. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25598#pullrequestreview-2896826999 From duke at openjdk.org Wed Jun 4 13:54:00 2025 From: duke at openjdk.org (Tom Shull) Date: Wed, 4 Jun 2025 13:54:00 GMT Subject: Integrated: 8357660: [JVMCI] Add support for retrieving all BootstrapMethodInvocations directly from ConstantPool In-Reply-To: References: Message-ID: On Fri, 23 May 2025 17:37:14 GMT, Tom Shull wrote: > This PR adds support for directly retrieving both all invokedynamic and all condy BootstrapMethodInvocations from a ConstantPool via the new method `List lookupBootstrapMethodInvocations(boolean invokeDynamic)`. > > In addition, two methods are added to the BootstrapMethodInvocations: > 1. `void resolve()` > 2. `JavaConstant lookup()` > > The combination of these two features allows one to directly interact with all BSM information of a given ConstantPool without having to iterate through all of the Classfile's methods to find all invokedynamic bytecodes and/or iterate through all Constant Pool entries. This pull request has now been integrated. 
Changeset: 0352477f Author: Tom Shull Committer: Doug Simon URL: https://git.openjdk.org/jdk/commit/0352477ff5977b0010e62000adbde88026a49a7e Stats: 144 lines in 5 files changed: 132 ins; 0 del; 12 mod 8357660: [JVMCI] Add support for retrieving all BootstrapMethodInvocations directly from ConstantPool Reviewed-by: dnsimon, yzheng ------------- PR: https://git.openjdk.org/jdk/pull/25420 From kvn at openjdk.org Wed Jun 4 13:57:54 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 4 Jun 2025 13:57:54 GMT Subject: RFR: 8356000: C1/C2-only modes use 2 compiler threads on low CPU count machines [v4] In-Reply-To: <2KMWJk5gHoc3t7lOHQRIyLViPdWxuQJpGsYFgma-Sic=.007d743c-8951-4719-b6b2-33dff63860e4@github.com> References: <2KMWJk5gHoc3t7lOHQRIyLViPdWxuQJpGsYFgma-Sic=.007d743c-8951-4719-b6b2-33dff63860e4@github.com> Message-ID: On Wed, 4 Jun 2025 11:01:38 GMT, Aleksey Shipilev wrote: >> There is an unfortunate limitation with default tiered policy that we would have at least 2 threads on 1 CPU machine: 1 thread for C1, and 1 thread for C2. >> >> But if we select C1-only or C2-only modes, we _also_ get 2 compiler threads, for which we have no good reason. These threads would just step on each other toes. The fix changes the behavior for 1..3 CPU hosts in C1/C2-only configurations, by using 1 thread instead of 2 threads. The change for 1 CPU config is what we really need. The change in 2..3 CPU configs is an additional effect, but I think it is still good not to use 100%/66% of the CPUs in those configurations as well. 
>> >> >> $ for I in `seq 1 8`; do build/linux-x86_64-server-release/images/jdk/bin/java \ >> -XX:-TieredCompilation -XX:ActiveProcessorCount=${I} \ >> -XX:+PrintFlagsFinal 2>&1 | grep "CICompilerCount "; done >> >> # Before >> intx CICompilerCount = 2 >> intx CICompilerCount = 2 >> intx CICompilerCount = 2 >> intx CICompilerCount = 3 >> intx CICompilerCount = 3 >> intx CICompilerCount = 3 >> intx CICompilerCount = 3 >> intx CICompilerCount = 4 >> >> # After >> intx CICompilerCount = 1 >> intx CICompilerCount = 1 >> intx CICompilerCount = 1 >> intx CICompilerCount = 3 >> intx CICompilerCount = 3 >> intx CICompilerCount = 3 >> intx CICompilerCount = 3 >> intx CICompilerCount = 4 >> >> >> It is a minor bug in `CompilationPolicy::initialize`, but it gets in the way studying Leyden in tight CPU scenarios. >> >> Additional testing: >> - [x] New regression test passes with the fix, fails without it >> - [x] GHA >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: > > - Revert buffer size change > - Merge branch 'master' into JDK-8356000-c1-c2-compiler-count > - Merge branch 'master' into JDK-8356000-c1-c2-compiler-count > - Better test, patch amendments > - Merge branch 'master' into JDK-8356000-c1-c2-compiler-count > - Merge branch 'master' into JDK-8356000-c1-c2-compiler-count > - Unnecessary arch limitation > - Simplify test > - Adjust test bound > - Fix Re-approved ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24972#pullrequestreview-2896843574 From asmehra at openjdk.org Wed Jun 4 14:01:59 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Wed, 4 Jun 2025 14:01:59 GMT Subject: RFR: 8358330: AsmRemarks and DbgStrings clear() method may not get called before their destructor [v2] In-Reply-To: References: <6yLBtrUKBPgV63susOsKKhAPYCofyOI_Yd0wqbSqrCU=.12d4c0ca-0fa6-4000-a5e1-3ffd0f2ea6cc@github.com> Message-ID: On Wed, 4 Jun 2025 13:50:51 GMT, Vladimir Kozlov wrote: >> Ashutosh Mehra has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8358330 >> - Address review comments >> >> Signed-off-by: Ashutosh Mehra >> - 8358330: AsmRemarks and DbgStrings clear() method may not get called before their destructor >> >> Signed-off-by: Ashutosh Mehra > > My testing passed. @vnkozlov thanks for testing and reviewing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25598#issuecomment-2940155028 From kvn at openjdk.org Wed Jun 4 14:06:53 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 4 Jun 2025 14:06:53 GMT Subject: RFR: 8020282: Generated code quality: redundant LEAs in the chained dereferences [v2] In-Reply-To: References: Message-ID: On Wed, 4 Jun 2025 11:13:35 GMT, Manuel H?ssig wrote: >> ## Summary >> >> On x86, chained dereferences of narrow oops at a constant offset from the base oop can use a `lea` instruction to perform the address computation in one go using the `leaP8Narrow`, `leaP32Narrow`, and `leaPCompressedOopOffset` matching rules. 
However, the generated code contains an additional `lea` with an unused result: >> >> ; OptoAssembly >> 03d decode_heap_oop_not_null R8,R10 >> 041 leaq R10, [R12 + R10 << 3 + #12] (compressed oop addressing) ; ptr compressedoopoff32 >> >> ; x86 >> 0x00007f1f210625bd: lea (%r12,%r10,8),%r8 ; result is unused >> 0x00007f1f210625c1: lea 0xc(%r12,%r10,8),%r10 ; the same computation as decode, but with offset >> >> >> This PR adds a peephole optimization to remove such redundant `lea`s. >> >> ## The Issue in Detail >> >> The ideal subgraph producing redundant `lea`s, or rather redundant `decodeHeapOop_not_null`s, is `LoadN -> DecodeN -> AddP`, where both the address and base edge of the `AddP` originate from the `DecodeN`. After matching, this becomes >> >> LoadN -> decodeHeapOop_not_null -> leaP* >> ______________________________? >> >> where `leaP*` is either of `leaP8Narrow`, `leaP32Narrow`, or `leaPCompressedOopOffset` (depending on the heap location and size). Here, the base input of `leaP*` comes from the decode. Looking at the matching code path, we find that the `leaP*` rules match both the `AddP` and the `DecodeN`, since x86 can fold this, but the following code adds the decode back as the base input to `leaP*`: >> >> https://github.com/openjdk/jdk/blob/c29537740efb04e061732a700582d43b1956cff4/src/hotspot/share/opto/matcher.cpp#L1894-L1897 >> >> On its face, this is completely unnecessary if we matched a `leaP*`, since it already computes the result of the decode, so adding the `LoadN` node as base seems like the logical choice. However, if the derived oop computed by the `leaP*` gets added to an oop map, this `DecodeN` is needed as the base for the derived oop. Because as of now, derived oops in oop maps cannot have narrow base pointers. >> >> This leaves us with a handful of possible solutions: >> 1. implement narrow bases for derived oops in oop maps, >> 2. perform some dead code elimination after we know which oops are part of oop maps, >> 3. 
add a peephole optimization to simply remove unused `lea`s. >> >> Option 1 would have been ideal in the sense that it is the earliest possible point to remove the decode, which would simplify the graph and reduce pressure on the regi... > > Manuel Hässig has updated the pull request incrementally with three additional commits since the last revision: > > - Add comment to benchmark as to why we fix the heap size > - Add missing null chec > - Fix typos Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25471#pullrequestreview-2896878808 From rcastanedalo at openjdk.org Wed Jun 4 14:09:53 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 4 Jun 2025 14:09:53 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v8] In-Reply-To: References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> <1gdeBnZ7YuIf9CgQW2bCXkDDBWPjUgRnickHts-fvzE=.e6e901ba-3e9f-41a2-9c68-167a879e9655@github.com> Message-ID: <2m1_XtiSsW_LaBRrkX4qv7AKtLOjNgnl4mUp3zisasE=.dda62164-7aa0-4c1a-b83f-fa40ba7902e5@github.com> On Wed, 4 Jun 2025 09:23:24 GMT, Roland Westrelin wrote: > > In the common case where allocations are not eliminated, matching transforms the introduced `NarrowMemProj` nodes into a sequence of redundant, raw `MemProj` nodes, see e.g. B6 here: [after-gcm.pdf](https://github.com/user-attachments/files/20477560/after-gcm.pdf). Would it be possible to clean them up during matching (or perhaps already during, or right after, macro expansion)? > > Thanks for looking at this @robcasloz I made the change you requested. Thanks, will run some testing and come back with the results.
------------- PR Comment: https://git.openjdk.org/jdk/pull/24570#issuecomment-2940179798 From shade at openjdk.org Wed Jun 4 14:24:10 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 4 Jun 2025 14:24:10 GMT Subject: RFR: 8356000: C1/C2-only modes use 2 compiler threads on low CPU count machines [v4] In-Reply-To: <2KMWJk5gHoc3t7lOHQRIyLViPdWxuQJpGsYFgma-Sic=.007d743c-8951-4719-b6b2-33dff63860e4@github.com> References: <2KMWJk5gHoc3t7lOHQRIyLViPdWxuQJpGsYFgma-Sic=.007d743c-8951-4719-b6b2-33dff63860e4@github.com> Message-ID: On Wed, 4 Jun 2025 11:01:38 GMT, Aleksey Shipilev wrote: >> There is an unfortunate limitation with default tiered policy that we would have at least 2 threads on 1 CPU machine: 1 thread for C1, and 1 thread for C2. >> >> But if we select C1-only or C2-only modes, we _also_ get 2 compiler threads, for which we have no good reason. These threads would just step on each other toes. The fix changes the behavior for 1..3 CPU hosts in C1/C2-only configurations, by using 1 thread instead of 2 threads. The change for 1 CPU config is what we really need. The change in 2..3 CPU configs is an additional effect, but I think it is still good not to use 100%/66% of the CPUs in those configurations as well. 
>> >> >> $ for I in `seq 1 8`; do build/linux-x86_64-server-release/images/jdk/bin/java \ >> -XX:-TieredCompilation -XX:ActiveProcessorCount=${I} \ >> -XX:+PrintFlagsFinal 2>&1 | grep "CICompilerCount "; done >> >> # Before >> intx CICompilerCount = 2 >> intx CICompilerCount = 2 >> intx CICompilerCount = 2 >> intx CICompilerCount = 3 >> intx CICompilerCount = 3 >> intx CICompilerCount = 3 >> intx CICompilerCount = 3 >> intx CICompilerCount = 4 >> >> # After >> intx CICompilerCount = 1 >> intx CICompilerCount = 1 >> intx CICompilerCount = 1 >> intx CICompilerCount = 3 >> intx CICompilerCount = 3 >> intx CICompilerCount = 3 >> intx CICompilerCount = 3 >> intx CICompilerCount = 4 >> >> >> It is a minor bug in `CompilationPolicy::initialize`, but it gets in the way studying Leyden in tight CPU scenarios. >> >> Additional testing: >> - [x] New regression test passes with the fix, fails without it >> - [x] GHA >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: > > - Revert buffer size change > - Merge branch 'master' into JDK-8356000-c1-c2-compiler-count > - Merge branch 'master' into JDK-8356000-c1-c2-compiler-count > - Better test, patch amendments > - Merge branch 'master' into JDK-8356000-c1-c2-compiler-count > - Merge branch 'master' into JDK-8356000-c1-c2-compiler-count > - Unnecessary arch limitation > - Simplify test > - Adjust test bound > - Fix Excellent, thanks for reviews! Here goes. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24972#issuecomment-2940233103 From shade at openjdk.org Wed Jun 4 14:24:14 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 4 Jun 2025 14:24:14 GMT Subject: Integrated: 8356000: C1/C2-only modes use 2 compiler threads on low CPU count machines In-Reply-To: References: Message-ID: On Wed, 30 Apr 2025 19:00:23 GMT, Aleksey Shipilev wrote: > There is an unfortunate limitation with default tiered policy that we would have at least 2 threads on 1 CPU machine: 1 thread for C1, and 1 thread for C2. > > But if we select C1-only or C2-only modes, we _also_ get 2 compiler threads, for which we have no good reason. These threads would just step on each other toes. The fix changes the behavior for 1..3 CPU hosts in C1/C2-only configurations, by using 1 thread instead of 2 threads. The change for 1 CPU config is what we really need. The change in 2..3 CPU configs is an additional effect, but I think it is still good not to use 100%/66% of the CPUs in those configurations as well. > > > $ for I in `seq 1 8`; do build/linux-x86_64-server-release/images/jdk/bin/java \ > -XX:-TieredCompilation -XX:ActiveProcessorCount=${I} \ > -XX:+PrintFlagsFinal 2>&1 | grep "CICompilerCount "; done > > # Before > intx CICompilerCount = 2 > intx CICompilerCount = 2 > intx CICompilerCount = 2 > intx CICompilerCount = 3 > intx CICompilerCount = 3 > intx CICompilerCount = 3 > intx CICompilerCount = 3 > intx CICompilerCount = 4 > > # After > intx CICompilerCount = 1 > intx CICompilerCount = 1 > intx CICompilerCount = 1 > intx CICompilerCount = 3 > intx CICompilerCount = 3 > intx CICompilerCount = 3 > intx CICompilerCount = 3 > intx CICompilerCount = 4 > > > It is a minor bug in `CompilationPolicy::initialize`, but it gets in the way studying Leyden in tight CPU scenarios. 
> > Additional testing: > - [x] New regression test passes with the fix, fails without it > - [x] GHA > - [x] Linux AArch64 server fastdebug, `all` This pull request has now been integrated. Changeset: 4e314cb9 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/4e314cb9e025672b2f7b68cc021fa516ee219ad8 Stats: 185 lines in 2 files changed: 182 ins; 0 del; 3 mod 8356000: C1/C2-only modes use 2 compiler threads on low CPU count machines Reviewed-by: kvn, dfenacci, galder ------------- PR: https://git.openjdk.org/jdk/pull/24972 From rcastanedalo at openjdk.org Wed Jun 4 15:08:51 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 4 Jun 2025 15:08:51 GMT Subject: RFR: 8357822: C2: Multiple string optimization tests are no longer testing string concatenation optimizations In-Reply-To: <4GDLAMfeWjgfcGvn4sUSMT2jjG3vsebjcFeJqgHqPQw=.e7dfa9e7-4608-4304-ba00-0b254b6bf2b1@github.com> References: <4GDLAMfeWjgfcGvn4sUSMT2jjG3vsebjcFeJqgHqPQw=.e7dfa9e7-4608-4304-ba00-0b254b6bf2b1@github.com> Message-ID: On Tue, 3 Jun 2025 07:17:47 GMT, Daniel Skantz wrote: > This PR updates a few tests to reintroduce testing of string concatenation optimizations since a few bugs have recently been identified in this area. > > Selection criteria: performed a text search on the test suite and identified tests for string concatenations or string optimizations that are not currently compiled with `-XDstringConcat=inline` and are not using StringBuilders explicitly. > > Testing: T1-4. > > Extra testing: ran the tests manually with `-XX:+PrintOptimizeStringConcat` and verified that the tests are exercising string optimizations after the fix. test/hotspot/jtreg/compiler/intrinsics/string/TestStringIntrinsics.java line 1: > 1: /* Do we need to add a second run at all to this test case? As far as I can see, all `concat*` test cases use explicit string builders and already exercise C2's string concatenation optimizations.
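As background for the `-XDstringConcat=inline` point above, here is a minimal sketch of the two source forms being distinguished. Since JDK 9, javac compiles `+` concatenation to an `invokedynamic` call site by default, while `-XDstringConcat=inline` makes it emit a `StringBuilder` chain, which is the shape C2's string concatenation optimization looks for. The class and method names are hypothetical and not taken from the tests under review.

```java
// Hypothetical illustration; not code from the JDK test suite.
final class ConcatForms {
    // Compiled to invokedynamic by default javac; compiled to a
    // StringBuilder chain when javac runs with -XDstringConcat=inline.
    static String viaPlus(String s, int i) {
        return s + i;
    }

    // Explicit StringBuilder chain: the bytecode shape is the same
    // regardless of the javac concatenation strategy.
    static String viaBuilder(String s, int i) {
        return new StringBuilder().append(s).append(i).toString();
    }
}
```

Both methods return the same string; only the bytecode that javac produces for `viaPlus` depends on the flag, which is why tests written in the explicit-builder style need no extra run.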
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25610#discussion_r2126842291 From epeter at openjdk.org Wed Jun 4 16:02:12 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 4 Jun 2025 16:02:12 GMT Subject: RFR: 8358600: Template-Framework Library: Template for TestFramework test class Message-ID: We might want to write many IR/TestFramework tests, and so I would like to integrate a Template that generates the class, and the user has to only generate a list of tests. This is a first extension for https://github.com/openjdk/jdk/pull/24217. I had already prototyped it earlier and plan to use it in multiple tests https://github.com/openjdk/jdk/pull/23418 (see `IRTestClass.java`). https://github.com/openjdk/jdk/blob/11d55dc7ff2b2137700248e11492e11d2d748cab/test/hotspot/jtreg/compiler/lib/template_framework/library/TestFrameworkClass.java#L36-L44 ------------- Commit messages: - JDK-8358600 Changes: https://git.openjdk.org/jdk/pull/25643/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25643&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8358600 Stats: 271 lines in 2 files changed: 271 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25643.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25643/head:pull/25643 PR: https://git.openjdk.org/jdk/pull/25643 From sviswanathan at openjdk.org Wed Jun 4 16:12:56 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 4 Jun 2025 16:12:56 GMT Subject: RFR: 8357982: Fix several failing BMI tests with -XX:+UseAPX [v4] In-Reply-To: References: Message-ID: On Tue, 3 Jun 2025 18:07:34 GMT, Jatin Bhateja wrote: >> A) Patch extends the following tests with hard-coded encoding checks for various BMI instructions to cover REX2 or extended EVEX encodings supported by APX. 
>> >> >> compiler/intrinsics/bmi/verifycode/AndnTestI.java >> compiler/intrinsics/bmi/verifycode/AndnTestL.java >> compiler/intrinsics/bmi/verifycode/BzhiTestI2L.java >> compiler/intrinsics/bmi/verifycode/LZcntTestL.java >> compiler/intrinsics/bmi/verifycode/TZcntTestL.java >> >> >> B) After integration of JDK-8349582, which added APX NDD support, AndN instruction selection patterns that expect (Xor SRC, -1) as one of its operands were not getting selected because of a lower-cost generic immediate pattern match; patch fixes this issue through strict predicate checks. >> >> Above tests are now passing, validations were carried out using Intel Software Development emulator. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolutions src/hotspot/cpu/x86/x86_64.ad line 11341: > 11339: %{ > 11340: // Strict predicate check to make selection of xorL_rReg_im1_ndd cost agnostic if immL32 src2 is -1. > 11341: predicate(UseAPX && n->in(2)->bottom_type()->is_long()->get_con() != -1L); We need a check here for isa_long() before accessing is_long() otherwise is_long() may assert. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25501#discussion_r2125105798 From asmehra at openjdk.org Wed Jun 4 16:55:56 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Wed, 4 Jun 2025 16:55:56 GMT Subject: Integrated: 8358330: AsmRemarks and DbgStrings clear() method may not get called before their destructor In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 18:32:12 GMT, Ashutosh Mehra wrote: > This patch fixes a possible assert in debug builds if the allocation of memory for a CodeBlob fails when loading it from the AOT Code Cache. See description of [JDK-8358330](https://bugs.openjdk.org/browse/JDK-8358330) for more details. This pull request has now been integrated. 
Changeset: fd0ab043 Author: Ashutosh Mehra URL: https://git.openjdk.org/jdk/commit/fd0ab043677d103628afde628e3e75e23fb518b2 Stats: 51 lines in 5 files changed: 21 ins; 27 del; 3 mod 8358330: AsmRemarks and DbgStrings clear() method may not get called before their destructor Reviewed-by: kvn ------------- PR: https://git.openjdk.org/jdk/pull/25598 From cslucas at openjdk.org Wed Jun 4 17:41:03 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 4 Jun 2025 17:41:03 GMT Subject: RFR: 8358534: Bailout in Conv2B::Ideal when type of cmp input is not supported In-Reply-To: References: <2kB23xVQDRb7YT6aMt1SbIfPwSG1ummK29A1Hs3FD0Y=.59ea7dd7-c6fd-486c-a996-f839c9a15718@github.com> <6F6gsbyRSrJ7_XHXoMh8j15Mog2DMec5DaOCVAdcdFQ=.0799c52f-275c-4303-8bd8-05f341c20ae0@github.com> Message-ID: On Wed, 4 Jun 2025 10:25:30 GMT, Aleksey Shipilev wrote: >> I agree, the `return nullptr;` should have been added below the assert in the else branch, no check required. > > Yes, putting `return nullptr;` into existing branch would have been cleaner. One more reason to wait for reviews! We can do a quick follow-up that sweeps this return to its better place, if you feel strongly about it. Note this would likely get backported, so being extra-clean pays off the process hassle. It is about 15 minute deal for me, I am happy to do it as a penance for not coming up with it myself :) I agree that just adding the return after the assert would have been a better option. I opted for adding the `if (cmp == nullptr)` because of excess of caution. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25627#discussion_r2127115196 From jbhateja at openjdk.org Wed Jun 4 18:05:52 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 4 Jun 2025 18:05:52 GMT Subject: RFR: 8357982: Fix several failing BMI tests with -XX:+UseAPX [v4] In-Reply-To: References: Message-ID: <4RfdfDQvc_xT1eNJKUf0AiSBJEU27P8WWmnufSyi6WI=.ee7abfe6-e927-4230-a1f5-ecfedad9a630@github.com> On Tue, 3 Jun 2025 23:21:52 GMT, Sandhya Viswanathan wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments resolutions > > src/hotspot/cpu/x86/x86_64.ad line 11341: > >> 11339: %{ >> 11340: // Strict predicate check to make selection of xorL_rReg_im1_ndd cost agnostic if immL32 src2 is -1. >> 11341: predicate(UseAPX && n->in(2)->bottom_type()->is_long()->get_con() != -1L); > > We need a check here for isa_long() before accessing is_long() otherwise is_long() may assert. Matcher DFA state checks preceding predicate checks will implicitly ensure this ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25501#discussion_r2127155498 From sviswanathan at openjdk.org Wed Jun 4 21:14:59 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 4 Jun 2025 21:14:59 GMT Subject: RFR: 8357982: Fix several failing BMI tests with -XX:+UseAPX [v6] In-Reply-To: References: Message-ID: On Wed, 4 Jun 2025 09:51:03 GMT, Jatin Bhateja wrote: >> A) Patch extends the following tests with hard-coded encoding checks for various BMI instructions to cover REX2 or extended EVEX encodings supported by APX. 
>> >> >> compiler/intrinsics/bmi/verifycode/AndnTestI.java >> compiler/intrinsics/bmi/verifycode/AndnTestL.java >> compiler/intrinsics/bmi/verifycode/BzhiTestI2L.java >> compiler/intrinsics/bmi/verifycode/LZcntTestL.java >> compiler/intrinsics/bmi/verifycode/TZcntTestL.java >> >> >> B) After integration of JDK-8349582, which added APX NDD support, AndN instruction selection patterns that expect (Xor SRC, -1) as one of its operands were not getting selected because of a lower-cost generic immediate pattern match; patch fixes this issue through strict predicate checks. >> >> Above tests are now passing, validations were carried out using Intel Software Development emulator. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Adding comments. test/hotspot/jtreg/compiler/intrinsics/bmi/verifycode/BlsiTestI.java line 78: > 76: (byte) 0x00, > 77: (byte) 0xF3, > 78: (byte) 0x3}; The line 78 should be same as line 61. test/hotspot/jtreg/compiler/intrinsics/bmi/verifycode/BlsmskTestI.java line 77: > 75: (byte) 0x00, > 76: (byte) 0xF3, > 77: (byte) 0x2}; This line 77 should be same as line 60. test/hotspot/jtreg/compiler/intrinsics/bmi/verifycode/BlsrTestI.java line 78: > 76: (byte) 0x00, > 77: (byte) 0xF3, > 78: (byte) 0x1}; The line 78 should be same as line 61. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25501#discussion_r2127042159 PR Review Comment: https://git.openjdk.org/jdk/pull/25501#discussion_r2127043141 PR Review Comment: https://git.openjdk.org/jdk/pull/25501#discussion_r2127043645 From epeter at openjdk.org Thu Jun 5 06:16:47 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 5 Jun 2025 06:16:47 GMT Subject: RFR: 8358600: Template-Framework Library: Template for TestFramework test class [v2] In-Reply-To: References: Message-ID: > We might want to write many IR/TestFramework tests, and so I would like to integrate a Template that generates the class, and the user has to only generate a list of tests. > > This is a first extension for https://github.com/openjdk/jdk/pull/24217. I had already prototyped it earlier and plan to use it in multiple tests https://github.com/openjdk/jdk/pull/23418 (see `IRTestClass.java`). > > https://github.com/openjdk/jdk/blob/dc640cbd8fb8ec76920a7ab52dfe7955ed1d77f2/test/hotspot/jtreg/compiler/lib/template_framework/library/TestFrameworkClass.java#L36-L45 Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: streamline API to a single render method ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25643/files - new: https://git.openjdk.org/jdk/pull/25643/files/11d55dc7..dc640cbd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25643&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25643&range=00-01 Stats: 70 lines in 2 files changed: 22 ins; 26 del; 22 mod Patch: https://git.openjdk.org/jdk/pull/25643.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25643/head:pull/25643 PR: https://git.openjdk.org/jdk/pull/25643 From duke at openjdk.org Thu Jun 5 07:20:07 2025 From: duke at openjdk.org (Yuri Gaevsky) Date: Thu, 5 Jun 2025 07:20:07 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v8] In-Reply-To: References: 
<5e1o1xtN0ZdQZGJi2aVmgCEApW625koeE9F53VhDi5E=.2390045d-844e-4800-8d4b-075a2a3a8793@github.com> Message-ID: <8OzDPy_-fHmZXhZ2fvVjqmDWzIibOPNix2SDuwkRQbg=.8b1fb4a0-399f-4f58-af2b-5ce2a0f7bfbc@github.com> On Mon, 5 May 2025 18:10:02 GMT, Yuri Gaevsky wrote: >> Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision: >> >> change slli+add sequence to shadd > > As you can expect I am trying to implement the following code with RVV: > > for (; i + (N-1) < cnt; i += N) { > h = 31^^N * h > + 31^^(N-1) * val[i + 0] > + 31^^(N-2) * val[i + 1] > ... > + 31^^1 * val[i + (N-2)] > + 31^^0 * val[i + (N-1)]; > } > for (; i < cnt; i++) { > h = 31 * h + val[i]; > } > > where `N` is the number of array elements processed per "chunk". > IIUC, the main issue with your approach is the "reverse" order of array elements versus preloaded `31^^X` coeffs WHEN the remaining number of elems is less than `N`, say `M=N-1`. > > h = 31^^M * h > + 31^^(M-1) * val[i + 0] > + 31^^(M-2) * val[i + 1] > ... > + 31^^1 * val[i + (M-2)] > + 31^^0 * val[i + (M-1)]; > > or returning to our `N` for clarity > > h = 31^^(N-1) * h > + 31^^(N-2) * val[i + 0] > + 31^^(N-3) * val[i + 1] > ... > + 31^^1 * val[i + (N-3)] > + 31^^0 * val[i + (N-2)]; > > Now we need to "slide down" the preloaded multiplier coeffs in the designated vector register by one (as `M=N-1`) to be in "sync" with `val[i + X]` (maybe moving them into a temporary VR in the process), and moreover, DO this operation IFF the remaining `cnt` is less than `N` (==>an additional check on every iteration). That's probably acceptable only at tail phase as one-time operation but NOT inside of main loop... > @ygaevsky @RealFYang how can we proceed? My apologies, just busy at the moment with other things, going to update the patch soon.
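For reference, the chunked loop quoted above can be checked in scalar Java. This is only a sketch of the arithmetic, not the RVV implementation from the PR: `ChunkedHash`, `chunked`, and the `initial` parameter are hypothetical names, and the precomputed `pow` table stands in for the preloaded `31^^X` coefficients.

```java
// Hypothetical scalar model of the chunked polynomial hash; not PR code.
final class ChunkedHash {
    // Processes n elements per main-loop iteration, then a scalar tail.
    // Assumes n >= 1. int arithmetic wraps, exactly as in the plain loop.
    static int chunked(int[] val, int n, int initial) {
        int[] pow = new int[n + 1];          // pow[k] == 31^k (mod 2^32)
        pow[0] = 1;
        for (int k = 1; k <= n; k++) {
            pow[k] = 31 * pow[k - 1];
        }
        int h = initial;
        int i = 0;
        for (; i + (n - 1) < val.length; i += n) {
            int acc = pow[n] * h;            // 31^N * h
            for (int k = 0; k < n; k++) {
                acc += pow[n - 1 - k] * val[i + k];
            }
            h = acc;
        }
        for (; i < val.length; i++) {        // tail: fewer than n elements left
            h = 31 * h + val[i];
        }
        return h;
    }
}
```

With `initial = 1` the result should agree with `java.util.Arrays.hashCode(int[])` for any chunk size, because both compute the same polynomial modulo 2^32. The scalar tail loop sidesteps the coefficient-alignment problem discussed above, at the cost of not vectorizing the last `cnt % N` elements.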
------------- PR Comment: https://git.openjdk.org/jdk/pull/17413#issuecomment-2943036356 From jbhateja at openjdk.org Thu Jun 5 08:08:48 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 5 Jun 2025 08:08:48 GMT Subject: RFR: 8357982: Fix several failing BMI tests with -XX:+UseAPX [v7] In-Reply-To: References: Message-ID: > A) Patch extends the following tests with hard-coded encoding checks for various BMI instructions to cover REX2 or extended EVEX encodings supported by APX. > > > compiler/intrinsics/bmi/verifycode/AndnTestI.java > compiler/intrinsics/bmi/verifycode/AndnTestL.java > compiler/intrinsics/bmi/verifycode/BzhiTestI2L.java > compiler/intrinsics/bmi/verifycode/LZcntTestL.java > compiler/intrinsics/bmi/verifycode/TZcntTestL.java > > > B) After integration of JDK-8349582, which added APX NDD support, AndN instruction selection patterns that expect (Xor SRC, -1) as one of its operands were not getting selected because of a lower-cost generic immediate pattern match; patch fixes this issue through strict predicate checks. > > Above tests are now passing, validations were carried out using Intel Software Development emulator. > > Kindly review and share your feedback. 
> > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review resolutions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25501/files - new: https://git.openjdk.org/jdk/pull/25501/files/38bf655e..45db368d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25501&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25501&range=05-06 Stats: 5 lines in 3 files changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/25501.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25501/head:pull/25501 PR: https://git.openjdk.org/jdk/pull/25501 From roland at openjdk.org Thu Jun 5 08:27:47 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 5 Jun 2025 08:27:47 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v34] In-Reply-To: References: Message-ID: > To optimize a long counted loop and long range checks in a long or int > counted loop, the loop is turned into a loop nest. When the loop has > few iterations, the overhead of having an outer loop whose backedge is > never taken, has a measurable cost. Furthermore, creating the loop > nest usually causes one iteration of the loop to be peeled so > predicates can be set up. If the loop is short running, then it's an > extra iteration that's run with range checks (compared to an int > counted loop with int range checks). > > This change doesn't create a loop nest when: > > 1- it can be determined statically at loop nest creation time that the > loop runs for a short enough number of iterations > > 2- profiling reports that the loop runs for no more than ShortLoopIter > iterations (1000 by default). > > For 2-, a guard is added which is implemented as yet another predicate. 
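The loop-nest transformation and the short-running fast path described above can be pictured at the source level roughly as follows. This is a hand-written analogy, not what C2 emits: `LoopNestSketch`, `CHUNK`, and the method names are hypothetical, `SHORT_LOOP_ITER` merely mirrors the default of 1000 mentioned in the description, and the fallback call stands in for what would be a deoptimization in compiled code.

```java
// Hypothetical source-level analogy of the C2 transformation; not JDK code.
final class LoopNestSketch {
    static final int CHUNK = 1 << 20;          // assumed inner-loop span
    static final int SHORT_LOOP_ITER = 1000;   // default quoted in the PR text

    // Original shape: a long counted loop.
    static long sumPlain(long start, long stop) {
        long s = 0;
        for (long i = start; i < stop; i++) {
            s += i;
        }
        return s;
    }

    // Loop nest: a long outer loop around an int counted inner loop.
    // Assumes stop - start does not overflow a long.
    static long sumNest(long start, long stop) {
        long s = 0;
        for (long outer = start; outer < stop; ) {
            int inner = (int) Math.min((long) CHUNK, stop - outer);
            for (int j = 0; j < inner; j++) {
                s += outer + j;
            }
            outer += inner;
        }
        return s;
    }

    // Short-running variant: a guard replaces the outer loop entirely.
    static long sumShort(long start, long stop) {
        long trip = stop - start;              // same no-overflow assumption
        if (trip < 0 || trip > SHORT_LOOP_ITER) {
            return sumNest(start, stop);       // compiled code would deoptimize
        }
        long s = 0;
        for (int j = 0; j < (int) trip; j++) {
            s += start + j;
        }
        return s;
    }
}
```

In `sumShort` the guard plays the role of the new predicate: when it holds, the loop is a single int counted loop with no outer backedge and no peeled iteration.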
> > While this change is in principle simple, I ran into a few > implementation issues: > > - while c2 has a way to compute the number of iterations of an int > counted loop, it doesn't have that for long counted loops. The > existing logic for int counted loops promotes values to long to > avoid overflows. I reworked it so it now works for both long and int > counted loops. > > - I added a new deoptimization reason (Reason_short_running_loop) for > the new predicate. Given the number of iterations is narrowed down > by the predicate, the limit of the loop after transformation is a > cast node that's control dependent on the short running loop > predicate. Because once the counted loop is transformed, it is > likely that range check predicates will be inserted and they will > depend on the limit, the short running loop predicate has to be the > one that's further away from the loop entry. Now it is also possible > that the limit before transformation depends on a predicate > (TestShortRunningLongCountedLoopPredicatesClone is an example), we > can have: new predicates inserted after the transformation that > depend on the casted limit that itself depends on old predicates > added before the transformation. To solve this circular dependency, > parse and assert predicates are cloned between the old predicates > and the loop head. The cloned short running loop parse predicate is > the one that's used to insert the short running loop predicate. > > - In the case of a long counted loop, the loop is transformed into a > regular loop with a new limit and transformed range checks that's > later turned into an int counted loop. The int ... Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 94 commits: - small fix - Merge branch 'master' into JDK-8342692 - review - review - Update test/micro/org/openjdk/bench/java/lang/foreign/HeapMismatchManualLoopTest.java Co-authored-by: Christian Hagedorn - Update test/hotspot/jtreg/compiler/longcountedloops/TestShortRunningLongCountedLoopScaleOverflow.java Co-authored-by: Christian Hagedorn - Update test/hotspot/jtreg/compiler/longcountedloops/TestShortRunningLongCountedLoopPredicatesClone.java Co-authored-by: Christian Hagedorn - Update test/hotspot/jtreg/compiler/longcountedloops/TestShortRunningLongCountedLoop.java Co-authored-by: Christian Hagedorn - Update test/hotspot/jtreg/compiler/longcountedloops/TestShortRunningIntLoopWithLongChecksPredicates.java Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/loopnode.cpp Co-authored-by: Christian Hagedorn - ... and 84 more: https://git.openjdk.org/jdk/compare/faf19abd...fd19ee84 ------------- Changes: https://git.openjdk.org/jdk/pull/21630/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21630&range=33 Stats: 1618 lines in 26 files changed: 1539 ins; 22 del; 57 mod Patch: https://git.openjdk.org/jdk/pull/21630.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21630/head:pull/21630 PR: https://git.openjdk.org/jdk/pull/21630 From dskantz at openjdk.org Thu Jun 5 08:51:11 2025 From: dskantz at openjdk.org (Daniel Skantz) Date: Thu, 5 Jun 2025 08:51:11 GMT Subject: RFR: 8357822: C2: Multiple string optimization tests are no longer testing string concatenation optimizations [v2] In-Reply-To: <4GDLAMfeWjgfcGvn4sUSMT2jjG3vsebjcFeJqgHqPQw=.e7dfa9e7-4608-4304-ba00-0b254b6bf2b1@github.com> References: <4GDLAMfeWjgfcGvn4sUSMT2jjG3vsebjcFeJqgHqPQw=.e7dfa9e7-4608-4304-ba00-0b254b6bf2b1@github.com> Message-ID: > This PR updates a few tests to reintroduce testing of string concatenation optimizations since a few bugs have recently been identified in this area. 
> > Selection criteria: performed a text search on the test suite and identified tests for string concatenations or string optimizations that are not currently compiled with `-XDstringConcat=inline` and are not using StringBuilders explicitly. > > Testing: T1-4. > > Extra testing: ran the tests manually with `-XX:+PrintOptimizeStringConcat` and verified that the tests are exercising string optimizations after the fix. Daniel Skantz has updated the pull request incrementally with two additional commits since the last revision: - revert change to TestStringIntrinsics.java - Update test/hotspot/jtreg/compiler/c2/Test7046096.java Co-authored-by: Emanuel Peter ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25610/files - new: https://git.openjdk.org/jdk/pull/25610/files/731667f4..7fd8568a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25610&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25610&range=00-01 Stats: 13 lines in 2 files changed: 0 ins; 11 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25610.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25610/head:pull/25610 PR: https://git.openjdk.org/jdk/pull/25610 From dskantz at openjdk.org Thu Jun 5 08:51:11 2025 From: dskantz at openjdk.org (Daniel Skantz) Date: Thu, 5 Jun 2025 08:51:11 GMT Subject: RFR: 8357822: C2: Multiple string optimization tests are no longer testing string concatenation optimizations [v2] In-Reply-To: References: <4GDLAMfeWjgfcGvn4sUSMT2jjG3vsebjcFeJqgHqPQw=.e7dfa9e7-4608-4304-ba00-0b254b6bf2b1@github.com> Message-ID: On Wed, 4 Jun 2025 15:06:19 GMT, Roberto Castañeda Lozano wrote: >> Daniel Skantz has updated the pull request incrementally with two additional commits since the last revision: >> >> - revert change to TestStringIntrinsics.java >> - Update test/hotspot/jtreg/compiler/c2/Test7046096.java >> >> Co-authored-by: Emanuel Peter > > test/hotspot/jtreg/compiler/intrinsics/string/TestStringIntrinsics.java line 1: > >> 1: /* > > Do we need
to add a second run at all to this test case? As far as I can see, all `concat*` test cases use explicit string builders and already exercise C2's string concatenation optimizations. Thanks for checking! I reverted the change to this test as on a second look the benefits of adding the new configuration to it are modest and out of scope. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25610#discussion_r2128300211 From duke at openjdk.org Thu Jun 5 09:12:30 2025 From: duke at openjdk.org (erifan) Date: Thu, 5 Jun 2025 09:12:30 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v7] In-Reply-To: References: Message-ID: > This patch optimizes the following patterns: > For integer types: > > (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) > => (VectorMaskCmp src1 src2 ncond) > (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) > => (VectorMaskCmp src1 src2 ncond) > > cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the negative comparison of cond. > > For float and double types: > > (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) > => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) > (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) > => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) > > cond can be eq or ne. 
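The rewrite rules above rest on a simple identity: negating a comparison mask gives the same lanes as comparing with the negated condition. A scalar model in plain Java, using `boolean[]` in place of vector masks, makes this concrete. All names below are hypothetical; this is neither the Vector API nor the C2 node code. The float helpers also show why, presumably, only eq and ne qualify for floating-point lanes: with NaN, `!(x < y)` is not the same as `x >= y`.

```java
// Hypothetical scalar model of mask negation; not Vector API or C2 code.
final class MaskNegation {
    enum Cond {
        EQ, NE, LT, GE, GT, LE;

        // The "ncond" of the patterns above.
        Cond negate() {
            switch (this) {
                case EQ: return NE;
                case NE: return EQ;
                case LT: return GE;
                case GE: return LT;
                case GT: return LE;
                default: return GT;   // LE
            }
        }

        boolean test(int x, int y) {
            switch (this) {
                case EQ: return x == y;
                case NE: return x != y;
                case LT: return x < y;
                case GE: return x >= y;
                case GT: return x > y;
                default: return x <= y;   // LE
            }
        }
    }

    // Lane-wise compare, standing in for VectorMaskCmp.
    static boolean[] compare(int[] a, int[] b, Cond c) {
        boolean[] m = new boolean[a.length];
        for (int i = 0; i < a.length; i++) {
            m[i] = c.test(a[i], b[i]);
        }
        return m;
    }

    // Lane-wise not, standing in for the XOR with an all-true mask.
    static boolean[] not(boolean[] m) {
        boolean[] r = new boolean[m.length];
        for (int i = 0; i < m.length; i++) {
            r[i] = !m[i];
        }
        return r;
    }

    // True when negating "<" agrees with ">=" for these float operands.
    static boolean ltNegationMatchesGe(float x, float y) {
        return !(x < y) == (x >= y);
    }

    // True when negating "==" agrees with "!=" for these float operands.
    static boolean eqNegationMatchesNe(float x, float y) {
        return !(x == y) == (x != y);
    }
}
```

For int lanes, `not(compare(a, b, c))` and `compare(a, b, c.negate())` agree for every condition. For floats, `ltNegationMatchesGe` fails as soon as one operand is NaN, whereas `eqNegationMatchesNe` holds for all inputs.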
> > Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option `-XX:UseSVE=2`: > > Benchmark Unit Before Score Error After Score Error Uplift > testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 10266136.26 8955.008548 1.29 > testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 1179760.772 448.031844 1.33 > testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 2359520.803 896.305743 1.33 > testCompareEQMaskNotInt ops/s 1787221.411 977.743935 2353952.519 960.069976 1.31 > testCompareEQMaskNotLong ops/s 895297.1974 673.44808 1178449.02 323.804205 1.31 > testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 4712761.965 2110.862053 1.41 > testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 10251646.9 9486.699831 1.29 > testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 2352855.205 1251.952546 1.39 > testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 1177811.493 521.1229 1.37 > testCompareGEMaskNotShort ops/s 3341860.309 1578.975338 4714008.434 1681.10365 1.41 > testCompareGTMaskNotByte ops/s 7910823.674 2993.367032 10245063.58 9774.75138 1.29 > testCompareGTMaskNotInt ops/s 1673393.928 3153.099431 2353654.521 1190.848583 1.4 > testCompareGTMaskNotLong ops/s 849405.9159 2432.858159 1177952.041 359.96413 1.38 > testCompareGTMaskNotShort ops/s 3339509.141 3339.976585 4711442.496 2673.364893 1.41 > testCompareLEMaskNotByte ops/s 7911340.004 3114.69191 10231626.5 27134.20035 1.29 > testCompareLEMaskNotInt ops/s 1675812.113 1340.969885 2353255.341 1452.4522 1.4 > testCompareLEMaskNotLong ops/s 848862.8036 6564.841731 1177763.623 539.290106 1.38 > testCompareLEMaskNotShort ops/s 3324951.54 2380.29473 4712116.251 1544.559684 1.41 > testCompareLTMaskNotByte ops/s 7910390.844 2630.861436 10239567.69 6487.441672 1.29 > testCompareLTMaskNotInt ops/s 1672180.09 995.238142 2353757.863 853.774734 1.4 > testCompareLTMaskNotLong ops/s 856502.26... erifan has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision: - Addressed some review comments - Merge branch 'master' into JDK-8354242 - Refactor the JTReg tests for compare.xor(maskAll) Also made a bit change to support pattern `VectorMask.fromLong()`. - Merge branch 'master' into JDK-8354242 - Refactor code Add a new function XorVNode::Ideal_XorV_VectorMaskCmp to do this optimization, making the code more modular. - Merge branch 'master' into JDK-8354242 - Update the jtreg test - Merge branch 'master' into JDK-8354242 - Addressed some review comments 1. Call VectorNode::Ideal() only once in XorVNode::Ideal. 2. Improve code comments. - Merge branch 'master' into JDK-8354242 - ... and 2 more: https://git.openjdk.org/jdk/compare/71938fba...ebbcc405 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24674/files - new: https://git.openjdk.org/jdk/pull/24674/files/f2f71e34..ebbcc405 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24674&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24674&range=05-06 Stats: 146911 lines in 2345 files changed: 87334 ins; 41007 del; 18570 mod Patch: https://git.openjdk.org/jdk/pull/24674.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24674/head:pull/24674 PR: https://git.openjdk.org/jdk/pull/24674 From duke at openjdk.org Thu Jun 5 09:12:33 2025 From: duke at openjdk.org (erifan) Date: Thu, 5 Jun 2025 09:12:33 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v6] In-Reply-To: References: Message-ID: On Wed, 28 May 2025 12:03:48 GMT, Emanuel Peter wrote: >> erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 10 additional commits since the last revision: >> >> - Refactor the JTReg tests for compare.xor(maskAll) >> >> Also made a bit change to support pattern `VectorMask.fromLong()`. >> - Merge branch 'master' into JDK-8354242 >> - Refactor code >> >> Add a new function XorVNode::Ideal_XorV_VectorMaskCmp to do this >> optimization, making the code more modular. >> - Merge branch 'master' into JDK-8354242 >> - Update the jtreg test >> - Merge branch 'master' into JDK-8354242 >> - Addressed some review comments >> >> 1. Call VectorNode::Ideal() only once in XorVNode::Ideal. >> 2. Improve code comments. >> - Merge branch 'master' into JDK-8354242 >> - Merge branch 'master' into JDK-8354242 >> - 8354242: VectorAPI: combine vector not operation with compare >> >> This patch optimizes the following patterns: >> For integer types: >> ``` >> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) >> => (VectorMaskCmp src1 src2 ncond) >> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) >> => (VectorMaskCmp src1 src2 ncond) >> ``` >> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the >> negative comparison of cond. >> >> For float and double types: >> ``` >> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> ``` >> cond can be eq or ne. 
>> >> Benchmarks on Nvidia Grace machine with 128-bit SVE2: >> With option `-XX:UseSVE=2`: >> ``` >> Benchmark Unit Before Score Error After Score Error Uplift >> testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 10266136.26 8955.008548 1.29 >> testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 1179760.772 448.031844 1.33 >> testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 2359520.803 896.305743 1.33 >> testCompareEQMaskNotInt ops/s 1787221.411 977.743935 2353952.519 960.069976 1.31 >> testCompareEQMaskNotLong ops/s 895297.1974 673.44808 1178449.02 323.804205 1.31 >> testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 4712761.965 2110.862053 1.41 >> testCompareGEMaskNotByte ops/s 7907615.16 4... > > src/hotspot/share/opto/vectornode.cpp line 2213: > >> 2211: Node* in1 = in(1); >> 2212: Node* in2 = in(2); >> 2213: // Transformations for predicated IRs are not supported for now. > > Suggestion: > > // Transformations for predicated vectors are not supported for now. Done. > src/hotspot/share/opto/vectornode.cpp line 2215: > >> 2213: // Transformations for predicated IRs are not supported for now. >> 2214: if (is_predicated_vector() || in1->is_predicated_vector() || >> 2215: in2->is_predicated_vector()) { > > I would either put all on the same line, or all on separate lines. Done. > src/hotspot/share/opto/vectornode.cpp line 2219: > >> 2217: } >> 2218: >> 2219: // XorV/XorVMask is commutative, swap VectorMaskCmp/Op_VectorMaskCast to in1. > > Suggestion: > > // XorV/XorVMask is commutative, swap VectorMaskCmp/VectorMaskCast to in1. > > Would look a little cleaner, and you did also not write `Op_VectorMaskCmp` either ;) Done, thanks! > src/hotspot/share/opto/vectornode.cpp line 2225: > >> 2223: } >> 2224: >> 2225: const TypeVect* vmcast_vt = nullptr; > > Suggestion: > > const TypeVect* vector_mask_cast_vt = nullptr; > > I think it would not hurt to write it out. Otherwise, the reader always has to reconstruct that in their head. Done. 
> src/hotspot/share/opto/vectornode.cpp line 2230: > >> 2228: vmcast_vt = in1->as_Vector()->vect_type(); >> 2229: in1 = in1->in(1); >> 2230: } > > Add a comment why you check `in1->outcnt() == 1`. Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2128341063 PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2128340484 PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2128341959 PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2128342468 PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2128342908 From duke at openjdk.org Thu Jun 5 09:12:33 2025 From: duke at openjdk.org (erifan) Date: Thu, 5 Jun 2025 09:12:33 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v6] In-Reply-To: References: Message-ID: On Thu, 29 May 2025 08:00:05 GMT, erifan wrote: >> src/hotspot/share/opto/vectornode.cpp line 2233: >> >>> 2231: if (in2->Opcode() == Op_VectorMaskCast) { >>> 2232: in2 = in2->in(1); >>> 2233: } >> >> Wow, this seems to be an addition that is not covered in the patterns you mention above, right? >> But is that even necessary? >> I suppose here `in2 = VectorMaskCast(all_ones_vector)`. >> Would we not already want to transform this pattern in `VectorMaskCast::Ideal`, is that not possible and more powerful? > > Oh yeah, I forgot to mention it in the above comment and commit message. > > Yes, this is for `in2 = VectorMaskCast(all_ones_vector)`. I agree it's better to do this transformation in `VectorMaskCast::Ideal`. I'll remove this code change and do the `VectorMaskCast` optimization later. Thanks! Done. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2128344233 From duke at openjdk.org Thu Jun 5 09:17:54 2025 From: duke at openjdk.org (erifan) Date: Thu, 5 Jun 2025 09:17:54 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v6] In-Reply-To: References: <9u6hJ-WgnHLMaYBa8ViRdpUZY-bI2wOk-TCRKWJJdqk=.b3303f1f-da3b-4c2e-8f0c-a2e16ba9688e@github.com> Message-ID: On Thu, 29 May 2025 07:55:06 GMT, erifan wrote: >> Also: You now cast `(VectorMaskCmpNode*) in1` twice. Can we not do `as_VectorMaskCmp()`? Or could we at least cast it only once, and then use it as `in1_mask_cmp` instead? > >> What is the hard-coded ^ 4 here? > > This is to negate the comparison condition. We can't use `BoolTest::negate()` here because the comparison condition may be **unsigned** comparison. Since there's already a `negate()` function in `BoolTest`, so I tend to add a new function `get_negative_predicate` for this into class `VectorMaskCmpNode`. > >> Also: You now cast (VectorMaskCmpNode*) in1 twice. Can we not do as_VectorMaskCmp()? Or could we at least cast it only once, and then use it as in1_mask_cmp instead? > > For the first cast, I think you mean > > if (in1->Opcode() != Op_VectorMaskCmp || > in1->outcnt() > 1 || > !((VectorMaskCmpNode*) in1)->predicate_can_be_negated() || > !VectorNode::is_all_ones_vector(in2)) { > return nullptr; > } > > To remove one cast, then we have to split the above `if` because `in1` may not be a `VectorMaskCmpNode`. > > if (in1->Opcode() != Op_VectorMaskCmp) { > return nullptr; > } > VectorMaskCmpNode* in1_as_mask_cmp = (VectorMaskCmpNode*) in1; > if (in1->outcnt() > 1 || > !in1_as_mask_cmp->predicate_can_be_negated() || > !VectorNode::is_all_ones_vector(in2)) { > return nullptr; > } > BoolTest::mask neg_cond = (BoolTest::mask) (in1_as_mask_cmp->get_predicate() ^ 4); > > Does this look better to you ? For now I kept the current approach, as I feel it's a little more compact. Thanks! 
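To make the `^ 4` negation discussed above concrete, here is a minimal standalone sketch. It is not HotSpot code: `Mask` is a stand-in for `BoolTest::mask`, its numeric values are reproduced from memory of `src/hotspot/share/opto/subnode.hpp` and should be treated as an assumption, and `negate_mask` is a hypothetical helper name. What it tries to show is that flipping bit 2 maps each condition to its complement, and that this still works when the `unsigned_compare` bit is set, because that bit is left untouched.

```cpp
#include <cassert>

// Stand-in for BoolTest::mask. The values are an assumption based on
// subnode.hpp; verify them against the real header before relying on them.
enum Mask {
  eq = 0, gt = 1, lt = 3, ne = 4, le = 5, ge = 7,
  unsigned_compare = 16,             // marker bit for unsigned vector compares
  ule = unsigned_compare | le,
  uge = unsigned_compare | ge,
  ult = unsigned_compare | lt,
  ugt = unsigned_compare | gt
};

// Hypothetical helper: negate a condition by flipping bit 2.
// The unsigned_compare bit (16) is not affected by the XOR.
constexpr Mask negate_mask(Mask m) {
  return static_cast<Mask>(m ^ 4);
}

// Compile-time spot checks for one signed and one unsigned pair.
static_assert(negate_mask(lt) == ge, "lt negates to ge");
static_assert(negate_mask(ult) == uge, "ult negates to uge");
```

Under these assumed values the operation is its own inverse, so applying `negate_mask` twice returns the original condition.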
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2128358563 From epeter at openjdk.org Thu Jun 5 09:26:59 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 5 Jun 2025 09:26:59 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v7] In-Reply-To: References: Message-ID: On Thu, 5 Jun 2025 09:12:30 GMT, erifan wrote: >> This patch optimizes the following patterns: >> For integer types: >> >> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) >> => (VectorMaskCmp src1 src2 ncond) >> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) >> => (VectorMaskCmp src1 src2 ncond) >> >> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the negative comparison of cond. >> >> For float and double types: >> >> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> >> cond can be eq or ne. 
>> >> Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option `-XX:UseSVE=2`: >> >> Benchmark Unit Before Score Error After Score Error Uplift >> testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 10266136.26 8955.008548 1.29 >> testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 1179760.772 448.031844 1.33 >> testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 2359520.803 896.305743 1.33 >> testCompareEQMaskNotInt ops/s 1787221.411 977.743935 2353952.519 960.069976 1.31 >> testCompareEQMaskNotLong ops/s 895297.1974 673.44808 1178449.02 323.804205 1.31 >> testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 4712761.965 2110.862053 1.41 >> testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 10251646.9 9486.699831 1.29 >> testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 2352855.205 1251.952546 1.39 >> testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 1177811.493 521.1229 1.37 >> testCompareGEMaskNotShort ops/s 3341860.309 1578.975338 4714008.434 1681.10365 1.41 >> testCompareGTMaskNotByte ops/s 7910823.674 2993.367032 10245063.58 9774.75138 1.29 >> testCompareGTMaskNotInt ops/s 1673393.928 3153.099431 2353654.521 1190.848583 1.4 >> testCompareGTMaskNotLong ops/s 849405.9159 2432.858159 1177952.041 359.96413 1.38 >> testCompareGTMaskNotShort ops/s 3339509.141 3339.976585 4711442.496 2673.364893 1.41 >> testCompareLEMaskNotByte ops/s 7911340.004 3114.69191 10231626.5 27134.20035 1.29 >> testCompareLEMaskNotInt ops/s 1675812.113 1340.969885 2353255.341 1452.4522 1.4 >> testCompareLEMaskNotLong ops/s 848862.8036 6564.841731 1177763.623 539.290106 1.38 >> testCompareLEMaskNotShort ops/s 3324951.54 2380.29473 4712116.251 1544.559684 1.41 >> testCompareLTMaskNotByte ops/s 7910390.844 2630.861436 10239567.69 6487.441672 1.29 >> testCompareLTMaskNotInt ops/s 16721... > > erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 12 additional commits since the last revision: > > - Addressed some review comments > - Merge branch 'master' into JDK-8354242 > - Refactor the JTReg tests for compare.xor(maskAll) > > Also made a bit change to support pattern `VectorMask.fromLong()`. > - Merge branch 'master' into JDK-8354242 > - Refactor code > > Add a new function XorVNode::Ideal_XorV_VectorMaskCmp to do this > optimization, making the code more modular. > - Merge branch 'master' into JDK-8354242 > - Update the jtreg test > - Merge branch 'master' into JDK-8354242 > - Addressed some review comments > > 1. Call VectorNode::Ideal() only once in XorVNode::Ideal. > 2. Improve code comments. > - Merge branch 'master' into JDK-8354242 > - ... and 2 more: https://git.openjdk.org/jdk/compare/93b141e6...ebbcc405 FYI: `BoolTest::negate` already does what you want: `mask negate( ) const { return mask(_test^4); }` I think you should use that instead :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-2943422494 From duke at openjdk.org Thu Jun 5 09:27:03 2025 From: duke at openjdk.org (erifan) Date: Thu, 5 Jun 2025 09:27:03 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v6] In-Reply-To: References: Message-ID: On Wed, 28 May 2025 12:18:15 GMT, Emanuel Peter wrote: >> erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: >> >> - Refactor the JTReg tests for compare.xor(maskAll) >> >> Also made a bit change to support pattern `VectorMask.fromLong()`. >> - Merge branch 'master' into JDK-8354242 >> - Refactor code >> >> Add a new function XorVNode::Ideal_XorV_VectorMaskCmp to do this >> optimization, making the code more modular. 
>> - Merge branch 'master' into JDK-8354242 >> - Update the jtreg test >> - Merge branch 'master' into JDK-8354242 >> - Addressed some review comments >> >> 1. Call VectorNode::Ideal() only once in XorVNode::Ideal. >> 2. Improve code comments. >> - Merge branch 'master' into JDK-8354242 >> - Merge branch 'master' into JDK-8354242 >> - 8354242: VectorAPI: combine vector not operation with compare >> >> This patch optimizes the following patterns: >> For integer types: >> ``` >> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) >> => (VectorMaskCmp src1 src2 ncond) >> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) >> => (VectorMaskCmp src1 src2 ncond) >> ``` >> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the >> negative comparison of cond. >> >> For float and double types: >> ``` >> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> ``` >> cond can be eq or ne. >> >> Benchmarks on Nvidia Grace machine with 128-bit SVE2: >> With option `-XX:UseSVE=2`: >> ``` >> Benchmark Unit Before Score Error After Score Error Uplift >> testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 10266136.26 8955.008548 1.29 >> testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 1179760.772 448.031844 1.33 >> testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 2359520.803 896.305743 1.33 >> testCompareEQMaskNotInt ops/s 1787221.411 977.743935 2353952.519 960.069976 1.31 >> testCompareEQMaskNotLong ops/s 895297.1974 673.44808 1178449.02 323.804205 1.31 >> testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 4712761.965 2110.862053 1.41 >> testCompareGEMaskNotByte ops/s 7907615.16 4... 
> > src/hotspot/share/opto/vectornode.cpp line 2251: > >> 2249: predicate_node, vt); >> 2250: if (vmcast_vt != nullptr) { >> 2251: // We optimized out an VectorMaskCast, and in order to ensure type > > Suggestion: > > // We optimized out a VectorMaskCast, and in order to ensure type Done. > src/hotspot/share/opto/vectornode.cpp line 2253: > >> 2251: // We optimized out an VectorMaskCast, and in order to ensure type >> 2252: // correctness, we need to regenerate one. VectorMaskCast will be encoded as >> 2253: // empty for types with the same size. > > Suggestion: > > // a no-op (identity function) for types with the same size. > > Or what do you mean by "empty"? `TOP`? All zeros? I mean `no-op`. Done, thanks. > test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java line 96: > >> 94: Generator lGen = RD.uniformLongs(Long.MIN_VALUE, Long.MAX_VALUE); >> 95: Generator fGen = RD.uniformFloats(Float.MIN_VALUE, Float.MAX_VALUE); >> 96: Generator dGen = RD.uniformDoubles(Double.MIN_VALUE, Double.MAX_VALUE); > > Are you sure you only want to draw from the uniform distribution? > If you don't super care about the distribution, please just take `RD.ints/longs/floats/doubles()`. > That way, you get all sorts of distributions, and also some that include NaN values etc. I think that would be important for your float cmp cases, no? For float and double, we have to use the uniform distribution, because we have to make sure `NAN` is not generated. I added some comments about the reasons. For other types, changed to use `RD.ints/longs`. We have covered the special cases like +/- Inf, NaN. 
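The patch description limits the float and double patterns to `eq` and `ne`. The thread does not spell out the reason, but a likely one is that IEEE 754 comparisons involving NaN are unordered, so negating an ordered compare is not the same as evaluating the complementary compare. A small standalone illustration in plain C++ follows; the helper names are made up for this example and do not come from the patch or the test.

```cpp
#include <cassert>
#include <limits>

// True when negating the result of `a < b` agrees with evaluating `a >= b`.
// This holds for ordered operands but fails when either operand is NaN,
// because both `a < b` and `a >= b` are false in that case.
bool lt_negation_matches_ge(double a, double b) {
  return !(a < b) == (a >= b);
}

// True when negating the result of `a == b` agrees with evaluating `a != b`.
// This holds even for NaN: `==` is false and `!=` is true.
bool eq_negation_matches_ne(double a, double b) {
  return !(a == b) == (a != b);
}
```

This assumes the code is compiled with standard floating-point semantics, that is, without options such as `-ffast-math` that let the compiler assume NaN never occurs.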
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2128376851 PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2128378888 PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2128375032 From duke at openjdk.org Thu Jun 5 09:27:04 2025 From: duke at openjdk.org (erifan) Date: Thu, 5 Jun 2025 09:27:04 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v6] In-Reply-To: <9u6hJ-WgnHLMaYBa8ViRdpUZY-bI2wOk-TCRKWJJdqk=.b3303f1f-da3b-4c2e-8f0c-a2e16ba9688e@github.com> References: <9u6hJ-WgnHLMaYBa8ViRdpUZY-bI2wOk-TCRKWJJdqk=.b3303f1f-da3b-4c2e-8f0c-a2e16ba9688e@github.com> Message-ID: On Wed, 28 May 2025 12:28:20 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java line 237: >> >>> 235: // Byte tests >>> 236: @Test >>> 237: @IR(counts = { IRNode.XOR_V_MASK, "= 0", IRNode.XOR_VB, "= 0" }, >> >> Could you still assert the presence of some other vectors, just to make sure we are indeed getting vectors here? > > Not testing for any present vectors makes me a little nervous: what if we just don't get any vectors because inlining fails or something else silly happens? Done. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2128379987 From duke at openjdk.org Thu Jun 5 09:27:05 2025 From: duke at openjdk.org (erifan) Date: Thu, 5 Jun 2025 09:27:05 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v6] In-Reply-To: References: Message-ID: On Thu, 29 May 2025 01:44:49 GMT, Xiaohong Gong wrote: >> test/micro/org/openjdk/bench/jdk/incubator/vector/MaskCompareNotBenchmark.java line 49: >> >>> 47: private static final VectorSpecies L_SPECIES = LongVector.SPECIES_MAX; >>> 48: private static final VectorSpecies F_SPECIES = FloatVector.SPECIES_MAX; >>> 49: private static final VectorSpecies D_SPECIES = DoubleVector.SPECIES_MAX; >> >> Are you taking `SPECIES_MAX` on purpose here, or could we take `SPECIES_PREFERRED` instead? >> @jatin-bhateja What is the best to do in these tests? I suppose best would be to test with all vector lengths... > > Thanks for pointing out this @eme64 ! Per my understanding, `SPECIES_MAX` is almost the same with `SPECIES_PREFERRED` in this case which are all specified to the max vector size of a hardware. Since the max vector size is different on different architectures, not all vector lengths are supported to be intrinsified on a specified architecture like AArch64, especially the SVE arch with different vector register size. Hence, just testing the max species makes sense to me as this is a mid-end common transformation. Changed to use `ofLargestShape()` because on x64 the max vector length is related to data types. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2128383805 From duke at openjdk.org Thu Jun 5 09:34:59 2025 From: duke at openjdk.org (erifan) Date: Thu, 5 Jun 2025 09:34:59 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v7] In-Reply-To: References: Message-ID: On Thu, 5 Jun 2025 09:24:10 GMT, Emanuel Peter wrote: > FYI: `BoolTest::negate` already does what you want: `mask negate( ) const { return mask(_test^4); }` I think you should use that instead :) Indeed, I hadn't noticed that, thank you. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-2943449327 From duke at openjdk.org Thu Jun 5 09:51:59 2025 From: duke at openjdk.org (erifan) Date: Thu, 5 Jun 2025 09:51:59 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v7] In-Reply-To: References: Message-ID: On Thu, 5 Jun 2025 09:32:15 GMT, erifan wrote: > > FYI: `BoolTest::negate` already does what you want: `mask negate( ) const { return mask(_test^4); }` I think you should use that instead :) > > Indeed, I hadn't noticed that, thank you. Oh I think we still cannot use `BoolTest::negate`, because we cannot instantiate a `BoolTest` object with **unsigned** comparison. `BoolTest::negate` is a non-static function. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-2943500455 From epeter at openjdk.org Thu Jun 5 11:08:56 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 5 Jun 2025 11:08:56 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v7] In-Reply-To: References: Message-ID: <15TW6hiffz65NhHevPefL_6swSC07UD-GwiJ4tPDtFs=.b83081df-8abd-4756-b4e0-1d969678a0d2@github.com> On Thu, 5 Jun 2025 09:48:46 GMT, erifan wrote: > Oh I think we still cannot use `BoolTest::negate`, because we cannot instantiate a `BoolTest` object with **unsigned** comparison. `BoolTest::negate` is a non-static function. I see. Ok. Hmm. 
I still think that the logic should be in `BoolTest`, because that is where the exact implementation of the enum values is. In that context it is easier to see why `^4` does the negation. And if we ever were to change the enum values, it would be harder to find your code and fix it. Maybe it could be called `BoolTest::negate_mask(mask btm)`, with a comment explaining that both signed and unsigned comparisons are supported. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-2943747849 From mhaessig at openjdk.org Thu Jun 5 11:34:51 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Thu, 5 Jun 2025 11:34:51 GMT Subject: RFR: 8358600: Template-Framework Library: Template for TestFramework test class [v2] In-Reply-To: References: Message-ID: On Thu, 5 Jun 2025 06:16:47 GMT, Emanuel Peter wrote: >> We might want to write many IR/TestFramework tests, and so I would like to integrate a Template that generates the class, and the user has to only generate a list of tests. >> >> This is a first extension for https://github.com/openjdk/jdk/pull/24217. I had already prototyped it earlier and plan to use it in multiple tests https://github.com/openjdk/jdk/pull/23418 (see `IRTestClass.java`). >> >> https://github.com/openjdk/jdk/blob/dc640cbd8fb8ec76920a7ab52dfe7955ed1d77f2/test/hotspot/jtreg/compiler/lib/template_framework/library/TestFrameworkClass.java#L36-L45 > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > streamline API to a single render method Thank you for your continued work on the Template Framework. This seems like a good start to the template library.
While it already looks good, I have a few small questions :) test/hotspot/jtreg/compiler/lib/template_framework/library/TestFrameworkClass.java line 76: > 74: public static String render(final String packageName, > 75: final String className, > 76: final List imports, To eliminate duplicate imports this could also be a `Set`. test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestWithTestFrameworkClass.java line 135: > 133: )); > 134: > 135: // Create a test for each operator.. Suggestion: // Create a test for each operator. Tiny nit :) test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestWithTestFrameworkClass.java line 145: > 143: // List of imports. Duplicates are permitted. > 144: List.of("compiler.lib.generators.*", > 145: "compiler.lib.ir_framework.*", Suggestion: This should not be needed since its imported by default in `TestFrameworkClass`. Or is this a deliberate duplication? ------------- PR Review: https://git.openjdk.org/jdk/pull/25643#pullrequestreview-2899874579 PR Review: https://git.openjdk.org/jdk/pull/25643#pullrequestreview-2899929556 PR Review Comment: https://git.openjdk.org/jdk/pull/25643#discussion_r2128593841 PR Review Comment: https://git.openjdk.org/jdk/pull/25643#discussion_r2128586200 PR Review Comment: https://git.openjdk.org/jdk/pull/25643#discussion_r2128588949 From epeter at openjdk.org Thu Jun 5 12:01:17 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 5 Jun 2025 12:01:17 GMT Subject: RFR: 8358600: Template-Framework Library: Template for TestFramework test class [v3] In-Reply-To: References: Message-ID: > We might want to write many IR/TestFramework tests, and so I would like to integrate a Template that generates the class, and the user has to only generate a list of tests. > > This is a first extension for https://github.com/openjdk/jdk/pull/24217. I had already prototyped it earlier and plan to use it in multiple tests https://github.com/openjdk/jdk/pull/23418 (see `IRTestClass.java`). 
> > https://github.com/openjdk/jdk/blob/dc640cbd8fb8ec76920a7ab52dfe7955ed1d77f2/test/hotspot/jtreg/compiler/lib/template_framework/library/TestFrameworkClass.java#L36-L45 Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: review suggestions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25643/files - new: https://git.openjdk.org/jdk/pull/25643/files/dc640cbd..256d922c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25643&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25643&range=01-02 Stats: 10 lines in 2 files changed: 2 ins; 2 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/25643.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25643/head:pull/25643 PR: https://git.openjdk.org/jdk/pull/25643 From epeter at openjdk.org Thu Jun 5 12:01:17 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 5 Jun 2025 12:01:17 GMT Subject: RFR: 8358600: Template-Framework Library: Template for TestFramework test class [v2] In-Reply-To: References: Message-ID: On Thu, 5 Jun 2025 11:31:50 GMT, Manuel Hässig wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> streamline API to a single render method > > Thank you for your continued work on the Template Framework. This seems like a good start to the template library. While it already looks good, I have a few small questions :) @mhaessig Thanks for reviewing! I applied your suggestions :) > test/hotspot/jtreg/compiler/lib/template_framework/library/TestFrameworkClass.java line 76: > >> 74: public static String render(final String packageName, >> 75: final String className, >> 76: final List imports, > > To eliminate duplicate imports this could also be a `Set`. Sure, I'll make it a `Set`.
> test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestWithTestFrameworkClass.java line 135: > >> 133: )); >> 134: >> 135: // Create a test for each operator.. > > Suggestion: > > // Create a test for each operator. > > Tiny nit :) Fixed. > test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestWithTestFrameworkClass.java line 145: > >> 143: // List of imports. Duplicates are permitted. >> 144: List.of("compiler.lib.generators.*", >> 145: "compiler.lib.ir_framework.*", > > Suggestion: > > > This should not be needed since its imported by default in `TestFrameworkClass`. Or is this a deliberate duplication? Yes, it was deliberate. But I'll just remove it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25643#issuecomment-2943901270 PR Review Comment: https://git.openjdk.org/jdk/pull/25643#discussion_r2128673497 PR Review Comment: https://git.openjdk.org/jdk/pull/25643#discussion_r2128672040 PR Review Comment: https://git.openjdk.org/jdk/pull/25643#discussion_r2128672889 From rcastanedalo at openjdk.org Thu Jun 5 12:10:57 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 5 Jun 2025 12:10:57 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v8] In-Reply-To: <2m1_XtiSsW_LaBRrkX4qv7AKtLOjNgnl4mUp3zisasE=.dda62164-7aa0-4c1a-b83f-fa40ba7902e5@github.com> References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> <1gdeBnZ7YuIf9CgQW2bCXkDDBWPjUgRnickHts-fvzE=.e6e901ba-3e9f-41a2-9c68-167a879e9655@github.com> <2m1_XtiSsW_LaBRrkX4qv7AKtLOjNgnl4mUp3zisasE=.dda62164-7aa0-4c1a-b83f-fa40ba7902e5@github.com> Message-ID: On Wed, 4 Jun 2025 14:06:46 GMT, Roberto Castañeda Lozano wrote: > Thanks, will run some testing and come back with the results.
`compiler/c2/TestVerifyIterativeGVN.java` fails as follows (I tested the PR applied on top of jdk-25+25 but I see the failure also in the [GHA results](https://github.com/rwestrel/jdk/actions/runs/15438508506/job/43452592735)):

    #
    # A fatal error has been detected by the Java Runtime Environment:
    #
    # Internal Error (/home/rocastan/git/views/JDK-8327963-memprojs/open/src/hotspot/share/opto/node.hpp:457), pid=464370, tid=464390
    # assert(is_not_dead(n)) failed: can not use dead node
    #
    (...)
    Current CompileTask:
    C2:128 9 b 4 java.lang.reflect.ClassFileFormatVersion:: (417 bytes)

    Stack: [0x000070ba24200000,0x000070ba24300000], sp=0x000070ba242fae70, free space=1003k
    Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
    V [libjvm.so+0x519ad1] Node::set_req(unsigned int, Node*)+0x251 (node.hpp:457)
    V [libjvm.so+0x17d2d99] PhaseIterGVN::subsume_node(Node*, Node*)+0x239 (phaseX.cpp:1425)
    V [libjvm.so+0x15feaf4] InitializeNode::replace_mem_projs_by(Node*, PhaseIterGVN*)+0x1d4 (phaseX.hpp:539)
    V [libjvm.so+0x1532cd7] PhaseMacroExpand::expand_allocate_common(AllocateNode*, Node*, TypeFunc const*, unsigned char*, Node*)+0x167 (macro.cpp:1316)
    V [libjvm.so+0x153e92e] PhaseMacroExpand::expand_macro_nodes()+0xc5e (macro.cpp:2687)
    V [libjvm.so+0xb286e7] Compile::Optimize()+0xe37 (compile.cpp:2533)
    (...)

------------- PR Comment: https://git.openjdk.org/jdk/pull/24570#issuecomment-2943934786 From mhaessig at openjdk.org Thu Jun 5 12:11:53 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Thu, 5 Jun 2025 12:11:53 GMT Subject: RFR: 8358600: Template-Framework Library: Template for TestFramework test class [v3] In-Reply-To: References: Message-ID: On Thu, 5 Jun 2025 12:01:17 GMT, Emanuel Peter wrote: >> We might want to write many IR/TestFramework tests, and so I would like to integrate a Template that generates the class, and the user has to only generate a list of tests.
>> >> This is a first extension for https://github.com/openjdk/jdk/pull/24217. I had already prototyped it earlier and plan to use it in multiple tests https://github.com/openjdk/jdk/pull/23418 (see `IRTestClass.java`). >> >> https://github.com/openjdk/jdk/blob/dc640cbd8fb8ec76920a7ab52dfe7955ed1d77f2/test/hotspot/jtreg/compiler/lib/template_framework/library/TestFrameworkClass.java#L36-L45 > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > review suggestions Thank you for addressing my comments. Looks good to me! ------------- Marked as reviewed by mhaessig (Author). PR Review: https://git.openjdk.org/jdk/pull/25643#pullrequestreview-2900057451 From epeter at openjdk.org Thu Jun 5 12:11:54 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 5 Jun 2025 12:11:54 GMT Subject: RFR: 8358600: Template-Framework Library: Template for TestFramework test class [v3] In-Reply-To: References: Message-ID: <7hVvcQKyxl8xFhoBam3QF036Wz_zxqRwlkv1CI5u6EA=.b0695a3d-ca1c-4cf7-a75e-7a77dd4c982b@github.com> On Thu, 5 Jun 2025 12:07:44 GMT, Manuel Hässig wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> review suggestions > > Thank you for addressing my comments. Looks good to me! @mhaessig Thank you
------------- PR Comment: https://git.openjdk.org/jdk/pull/25643#issuecomment-2943946510 From rcastanedalo at openjdk.org Thu Jun 5 12:56:51 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 5 Jun 2025 12:56:51 GMT Subject: RFR: 8357822: C2: Multiple string optimization tests are no longer testing string concatenation optimizations [v2] In-Reply-To: References: <4GDLAMfeWjgfcGvn4sUSMT2jjG3vsebjcFeJqgHqPQw=.e7dfa9e7-4608-4304-ba00-0b254b6bf2b1@github.com> Message-ID: On Thu, 5 Jun 2025 08:51:11 GMT, Daniel Skantz wrote: >> This PR updates a few tests to reintroduce testing of string concatenation optimizations since a few bugs have recently been identified in this area. >> >> Selection criteria: performed a text search on the test suite and identified tests for string concatenations or string optimizations that are not currently compiled with `-XDstringConcat=inline` and are not using StringBuilders explicitly. >> >> Testing: T1-4. >> >> Extra testing: ran the tests manually with `-XX:+PrintOptimizeStringConcat` and verified that the tests are exercising string optimizations after the fix. > > Daniel Skantz has updated the pull request incrementally with two additional commits since the last revision: > > - revert change to TestStringIntrinsics.java > - Update test/hotspot/jtreg/compiler/c2/Test7046096.java > > Co-authored-by: Emanuel Peter Thanks! ------------- Marked as reviewed by rcastanedalo (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25610#pullrequestreview-2900209239 From sviswanathan at openjdk.org Thu Jun 5 21:33:57 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 5 Jun 2025 21:33:57 GMT Subject: RFR: 8357982: Fix several failing BMI tests with -XX:+UseAPX [v7] In-Reply-To: References: Message-ID: On Thu, 5 Jun 2025 08:08:48 GMT, Jatin Bhateja wrote: >> A) Patch extends the following tests with hard-coded encoding checks for various BMI instructions to cover REX2 or extended EVEX encodings supported by APX. >> >> >> compiler/intrinsics/bmi/verifycode/AndnTestI.java >> compiler/intrinsics/bmi/verifycode/AndnTestL.java >> compiler/intrinsics/bmi/verifycode/BzhiTestI2L.java >> compiler/intrinsics/bmi/verifycode/LZcntTestL.java >> compiler/intrinsics/bmi/verifycode/TZcntTestL.java >> >> >> B) After integration of JDK-8349582, which added APX NDD support, AndN instruction selection patterns that expect (Xor SRC, -1) as one of its operands were not getting selected because of a lower-cost generic immediate pattern match; patch fixes this issue through strict predicate checks. >> >> Above tests are now passing, validations were carried out using Intel Software Development emulator. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review resolutions src/hotspot/cpu/x86/x86_64.ad line 10621: > 10619: %{ > 10620: // Strict predicate check to make selection of xorI_rReg_im1 cost agnostic if immI src is -1. > 10621: predicate(!UseAPX && n->in(2)->bottom_type()->is_int()->get_con() != -1); We don't need to add the strict predicate check to xorI_rReg_imm, xorI_rReg_rReg_imm_ndd, xorL_rReg_imm, xorL_rReg_rReg_imm_ndd. The only change required is to add ins_cost(150) to xorI_rReg_mem_imm_ndd and xorL_rReg_mem_imm_ndd. 
src/hotspot/cpu/x86/x86_64.ad line 10636: > 10634: %{ > 10635: // Strict predicate check to make selection of xorI_rReg_im1_ndd cost agnostic if immI src2 is -1. > 10636: predicate(UseAPX && n->in(2)->bottom_type()->is_int()->get_con() != -1); This strict predicate check could be removed. src/hotspot/cpu/x86/x86_64.ad line 11328: > 11326: %{ > 11327: // Strict predicate check to make selection of xorL_rReg_im1 cost agnostic if immL32 src is -1. > 11328: predicate(!UseAPX && n->in(2)->bottom_type()->is_long()->get_con() != -1L); This strict predicate check could be removed. src/hotspot/cpu/x86/x86_64.ad line 11343: > 11341: %{ > 11342: // Strict predicate check to make selection of xorL_rReg_im1_ndd cost agnostic if immL32 src2 is -1. > 11343: predicate(UseAPX && n->in(2)->bottom_type()->is_long()->get_con() != -1L); This strict predicate check could be removed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25501#discussion_r2130466887 PR Review Comment: https://git.openjdk.org/jdk/pull/25501#discussion_r2130468837 PR Review Comment: https://git.openjdk.org/jdk/pull/25501#discussion_r2130469792 PR Review Comment: https://git.openjdk.org/jdk/pull/25501#discussion_r2130470425 From duke at openjdk.org Thu Jun 5 23:32:26 2025 From: duke at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Thu, 5 Jun 2025 23:32:26 GMT Subject: RFR: 8357951: Remove the IdealLoopTree* loop parameter from PhaseIdealLoop::loop_iv_phi Message-ID: <7wwF_valtc-AYihX6oADgPvPNczVqSr8-UHre6S1-4U=.66909b6f-18dd-43b8-a14a-509b4e525b5e@github.com> This PR removes unused parameters, namely `IdealLoopTree* loop` from `PhaseIdealLoop::loop_iv_phi` and `PhaseIdealLoop::loop_iv_stride` Best regards! 
------------- Commit messages: - 8357951: Fix bad automatic renaming - 8357951: Remove unused parameters from PhaseIdealLoop::loop_iv_stride and PhaseIdealLoop::loop_iv_phi Changes: https://git.openjdk.org/jdk/pull/25659/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25659&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8357951 Stats: 10 lines in 3 files changed: 0 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/25659.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25659/head:pull/25659 PR: https://git.openjdk.org/jdk/pull/25659 From thartmann at openjdk.org Thu Jun 5 23:32:26 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 5 Jun 2025 23:32:26 GMT Subject: RFR: 8357951: Remove the IdealLoopTree* loop parameter from PhaseIdealLoop::loop_iv_phi In-Reply-To: <7wwF_valtc-AYihX6oADgPvPNczVqSr8-UHre6S1-4U=.66909b6f-18dd-43b8-a14a-509b4e525b5e@github.com> References: <7wwF_valtc-AYihX6oADgPvPNczVqSr8-UHre6S1-4U=.66909b6f-18dd-43b8-a14a-509b4e525b5e@github.com> Message-ID: On Thu, 5 Jun 2025 12:02:10 GMT, Benoît Maillard wrote: > This PR removes unused parameters, namely `IdealLoopTree* loop` from `PhaseIdealLoop::loop_iv_phi` and `PhaseIdealLoop::loop_iv_stride` > > Best regards! Looks good and trivial to me. Congratulations on your first PR! :slightly_smiling_face: ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25659#pullrequestreview-2900231530 From sviswanathan at openjdk.org Thu Jun 5 23:50:52 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 5 Jun 2025 23:50:52 GMT Subject: RFR: 8357982: Fix several failing BMI tests with -XX:+UseAPX [v7] In-Reply-To: References: Message-ID: On Thu, 5 Jun 2025 08:08:48 GMT, Jatin Bhateja wrote: >> A) Patch extends the following tests with hard-coded encoding checks for various BMI instructions to cover REX2 or extended EVEX encodings supported by APX.
>> >> >> compiler/intrinsics/bmi/verifycode/AndnTestI.java >> compiler/intrinsics/bmi/verifycode/AndnTestL.java >> compiler/intrinsics/bmi/verifycode/BzhiTestI2L.java >> compiler/intrinsics/bmi/verifycode/LZcntTestL.java >> compiler/intrinsics/bmi/verifycode/TZcntTestL.java >> >> >> B) After integration of JDK-8349582, which added APX NDD support, AndN instruction selection patterns that expect (Xor SRC, -1) as one of its operands were not getting selected because of a lower-cost generic immediate pattern match; patch fixes this issue through strict predicate checks. >> >> Above tests are now passing, validations were carried out using Intel Software Development emulator. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review resolutions test/hotspot/jtreg/compiler/intrinsics/bmi/verifycode/LZcntTestI.java line 57: > 55: // REX2 variant > 56: instrMaskAPX = new byte[]{(byte) 0xFF, (byte)0x80, (byte) 0xFF}; > 57: instrPatternAPX = new byte[]{(byte) 0xD5, (byte) 0x80, (byte) 0xBD}; I think we should check for 0xF3 as well here for lzcnt to differentiate it from bsr: instrMaskAPX = new byte[]{(byte) 0xFF, (byte) 0xFF, (byte)0x80, (byte) 0xFF}; instrPatternAPX = new byte[]{(byte) 0xF3, (byte) 0xD5, (byte) 0x80, (byte) 0xBD}; test/hotspot/jtreg/compiler/intrinsics/bmi/verifycode/TZcntTestI.java line 56: > 54: // REX2 variant > 55: instrMaskAPX = new byte[]{(byte) 0xFF, (byte)0x80, (byte) 0xFF}; > 56: instrPatternAPX = new byte[]{(byte) 0xD5, (byte) 0x80, (byte) 0xBC}; I think we should check for 0xF3 as well here for tzcnt to differentiate it from bsf: instrMaskAPX = new byte[]{(byte) 0xFF, (byte) 0xFF, (byte)0x80, (byte) 0xFF}; instrPatternAPX = new byte[]{(byte) 0xF3, (byte) 0xD5, (byte) 0x80, (byte) 0xBC}; ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25501#discussion_r2130968611 PR Review Comment: 
https://git.openjdk.org/jdk/pull/25501#discussion_r2130962379 From kbarrett at openjdk.org Fri Jun 6 06:13:32 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 6 Jun 2025 06:13:32 GMT Subject: RFR: 8342639: Global operator new in adlc has wrong exception spec Message-ID: <2-5wm21LtxEx0HMSd6gBbLv6tNLSBO4rGQ2W-uHfX6Q=.489a263e-fb34-4ca0-bc3f-5c74027367d6@github.com> Please review this change to remove the definition of `operator new(size_t, int, const char*, int) throw()`. It doesn't seem to be needed anymore, if it ever really was. See discussion in JBS for some more details. Testing: mach5 tier1-5, to cover various different build configurations. ------------- Commit messages: - remove operator new Changes: https://git.openjdk.org/jdk/pull/25668/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25668&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8342639 Stats: 4 lines in 1 file changed: 0 ins; 4 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25668.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25668/head:pull/25668 PR: https://git.openjdk.org/jdk/pull/25668 From jbhateja at openjdk.org Fri Jun 6 06:42:54 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 6 Jun 2025 06:42:54 GMT Subject: RFR: 8357982: Fix several failing BMI tests with -XX:+UseAPX [v7] In-Reply-To: References: Message-ID: On Thu, 5 Jun 2025 21:29:46 GMT, Sandhya Viswanathan wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Review resolutions > > src/hotspot/cpu/x86/x86_64.ad line 10636: > >> 10634: %{ >> 10635: // Strict predicate check to make selection of xorI_rReg_im1_ndd cost agnostic if immI src2 is -1. >> 10636: predicate(UseAPX && n->in(2)->bottom_type()->is_int()->get_con() != -1); > > This strict predicate check could be removed. It's over a general immI match and makes selection more robust and agnostic to the static pattern cost.
> src/hotspot/cpu/x86/x86_64.ad line 11328: > >> 11326: %{ >> 11327: // Strict predicate check to make selection of xorL_rReg_im1 cost agnostic if immL32 src is -1. >> 11328: predicate(!UseAPX && n->in(2)->bottom_type()->is_long()->get_con() != -1L); > > This strict predicate check could be removed. Please see that checks are only on generic immediate rules and constrain selection. > src/hotspot/cpu/x86/x86_64.ad line 11343: > >> 11341: %{ >> 11342: // Strict predicate check to make selection of xorL_rReg_im1_ndd cost agnostic if immL32 src2 is -1. >> 11343: predicate(UseAPX && n->in(2)->bottom_type()->is_long()->get_con() != -1L); > > This strict predicate check could be removed. Same as above. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25501#discussion_r2131603149 PR Review Comment: https://git.openjdk.org/jdk/pull/25501#discussion_r2131603957 PR Review Comment: https://git.openjdk.org/jdk/pull/25501#discussion_r2131604272 From jbhateja at openjdk.org Fri Jun 6 06:56:52 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 6 Jun 2025 06:56:52 GMT Subject: RFR: 8357982: Fix several failing BMI tests with -XX:+UseAPX [v7] In-Reply-To: References: Message-ID: <8SMpHtG_wt9kTWUCrMrmUt6ae0ecV63YdDr6yh_YySU=.414b0712-9d40-4685-968c-19ded7db2951@github.com> On Thu, 5 Jun 2025 23:48:03 GMT, Sandhya Viswanathan wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Review resolutions > > test/hotspot/jtreg/compiler/intrinsics/bmi/verifycode/LZcntTestI.java line 57: > >> 55: // REX2 variant >> 56: instrMaskAPX = new byte[]{(byte) 0xFF, (byte)0x80, (byte) 0xFF}; >> 57: instrPatternAPX = new byte[]{(byte) 0xD5, (byte) 0x80, (byte) 0xBD}; > > I think we should check for 0xF3 as well here for lzcnt to differentiate it from bsr: > instrMaskAPX = new byte[]{(byte) 0xFF, (byte) 0xFF, (byte)0x80, (byte) 0xFF}; > instrPatternAPX = new byte[]{(byte) 0xF3, (byte) 0xD5, (byte) 0x80, 
(byte) 0xBD}; Mask 0xFF is against the legacy prefix byte, which should be checked in entirety i.e. 0xF3 > test/hotspot/jtreg/compiler/intrinsics/bmi/verifycode/TZcntTestI.java line 56: > >> 54: // REX2 variant >> 55: instrMaskAPX = new byte[]{(byte) 0xFF, (byte)0x80, (byte) 0xFF}; >> 56: instrPatternAPX = new byte[]{(byte) 0xD5, (byte) 0x80, (byte) 0xBC}; > > I think we should check for 0xF3 as well here for tzcnt to differentiate it from bsf: > instrMaskAPX = new byte[]{(byte) 0xFF, (byte) 0xFF, (byte)0x80, (byte) 0xFF}; > instrPatternAPX = new byte[]{(byte) 0xF3, (byte) 0xD5, (byte) 0x80, (byte) 0xBC}; Mask 0xFF is against the legacy prefix byte, which should be checked in entirety i.e. 0xF3 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25501#discussion_r2131620708 PR Review Comment: https://git.openjdk.org/jdk/pull/25501#discussion_r2131620781 From duke at openjdk.org Fri Jun 6 07:05:03 2025 From: duke at openjdk.org (erifan) Date: Fri, 6 Jun 2025 07:05:03 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v7] In-Reply-To: <15TW6hiffz65NhHevPefL_6swSC07UD-GwiJ4tPDtFs=.b83081df-8abd-4756-b4e0-1d969678a0d2@github.com> References: <15TW6hiffz65NhHevPefL_6swSC07UD-GwiJ4tPDtFs=.b83081df-8abd-4756-b4e0-1d969678a0d2@github.com> Message-ID: <7jhSkkRnLI9jPxnO55qlmkoJa-0By2VbkUnAFsJsFD8=.1eda80fa-424f-4766-9fb6-2cf6eb061c68@github.com> On Thu, 5 Jun 2025 11:05:48 GMT, Emanuel Peter wrote: > > Oh I think we still cannot use `BoolTest::negate`, because we cannot instantiate a `BoolTest` object with **unsigned** comparison. `BoolTest::negate` is a non-static function. > > I see. Ok. Hmm. I still think that the logic should be in `BoolTest`, because that is where the exact implementation of the enum values is. In that context it is easier to see why `^4` does the negation. And imagine we were ever to change the enum values, then it would be harder to find your code and fix it. 
> > Maybe it could be called `BoolTest::negate_mask(mask btm)` and explain in a comment that both signed and unsigned are supported. Makes sense, I'll update later, thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-2948297917 From mhaessig at openjdk.org Fri Jun 6 07:08:51 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Fri, 6 Jun 2025 07:08:51 GMT Subject: RFR: 8357951: Remove the IdealLoopTree* loop parameter from PhaseIdealLoop::loop_iv_phi In-Reply-To: <7wwF_valtc-AYihX6oADgPvPNczVqSr8-UHre6S1-4U=.66909b6f-18dd-43b8-a14a-509b4e525b5e@github.com> References: <7wwF_valtc-AYihX6oADgPvPNczVqSr8-UHre6S1-4U=.66909b6f-18dd-43b8-a14a-509b4e525b5e@github.com> Message-ID: On Thu, 5 Jun 2025 12:02:10 GMT, Benoît Maillard wrote: > This PR removes unused parameters, namely `IdealLoopTree* loop` from `PhaseIdealLoop::loop_iv_phi` and `PhaseIdealLoop::loop_iv_stride` > > Best regards! Looks good and trivial to me as well. Congrats on your first PR. ------------- Marked as reviewed by mhaessig (Author). PR Review: https://git.openjdk.org/jdk/pull/25659#pullrequestreview-2903994593 From duke at openjdk.org Fri Jun 6 07:24:50 2025 From: duke at openjdk.org (duke) Date: Fri, 6 Jun 2025 07:24:50 GMT Subject: RFR: 8357951: Remove the IdealLoopTree* loop parameter from PhaseIdealLoop::loop_iv_phi In-Reply-To: <7wwF_valtc-AYihX6oADgPvPNczVqSr8-UHre6S1-4U=.66909b6f-18dd-43b8-a14a-509b4e525b5e@github.com> References: <7wwF_valtc-AYihX6oADgPvPNczVqSr8-UHre6S1-4U=.66909b6f-18dd-43b8-a14a-509b4e525b5e@github.com> Message-ID: On Thu, 5 Jun 2025 12:02:10 GMT, Benoît Maillard wrote: > This PR removes unused parameters, namely `IdealLoopTree* loop` from `PhaseIdealLoop::loop_iv_phi` and `PhaseIdealLoop::loop_iv_stride` > > Best regards! @benoitmaillard Your change (at version b6d2b57d81e2e0aa919a101e30a440359ee5e7f5) is now ready to be sponsored by a Committer.
------------- PR Comment: https://git.openjdk.org/jdk/pull/25659#issuecomment-2948341047 From jbhateja at openjdk.org Fri Jun 6 07:46:32 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 6 Jun 2025 07:46:32 GMT Subject: RFR: 8357982: Fix several failing BMI tests with -XX:+UseAPX [v8] In-Reply-To: References: Message-ID: > A) Patch extends the following tests with hard-coded encoding checks for various BMI instructions to cover REX2 or extended EVEX encodings supported by APX. > > > compiler/intrinsics/bmi/verifycode/AndnTestI.java > compiler/intrinsics/bmi/verifycode/AndnTestL.java > compiler/intrinsics/bmi/verifycode/BzhiTestI2L.java > compiler/intrinsics/bmi/verifycode/LZcntTestL.java > compiler/intrinsics/bmi/verifycode/TZcntTestL.java > > > B) After integration of JDK-8349582, which added APX NDD support, AndN instruction selection patterns that expect (Xor SRC, -1) as one of its operands were not getting selected because of a lower-cost generic immediate pattern match; patch fixes this issue through strict predicate checks. > > Above tests are now passing, validations were carried out using Intel Software Development emulator. > > Kindly review and share your feedback. 
> > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resoltions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25501/files - new: https://git.openjdk.org/jdk/pull/25501/files/45db368d..a89188e1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25501&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25501&range=06-07 Stats: 4 lines in 2 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/25501.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25501/head:pull/25501 PR: https://git.openjdk.org/jdk/pull/25501 From duke at openjdk.org Fri Jun 6 07:47:30 2025 From: duke at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Fri, 6 Jun 2025 07:47:30 GMT Subject: RFR: 8356780: PhaseMacroExpand::_has_locks is unused Message-ID: <4Y_qCkNICY97KdColxyShQuBy9zdEVaZjJjkDtJD9do=.f09a465c-13d9-4e15-8b86-d99cac09b807@github.com> This PR introduces two cleanup changes to `PhaseMacroExpand`: - Removes the unused field `PhaseMacroExpand::_has_locks` - Merges two `while` loops in `PhaseMacroExpand::eliminate_macro_nodes` into a single loop Previously, `eliminate_macro_nodes` used two separate `while` loops: - The first loop removed lock nodes - The second loop removed allocation nodes Both loops had the same structure and independently traversed the same set of nodes. Since their operations do not interfere, the lock node removal logic was moved into the second loop as an additional case in the `switch` statement. Thanks! 
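As an illustration of the loop merge described above, here is a minimal, hypothetical sketch of handling both node kinds in a single fixed-point loop with one `switch`. It is not the HotSpot code: `Kind`, `MacroNode`, `try_eliminate_lock`, `try_eliminate_allocation` and `eliminate_merged` are invented names, and the `removable` flag merely stands in for "the elimination attempt succeeds".

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical node kinds; stand-ins for lock and allocation macro nodes.
enum class Kind { Lock, Unlock, Allocate, AllocateArray, Other };

// Hypothetical macro node; 'removable' models a successful elimination attempt.
struct MacroNode {
  Kind kind;
  bool removable;
};

// Hypothetical per-kind elimination attempts that do not interfere with each other.
static bool try_eliminate_lock(const MacroNode& n)       { return n.removable; }
static bool try_eliminate_allocation(const MacroNode& n) { return n.removable; }

// One fixed-point loop that dispatches on the node kind, instead of one loop
// per kind. Returns the number of nodes removed from 'nodes'.
int eliminate_merged(std::vector<MacroNode>& nodes) {
  int eliminated = 0;
  bool progress = true;
  while (progress) {
    progress = false;
    // Walk backwards so that erasing an element leaves unvisited indices valid.
    for (std::size_t i = nodes.size(); i > 0; i--) {
      const MacroNode n = nodes[i - 1];
      bool success = false;
      switch (n.kind) {
        case Kind::Lock:
        case Kind::Unlock:
          success = try_eliminate_lock(n);        // formerly the first loop
          break;
        case Kind::Allocate:
        case Kind::AllocateArray:
          success = try_eliminate_allocation(n);  // formerly the second loop
          break;
        default:
          break;
      }
      if (success) {
        nodes.erase(nodes.begin() + static_cast<std::ptrdiff_t>(i - 1));
        eliminated++;
        progress = true;
      }
    }
  }
  return eliminated;
}
```

The merge is only safe under the assumption stated in the description, namely that the two kinds of elimination do not interfere; otherwise the order in which nodes are visited could change the result.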
------------- Commit messages: - 8356780: Remove useless assert - 8356780: Merge both while loops in PhaseMacroExpand::eliminate_macro_nodes - 8356780: Remove unused field PhaseMacroExpand::_has_locks Changes: https://git.openjdk.org/jdk/pull/25669/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25669&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8356780 Stats: 32 lines in 2 files changed: 4 ins; 24 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/25669.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25669/head:pull/25669 PR: https://git.openjdk.org/jdk/pull/25669 From duke at openjdk.org Fri Jun 6 08:18:56 2025 From: duke at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Fri, 6 Jun 2025 08:18:56 GMT Subject: Integrated: 8357951: Remove the IdealLoopTree* loop parameter from PhaseIdealLoop::loop_iv_phi In-Reply-To: <7wwF_valtc-AYihX6oADgPvPNczVqSr8-UHre6S1-4U=.66909b6f-18dd-43b8-a14a-509b4e525b5e@github.com> References: <7wwF_valtc-AYihX6oADgPvPNczVqSr8-UHre6S1-4U=.66909b6f-18dd-43b8-a14a-509b4e525b5e@github.com> Message-ID: On Thu, 5 Jun 2025 12:02:10 GMT, Benoît Maillard wrote: > This PR removes unused parameters, namely `IdealLoopTree* loop` from `PhaseIdealLoop::loop_iv_phi` and `PhaseIdealLoop::loop_iv_stride` > > Best regards! This pull request has now been integrated.
Changeset: d1b78800 Author: Benoît Maillard Committer: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/d1b788005bdf11f1426baa8e811c121a956482c9 Stats: 10 lines in 3 files changed: 0 ins; 0 del; 10 mod 8357951: Remove the IdealLoopTree* loop parameter from PhaseIdealLoop::loop_iv_phi Reviewed-by: thartmann, mhaessig ------------- PR: https://git.openjdk.org/jdk/pull/25659 From roland at openjdk.org Fri Jun 6 09:01:46 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 6 Jun 2025 09:01:46 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v11] In-Reply-To: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> Message-ID: > An `Initialize` node for an `Allocate` node is created with a memory > `Proj` of adr type raw memory. In order for stores to be captured, the > memory state out of the allocation is a `MergeMem` with slices for the > various object fields/array element set to the raw memory `Proj` of > the `Initialize` node. If `Phi`s need to be created during later > transformations from this memory state, the `Phi` for a particular > slice gets its adr type from the type of the `Proj` which is raw > memory. If during macro expansion, the `Allocate` is found to have no > use and so can be removed, the `Proj` out of the `Initialize` is > replaced by the memory state on input to the `Allocate`. A `Phi` for > some slice for a field of an object will end up with the raw memory > state on input to the `Allocate` node. As a result, memory state at > the `Phi` is incorrect and incorrect execution can happen. > > The fix I propose is, rather than have a single `Proj` for the memory > state out of the `Initialize` with adr type raw memory, to use one > `Proj` per slice added to the memory state after the `Initialize`.
Each > of the `Proj`s should return the right adr type for its slice. For that > I propose having a new type of `Proj`: `NarrowMemProj` that captures > the right adr type. > > Logic for the construction of the `Allocate`/`Initialize` subgraph is > tweaked so the right adr type captured in its own `NarrowMemProj` is > added to the memory subgraph. Code that removes an allocation or moves > it also has to be changed so it correctly takes the multiple memory > projections out of the `Initialize` node into account. > > One tricky issue is that when EA splits types for a scalar replaceable > `Allocate` node: > > 1- the adr type captured in the `NarrowMemProj` becomes out of sync > with the type of the slices for the allocation > > 2- before EA, the memory state for one particular field out of the > `Initialize` node can be used for a `Store` to the just allocated > object or some other. So we can have a chain of `Store`s, some to > the newly allocated object, some to some other objects, all of them > using the state of `NarrowMemProj` out of the `Initialize`. After > split unique types, the `NarrowMemProj` is for the slice of a > particular allocation. So `Store`s to some other objects shouldn't > use that memory state but the memory state before the `Allocate`. > > For that, I added logic to update the adr type of `NarrowMemProj` > during split unique types and update the memory input of `Store`s that > don't depend on the memory state ...
Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: more ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24570/files - new: https://git.openjdk.org/jdk/pull/24570/files/69c6e50b..3b5b54a3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24570&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24570&range=09-10 Stats: 39 lines in 3 files changed: 28 ins; 10 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24570.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24570/head:pull/24570 PR: https://git.openjdk.org/jdk/pull/24570 From roland at openjdk.org Fri Jun 6 09:05:54 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 6 Jun 2025 09:05:54 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v8] In-Reply-To: References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> <1gdeBnZ7YuIf9CgQW2bCXkDDBWPjUgRnickHts-fvzE=.e6e901ba-3e9f-41a2-9c68-167a879e9655@github.com> <2m1_XtiSsW_LaBRrkX4qv7AKtLOjNgnl4mUp3zisasE=.dda62164-7aa0-4c1a-b83f-fa40ba7902e5@github.com> Message-ID: On Thu, 5 Jun 2025 12:07:01 GMT, Roberto Castañeda Lozano wrote: > > Thanks, will run some testing and come back with the results. > > `compiler/c2/TestVerifyIterativeGVN.java` fails as follows (I tested the PR applied on top of jdk-25+25 but I see the failure also in the [GHA results](https://github.com/rwestrel/jdk/actions/runs/15438508506/job/43452592735)): > > ``` > # > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (/home/rocastan/git/views/JDK-8327963-memprojs/open/src/hotspot/share/opto/node.hpp:457), pid=464370, tid=464390 > # assert(is_not_dead(n)) failed: can not use dead node > # > (...)
> Current CompileTask: > C2:128 9 b 4 java.lang.reflect.ClassFileFormatVersion:: (417 bytes) > > Stack: [0x000070ba24200000,0x000070ba24300000], sp=0x000070ba242fae70, free space=1003k > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0x519ad1] Node::set_req(unsigned int, Node*)+0x251 (node.hpp:457) > V [libjvm.so+0x17d2d99] PhaseIterGVN::subsume_node(Node*, Node*)+0x239 (phaseX.cpp:1425) > V [libjvm.so+0x15feaf4] InitializeNode::replace_mem_projs_by(Node*, PhaseIterGVN*)+0x1d4 (phaseX.hpp:539) > V [libjvm.so+0x1532cd7] PhaseMacroExpand::expand_allocate_common(AllocateNode*, Node*, TypeFunc const*, unsigned char*, Node*)+0x167 (macro.cpp:1316) > V [libjvm.so+0x153e92e] PhaseMacroExpand::expand_macro_nodes()+0xc5e (macro.cpp:2687) > V [libjvm.so+0xb286e7] Compile::Optimize()+0xe37 (compile.cpp:2533) > (...) > ``` Thanks for the report. Should be fixed in the new commit. The logic I used to remove `NarrowMemProj`s was a bit of a hack. I replaced it with code that's a bit longer but also more robust. On that topic: removal of `NarrowMemProj`s happens during macro expansion. There are rare cases where C2 can't go from the `Allocate` to the `Initialize` because the IR graph doesn't match the expected pattern (some transformation introduced a region in between). In that case, `NarrowMemProj`s are not removed. That seems reasonable as there's no requirement to remove them and guaranteeing they are always removed would require a bit more code AFAICT.
------------- PR Comment: https://git.openjdk.org/jdk/pull/24570#issuecomment-2948580536 From mhaessig at openjdk.org Fri Jun 6 09:10:54 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Fri, 6 Jun 2025 09:10:54 GMT Subject: RFR: 8356780: PhaseMacroExpand::_has_locks is unused In-Reply-To: <4Y_qCkNICY97KdColxyShQuBy9zdEVaZjJjkDtJD9do=.f09a465c-13d9-4e15-8b86-d99cac09b807@github.com> References: <4Y_qCkNICY97KdColxyShQuBy9zdEVaZjJjkDtJD9do=.f09a465c-13d9-4e15-8b86-d99cac09b807@github.com> Message-ID: <8E_ltzNOatAEe9cilqhvX2SqNPANuorX6qc-Z4MUp2M=.2cc2e4fd-3c43-4d8a-a24a-f78111d3d922@github.com> On Fri, 6 Jun 2025 07:41:45 GMT, Benoît Maillard wrote: > This PR introduces two cleanup changes to `PhaseMacroExpand`: > > - Removes the unused field `PhaseMacroExpand::_has_locks` > - Merges two `while` loops in `PhaseMacroExpand::eliminate_macro_nodes` into a single loop > > Previously, `eliminate_macro_nodes` used two separate `while` loops: > > - The first loop removed lock nodes > - The second loop removed allocation nodes > > Both loops had the same structure and independently traversed the same set of nodes. Since their operations do not interfere, the lock node removal logic was moved into the second loop as an additional case in the `switch` statement. > > Thanks! Thank you for working on this! Your change looks good. However, it would be nice if you could mention in the PR description what testing you ran. Also, you forgot to update the copyright year in `opto/macro.hpp`. src/hotspot/share/opto/macro.hpp line 199: > 197: > 198: public: > 199: PhaseMacroExpand(PhaseIterGVN &igvn) : Phase(Macro_Expand), _igvn(igvn) { You forgot to update the copyright year. You can use the helper script [`make/scripts/update_copyright_year.sh`](https://github.com/openjdk/jdk/blob/master/make/scripts/update_copyright_year.sh) to do it automatically, before you send out a PR. ------------- Changes requested by mhaessig (Author).
PR Review: https://git.openjdk.org/jdk/pull/25669#pullrequestreview-2904259441 PR Review Comment: https://git.openjdk.org/jdk/pull/25669#discussion_r2131794133 From duke at openjdk.org Fri Jun 6 09:37:30 2025 From: duke at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Fri, 6 Jun 2025 09:37:30 GMT Subject: RFR: 8356780: PhaseMacroExpand::_has_locks is unused [v2] In-Reply-To: <4Y_qCkNICY97KdColxyShQuBy9zdEVaZjJjkDtJD9do=.f09a465c-13d9-4e15-8b86-d99cac09b807@github.com> References: <4Y_qCkNICY97KdColxyShQuBy9zdEVaZjJjkDtJD9do=.f09a465c-13d9-4e15-8b86-d99cac09b807@github.com> Message-ID: > This PR introduces two cleanup changes to `PhaseMacroExpand`: > > - Removes the unused field `PhaseMacroExpand::_has_locks` > - Merges two `while` loops in `PhaseMacroExpand::eliminate_macro_nodes` into a single loop > > Previously, `eliminate_macro_nodes` used two separate `while` loops: > > - The first loop removed lock nodes > - The second loop removed allocation nodes > > Both loops had the same structure and independently traversed the same set of nodes. Since their operations do not interfere, the lock node removal logic was moved into the second loop as an additional case in the `switch` statement. > > Thanks! 
Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: 8356780: Update copyright ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25669/files - new: https://git.openjdk.org/jdk/pull/25669/files/4e7ee579..90c94a25 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25669&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25669&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25669.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25669/head:pull/25669 PR: https://git.openjdk.org/jdk/pull/25669 From duke at openjdk.org Fri Jun 6 10:38:11 2025 From: duke at openjdk.org (erifan) Date: Fri, 6 Jun 2025 10:38:11 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v8] In-Reply-To: References: Message-ID: > This patch optimizes the following patterns: > For integer types: > > (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) > => (VectorMaskCmp src1 src2 ncond) > (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) > => (VectorMaskCmp src1 src2 ncond) > > cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt; ncond is the negated comparison of cond. > > For float and double types: > > (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) > => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) > (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) > => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) > > cond can be eq or ne.
> 
> Benchmarks on Nvidia Grace machine with 128-bit SVE2:
> With option `-XX:UseSVE=2`:
> 
> Benchmark                   Unit   Before Score  Before Error  After Score  After Error  Uplift
> testCompareEQMaskNotByte    ops/s  7912127.225   2677.289518   10266136.26  8955.008548  1.29
> testCompareEQMaskNotDouble  ops/s  884737.6799   446.963779    1179760.772  448.031844   1.33
> testCompareEQMaskNotFloat   ops/s  1765045.787   682.332214    2359520.803  896.305743   1.33
> testCompareEQMaskNotInt     ops/s  1787221.411   977.743935    2353952.519  960.069976   1.31
> testCompareEQMaskNotLong    ops/s  895297.1974   673.44808     1178449.02   323.804205   1.31
> testCompareEQMaskNotShort   ops/s  3339987.002   3415.2226     4712761.965  2110.862053  1.41
> testCompareGEMaskNotByte    ops/s  7907615.16    4094.243652   10251646.9   9486.699831  1.29
> testCompareGEMaskNotInt     ops/s  1683738.958   4233.813092   2352855.205  1251.952546  1.39
> testCompareGEMaskNotLong    ops/s  854496.1561   8594.598885   1177811.493  521.1229     1.37
> testCompareGEMaskNotShort   ops/s  3341860.309   1578.975338   4714008.434  1681.10365   1.41
> testCompareGTMaskNotByte    ops/s  7910823.674   2993.367032   10245063.58  9774.75138   1.29
> testCompareGTMaskNotInt     ops/s  1673393.928   3153.099431   2353654.521  1190.848583  1.4
> testCompareGTMaskNotLong    ops/s  849405.9159   2432.858159   1177952.041  359.96413    1.38
> testCompareGTMaskNotShort   ops/s  3339509.141   3339.976585   4711442.496  2673.364893  1.41
> testCompareLEMaskNotByte    ops/s  7911340.004   3114.69191    10231626.5   27134.20035  1.29
> testCompareLEMaskNotInt     ops/s  1675812.113   1340.969885   2353255.341  1452.4522    1.4
> testCompareLEMaskNotLong    ops/s  848862.8036   6564.841731   1177763.623  539.290106   1.38
> testCompareLEMaskNotShort   ops/s  3324951.54    2380.29473    4712116.251  1544.559684  1.41
> testCompareLTMaskNotByte    ops/s  7910390.844   2630.861436   10239567.69  6487.441672  1.29
> testCompareLTMaskNotInt     ops/s  1672180.09    995.238142    2353757.863  853.774734   1.4
> testCompareLTMaskNotLong    ops/s  856502.26...
erifan has updated the pull request incrementally with one additional commit since the last revision: Support negating unsigned comparison for BoolTest::mask Added a static method `negate_mask(mask btm)` into BoolTest class to negate both signed and unsigned comparison. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24674/files - new: https://git.openjdk.org/jdk/pull/24674/files/ebbcc405..f51bf722 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24674&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24674&range=06-07 Stats: 6 lines in 3 files changed: 2 ins; 3 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24674.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24674/head:pull/24674 PR: https://git.openjdk.org/jdk/pull/24674 From duke at openjdk.org Fri Jun 6 10:38:11 2025 From: duke at openjdk.org (erifan) Date: Fri, 6 Jun 2025 10:38:11 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v7] In-Reply-To: <7jhSkkRnLI9jPxnO55qlmkoJa-0By2VbkUnAFsJsFD8=.1eda80fa-424f-4766-9fb6-2cf6eb061c68@github.com> References: <15TW6hiffz65NhHevPefL_6swSC07UD-GwiJ4tPDtFs=.b83081df-8abd-4756-b4e0-1d969678a0d2@github.com> <7jhSkkRnLI9jPxnO55qlmkoJa-0By2VbkUnAFsJsFD8=.1eda80fa-424f-4766-9fb6-2cf6eb061c68@github.com> Message-ID: <_YWpW68O6gzU99lre_qtNA5zkH_fCU50rcs8Va4E5eQ=.aaa944f8-5150-405a-8534-50de7c74d762@github.com> On Fri, 6 Jun 2025 07:01:58 GMT, erifan wrote: > > > Oh I think we still cannot use `BoolTest::negate`, because we cannot instantiate a `BoolTest` object with **unsigned** comparison. `BoolTest::negate` is a non-static function. > > > > > > I see. Ok. Hmm. I still think that the logic should be in `BoolTest`, because that is where the exact implementation of the enum values is. In that context it is easier to see why `^4` does the negation. And imagine we were ever to change the enum values, then it would be harder to find your code and fix it. 
> > Maybe it could be called `BoolTest::negate_mask(mask btm)` and explain in a comment that both signed and unsigned are supported.
> 
> Makes sense, I'll update later, thanks.

@eme64 your comment is addressed, thanks for your suggestion!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-2948832452

From chagedorn at openjdk.org Fri Jun 6 11:05:49 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Fri, 6 Jun 2025 11:05:49 GMT
Subject: RFR: 8356780: PhaseMacroExpand::_has_locks is unused [v2]
In-Reply-To:
References: <4Y_qCkNICY97KdColxyShQuBy9zdEVaZjJjkDtJD9do=.f09a465c-13d9-4e15-8b86-d99cac09b807@github.com>
Message-ID:

On Fri, 6 Jun 2025 09:37:30 GMT, Benoît Maillard wrote:

>> This PR introduces two cleanup changes to `PhaseMacroExpand`:
>> 
>> - Removes the unused field `PhaseMacroExpand::_has_locks`
>> - Merges two `while` loops in `PhaseMacroExpand::eliminate_macro_nodes` into a single loop
>> 
>> Previously, `eliminate_macro_nodes` used two separate `while` loops:
>> 
>> - The first loop removed lock nodes
>> - The second loop removed allocation nodes
>> 
>> Both loops had the same structure and independently traversed the same set of nodes. Since their operations do not interfere, the lock node removal logic was moved into the second loop as an additional case in the `switch` statement.
>> 
>> Thanks!
> 
> Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision:
> 
>   8356780: Update copyright

Looks good! Since you also fuse the two loops, I suggest updating the PR/JBS title accordingly.

-------------

Marked as reviewed by chagedorn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/25669#pullrequestreview-2904593581 From roland at openjdk.org Fri Jun 6 13:51:57 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 6 Jun 2025 13:51:57 GMT Subject: RFR: 8275202: C2: optimize out more redundant conditions [v2] In-Reply-To: <978cgwy3Nb_x7yU6jZz0f6zhTBZfphstisAkBf1Vktc=.283d06eb-4f79-40cf-b8dd-a9c230e59902@github.com> References: <978cgwy3Nb_x7yU6jZz0f6zhTBZfphstisAkBf1Vktc=.283d06eb-4f79-40cf-b8dd-a9c230e59902@github.com> Message-ID: > This change adds a new loop opts pass to optimize redundant conditions > such as the second one in: > > > if (i < 10) { > if (i < 42) { > > > In the branch of the first if, the type of i can be narrowed down to > [min_jint, 9] which can then be used to constant fold the second > condition. > > The compiler already keeps track of type[n] for every node in the > current compilation unit. That's not sufficient to optimize the > snippet above though because the type of i can only be narrowed in > some sections of the control flow (that is a subset of all > controls). The solution is to build a new table that tracks the type > of n at every control c > > > type'[n, root] = type[n] // initialized from igvn's type table > type'[n, c] = type[n, idom(c)] > > > This pass iterates over the CFG looking for conditions such as: > > > if (i < 10) { > > > that allows narrowing the type of i and updates the type' table > accordingly. > > At a region r: > > > type'[n, r] = meet(type'[n, r->in(1)], type'[n, r->in(2)]...) > > > For a Phi phi at a region r: > > > type'[phi, r] = meet(type'[phi->in(1), r->in(1)], type'[phi->in(2), r->in(2)]...) > > > Once a type is narrowed, uses are enqueued and their types are > computed by calling the Value() methods. If a use's type is narrowed, > it's recorded at c in the type' table. Value() methods retrieve types > from the type table, not the type' table. 
To address that issue while > leaving Value() methods unchanged, before calling Value() at c, the > type table is updated so: > > > type[n] = type'[n, c] > > > An exception is for Phi::Value which needs to retrieve the type of > nodes are various controls: there, a new type(Node* n, Node* c) > method is used. > > For most n and c, type'[n, c] is likely the same as type[n], the type > recorded in the global igvn table (that is there shouldn't be many > nodes at only a few control for which we can narrow the type down). As > a consequence, the types'[n, c] table is implemented with: > > - At c, narrowed down types are stored in a GrowableArray. Each entry > records the previous type at idom(c) and the narrowed down type at > c. > > - The GrowableArray of type updates is recorded in a hash table > indexed by c. If there's no update at c, there's no entry in the > hash table. > > This pass operates in 2 steps: > > - it first iterates over the graph looking for conditions that narrow > the types of some nodes and propagate type updates to uses until a > fix point. > > - it transforms the graph so newly found constant nodes are folded. > > > The new pass is run on every loop opts. There are a couple rea... Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains three commits: - updated conditional propagation - Merge branch 'master' into JDK-8275202 - conditional propagation ------------- Changes: https://git.openjdk.org/jdk/pull/14586/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14586&range=01 Stats: 4509 lines in 31 files changed: 4398 ins; 40 del; 71 mod Patch: https://git.openjdk.org/jdk/pull/14586.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14586/head:pull/14586 PR: https://git.openjdk.org/jdk/pull/14586 From kvn at openjdk.org Fri Jun 6 14:12:51 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 6 Jun 2025 14:12:51 GMT Subject: RFR: 8342639: Global operator new in adlc has wrong exception spec In-Reply-To: <2-5wm21LtxEx0HMSd6gBbLv6tNLSBO4rGQ2W-uHfX6Q=.489a263e-fb34-4ca0-bc3f-5c74027367d6@github.com> References: <2-5wm21LtxEx0HMSd6gBbLv6tNLSBO4rGQ2W-uHfX6Q=.489a263e-fb34-4ca0-bc3f-5c74027367d6@github.com> Message-ID: On Fri, 6 Jun 2025 06:07:43 GMT, Kim Barrett wrote: > Please review this change to remove the definition of > `operator new(size_t, int, const char*, int) throw()` > > It doesn't seem to be needed anymore, if it ever really was. See discussion in > JBS for some more details. > > Testing: mach5 tier1-5, to cover various different build configurations. Good. @kimbarrett can you enable GHA testing for this branch? It will test more builds. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25668#pullrequestreview-2905050629 PR Comment: https://git.openjdk.org/jdk/pull/25668#issuecomment-2949388049 From shade at openjdk.org Fri Jun 6 14:15:05 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 6 Jun 2025 14:15:05 GMT Subject: RFR: 8358749: Fix input checks in Vector API intrinsics Message-ID: We have been carrying this patch in Leyden/premain for a while: https://github.com/openjdk/leyden/commit/7faed7fc5c8e1bbd9a16ab22673a77099396179c. I believe it deserves to be in mainline. 
I polished it a little further. It is _mostly_ a cleanup, but there are also new checks, on the paths where we do take constants off the arguments. In those cases, I believe the alternative is compiler SEGV-ing. Additional testing: - [x] Linux x86_64 server fastdebug, `hotspot_vector_1 hotspot_vector_2` - [x] Linux x86_64 server fastdebug, `jdk_vector` ------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/25673/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25673&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8358749 Stats: 50 lines in 1 file changed: 21 ins; 0 del; 29 mod Patch: https://git.openjdk.org/jdk/pull/25673.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25673/head:pull/25673 PR: https://git.openjdk.org/jdk/pull/25673 From roland at openjdk.org Fri Jun 6 14:18:43 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 6 Jun 2025 14:18:43 GMT Subject: RFR: 8275202: C2: optimize out more redundant conditions [v3] In-Reply-To: <978cgwy3Nb_x7yU6jZz0f6zhTBZfphstisAkBf1Vktc=.283d06eb-4f79-40cf-b8dd-a9c230e59902@github.com> References: <978cgwy3Nb_x7yU6jZz0f6zhTBZfphstisAkBf1Vktc=.283d06eb-4f79-40cf-b8dd-a9c230e59902@github.com> Message-ID: > This change adds a new loop opts pass to optimize redundant conditions > such as the second one in: > > > if (i < 10) { > if (i < 42) { > > > In the branch of the first if, the type of i can be narrowed down to > [min_jint, 9] which can then be used to constant fold the second > condition. > > The compiler already keeps track of type[n] for every node in the > current compilation unit. That's not sufficient to optimize the > snippet above though because the type of i can only be narrowed in > some sections of the control flow (that is a subset of all > controls). 
The solution is to build a new table that tracks the type > of n at every control c > > > type'[n, root] = type[n] // initialized from igvn's type table > type'[n, c] = type[n, idom(c)] > > > This pass iterates over the CFG looking for conditions such as: > > > if (i < 10) { > > > that allows narrowing the type of i and updates the type' table > accordingly. > > At a region r: > > > type'[n, r] = meet(type'[n, r->in(1)], type'[n, r->in(2)]...) > > > For a Phi phi at a region r: > > > type'[phi, r] = meet(type'[phi->in(1), r->in(1)], type'[phi->in(2), r->in(2)]...) > > > Once a type is narrowed, uses are enqueued and their types are > computed by calling the Value() methods. If a use's type is narrowed, > it's recorded at c in the type' table. Value() methods retrieve types > from the type table, not the type' table. To address that issue while > leaving Value() methods unchanged, before calling Value() at c, the > type table is updated so: > > > type[n] = type'[n, c] > > > An exception is for Phi::Value which needs to retrieve the type of > nodes are various controls: there, a new type(Node* n, Node* c) > method is used. > > For most n and c, type'[n, c] is likely the same as type[n], the type > recorded in the global igvn table (that is there shouldn't be many > nodes at only a few control for which we can narrow the type down). As > a consequence, the types'[n, c] table is implemented with: > > - At c, narrowed down types are stored in a GrowableArray. Each entry > records the previous type at idom(c) and the narrowed down type at > c. > > - The GrowableArray of type updates is recorded in a hash table > indexed by c. If there's no update at c, there's no entry in the > hash table. > > This pass operates in 2 steps: > > - it first iterates over the graph looking for conditions that narrow > the types of some nodes and propagate type updates to uses until a > fix point. > > - it transforms the graph so newly found constant nodes are folded. 
> > > The new pass is run on every loop opts. There are a couple rea... Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: more ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14586/files - new: https://git.openjdk.org/jdk/pull/14586/files/b1396f1e..22091449 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14586&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14586&range=01-02 Stats: 59 lines in 15 files changed: 1 ins; 33 del; 25 mod Patch: https://git.openjdk.org/jdk/pull/14586.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14586/head:pull/14586 PR: https://git.openjdk.org/jdk/pull/14586 From roland at openjdk.org Fri Jun 6 14:18:56 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 6 Jun 2025 14:18:56 GMT Subject: RFR: 8275202: C2: optimize out more redundant conditions [v2] In-Reply-To: References: <978cgwy3Nb_x7yU6jZz0f6zhTBZfphstisAkBf1Vktc=.283d06eb-4f79-40cf-b8dd-a9c230e59902@github.com> Message-ID: On Fri, 6 Jun 2025 13:51:57 GMT, Roland Westrelin wrote: >> This change adds a new loop opts pass to optimize redundant conditions >> such as the second one in: >> >> >> if (i < 10) { >> if (i < 42) { >> >> >> In the branch of the first if, the type of i can be narrowed down to >> [min_jint, 9] which can then be used to constant fold the second >> condition. >> >> The compiler already keeps track of type[n] for every node in the >> current compilation unit. That's not sufficient to optimize the >> snippet above though because the type of i can only be narrowed in >> some sections of the control flow (that is a subset of all >> controls). 
The solution is to build a new table that tracks the type >> of n at every control c >> >> >> type'[n, root] = type[n] // initialized from igvn's type table >> type'[n, c] = type[n, idom(c)] >> >> >> This pass iterates over the CFG looking for conditions such as: >> >> >> if (i < 10) { >> >> >> that allows narrowing the type of i and updates the type' table >> accordingly. >> >> At a region r: >> >> >> type'[n, r] = meet(type'[n, r->in(1)], type'[n, r->in(2)]...) >> >> >> For a Phi phi at a region r: >> >> >> type'[phi, r] = meet(type'[phi->in(1), r->in(1)], type'[phi->in(2), r->in(2)]...) >> >> >> Once a type is narrowed, uses are enqueued and their types are >> computed by calling the Value() methods. If a use's type is narrowed, >> it's recorded at c in the type' table. Value() methods retrieve types >> from the type table, not the type' table. To address that issue while >> leaving Value() methods unchanged, before calling Value() at c, the >> type table is updated so: >> >> >> type[n] = type'[n, c] >> >> >> An exception is for Phi::Value which needs to retrieve the type of >> nodes are various controls: there, a new type(Node* n, Node* c) >> method is used. >> >> For most n and c, type'[n, c] is likely the same as type[n], the type >> recorded in the global igvn table (that is there shouldn't be many >> nodes at only a few control for which we can narrow the type down). As >> a consequence, the types'[n, c] table is implemented with: >> >> - At c, narrowed down types are stored in a GrowableArray. Each entry >> records the previous type at idom(c) and the narrowed down type at >> c. >> >> - The GrowableArray of type updates is recorded in a hash table >> indexed by c. If there's no update at c, there's no entry in the >> hash table. >> >> This pass operates in 2 steps: >> >> - it first iterates over the graph looking for conditions that narrow >> the types of some nodes and propagate type updates to uses ... 
> 
> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits:
> 
>  - updated conditional propagation
>  - Merge branch 'master' into JDK-8275202
>  - conditional propagation

I finally updated this PR. The main pushback against the previous version of this change was excessive compile time. The updated change addresses this issue.

In the previous version of the change, the new optimization pass was run on every pass of loop optimizations. The reason for that was issues similar to the one explained in https://github.com/openjdk/jdk/pull/23468 . Running the new pass often was a way to mitigate the issue. Since the first version of this PR, I actually found code patterns where running the pass often was not even sufficient to prevent a crash. The change from 8349479 solves all those issues, and running the pass only once or a few times doesn't cause any problem. This helps compilation time quite a bit. According to my rough measurement (running `CompileTheWorld` on java.base and looking at times reported by `CITime`), the overhead of the new pass is now around 20% for `IdealLoop` (from 4.7s to 5.7s) and around 2.5% on total compilation time (1 extra second on around 40s of total compilation time).

I also refactored the code quite a bit, worked on integrating changes that are not specific to the new pass and live in other parts of the compiler (so there are a lot fewer unrelated changes now), and added more tests and comments.
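Editorial sketch, not part of the original mail: the sparse `type'[n, c]` table from the PR description (a per-control list of type updates kept in a hash table, with lookups falling back along the immediate-dominator chain to the global type) could look roughly like the following. Every name here (`IntRange`, `ConditionalTypes`, `type_at`, `narrow`, `always_less_than`) is hypothetical and chosen for illustration; this is not the code in the PR, it uses `std::vector` where the PR uses `GrowableArray`, and it leaves out the `meet` at regions and Phis.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-ins for C2 concepts; none of these are HotSpot names.
struct IntRange { int lo; int hi; };   // plays the role of an integer type [lo, hi]
using NodeId = int;                    // a data node such as `i`
using CtrlId = int;                    // a control node; 0 is the root

// One entry per narrowed node at a control: the type at idom(c) and the type at c.
struct TypeUpdate {
  NodeId node;
  IntRange prev;
  IntRange narrowed;
};

class ConditionalTypes {
 public:
  ConditionalTypes(std::vector<IntRange> global, std::vector<CtrlId> idom)
      : _global(std::move(global)), _idom(std::move(idom)) {}

  // type'[n, c]: the nearest update for n on the dominator chain of c,
  // or the global type[n] if no dominating control narrowed it.
  IntRange type_at(NodeId n, CtrlId c) const {
    for (;;) {
      auto it = _updates.find(c);
      if (it != _updates.end()) {
        for (const TypeUpdate& u : it->second) {
          if (u.node == n) return u.narrowed;
        }
      }
      if (c == kRoot) return _global[n];
      c = _idom[c];
    }
  }

  // Record that at control c (for example the taken branch of `i < 10`)
  // the value of n is known to lie within `bound`.
  void narrow(NodeId n, CtrlId c, IntRange bound) {
    assert(c != kRoot && "the root keeps the global type");
    IntRange prev = type_at(n, _idom[c]);
    IntRange cur = type_at(n, c);
    IntRange narrowed{std::max(cur.lo, bound.lo), std::min(cur.hi, bound.hi)};
    assert(narrowed.lo <= narrowed.hi && "an empty range would mean c is unreachable");
    std::vector<TypeUpdate>& at_c = _updates[c];
    for (TypeUpdate& u : at_c) {
      if (u.node == n) { u.narrowed = narrowed; return; }
    }
    at_c.push_back(TypeUpdate{n, prev, narrowed});
  }

 private:
  static constexpr CtrlId kRoot = 0;
  std::vector<IntRange> _global;   // type[n]: one entry per node
  std::vector<CtrlId> _idom;       // idom(c); the root maps to itself
  // Sparse table: controls with no update have no entry at all.
  std::unordered_map<CtrlId, std::vector<TypeUpdate>> _updates;
};

// A condition `n < k` constant-folds to true when the whole range is below k.
bool always_less_than(IntRange r, int k) { return r.hi < k; }
```

With the `if (i < 10) { if (i < 42) {` example from the description, narrowing `i` to `[min_jint, 9]` at the taken branch of the first test makes `always_less_than(type_at(i, inner), 42)` true for any control dominated by that branch, while controls outside it still see the global type.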
------------- PR Comment: https://git.openjdk.org/jdk/pull/14586#issuecomment-2949396772 From roland at openjdk.org Fri Jun 6 14:24:55 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 6 Jun 2025 14:24:55 GMT Subject: RFR: 8275202: C2: optimize out more redundant conditions [v3] In-Reply-To: <01eSe02XoUbSWslVQxaHiumE8gZhXm2jTetkHQmB91c=.2a2ec5dd-943c-4e43-a02d-9800eccd790b@github.com> References: <978cgwy3Nb_x7yU6jZz0f6zhTBZfphstisAkBf1Vktc=.283d06eb-4f79-40cf-b8dd-a9c230e59902@github.com> <01eSe02XoUbSWslVQxaHiumE8gZhXm2jTetkHQmB91c=.2a2ec5dd-943c-4e43-a02d-9800eccd790b@github.com> Message-ID: On Sat, 21 Dec 2024 16:12:49 GMT, Quan Anh Mai wrote: >> Yes, I'm still working on this. I'll update this PR soon, hopefully. I reworked it quite a bit. > > Some more observations: When removing an `IfNode`, not only for `LoadNode`, you will need to pin all nodes that `depends_only_on_test` at that point (e.g. `ConstraintCast`), since the node does not only depend on the immediate dominating test (if any) but on the whole sequence of control nodes leading to that position. This can be done by e.g upgrading a `RegularDependency` `ConstraintCast` to one with `StrongDependency`. However, this action can lead to the node not able to float as freely. So I think before completing all loop opts, you should refrain from removing any `IfNode` of which the taken path has at least 1 node that `depends_only_on_test`. Pruning the untaken branch should still be done to simplify the graph. I think you have probably thought about this but just in case it can help. Thanks for the observations. The updated patch does pin a `LoadNode` when a range check it depends on is optimized out. It doesn't pin it at the control that immediately dominates the eliminated range check. Instead, it goes over the dominating controls until it finds the earliest control at which the type of the range check test constant folds and sets the `LoadNode` control to that control. 
So it pins it at the earliest possible control. The new constant propagation pass is also run after all loop opts by default, so nodes get pinned late in the compilation process.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/14586#discussion_r2132296301

From kvn at openjdk.org Fri Jun 6 14:44:50 2025
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Fri, 6 Jun 2025 14:44:50 GMT
Subject: RFR: 8356780: PhaseMacroExpand::_has_locks is unused [v2]
In-Reply-To:
References: <4Y_qCkNICY97KdColxyShQuBy9zdEVaZjJjkDtJD9do=.f09a465c-13d9-4e15-8b86-d99cac09b807@github.com>
Message-ID: <66E-LoLtJxrFSDo75VdNoYKRjsgZPTDPeTpXUEaXLk0=.cc98d1b5-ccf7-4dfe-b328-84971cc27828@github.com>

On Fri, 6 Jun 2025 09:37:30 GMT, Benoît Maillard wrote:

>> This PR introduces two cleanup changes to `PhaseMacroExpand`:
>> 
>> - Removes the unused field `PhaseMacroExpand::_has_locks`
>> - Merges two `while` loops in `PhaseMacroExpand::eliminate_macro_nodes` into a single loop
>> 
>> Previously, `eliminate_macro_nodes` used two separate `while` loops:
>> 
>> - The first loop removed lock nodes
>> - The second loop removed allocation nodes
>> 
>> Both loops had the same structure and independently traversed the same set of nodes. Since their operations do not interfere, the lock node removal logic was moved into the second loop as an additional case in the `switch` statement.
>> 
>> Thanks!
> 
> Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision:
> 
>   8356780: Update copyright

They are dependent. Allocation may be referenced in locks and will not be eliminated if locks are still in the graph.
That is why locks are eliminated first.

-------------

PR Review: https://git.openjdk.org/jdk/pull/25669#pullrequestreview-2905146374

From duke at openjdk.org Fri Jun 6 15:24:11 2025
From: duke at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard)
Date: Fri, 6 Jun 2025 15:24:11 GMT
Subject: RFR: 8356780: PhaseMacroExpand::_has_locks is unused [v3]
In-Reply-To: <4Y_qCkNICY97KdColxyShQuBy9zdEVaZjJjkDtJD9do=.f09a465c-13d9-4e15-8b86-d99cac09b807@github.com>
References: <4Y_qCkNICY97KdColxyShQuBy9zdEVaZjJjkDtJD9do=.f09a465c-13d9-4e15-8b86-d99cac09b807@github.com>
Message-ID:

> This PR introduces two cleanup changes to `PhaseMacroExpand`:
> 
> - Removes the unused field `PhaseMacroExpand::_has_locks`
> - Merges two `while` loops in `PhaseMacroExpand::eliminate_macro_nodes` into a single loop
> 
> Previously, `eliminate_macro_nodes` used two separate `while` loops:
> 
> - The first loop removed lock nodes
> - The second loop removed allocation nodes
> 
> Both loops had the same structure and independently traversed the same set of nodes. Since their operations do not interfere, the lock node removal logic was moved into the second loop as an additional case in the `switch` statement.
> 
> Thanks!

Benoît Maillard has updated the pull request incrementally with two additional commits since the last revision:

 - Revert "8356780: Merge both while loops in PhaseMacroExpand::eliminate_macro_nodes"
   
   This reverts commit 13deb61de3e2a07d51e3692bb408971f6c18cecf.
 - Revert "8356780: Remove useless assert"
   
   This reverts commit 4e7ee57981b9019d030ac59dc697a2470d8eb5eb.
------------- Changes: - all: https://git.openjdk.org/jdk/pull/25669/files - new: https://git.openjdk.org/jdk/pull/25669/files/90c94a25..0158481e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25669&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25669&range=01-02 Stats: 27 lines in 1 file changed: 20 ins; 5 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25669.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25669/head:pull/25669 PR: https://git.openjdk.org/jdk/pull/25669 From mdoerr at openjdk.org Fri Jun 6 15:32:50 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 6 Jun 2025 15:32:50 GMT Subject: RFR: 8342639: Global operator new in adlc has wrong exception spec In-Reply-To: <2-5wm21LtxEx0HMSd6gBbLv6tNLSBO4rGQ2W-uHfX6Q=.489a263e-fb34-4ca0-bc3f-5c74027367d6@github.com> References: <2-5wm21LtxEx0HMSd6gBbLv6tNLSBO4rGQ2W-uHfX6Q=.489a263e-fb34-4ca0-bc3f-5c74027367d6@github.com> Message-ID: On Fri, 6 Jun 2025 06:07:43 GMT, Kim Barrett wrote: > Please review this change to remove the definition of > `operator new(size_t, int, const char*, int) throw()` > > It doesn't seem to be needed anymore, if it ever really was. See discussion in > JBS for some more details. > > Testing: mach5 tier1-5, to cover various different build configurations. LGTM. Thanks for cleaning this up! ------------- Marked as reviewed by mdoerr (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25668#pullrequestreview-2905281270

From duke at openjdk.org Fri Jun 6 15:48:56 2025
From: duke at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard)
Date: Fri, 6 Jun 2025 15:48:56 GMT
Subject: RFR: 8356780: PhaseMacroExpand::_has_locks is unused [v3]
In-Reply-To:
References: <4Y_qCkNICY97KdColxyShQuBy9zdEVaZjJjkDtJD9do=.f09a465c-13d9-4e15-8b86-d99cac09b807@github.com>
Message-ID: <1V_0c8N2UAYXQegl_ztEQnFizlC3rnhg9weY6d2TAag=.ec918b2f-b250-4b98-b0e4-8a8c5b7de62c@github.com>

On Fri, 6 Jun 2025 15:24:11 GMT, Benoît Maillard wrote:

>> This PR introduces two cleanup changes to `PhaseMacroExpand`:
>> 
>> - Removes the unused field `PhaseMacroExpand::_has_locks`
>> - Merges two `while` loops in `PhaseMacroExpand::eliminate_macro_nodes` into a single loop
>> 
>> Previously, `eliminate_macro_nodes` used two separate `while` loops:
>> 
>> - The first loop removed lock nodes
>> - The second loop removed allocation nodes
>> 
>> Both loops had the same structure and independently traversed the same set of nodes. Since their operations do not interfere, the lock node removal logic was moved into the second loop as an additional case in the `switch` statement.
>> 
>> Thanks!
> 
> Benoît Maillard has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - Revert "8356780: Merge both while loops in PhaseMacroExpand::eliminate_macro_nodes"
>    
>    This reverts commit 13deb61de3e2a07d51e3692bb408971f6c18cecf.
>  - Revert "8356780: Remove useless assert"
>    
>    This reverts commit 4e7ee57981b9019d030ac59dc697a2470d8eb5eb.
I have reverted the changes related to the fusing of the two loops, and have created a new issue to investigate this separately: [JDK-8358788](https://bugs.openjdk.org/browse/JDK-8358788)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25669#issuecomment-2949688261

From mchevalier at openjdk.org Fri Jun 6 16:56:52 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Fri, 6 Jun 2025 16:56:52 GMT
Subject: RFR: 8356780: PhaseMacroExpand::_has_locks is unused [v3]
In-Reply-To:
References: <4Y_qCkNICY97KdColxyShQuBy9zdEVaZjJjkDtJD9do=.f09a465c-13d9-4e15-8b86-d99cac09b807@github.com>
Message-ID:

On Fri, 6 Jun 2025 15:24:11 GMT, Benoît Maillard wrote:

>> This PR removes the unused field `PhaseMacroExpand::_has_locks`
>> 
>> Thanks!
> 
> Benoît Maillard has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - Revert "8356780: Merge both while loops in PhaseMacroExpand::eliminate_macro_nodes"
>    
>    This reverts commit 13deb61de3e2a07d51e3692bb408971f6c18cecf.
>  - Revert "8356780: Remove useless assert"
>    
>    This reverts commit 4e7ee57981b9019d030ac59dc697a2470d8eb5eb.

Seems safe and it does what it says. Interestingly, the cases are now here only for an assert. I guess that's still good to have and it's pretty harmless: in product, the assert won't exist and it will just be an immediate break.

-------------

Marked as reviewed by mchevalier (Committer).

PR Review: https://git.openjdk.org/jdk/pull/25669#pullrequestreview-2905502236

From sviswanathan at openjdk.org Fri Jun 6 16:57:52 2025
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Fri, 6 Jun 2025 16:57:52 GMT
Subject: RFR: 8357982: Fix several failing BMI tests with -XX:+UseAPX [v8]
In-Reply-To:
References:
Message-ID:

On Fri, 6 Jun 2025 07:46:32 GMT, Jatin Bhateja wrote:

>> A) Patch extends the following tests with hard-coded encoding checks for various BMI instructions to cover REX2 or extended EVEX encodings supported by APX.
>> 
>> 
>> compiler/intrinsics/bmi/verifycode/AndnTestI.java
>> compiler/intrinsics/bmi/verifycode/AndnTestL.java
>> compiler/intrinsics/bmi/verifycode/BzhiTestI2L.java
>> compiler/intrinsics/bmi/verifycode/LZcntTestL.java
>> compiler/intrinsics/bmi/verifycode/TZcntTestL.java
>> 
>> 
>> B) After integration of JDK-8349582, which added APX NDD support, AndN instruction selection patterns that expect (Xor SRC, -1) as one of its operands were not getting selected because of a lower-cost generic immediate pattern match; patch fixes this issue through strict predicate checks.
>> 
>> Above tests are now passing, validations were carried out using Intel Software Development emulator.
>> 
>> Kindly review and share your feedback.
>> 
>> Best Regards,
>> Jatin
> 
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Review comments resoltions

Looks good to me.

-------------

Marked as reviewed by sviswanathan (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/25501#pullrequestreview-2905507817

From kvn at openjdk.org Fri Jun 6 17:54:53 2025
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Fri, 6 Jun 2025 17:54:53 GMT
Subject: RFR: 8356780: PhaseMacroExpand::_has_locks is unused [v3]
In-Reply-To:
References: <4Y_qCkNICY97KdColxyShQuBy9zdEVaZjJjkDtJD9do=.f09a465c-13d9-4e15-8b86-d99cac09b807@github.com>
Message-ID:

On Fri, 6 Jun 2025 15:24:11 GMT, Benoît Maillard wrote:

>> This PR removes the unused field `PhaseMacroExpand::_has_locks`
>> 
>> Thanks!
> 
> Benoît Maillard has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - Revert "8356780: Merge both while loops in PhaseMacroExpand::eliminate_macro_nodes"
>    
>    This reverts commit 13deb61de3e2a07d51e3692bb408971f6c18cecf.
>  - Revert "8356780: Remove useless assert"
>    
>    This reverts commit 4e7ee57981b9019d030ac59dc697a2470d8eb5eb.

Good.

-------------

Marked as reviewed by kvn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/25669#pullrequestreview-2905635855

From chagedorn at openjdk.org Fri Jun 6 18:02:51 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Fri, 6 Jun 2025 18:02:51 GMT
Subject: RFR: 8356780: PhaseMacroExpand::_has_locks is unused [v3]
In-Reply-To:
References: <4Y_qCkNICY97KdColxyShQuBy9zdEVaZjJjkDtJD9do=.f09a465c-13d9-4e15-8b86-d99cac09b807@github.com>
Message-ID:

On Fri, 6 Jun 2025 15:24:11 GMT, Benoît Maillard wrote:

>> This PR removes the unused field `PhaseMacroExpand::_has_locks`
>> 
>> Thanks!
> 
> Benoît Maillard has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - Revert "8356780: Merge both while loops in PhaseMacroExpand::eliminate_macro_nodes"
>    
>    This reverts commit 13deb61de3e2a07d51e3692bb408971f6c18cecf.
>  - Revert "8356780: Remove useless assert"
>    
>    This reverts commit 4e7ee57981b9019d030ac59dc697a2470d8eb5eb.

Marked as reviewed by chagedorn (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/25669#pullrequestreview-2905650667

From sviswanathan at openjdk.org Fri Jun 6 18:31:50 2025
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Fri, 6 Jun 2025 18:31:50 GMT
Subject: RFR: 8357982: Fix several failing BMI tests with -XX:+UseAPX [v2]
In-Reply-To:
References: <8mE0O0QjyMJMK7UWtfMiFc5ZjIxFYqVNUeu0qYbzaz8=.75e13abf-a2c9-407b-898d-1174a85a06cf@github.com>
Message-ID:

On Wed, 4 Jun 2025 06:35:50 GMT, Emanuel Peter wrote:

>>> @jatin-bhateja Thanks for looking into this!
>>> 
>>> `predicate(!UseAPX && n->in(2)->bottom_type()->is_int()->get_con() != -1);`
>>> 
>>> The PR title seems to suggest the bug is only about -XX:+UseAPX. Why are you changing things for the case !UseAPX?
>>> 
>>> Are these not cases like a ^ -1, which basically flips all bits. What alternative does this end up using now?
>>> 
>>> A code comment would be helpful.
>> >> We are tightening the predicate check so that under no circumstances we pick this pattern during the reduction phase of instruction selection on account of having lower cost. There is a generic pattern (xorI_rReg_imm) for all integral immediate values, and then there is a special pattern for Xor with -1 (fxorI_rReg_im1), which is needed for AndN inferencing. > > @jatin-bhateja I'll wait with testing, until someone from Intel gives this the approval. Feel free to ping me for that once we are there :) @eme64 This PR is now ready for your testing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25501#issuecomment-2950110042 From vlivanov at openjdk.org Fri Jun 6 21:41:49 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 6 Jun 2025 21:41:49 GMT Subject: RFR: 8358749: Fix input checks in Vector API intrinsics In-Reply-To: References: Message-ID: On Fri, 6 Jun 2025 14:09:11 GMT, Aleksey Shipilev wrote: > We have been carrying this patch in Leyden/premain for a while: https://github.com/openjdk/leyden/commit/7faed7fc5c8e1bbd9a16ab22673a77099396179c. I believe it deserves to be in mainline. I polished it a little further. > > It is _mostly_ a cleanup, but there are also new checks, on the paths where we do take constants off the arguments. In those cases, I believe the alternative is compiler SEGV-ing. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `hotspot_vector_1 hotspot_vector_2` > - [x] Linux x86_64 server fastdebug, `jdk_vector` Looks good. Thanks for taking care of upstreaming it. ------------- Marked as reviewed by vlivanov (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25673#pullrequestreview-2906212581 From sviswanathan at openjdk.org Fri Jun 6 22:49:51 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 6 Jun 2025 22:49:51 GMT Subject: RFR: 8358749: Fix input checks in Vector API intrinsics In-Reply-To: References: Message-ID: On Fri, 6 Jun 2025 14:09:11 GMT, Aleksey Shipilev wrote: > We have been carrying this patch in Leyden/premain for a while: https://github.com/openjdk/leyden/commit/7faed7fc5c8e1bbd9a16ab22673a77099396179c. I believe it deserves to be in mainline. I polished it a little further. > > It is _mostly_ a cleanup, but there are also new checks, on the paths where we do take constants off the arguments. In those cases, I believe the alternative is compiler SEGV-ing. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `hotspot_vector_1 hotspot_vector_2` > - [x] Linux x86_64 server fastdebug, `jdk_vector` Looks good to me as well. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25673#pullrequestreview-2906334077 From duke at openjdk.org Fri Jun 6 23:14:50 2025 From: duke at openjdk.org (ExE Boss) Date: Fri, 6 Jun 2025 23:14:50 GMT Subject: RFR: 8358749: Fix input checks in Vector API intrinsics In-Reply-To: References: Message-ID: On Fri, 6 Jun 2025 14:09:11 GMT, Aleksey Shipilev wrote: > We have been carrying this patch in Leyden/premain for a while: https://github.com/openjdk/leyden/commit/7faed7fc5c8e1bbd9a16ab22673a77099396179c. I believe it deserves to be in mainline. I polished it a little further. > > It is _mostly_ a cleanup, but there are also new checks, on the paths where we do take constants off the arguments. In those cases, I believe the alternative is compiler SEGV-ing. 
> > Additional testing: > - [x] Linux x86_64 server fastdebug, `hotspot_vector_1 hotspot_vector_2` > - [x] Linux x86_64 server fastdebug, `jdk_vector` Also note that the implementation of `Utils.isNonCapturingLambda(...)` is wrong when the `jdk.internal.lambda.disableEagerInitialization` system property is set to `"true"`, as that causes lambda classes to have one `static final` field: https://github.com/openjdk/jdk/blob/d7352559195b9e052c3eb24d773c0d6c10dc23ad/src/java.base/share/classes/jdk/internal/vm/vector/Utils.java#L36-L38 https://github.com/openjdk/jdk/blob/d7352559195b9e052c3eb24d773c0d6c10dc23ad/src/java.base/share/classes/java/lang/invoke/InnerClassLambdaMetafactory.java#L365-L372 ------------- PR Comment: https://git.openjdk.org/jdk/pull/25673#issuecomment-2951174116 From kbarrett at openjdk.org Sat Jun 7 20:36:53 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Sat, 7 Jun 2025 20:36:53 GMT Subject: RFR: 8342639: Global operator new in adlc has wrong exception spec In-Reply-To: References: <2-5wm21LtxEx0HMSd6gBbLv6tNLSBO4rGQ2W-uHfX6Q=.489a263e-fb34-4ca0-bc3f-5c74027367d6@github.com> Message-ID: On Fri, 6 Jun 2025 14:09:58 GMT, Vladimir Kozlov wrote: >> Please review this change to remove the definition of >> `operator new(size_t, int, const char*, int) throw()` >> >> It doesn't seem to be needed anymore, if it ever really was. See discussion in >> JBS for some more details. >> >> Testing: mach5 tier1-5, to cover various different build configurations. > > @kimbarrett can you enable GHA testing for this branch? It will test more builds.
Thanks for reviews @vnkozlov and @TheRealMDoerr ------------- PR Comment: https://git.openjdk.org/jdk/pull/25668#issuecomment-2952987892 From kbarrett at openjdk.org Sat Jun 7 20:36:54 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Sat, 7 Jun 2025 20:36:54 GMT Subject: Integrated: 8342639: Global operator new in adlc has wrong exception spec In-Reply-To: <2-5wm21LtxEx0HMSd6gBbLv6tNLSBO4rGQ2W-uHfX6Q=.489a263e-fb34-4ca0-bc3f-5c74027367d6@github.com> References: <2-5wm21LtxEx0HMSd6gBbLv6tNLSBO4rGQ2W-uHfX6Q=.489a263e-fb34-4ca0-bc3f-5c74027367d6@github.com> Message-ID: On Fri, 6 Jun 2025 06:07:43 GMT, Kim Barrett wrote: > Please review this change to remove the definition of > `operator new(size_t, int, const char*, int) throw()` > > It doesn't seem to be needed anymore, if it ever really was. See discussion in > JBS for some more details. > > Testing: mach5 tier1-5, to cover various different build configurations. This pull request has now been integrated. Changeset: e94ad551 Author: Kim Barrett URL: https://git.openjdk.org/jdk/commit/e94ad551c6d31b91ec066f92f9bbdb956f54e887 Stats: 4 lines in 1 file changed: 0 ins; 4 del; 0 mod 8342639: Global operator new in adlc has wrong exception spec Reviewed-by: kvn, mdoerr ------------- PR: https://git.openjdk.org/jdk/pull/25668 From dskantz at openjdk.org Mon Jun 9 06:13:53 2025 From: dskantz at openjdk.org (Daniel Skantz) Date: Mon, 9 Jun 2025 06:13:53 GMT Subject: RFR: 8357822: C2: Multiple string optimization tests are no longer testing string concatenation optimizations [v2] In-Reply-To: References: <4GDLAMfeWjgfcGvn4sUSMT2jjG3vsebjcFeJqgHqPQw=.e7dfa9e7-4608-4304-ba00-0b254b6bf2b1@github.com> Message-ID: On Thu, 5 Jun 2025 08:51:11 GMT, Daniel Skantz wrote: >> This PR updates a few tests to reintroduce testing of string concatenation optimizations since a few bugs have recently been identified in this area. 
>> >> Selection criteria: performed a text search on the test suite and identified tests for string concatenations or string optimizations that are not currently compiled with `-XDstringConcat=inline` and are not using StringBuilders explicitly. >> >> Testing: T1-4. >> >> Extra testing: ran the tests manually with `-XX:+PrintOptimizeStringConcat` and verified that the tests are exercising string optimizations after the fix. > > Daniel Skantz has updated the pull request incrementally with two additional commits since the last revision: > > - revert change to TestStringIntrinsics.java > - Update test/hotspot/jtreg/compiler/c2/Test7046096.java > > Co-authored-by: Emanuel Peter Thanks for the reviews and suggestions! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25610#issuecomment-2954752738 From dskantz at openjdk.org Mon Jun 9 06:13:54 2025 From: dskantz at openjdk.org (Daniel Skantz) Date: Mon, 9 Jun 2025 06:13:54 GMT Subject: Integrated: 8357822: C2: Multiple string optimization tests are no longer testing string concatenation optimizations In-Reply-To: <4GDLAMfeWjgfcGvn4sUSMT2jjG3vsebjcFeJqgHqPQw=.e7dfa9e7-4608-4304-ba00-0b254b6bf2b1@github.com> References: <4GDLAMfeWjgfcGvn4sUSMT2jjG3vsebjcFeJqgHqPQw=.e7dfa9e7-4608-4304-ba00-0b254b6bf2b1@github.com> Message-ID: <7yYO9VwG5PBK65x_YDmStJUMgFCdcp9RT4N5ySXN3dw=.91b5288d-adb0-45fe-8b37-e0fe8cf03f72@github.com> On Tue, 3 Jun 2025 07:17:47 GMT, Daniel Skantz wrote: > This PR updates a few tests to reintroduce testing of string concatenation optimizations since a few bugs have recently been identified in this area. > > Selection criteria: performed a text search on the test suite and identified tests for string concatenations or string optimizations that are not currently compiled with `-XDstringConcat=inline` and are not using StringBuilders explicitly. > > Testing: T1-4. 
> > Extra testing: ran the tests manually with `-XX:+PrintOptimizeStringConcat` and verified that the tests are exercising string optimizations after the fix. This pull request has now been integrated. Changeset: 6c616c71 Author: Daniel Skantz URL: https://git.openjdk.org/jdk/commit/6c616c71ec9a8ee6e0203921deef20d09db39698 Stats: 86 lines in 6 files changed: 81 ins; 0 del; 5 mod 8357822: C2: Multiple string optimization tests are no longer testing string concatenation optimizations Reviewed-by: rcastanedalo, epeter ------------- PR: https://git.openjdk.org/jdk/pull/25610 From rcastanedalo at openjdk.org Mon Jun 9 06:26:02 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 9 Jun 2025 06:26:02 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v6] In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Tue, 27 May 2025 07:46:43 GMT, Roberto Casta?eda Lozano wrote: >> Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands (`MachTemp`), which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335). >> >> This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads. 
It introduces a platform-dependent predicate (`MachNode::is_late_expanded_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands: >> >> ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753) >> >> ZGC object loads are marked as late-expanded null-check candidates unconditionally on all ZGC-supported platforms except on aarch64, where only loads that do not require an initial `lea` instruction (due to [address legitimization](https://github.com/openjdk/jdk/blob/ddd07b107e814ec846579a66d4f2005b7db9bb2f/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L132-L144)) are marked as candidates. Fortunately, most aarch64 loads seen in practice use small offsets and can be marked as candidates. >> >> Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo23 benchmarks. This results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks. >> >> #### Testing >> - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode). > > Roberto Castañeda Lozano has updated the pull request incrementally with two additional commits since the last revision: > > - Include address mode test in 'legitimize_address' > - Excluded IR checks for testLoadVolatile on PPC64 Thanks everyone for reviewing and testing!
------------- PR Comment: https://git.openjdk.org/jdk/pull/25066#issuecomment-2954772006 From rcastanedalo at openjdk.org Mon Jun 9 06:26:03 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 9 Jun 2025 06:26:03 GMT Subject: Integrated: 8345067: C2: enable implicit null checks for ZGC reads In-Reply-To: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Tue, 6 May 2025 13:28:28 GMT, Roberto Casta?eda Lozano wrote: > Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands (`MachTemp`), which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335). > > This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads. It introduces a platform-dependent predicate (`MachNode::is_late_expanded_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands: > > ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753) > > ZGC object loads are marked as late-expanded null-check candidates unconditionally on all ZGC-supported platforms except on aarch64, where only loads that do not require an initial `lea` instruction (due to [address legitimization](https://github.com/openjdk/jdk/blob/ddd07b107e814ec846579a66d4f2005b7db9bb2f/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L132-L144)) are marked as candidates. 
Fortunately, most aarch64 loads seen in practice use small offsets and can be marked as candidates. > > Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo23 benchmarks. This results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks. > > #### Testing > - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode). This pull request has now been integrated. Changeset: 91f12600 Author: Roberto Casta?eda Lozano URL: https://git.openjdk.org/jdk/commit/91f12600d2b188ca98c5c575a34b85f5835399a0 Stats: 401 lines in 16 files changed: 347 ins; 38 del; 16 mod 8345067: C2: enable implicit null checks for ZGC reads Reviewed-by: aboldtch, kvn, epeter ------------- PR: https://git.openjdk.org/jdk/pull/25066 From rcastanedalo at openjdk.org Mon Jun 9 06:58:52 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 9 Jun 2025 06:58:52 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v8] In-Reply-To: References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> <1gdeBnZ7YuIf9CgQW2bCXkDDBWPjUgRnickHts-fvzE=.e6e901ba-3e9f-41a2-9c68-167a879e9655@github.com> <2m1_XtiSsW_LaBRrkX4qv7AKtLOjNgnl4mUp3zisasE=.dda62164-7aa0-4c1a-b83f-fa40ba7902e5@github.com> Message-ID: <4374L3lkQK90wLxxOA7POBmIKNX2DFK-4pO4vj1bkuQ=.5b8d7825-a7f1-497f-ab66-02a85a266659@github.com> On Fri, 6 Jun 2025 09:02:53 GMT, Roland Westrelin wrote: > Thanks for the report. Should be fixed in new commit. Thanks, the new commit (3b5b54a3) passes Oracle's tier1-tier5 testing. 
> There are rare cases where C2 can't go from the `Allocate` to the `Initialize` because the IR graph doesn't match the expected pattern (some transformation introduced a region in between). In that case, `NarrowMemProj`s are not removed. That seems reasonable as there's no requirement to remove them and guaranteeing they are always removed would require a bit more code AFAICT. I think it would be good (although not necessarily in the context of this PR) to establish the "no duplicate memory projection" invariant in the back-end, for sanity and to make sure we do not break any logic that might be implicitly relying on it. If you agree, could you file a follow-up RFE, ideally with a reproducer where the current logic fails to remove `NarrowMemProj`s? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24570#issuecomment-2954854021 From rcastanedalo at openjdk.org Mon Jun 9 07:06:54 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 9 Jun 2025 07:06:54 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v11] In-Reply-To: References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> Message-ID: On Fri, 6 Jun 2025 09:01:46 GMT, Roland Westrelin wrote: >> An `Initialize` node for an `Allocate` node is created with a memory >> `Proj` of adr type raw memory. In order for stores to be captured, the >> memory state out of the allocation is a `MergeMem` with slices for the >> various object fields/array element set to the raw memory `Proj` of >> the `Initialize` node. If `Phi`s need to be created during later >> transformations from this memory state, The `Phi` for a particular >> slice gets its adr type from the type of the `Proj` which is raw >> memory. 
If during macro expansion, the `Allocate` is found to have no >> use and so can be removed, the `Proj` out of the `Initialize` is >> replaced by the memory state on input to the `Allocate`. A `Phi` for >> some slice for a field of an object will end up with the raw memory >> state on input to the `Allocate` node. As a result, memory state at >> the `Phi` is incorrect and incorrect execution can happen. >> >> The fix I propose is, rather than have a single `Proj` for the memory >> state out of the `Initialize` with adr type raw memory, to use one >> `Proj` per slice added to the memory state after the `Initialize`. Each >> of the `Proj`s should return the right adr type for its slice. For that >> I propose having a new type of `Proj`: `NarrowMemProj` that captures >> the right adr type. >> >> Logic for the construction of the `Allocate`/`Initialize` subgraph is >> tweaked so the right adr type captured in its own `NarrowMemProj` is >> added to the memory subgraph. Code that removes an allocation or moves >> it also has to be changed so it correctly takes the multiple memory >> projections out of the `Initialize` node into account. >> >> One tricky issue is that when EA splits types for a scalar replaceable >> `Allocate` node: >> >> 1- the adr type captured in the `NarrowMemProj` becomes out of sync >> with the type of the slices for the allocation >> >> 2- before EA, the memory state for one particular field out of the >> `Initialize` node can be used for a `Store` to the just allocated >> object or some other. So we can have a chain of `Store`s, some to >> the newly allocated object, some to some other objects, all of them >> using the state of `NarrowMemProj` out of the `Initialize`. After >> split unique types, the `NarrowMemProj` is for the slice of a >> particular allocation. So `Store`s to some other objects shouldn't >> use that memory state but the memory state before the `Allocate`.
>> >> For that, I added logic to update the adr type of `NarrowMemProj` >> during split uni... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > more Since the changeset may introduce a significant amount of C2 nodes (especially for objects with many fields), I also evaluated the effect of this changeset on C2 speed (using DaCapo 23.11-chopin on x64 and aarch64). The result is neutral, I guess because 1) allocation C2 patterns are relatively infrequent and 2) the C2 back-end is unaffected (except for the rare pattern you discuss above where redundant memory projections are not removed, but even then register allocation is unaffected). ------------- PR Comment: https://git.openjdk.org/jdk/pull/24570#issuecomment-2954877964 From rcastanedalo at openjdk.org Mon Jun 9 07:10:54 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 9 Jun 2025 07:10:54 GMT Subject: RFR: 8275202: C2: optimize out more redundant conditions [v3] In-Reply-To: References: <978cgwy3Nb_x7yU6jZz0f6zhTBZfphstisAkBf1Vktc=.283d06eb-4f79-40cf-b8dd-a9c230e59902@github.com> Message-ID: On Fri, 6 Jun 2025 14:18:43 GMT, Roland Westrelin wrote: >> This change adds a new loop opts pass to optimize redundant conditions >> such as the second one in: >> >> >> if (i < 10) { >> if (i < 42) { >> >> >> In the branch of the first if, the type of i can be narrowed down to >> [min_jint, 9] which can then be used to constant fold the second >> condition. >> >> The compiler already keeps track of type[n] for every node in the >> current compilation unit. That's not sufficient to optimize the >> snippet above though because the type of i can only be narrowed in >> some sections of the control flow (that is a subset of all >> controls). 
The solution is to build a new table that tracks the type >> of n at every control c >> >> >> type'[n, root] = type[n] // initialized from igvn's type table >> type'[n, c] = type[n, idom(c)] >> >> >> This pass iterates over the CFG looking for conditions such as: >> >> >> if (i < 10) { >> >> >> that allows narrowing the type of i and updates the type' table >> accordingly. >> >> At a region r: >> >> >> type'[n, r] = meet(type'[n, r->in(1)], type'[n, r->in(2)]...) >> >> >> For a Phi phi at a region r: >> >> >> type'[phi, r] = meet(type'[phi->in(1), r->in(1)], type'[phi->in(2), r->in(2)]...) >> >> >> Once a type is narrowed, uses are enqueued and their types are >> computed by calling the Value() methods. If a use's type is narrowed, >> it's recorded at c in the type' table. Value() methods retrieve types >> from the type table, not the type' table. To address that issue while >> leaving Value() methods unchanged, before calling Value() at c, the >> type table is updated so: >> >> >> type[n] = type'[n, c] >> >> >> An exception is for Phi::Value which needs to retrieve the type of >> nodes are various controls: there, a new type(Node* n, Node* c) >> method is used. >> >> For most n and c, type'[n, c] is likely the same as type[n], the type >> recorded in the global igvn table (that is there shouldn't be many >> nodes at only a few control for which we can narrow the type down). As >> a consequence, the types'[n, c] table is implemented with: >> >> - At c, narrowed down types are stored in a GrowableArray. Each entry >> records the previous type at idom(c) and the narrowed down type at >> c. >> >> - The GrowableArray of type updates is recorded in a hash table >> indexed by c. If there's no update at c, there's no entry in the >> hash table. >> >> This pass operates in 2 steps: >> >> - it first iterates over the graph looking for conditions that narrow >> the types of some nodes and propagate type updates to uses ... 
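The narrowing and meet rules quoted above can be illustrated with a toy interval model. This is only a sketch under stated assumptions: `IntRange`, `narrowLessThan`, `meet` and `alwaysLessThan` are hypothetical names chosen for illustration and are not part of HotSpot or of this PR; the model tracks a single `int` value rather than a per-control table of C2 types.

```java
// Toy model of per-control type narrowing; hypothetical names, not HotSpot code.
final class IntRange {
    final int lo;
    final int hi;

    IntRange(int lo, int hi) {
        if (lo > hi) throw new IllegalArgumentException("empty range");
        this.lo = lo;
        this.hi = hi;
    }

    /** Range of i on the true branch of "i < bound" (bound must exceed lo). */
    IntRange narrowLessThan(int bound) {
        if (bound <= lo) throw new IllegalArgumentException("branch is unreachable");
        return new IntRange(lo, Math.min(hi, bound - 1));
    }

    /** Meet at a region: the smallest range that covers both incoming ranges. */
    IntRange meet(IntRange other) {
        return new IntRange(Math.min(lo, other.lo), Math.max(hi, other.hi));
    }

    /** True when "i < bound" holds for every value in the range, so the test folds. */
    boolean alwaysLessThan(int bound) {
        return hi < bound;
    }

    public static void main(String[] args) {
        IntRange atRoot = new IntRange(Integer.MIN_VALUE, Integer.MAX_VALUE);
        // Inside "if (i < 10) {", i is narrowed to [min_jint, 9].
        IntRange inBranch = atRoot.narrowLessThan(10);
        System.out.println(inBranch.hi);
        // The inner "if (i < 42)" is redundant inside the branch...
        System.out.println(inBranch.alwaysLessThan(42));
        // ...but not outside it, where nothing is known about i.
        System.out.println(atRoot.alwaysLessThan(42));
        // Merging the narrowed path with an un-narrowed path loses the information.
        System.out.println(inBranch.meet(atRoot).alwaysLessThan(42));
    }
}
```

The sketch only shows why the narrowed type has to be attached to a control rather than to the node globally: the same `i` has a different range inside and outside the first branch.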
> > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > more src/hotspot/share/opto/loopConditionalPropagation.cpp line 1416: > 1414: if (use->in(j) == node && !r->in(j)->is_top() && _conditional_propagation.is_dominator(c, r->in(j)) && > 1415: is_safe_for_replacement_at_phi(node, use, r, j)) { > 1416: if (con == NULL) { Suggestion: if (con == nullptr) { ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14586#discussion_r2135168548 From kwei at openjdk.org Mon Jun 9 07:18:53 2025 From: kwei at openjdk.org (Kuai Wei) Date: Mon, 9 Jun 2025 07:18:53 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v4] In-Reply-To: References: <96Ny_BPjRCbNlD14DNDUOuQ0IX-F8hx21gxQKVfim9M=.d502019a-27ed-4a35-81ef-bc2aec5e7557@github.com> <_IhK2U23lIUOtBKOt-WMxQ3L7b2t26RzclJRdqbIgms=.3ef9a630-f99c-4de7-994a-bcabf912230b@github.com> <9ABhENoZtR76wmsgRmzeEceDvCvoflfCcbDbK8H2rso=.e351f63f-1331-4e2e-8a02-763a8c0c4f70@github.com> Message-ID: On Wed, 4 Jun 2025 12:44:10 GMT, Emanuel Peter wrote: >> Hi @eme64, I tried to use pattern matching for `MergePrimitiveLoads::has_no_merge_load_combine_below()`, but I think it has some difficulty. Mergeable operators can be linked in different ways, like: >> 1) (((item1 Or item2) Or item3) Or item4) >> 2) ((item1 Or item2) Or (item3 Or item4)) >> ... >> To check that the next `Or` operator is a valid last one in the combine operator chain, we may have to check all its inputs recursively. I didn't find a good way to resolve it. If you have a better idea, I will look into it. >> >> I think it's easier to mark the combine operator as checked. It works in this way: >> * If the combine operator being checked has a successor combine operator that has not been checked before, we do not optimize it, so that the next one has a chance to be optimized. >> * If we try to merge but fail, we mark it as `checked` and add its inputs to the GVN worklist.
So its input operators can be checked. >> >> I added comments to MergePrimitiveLoads::has_no_merge_load_combine_below() to describe the design. >> >> To reduce the memory size of `AddNode`, I removed the flag from `AddNode` and added 2 virtual functions: >> ```c++ >> // Check if this node has been checked by the merge_memops phase >> virtual bool is_merge_memops_checked() const { return false; } >> virtual void set_merge_memops_checked(bool v) { ShouldNotReachHere(); } >> ``` >> >> The flag, `_merge_memops_checked`, is only added to OrINode and OrLNode. >> >> Could you help to check the design and code? >> >> Thanks. > > @kuaiwei Thanks for your reply! > >> I think it's easier to mark the combine operator as checked. > > It may seem easier now. But over time, if multiple operations had such flags, things would become very messy. And now every node that can be such a `combine operator` has to have an additional flag, and consumes more memory. > >> I tried to use pattern matching for MergePrimitiveLoads::has_no_merge_load_combine_below(), but I think it has some difficulty. Mergeable operators can be linked in different ways, like: >> (((item1 Or item2) Or item3) Or item4) >> ((item1 Or item2) Or (item3 Or item4)) >> ... > > Yes, we may have to deal with inputs being permuted. But I think we should be able to deal with the permutations, we do that in other places too. > >> To check that the next Or operator is a valid last one in the combine operator chain, we may have to check all its inputs recursively. I didn't find a good way to resolve it. If you have a better idea, I will look into it. > > I'm not sure I understood what you said here. > >> we may have to check all its inputs recursively > > You probably mean we could check all outputs? > > So if you are looking at the `OrINode`, and the pattern above it is already a `MergeLoad` pattern, then we should also look down, and see if we find other `OrINode`.
For each of these output nodes, we should check if their other input could also be merged with what we already have. Do you not think this is possible? What exactly makes it difficult or impossible? @eme64 Based on your description, I could change it as shown below. Could you check whether I understand correctly? Thanks. When IGVN checks an input combine operator, call it `_checked`. We can go down the combine operator chain to find the last one, `_last`.

    for op from _last to _checked:    // _checked is not included
        collect merge_info_list by op
        if it can be merged and _checked is in the list:
            return    // it will be merged when IGVN optimizes this op
        if it cannot be merged or _checked is not in the list:
            continue
    // all successors of combine operators are checked, we can start to merge with _checked
    ...

I think this can work, but there is some redundant `collect and check` work. We could add a cache in IGVN to reduce it. Do you have any other suggestions? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24023#issuecomment-2954901892 From rcastanedalo at openjdk.org Mon Jun 9 07:25:59 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 9 Jun 2025 07:25:59 GMT Subject: RFR: 8275202: C2: optimize out more redundant conditions [v3] In-Reply-To: References: <978cgwy3Nb_x7yU6jZz0f6zhTBZfphstisAkBf1Vktc=.283d06eb-4f79-40cf-b8dd-a9c230e59902@github.com> Message-ID: On Fri, 6 Jun 2025 14:18:43 GMT, Roland Westrelin wrote: >> This change adds a new loop opts pass to optimize redundant conditions >> such as the second one in: >> >> >> if (i < 10) { >> if (i < 42) { >> >> >> In the branch of the first if, the type of i can be narrowed down to >> [min_jint, 9] which can then be used to constant fold the second >> condition. >> >> The compiler already keeps track of type[n] for every node in the >> current compilation unit.
That's not sufficient to optimize the >> snippet above though because the type of i can only be narrowed in >> some sections of the control flow (that is a subset of all >> controls). The solution is to build a new table that tracks the type >> of n at every control c >> >> >> type'[n, root] = type[n] // initialized from igvn's type table >> type'[n, c] = type[n, idom(c)] >> >> >> This pass iterates over the CFG looking for conditions such as: >> >> >> if (i < 10) { >> >> >> that allows narrowing the type of i and updates the type' table >> accordingly. >> >> At a region r: >> >> >> type'[n, r] = meet(type'[n, r->in(1)], type'[n, r->in(2)]...) >> >> >> For a Phi phi at a region r: >> >> >> type'[phi, r] = meet(type'[phi->in(1), r->in(1)], type'[phi->in(2), r->in(2)]...) >> >> >> Once a type is narrowed, uses are enqueued and their types are >> computed by calling the Value() methods. If a use's type is narrowed, >> it's recorded at c in the type' table. Value() methods retrieve types >> from the type table, not the type' table. To address that issue while >> leaving Value() methods unchanged, before calling Value() at c, the >> type table is updated so: >> >> >> type[n] = type'[n, c] >> >> >> An exception is for Phi::Value which needs to retrieve the type of >> nodes are various controls: there, a new type(Node* n, Node* c) >> method is used. >> >> For most n and c, type'[n, c] is likely the same as type[n], the type >> recorded in the global igvn table (that is there shouldn't be many >> nodes at only a few control for which we can narrow the type down). As >> a consequence, the types'[n, c] table is implemented with: >> >> - At c, narrowed down types are stored in a GrowableArray. Each entry >> records the previous type at idom(c) and the narrowed down type at >> c. >> >> - The GrowableArray of type updates is recorded in a hash table >> indexed by c. If there's no update at c, there's no entry in the >> hash table. 
>> >> This pass operates in 2 steps: >> >> - it first iterates over the graph looking for conditions that narrow >> the types of some nodes and propagate type updates to uses ... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > more > According to my rough measurement (...), the overhead of the new pass is now (...) around 2.5% on total compilation time Thanks for working on this. This matches the results I see on DaCapo 23.11-chopin on x64 (mean 3% slowdown). The results seem a bit worse on aarch64 (Apple M1), there I see a mean slowdown of around 5% (from 3% for `jython` up to 10% for `eclipse`). Still a great improvement compared to the initial version! ------------- PR Comment: https://git.openjdk.org/jdk/pull/14586#issuecomment-2954919241 From rcastanedalo at openjdk.org Mon Jun 9 07:37:54 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 9 Jun 2025 07:37:54 GMT Subject: RFR: 8275202: C2: optimize out more redundant conditions [v3] In-Reply-To: References: <978cgwy3Nb_x7yU6jZz0f6zhTBZfphstisAkBf1Vktc=.283d06eb-4f79-40cf-b8dd-a9c230e59902@github.com> Message-ID: On Fri, 6 Jun 2025 14:18:43 GMT, Roland Westrelin wrote: >> This change adds a new loop opts pass to optimize redundant conditions >> such as the second one in: >> >> >> if (i < 10) { >> if (i < 42) { >> >> >> In the branch of the first if, the type of i can be narrowed down to >> [min_jint, 9] which can then be used to constant fold the second >> condition. >> >> The compiler already keeps track of type[n] for every node in the >> current compilation unit. That's not sufficient to optimize the >> snippet above though because the type of i can only be narrowed in >> some sections of the control flow (that is a subset of all >> controls). 
The solution is to build a new table that tracks the type >> of n at every control c >> >> >> type'[n, root] = type[n] // initialized from igvn's type table >> type'[n, c] = type[n, idom(c)] >> >> >> This pass iterates over the CFG looking for conditions such as: >> >> >> if (i < 10) { >> >> >> that allows narrowing the type of i and updates the type' table >> accordingly. >> >> At a region r: >> >> >> type'[n, r] = meet(type'[n, r->in(1)], type'[n, r->in(2)]...) >> >> >> For a Phi phi at a region r: >> >> >> type'[phi, r] = meet(type'[phi->in(1), r->in(1)], type'[phi->in(2), r->in(2)]...) >> >> >> Once a type is narrowed, uses are enqueued and their types are >> computed by calling the Value() methods. If a use's type is narrowed, >> it's recorded at c in the type' table. Value() methods retrieve types >> from the type table, not the type' table. To address that issue while >> leaving Value() methods unchanged, before calling Value() at c, the >> type table is updated so: >> >> >> type[n] = type'[n, c] >> >> >> An exception is for Phi::Value which needs to retrieve the type of >> nodes are various controls: there, a new type(Node* n, Node* c) >> method is used. >> >> For most n and c, type'[n, c] is likely the same as type[n], the type >> recorded in the global igvn table (that is there shouldn't be many >> nodes at only a few control for which we can narrow the type down). As >> a consequence, the types'[n, c] table is implemented with: >> >> - At c, narrowed down types are stored in a GrowableArray. Each entry >> records the previous type at idom(c) and the narrowed down type at >> c. >> >> - The GrowableArray of type updates is recorded in a hash table >> indexed by c. If there's no update at c, there's no entry in the >> hash table. >> >> This pass operates in 2 steps: >> >> - it first iterates over the graph looking for conditions that narrow >> the types of some nodes and propagate type updates to uses ... 
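As a rough illustration of the narrowing described in the quoted text, the sketch below models a type as a closed int interval. It shows how the taken branch of `if (i < 10)` narrows `i` so that `i < 42` folds, and how a region merges incoming types with a meet. `NarrowingSketch` and `Range` are made-up names for this example; the actual pass works on C2 `Type` objects and a per-control table, which this does not attempt to reproduce.

```java
// Illustrative only: NarrowingSketch and Range are hypothetical names, not HotSpot classes.
final class NarrowingSketch {
    /** Closed interval [lo, hi] of int values, standing in for a C2 integer type. */
    static final class Range {
        final int lo;
        final int hi;

        Range(int lo, int hi) {
            this.lo = lo;
            this.hi = hi;
        }

        /** Type of x on the taken branch of "if (x < bound)"; assumes bound > Integer.MIN_VALUE. */
        Range narrowLessThan(int bound) {
            return new Range(lo, Math.min(hi, bound - 1));
        }

        /** Merge at a region: the smallest interval that covers both incoming types. */
        Range meet(Range other) {
            return new Range(Math.min(lo, other.lo), Math.max(hi, other.hi));
        }

        /** True when "x < bound" holds for every value in the interval, so the test can fold. */
        boolean alwaysLessThan(int bound) {
            return hi < bound;
        }
    }
}
```

Starting from the full int range, `narrowLessThan(10)` yields [min_jint, 9], for which `alwaysLessThan(42)` is true; outside the branch the unnarrowed range gives false, which is why a single global type per node is not enough.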
> > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > more I tested this changeset applied on top of jdk-25+26 (Oracle CI tier1-5) and found the following issues (besides the trivial `NULL` occurrence reported above): - `assert(c->_idx >= _unique || _type_table->find_type_between(c, c, _phase->C->root()) != Type::TOP) failed: for If we don't follow dead projections` in multiple tests, e.g. `compiler/predicates/TestHoistedPredicateForNonRangeCheck.java` and `compiler/predicates/assertion/TestOpaqueInitializedAssertionPredicateNode.java`. - `assert(!failure) failed: Missed optimization opportunity in PhaseIterGVN` in multiple tests, e.g. `compiler/predicates/TestHoistedPredicateForNonRangeCheck.java` and `compiler/codegen/aes/TestCipherBlockChainingEncrypt.java`. - `There were one or multiple IR rule failures` in `compiler/loopconditionalpropagation/TestLoopConditionalPropagation.java`. Let me know if you need more details to reproduce these issues. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14586#issuecomment-2954944552 From duke at openjdk.org Mon Jun 9 09:58:53 2025 From: duke at openjdk.org (erifan) Date: Mon, 9 Jun 2025 09:58:53 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v8] In-Reply-To: References: Message-ID: <6FGYIrxqZhFZn9Ycw_VOw7gltjocfG451Ex_Qq9yjhM=.c8183515-a047-4160-be9b-98200040c19f@github.com> On Fri, 6 Jun 2025 10:38:11 GMT, erifan wrote: >> This patch optimizes the following patterns: >> For integer types: >> >> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) >> => (VectorMaskCmp src1 src2 ncond) >> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) >> => (VectorMaskCmp src1 src2 ncond) >> >> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the negative comparison of cond. 
>> >> For float and double types: >> >> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> >> cond can be eq or ne. >> >> Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option `-XX:UseSVE=2`: >> >> Benchmark Unit Before Score Error After Score Error Uplift >> testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 10266136.26 8955.008548 1.29 >> testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 1179760.772 448.031844 1.33 >> testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 2359520.803 896.305743 1.33 >> testCompareEQMaskNotInt ops/s 1787221.411 977.743935 2353952.519 960.069976 1.31 >> testCompareEQMaskNotLong ops/s 895297.1974 673.44808 1178449.02 323.804205 1.31 >> testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 4712761.965 2110.862053 1.41 >> testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 10251646.9 9486.699831 1.29 >> testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 2352855.205 1251.952546 1.39 >> testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 1177811.493 521.1229 1.37 >> testCompareGEMaskNotShort ops/s 3341860.309 1578.975338 4714008.434 1681.10365 1.41 >> testCompareGTMaskNotByte ops/s 7910823.674 2993.367032 10245063.58 9774.75138 1.29 >> testCompareGTMaskNotInt ops/s 1673393.928 3153.099431 2353654.521 1190.848583 1.4 >> testCompareGTMaskNotLong ops/s 849405.9159 2432.858159 1177952.041 359.96413 1.38 >> testCompareGTMaskNotShort ops/s 3339509.141 3339.976585 4711442.496 2673.364893 1.41 >> testCompareLEMaskNotByte ops/s 7911340.004 3114.69191 10231626.5 27134.20035 1.29 >> testCompareLEMaskNotInt ops/s 1675812.113 1340.969885 2353255.341 1452.4522 1.4 >> testCompareLEMaskNotLong ops/s 848862.8036 6564.841731 1177763.623 539.290106 1.38 >> testCompareLEMaskNotShort ops/s 3324951.54 2380.29473 4712116.251 
1544.559684 1.41 >> testCompareLTMaskNotByte ops/s 7910390.844 2630.861436 10239567.69 6487.441672 1.29 >> testCompareLTMaskNotInt ops/s 16721... > > erifan has updated the pull request incrementally with one additional commit since the last revision: > > Support negating unsigned comparison for BoolTest::mask > > Added a static method `negate_mask(mask btm)` into BoolTest class to > negate both signed and unsigned comparison. Hi @RealFYang , would you mind helping test this patch on a risc-v machine with rvv? This optimization also applies to rvv, but I don't have a risc-v test environment. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-2955291395 From mli at openjdk.org Mon Jun 9 16:36:28 2025 From: mli at openjdk.org (Hamlin Li) Date: Mon, 9 Jun 2025 16:36:28 GMT Subject: RFR: 8358892: RISC-V: jvm crash when running dacapo sunflow after JDK-8352504 Message-ID: <9TAfBWFOsPTUIJPaUX9xcPqVsXLI6lsQBvcomuVqcQI=.1eb3ec5a-59ac-44ce-a46b-b984dda4605e@github.com> Hi, Can you help to review this patch? Thanks! Currently, this issue is only reproducible with Dacapo sunflow. I tried to construct a simpler jtreg test to reproduce the issue, but can not find a way to do it till now, this task is tracked by https://bugs.openjdk.org/browse/JDK-8359045. So, currently I can only verify the code by reviewing it. Or maybe it's better to leave it until we find the test? 
------------- Commit messages: - initial commit Changes: https://git.openjdk.org/jdk/pull/25696/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25696&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8358892 Stats: 4 lines in 1 file changed: 0 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25696.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25696/head:pull/25696 PR: https://git.openjdk.org/jdk/pull/25696 From qamai at openjdk.org Mon Jun 9 17:29:36 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 9 Jun 2025 17:29:36 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v65] In-Reply-To: References: Message-ID: <0LL62A7X3DeYMqkv4M7vz-OD79VuVCmNGy0CU_aCbkM=.e5fb80a1-cd23-4ffb-9839-ceaccd8695db@github.com> > Hi, > > This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. > > In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. > > This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. > > Please kindly review, thanks a lot. > > Testing > > - [x] GHA > - [x] Linux x64, tier 1-4 Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 83 commits: - Merge branch 'master' into unsignedbounds - add more intn_t tests - Emanuel's reviews - alignment wording - refinement - refine the cases where there does not exist a result - add some more sanity static_asserts - intn_t refinements - Emanuel's reviews - Emanuel's reviews - ... and 73 more: https://git.openjdk.org/jdk/compare/cae1fd33...ad6ac7cb ------------- Changes: https://git.openjdk.org/jdk/pull/17508/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=64 Stats: 2570 lines in 13 files changed: 2007 ins; 331 del; 232 mod Patch: https://git.openjdk.org/jdk/pull/17508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17508/head:pull/17508 PR: https://git.openjdk.org/jdk/pull/17508 From qamai at openjdk.org Mon Jun 9 17:29:36 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 9 Jun 2025 17:29:36 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v64] In-Reply-To: References: Message-ID: On Mon, 12 May 2025 15:31:53 GMT, Vladimir Kozlov wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> add more intn_t tests > > This looks fine to me but to be on safe side lets push it into JDK 26 when it is forked. > And I don't see link in RFE to recent testing of this. It needs to be tested in all tiers including tier10, xcomp and stress. @vnkozlov I have merged this branch with master, can you run your tests and approve the changes, please? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17508#issuecomment-2956443956 From kvn at openjdk.org Mon Jun 9 18:11:16 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 9 Jun 2025 18:11:16 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v64] In-Reply-To: References: Message-ID: On Mon, 12 May 2025 15:31:53 GMT, Vladimir Kozlov wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> add more intn_t tests > > This looks fine to me but to be on safe side lets push it into JDK 26 when it is forked. > And I don't see link in RFE to recent testing of this. It needs to be tested in all tiers including tier10, xcomp and stress. > @vnkozlov I have merged this branch with master, can you run your tests and approve the changes, please? I submitted testing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17508#issuecomment-2956551222 From fyang at openjdk.org Mon Jun 9 22:23:27 2025 From: fyang at openjdk.org (Fei Yang) Date: Mon, 9 Jun 2025 22:23:27 GMT Subject: RFR: 8358892: RISC-V: jvm crash when running dacapo sunflow after JDK-8352504 In-Reply-To: <9TAfBWFOsPTUIJPaUX9xcPqVsXLI6lsQBvcomuVqcQI=.1eb3ec5a-59ac-44ce-a46b-b984dda4605e@github.com> References: <9TAfBWFOsPTUIJPaUX9xcPqVsXLI6lsQBvcomuVqcQI=.1eb3ec5a-59ac-44ce-a46b-b984dda4605e@github.com> Message-ID: On Mon, 9 Jun 2025 16:27:54 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > > Thanks! > > Currently, this issue is only reproducible with Dacapo sunflow. > I tried to construct a simpler jtreg test to reproduce the issue, but can not find a way to do it till now, this task is tracked by https://bugs.openjdk.org/browse/JDK-8359045. > > So, currently I can only verify the code by reviewing it. > Or maybe it's better to leave it until we find the test? 
Try this:

public class Test {
    public static int[] nl = { 0, 0, 0 };
    public static int[] nr = { 1, 1, 1 };
    public static float[] nodeMin = { 0.0f, 0.0f, 0.0f };
    public static float[] nodeMax = { 0.1f, 0.1f, 0.1f };

    // for case BoolTest::ge
    public static int test1(int axis, int pPlanar, float dl, float dr) {
        boolean planarLeft = !(dl <= dr);
        int numLeft = nl[axis] + (planarLeft ? pPlanar : 0);
        int numRight = nr[axis] + (planarLeft ? 0 : pPlanar);
        return numLeft + numRight;
    }

    // for case BoolTest::gt
    public static int test2(int axis, int pPlanar, float dl, float dr) {
        boolean planarLeft = !(dl < dr);
        int numLeft = nl[axis] + (planarLeft ? pPlanar : 0);
        int numRight = nr[axis] + (planarLeft ? 0 : pPlanar);
        return numLeft + numRight;
    }

    public static void main(String[] args) {
        int ret = 0;

        // test case BoolTest::ge
        for (int i = 0; i < 20000; i++) {
            if (i % 2 == 0) {
                ret = test1(i % 3, i % 10, nodeMin[i % 3], nodeMax[i % 3]);
            } else {
                ret = test1(i % 3, i % 10, nodeMax[i % 3], nodeMin[i % 3]);
            }
        }
        System.out.println("test1 passed. result = " + ret);

        // test case BoolTest::gt
        for (int i = 0; i < 20000; i++) {
            if (i % 2 == 0) {
                ret = test2(i % 3, i % 10, nodeMin[i % 3], nodeMax[i % 3]);
            } else {
                ret = test2(i % 3, i % 10, nodeMax[i % 3], nodeMin[i % 3]);
            }
        }
        System.out.println("test2 passed. result = " + ret);
    }
}

$ java -XX:-TieredCompilation Test

This reduced test can reproduce the crash and cover both `BoolTest::gt` and `BoolTest::ge` cases.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25696#issuecomment-2957167157

From cslucas at openjdk.org Tue Jun 10 00:03:51 2025
From: cslucas at openjdk.org (Cesar Soares Lucas)
Date: Tue, 10 Jun 2025 00:03:51 GMT
Subject: RFR: 8359064: Expose reason for marking nmethod non-entrant to JVMCI client
Message-ID:

We recently introduced a way to set the reason why a nmethod was being marked as `not entrant`, see [here](https://github.com/openjdk/jdk/pull/23980) and [here](https://github.com/openjdk/jdk/pull/25338).
This PR is to expose in the JVMCI interface the reason why the nmethod was flagged as `not entrant`. This will allow JVMCI-based compilers to implement heuristics to handle re-compilations differently based on what happened to earlier versions of a method, for instance, this will likely be used to address this [RFE in Truffle](https://github.com/oracle/graal/issues/11045 ). Tested on Linux x86_64, ARM with JTREG tier 1-3. ------------- Commit messages: - Set reason why InstalledCode was changed/installed/changed. Changes: https://git.openjdk.org/jdk/pull/25706/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25706&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8359064 Stats: 143 lines in 18 files changed: 121 ins; 0 del; 22 mod Patch: https://git.openjdk.org/jdk/pull/25706.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25706/head:pull/25706 PR: https://git.openjdk.org/jdk/pull/25706 From kvn at openjdk.org Tue Jun 10 00:45:52 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 10 Jun 2025 00:45:52 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v65] In-Reply-To: <0LL62A7X3DeYMqkv4M7vz-OD79VuVCmNGy0CU_aCbkM=.e5fb80a1-cd23-4ffb-9839-ceaccd8695db@github.com> References: <0LL62A7X3DeYMqkv4M7vz-OD79VuVCmNGy0CU_aCbkM=.e5fb80a1-cd23-4ffb-9839-ceaccd8695db@github.com> Message-ID: On Mon, 9 Jun 2025 17:29:36 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. >> >> In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. 
As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. >> >> This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. >> >> Please kindly review, thanks a lot. >> >> Testing >> >> - [x] GHA >> - [x] Linux x64, tier 1-4 > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 83 commits: > > - Merge branch 'master' into unsignedbounds > - add more intn_t tests > - Emanuel's reviews > - alignment wording > - refinement > - refine the cases where there does not exist a result > - add some more sanity static_asserts > - intn_t refinements > - Emanuel's reviews > - Emanuel's reviews > - ... and 73 more: https://git.openjdk.org/jdk/compare/cae1fd33...ad6ac7cb My testing passed. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17508#pullrequestreview-2911687054 From jbhateja at openjdk.org Tue Jun 10 02:23:52 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 10 Jun 2025 02:23:52 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v64] In-Reply-To: References: Message-ID: On Mon, 9 Jun 2025 17:25:47 GMT, Quan Anh Mai wrote: >> This looks fine to me but to be on safe side lets push it into JDK 26 when it is forked. >> And I don't see link in RFE to recent testing of this. It needs to be tested in all tiers including tier10, xcomp and stress. > > @vnkozlov I have merged this branch with master, can you run your tests and approve the changes, please? Hi @merykitty, TypeIntPrototype currently accepts any random unsigned and signed value ranges and then tries to canonicalize them. This is a flexible approach, but do you see any practical use case for this relaxation? 
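For reference, the constraint set from the PR description quoted above can be read as a plain membership test. The sketch below is only that reading, written out for 32-bit ints; `TypeIntSketch` and its field names are assumptions made for this example and are not the HotSpot `TypeInt` or `TypeIntPrototype` classes.

```java
// Illustrative only: TypeIntSketch is a hypothetical class, not the HotSpot TypeInt.
final class TypeIntSketch {
    final int lo, hi;       // signed bounds, inclusive
    final int ulo, uhi;     // unsigned bounds, inclusive (compared as unsigned ints)
    final int zeros, ones;  // bits known to be 0 / bits known to be 1

    TypeIntSketch(int lo, int hi, int ulo, int uhi, int zeros, int ones) {
        this.lo = lo;
        this.hi = hi;
        this.ulo = ulo;
        this.uhi = uhi;
        this.zeros = zeros;
        this.ones = ones;
    }

    /** x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones */
    boolean contains(int x) {
        return x >= lo && x <= hi
            && Integer.compareUnsigned(x, ulo) >= 0
            && Integer.compareUnsigned(x, uhi) <= 0
            && (x & zeros) == 0
            && (x & ones) == ones;
    }
}
```

With signed bounds [0, 3] and otherwise unconstrained fields (`ulo = 0`, `uhi = -1`, `zeros = 0`, `ones = 0`) the set is {0, 1, 2, 3}. The tightened form (`uhi = 3`, `zeros = ~3`) describes exactly the same set, which is the [0, 3] example given in the description.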
------------- PR Comment: https://git.openjdk.org/jdk/pull/17508#issuecomment-2957497599 From fyang at openjdk.org Tue Jun 10 02:43:34 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 10 Jun 2025 02:43:34 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v8] In-Reply-To: References: Message-ID: On Fri, 6 Jun 2025 10:38:11 GMT, erifan wrote: >> This patch optimizes the following patterns: >> For integer types: >> >> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) >> => (VectorMaskCmp src1 src2 ncond) >> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) >> => (VectorMaskCmp src1 src2 ncond) >> >> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the negative comparison of cond. >> >> For float and double types: >> >> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> >> cond can be eq or ne. 
>> >> Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option `-XX:UseSVE=2`: >> >> Benchmark Unit Before Score Error After Score Error Uplift >> testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 10266136.26 8955.008548 1.29 >> testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 1179760.772 448.031844 1.33 >> testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 2359520.803 896.305743 1.33 >> testCompareEQMaskNotInt ops/s 1787221.411 977.743935 2353952.519 960.069976 1.31 >> testCompareEQMaskNotLong ops/s 895297.1974 673.44808 1178449.02 323.804205 1.31 >> testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 4712761.965 2110.862053 1.41 >> testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 10251646.9 9486.699831 1.29 >> testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 2352855.205 1251.952546 1.39 >> testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 1177811.493 521.1229 1.37 >> testCompareGEMaskNotShort ops/s 3341860.309 1578.975338 4714008.434 1681.10365 1.41 >> testCompareGTMaskNotByte ops/s 7910823.674 2993.367032 10245063.58 9774.75138 1.29 >> testCompareGTMaskNotInt ops/s 1673393.928 3153.099431 2353654.521 1190.848583 1.4 >> testCompareGTMaskNotLong ops/s 849405.9159 2432.858159 1177952.041 359.96413 1.38 >> testCompareGTMaskNotShort ops/s 3339509.141 3339.976585 4711442.496 2673.364893 1.41 >> testCompareLEMaskNotByte ops/s 7911340.004 3114.69191 10231626.5 27134.20035 1.29 >> testCompareLEMaskNotInt ops/s 1675812.113 1340.969885 2353255.341 1452.4522 1.4 >> testCompareLEMaskNotLong ops/s 848862.8036 6564.841731 1177763.623 539.290106 1.38 >> testCompareLEMaskNotShort ops/s 3324951.54 2380.29473 4712116.251 1544.559684 1.41 >> testCompareLTMaskNotByte ops/s 7910390.844 2630.861436 10239567.69 6487.441672 1.29 >> testCompareLTMaskNotInt ops/s 16721... 
> > erifan has updated the pull request incrementally with one additional commit since the last revision: > > Support negating unsigned comparison for BoolTest::mask > > Added a static method `negate_mask(mask btm)` into BoolTest class to > negate both signed and unsigned comparison. FYI: I submitted to testing in QEMU-system / Ubuntu 25.04 (fastdebug jdk build and 256-bit RVV) and I see `compiler/vectorization`, `compiler/vectorapi` and `jdk_vector` tests are passing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-2957518418 From duke at openjdk.org Tue Jun 10 03:18:29 2025 From: duke at openjdk.org (erifan) Date: Tue, 10 Jun 2025 03:18:29 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v8] In-Reply-To: References: Message-ID: <45vF6n57b-4myKJ0sMUgkcMqeXfveOsRfx3oDhmAMvM=.4be42b00-297a-4b89-af87-49bf86369828@github.com> On Tue, 10 Jun 2025 02:38:29 GMT, Fei Yang wrote: > FYI: I submitted to testing in QEMU-system / Ubuntu 25.04 (fastdebug jdk build and 256-bit RVV) and I see `compiler/vectorization`, `compiler/vectorapi` and `jdk_vector` tests are passing. Thank you very much! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-2957560924 From jbhateja at openjdk.org Tue Jun 10 04:03:49 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 10 Jun 2025 04:03:49 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v64] In-Reply-To: References: Message-ID: On Mon, 9 Jun 2025 17:25:47 GMT, Quan Anh Mai wrote: >> This looks fine to me but to be on safe side lets push it into JDK 26 when it is forked. >> And I don't see link in RFE to recent testing of this. It needs to be tested in all tiers including tier10, xcomp and stress. > > @vnkozlov I have merged this branch with master, can you run your tests and approve the changes, please? @merykitty , your comments on the following understanding would be helpful. Q.
Is it OK to keep a bool flag in a value which signifies that the bounds hold unsigned values?

A. Java primitive types are inherently signed. The C2 compiler represents the integral types byte, short and int through TypeInt by simply constraining the value range, and long through TypeLong. For float the C2 type system creates Type::FLOAT, and for double Type::DOUBLE; unlike the integral types, these two types do not record the actual value range bounds. For floating point constants C2 creates a different type, i.e. TypeF for float constants and TypeD for double constants.

Currently, the scope of the unsigned type is limited to comparison, multiplication, and division operations. Since the signed and unsigned value ranges overlap, keeping a flag with _lo and _hi should suffice. The new scheme accepts bounds for both the signed and the unsigned value range and then finds the effective value range. This allows the user to feed arbitrary signed and unsigned value ranges to a TypeInt and let the compiler find the effective value range by canonicalization. A TypeInt is only useful after canonicalization. This mimics the job of a constructor: a newly allocated object is usable only after it has been pushed through the constructor, and likewise a TypeInt accepts different signed and unsigned bounds but is only usable after normalization, which computes the effective value range and leaves the signed bounds, unsigned bounds and known bits in sync.

During the dataflow analysis, flow functions associated with different operators may modify the value ranges (signed or unsigned), which triggers re-normalization. In other cases flow analysis may transfer only the known bits, which are then used to prune the value ranges. At any given point the signed bounds, unsigned bounds and known bits should be in sync, else the type is inconsistent and not usable; iterative canonicalization ensures this.
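To make the idea of an effective value range concrete, here is a brute-force sketch over 8-bit values: it enumerates every value that satisfies the given signed bounds, unsigned bounds and known bits, and reports the tightest constraints describing that set. This is purely an illustration of what "in sync" means; the class and method names are invented for this example, and the PR computes the result analytically rather than by enumeration.

```java
// Illustrative only: CanonicalizeSketch is a hypothetical helper, not HotSpot code.
final class CanonicalizeSketch {
    /**
     * Tightens constraints on an 8-bit value.
     * Signed bounds are in [-128, 127], unsigned bounds in [0, 255], masks in [0, 0xFF].
     * Returns {lo, hi, ulo, uhi, zeros, ones}, or null if no value satisfies the input.
     */
    static int[] canonicalize(int lo, int hi, int ulo, int uhi, int zeros, int ones) {
        int nlo = Integer.MAX_VALUE, nhi = Integer.MIN_VALUE;
        int nulo = Integer.MAX_VALUE, nuhi = Integer.MIN_VALUE;
        int nzeros = 0xFF, nones = 0xFF;
        boolean any = false;
        for (int u = 0; u <= 0xFF; u++) {
            int s = (byte) u; // signed view of the same 8 bits
            boolean member = s >= lo && s <= hi && u >= ulo && u <= uhi
                          && (u & zeros) == 0 && (u & ones) == ones;
            if (!member) {
                continue;
            }
            any = true;
            nlo = Math.min(nlo, s);
            nhi = Math.max(nhi, s);
            nulo = Math.min(nulo, u);
            nuhi = Math.max(nuhi, u);
            nzeros &= ~u; // a bit stays known-zero only if no member sets it
            nones &= u;   // a bit stays known-one only if every member sets it
        }
        return any ? new int[] {nlo, nhi, nulo, nuhi, nzeros, nones} : null;
    }
}
```

For example, signed bounds [0, 3] with everything else unconstrained tighten to unsigned bounds [0, 3] and known-zero mask 0xFC, while signed [0, 3] combined with unsigned [200, 255] has no members at all.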
Thus, to be flexible in implementation, keeping a separate value range for unsigned bounds is justified, but it may not add huge value, as all Java primitive types are inherently signed and mixing of signed and unsigned operations in type flow is not possible. The whole idea of keeping the implementation flexible with unsigned bounds is based on the assumption that during data flow any of the lattices associated with the integral types TypeInt or TypeLong, i.e. unsigned bounds, signed bounds or known bits, may change. In practice only known bits (bit-level data flow) and signed bounds may be usable, so a flag signifying that a bound is unsigned may suffice. Associating 3 lattice points with TypeInt is on account of flexibility: it may facilitate injecting opaque IR with unsigned bounds into the sea of nodes (SoN) by optimization passes, and then letting type canonicalization and iterative data flow analysis do the magic.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17508#issuecomment-2957609331

From amitkumar at openjdk.org Tue Jun 10 04:51:39 2025
From: amitkumar at openjdk.org (Amit Kumar)
Date: Tue, 10 Jun 2025 04:51:39 GMT
Subject: RFR: 8358694: VM asserts if CodeCacheSegmentSize is not a power of 2
Message-ID: <_SfCjE8oyyD-grL6AAih23j1Qx7fbYefTVzl-BU5N2k=.7bd5d419-4127-4e78-926e-fece25a7d914@github.com>

Makes sure that JVM exits gracefully when `CodeCacheSegmentSize` is not a power of 2.
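The patch itself is not shown in this message, so the following is only a sketch of the kind of check involved: validate the value up front and report an error instead of letting a later assertion fire. `SegmentSizeCheck`, `validate` and the message text are assumptions for this example; the real change is in HotSpot's C++ flag handling.

```java
// Illustrative only: SegmentSizeCheck and its message text are hypothetical.
final class SegmentSizeCheck {
    /** A positive value is a power of two exactly when it has a single bit set. */
    static boolean isPowerOfTwo(long value) {
        return value > 0 && (value & (value - 1)) == 0;
    }

    /** Returns an error message for an unacceptable size, or null if the size is fine. */
    static String validate(long segmentSize) {
        if (!isPowerOfTwo(segmentSize)) {
            return "CodeCacheSegmentSize (" + segmentSize + ") must be a power of 2";
        }
        return null;
    }
}
```

A caller would print the returned message and exit with a non-zero status rather than continue with a value that later code assumes is a power of 2.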
------------- Commit messages: - make jvm exit gracefully Changes: https://git.openjdk.org/jdk/pull/25708/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25708&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8358694 Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25708.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25708/head:pull/25708 PR: https://git.openjdk.org/jdk/pull/25708 From jkarthikeyan at openjdk.org Tue Jun 10 04:56:34 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Tue, 10 Jun 2025 04:56:34 GMT Subject: RFR: 8350177: C2 SuperWord: Integer.numberOfLeadingZeros, numberOfTrailingZeros, reverse and bitCount have input types wrongly turncated for byte and short [v2] In-Reply-To: References: Message-ID: On Wed, 4 Jun 2025 05:43:28 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/superword.cpp line 2519: >> >>> 2517: >>> 2518: // Default to disallowing vector truncation >>> 2519: return false; >> >> I was wondering: >> We could have an assert here that lists all operations that cannot be truncated? >> So if a new operation is added, then we will catch that it is not handled here yet, and we can add tests, and either allow it to truncate, or add it to the list of non-truncatable operations. > >> Earlier in the function, this logic is guarded with vtn->basic_type() == T_INT so I think only integer nodes need to be added to the list. Let me know what you think! > > That sounds reasonable. And that would mean we only have to add `int` operation to that assert. And if anybody ever relaxes that `vtn->basic_type() == T_INT` check, then they would immediately run into that assert. Would be nice. I think this is an interesting point. My concern is that this might cause unexpected assert failures to occur in rare cases, which could create a large bug tail. The behavior in the non-truncation case should not cause miscompiles so I'm not sure that a general assert is warranted. 
The one outlying case is that unexpected `Type` nodes can cause the later `vt = container_type(in);` line to produce an invalid result since it uses the bottom type, so I think an assert for that specifically could be useful. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25440#discussion_r2136906642 From jkarthikeyan at openjdk.org Tue Jun 10 05:03:29 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Tue, 10 Jun 2025 05:03:29 GMT Subject: RFR: 8350177: C2 SuperWord: Integer.numberOfLeadingZeros, numberOfTrailingZeros, reverse and bitCount have input types wrongly turncated for byte and short [v2] In-Reply-To: References: Message-ID: On Wed, 4 Jun 2025 05:41:30 GMT, Emanuel Peter wrote: >> Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: >> >> Reformat, add comments and char tests > > src/hotspot/share/opto/superword.cpp line 2579: > >> 2577: Node* load = in->in(1); >> 2578: // For certain operations such as shifts and abs(), use the size of the load if it exists >> 2579: if ((VectorNode::is_shift_opcode(op) || op == Op_AbsI) && load->is_Load() && > > Can you say a little more about this? What about `Op_ReverseBytesI`, did that not previously also get through here? In this case I believe abs/shifts were added because they needed to differentiate between signed and unsigned loads when dealing with truncation (abs was added in [JDK-8261022](https://bugs.openjdk.org/browse/JDK-8261022)) but the other branch that promotes to int is used by all the other nodes. This logic is pretty confusing because it's doing 2 separate things in the same code path. From my understanding, `Op_ReverseBytesI` requires the promotion to int behavior because the issue lies with truncation. Though, I think adding some more tests here would be a good idea so I'll look into that. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25440#discussion_r2136912746 From epeter at openjdk.org Tue Jun 10 05:22:28 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 10 Jun 2025 05:22:28 GMT Subject: RFR: 8350177: C2 SuperWord: Integer.numberOfLeadingZeros, numberOfTrailingZeros, reverse and bitCount have input types wrongly turncated for byte and short [v2] In-Reply-To: References: Message-ID: On Tue, 10 Jun 2025 04:54:15 GMT, Jasmine Karthikeyan wrote: > My concern is that this might cause unexpected assert failures to occur in rare cases, which could create a large bug tail. If it is only an assert, and you return the "conservative" answer in product, then it only affects debug. This means we don't have to be too worried about it when the assert fails. And it gives us a chance to look at that new node, and decide if truncation is ok or not. A good comment at the site of the assert would make fixing those assert bugs quite easy. It would be an assert that ensures we get better performance because we vectorize. I think this is easier than having to write IR tests for all possible nodes with all possible truncations, that would be the alternative I suppose. But with IR tests we might always miss an operation. 
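The pattern suggested here (conservative answer in product, assert in debug for anything unclassified) can be sketched as below. Java's `assert` is a reasonable stand-in because it only fires when assertions are enabled with `-ea`. The opcode names and the two lists are placeholders chosen for illustration, not the classification used in `superword.cpp`.

```java
import java.util.Set;

// Illustrative only: the opcode strings and both sets are hypothetical placeholders.
final class TruncationSketch {
    // Operations assumed, for this example, to be safe on truncated inputs.
    private static final Set<String> TRUNCATABLE = Set.of("AddI", "SubI", "AndI");

    // Operations known to need the full int input.
    private static final Set<String> NOT_TRUNCATABLE =
        Set.of("CountLeadingZerosI", "CountTrailingZerosI", "PopCountI", "ReverseI");

    static boolean canTruncate(String opcode) {
        if (TRUNCATABLE.contains(opcode)) {
            return true;
        }
        // With assertions enabled, an opcode in neither list is flagged so that
        // someone classifies it; with assertions disabled it is simply not truncated.
        assert NOT_TRUNCATABLE.contains(opcode) : "unclassified opcode: " + opcode;
        return false;
    }
}
```

The design choice is that a newly added operation can never be miscompiled by default, while debug runs still surface it so the missed vectorization opportunity gets looked at.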
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25440#discussion_r2136931199 From epeter at openjdk.org Tue Jun 10 05:22:29 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 10 Jun 2025 05:22:29 GMT Subject: RFR: 8350177: C2 SuperWord: Integer.numberOfLeadingZeros, numberOfTrailingZeros, reverse and bitCount have input types wrongly turncated for byte and short [v2] In-Reply-To: References: Message-ID: <48S__F0V_kgGEToz-TPJsnx316NlNpfP5FdRBhcEk6c=.5aae3abd-ff23-45d5-9904-e181aea72808@github.com> On Tue, 10 Jun 2025 05:00:36 GMT, Jasmine Karthikeyan wrote: >> src/hotspot/share/opto/superword.cpp line 2579: >> >>> 2577: Node* load = in->in(1); >>> 2578: // For certain operations such as shifts and abs(), use the size of the load if it exists >>> 2579: if ((VectorNode::is_shift_opcode(op) || op == Op_AbsI) && load->is_Load() && >> >> Can you say a little more about this? What about `Op_ReverseBytesI`, did that not previously also get through here? > > In this case I believe abs/shifts were added because they needed to differentiate between signed and unsigned loads when dealing with truncation (abs was added in [JDK-8261022](https://bugs.openjdk.org/browse/JDK-8261022)) but the other branch that promotes to int is used by all the other nodes. This logic is pretty confusing because it's doing 2 separate things in the same code path. From my understanding, `Op_ReverseBytesI` requires the promotion to int behavior because the issue lies with truncation. Though, I think adding some more tests here would be a good idea so I'll look into that. Thanks :) If it is indeed separate logic, you could also consider a separate refactoring and adding those tests there. But I leave it up to you, maybe it is simple enough to do it all here. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25440#discussion_r2136932600 From xgong at openjdk.org Tue Jun 10 05:35:37 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 10 Jun 2025 05:35:37 GMT Subject: RFR: 8357726: Improve C2 to recognize counted loops with multiple casts in trip counter In-Reply-To: References: <-SKyhptjFPhuOPflySOZXJloR_Vgr4sC-xB5dSQXxZU=.fd6922bc-2498-4f4e-873a-999f82cd0a1a@github.com> <698Q9LoBFMdDFBnBVAB8FYiI0U-abyXms26RLoMv5Xc=.f21b9a25-8f64-412c-b37a-553f0a13192e@github.com> Message-ID: <_A-2ocgC_0O7bLT01O1ak4Mj19C8rFFTeuTUnDuEalo=.8cabdfda-c875-46e7-8860-31090994d2c5@github.com> On Wed, 4 Jun 2025 09:14:12 GMT, Xiaohong Gong wrote: > > @XiaohongGong Let's please delay this until after Thursday, so that this does not go into JDK25 yet, and we have more time to fix it if something goes wrong down the line. > > Sure. That makes sense to me. Thanks! BTW, I've updated the test according to your comment. So could you please help run all the tests? Thanks again! Hi @eme64 , may I ask what the status of the testing is for this PR? Please let me know if you have any feedback. Thanks a lot! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25539#issuecomment-2957732129 From qamai at openjdk.org Tue Jun 10 05:37:49 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 10 Jun 2025 05:37:49 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v64] In-Reply-To: References: Message-ID: On Tue, 10 Jun 2025 04:00:49 GMT, Jatin Bhateja wrote: >> @vnkozlov I have merged this branch with master, can you run your tests and approve the changes, please? > > @merykitty , your comments on the following understanding would be helpful > > Q. Is it ok to keep a bool flag in a value which signifies that bounds hold unsigned values? > > A.
Java primitive types are inherently signed, and the C2 compiler represents all the integral types, i.e. byte, short and int, through TypeInt by simply constraining the value range, and long using TypeLong. For float, the C2 type system creates Type::FLOAT, and for double Type::DOUBLE; unlike the integral types, these two types do not record the actual value range bounds. For float constants C2 creates a different type, i.e. TypeF, and for double constants TypeD. > > Currently, the scope of the unsigned type is limited to comparison, multiplication, and division operations. Since the signed and unsigned value ranges overlap, keeping a flag with _lo and _hi should suffice. The new scheme accepts bounds of signed and unsigned value ranges and then finds the effective value range; this allows the user to feed any signed and unsigned value ranges to a TypeInt and then let the compiler find the effective value range by canonicalization. A TypeInt is only useful after canonicalization. This mimics the job of a constructor, where a newly allocated object is usable only after it has been pushed through the constructor; likewise, a TypeInt accepts different signed and unsigned bounds but is only usable after normalization, which computes the effective value range. After normalization, the signed bounds, unsigned bounds and known bits are in sync. > > During the dataflow analysis, flow functions associated with different operators may modify the value ranges (signed or unsigned), which will trigger re-normalization. In other cases flow analysis may transfer only the known bits, which are then used to prune the value ranges. At any given point, the signed bounds, unsigned bounds and known bits should be in sync, else the type is inconsistent and not usable; iterative canonicalization ensures this. > > Thus, to be flexible in implementation, keeping a separate value range for unsigned bounds is justified, but may not add huge value as all Java primitive types are inherently signed, and mixing of signed and unsigned operations in type flow is not possible.
The whole idea of keeping the implementation flexible with unsigned bounds is based on an assumption that during data flow any of the lattices associated with the integral types TypeInt or TypeLong, i.e. unsigned bounds, signed bounds or known bits, may change. In practice only known bits (bit level df) and signed bounds may be usable a ... @jatin-bhateja Thanks a lot for your suggestion. I will address your concerns below: Besides carrying additional information, unsigned bounds make canonicalization easier. This is because bits are inherently unsigned: canonicalizing bits and unsigned bounds together is an easier task than canonicalizing bits and signed bounds. I think it is also beneficial to be consistent: keeping a boolean to signify unsigned bounds splits the set of all `TypeInt`s into two, which makes it hard to reason about and verify the results of different operations. For example, consider subtracting two `TypeInt`s: it will be significantly more complex if we have to consider four cases, namely signed - signed, signed - unsigned, unsigned - signed, unsigned - unsigned. I don't think it suffices to think that Java integral types are inherently signed. Bitwise and, or, xor, not are inherently unsigned. Left shift is both signed and unsigned, right shift has both the signed variant and the unsigned variant. Add, sub, mul are both signed and unsigned. There are only cmp, div, mod, toString and conversions that are signed, but we have methods to do the unsigned variants for all of them, `Integer::compareUnsigned`, `Integer::divide/remainderUnsigned`, `Integer::toUnsignedString`, and `Integer::toUnsignedLong`. Mul-hi does not have a native operation for either variant, and `j.l.Math` provides both the signed and unsigned variants as utility methods. As a result, I think it is better to think of an `int` as a 32-bit integral value with unspecified signedness, where the operations applied to it are what decide whether it is signed or unsigned.
And as you can see, for all operations, we have both the signed and unsigned variants. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17508#issuecomment-2957736235 From shade at openjdk.org Tue Jun 10 06:17:34 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 10 Jun 2025 06:17:34 GMT Subject: RFR: 8358749: Fix input checks in Vector API intrinsics In-Reply-To: References: Message-ID: On Fri, 6 Jun 2025 23:12:07 GMT, ExE Boss wrote: >> We have been carrying this patch in Leyden/premain for a while: https://github.com/openjdk/leyden/commit/7faed7fc5c8e1bbd9a16ab22673a77099396179c. I believe it deserves to be in mainline. I polished it a little further. >> >> It is _mostly_ a cleanup, but there are also new checks, on the paths where we do take constants off the arguments. In those cases, I believe the alternative is compiler SEGV-ing. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `hotspot_vector_1 hotspot_vector_2` >> - [x] Linux x86_64 server fastdebug, `jdk_vector` > > Also note that the implementation of `Utils.isNonCapturingLambda(...)` is wrong when the `jdk.internal.lambda.disableEagerInitialization` system property is set to `"true"`, as that causes lambda classes to have one `static final` field: > https://github.com/openjdk/jdk/blob/d7352559195b9e052c3eb24d773c0d6c10dc23ad/src/java.base/share/classes/jdk/internal/vm/vector/Utils.java#L36-L38 > https://github.com/openjdk/jdk/blob/d7352559195b9e052c3eb24d773c0d6c10dc23ad/src/java.base/share/classes/java/lang/invoke/InnerClassLambdaMetafactory.java#L365-L372 @ExE-Boss, I believe that comment should be in another bug.
------------- PR Comment: https://git.openjdk.org/jdk/pull/25673#issuecomment-2957803460 From shade at openjdk.org Tue Jun 10 06:17:34 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 10 Jun 2025 06:17:34 GMT Subject: RFR: 8358749: Fix input checks in Vector API intrinsics In-Reply-To: References: Message-ID: On Fri, 6 Jun 2025 14:09:11 GMT, Aleksey Shipilev wrote: > We have been carrying this patch in Leyden/premain for a while: https://github.com/openjdk/leyden/commit/7faed7fc5c8e1bbd9a16ab22673a77099396179c. I believe it deserves to be in mainline. I polished it a little further. > > It is _mostly_ a cleanup, but there are also new checks, on the paths where we do take constants off the arguments. In those cases, I believe the alternative is compiler SEGV-ing. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `hotspot_vector_1 hotspot_vector_2` > - [x] Linux x86_64 server fastdebug, `jdk_vector` Thanks for reviews! Here goes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25673#issuecomment-2957803881 From shade at openjdk.org Tue Jun 10 06:17:35 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 10 Jun 2025 06:17:35 GMT Subject: Integrated: 8358749: Fix input checks in Vector API intrinsics In-Reply-To: References: Message-ID: On Fri, 6 Jun 2025 14:09:11 GMT, Aleksey Shipilev wrote: > We have been carrying this patch in Leyden/premain for a while: https://github.com/openjdk/leyden/commit/7faed7fc5c8e1bbd9a16ab22673a77099396179c. I believe it deserves to be in mainline. I polished it a little further. > > It is _mostly_ a cleanup, but there are also new checks, on the paths where we do take constants off the arguments. In those cases, I believe the alternative is compiler SEGV-ing. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `hotspot_vector_1 hotspot_vector_2` > - [x] Linux x86_64 server fastdebug, `jdk_vector` This pull request has now been integrated. 
Changeset: ca7b8858 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/ca7b885873712a5ae503cb82c915d709034a69f7 Stats: 50 lines in 1 file changed: 21 ins; 0 del; 29 mod 8358749: Fix input checks in Vector API intrinsics Co-authored-by: Vladimir Ivanov Reviewed-by: vlivanov, sviswanathan ------------- PR: https://git.openjdk.org/jdk/pull/25673 From dfenacci at openjdk.org Tue Jun 10 07:08:32 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 10 Jun 2025 07:08:32 GMT Subject: RFR: 8358694: VM asserts if CodeCacheSegmentSize is not a power of 2 In-Reply-To: <_SfCjE8oyyD-grL6AAih23j1Qx7fbYefTVzl-BU5N2k=.7bd5d419-4127-4e78-926e-fece25a7d914@github.com> References: <_SfCjE8oyyD-grL6AAih23j1Qx7fbYefTVzl-BU5N2k=.7bd5d419-4127-4e78-926e-fece25a7d914@github.com> Message-ID: <9vk9GYispvT6R8FvfeHpfSELqQwhwUDJ56MMIOjvilU=.2965e790-6024-4cf8-8605-fa8bfea8cad2@github.com> On Tue, 10 Jun 2025 04:46:43 GMT, Amit Kumar wrote: > Makes sure that JVM exits gracefully when `CodeCacheSegmentSize` is not a power of 2. Thanks for fixing this @offamitkumar! I was wondering if we couldn't do this right at the start when we check the flag constraints, e.g. 
here: https://github.com/openjdk/jdk/blob/ca7b885873712a5ae503cb82c915d709034a69f7/src/hotspot/share/runtime/flags/jvmFlagConstraintsCompiler.cpp#L158 ------------- PR Review: https://git.openjdk.org/jdk/pull/25708#pullrequestreview-2912241472 From mhaessig at openjdk.org Tue Jun 10 07:09:31 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Tue, 10 Jun 2025 07:09:31 GMT Subject: RFR: 8356780: PhaseMacroExpand::_has_locks is unused [v3] In-Reply-To: References: <4Y_qCkNICY97KdColxyShQuBy9zdEVaZjJjkDtJD9do=.f09a465c-13d9-4e15-8b86-d99cac09b807@github.com> Message-ID: <2JAw7Z6LhrN3YIa7YbTVCc5APc8895plNh6_W12VfJ8=.9eeae16e-5949-4b42-b8f3-83ff1f226110@github.com> On Fri, 6 Jun 2025 15:24:11 GMT, Benoît Maillard wrote: >> This PR removes the unused field `PhaseMacroExpand::_has_locks` >> >> Thanks! > > Benoît Maillard has updated the pull request incrementally with two additional commits since the last revision: > > - Revert "8356780: Merge both while loops in PhaseMacroExpand::eliminate_macro_nodes" > > This reverts commit 13deb61de3e2a07d51e3692bb408971f6c18cecf. > - Revert "8356780: Remove useless assert" > > This reverts commit 4e7ee57981b9019d030ac59dc697a2470d8eb5eb. Looks good :slightly_smiling_face: ------------- Marked as reviewed by mhaessig (Author). PR Review: https://git.openjdk.org/jdk/pull/25669#pullrequestreview-2912245170 From duke at openjdk.org Tue Jun 10 07:29:35 2025 From: duke at openjdk.org (Benoît Maillard) Date: Tue, 10 Jun 2025 07:29:35 GMT Subject: Integrated: 8356780: PhaseMacroExpand::_has_locks is unused In-Reply-To: <4Y_qCkNICY97KdColxyShQuBy9zdEVaZjJjkDtJD9do=.f09a465c-13d9-4e15-8b86-d99cac09b807@github.com> References: <4Y_qCkNICY97KdColxyShQuBy9zdEVaZjJjkDtJD9do=.f09a465c-13d9-4e15-8b86-d99cac09b807@github.com> Message-ID: On Fri, 6 Jun 2025 07:41:45 GMT, Benoît Maillard wrote: > This PR removes the unused field `PhaseMacroExpand::_has_locks` > > Thanks!
This pull request has now been integrated. Changeset: 7c9c8ba3 Author: Benoît Maillard Committer: Marc Chevalier URL: https://git.openjdk.org/jdk/commit/7c9c8ba363521a7bfb58e1a8285459f717769889 Stats: 7 lines in 2 files changed: 0 ins; 5 del; 2 mod 8356780: PhaseMacroExpand::_has_locks is unused Reviewed-by: mhaessig, chagedorn, kvn, mchevalier ------------- PR: https://git.openjdk.org/jdk/pull/25669 From jbhateja at openjdk.org Tue Jun 10 07:39:47 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 10 Jun 2025 07:39:47 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v64] In-Reply-To: References: Message-ID: On Tue, 10 Jun 2025 04:00:49 GMT, Jatin Bhateja wrote: >> @vnkozlov I have merged this branch with master, can you run your tests and approve the changes, please? > > @merykitty , your comments on the following understanding would be helpful > > Q. Is it ok to keep a bool flag in a value which signifies that bounds hold unsigned values? > > A. Java primitive types are inherently signed, and the C2 compiler represents all the integral types, i.e. byte, short and int, through TypeInt by simply constraining the value range, and long using TypeLong. For float, the C2 type system creates Type::FLOAT, and for double Type::DOUBLE; unlike the integral types, these two types do not record the actual value range bounds. For float constants C2 creates a different type, i.e. TypeF, and for double constants TypeD. > > Currently, the scope of the unsigned type is limited to comparison, multiplication, and division operations. Since the signed and unsigned value ranges overlap, keeping a flag with _lo and _hi should suffice. The new scheme accepts bounds of signed and unsigned value ranges and then finds the effective value range; this allows the user to feed any signed and unsigned value ranges to a TypeInt and then let the compiler find the effective value range by canonicalization.
A TypeInt is only useful after canonicalization. This mimics the job of a constructor, where a newly allocated object is usable only after it has been pushed through the constructor; likewise, a TypeInt accepts different signed and unsigned bounds but is only usable after normalization, which computes the effective value range. After normalization, the signed bounds, unsigned bounds and known bits are in sync. > > During the dataflow analysis, flow functions associated with different operators may modify the value ranges (signed or unsigned), which will trigger re-normalization. In other cases flow analysis may transfer only the known bits, which are then used to prune the value ranges. At any given point, the signed bounds, unsigned bounds and known bits should be in sync, else the type is inconsistent and not usable; iterative canonicalization ensures this. > > Thus, to be flexible in implementation, keeping a separate value range for unsigned bounds is justified, but may not add huge value as all Java primitive types are inherently signed, and mixing of signed and unsigned operations in type flow is not possible. The whole idea of keeping the implementation flexible with unsigned bounds is based on an assumption that during data flow any of the lattices associated with the integral types TypeInt or TypeLong, i.e. unsigned bounds, signed bounds or known bits, may change. In practice only known bits (bit level df) and signed bounds may be usabl... > @jatin-bhateja Thanks a lot for your suggestion. I will address your concerns below: > > Besides carrying additional information, unsigned bounds make canonicalization easier. This is because bits are inherently unsigned: canonicalizing bits and unsigned bounds together is an easier task than canonicalizing bits and signed bounds. I think it is also beneficial to be consistent: keeping a boolean to signify unsigned bounds splits the set of all `TypeInt`s into two, which makes it hard to reason about and verify the results of different operations.
For example, consider subtracting two `TypeInt`s: it will be significantly more complex if we have to consider four cases, namely signed - signed, signed - unsigned, unsigned - signed, unsigned - unsigned. > > I don't think it suffices to think that Java integral types are inherently signed. Bitwise and, or, xor, not are inherently unsigned. Left shift is both signed and unsigned, right shift has both the signed variant and the unsigned variant. Add, sub, mul are both signed and unsigned. There are only cmp, div, mod, toString and conversions that are signed, but we have methods to do the unsigned variants for all of them, `Integer::compareUnsigned`, `Integer::divide/remainderUnsigned`, `Integer::toUnsignedString`, and `Integer::toUnsignedLong`. Mul-hi does not have a native operation for either variant, and `j.l.Math` provides both the signed and unsigned variants as utility methods. As a result, I think it is better to think of an `int` as a 32-bit integral value with unspecified signedness, where the operations applied to it are what decide whether it is signed or unsigned. And as you can see, for all operations, we have both the signed and unsigned variants. "As a result, I think it is better to think of an `int` as a 32-bit integral value with unspecified signedness, where the operations applied to it are what decide whether it is signed or unsigned" Sounds good. Since integral types now have multiple lattice points associated with them, and all our existing Value transforms are based on signed value ranges, we will also need to extend the existing Value transforms to make use of KnownBits. For that we will need to extend the newly added KnownBits class to support other operations, each of which accepts KnownBits inputs, operates on them as per the operation semantics, and returns a new [KnownBits](https://github.com/llvm/llvm-project/blob/0c3a7725375ec583147429cc367320f0e8506847/llvm/include/llvm/Support/KnownBits.h#L384), which after canonicalization also adjusts the other lattice points, i.e.
signed and unsigned ranges. All in all, IIUC, going forward we are planning to perform bit-level data flow analysis using KnownBits and then apply its result to the signed / unsigned ranges. This will ensure that all existing Value transforms, which are based on value ranges, continue to be useful, while the KnownBits analysis sharpens the ranges. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17508#issuecomment-2957994918 From jbhateja at openjdk.org Tue Jun 10 07:42:53 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 10 Jun 2025 07:42:53 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v65] In-Reply-To: <0LL62A7X3DeYMqkv4M7vz-OD79VuVCmNGy0CU_aCbkM=.e5fb80a1-cd23-4ffb-9839-ceaccd8695db@github.com> References: <0LL62A7X3DeYMqkv4M7vz-OD79VuVCmNGy0CU_aCbkM=.e5fb80a1-cd23-4ffb-9839-ceaccd8695db@github.com> Message-ID: On Mon, 9 Jun 2025 17:29:36 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. >> >> In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. >> >> This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. >> >> Please kindly review, thanks a lot.
>> >> Testing >> >> - [x] GHA >> - [x] Linux x64, tier 1-4 > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 83 commits: > > - Merge branch 'master' into unsignedbounds > - add more intn_t tests > - Emanuel's reviews > - alignment wording > - refinement > - refine the cases where there does not exist a result > - add some more sanity static_asserts > - intn_t refinements > - Emanuel's reviews > - Emanuel's reviews > - ... and 73 more: https://git.openjdk.org/jdk/compare/cae1fd33...ad6ac7cb Great work @merykitty , it was a pleasure to study your patch and refresh some of the low-level details of bit-level data flow analysis. LGTM, Best Regards ------------- Marked as reviewed by jbhateja (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17508#pullrequestreview-2912340763 From mhaessig at openjdk.org Tue Jun 10 07:48:33 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Tue, 10 Jun 2025 07:48:33 GMT Subject: RFR: 8359064: Expose reason for marking nmethod non-entrant to JVMCI client In-Reply-To: References: Message-ID: On Mon, 9 Jun 2025 23:57:46 GMT, Cesar Soares Lucas wrote: > We recently introduced a way to set the reason why a nmethod was being marked as `not entrant`, see [here](https://github.com/openjdk/jdk/pull/23980) and [here](https://github.com/openjdk/jdk/pull/25338). > > This PR is to expose in the JVMCI interface the reason why the nmethod was flagged as `not entrant`. This will allow JVMCI-based compilers to implement heuristics to handle re-compilations differently based on what happened to earlier versions of a method, for instance, this will likely be used to address this [RFE in Truffle](https://github.com/oracle/graal/issues/11045). > > Tested on Linux x86_64, ARM with JTREG tier 1-3. Thank you for working on this. Now #25338 makes even more sense.
A naive question: is it possible to somehow share the enum definition in hotspot with the Java side in JVMCI? If all change reasons were enums, they would be much easier to understand. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25706#issuecomment-2958019012 From epeter at openjdk.org Tue Jun 10 07:48:58 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 10 Jun 2025 07:48:58 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v64] In-Reply-To: References: Message-ID: On Tue, 10 Jun 2025 07:36:22 GMT, Jatin Bhateja wrote: >> @merykitty , your comments on the following understanding would be helpful >> >> Q. Is it ok to keep a bool flag in a value which signifies that bounds hold unsigned values? >> >> A. Java primitive types are inherently signed, and the C2 compiler represents all the integral types, i.e. byte, short and int, through TypeInt by simply constraining the value range, and long using TypeLong. For float, the C2 type system creates Type::FLOAT, and for double Type::DOUBLE; unlike the integral types, these two types do not record the actual value range bounds. For float constants C2 creates a different type, i.e. TypeF, and for double constants TypeD. >> >> Currently, the scope of the unsigned type is limited to comparison, multiplication, and division operations. Since the signed and unsigned value ranges overlap, keeping a flag with _lo and _hi should suffice. The new scheme accepts bounds of signed and unsigned value ranges and then finds the effective value range; this allows the user to feed any signed and unsigned value ranges to a TypeInt and then let the compiler find the effective value range by canonicalization.
A TypeInt is only useful after canonicalization. This mimics the job of a constructor, where a newly allocated object is usable only after it has been pushed through the constructor; likewise, a TypeInt accepts different signed and unsigned bounds but is only usable after normalization, which computes the effective value range. After normalization, the signed bounds, unsigned bounds and known bits are in sync. >> >> During the dataflow analysis, flow functions associated with different operators may modify the value ranges (signed or unsigned), which will trigger re-normalization. In other cases flow analysis may transfer only the known bits, which are then used to prune the value ranges. At any given point, the signed bounds, unsigned bounds and known bits should be in sync, else the type is inconsistent and not usable; iterative canonicalization ensures this. >> >> Thus, to be flexible in implementation, keeping a separate value range for unsigned bounds is justified, but may not add huge value as all Java primitive types are inherently signed, and mixing of signed and unsigned operations in type flow is not possible. The whole idea of keeping the implementation flexible with unsigned bounds is based on an assumption that during data flow any of the lattices associated with the integral types TypeInt or TypeLong, i.e. unsigned bounds, signed bounds or known bits, may change. In practice only known bits (bit level df) and sign... > >> @jatin-bhateja Thanks a lot for your suggestion. I will address your concerns below: >> >> Besides carrying additional information, unsigned bounds make canonicalization easier. This is because bits are inherently unsigned: canonicalizing bits and unsigned bounds together is an easier task than canonicalizing bits and signed bounds. I think it is also beneficial to be consistent: keeping a boolean to signify unsigned bounds splits the set of all `TypeInt`s into two, which makes it hard to reason about and verify the results of different operations.
For example, consider subtracting two `TypeInt`s: it will be significantly more complex if we have to consider four cases, namely signed - signed, signed - unsigned, unsigned - signed, unsigned - unsigned. >> >> I don't think it suffices to think that Java integral types are inherently signed. Bitwise and, or, xor, not are inherently unsigned. Left shift is both signed and unsigned, right shift has both the signed variant and the unsigned variant. Add, sub, mul are both signed and unsigned. There are only cmp, div, mod, toString and conversions that are signed, but we have methods to do the unsigned variants for all of them, `Integer::compareUnsigned`, `Integer::divide/remainderUnsigned`, `Integer::toUnsignedString`, and `Integer::toUnsignedLong`. Mul-hi does not have a native operation for either variant, and `j.l.Math` provides both the signed and unsigned variants as utility methods. As a result, I think it is better to think of an `int` as a 32-bit integral value with unspecified signedness, where the operations applied to it are what decide whether it is signed or unsigned. And as you can see, for all operations, we have both the signed and unsigned variants. > > "As a result, I think it is better to think of an `int` as a 32-bit integral value with unspecified signedness, where the operations applied to it are what decide whether it is signed or unsigned" > > Sounds good. Since integral types now have multiple lattice points associated with them, and all our existing Value transforms are based on signed value ranges, we will also need to extend the existing Value transforms to make use of KnownBits. For that we will need to extend the newly added KnownBits class to support other operations, each of which accepts KnownBits inputs, operates on them as per the operation semantics, and returns a new [KnownBits](https://github.com/llvm/llvm-project/blob/0c3a7725375ec583147429cc367320f0e8506847/llvm/include/llvm/Support/KnownBits.h#L384) whic...
@jatin-bhateja FYI: https://github.com/openjdk/jdk/pull/17508#issuecomment-2847009418 The goal is to make it possible to run gtests with any size integral types (e.g. 4 bit), so that we can efficiently test the corresponding value optimizations. @merykitty Did you already file a JBS issue for that? So when we refactor, we should try to create methods that take in types, and not nodes. That would allow us to generate types in the gtest, and get a type back. We can do all sorts of enhanced verification that way. For example, we can feed in wider and narrower types as inputs, and then expect that narrower types lead to narrower outputs, and wider types to wider outputs. If constants are fed in, then constants should come out. etc. This would really allow us to exhaustively verify for all sorts of ranges and bit patterns - at least for the smaller types (e.g. 4 bits). So there is indeed a lot of extension work to do, like @jatin-bhateja said. But we can use that to also refactor the code for testability. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17508#issuecomment-2958020317 PR Comment: https://git.openjdk.org/jdk/pull/17508#issuecomment-2958023447 From chagedorn at openjdk.org Tue Jun 10 08:11:30 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 10 Jun 2025 08:11:30 GMT Subject: RFR: 8354383: C2: enable sinking of Type nodes out of loop [v3] In-Reply-To: References: Message-ID: On Mon, 26 May 2025 08:36:37 GMT, Roland Westrelin wrote: >> `PhaseIdealLoop::try_sink_out_of_loop()` excludes `Type` nodes because >> we ran into some issues where a `Type` node is sunk and then becomes >> `top` but the control path of its uses doesn't become unreachable. >> >> 8349479 should have fixed that so that exception no longer makes >> sense. 
> > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/opto/loopopts.cpp > > Co-authored-by: Christian Hagedorn Marked as reviewed by chagedorn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25396#pullrequestreview-2912440249 From jbhateja at openjdk.org Tue Jun 10 08:20:55 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 10 Jun 2025 08:20:55 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v64] In-Reply-To: References: Message-ID: On Tue, 10 Jun 2025 07:36:22 GMT, Jatin Bhateja wrote: >> @merykitty , your comments on the following understanding would be helpful >> >> Q. Is it ok to keep a bool flag in a value which signifies that bounds hold unsigned values? >> >> A. Java primitive types are inherently signed, and the C2 compiler represents all the integral types, i.e. byte, short and int, through TypeInt by simply constraining the value range, and long using TypeLong. For float, the C2 type system creates Type::FLOAT, and for double Type::DOUBLE; unlike the integral types, these two types do not record the actual value range bounds. For float constants C2 creates a different type, i.e. TypeF, and for double constants TypeD. >> >> Currently, the scope of the unsigned type is limited to comparison, multiplication, and division operations. Since the signed and unsigned value ranges overlap, keeping a flag with _lo and _hi should suffice. The new scheme accepts bounds of signed and unsigned value ranges and then finds the effective value range; this allows the user to feed any signed and unsigned value ranges to a TypeInt and then let the compiler find the effective value range by canonicalization.
A TypeInt is only useful after canonicalization. This mimics the job of a constructor, where a newly allocated object is usable only after it has been pushed through the constructor; likewise, a TypeInt accepts different signed and unsigned bounds but is only usable after normalization, which computes the effective value range. After normalization, the signed bounds, unsigned bounds and known bits are in sync. >> >> During the dataflow analysis, flow functions associated with different operators may modify the value ranges (signed or unsigned), which will trigger re-normalization. In other cases flow analysis may transfer only the known bits, which are then used to prune the value ranges. At any given point, the signed bounds, unsigned bounds and known bits should be in sync, else the type is inconsistent and not usable; iterative canonicalization ensures this. >> >> Thus, to be flexible in implementation, keeping a separate value range for unsigned bounds is justified, but may not add huge value as all Java primitive types are inherently signed, and mixing of signed and unsigned operations in type flow is not possible. The whole idea of keeping the implementation flexible with unsigned bounds is based on an assumption that during data flow any of the lattices associated with the integral types TypeInt or TypeLong, i.e. unsigned bounds, signed bounds or known bits, may change. In practice only known bits (bit level df) and sign... > >> @jatin-bhateja Thanks a lot for your suggestion. I will address your concerns below: >> >> Besides carrying additional information, unsigned bounds make canonicalization easier. This is because bits are inherently unsigned: canonicalizing bits and unsigned bounds together is an easier task than canonicalizing bits and signed bounds. I think it is also beneficial to be consistent: keeping a boolean to signify unsigned bounds splits the set of all `TypeInt`s into two, which makes it hard to reason about and verify the results of different operations.
For example, consider subtracting 2 `TypeInt`s, it will be significantly more complex if we have to consider 4 cases: signed - signed, signed - unsigned, unsigned - signed, unsigned - unsigned. >> >> I don't think it suffices to think that Java integral types are inherently signed. Bitwise and, or, xor, not are inherently unsigned. Left shift is both signed and unsigned, right shift has both the signed variant and the unsigned variant. Add, sub, mul are both signed and unsigned. There are only cmp, div, mod, toString and conversions that are signed, but we have methods to do the unsigned variants for all of them, `Integer::compareUnsigned`, `Integer::divide/remainderUnsigned`, `Integer::toUnsignedString`, and `Integer::toUnsignedLong`. Mul-hi does not have a native operation for both variants and `j.l.Math` provides both the signed and unsigned variants as utility methods. As a result, I think it is better to think of an `int` as a 32-bit integral value with unspecified signedness and the operations operated on it are what decide whether it is signed or unsigned. And as you can see, for all operations, we have both the signed and unsigned variants. > > "As a result, I think it is better to think of an `int` as a 32-bit integral value with unspecified signedness and the operations operated on it are what decide whether it is signed or unsigned" > > Sounds good, I think since now integral types have multiple lattice points associated with them, and all our existing Value transforms are based on signed value ranges, we will also need to extend existing value transforms to make use of KnownBits, for that we will need to extend the newly added KnownBits class to support other operations which accept KnownBits inputs, operate over them as per the operation semantics and return a new [KnownBits](https://github.com/llvm/llvm-project/blob/0c3a7725375ec583147429cc367320f0e8506847/llvm/include/llvm/Support/KnownBits.h#L384) whic...
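Editorial sketch: the idea of operations that accept known-bits inputs and return new known bits, as discussed above, can be illustrated on a 4-bit toy type. This is only a minimal sketch under stated assumptions; it is not C2's or LLVM's `KnownBits` class. The names `KnownBits4`, `contains`, `known_and`, `known_or` and `transfer_functions_are_sound` are hypothetical, and the exhaustive check is feasible only because the type is tiny.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 4-bit known-bits element (illustration only).
// A bit set in `zeros` is known to be 0; a bit set in `ones` is known to be 1.
struct KnownBits4 {
  uint8_t zeros;
  uint8_t ones;
};

// True if the concrete value v agrees with everything that is known.
inline bool contains(KnownBits4 k, uint8_t v) {
  return (v & k.zeros) == 0 && (v & k.ones) == k.ones;
}

// AND: a result bit is known 0 if either input bit is known 0,
// and known 1 only if both input bits are known 1.
inline KnownBits4 known_and(KnownBits4 a, KnownBits4 b) {
  return { uint8_t(a.zeros | b.zeros), uint8_t(a.ones & b.ones) };
}

// OR: a result bit is known 1 if either input bit is known 1,
// and known 0 only if both input bits are known 0.
inline KnownBits4 known_or(KnownBits4 a, KnownBits4 b) {
  return { uint8_t(a.zeros & b.zeros), uint8_t(a.ones | b.ones) };
}

// Exhaustively checks that every concrete result is covered by the
// known bits computed from the inputs (soundness of both transfers).
inline bool transfer_functions_are_sound() {
  for (int az = 0; az < 16; az++) {
    for (int ao = 0; ao < 16; ao++) {
      if ((az & ao) != 0) continue;  // a bit cannot be known 0 and known 1
      for (int bz = 0; bz < 16; bz++) {
        for (int bo = 0; bo < 16; bo++) {
          if ((bz & bo) != 0) continue;
          KnownBits4 a{uint8_t(az), uint8_t(ao)};
          KnownBits4 b{uint8_t(bz), uint8_t(bo)};
          for (int x = 0; x < 16; x++) {
            if (!contains(a, uint8_t(x))) continue;
            for (int y = 0; y < 16; y++) {
              if (!contains(b, uint8_t(y))) continue;
              if (!contains(known_and(a, b), uint8_t(x & y))) return false;
              if (!contains(known_or(a, b), uint8_t(x | y))) return false;
            }
          }
        }
      }
    }
  }
  return true;
}
```

Because the bit masks carry no sign, neither transfer function needs a signed or unsigned case split, which matches the argument that bit-level information pairs naturally with unsigned bounds.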
> @jatin-bhateja FYI: [#17508 (comment)](https://github.com/openjdk/jdk/pull/17508#issuecomment-2847009418) > > The goal is to make it possible to run gtests with any size integral types (e.g. 4 bit), so that we can efficiently test the corresponding value optimizations. @merykitty Did you already file a JBS issue for that? > > So when we refactor, we should try to create methods that take in types, and not nodes. That would allow us to generate types in the gtest, and get a type back. We can do all sorts of enhanced verification that way. For example, we can feed in wider and narrower types as inputs, and then expect that narrower types lead to narrower outputs, and wider types to wider outputs. If constants are fed in, then constants should come out. etc. This would really allow us to exhaustively verify for all sorts of ranges and bit patterns - at least for the smaller types (e.g. 4 bits). Thanks @eme64 for clarifying that is what I assumed, currently all value transforms are based on signed ranges and the role of KnownBits is restricted to canonicalization, which sync up the value ranges to known bits, but it will add much more value if we do the reverse where in data flow analysis is performed over KnownBits and then canonicalization adjusts the value ranges accordingly. So, adding a new gtest for each new operation is the right next step. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17508#issuecomment-2958118099 From epeter at openjdk.org Tue Jun 10 08:21:29 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 10 Jun 2025 08:21:29 GMT Subject: RFR: 8347273: C2: VerifyIterativeGVN for Ideal and Identity In-Reply-To: References: Message-ID: On Wed, 8 Jan 2025 14:43:40 GMT, Emanuel Peter wrote: > **Past Work** > With https://github.com/openjdk/jdk/pull/11775 / [JDK-8298952](https://bugs.openjdk.org/browse/JDK-8298952) we added `Node::Value` verification. > > **This PR** > I'm now adding verification for `Ideal` and `Identity`. 
I'm adding two bits to the flag `VerifyIterativeGVN`. > > I found many many node types that hit my verification assert, i.e. that could still be optimized after IGVN is over, just because these nodes were not put on the worklist any more. > > My approach was to aggressively bail-out for all nodes that had an issue. This way, we can address one by one in follow-up RFEs. For many, I did some initial assessment, and left some comments about what issues I encountered. > > **Future Work:** > In many cases, the issue is just a missing notification when inputs of inputs are changed. These would be good starter tasks. But there are probably also more complicated cases. And there are surely cases where verification will be impossible, because it is possible that the Ideal / Identity optimizations traverse longer paths, and we cannot expect that notification makes it down that path. For those cases, we will have to leave the exception and document it well. > > Testing passed tier1-3, with extra timeout factor 20. @mhaessig Thanks for the idea! ------------- PR Comment: https://git.openjdk.org/jdk/pull/22970#issuecomment-2958129493 From shade at openjdk.org Tue Jun 10 08:32:28 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 10 Jun 2025 08:32:28 GMT Subject: RFR: 8358694: VM asserts if CodeCacheSegmentSize is not a power of 2 In-Reply-To: <_SfCjE8oyyD-grL6AAih23j1Qx7fbYefTVzl-BU5N2k=.7bd5d419-4127-4e78-926e-fece25a7d914@github.com> References: <_SfCjE8oyyD-grL6AAih23j1Qx7fbYefTVzl-BU5N2k=.7bd5d419-4127-4e78-926e-fece25a7d914@github.com> Message-ID: On Tue, 10 Jun 2025 04:46:43 GMT, Amit Kumar wrote: > Makes sure that JVM exits gracefully when `CodeCacheSegmentSize` is not a power of 2. There is already `CodeCacheSegmentSizeConstraintFunc` that verifies the flag values. Move the check there? ------------- Changes requested by shade (Reviewer).
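Editorial sketch: the kind of check being discussed, rejecting a flag value that is not a power of 2 with an error instead of hitting an assert later, might look roughly like this in isolation. It is a standalone illustration under stated assumptions, not the code of `CodeCacheSegmentSizeConstraintFunc`; `FlagCheck`, `is_power_of_two` and `check_segment_size` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical result type; HotSpot's flag constraint functions use their own.
enum class FlagCheck { Success, ViolatesConstraint };

// True if exactly one bit is set in value.
inline bool is_power_of_two(std::size_t value) {
  return value != 0 && (value & (value - 1)) == 0;
}

// Hypothetical constraint check: report the problem and return an error
// so that the caller can exit gracefully rather than assert.
inline FlagCheck check_segment_size(std::size_t value, bool verbose) {
  if (!is_power_of_two(value)) {
    if (verbose) {
      std::fprintf(stderr, "CodeCacheSegmentSize (%zu) must be a power of 2\n", value);
    }
    return FlagCheck::ViolatesConstraint;
  }
  return FlagCheck::Success;
}
```

Keeping the test in one constraint function means every consumer of the flag can rely on the property without re-checking it.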
PR Review: https://git.openjdk.org/jdk/pull/25708#pullrequestreview-2912518138 From amitkumar at openjdk.org Tue Jun 10 08:41:16 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 10 Jun 2025 08:41:16 GMT Subject: RFR: 8358694: VM asserts if CodeCacheSegmentSize is not a power of 2 [v2] In-Reply-To: <_SfCjE8oyyD-grL6AAih23j1Qx7fbYefTVzl-BU5N2k=.7bd5d419-4127-4e78-926e-fece25a7d914@github.com> References: <_SfCjE8oyyD-grL6AAih23j1Qx7fbYefTVzl-BU5N2k=.7bd5d419-4127-4e78-926e-fece25a7d914@github.com> Message-ID: > Makes sure that JVM exits gracefully when `CodeCacheSegmentSize` is not a power of 2. Amit Kumar has updated the pull request incrementally with two additional commits since the last revision: - fix - move the changes in flag constraints specific file ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25708/files - new: https://git.openjdk.org/jdk/pull/25708/files/55af21fd..539b3fe4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25708&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25708&range=00-01 Stats: 10 lines in 2 files changed: 7 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25708.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25708/head:pull/25708 PR: https://git.openjdk.org/jdk/pull/25708 From shade at openjdk.org Tue Jun 10 08:46:28 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 10 Jun 2025 08:46:28 GMT Subject: RFR: 8358694: VM asserts if CodeCacheSegmentSize is not a power of 2 [v2] In-Reply-To: References: <_SfCjE8oyyD-grL6AAih23j1Qx7fbYefTVzl-BU5N2k=.7bd5d419-4127-4e78-926e-fece25a7d914@github.com> Message-ID: On Tue, 10 Jun 2025 08:41:16 GMT, Amit Kumar wrote: >> Makes sure that JVM exits gracefully when `CodeCacheSegmentSize` is not a power of 2. > > Amit Kumar has updated the pull request incrementally with two additional commits since the last revision: > > - fix > - move the changes in flag constraints specific file Looks okay to me. 
Compiler folk should take a look as well. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25708#pullrequestreview-2912570650 From amitkumar at openjdk.org Tue Jun 10 08:46:29 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 10 Jun 2025 08:46:29 GMT Subject: RFR: 8358694: VM asserts if CodeCacheSegmentSize is not a power of 2 [v2] In-Reply-To: References: <_SfCjE8oyyD-grL6AAih23j1Qx7fbYefTVzl-BU5N2k=.7bd5d419-4127-4e78-926e-fece25a7d914@github.com> Message-ID: <89l9U81jwnAEejqzipNyI9THFl1bTCaM_SMTMsQop5A=.7105c14b-3e9d-48a4-a7b1-b8b4934f13a1@github.com> On Tue, 10 Jun 2025 08:41:16 GMT, Amit Kumar wrote: >> Makes sure that JVM exits gracefully when `CodeCacheSegmentSize` is not a power of 2. > > Amit Kumar has updated the pull request incrementally with two additional commits since the last revision: > > - fix > - move the changes in flag constraints specific file Thanks for the suggestion. I have moved the changes to the constraints-specific file (I didn't know that this file existed, so I missed it). Please have another look. @shipilev @dafedafe Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25708#issuecomment-2958216293 From mhaessig at openjdk.org Tue Jun 10 08:48:32 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Tue, 10 Jun 2025 08:48:32 GMT Subject: RFR: 8347273: C2: VerifyIterativeGVN for Ideal and Identity In-Reply-To: References: Message-ID: On Wed, 8 Jan 2025 14:43:40 GMT, Emanuel Peter wrote: > **Past Work** > With https://github.com/openjdk/jdk/pull/11775 / [JDK-8298952](https://bugs.openjdk.org/browse/JDK-8298952) we added `Node::Value` verification. > > **This PR** > I'm now adding verification for `Ideal` and `Identity`. I'm adding two bits to the flag `VerifyIterativeGVN`. > > I found many many node types that hit my verification assert, i.e.
that could still be optimized after IGVN is over, just because these nodes were not put on the worklist any more. > > My approach was to aggressively bail-out for all nodes that had an issue. This way, we can address one by one in follow-up RFEs. For many, I did some initial assessment, and left some comments about what issues I encountered. > > **Future Work:** > In many cases, the issue is just a missing notification when inputs of inputs are changed. These would be good starter tasks. But there are probably also more complicated cases. And there are surely cases where verification will be impossible, because it is possible that the Ideal / Identity optimizations traverse longer paths, and we cannot expect that notification makes it down that path. For those cases, we will have to leave the exception and document it well. > > Testing passed tier1-3, with extra timeout factor 20. Thank you for working on this and diligently noting the exceptions. That is quite the todo list you discovered. It is great to see more verification going in. I noted some minor things, mostly in the comments. With or without those, this looks good to me. src/hotspot/share/opto/c2_globals.hpp line 685: > 683: " B: verify that type(n) == n->Value() after IGVN" \ > 684: " C: verify all Node::Ideal were applied in IGVN" \ > 685: " D: verify all Node::Identity were applied in IGVN" \ Suggestion: " C: verify Node::Ideal did not miss opportunities" \ " D: verify Node::Identity did not miss opportunities" \ For me, "all `Node::Ideal` were applied" parsed weirdly, so I tried my hand at an alternative formulation. Feel free to ignore. src/hotspot/share/opto/phaseX.cpp line 1188: > 1186: } > 1187: > 1188: // Check that all Ideal optimizations that could be done were done. Suggestion: // Check that all Ideal optimizations that could be done were done. // Returns true if it found missed optimization opportunities and false otherwise and for exceptions.
The return value was not immediately clear to me. src/hotspot/share/opto/phaseX.cpp line 1803: > 1801: } > 1802: tty->print_cr("The result after Ideal:"); > 1803: i->dump_bfs(1, nullptr, ""); Perhaps taking the tty lock might be appropriate, due to the amount of printing? Or do we know that nothing else is printing? src/hotspot/share/opto/phaseX.cpp line 1807: > 1805: } > 1806: > 1807: // Check that all Identity optimizations that could be done were done. Suggestion: // Check that all Identity optimizations that could be done were done. // Returns true if it found missed optimization opportunities and false otherwise and for exceptions. As above. src/hotspot/share/opto/phaseX.cpp line 1948: > 1946: > 1947: if (n->is_Load()) { > 1948: // LoadNode::Identity tries to look for an earier store value via Suggestion: // LoadNode::Identity tries to look for an earlier store value via src/hotspot/share/opto/phaseX.cpp line 1991: > 1989: n->dump_bfs(1, nullptr, ""); > 1990: tty->print_cr("New node:"); > 1991: i->dump_bfs(1, nullptr, ""); Suggestion: // The verificatin just found a new Identity that was not found during IGVN. tty->cr(); tty->print_cr("Missed Identity optimization:"); tty->print_cr("Old node:"); n->dump_bfs(1, nullptr, ""); tty->print_cr("New node:"); i->dump_bfs(1, nullptr, ""); The wording of the comment confused me a bit. Also, perhaps taking the tty lock might be appropriate since you are printing a lot here? Or do we know that only verification is printing at this point? ------------- Marked as reviewed by mhaessig (Author).
PR Review: https://git.openjdk.org/jdk/pull/22970#pullrequestreview-2912495849 PR Review Comment: https://git.openjdk.org/jdk/pull/22970#discussion_r2137284714 PR Review Comment: https://git.openjdk.org/jdk/pull/22970#discussion_r2137242097 PR Review Comment: https://git.openjdk.org/jdk/pull/22970#discussion_r2137266075 PR Review Comment: https://git.openjdk.org/jdk/pull/22970#discussion_r2137243146 PR Review Comment: https://git.openjdk.org/jdk/pull/22970#discussion_r2137259432 PR Review Comment: https://git.openjdk.org/jdk/pull/22970#discussion_r2137257629 From epeter at openjdk.org Tue Jun 10 08:56:46 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 10 Jun 2025 08:56:46 GMT Subject: RFR: 8347273: C2: VerifyIterativeGVN for Ideal and Identity [v2] In-Reply-To: References: Message-ID: > **Past Work** > With https://github.com/openjdk/jdk/pull/11775 / [JDK-8298952](https://bugs.openjdk.org/browse/JDK-8298952) we added `Node::Value` verification. > > **This PR** > I'm now adding verification for `Ideal` and `Identity`. I'm adding two bits to the flag `VerifyIterativeGVN`. > > I found many many node types that hit my verification assert, i.e. that could still be optimized after IGVN is over, just because these nodes were not put on the worklist any more. > > My approach was to aggressively bail-out for all nodes that had an issue. This way, we can address one by one in follow-up RFEs. For many, I did some initial assessment, and left some comments about what issues I encountered. > > **Future Work:** > In many cases, the issue is just a missing notification when inputs of inputs are changed. These would be good starter tasks. But there are probably also more complicated cases. And there are surely cases where verification will be impossible, because it is possible that the Ideal / Identity optimizations traverse longer paths, and we cannot expect that notification makes it down that path.
For those cases, we will have to leave the exception and document it well. > > I filed: > [JDK-8359103](https://bugs.openjdk.org/browse/JDK-8359103) C2 VerifyIterativeGVN: Umbrella for extending Ideal and Identity verification (JDK-8347273) > (We can file subtasks for the nodes we want to fix. I don't want to file them all now, but we should file them as we are investigating, so that there is no duplicate work.) > > Testing passed tier1-3, with extra timeout factor 20. Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Apply suggestions from code review Co-authored-by: Manuel Hässig ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22970/files - new: https://git.openjdk.org/jdk/pull/22970/files/a12d49a0..1042ef54 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22970&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22970&range=00-01 Stats: 6 lines in 2 files changed: 2 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/22970.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22970/head:pull/22970 PR: https://git.openjdk.org/jdk/pull/22970 From mhaessig at openjdk.org Tue Jun 10 09:12:35 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Tue, 10 Jun 2025 09:12:35 GMT Subject: RFR: 8347273: C2: VerifyIterativeGVN for Ideal and Identity [v2] In-Reply-To: References: Message-ID: On Tue, 10 Jun 2025 08:56:46 GMT, Emanuel Peter wrote: >> **Past Work** >> With https://github.com/openjdk/jdk/pull/11775 / [JDK-8298952](https://bugs.openjdk.org/browse/JDK-8298952) we added `Node::Value` verification. >> >> **This PR** >> I'm now adding verification for `Ideal` and `Identity`. I'm adding two bits to the flag `VerifyIterativeGVN`. >> >> I found many many node types that hit my verification assert, i.e. that could still be optimized after IGVN is over, just because these nodes were not put on the worklist any more.
>> >> My approach was to aggressively bail-out for all nodes that had an issue. This way, we can address one by one in follow-up RFEs. For many, I did some initial assessment, and left some comments about what issues I encountered. >> >> **Future Work:** >> In many cases, the issue is just a missing notification when inputs of inputs are changed. These would be good starter tasks. But there are probably also more complicated cases. And there are surely cases where verification will be impossible, because it is possible that the Ideal / Identity optimizations traverse longer paths, and we cannot expect that notification makes it down that path. For those cases, we will have to leave the exception and document it well. >> >> I filed: >> [JDK-8359103](https://bugs.openjdk.org/browse/JDK-8359103) C2 VerifyIterativeGVN: Umbrella for extending Ideal and Identity verification (JDK-8347273) >> (We can file subtasks for the nodes we want to fix. I don't want to file them all now, but we should file them as we are investigating, so that there is no duplicate work.) >> >> Testing passed tier1-3, with extra timeout factor 20. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > Apply suggestions from code review > > Co-authored-by: Manuel Hässig src/hotspot/share/opto/phaseX.cpp line 1987: > 1985: } > 1986: > 1987: // The verificatin just found a new Identity that was not found during IGVN. Suggestion: // The verification just found a new Identity that was not found during IGVN. I guess, I suggested a typo...
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22970#discussion_r2137340707 From epeter at openjdk.org Tue Jun 10 09:32:46 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 10 Jun 2025 09:32:46 GMT Subject: RFR: 8347273: C2: VerifyIterativeGVN for Ideal and Identity [v3] In-Reply-To: References: Message-ID: > **Past Work** > With https://github.com/openjdk/jdk/pull/11775 / [JDK-8298952](https://bugs.openjdk.org/browse/JDK-8298952) we added `Node::Value` verification. > > **This PR** > I'm now adding verification for `Ideal` and `Identity`. I'm adding two bits to the flag `VerifyIterativeGVN`. > > I found many many node types that hit my verification assert, i.e. that could still be optimized after IGVN is over, just because these nodes were not put on the worklist any more. > > My approach was to aggressively bail-out for all nodes that had an issue. This way, we can address one by one in follow-up RFEs. For many, I did some initial assessment, and left some comments about what issues I encountered. > > **Future Work:** > In many cases, the issue is just a missing notification when inputs of inputs are changed. These would be good starter tasks. But there are probably also more complicated cases. And there are surely cases where verification will be impossible, because it is possible that the Ideal / Identity optimizations traverse longer paths, and we cannot expect that notification makes it down that path. For those cases, we will have to leave the exception and document it well. > > I filed: > [JDK-8359103](https://bugs.openjdk.org/browse/JDK-8359103) C2 VerifyIterativeGVN: Umbrella for extending Ideal and Identity verification (JDK-8347273) > (We can file subtasks for the nodes we want to fix. I don't want to file them all now, but we should file them as we are investigating, so that there is no duplicate work.) > > Testing passed tier1-3, with extra timeout factor 20.
Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: - Update src/hotspot/share/opto/phaseX.cpp Co-authored-by: Manuel Hässig - review suggestions, and handled a few more edge cases ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22970/files - new: https://git.openjdk.org/jdk/pull/22970/files/1042ef54..5aa5444d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22970&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22970&range=01-02 Stats: 45 lines in 1 file changed: 35 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/22970.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22970/head:pull/22970 PR: https://git.openjdk.org/jdk/pull/22970 From epeter at openjdk.org Tue Jun 10 09:32:46 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 10 Jun 2025 09:32:46 GMT Subject: RFR: 8347273: C2: VerifyIterativeGVN for Ideal and Identity [v3] In-Reply-To: References: Message-ID: On Tue, 10 Jun 2025 08:46:04 GMT, Manuel Hässig wrote: >> Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: >> >> - Update src/hotspot/share/opto/phaseX.cpp >> >> Co-authored-by: Manuel Hässig >> - review suggestions, and handled a few more edge cases > > Thank you for working on this and diligently noting the exceptions. That is quite the todo list you discovered. It is great to see more verification going in. > > I noted some minor things, mostly in the comments. With or without those, this looks good to me. @mhaessig Thanks for reviewing! Yes this was rather a lot of gruesome work actually. But worth it, I think. I applied your suggestions. And I found some more cases in tier4 and stress testing, so I handled those as well now.
> src/hotspot/share/opto/phaseX.cpp line 1803: > >> 1801: } >> 1802: tty->print_cr("The result after Ideal:"); >> 1803: i->dump_bfs(1, nullptr, ""); > > Perhaps taking the tty lock might be appropriate, due to the amount of printing? Or do we know that nothing else is printing? good idea! ------------- PR Comment: https://git.openjdk.org/jdk/pull/22970#issuecomment-2958370829 PR Review Comment: https://git.openjdk.org/jdk/pull/22970#discussion_r2137375811 From snatarajan at openjdk.org Tue Jun 10 09:34:06 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Tue, 10 Jun 2025 09:34:06 GMT Subject: RFR: 8325478: Restructure the macro expansion compiler phase to not include macro elimination Message-ID: This changeset restructures the macro expansion phase to not include macro elimination and also adds a flag `StressMacroElimination` which randomizes macro elimination ordering for stress testing purposes. Changes: - Implemented a method `eliminate_opaque_looplimit_macro_nodes` that removes the functionality for eliminating Opaque and LoopLimit nodes from the `expand_macro_nodes` method. - Introduced compiler phase `PHASE_AFTER_MACRO_ELIMINATION` - Added a new Ideal phase for individual macro elimination steps, allowing `StressMacroElimination` testing. - Implemented the flag `StressMacroElimination`. Added functionality tests for `StressMacroElimination`, similar to testing for previous stress flags such as `StressMacroExpansion` ([JDK-8317349](https://bugs.openjdk.org/browse/JDK-8317349)). Below is a sample screenshot (IGV print level 5 with) mainly showing the new phase. ![image](https://github.com/user-attachments/assets/16013cd4-6ec6-4939-ac66-33bb03d59cd6) Questions to reviewers: - Is the new macro elimination phase OK, or should we change anything? - In `compile.cpp`, `PHASE_ITER_GVN_AFTER_ELIMINATION` follows `PHASE_AFTER_MACRO_ELIMINATION` in the current fix. Should `PHASE_ITER_GVN_AFTER_ELIMINATION` be removed?
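Editorial sketch: randomizing the processing order for stress testing, as the `StressMacroElimination` flag is described to do above, is commonly done by shuffling the worklist with a seeded generator so that a failing order can be reproduced. The following is a minimal standalone illustration under that assumption, not the HotSpot implementation; `elimination_order` and its parameters are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: returns the order in which `count` macro nodes
// (identified by index) would be visited. Without stress the original
// order is kept; with stress the order is shuffled using `seed`.
inline std::vector<int> elimination_order(int count, bool stress, uint32_t seed) {
  std::vector<int> order(count);
  for (int i = 0; i < count; i++) {
    order[i] = i;
  }
  if (stress) {
    std::mt19937 rng(seed);  // seeded, so a run can be replayed
    std::shuffle(order.begin(), order.end(), rng);
  }
  return order;
}
```

Calling the helper twice with the same seed yields the same permutation, which is what makes a failure found under stress reproducible.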
Testing: GitHub Actions tier1 to tier5 on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. Tested that thousands of graphs are correctly opened and visualized with IGV using the same test used in ([JDK-8317349](https://bugs.openjdk.org/browse/JDK-8317349)) ------------- Commit messages: - Initial Fix Changes: https://git.openjdk.org/jdk/pull/25682/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25682&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8325478 Stats: 75 lines in 11 files changed: 52 ins; 9 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/25682.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25682/head:pull/25682 PR: https://git.openjdk.org/jdk/pull/25682 From chagedorn at openjdk.org Tue Jun 10 10:16:32 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 10 Jun 2025 10:16:32 GMT Subject: RFR: 8347273: C2: VerifyIterativeGVN for Ideal and Identity [v3] In-Reply-To: References: Message-ID: <3XjHLWBEYmn-1otPBnpKH3xLz100BR17x9_rGMAlQus=.e2331b5b-3197-407c-97bf-857ea0bd951c@github.com> On Tue, 10 Jun 2025 09:32:46 GMT, Emanuel Peter wrote: >> **Past Work** >> With https://github.com/openjdk/jdk/pull/11775 / [JDK-8298952](https://bugs.openjdk.org/browse/JDK-8298952) we added `Node::Value` verification. >> >> **This PR** >> I'm now adding verification for `Ideal` and `Identity`. I'm adding two bits to the flag `VerifyIterativeGVN`. >> >> I found many many node types that hit my verification assert, i.e. that could still be optimized after IGVN is over, just because these nodes were not put on the worklist any more. >> >> My approach was to aggressively bail-out for all nodes that had an issue. This way, we can address one by one in follow-up RFEs. For many, I did some initial assessment, and left some comments about what issues I encountered. >> >> **Future Work:** >> In many cases, the issue is just a missing notification when inputs of inputs are changed. These would be good starter tasks. 
But there are probably also more complicated cases. And there are surely cases where verification will be impossible, because it is possible that the Ideal / Identity optimizations traverse longer paths, and we cannot expect that notification makes it down that path. For those cases, we will have to leave the exception and document it well. >> >> I filed: >> [JDK-8359103](https://bugs.openjdk.org/browse/JDK-8359103) C2 VerifyIterativeGVN: Umbrella for extending Ideal and Identity verification (JDK-8347273) >> (We can file subtasks for the nodes we want to fix. I don't want to file them all now, but we should file them as we are investigating, so that there is no duplicate work.) >> >> Testing passed tier1-3, with extra timeout factor 20. > > Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: > > - Update src/hotspot/share/opto/phaseX.cpp > > Co-authored-by: Manuel Hässig > - review suggestions, and handled a few more edge cases Great effort and analysis! I have some comments/questions. src/hotspot/share/opto/c2_globals.hpp line 686: > 684: " C: verify Node::Ideal did not miss opportunities" \ > 685: " D: verify Node::Identity did not miss opportunities" \ > 686: "A, B, C, and D in 0=off; 1=on") \ Why don't you use ABCD? It seems strange/unexpected to reverse the alphabetical order. src/hotspot/share/opto/phaseX.cpp line 1090: > 1088: if (is_verify_Ideal()) { failure |= verify_node_Ideal(n, false); } > 1089: if (is_verify_Ideal()) { failure |= verify_node_Ideal(n, true); } > 1090: if (is_verify_Identity()) { failure |= verify_node_Identity(n); } Suggestion: How about naming them `verify_Value/Ideal/Identity_for(n)`? src/hotspot/share/opto/phaseX.cpp line 1126: > 1124: node->dump(); > 1125: } > 1126: assert(_worklist.size() == 0, "igvn worklist must still be empty after verify"); The `_worklist` size does not seem to change after the bailout on L1114. So, we know that here the worklist is non-empty.
Would `assert(false)` fit better? src/hotspot/share/opto/phaseX.cpp line 1196: > 1194: // Returns true if it found missed optimization opportunities and > 1195: // false otherwise (no missed optimization, or skipped verification). > 1196: bool PhaseIterGVN::verify_node_Ideal(Node* n, bool can_reshape) { General comment about your analysis for Ideal and Identity for why you disabled some of the verification. Very thorough and nicely explained! I'm wondering though if we should just open a tracking JBS issue (we could use JDK-8347273), dump the analysis there and refer to that JBS issue from the code for further details. This would allow us to use some permalinks from GitHub (we should probably not post them in the code directly) or extend the analysis with additional images etc. You also included a lot of best guesses (which is totally understandable!) which we might want to extend, comment further on in a discussion, or update because we know more about them. For that, we would need to update the actual code each time which seems unfortunate - and we might not fix some things because it does not seem worth the effort for tiny mistakes or updates. In JBS this comes for free. What do you think about that? Of course, in the end, it's also a trade-off. 
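Editorial sketch: regarding the digit order of `VerifyIterativeGVN` raised earlier in this review, one way such a flag can be read is as a decimal number whose digits, from right to left, stand for A, B, C and D. That reading is an assumption made for illustration (the thread only shows the help text), and `VerifyBits` and `decode_verify_flag` are hypothetical names, not HotSpot code.

```cpp
#include <cassert>

// Hypothetical decoded view of a flag written as decimal digits "DCBA".
struct VerifyBits {
  bool a;  // rightmost digit (A; its meaning is not shown in the excerpt)
  bool b;  // B: type(n) == n->Value() check
  bool c;  // C: Node::Ideal check
  bool d;  // D: Node::Identity check
};

// Assumption: each decimal digit is 0 (off) or 1 (on).
inline VerifyBits decode_verify_flag(unsigned flag) {
  return {
    (flag % 10) == 1,
    ((flag / 10) % 10) == 1,
    ((flag / 100) % 10) == 1,
    ((flag / 1000) % 10) == 1,
  };
}
```

Under this reading, adding a new check appends a digit on the left and leaves existing values such as `10` or `11` unchanged in meaning, which is one possible motivation for writing the letters in reverse order.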
------------- PR Review: https://git.openjdk.org/jdk/pull/22970#pullrequestreview-2912749936 PR Review Comment: https://git.openjdk.org/jdk/pull/22970#discussion_r2137403056 PR Review Comment: https://git.openjdk.org/jdk/pull/22970#discussion_r2137472659 PR Review Comment: https://git.openjdk.org/jdk/pull/22970#discussion_r2137441404 PR Review Comment: https://git.openjdk.org/jdk/pull/22970#discussion_r2137434095 From chagedorn at openjdk.org Tue Jun 10 10:16:33 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 10 Jun 2025 10:16:33 GMT Subject: RFR: 8347273: C2: VerifyIterativeGVN for Ideal and Identity [v3] In-Reply-To: References: Message-ID: On Tue, 10 Jun 2025 09:28:21 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/phaseX.cpp line 1803: >> >>> 1801: } >>> 1802: tty->print_cr("The result after Ideal:"); >>> 1803: i->dump_bfs(1, nullptr, ""); >> >> Perhaps taking the tty lock might be appropriate, due to the amount of printing? Or do we know that nothing else is printing? > > good idea! You could also define a `stringStream ss` and pass that one to `dump_bfs()`. We do a similar thing for `print_ideal_ir()` to keep everything in a block. As a bonus: We don't suffer from a tty lock being broken - even though that would not affect correctness. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22970#discussion_r2137461406 From roland at openjdk.org Tue Jun 10 10:21:46 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 10 Jun 2025 10:21:46 GMT Subject: RFR: 8356708: C2: loop strip mining expansion doesn't take sunk stores into account Message-ID: `test1()` has a counted loop with a `Store` to `field`. That `Store` is sunk out of loop. When the `OuterStripMinedLoop` is expanded, only `Phi`s that exist at the inner loop are added to the outer loop. There's no `Phi` for the slice of the sunk `Store` (because there's no `Store` left in the inner loop) so no `Phi` is added for that slice to the outer loop. 
As a result, there's a missing anti dependency for `Load` of `field` that's before the loop and it can be scheduled inside the outer strip mined loop which is incorrect. `test2()` is the same as `test1()` but with a chain of 2 `Store`s. `test3()` is another variant where a `Store` is left in the inner loop after one is sunk out of it so the inner loop still has a `Phi`. As a result, the outer loop also gets a `Phi` but it's incorrectly wired as the sunk `Store` should be the input along the backedge but is not. That one doesn't cause any failure AFAICT. The fix I propose is some extra logic at expansion of the `OuterStripMinedLoop` to handle these corner cases. ------------- Commit messages: - test & fix Changes: https://git.openjdk.org/jdk/pull/25717/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25717&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8356708 Stats: 188 lines in 3 files changed: 180 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/25717.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25717/head:pull/25717 PR: https://git.openjdk.org/jdk/pull/25717 From dnsimon at openjdk.org Tue Jun 10 10:37:32 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 10 Jun 2025 10:37:32 GMT Subject: RFR: 8359064: Expose reason for marking nmethod non-entrant to JVMCI client In-Reply-To: References: Message-ID: On Mon, 9 Jun 2025 23:57:46 GMT, Cesar Soares Lucas wrote: > We recently introduced a way to set the reason why a nmethod was being marked as `not entrant`, see [here](https://github.com/openjdk/jdk/pull/23980) and [here](https://github.com/openjdk/jdk/pull/25338). > > This PR is to expose in the JVMCI interface the reason why the nmethod was flagged as `not entrant`. 
This will allow JVMCI-based compilers to implement heuristics to handle re-compilations differently based on what happened to earlier versions of a method, for instance, this will likely be used to address this [RFE in Truffle](https://github.com/oracle/graal/issues/11045). > > Tested on Linux x86_64, ARM with JTREG tier 1-3. src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/code/InstalledCode.java line 127: > 125: } > 126: > 127: public int getChangeReason() { Please add javadoc to this method as well as `getChangeReasonDescription`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25706#discussion_r2137514507 From shade at openjdk.org Tue Jun 10 10:45:35 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 10 Jun 2025 10:45:35 GMT Subject: RFR: 8357473: Compilation spike leaves many CompileTasks in free list [v3] In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 15:00:05 GMT, Aleksey Shipilev wrote: >> See bug for more discussion. >> >> This PR implements the "all the way" solution by removing the free list completely. It complements https://github.com/openjdk/jdk/pull/25364, and can go either first, or second. We will remerge the other one once either integrates. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `compiler` >> - [ ] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - Merge branch 'master' into JDK-8357473-compile-task-free-list > - Merge branch 'master' into JDK-8357473-compile-task-free-list > - Also free the lock! > - Comments and indenting > - Basic deletion Leyden/premain would require adjustments after this lands; here they are: https://github.com/openjdk/leyden/pull/79 -- that should prove this mainline PR would work with Leyden/premain without problem.
------------- PR Comment: https://git.openjdk.org/jdk/pull/25409#issuecomment-2958617941 From dnsimon at openjdk.org Tue Jun 10 10:45:47 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 10 Jun 2025 10:45:47 GMT Subject: RFR: 8359064: Expose reason for marking nmethod non-entrant to JVMCI client In-Reply-To: References: Message-ID: On Tue, 10 Jun 2025 07:45:34 GMT, Manuel Hässig wrote: > A naive question: is it possible to somehow share the enum definition in hotspot with the Java side in JVMCI? If all change reasons were enums, they would be much easier to understand. Not directly but you can have a mirror enum in JVMCI whose ordinal and message could be kept in sync and initialized from native code. As an example, see [jdk.vm.ci.hotspot.HotSpotCompiledCodeStream.Tag](https://github.com/openjdk/jdk/blob/3ff83ec49e561c44dd99508364b8ba068274b63a/src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotCompiledCodeStream.java#L228). ------------- PR Comment: https://git.openjdk.org/jdk/pull/25706#issuecomment-2958619966 From duke at openjdk.org Tue Jun 10 10:52:07 2025 From: duke at openjdk.org (Benoît Maillard) Date: Tue, 10 Jun 2025 10:52:07 GMT Subject: RFR: 8356751: IGV: clean up redundant field _should_send_method Message-ID: This PR removes private field `IdealGraphPrinter::_should_send_method` (as it was only ever assigned `true`) and all usages in conditions. ### Testing - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8356751) - [x] tier1, and some Oracle internal testing - [x] Ran IdealGraphVisualizer and made sure that no crash occurred Shout out to @mhaessig for introducing me to IGV Thanks!
------------- Commit messages: - 8356751: Remove redundant field _should_send_method Changes: https://git.openjdk.org/jdk/pull/25714/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25714&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8356751 Stats: 11 lines in 2 files changed: 0 ins; 5 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/25714.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25714/head:pull/25714 PR: https://git.openjdk.org/jdk/pull/25714 From shade at openjdk.org Tue Jun 10 10:53:42 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 10 Jun 2025 10:53:42 GMT Subject: RFR: 8357473: Compilation spike leaves many CompileTasks in free list [v4] In-Reply-To: References: Message-ID: > See bug for more discussion. > > This PR implements the "all the way" solution by removing the free list completely. It complements https://github.com/openjdk/jdk/pull/25364, and can go either first, or second. We will remerge the other one once either integrates. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler` > - [ ] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: - Merge branch 'master' into JDK-8357473-compile-task-free-list - Merge branch 'master' into JDK-8357473-compile-task-free-list - Merge branch 'master' into JDK-8357473-compile-task-free-list - Also free the lock! 
- Comments and indenting - Basic deletion ------------- Changes: https://git.openjdk.org/jdk/pull/25409/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25409&range=03 Stats: 105 lines in 4 files changed: 13 ins; 57 del; 35 mod Patch: https://git.openjdk.org/jdk/pull/25409.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25409/head:pull/25409 PR: https://git.openjdk.org/jdk/pull/25409 From mhaessig at openjdk.org Tue Jun 10 11:29:28 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Tue, 10 Jun 2025 11:29:28 GMT Subject: RFR: 8356751: IGV: clean up redundant field _should_send_method In-Reply-To: References: Message-ID: <-ZUDn1Z1jT-lzubJG0JI3vAK6KdtSHjB5JM66jdPgfI=.737be3b7-0d5f-45fe-9e1a-4970cdf89f2c@github.com> On Tue, 10 Jun 2025 08:58:15 GMT, Benoît Maillard wrote: > This PR removes private field `IdealGraphPrinter::_should_send_method` (as it was only ever assigned `true`) and all usages in conditions. > > ### Testing > > - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8356751) > - [x] tier1, and some Oracle internal testing > - [x] Ran IdealGraphVisualizer and made sure that no crash occurred > > Shout out to @mhaessig for introducing me to IGV > > Thanks! Thank you for working on this and also for the credit :slightly_smiling_face: This looks good to me. ------------- Marked as reviewed by mhaessig (Author).
PR Review: https://git.openjdk.org/jdk/pull/25714#pullrequestreview-2913133373 From epeter at openjdk.org Tue Jun 10 11:38:31 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 10 Jun 2025 11:38:31 GMT Subject: RFR: 8347273: C2: VerifyIterativeGVN for Ideal and Identity [v3] In-Reply-To: <3XjHLWBEYmn-1otPBnpKH3xLz100BR17x9_rGMAlQus=.e2331b5b-3197-407c-97bf-857ea0bd951c@github.com> References: <3XjHLWBEYmn-1otPBnpKH3xLz100BR17x9_rGMAlQus=.e2331b5b-3197-407c-97bf-857ea0bd951c@github.com> Message-ID: On Tue, 10 Jun 2025 09:42:50 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: >> >> - Update src/hotspot/share/opto/phaseX.cpp >> >> Co-authored-by: Manuel Hässig >> - review suggestions, and handled a few more edge cases > > src/hotspot/share/opto/c2_globals.hpp line 686: > >> 684: " C: verify Node::Ideal did not miss opportunities" \ >> 685: " D: verify Node::Identity did not miss opportunities" \ >> 686: "A, B, C, and D in 0=off; 1=on") \ > > Why don't you use ABCD? It seems strange/unexpected to reverse the alphabetical order. It would allow us to extend it further with a most significant bit of `E`, so that the order is `EDCBA`. If I do it in alphabetical order, then I would have to rename them. What do you think?
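For readers following the `DCBA` / `EDCBA` ordering argument, here is a minimal standalone sketch of how such a digit-encoded flag could be validated and queried. It assumes each mode occupies one decimal digit that is either 0 or 1, with `A` as the least significant digit. The names `kMaxModes`, `is_valid_verify_value` and `is_mode_enabled` are hypothetical and are not taken from HotSpot.

```cpp
// Hypothetical stand-in for the number of supported modes (A, B, C, D).
constexpr int kMaxModes = 4;

// Returns true if `value` has at most kMaxModes decimal digits and every
// digit is 0 or 1. The least significant digit is mode A.
bool is_valid_verify_value(unsigned value) {
  for (int i = 0; i < kMaxModes; i++) {
    if (value % 10 > 1) {
      return false;  // a digit other than 0 or 1
    }
    value /= 10;
  }
  return value == 0;  // anything left over means too many digits
}

// Returns true if the mode at `position` (0 = A, 1 = B, ...) is switched on.
bool is_mode_enabled(unsigned value, int position) {
  for (int i = 0; i < position; i++) {
    value /= 10;
  }
  return value % 10 == 1;
}
```

Under this assumed encoding, adding a mode `E` only means raising `kMaxModes` to 5: the positions of `A` to `D` stay where they are, which is the compatibility argument made in the reply above.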
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22970#discussion_r2137655858 From epeter at openjdk.org Tue Jun 10 11:51:41 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 10 Jun 2025 11:51:41 GMT Subject: RFR: 8347273: C2: VerifyIterativeGVN for Ideal and Identity [v3] In-Reply-To: <3XjHLWBEYmn-1otPBnpKH3xLz100BR17x9_rGMAlQus=.e2331b5b-3197-407c-97bf-857ea0bd951c@github.com> References: <3XjHLWBEYmn-1otPBnpKH3xLz100BR17x9_rGMAlQus=.e2331b5b-3197-407c-97bf-857ea0bd951c@github.com> Message-ID: <2KUfq2OXSed2NMT3tYyh6wP2tQxAyUJmW38vXWUAURk=.2eb438a8-8b77-43a6-a478-0326f2b850cb@github.com> On Tue, 10 Jun 2025 10:00:12 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: >> >> - Update src/hotspot/share/opto/phaseX.cpp >> >> Co-authored-by: Manuel Hässig >> - review suggestions, and handled a few more edge cases > > src/hotspot/share/opto/phaseX.cpp line 1126: > >> 1124: node->dump(); >> 1125: } >> 1126: assert(_worklist.size() == 0, "igvn worklist must still be empty after verify"); > > The `_worklist` size does not seem to change after the bailout on L1114. So, we know that here the worklist is non-empty. Would `assert(false)` fit better? @chhagedorn While `assert(false)` would be correct, I think the check is a bit more expressive, that is why I left it in. But I guess the comment also says the same. Let me know what you prefer, I'm undecided. > src/hotspot/share/opto/phaseX.cpp line 1196: > >> 1194: // Returns true if it found missed optimization opportunities and >> 1195: // false otherwise (no missed optimization, or skipped verification). >> 1196: bool PhaseIterGVN::verify_node_Ideal(Node* n, bool can_reshape) { > > General comment about your analysis for Ideal and Identity for why you disabled some of the verification. Very thorough and nicely explained!
I'm wondering though if we should just open a tracking JBS issue (we could use JDK-8347273), dump the analysis there and refer to that JBS issue from the code for further details. This would allow us to use some permalinks from GitHub (we should probably not post them in the code directly) or extend the analysis with additional images etc. You also included a lot of best guesses (which is totally understandable!) which we might want to extend, comment further on in a discussion, or update because we know more about them. For that, we would need to update the actual code each time which seems unfortunate - and we might not fix some things because it does not seem worth the effort for tiny mistakes or updates. In JBS this comes for free. > > What do you think about that? Of course, in the end, it's also a trade-off. Having the whole conversation in a single JBS issue sounds a bit tricky... it is more like 100 different issues each with their own conversation. And I don't yet know which nodes have to be fixed together, and which nodes have multiple problems. I would also prefer if the comments were in the code - it's not that bad to create a JBS issue and commit the comments. That way, everything is in the code, and not spread over multiple JBS issues and GitHub conversations. My suggestion is this: - Use the umbrella issue: [JDK-8359103](https://bugs.openjdk.org/browse/JDK-8359103) C2 VerifyIterativeGVN: Umbrella for extending Ideal and Identity verification (JDK-8347273) - There, we can do some basic triaging, and then file subtasks. - In the end, everything interesting to know needs to be committed back. Including text and pictures (ASCII). @chhagedorn Would that work for you? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22970#discussion_r2137676530 PR Review Comment: https://git.openjdk.org/jdk/pull/22970#discussion_r2137674037 From epeter at openjdk.org Tue Jun 10 11:56:34 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 10 Jun 2025 11:56:34 GMT Subject: RFR: 8347273: C2: VerifyIterativeGVN for Ideal and Identity [v3] In-Reply-To: <3XjHLWBEYmn-1otPBnpKH3xLz100BR17x9_rGMAlQus=.e2331b5b-3197-407c-97bf-857ea0bd951c@github.com> References: <3XjHLWBEYmn-1otPBnpKH3xLz100BR17x9_rGMAlQus=.e2331b5b-3197-407c-97bf-857ea0bd951c@github.com> Message-ID: <49Ml0-TqyJe2aM-U82FuTy7dO6RKT6aqxSZKawJxvTA=.a554f0aa-64e4-419f-b823-152e5409414e@github.com> On Tue, 10 Jun 2025 10:12:50 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: >> >> - Update src/hotspot/share/opto/phaseX.cpp >> >> Co-authored-by: Manuel Hässig >> - review suggestions, and handled a few more edge cases > > src/hotspot/share/opto/phaseX.cpp line 1090: > >> 1088: if (is_verify_Ideal()) { failure |= verify_node_Ideal(n, false); } >> 1089: if (is_verify_Ideal()) { failure |= verify_node_Ideal(n, true); } >> 1090: if (is_verify_Identity()) { failure |= verify_node_Identity(n); } > > Suggestion: How about naming them `verify_Value/Ideal/Identity_for(n)`? I don't care. For me they are equally good :) I'll change it to your suggestion.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22970#discussion_r2137685509 From thartmann at openjdk.org Tue Jun 10 11:59:32 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 10 Jun 2025 11:59:32 GMT Subject: RFR: 8356751: IGV: clean up redundant field _should_send_method In-Reply-To: References: Message-ID: <_CKCshjbzh3Cs8AyC3DRF8pAZRbNkWiYsQhhhWHdZv4=.e91dc431-dbea-4c89-89f5-8a4a469ae8a6@github.com> On Tue, 10 Jun 2025 08:58:15 GMT, Benoît Maillard wrote: > This PR removes private field `IdealGraphPrinter::_should_send_method` (as it was only ever assigned `true`) and all usages in conditions. > > ### Testing > > - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8356751) > - [x] tier1, and some Oracle internal testing > - [x] Ran IdealGraphVisualizer and made sure that no crash occurred > > Shout out to @mhaessig for introducing me to IGV > > Thanks! Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25714#pullrequestreview-2913227044 From epeter at openjdk.org Tue Jun 10 12:02:14 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 10 Jun 2025 12:02:14 GMT Subject: RFR: 8347273: C2: VerifyIterativeGVN for Ideal and Identity [v4] In-Reply-To: References: Message-ID: > **Past Work** > With https://github.com/openjdk/jdk/pull/11775 / [JDK-8298952](https://bugs.openjdk.org/browse/JDK-8298952) we added `Node::Value` verification. > > **This PR** > I'm now adding verification for `Ideal` and `Identity`. I'm adding two bits to the flag `VerifyIterativeGVN`. > > I found many many node types that hit my verification assert, i.e. that could still be optimized after IGVN is over, just because these nodes were not put on the worklist any more. > > My approach was to aggressively bail-out for all nodes that had an issue. This way, we can address one by one in follow-up RFEs.
For many, I did some initial assessment, and left some comments about what issues I encountered. > > **Future Work:** > In many cases, the issue is just a missing notification when inputs of inputs are changed. These would be good starter tasks. But there are probably also more complicated cases. And there are surely cases where verification will be impossible, because it is possible that the Ideal / Identity optimizations traverse longer paths, and we cannot expect that notification makes it down that path. For those cases, we will have to leave the exception and document it well. > > I filed: > [JDK-8359103](https://bugs.openjdk.org/browse/JDK-8359103) C2 VerifyIterativeGVN: Umbrella for extending Ideal and Identity verification (JDK-8347273) > (We can file subtasks for the nodes we want to fix. I don't want to file them all now, but we should file them as we are investigating, so that there is no duplicate work.) > > Testing passed tier1-3, with extra timeout factor 20. Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: - assert(false) for Christian - rename for Christian ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22970/files - new: https://git.openjdk.org/jdk/pull/22970/files/5aa5444d..875ad17d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22970&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22970&range=02-03 Stats: 13 lines in 2 files changed: 0 ins; 0 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/22970.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22970/head:pull/22970 PR: https://git.openjdk.org/jdk/pull/22970 From epeter at openjdk.org Tue Jun 10 12:02:14 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 10 Jun 2025 12:02:14 GMT Subject: RFR: 8347273: C2: VerifyIterativeGVN for Ideal and Identity [v3] In-Reply-To: <2KUfq2OXSed2NMT3tYyh6wP2tQxAyUJmW38vXWUAURk=.2eb438a8-8b77-43a6-a478-0326f2b850cb@github.com> References:
<3XjHLWBEYmn-1otPBnpKH3xLz100BR17x9_rGMAlQus=.e2331b5b-3197-407c-97bf-857ea0bd951c@github.com> <2KUfq2OXSed2NMT3tYyh6wP2tQxAyUJmW38vXWUAURk=.2eb438a8-8b77-43a6-a478-0326f2b850cb@github.com> Message-ID: On Tue, 10 Jun 2025 11:48:31 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/phaseX.cpp line 1126: >> >>> 1124: node->dump(); >>> 1125: } >>> 1126: assert(_worklist.size() == 0, "igvn worklist must still be empty after verify"); >> >> The `_worklist` size does not seem to change after the bailout on L1114. So, we know that here the worklist is non-empty. Would `assert(false)` fit better? > > @chhagedorn While `assert(false)` would be correct, I think the check is a bit more expressive, that is why I left it in. But I guess the comment also says the same. Let me know what you prefer, I'm undecided. Boah, I'll just change it. I don't care and that way we don't have to discuss it ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22970#discussion_r2137695070 From epeter at openjdk.org Tue Jun 10 12:05:30 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 10 Jun 2025 12:05:30 GMT Subject: RFR: 8347273: C2: VerifyIterativeGVN for Ideal and Identity [v4] In-Reply-To: References: Message-ID: On Tue, 10 Jun 2025 10:07:53 GMT, Christian Hagedorn wrote: >> good idea! > > You could also define a `stringStream ss` and pass that one to `dump_bfs()`. We do a similar thing for `print_ideal_ir()` to keep everything in a block. As a bonus: We don't suffer from a tty lock being broken - even though that would not affect correctness. Sure, I can make the change! 
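The buffering idea discussed here can be illustrated outside HotSpot. The sketch below uses `std::ostringstream` as a stand-in for HotSpot's `stringStream`, and the helper name `build_missed_ideal_report` and its parameters are hypothetical. It only shows the general pattern of assembling a multi-line report in memory so that it can be emitted with a single write; it is not the change made in the PR.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: collects the whole report in a buffer first, so the
// caller can print it as one block instead of as many separate print calls.
std::string build_missed_ideal_report(const std::string& before,
                                      const std::string& after) {
  std::ostringstream ss;
  ss << "Missed Ideal optimization:\n";
  ss << "The node before Ideal:\n" << before << '\n';
  ss << "The result after Ideal:\n" << after << '\n';
  return ss.str();
}
```

A caller would then write the returned string to its output stream in one call. Whether that single write is atomic with respect to other threads depends on the output stream, which is outside the scope of this sketch.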
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22970#discussion_r2137706000 From epeter at openjdk.org Tue Jun 10 12:16:13 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 10 Jun 2025 12:16:13 GMT Subject: RFR: 8347273: C2: VerifyIterativeGVN for Ideal and Identity [v5] In-Reply-To: References: Message-ID: > **Past Work** > With https://github.com/openjdk/jdk/pull/11775 / [JDK-8298952](https://bugs.openjdk.org/browse/JDK-8298952) we added `Node::Value` verification. > > **This PR** > I'm now adding verification for `Ideal` and `Identity`. I'm adding two bits to the flag `VerifyIterativeGVN`. > > I found many many node types that hit my verification assert, i.e. that could still be optimized after IGVN is over, just because these nodes were not put on the worklist any more. > > My approach was to aggressively bail-out for all nodes that had an issue. This way, we can address one by one in follow-up RFEs. For many, I did some initial assessment, and left some comments about what issues I encountered. > > **Future Work:** > In many cases, the issue is just a missing notification when inputs of inputs are changed. These would be good starter tasks. But there are probably also more complicated cases. And there are surely cases where verification will be impossible, because it is possible that the Ideal / Identity optimizations traverse longer paths, and we cannot expect that notification makes it down that path. For those cases, we will have to leave the exception and document it well. > > I filed: > [JDK-8359103](https://bugs.openjdk.org/browse/JDK-8359103) C2 VerifyIterativeGVN: Umbrella for extending Ideal and Identity verification (JDK-8347273) > (We can file subtasks for the nodes we want to fix. I don't want to file them all now, but we should file them as we are investigating, so that there is no duplicate work.) > > Testing passed tier1-3, with extra timeout factor 20.
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: use stringStream instead of ttyLocker ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22970/files - new: https://git.openjdk.org/jdk/pull/22970/files/875ad17d..d50775b8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22970&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22970&range=03-04 Stats: 41 lines in 1 file changed: 5 ins; 0 del; 36 mod Patch: https://git.openjdk.org/jdk/pull/22970.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22970/head:pull/22970 PR: https://git.openjdk.org/jdk/pull/22970 From epeter at openjdk.org Tue Jun 10 12:16:14 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 10 Jun 2025 12:16:14 GMT Subject: RFR: 8347273: C2: VerifyIterativeGVN for Ideal and Identity [v3] In-Reply-To: <3XjHLWBEYmn-1otPBnpKH3xLz100BR17x9_rGMAlQus=.e2331b5b-3197-407c-97bf-857ea0bd951c@github.com> References: <3XjHLWBEYmn-1otPBnpKH3xLz100BR17x9_rGMAlQus=.e2331b5b-3197-407c-97bf-857ea0bd951c@github.com> Message-ID: On Tue, 10 Jun 2025 10:13:25 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: >> >> - Update src/hotspot/share/opto/phaseX.cpp >> >> Co-authored-by: Manuel Hässig >> - review suggestions, and handled a few more edge cases > > Great effort and analysis! I have some comments/questions. @chhagedorn Thanks for reviewing!
I addressed all your comments and suggestions :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/22970#issuecomment-2958970465 From dfenacci at openjdk.org Tue Jun 10 12:42:29 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 10 Jun 2025 12:42:29 GMT Subject: RFR: 8356751: IGV: clean up redundant field _should_send_method In-Reply-To: References: Message-ID: On Tue, 10 Jun 2025 08:58:15 GMT, Benoît Maillard wrote: > This PR removes private field `IdealGraphPrinter::_should_send_method` (as it was only ever assigned `true`) and all usages in conditions. > > ### Testing > > - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8356751) > - [x] tier1, and some Oracle internal testing > - [x] Ran IdealGraphVisualizer and made sure that no crash occurred > > Shout out to @mhaessig for introducing me to IGV > > Thanks! Looks good to me too! Thanks a lot @benoitmaillard! ------------- Marked as reviewed by dfenacci (Committer). PR Review: https://git.openjdk.org/jdk/pull/25714#pullrequestreview-2913391903 From chagedorn at openjdk.org Tue Jun 10 13:07:37 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 10 Jun 2025 13:07:37 GMT Subject: RFR: 8347273: C2: VerifyIterativeGVN for Ideal and Identity [v3] In-Reply-To: References: <3XjHLWBEYmn-1otPBnpKH3xLz100BR17x9_rGMAlQus=.e2331b5b-3197-407c-97bf-857ea0bd951c@github.com> Message-ID: On Tue, 10 Jun 2025 11:36:17 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/c2_globals.hpp line 686: >> >>> 684: " C: verify Node::Ideal did not miss opportunities" \ >>> 685: " D: verify Node::Identity did not miss opportunities" \ >>> 686: "A, B, C, and D in 0=off; 1=on") \ >> >> Why don't you use ABCD? It seems strange/unexpected to reverse the alphabetical order. > > It would allow us to extend it further with a most significant bit of `E`, so that the order is `EDCBA`. If I do it in alphabetical order, then I would have to rename them. What do you think?
Okay, I only looked at it from a user-perspective that you might mismatch the description to the value passed to the flag. What could help here is reversing the order you mention the modes: first D:, then C: etc. >> src/hotspot/share/opto/phaseX.cpp line 1196: >> >>> 1194: // Returns true if it found missed optimization opportunities and >>> 1195: // false otherwise (no missed optimization, or skipped verification). >>> 1196: bool PhaseIterGVN::verify_node_Ideal(Node* n, bool can_reshape) { >> >> General comment about your analysis for Ideal and Identity for why you disabled some of the verification. Very thorough and nicely explained! I'm wondering though if we should just open a tracking JBS issue (we could use JDK-8359103), dump the analysis there and refer to that JBS issue from the code for further details. This would allow us to use some permalinks from GitHub (we should probably not post them in the code directly) or extend the analysis with additional images etc. You also included a lot of best guesses (which is totally understandable!) which we might want to extend, comment further on in a discussion, or update because we know more about them. For that, we would need to update the actual code each time which seems unfortunate - and we might not fix some things because it does not seem worth the effort for tiny mistakes or updates. In JBS this comes for free. >> >> What do you think about that? Of course, in the end, it's also a trade-off. > > Having the whole conversation in a single JBS issue sounds a bit tricky... it is more like 100 different issues each with their own conversation. > > And I don't yet know which nodes have to be fixed together, and which nodes have multiple problems. > > I would also prefer if the comments were in the code - it's not that bad to create a JBS issue and commit the comments. That way, everything is in the code, and not spread over multiple JBS issues and GitHub conversations.
> > My suggestion is this: > - Use the umbrella issue: [JDK-8359103](https://bugs.openjdk.org/browse/JDK-8359103) > C2 VerifyIterativeGVN: Umbrella for extending Ideal and Identity verification (JDK-8347273) > - There, we can do some basic triaging, and then file subtasks. > - In the end, everything interesting to know needs to be committed back. Including text and pictures (ASCII). > > @chhagedorn Would that work for you? I'm honestly not sure what the best way is. Currently, it feels a bit too verbose when also mentioning reproducers with command line options and failing tests which sounds more like things to keep track of in JBS. But I also see your point that having everything in the comments is quite handy and keeps everything in one part. Maybe we can find some middle ground when you move the "how to reproduce" to the umbrella JBS? The rest you can keep in the comments, I'm fine with that and see its benefit. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22970#discussion_r2137814182 PR Review Comment: https://git.openjdk.org/jdk/pull/22970#discussion_r2137840332 From chagedorn at openjdk.org Tue Jun 10 13:07:37 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 10 Jun 2025 13:07:37 GMT Subject: RFR: 8347273: C2: VerifyIterativeGVN for Ideal and Identity [v3] In-Reply-To: References: <3XjHLWBEYmn-1otPBnpKH3xLz100BR17x9_rGMAlQus=.e2331b5b-3197-407c-97bf-857ea0bd951c@github.com> <2KUfq2OXSed2NMT3tYyh6wP2tQxAyUJmW38vXWUAURk=.2eb438a8-8b77-43a6-a478-0326f2b850cb@github.com> Message-ID: On Tue, 10 Jun 2025 11:57:11 GMT, Emanuel Peter wrote: >> @chhagedorn While `assert(false)` would be correct, I think the check is a bit more expressive, that is why I left it in. But I guess the comment also says the same. Let me know what you prefer, I'm undecided. > > Boah, I'll just change it. 
I don't care and that way we don't have to discuss it ;) If you don't agree and the code is not wrong, I leave it up to you to decide for these subjective suggestions. I think what made me add the suggestion was that the assert in the end suggested that there is some logic that tries to empty the list between the bailout and the assert which was not the case. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22970#discussion_r2137826499 From chagedorn at openjdk.org Tue Jun 10 13:07:39 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 10 Jun 2025 13:07:39 GMT Subject: RFR: 8347273: C2: VerifyIterativeGVN for Ideal and Identity [v5] In-Reply-To: References: Message-ID: On Tue, 10 Jun 2025 12:16:13 GMT, Emanuel Peter wrote: >> **Past Work** >> With https://github.com/openjdk/jdk/pull/11775 / [JDK-8298952](https://bugs.openjdk.org/browse/JDK-8298952) we added `Node::Value` verification. >> >> **This PR** >> I'm now adding verification for `Ideal` and `Identity`. I'm adding two bits to the flag `VerifyIterativeGVN`. >> >> I found many many node types that hit my verification assert, i.e. that could still be optimized after IGVN is over, just because these nodes were not put on the worklist any more. >> >> My approach was to aggressively bail-out for all nodes that had an issue. This way, we can address one by one in follow-up RFEs. For many, I did some initial assessment, and left some comments about what issues I encountered. >> >> **Future Work:** >> In many cases, the issue is just a missing notification when inputs of inputs are changed. These would be good starter tasks. But there are probably also more complicated cases. And there are surely cases where verification will be impossible, because it is possible that the Ideal / Identity optimizations traverse longer paths, and we cannot expect that notification makes it down that path. For those cases, we will have to leave the exception and document it well.
>> >> I filed: >> [JDK-8359103](https://bugs.openjdk.org/browse/JDK-8359103) C2 VerifyIterativeGVN: Umbrella for extending Ideal and Identity verification (JDK-8347273) >> (We can file subtasks for the nodes we want to fix. I don't want to file them all now, but we should file them as we are investigating, so that there is no duplicate work.) >> >> Testing passed tier1-3, with extra timeout factor 20. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > use stringStream instead of ttyLocker src/hotspot/share/runtime/flags/jvmFlagConstraintsCompiler.cpp line 303: > 301: JVMFlag::Error VerifyIterativeGVNConstraintFunc(uint value, bool verbose) { > 302: uint original_value = value; > 303: for (int i = 0; i < 4; i++) { You might want to consider adding a `const int max_modes = 4` or something like that and use it also below in the error message. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22970#discussion_r2137843279 From epeter at openjdk.org Tue Jun 10 13:19:32 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 10 Jun 2025 13:19:32 GMT Subject: RFR: 8347273: C2: VerifyIterativeGVN for Ideal and Identity [v3] In-Reply-To: References: <3XjHLWBEYmn-1otPBnpKH3xLz100BR17x9_rGMAlQus=.e2331b5b-3197-407c-97bf-857ea0bd951c@github.com> Message-ID: On Tue, 10 Jun 2025 12:46:48 GMT, Christian Hagedorn wrote: >> It would allow us to extend it further with a most significant bit of `E`, so that the order is `EDCBA`. If I do it in alphabetical order, then I would have to rename them. What do you think? > > Okay, I only looked at it from a user-perspective that you might mismatch the description to the value passed to the flag. What could help here is reversing the order you mention the modes: first D:, then C: etc. @chhagedorn If I mention `D` on the same line as `=DCBA, with `, then we have to change 2 lines next time. 
But I suppose we will have to change all lines anyway because the indentation would change... What I would really want to avoid is to have to change the parsing. So the lowest significant bits have to stay where they are, but I can rename them. Why don't you make a suggestion how you would like it to look, and then I can apply it :) >> Having the whole conversation in a single JBS issue sounds a bit tricky... it is more like 100 different issues each with their own conversation. >> >> And I don't yet know which nodes have to be fixed together, and which nodes have multiple problems. >> >> I would also prefer if the comments were in the code - it's not that bad to create a JBS issue and commit the comments. That way, everything is in the code, and not spread over multiple JBS issues and GitHub conversations. >> >> My suggestion is this: >> - Use the umbrella issue: [JDK-8359103](https://bugs.openjdk.org/browse/JDK-8359103) >> C2 VerifyIterativeGVN: Umbrella for extending Ideal and Identity verification (JDK-8347273) >> - There, we can do some basic triaging, and then file subtasks. >> - In the end, everything interesting to know needs to be committed back. Including text and pictures (ASCII). >> >> @chhagedorn Would that work for you? > > I'm honestly not sure what the best way is. Currently, it feels a bit too verbose when also mentioning reproducers with command line options and failing tests which sounds more like things to keep track of in JBS. But I also see your point that having everything in the comments is quite handy and keeps everything in one part. Maybe we can find some middle ground when you move the "how to reproduce" to the umbrella JBS? The rest you can keep in the comments, I'm fine with that and see its benefit. @chhagedorn And how do we link from code <-> JBS issue? How do we make sure that this stays up to date when the code changes around? Because I predict that this will all move a lot over the next months. 
Personally, I prefer the verbose character here. I spent a lot of time finding reproducers, and if we put them in JBS, they will most likely just get lost. That would be sad. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22970#discussion_r2137865665 PR Review Comment: https://git.openjdk.org/jdk/pull/22970#discussion_r2137879139 From epeter at openjdk.org Tue Jun 10 13:19:33 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 10 Jun 2025 13:19:33 GMT Subject: RFR: 8347273: C2: VerifyIterativeGVN for Ideal and Identity [v3] In-Reply-To: References: <3XjHLWBEYmn-1otPBnpKH3xLz100BR17x9_rGMAlQus=.e2331b5b-3197-407c-97bf-857ea0bd951c@github.com> <2KUfq2OXSed2NMT3tYyh6wP2tQxAyUJmW38vXWUAURk=.2eb438a8-8b77-43a6-a478-0326f2b850cb@github.com> Message-ID: On Tue, 10 Jun 2025 12:52:26 GMT, Christian Hagedorn wrote: >> Boah, I'll just change it. I don't care and that way we don't have to discuss it ;) > > If you don't agree and the code is not wrong, I leave it up to you to decide for these subjective suggestions. I think what made me add the suggestion was that the assert in the end suggested that there is some logic that tries to empty the list between the bailout and the assert which was not the case. 
It is already changed :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22970#discussion_r2137882060 From epeter at openjdk.org Tue Jun 10 13:19:34 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 10 Jun 2025 13:19:34 GMT Subject: RFR: 8347273: C2: VerifyIterativeGVN for Ideal and Identity [v5] In-Reply-To: References: Message-ID: On Tue, 10 Jun 2025 13:00:40 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> use stringStream instead of ttyLocker > > src/hotspot/share/runtime/flags/jvmFlagConstraintsCompiler.cpp line 303: > >> 301: JVMFlag::Error VerifyIterativeGVNConstraintFunc(uint value, bool verbose) { >> 302: uint original_value = value; >> 303: for (int i = 0; i < 4; i++) { > > You might want to consider adding a `const int max_modes = 4` or something like that and use it also below in the error message. Sure, I can do that :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22970#discussion_r2137883591 From epeter at openjdk.org Tue Jun 10 13:24:52 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 10 Jun 2025 13:24:52 GMT Subject: RFR: 8347273: C2: VerifyIterativeGVN for Ideal and Identity [v6] In-Reply-To: References: Message-ID: > **Past Work** > With https://github.com/openjdk/jdk/pull/11775 / [JDK-8298952](https://bugs.openjdk.org/browse/JDK-8298952) we added `Node::Value` verification. > > **This PR** > I'm now adding verification for `Ideal` and `Identity`. I'm adding two bits to the flag `VerifyIterativeGVN`. > > I found many many node types that hit my verification assert, i.e. that could still be optimized after IGVN is over, just because these nodes were not put on the worklist any more. > > My approach was to aggressively bail-out for all nodes that had an issue. This way, we can address one by one in follow-up RFEs. 
For many, I did some initial assessment, and left some comments about what issues I encountered. > > **Future Work:** > In many cases, the issue is just a missing notification when inputs of inputs are changed. These would be good starter tasks. But there are probably also more complicated cases. And there are surely cases where verification will be impossible, because it is possible that the Idea / Identity optimizations traverse longer paths, and we cannot expect that notification makes it down that path. For those cases, we will have to leave the exception and document it well. > > I filed: > [JDK-8359103](https://bugs.openjdk.org/browse/JDK-8359103) C2 VerifyIterativeGVN: Umbrella for extending Ideal and Identity verification (JDK-8347273) > (We can file subtasks for the nodes we want to fix. I don't want to file them all now, but we should file them as we are investigating, so that there is no duplicate work.) > > Testing passed tier1-3, with extra timeout factor 20. Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: max_modes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22970/files - new: https://git.openjdk.org/jdk/pull/22970/files/d50775b8..97af8205 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22970&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22970&range=04-05 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/22970.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22970/head:pull/22970 PR: https://git.openjdk.org/jdk/pull/22970 From tkurashige at openjdk.org Tue Jun 10 13:44:05 2025 From: tkurashige at openjdk.org (Taizo Kurashige) Date: Tue, 10 Jun 2025 13:44:05 GMT Subject: RFR: 8359120: Improve warning message when fail to load hsdis library Message-ID: This PR is improvement of warning message when fail to load hsdis library. 
[JDK-8287001](https://bugs.openjdk.org/browse/JDK-8287001) introduced a warning when loading the hsdis library fails. This is useful when the user runs with -XX:+PrintAssembly, etc. However, I think that when an hs_err report is generated, users might be confused by this warning, which is printed through Xlog. Because users are unlikely to know that hsdis is loaded for the [MachCode] section of the hs_err report, they may wonder, for example, "Why do I get warnings about hsdis load errors when -XX:+PrintAssembly is not specified?" To clear up this confusion, I suggest printing a warning just before [MachCode].
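
The idea of emitting the note right before the section it explains can be sketched in a few lines. This is only an illustration, not the code from the PR: `print_mach_code` and its `hsdis_loaded` parameter are hypothetical names, and the sketch always prints a raw hex dump instead of attempting real disassembly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the routine that emits a [MachCode] section.
// `hsdis_loaded` models whether the disassembler library could be loaded.
// Assumption: this sketch always dumps raw bytes; real disassembly is out of scope.
std::string print_mach_code(bool hsdis_loaded, const std::vector<uint8_t>& code) {
  std::ostringstream out;
  if (!hsdis_loaded) {
    // The note goes directly in front of the section it refers to, so the
    // reader sees it in context instead of as an unrelated log warning.
    out << "Loading hsdis library failed, so undisassembled code is printed "
           "in the below [MachCode] section\n";
  }
  out << "[MachCode]\n";
  for (size_t i = 0; i < code.size(); i++) {
    char buf[3];
    std::snprintf(buf, sizeof(buf), "%02x", static_cast<unsigned>(code[i]));
    out << buf;
    // Group the dump in pairs of bytes, separated by a space.
    if (i % 2 == 1 && i + 1 < code.size()) {
      out << ' ';
    }
  }
  out << '\n';
  return out.str();
}
```

Keeping the note next to `[MachCode]` means it shows up wherever the section is printed (hs_err report or stdout) without a separate logging channel.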

sample output

If hs_err occurs and hsdis load fails without the option to specify where the hs_err report should be output, the following is output to the hs_err_pid log file:

    .
    .
    native method entry point (kind = native) [0x000001ae8753cec0, 0x000001ae8753dac0] 3072 bytes
    Loading hsdis library failed, so undisassembled code is printed in the below [MachCode] section
    [MachCode]
    0x000001ae8753cec0: 488b 4b08 | 0fb7 492e | 584c 8d74 | ccf8 6800 | 0000 0068 | 0000 0000 | 5055 488b | ec41 5548
    0x000001ae8753cee0: 8b43 084c | 8d68 3848 | 8b40 0868 | 0000 0000 | 5348 8b50 | 18
    .
    .

If -XX:+PrintAssembly is specified and hsdis load fails, the following is output to stdout:

    $ java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly -version
    OpenJDK 64-Bit Server VM warning: PrintAssembly is enabled; turning on DebugNonSafepoints to gain additional output
    ============================= C1-compiled nmethod ==============================
    ----------------------------------- Assembly -----------------------------------
    Compiled method (c1) 57 2 3 java.lang.Object:: (1 bytes)
    total in heap [0x0000024a08a00008,0x0000024a08a00208] = 512
    .
    .
    [Constant Pool (empty)]
    Loading hsdis library failed, so undisassembled code is printed in the below [MachCode] section
    [MachCode]
    [Instructions begin]
    0x0000024a08a00100: 6666 660f | 1f84 0000 | 0000 0066 | 6666 9066 | 6690 448b | 5208 443b
    .
    .
    [Constant Pool (empty)]
    Loading hsdis library failed, so undisassembled code is printed in the below [MachCode] section
    [MachCode]
    [Verified Entry Point]
    # {method} {0x00000000251a1898} 'toUnsignedInt' '(B)I' in 'java/lang/Byte
    .
    .
Since the warning added in this fix covers the role of the warning introduced in [JDK-8287001](https://bugs.openjdk.org/browse/JDK-8287001), I removed the lines added in [JDK-8287001](https://bugs.openjdk.org/browse/JDK-8287001) and [JDK-8289421](https://bugs.openjdk.org/browse/JDK-8289421). Testing: - test/hotspot/jtreg/runtime/ErrorHandling on Windows Server 2019 (some tests use the fastdebug version, so I ran with the fastdebug version) - tested on Windows Server 2019 that each print_cr() in the 3 routes prints the warning just before [MachCode] and that JDK-8287001's warning isn't printed - GHA testing ------------- Commit messages: - Remove include of log.hpp - Merge branch 'openjdk:master' into hsdis_load_err_msg - Improve hsdis load error message Changes: https://git.openjdk.org/jdk/pull/25726/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25726&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8359120 Stats: 12 lines in 3 files changed: 9 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25726.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25726/head:pull/25726 PR: https://git.openjdk.org/jdk/pull/25726 From mdoerr at openjdk.org Tue Jun 10 14:16:42 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 10 Jun 2025 14:16:42 GMT Subject: RFR: 8359126: [AIX] new test TestImplicitNullChecks.java fails Message-ID: Switch off the IR rule which expects no explicit null check for a load operation for AIX. I had to enhance the IR code.
------------- Commit messages: - 8359126: [AIX] new test TestImplicitNullChecks.java fails Changes: https://git.openjdk.org/jdk/pull/25728/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25728&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8359126 Stats: 8 lines in 2 files changed: 6 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25728.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25728/head:pull/25728 PR: https://git.openjdk.org/jdk/pull/25728 From mdoerr at openjdk.org Tue Jun 10 14:16:42 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 10 Jun 2025 14:16:42 GMT Subject: RFR: 8359126: [AIX] new test TestImplicitNullChecks.java fails In-Reply-To: References: Message-ID: On Tue, 10 Jun 2025 14:10:04 GMT, Martin Doerr wrote: > Switch off the IR rule which expects no explicit null check for a load operation for AIX. I had to enhance the IR code. @robcasloz: Sorry that I had forgotten to check AIX. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25728#issuecomment-2959423412 From roland at openjdk.org Tue Jun 10 14:20:14 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 10 Jun 2025 14:20:14 GMT Subject: RFR: 8358334: C2/Shenandoah: incorrect execution with Unsafe Message-ID: When a barrier is expanded, some control is picked as a location for the barrier. The control inputs of data nodes that depend on that control are updated so the nodes are after the expanded barrier unless the barrier itself depends on some of those nodes. In this particular failure, a raw memory `Store` is the input memory to the barrier. That `Store` has an anti-dependent `Load`. All 3 nodes (barrier, `Load` and `Store`) are at the same control. The `Store` is an input to the barrier so it stays before the barrier. The `Load`'s control is updated to be after the barrier which breaks the anti-dependency.
The bug is that the logic that sorts nodes that need to be before the barrier and those that can be after ignores anti-dependencies. The fix simply extends that logic to take them into account. ------------- Commit messages: - whitespaces - fix & test Changes: https://git.openjdk.org/jdk/pull/25729/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25729&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8358334 Stats: 153 lines in 3 files changed: 117 ins; 26 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/25729.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25729/head:pull/25729 PR: https://git.openjdk.org/jdk/pull/25729 From roland at openjdk.org Tue Jun 10 14:21:38 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 10 Jun 2025 14:21:38 GMT Subject: RFR: 8354383: C2: enable sinking of Type nodes out of loop [v3] In-Reply-To: <2T83tcCFOvCp4BElLS5ufAb7RR2ZkEUFXPOWEOAhVYg=.ed4262e0-5f36-4f95-9c4c-b4f750b6e555@github.com> References: <2T83tcCFOvCp4BElLS5ufAb7RR2ZkEUFXPOWEOAhVYg=.ed4262e0-5f36-4f95-9c4c-b4f750b6e555@github.com> Message-ID: On Fri, 23 May 2025 06:55:33 GMT, Tobias Hartmann wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> Update src/hotspot/share/opto/loopopts.cpp >> >> Co-authored-by: Christian Hagedorn > > That looks good to me but given that we had quite a few bugs in that area in the past, I would suggest to only integrate into JDK 26 after the fork on June 05, 2025. @TobiHartmann @chhagedorn thanks for the reviews. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25396#issuecomment-2959443738 From roland at openjdk.org Tue Jun 10 14:21:39 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 10 Jun 2025 14:21:39 GMT Subject: Integrated: 8354383: C2: enable sinking of Type nodes out of loop In-Reply-To: References: Message-ID: On Thu, 22 May 2025 15:53:18 GMT, Roland Westrelin wrote: > `PhaseIdealLoop::try_sink_out_of_loop()` excludes `Type` nodes because > we ran into some issues where a `Type` node is sunk and then becomes > `top` but the control path of its uses doesn't become unreachable. > > 8349479 should have fixed that so that exception no longer makes > sense. This pull request has now been integrated. Changeset: a2f99fd8 Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/a2f99fd88bd03337e1ba73b413ffe4e39f3584cf Stats: 7 lines in 1 file changed: 6 ins; 0 del; 1 mod 8354383: C2: enable sinking of Type nodes out of loop Reviewed-by: chagedorn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/25396 From roland at openjdk.org Tue Jun 10 14:30:20 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 10 Jun 2025 14:30:20 GMT Subject: RFR: 8275202: C2: optimize out more redundant conditions [v4] In-Reply-To: <978cgwy3Nb_x7yU6jZz0f6zhTBZfphstisAkBf1Vktc=.283d06eb-4f79-40cf-b8dd-a9c230e59902@github.com> References: <978cgwy3Nb_x7yU6jZz0f6zhTBZfphstisAkBf1Vktc=.283d06eb-4f79-40cf-b8dd-a9c230e59902@github.com> Message-ID: <44QlU4IGt2ppivho0EYO42iO6xCa-hmk6BZ_Ksmk_l4=.9c65616b-5b4a-4b3c-8080-6642347bc1c0@github.com> > This change adds a new loop opts pass to optimize redundant conditions > such as the second one in: > > > if (i < 10) { > if (i < 42) { > > > In the branch of the first if, the type of i can be narrowed down to > [min_jint, 9] which can then be used to constant fold the second > condition. > > The compiler already keeps track of type[n] for every node in the > current compilation unit. 
That's not sufficient to optimize the > snippet above though because the type of i can only be narrowed in > some sections of the control flow (that is a subset of all > controls). The solution is to build a new table that tracks the type > of n at every control c > > > type'[n, root] = type[n] // initialized from igvn's type table > type'[n, c] = type[n, idom(c)] > > > This pass iterates over the CFG looking for conditions such as: > > > if (i < 10) { > > > that allows narrowing the type of i and updates the type' table > accordingly. > > At a region r: > > > type'[n, r] = meet(type'[n, r->in(1)], type'[n, r->in(2)]...) > > > For a Phi phi at a region r: > > > type'[phi, r] = meet(type'[phi->in(1), r->in(1)], type'[phi->in(2), r->in(2)]...) > > > Once a type is narrowed, uses are enqueued and their types are > computed by calling the Value() methods. If a use's type is narrowed, > it's recorded at c in the type' table. Value() methods retrieve types > from the type table, not the type' table. To address that issue while > leaving Value() methods unchanged, before calling Value() at c, the > type table is updated so: > > > type[n] = type'[n, c] > > > An exception is for Phi::Value which needs to retrieve the type of > nodes are various controls: there, a new type(Node* n, Node* c) > method is used. > > For most n and c, type'[n, c] is likely the same as type[n], the type > recorded in the global igvn table (that is there shouldn't be many > nodes at only a few control for which we can narrow the type down). As > a consequence, the types'[n, c] table is implemented with: > > - At c, narrowed down types are stored in a GrowableArray. Each entry > records the previous type at idom(c) and the narrowed down type at > c. > > - The GrowableArray of type updates is recorded in a hash table > indexed by c. If there's no update at c, there's no entry in the > hash table. 
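
The sparse `type'[n, c]` table described above can be sketched as follows. This is a toy model under stated assumptions, not the code from the PR: types are plain integer ranges, controls and nodes are small integers, and `NarrowedTypes`, `type_at` and `narrow` are hypothetical names. The lookup walks the dominator chain, which is one simple way to realize `type'[n, c] = type'[n, idom(c)]` when `c` has no update for `n`.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <unordered_map>
#include <vector>

struct Range { int lo; int hi; };   // toy stand-in for an integer type

// One narrowing recorded at a control: the type seen at idom(c) and the
// narrowed type that holds at c.
struct TypeUpdate { int node; Range prev; Range narrowed; };

struct NarrowedTypes {
  std::vector<int> idom;                                 // idom[c]; the root is its own idom
  std::unordered_map<int, Range> global;                 // type[n], the igvn-style table
  std::unordered_map<int, std::vector<TypeUpdate>> at;   // entries only for controls with updates

  // type'[n, c]: nearest update for n on the dominator chain, else type[n].
  Range type_at(int n, int c) const {
    for (;;) {
      auto it = at.find(c);
      if (it != at.end()) {
        // Scan backwards so the most recent update at this control wins.
        for (auto u = it->second.rbegin(); u != it->second.rend(); ++u) {
          if (u->node == n) {
            return u->narrowed;
          }
        }
      }
      if (idom[c] == c) {
        break;
      }
      c = idom[c];
    }
    return global.at(n);
  }

  // Record that n is known to lie within r at control c.
  void narrow(int n, int c, Range r) {
    Range prev = type_at(n, idom[c]);
    Range joined = { std::max(prev.lo, r.lo), std::min(prev.hi, r.hi) };
    at[c].push_back({n, prev, joined});
  }
};
```

With control 1 as the taken branch of `if (i < 10)` and control 2 (dominated by 1) holding `if (i < 42)`, narrowing `i` to `[min_jint, 9]` at control 1 is visible at control 2, while a control outside that branch still sees the global type.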
> > This pass operates in 2 steps: > > - it first iterates over the graph looking for conditions that narrow > the types of some nodes and propagate type updates to uses until a > fix point. > > - it transforms the graph so newly found constant nodes are folded. > > > The new pass is run on every loop opts. There are a couple rea... Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/opto/loopConditionalPropagation.cpp Co-authored-by: Roberto Casta?eda Lozano ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14586/files - new: https://git.openjdk.org/jdk/pull/14586/files/22091449..907e65f7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14586&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14586&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14586.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14586/head:pull/14586 PR: https://git.openjdk.org/jdk/pull/14586 From epeter at openjdk.org Tue Jun 10 14:44:48 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 10 Jun 2025 14:44:48 GMT Subject: RFR: 8347273: C2: VerifyIterativeGVN for Ideal and Identity [v7] In-Reply-To: References: Message-ID: > **Past Work** > With https://github.com/openjdk/jdk/pull/11775 / [JDK-8298952](https://bugs.openjdk.org/browse/JDK-8298952) we added `Node::Value` verification. > > **This PR** > I'm now adding verification for `Ideal` and `Identity`. I'm adding two bits to the flag `VerifyIterativeGVN`. > > I found many many node types that hit my verification assert, i.e. that could still be optimized after IGVN is over, just because these nodes were not put on the worklist any more. > > My approach was to aggressively bail-out for all nodes that had an issue. This way, we can address one by one in follow-up RFEs. 
For many, I did some initial assessment, and left some comments about what issues I encountered. > > **Future Work:** > In many cases, the issue is just a missing notification when inputs of inputs are changed. These would be good starter tasks. But there are probably also more complicated cases. And there are surely cases where verification will be impossible, because it is possible that the Idea / Identity optimizations traverse longer paths, and we cannot expect that notification makes it down that path. For those cases, we will have to leave the exception and document it well. > > I filed: > [JDK-8359103](https://bugs.openjdk.org/browse/JDK-8359103) C2 VerifyIterativeGVN: Umbrella for extending Ideal and Identity verification (JDK-8347273) > (We can file subtasks for the nodes we want to fix. I don't want to file them all now, but we should file them as we are investigating, so that there is no duplicate work.) > > Testing passed tier1-3, with extra timeout factor 20. Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: reorder flags for Christian ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22970/files - new: https://git.openjdk.org/jdk/pull/22970/files/97af8205..ffc54f6e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22970&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22970&range=05-06 Stats: 6 lines in 1 file changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/22970.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22970/head:pull/22970 PR: https://git.openjdk.org/jdk/pull/22970 From epeter at openjdk.org Tue Jun 10 14:44:48 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 10 Jun 2025 14:44:48 GMT Subject: RFR: 8347273: C2: VerifyIterativeGVN for Ideal and Identity [v3] In-Reply-To: <3XjHLWBEYmn-1otPBnpKH3xLz100BR17x9_rGMAlQus=.e2331b5b-3197-407c-97bf-857ea0bd951c@github.com> References: 
<3XjHLWBEYmn-1otPBnpKH3xLz100BR17x9_rGMAlQus=.e2331b5b-3197-407c-97bf-857ea0bd951c@github.com> Message-ID: On Tue, 10 Jun 2025 10:13:25 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: >> >> - Update src/hotspot/share/opto/phaseX.cpp >> >> Co-authored-by: Manuel Hässig >> - review suggestions, and handled a few more edge cases > > Great effort and analysis! I have some comments/questions. @chhagedorn I think I addressed all your concerns. The only question remaining is if we should have the "reproducers" in the code comments. Let's see what @TobiHartmann says. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22970#issuecomment-2959528486 From roland at openjdk.org Tue Jun 10 15:01:31 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 10 Jun 2025 15:01:31 GMT Subject: RFR: 8275202: C2: optimize out more redundant conditions [v3] In-Reply-To: References: <978cgwy3Nb_x7yU6jZz0f6zhTBZfphstisAkBf1Vktc=.283d06eb-4f79-40cf-b8dd-a9c230e59902@github.com> Message-ID: On Mon, 9 Jun 2025 07:35:10 GMT, Roberto Castañeda Lozano wrote: > Let me know if you need more details to reproduce these issues. Thanks for the reports but I can't reproduce any of them. Are there extra options that need to be passed to the JVM? ------------- PR Comment: https://git.openjdk.org/jdk/pull/14586#issuecomment-2959594527 From epeter at openjdk.org Tue Jun 10 15:02:31 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 10 Jun 2025 15:02:31 GMT Subject: RFR: 8347273: C2: VerifyIterativeGVN for Ideal and Identity [v3] In-Reply-To: References: <3XjHLWBEYmn-1otPBnpKH3xLz100BR17x9_rGMAlQus=.e2331b5b-3197-407c-97bf-857ea0bd951c@github.com> Message-ID: On Tue, 10 Jun 2025 13:10:04 GMT, Emanuel Peter wrote: >> Okay, I only looked at it from a user-perspective that you might mismatch the description to the value passed to the flag.
What could help here is reversing the order you mention the modes: first D:, then C: etc. > > @chhagedorn If I mention `D` on the same line as `=DCBA, with `, then we have to change 2 lines next time. > > But I suppose we will have to change all lines anyway because the indentation would change... > > What I would really want to avoid is to have to change the parsing. So the lowest significant bits have to stay where they are, but I can rename them. > > Why don't you make a suggestion how you would like it to look, and then I can apply it :) I updated it to what we discussed offline :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22970#discussion_r2138136742 From epeter at openjdk.org Tue Jun 10 15:38:31 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 10 Jun 2025 15:38:31 GMT Subject: RFR: 8347273: C2: VerifyIterativeGVN for Ideal and Identity [v3] In-Reply-To: <3XjHLWBEYmn-1otPBnpKH3xLz100BR17x9_rGMAlQus=.e2331b5b-3197-407c-97bf-857ea0bd951c@github.com> References: <3XjHLWBEYmn-1otPBnpKH3xLz100BR17x9_rGMAlQus=.e2331b5b-3197-407c-97bf-857ea0bd951c@github.com> Message-ID: On Tue, 10 Jun 2025 10:13:25 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: >> >> - Update src/hotspot/share/opto/phaseX.cpp >> >> Co-authored-by: Manuel Hässig >> - review suggestions, and handled a few more edge cases > > Great effort and analysis! I have some comments/questions. @chhagedorn I checked with @TobiHartmann : he said he does not have a strong opinion, but if he had to make a decision, he would prefer having everything in the comments.
------------- PR Comment: https://git.openjdk.org/jdk/pull/22970#issuecomment-2959743354 From qamai at openjdk.org Tue Jun 10 15:53:54 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 10 Jun 2025 15:53:54 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v64] In-Reply-To: References: Message-ID: On Tue, 10 Jun 2025 07:46:46 GMT, Emanuel Peter wrote: >>> @jatin-bhateja Thanks a lot for your suggestion. I will address your concerns below: >>> >>> In addition to being additional information, unsigned bounds make it easier for canonicalization. This is because bits are inherently unsigned, canonicalizing bits and unsigned bounds together is an easier task than canonicalize bits and signed bounds. I think it is also beneficial to be consistent, keeping a boolean to signify unsigned bounds splits the set of all `TypeInt`s into 2, which makes it hard to reason about and verify the results of different operations. For example, consider substracting 2 `TypeInt`s, it will be significantly more complex if we have to consider 4 cases: signed - signed, signed - unsigned, unsigned - signed, unsigned - unsigned. >>> >>> I don't think it suffices to think that Java integral types are inherently signed. Bitwise and, or, xor, not are inherently unsigned. Left shift is both signed and unsigned, right shift has both the signed variant and the unsigned variant. Add, sub, mul are both signed and unsigned. There are only cmp, div, mod, toString and conversions that are signed, but we have methods to do the unsigned variants for all of them, `Integer::compareUnsigned`, `Integer::divide/remainderUnsigned`, `Integer::toUnsignedString`, and `Integer::toUnsignedLong`. Mul-hi does not have a native operation for both variants and `j.l.Math` provides both the signed and unsigned variants as utility methods. 
As a result, I think it is better to think of and `int` as a 32-bit integral value with unspecified signness and the operations operated on it are what decide whether it is signed or unsigned. And as you can see, for all operations, we have both the signed and unsigned variants. >> >> "As a result, I think it is better to think of and `int` as a 32-bit integral value with unspecified signness and the operations operated on it are what decide whether it is signed or unsigned" >> >> Sounds good, I think since now integral types has multiple lattics point associated with it, and all our existing Value transforms are based on signed value ranges, we will also need to extend existing value transforms to make use of KnownBits, for that we will need to extend newly added KnownBits class to support other operations which accepts KnownBits inputs operates over them as per the operation semantics and returns a new [KnownBits](https://github.com/llvm/llvm-project/blob/0c3a7725375ec583147429cc367320f0e8506847/llvm/include/llvm/Support/KnownBits... > > So there is indeed a lot of extension work to do, like @jatin-bhateja said. But we can use that to also refactor the code for testability. 
@eme64 I have created https://bugs.openjdk.org/browse/JDK-8359149 ------------- PR Comment: https://git.openjdk.org/jdk/pull/17508#issuecomment-2959794755 From epeter at openjdk.org Tue Jun 10 16:09:32 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 10 Jun 2025 16:09:32 GMT Subject: RFR: 8357982: Fix several failing BMI tests with -XX:+UseAPX [v2] In-Reply-To: <7fnB8ubV2eSkP88UkrVQ6qmNZcomS5Zby6mAUukJP4Y=.c8048c62-404b-4985-a614-9637f3fd03e9@github.com> References: <8mE0O0QjyMJMK7UWtfMiFc5ZjIxFYqVNUeu0qYbzaz8=.75e13abf-a2c9-407b-898d-1174a85a06cf@github.com> <7fnB8ubV2eSkP88UkrVQ6qmNZcomS5Zby6mAUukJP4Y=.c8048c62-404b-4985-a614-9637f3fd03e9@github.com> Message-ID: On Wed, 4 Jun 2025 09:47:24 GMT, Jatin Bhateja wrote: >> @jatin-bhateja I'll wait with testing, until someone from Intel gives this the approval. Feel free to ping me for that once we are there :) > > Hi @eme64, please initiate your test runs, we can have a second review from @sviswa7 once she is online. @jatin-bhateja @sviswa7 Running testing now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25501#issuecomment-2959865942 From mli at openjdk.org Tue Jun 10 16:09:53 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 10 Jun 2025 16:09:53 GMT Subject: RFR: 8358892: RISC-V: jvm crash when running dacapo sunflow after JDK-8352504 [v2] In-Reply-To: <9TAfBWFOsPTUIJPaUX9xcPqVsXLI6lsQBvcomuVqcQI=.1eb3ec5a-59ac-44ce-a46b-b984dda4605e@github.com> References: <9TAfBWFOsPTUIJPaUX9xcPqVsXLI6lsQBvcomuVqcQI=.1eb3ec5a-59ac-44ce-a46b-b984dda4605e@github.com> Message-ID: > Hi, > Can you help to review this patch? > > Thanks! > > Currently, this issue is only reproducible with Dacapo sunflow. > I tried to construct a simpler jtreg test to reproduce the issue, but can not find a way to do it till now, this task is tracked by https://bugs.openjdk.org/browse/JDK-8359045. > > So, currently I can only verify the code by reviewing it. > Or maybe it's better to leave it until we find the test? 
Hamlin Li has updated the pull request incrementally with two additional commits since the last revision: - tests - fix BoolTest::ge/gt ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25696/files - new: https://git.openjdk.org/jdk/pull/25696/files/e5b06b56..45dc4949 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25696&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25696&range=00-01 Stats: 433 lines in 4 files changed: 431 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25696.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25696/head:pull/25696 PR: https://git.openjdk.org/jdk/pull/25696 From mli at openjdk.org Tue Jun 10 16:09:53 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 10 Jun 2025 16:09:53 GMT Subject: RFR: 8358892: RISC-V: jvm crash when running dacapo sunflow after JDK-8352504 In-Reply-To: <9TAfBWFOsPTUIJPaUX9xcPqVsXLI6lsQBvcomuVqcQI=.1eb3ec5a-59ac-44ce-a46b-b984dda4605e@github.com> References: <9TAfBWFOsPTUIJPaUX9xcPqVsXLI6lsQBvcomuVqcQI=.1eb3ec5a-59ac-44ce-a46b-b984dda4605e@github.com> Message-ID: On Mon, 9 Jun 2025 16:27:54 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > > Thanks! > > Currently, this issue is only reproducible with Dacapo sunflow. > I tried to construct a simpler jtreg test to reproduce the issue, but can not find a way to do it till now, this task is tracked by https://bugs.openjdk.org/browse/JDK-8359045. > > So, currently I can only verify the code by reviewing it. > Or maybe it's better to leave it until we find the test? We need new implementation for BoolTest::ge/gt, because of NaN cases. It's done. Also add tests based on @RealFYang 's example. Thank you @RealFYang ! 
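
For readers wondering why NaN makes `ge`/`gt` special: a general floating-point fact (not a description of the actual RISC-V code in the PR) is that every ordered comparison with a NaN operand is false, so `a >= b` cannot be derived by negating `a < b`. A minimal sketch, with hypothetical helper names:

```cpp
#include <cassert>
#include <limits>

// Naive derivation: "a >= b" as the negation of "a < b".
// For a NaN operand, "a < b" is false, so this wrongly returns true.
inline bool ge_by_negating_lt(float a, float b) { return !(a < b); }

// Direct ordered comparison: false whenever either operand is NaN,
// which matches the semantics Java requires for `>=` on floats.
inline bool ge_ordered(float a, float b) { return a >= b; }
```

For ordinary operands both helpers agree; they differ only when an operand is NaN, which is the kind of input a dedicated test has to cover. The sketch assumes IEEE 754 semantics, i.e. no `-ffast-math`-style compiler options.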
------------- PR Comment: https://git.openjdk.org/jdk/pull/25696#issuecomment-2959865404 From mli at openjdk.org Tue Jun 10 16:13:44 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 10 Jun 2025 16:13:44 GMT Subject: RFR: 8358892: RISC-V: jvm crash when running dacapo sunflow after JDK-8352504 [v3] In-Reply-To: <9TAfBWFOsPTUIJPaUX9xcPqVsXLI6lsQBvcomuVqcQI=.1eb3ec5a-59ac-44ce-a46b-b984dda4605e@github.com> References: <9TAfBWFOsPTUIJPaUX9xcPqVsXLI6lsQBvcomuVqcQI=.1eb3ec5a-59ac-44ce-a46b-b984dda4605e@github.com> Message-ID: > Hi, > Can you help to review this patch? > > Thanks! > > Currently, this issue is only reproducible with Dacapo sunflow. > I tried to construct a simpler jtreg test to reproduce the issue, but can not find a way to do it till now, this task is tracked by https://bugs.openjdk.org/browse/JDK-8359045. > > So, currently I can only verify the code by reviewing it. > Or maybe it's better to leave it until we find the test? Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: typo ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25696/files - new: https://git.openjdk.org/jdk/pull/25696/files/45dc4949..7da631b8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25696&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25696&range=01-02 Stats: 8 lines in 1 file changed: 0 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/25696.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25696/head:pull/25696 PR: https://git.openjdk.org/jdk/pull/25696 From kvn at openjdk.org Tue Jun 10 17:25:31 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 10 Jun 2025 17:25:31 GMT Subject: RFR: 8325478: Restructure the macro expansion compiler phase to not include macro elimination In-Reply-To: References: Message-ID: On Fri, 6 Jun 2025 22:40:34 GMT, Saranya Natarajan wrote: > This changeset restructures the macro expansion phase to not include macro elimination and also 
adds a flag StressMacroElimination which randomizes macro elimination ordering for stress testing purposes. > > Changes: > - Implemented a method `eliminate_opaque_looplimit_macro_nodes` that removes the functionality for eliminating Opaque and LoopLimit nodes from the `expand_macro_nodes ` method. > - Introduced compiler phases` PHASE_AFTER_MACRO_ELIMINATION` > - Added a new Ideal phase for individual macro elimination steps. > - Implemented the flag `StressMacroElimination`. Added functionality tests for `StressMacroElimination`, similar to previous stress flag `StressMacroExpansion` ([JDK-8317349](https://bugs.openjdk.org/browse/JDK-8317349)). > > Below is a sample screenshot (IGV print level 5 with) mainly showing the new phase . > ![image](https://github.com/user-attachments/assets/16013cd4-6ec6-4939-ac66-33bb03d59cd6) > > Questions to reviewers: > - Is the new macro elimination phase OK, or should we change anything? > - In `compile.cpp `, `PHASE_ITER_GVN_AFTER_ELIMINATION` follows `PHASE_AFTER_MACRO_ELIMINATION` in the current fix. Should `PHASE_ITER_GVN_AFTER_ELIMINATION` be removed ? > > Testing: > GitHub Actions > tier1 to tier5 on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. > Tested that thousands of graphs are correctly opened and visualized with IGV using the same test used in ([JDK-8317349](https://bugs.openjdk.org/browse/JDK-8317349)) src/hotspot/share/opto/compile.cpp line 2540: > 2538: return; > 2539: } > 2540: mex.eliminate_opaque_looplimit_macro_nodes(); Missing `failing()` check. src/hotspot/share/opto/macro.cpp line 2480: > 2478: void PhaseMacroExpand::eliminate_opaque_looplimit_macro_nodes() { > 2479: if (C->macro_count() == 0) > 2480: return; Code style: use {}. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25682#discussion_r2138406210 PR Review Comment: https://git.openjdk.org/jdk/pull/25682#discussion_r2138399110 From rcastanedalo at openjdk.org Tue Jun 10 18:42:29 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 10 Jun 2025 18:42:29 GMT Subject: RFR: 8359126: [AIX] new test TestImplicitNullChecks.java fails In-Reply-To: References: Message-ID: On Tue, 10 Jun 2025 14:10:04 GMT, Martin Doerr wrote: > Switch off the IR rule which expects no explicit null check for a load operation for AIX. I had to enhance the IR code. Marked as reviewed by rcastanedalo (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25728#pullrequestreview-2914705531 From rcastanedalo at openjdk.org Tue Jun 10 18:54:29 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 10 Jun 2025 18:54:29 GMT Subject: RFR: 8356708: C2: loop strip mining expansion doesn't take sunk stores into account In-Reply-To: References: Message-ID: <4qen78-Aon9foRJSduZFQ47DrCMPUjuMF7MQj_uk7jI=.e7b5e0b7-04f4-4d5a-a224-a1c78ff89c09@github.com> On Tue, 10 Jun 2025 10:17:11 GMT, Roland Westrelin wrote: > `test1()` has a counted loop with a `Store` to `field`. That `Store` > is sunk out of loop. When the `OuterStripMinedLoop` is expanded, only > `Phi`s that exist at the inner loop are added to the outer > loop. There's no `Phi` for the slice of the sunk `Store` (because > there's no `Store` left in the inner loop) so no `Phi` is added for > that slice to the outer loop. As a result, there's a missing anti > dependency for `Load` of `field` that's before the loop and it can be > scheduled inside the outer strip mined loop which is incorrect. > > `test2()` is the same as `test1()` but with a chain of 2 `Store`s. > > `test3()` is another variant where a `Store` is left in the inner loop > after one is sunk out of it so the inner loop still has a `Phi`. 
As a > result, the outer loop also gets a `Phi` but it's incorrectly wired as > the sunk `Store` should be the input along the backedge but is > not. That one doesn't cause any failure AFAICT. > > The fix I propose is some extra logic at expansion of the > `OuterStripMinedLoop` to handle these corner cases. Thanks for working on this, Roland. I just submitted some testing, will come back with the results in a day or two. Generally, I agree with your proposed approach of handling the case at expansion time as a low-risk fix for JDK 25. But as future work, would it be feasible to maintain regular SSA form for outer strip-mined loops (adding memory and data phi nodes at both loop levels) rather than omitting phi nodes for the outer loops and "repairing" SSA on macro expansion, or is there any fundamental obstacle in doing the former? It would have prevented issues like this, and feels like a more principled and robust approach in general. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25717#issuecomment-2960287377 From mdoerr at openjdk.org Tue Jun 10 19:21:33 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 10 Jun 2025 19:21:33 GMT Subject: RFR: 8359126: [AIX] new test TestImplicitNullChecks.java fails In-Reply-To: References: Message-ID: On Tue, 10 Jun 2025 14:10:04 GMT, Martin Doerr wrote: > Switch off the IR rule which expects no explicit null check for a load operation for AIX. I had to enhance the IR code. Thanks for the review! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25728#issuecomment-2960348797 From cslucas at openjdk.org Wed Jun 11 00:05:47 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 11 Jun 2025 00:05:47 GMT Subject: RFR: 8359064: Expose reason for marking nmethod non-entrant to JVMCI client [v2] In-Reply-To: References: Message-ID: > We recently introduced a way to set the reason why a nmethod was being marked as `not entrant`, see [here](https://github.com/openjdk/jdk/pull/23980) and [here](https://github.com/openjdk/jdk/pull/25338). > > This PR is to expose in the JVMCI interface the reason why the nmethod was flagged as `not entrant`. This will allow JVMCI-based compilers to implement heuristics to handle re-compilations differently based on what happened to earlier versions of a method, for instance, this will likely be used to address this [RFE in Truffle](https://github.com/oracle/graal/issues/11045 > ). > > Tested on Linux x86_64, ARM with JTREG tier 1-3. Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: Address PR feedback: add comments, refactor enum definition in the Java side. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/25706/files - new: https://git.openjdk.org/jdk/pull/25706/files/9478d1de..5e4b8145 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25706&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25706&range=00-01 Stats: 83 lines in 5 files changed: 76 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/25706.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25706/head:pull/25706 PR: https://git.openjdk.org/jdk/pull/25706 From cslucas at openjdk.org Wed Jun 11 00:05:47 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 11 Jun 2025 00:05:47 GMT Subject: RFR: 8359064: Expose reason for marking nmethod non-entrant to JVMCI client In-Reply-To: References: Message-ID: On Tue, 10 Jun 2025 10:43:07 GMT, Doug Simon wrote: >> Thank you for working on this. Now #25338 makes even more sense ? >> >> A naive question: is it possible to somehow share the enum definition in hotspot with the Java side in JVMCI? If all change reasons were enums, they would be much easier to understand. > >> A naive question: is it possible to somehow share the enum definition in hotspot with the Java side in JVMCI? If all change reasons were enums, they would be much easier to understand. > > Not directly but you can have a mirror enum in JVMCI whose ordinal and message could be kept in sync and initialized from native code. As an example, see [jdk.vm.ci.hotspot.HotSpotCompiledCodeStream.Tag](https://github.com/openjdk/jdk/blob/3ff83ec49e561c44dd99508364b8ba068274b63a/src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotCompiledCodeStream.java#L228). Thank you for the comments. I added the comments that you asked @dougxc and I did some refactoring around the definition of the ChangeReason enum in the JVMCI API. 
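A rough sketch of the "mirror enum" idea mentioned above, under stated assumptions: the enum name, its constants, and the `fromNative` helper are all hypothetical and are not taken from JVMCI or HotSpot. It only illustrates how a Java-side enum could be checked against a count reported by the native side so that the two definitions cannot drift apart silently.

```java
// Hypothetical sketch; none of these names are taken from JVMCI or HotSpot.
enum MirroredReason {
    FIRST_REASON, SECOND_REASON, THIRD_REASON;

    // Maps a raw value received from native code to the mirror enum and
    // fails loudly if the native side reports a different number of values.
    static MirroredReason fromNative(int ordinal, int nativeCount) {
        MirroredReason[] all = values();
        if (nativeCount != all.length) {
            throw new IllegalStateException(
                "mirror enum out of sync: native=" + nativeCount + ", java=" + all.length);
        }
        if (ordinal < 0 || ordinal >= all.length) {
            throw new IllegalArgumentException("unknown reason ordinal: " + ordinal);
        }
        return all[ordinal];
    }

    public static void main(String[] args) {
        System.out.println(fromNative(1, 3));
    }
}
```

How the real mirror is initialized from native code is not shown here; the sketch only covers the consistency check.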
------------- PR Comment: https://git.openjdk.org/jdk/pull/25706#issuecomment-2960843655 From cslucas at openjdk.org Wed Jun 11 00:20:28 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 11 Jun 2025 00:20:28 GMT Subject: RFR: 8358334: C2/Shenandoah: incorrect execution with Unsafe In-Reply-To: References: Message-ID: <7ZHTFi99uOLv_qY4m0bsX8UY9uP2J0NGpfMl6uLcsTE=.0c34fe48-5ee8-407c-b51c-1290099b6761@github.com> On Tue, 10 Jun 2025 14:13:21 GMT, Roland Westrelin wrote: > When a barrier is expanded, some control is picked as a location for > the barrier. The control inputs of data nodes that depend on that > control are updated so the nodes are after the expanded barrier unless > the barrier itself depends on some of those nodes. > > In this particular failure, a raw memory `Store` is the input memory to > the barrier. That `Store` has an anti-dependent `Load`. All 3 nodes > (barrier, `Load` and `Store`) are at the same control. The `Store` is > an input to the barrier so it stays before the barrier. The `Load`'s > control is updated to be after the barrier which breaks the > anti-dependency. The bug is that the logic that sorts nodes that need > to be before the barrier and those that can be after ignores > anti-dependencies. The fix simply extends that logic to take them into > account. LGTM. Thanks.
------------- PR Review: https://git.openjdk.org/jdk/pull/25729#pullrequestreview-2915306013 From duke at openjdk.org Wed Jun 11 02:18:29 2025 From: duke at openjdk.org (erifan) Date: Wed, 11 Jun 2025 02:18:29 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v8] In-Reply-To: References: Message-ID: <9LpnQrYLKKpYrKwERCehMmSTXm0-2pCjF0HRCU1AKh0=.9eb49965-268b-48f3-8c79-d05d4df93e25@github.com> On Fri, 6 Jun 2025 10:38:11 GMT, erifan wrote: >> This patch optimizes the following patterns: >> For integer types: >> >> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) >> => (VectorMaskCmp src1 src2 ncond) >> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) >> => (VectorMaskCmp src1 src2 ncond) >> >> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the negative comparison of cond. >> >> For float and double types: >> >> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> >> cond can be eq or ne. 
>> >> Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option `-XX:UseSVE=2`: >> >> Benchmark Unit Before Score Error After Score Error Uplift >> testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 10266136.26 8955.008548 1.29 >> testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 1179760.772 448.031844 1.33 >> testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 2359520.803 896.305743 1.33 >> testCompareEQMaskNotInt ops/s 1787221.411 977.743935 2353952.519 960.069976 1.31 >> testCompareEQMaskNotLong ops/s 895297.1974 673.44808 1178449.02 323.804205 1.31 >> testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 4712761.965 2110.862053 1.41 >> testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 10251646.9 9486.699831 1.29 >> testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 2352855.205 1251.952546 1.39 >> testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 1177811.493 521.1229 1.37 >> testCompareGEMaskNotShort ops/s 3341860.309 1578.975338 4714008.434 1681.10365 1.41 >> testCompareGTMaskNotByte ops/s 7910823.674 2993.367032 10245063.58 9774.75138 1.29 >> testCompareGTMaskNotInt ops/s 1673393.928 3153.099431 2353654.521 1190.848583 1.4 >> testCompareGTMaskNotLong ops/s 849405.9159 2432.858159 1177952.041 359.96413 1.38 >> testCompareGTMaskNotShort ops/s 3339509.141 3339.976585 4711442.496 2673.364893 1.41 >> testCompareLEMaskNotByte ops/s 7911340.004 3114.69191 10231626.5 27134.20035 1.29 >> testCompareLEMaskNotInt ops/s 1675812.113 1340.969885 2353255.341 1452.4522 1.4 >> testCompareLEMaskNotLong ops/s 848862.8036 6564.841731 1177763.623 539.290106 1.38 >> testCompareLEMaskNotShort ops/s 3324951.54 2380.29473 4712116.251 1544.559684 1.41 >> testCompareLTMaskNotByte ops/s 7910390.844 2630.861436 10239567.69 6487.441672 1.29 >> testCompareLTMaskNotInt ops/s 16721... 
> > erifan has updated the pull request incrementally with one additional commit since the last revision: > > Support negating unsigned comparison for BoolTest::mask > > Added a static method `negate_mask(mask btm)` into BoolTest class to > negate both signed and unsigned comparison. Hi, any further comments? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-2961024225 From fyang at openjdk.org Wed Jun 11 03:40:31 2025 From: fyang at openjdk.org (Fei Yang) Date: Wed, 11 Jun 2025 03:40:31 GMT Subject: RFR: 8358892: RISC-V: jvm crash when running dacapo sunflow after JDK-8352504 [v3] In-Reply-To: References: <9TAfBWFOsPTUIJPaUX9xcPqVsXLI6lsQBvcomuVqcQI=.1eb3ec5a-59ac-44ce-a46b-b984dda4605e@github.com> Message-ID: On Tue, 10 Jun 2025 16:13:44 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> >> Thanks! >> >> Currently, this issue is only reproducible with Dacapo sunflow. >> I tried to construct a simpler jtreg test to reproduce the issue, but can not find a way to do it till now, this task is tracked by https://bugs.openjdk.org/browse/JDK-8359045. >> >> So, currently I can only verify the code by reviewing it. >> Or maybe it's better to leave it until we find the test? > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > typo Hi, > We need new implementation for BoolTest::ge/gt, because of NaN cases. It's done. Hmm, I don't understand why your first commit (https://github.com/openjdk/jdk/pull/25696/commits/e5b06b5621b01dbb557fc21881307c9131f4bb53) won't work for NaN cases. Do you have more details or maybe a small test case to demo your concern? I also changed my test a bit and tried NaN cases and it still works if I use your first commit (https://github.com/openjdk/jdk/pull/25696/commits/e5b06b5621b01dbb557fc21881307c9131f4bb53).

    public class Test {
        // return 1 if dl > dr or either operand is NaN, 0 otherwise.
        public static int test_float_gt(float dl, float dr) {
            return !(dl <= dr) ? 1 : 0;
        }

        // return 1 if dl >= dr or either operand is NaN, 0 otherwise.
        public static int test_float_ge(float dl, float dr) {
            return !(dl < dr) ? 1 : 0;
        }

        public static void main(String[] args) throws Exception {
            int ret = 0;

            // test case BoolTest::ge
            for (int i = 0; i < 20000; i++) {
                if ((i % 2) == 0) {
                    ret = test_float_gt(1.0f, Float.NaN); // <===============
                    if (ret != 1) {
                        throw new Exception("test_float_gt failed.");
                    }
                } else {
                    ret = test_float_gt(2.0f, 1.0f);
                    if (ret != 1) {
                        throw new Exception("test_float_gt failed.");
                    }
                }
            }
            System.out.println("test_float_gt passed.");

            // test case BoolTest::gt
            for (int i = 0; i < 20000; i++) {
                if ((i % 2) == 0) {
                    ret = test_float_ge(1.0f, Float.NaN); // <===============
                    if (ret != 1) {
                        throw new Exception("test_float_ge failed.");
                    }
                } else {
                    ret = test_float_ge(2.0f, 1.0f);
                    if (ret != 1) {
                        throw new Exception("test_float_ge failed.");
                    }
                }
            }
            System.out.println("test_float_ge passed.");
        }
    }

------------- PR Comment: https://git.openjdk.org/jdk/pull/25696#issuecomment-2961145846 From amitkumar at openjdk.org Wed Jun 11 04:43:09 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 11 Jun 2025 04:43:09 GMT Subject: RFR: 8358756: [s390x] Test StartupOutput.java crash due to CodeCache size Message-ID: The initial code cache is not large enough for the VM to run even in interpreter mode. As a result, the JVM crashes before we even reach the compiler phase, where we would try to bail out if there is not enough space left for stub compilation. The idea is to increase the initial code cache size so that it is at least large enough to run in interpreter mode.
------------- Commit messages: - fix Changes: https://git.openjdk.org/jdk/pull/25741/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25741&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8358756 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25741.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25741/head:pull/25741 PR: https://git.openjdk.org/jdk/pull/25741 From epeter at openjdk.org Wed Jun 11 05:36:37 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 11 Jun 2025 05:36:37 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v8] In-Reply-To: References: Message-ID: On Fri, 6 Jun 2025 10:38:11 GMT, erifan wrote: >> This patch optimizes the following patterns: >> For integer types: >> >> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) >> => (VectorMaskCmp src1 src2 ncond) >> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) >> => (VectorMaskCmp src1 src2 ncond) >> >> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the negative comparison of cond. >> >> For float and double types: >> >> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> >> cond can be eq or ne. 
>> >> Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option `-XX:UseSVE=2`: >> >> Benchmark Unit Before Score Error After Score Error Uplift >> testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 10266136.26 8955.008548 1.29 >> testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 1179760.772 448.031844 1.33 >> testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 2359520.803 896.305743 1.33 >> testCompareEQMaskNotInt ops/s 1787221.411 977.743935 2353952.519 960.069976 1.31 >> testCompareEQMaskNotLong ops/s 895297.1974 673.44808 1178449.02 323.804205 1.31 >> testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 4712761.965 2110.862053 1.41 >> testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 10251646.9 9486.699831 1.29 >> testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 2352855.205 1251.952546 1.39 >> testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 1177811.493 521.1229 1.37 >> testCompareGEMaskNotShort ops/s 3341860.309 1578.975338 4714008.434 1681.10365 1.41 >> testCompareGTMaskNotByte ops/s 7910823.674 2993.367032 10245063.58 9774.75138 1.29 >> testCompareGTMaskNotInt ops/s 1673393.928 3153.099431 2353654.521 1190.848583 1.4 >> testCompareGTMaskNotLong ops/s 849405.9159 2432.858159 1177952.041 359.96413 1.38 >> testCompareGTMaskNotShort ops/s 3339509.141 3339.976585 4711442.496 2673.364893 1.41 >> testCompareLEMaskNotByte ops/s 7911340.004 3114.69191 10231626.5 27134.20035 1.29 >> testCompareLEMaskNotInt ops/s 1675812.113 1340.969885 2353255.341 1452.4522 1.4 >> testCompareLEMaskNotLong ops/s 848862.8036 6564.841731 1177763.623 539.290106 1.38 >> testCompareLEMaskNotShort ops/s 3324951.54 2380.29473 4712116.251 1544.559684 1.41 >> testCompareLTMaskNotByte ops/s 7910390.844 2630.861436 10239567.69 6487.441672 1.29 >> testCompareLTMaskNotInt ops/s 16721... 
> > erifan has updated the pull request incrementally with one additional commit since the last revision: > > Support negating unsigned comparison for BoolTest::mask > > Added a static method `negate_mask(mask btm)` into BoolTest class to > negate both signed and unsigned comparison. @erifan Thanks for the updates, I have some more comments :) src/hotspot/share/opto/subnode.hpp line 333: > 331: mask negate( ) const { return mask(_test^4); } > 332: // Return the negative mask for the given mask, for both signed and unsigned comparison. > 333: static mask negate_mask(mask btm) { return mask(btm^4); } Suggestion: static mask negate_mask(mask btm) { return mask(btm ^ 4); } https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md > Use spaces around operators, especially comparisons and assignments. (Relaxable for boolean expressions and high-precedence operators in classic math-style formulas.) src/hotspot/share/opto/vectornode.cpp line 2226: > 2224: > 2225: const TypeVect* vector_mask_cast_vt = nullptr; > 2226: // in1 should be single used, otherwise the optimization may be unprofitable. Suggestion: // in1 should only have a single use, otherwise the optimization may be unprofitable. src/hotspot/share/opto/vectornode.cpp line 2227: > 2225: const TypeVect* vector_mask_cast_vt = nullptr; > 2226: // in1 should be single used, otherwise the optimization may be unprofitable. > 2227: if (in1->Opcode() == Op_VectorMaskCast && in1->outcnt() == 1 && in1->in(1)->Opcode() == Op_VectorMaskCmp) { `in1->in(1)->Opcode() == Op_VectorMaskCmp` Is this check here even necessary? Because we check it below again, right? `in1->Opcode() != Op_VectorMaskCmp` src/hotspot/share/opto/vectornode.cpp line 2237: > 2235: !VectorNode::is_all_ones_vector(in2)) { > 2236: return nullptr; > 2237: } Similarly here: do you have tests for these conditions, that we do not optimize if any of these fail? 
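As an aside on the `negate_mask` lines quoted above: the XOR with 4 works because of how the comparison masks are encoded. The sketch below mirrors that encoding in Java under the assumption that the values match `BoolTest::mask` in `subnode.hpp` (eq = 0, gt = 1, lt = 3, ne = 4, le = 5, ge = 7, with unsigned variants formed by OR-ing in 16); please verify against the source before relying on it. `MaskNegationDemo` and its members are illustrative names only.

```java
// Sketch only: the constants are assumed to mirror BoolTest::mask in subnode.hpp.
final class MaskNegationDemo {
    static final int EQ = 0, GT = 1, LT = 3, NE = 4, LE = 5, GE = 7;
    static final int UNSIGNED_COMPARE = 16;            // assumed flag bit for unsigned variants
    static final int ULE = UNSIGNED_COMPARE | LE;
    static final int UGT = UNSIGNED_COMPARE | GT;

    // Same operation as the quoted negate_mask(): flip bit 2 of the mask.
    static int negate(int mask) {
        return mask ^ 4;
    }

    public static void main(String[] args) {
        System.out.println("negate(EQ)  == NE:  " + (negate(EQ) == NE));
        System.out.println("negate(LT)  == GE:  " + (negate(LT) == GE));
        System.out.println("negate(LE)  == GT:  " + (negate(LE) == GT));
        System.out.println("negate(ULE) == UGT: " + (negate(ULE) == UGT));
    }
}
```

Because only bit 2 is flipped, the assumed unsigned flag bit is left untouched, which is why one helper can serve both signed and unsigned comparisons, and applying it twice returns the original mask.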
src/hotspot/share/opto/vectornode.cpp line 2239: > 2237: } > 2238: > 2239: BoolTest::mask neg_cond = BoolTest::negate_mask(((VectorMaskCmpNode*) in1)->get_predicate()); Suggestion: BoolTest::mask neg_cond = BoolTest::negate_mask((in1->as_VectorMaskCmp())->get_predicate()); Does that compile? It would be preferable. src/hotspot/share/opto/vectornode.cpp line 2243: > 2241: const TypeVect* vt = in1->as_Vector()->vect_type(); > 2242: Node* res = new VectorMaskCmpNode(neg_cond, in1->in(1), in1->in(2), > 2243: predicate_node, vt); Suggestion: Node* res = new VectorMaskCmpNode(neg_cond, in1->in(1), in1->in(2), predicate_node, vt); Alignment test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java line 158: > 156: } else if (op == VectorOperators.UGT) { > 157: Asserts.assertEquals(compareUnsigned(a, b) <= 0, r); > 158: } Please refactor it as a `switch`. And add a `default` case where you throw some `RuntimeException`, just to make sure we are not missing anything :) test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java line 244: > 242: testCompareMaskNotByte(VectorOperators.EQ, m -> m.not()); > 243: testCompareMaskNotByte(VectorOperators.EQ, m -> m.xor(B_SPECIES.maskAll(true))); > 244: } Could it happen that the verification is inlined in the test body? Currently, the verification is probably inlined, but the code there is not vectorized. But what if one day the auto-vectorizer is smart enough and vectorizes it, and creates vectors that we currently check `count ...= 0`? At least, you could ensure that the verification does not get inlined, with `@DontInline`. What do you think?
test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java line 623: > 621: testCompareMaskNotFloat(VectorOperators.NE, fa, fninf, m -> m.not()); > 622: testCompareMaskNotFloat(VectorOperators.NE, fa, fninf, m -> m.xor(F_SPECIES.maskAll(true))); > 623: } Something makes me a little nervous about the correctness in these IR rules: You are checking `IRNode.XOR_VL, "= 0"`. But you are comparing `floats`. Does that make sense? Also: in the whole test, there is no single case where you expect the `XOR_V` to still be in the IR. I think it would be good to have one "control test" at least, where you test in a very similar pattern that this node is still there, and does not optimize. Maybe you can use a case where the construct has multiple uses, and is therefore not profitable to be optimized. What do you think? test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java line 692: > 690: TestFramework testFramework = new TestFramework(); > 691: testFramework.addFlags("--add-modules=jdk.incubator.vector"); > 692: testFramework.setDefaultWarmup(10000); The default is `2000` is that not enough? Increasing it means the test runs slower, here probably about 5x. ------------- Changes requested by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24674#pullrequestreview-2915634768 PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2139189790 PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2139199315 PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2139201553 PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2139206813 PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2139216182 PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2139217776 PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2139227321 PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2139234183 PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2139239614 PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2139243617 From epeter at openjdk.org Wed Jun 11 05:36:37 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 11 Jun 2025 05:36:37 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v8] In-Reply-To: References: Message-ID: On Wed, 11 Jun 2025 05:08:35 GMT, Emanuel Peter wrote: >> erifan has updated the pull request incrementally with one additional commit since the last revision: >> >> Support negating unsigned comparison for BoolTest::mask >> >> Added a static method `negate_mask(mask btm)` into BoolTest class to >> negate both signed and unsigned comparison. > > src/hotspot/share/opto/vectornode.cpp line 2227: > >> 2225: const TypeVect* vector_mask_cast_vt = nullptr; >> 2226: // in1 should be single used, otherwise the optimization may be unprofitable. >> 2227: if (in1->Opcode() == Op_VectorMaskCast && in1->outcnt() == 1 && in1->in(1)->Opcode() == Op_VectorMaskCmp) { > > `in1->in(1)->Opcode() == Op_VectorMaskCmp` > Is this check here even necessary? Because we check it below again, right? 
> `in1->Opcode() != Op_VectorMaskCmp` Btw: do you have a test where `in1->outcnt() > 1`, and you check that the optimization does not happen with an IR test? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2139203141 From dfenacci at openjdk.org Wed Jun 11 06:23:28 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 11 Jun 2025 06:23:28 GMT Subject: RFR: 8358694: VM asserts if CodeCacheSegmentSize is not a power of 2 [v2] In-Reply-To: References: <_SfCjE8oyyD-grL6AAih23j1Qx7fbYefTVzl-BU5N2k=.7bd5d419-4127-4e78-926e-fece25a7d914@github.com> Message-ID: On Tue, 10 Jun 2025 08:41:16 GMT, Amit Kumar wrote: >> Makes sure that JVM exits gracefully when `CodeCacheSegmentSize` is not a power of 2. > > Amit Kumar has updated the pull request incrementally with two additional commits since the last revision: > > - fix > - move the changes in flag constraints specific file Running Tier1-3+ tests ------------- PR Comment: https://git.openjdk.org/jdk/pull/25708#issuecomment-2961391686 From mhaessig at openjdk.org Wed Jun 11 07:12:15 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Wed, 11 Jun 2025 07:12:15 GMT Subject: RFR: 8357782: JVM JIT Causes Static Initialization Order Issue Message-ID: # Issue Summary When C1 compiles a method that allocates a new instance of a class that is not fully initialized at compile time, it does not take into account that the `` might run a static initializer that might have side effects. Consider the following example: class A { static class B { static String field; static void test() { String tmp = field; new C(field); } } static class C { static { B.field = "Hello"; } C(String val) { if (val == null) { throw new RuntimeException("Should not reach here"); } } } } Here, `B.field` gets assigned in `C`'s static initializer. 
Since C1 believes that the `newinstance` does not have memory side effects, local value numbering eliminates the field access for the argument in `C.<init>` because it believes that `B.field` is still the same as `tmp`. Hence, the assignment in `C.<clinit>` gets effectively ignored and the code triggers the runtime exception. Because this only happens if `C` is not fully initialized when it is compiled, we need `-Xcomp` to reproduce this issue. # Changes To fix the illustrated issue, this PR ensures that `newinstance` kills the memory state in C1's LVN if the class might not be fully initialized. Since we cannot reliably detect whether a class has a static initializer, we kill memory whenever a class is not yet loaded or, if it has already been loaded, when it has not been fully initialized. This is conservative: it might kill memory when that is not necessary for correctness, which can cost some performance in the form of additional field accesses. # Benchmark Results Since this might have an effect on startup, I ran some benchmarks. The results mostly did not show effects outside the run-to-run variance. # Testing - [x] [GHA](https://github.com/mhaessig/jdk/actions/runs/15560262225) - [x] tier1 through tier3 plus Oracle internal testing on Oracle supported platforms and OSes # Acknowledgements Shout out to @TobiHartmann who wrote the reproducer that became the regression test and helped me find my way around C1 and narrow down the problem.
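For anyone who wants to try the scenario from the issue summary, here is a self-contained variant of the example with a `main` method added. This is only a sketch: `InitOrderDemo` is a made-up wrapper name and this is not the regression test from the PR. With correct semantics the `new` triggers `C`'s static initializer before the constructor argument is read, so the program finishes without throwing; per the description above, observing the failure would additionally need an affected VM and `-Xcomp`.

```java
// Hypothetical, self-contained variant of the example in this message.
final class InitOrderDemo {
    static class B {
        static String field;

        static void test() {
            String tmp = field;   // reads null: C has not been initialized yet
            new C(field);         // `new` initializes C first, then `field` is read again
        }
    }

    static class C {
        static {
            B.field = "Hello";    // side effect of C's static initializer
        }

        C(String val) {
            if (val == null) {
                throw new RuntimeException("Should not reach here");
            }
        }
    }

    public static void main(String[] args) {
        B.test();
        System.out.println("B.field = " + B.field);
    }
}
```

The unused local `tmp` is kept on purpose: it is the earlier load that value numbering would wrongly reuse for the second read of `field`.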
------------- Commit messages: - C1: kill memory for new instance if class has not been initialized - Add ciInstanceKlass::has_class_initializer - Add regression test Changes: https://git.openjdk.org/jdk/pull/25725/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25725&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8357782 Stats: 84 lines in 4 files changed: 82 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25725.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25725/head:pull/25725 PR: https://git.openjdk.org/jdk/pull/25725 From duke at openjdk.org Wed Jun 11 07:34:33 2025 From: duke at openjdk.org (erifan) Date: Wed, 11 Jun 2025 07:34:33 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v8] In-Reply-To: References: Message-ID: On Wed, 11 Jun 2025 05:23:12 GMT, Emanuel Peter wrote: >> erifan has updated the pull request incrementally with one additional commit since the last revision: >> >> Support negating unsigned comparison for BoolTest::mask >> >> Added a static method `negate_mask(mask btm)` into BoolTest class to >> negate both signed and unsigned comparison. > > test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java line 158: > >> 156: } else if (op == VectorOperators.UGT) { >> 157: Asserts.assertEquals(compareUnsigned(a, b) <= 0, r); >> 158: } > > Please refactor it as a `switch`. And add a `default` case where you throw some `RuntimeException`. just to make sure we are not missing anything :) `VectorOperators.XXX` is not compile time constants, we can't use `switch` here. 
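One way to keep the "fail on anything unexpected" property without a `switch` is a final `else` branch that throws. The sketch below uses a hypothetical stand-in class `Op` rather than the real `VectorOperators` types (which would need the incubator module), and the names `OperatorCheckDemo`, `expected` and `UNHANDLED` are made up for illustration.

```java
// Sketch with stand-in types; not the code from VectorMaskCompareNotTest.java.
final class OperatorCheckDemo {
    // Stand-in for operator objects that are static final fields but not
    // compile-time constants, so they cannot be used as `case` labels.
    static final class Op {
        final String name;
        Op(String name) { this.name = name; }
    }

    static final Op LT = new Op("LT");
    static final Op GE = new Op("GE");
    static final Op UGT = new Op("UGT");
    static final Op UNHANDLED = new Op("UNHANDLED");

    // Computes the reference result for a comparison operator; an operator
    // that is not handled explicitly causes a failure instead of a silent pass.
    static boolean expected(Op op, int a, int b) {
        if (op == LT) {
            return a < b;
        } else if (op == GE) {
            return a >= b;
        } else if (op == UGT) {
            return Integer.compareUnsigned(a, b) > 0;
        } else {
            throw new RuntimeException("Unexpected operator: " + op.name);
        }
    }

    public static void main(String[] args) {
        System.out.println(expected(LT, 1, 2));
        System.out.println(expected(UGT, -1, 1));
    }
}
```

If a new operator is later added to the test but not to the chain, the trailing `else` turns the omission into a test failure.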
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2139418759 From epeter at openjdk.org Wed Jun 11 07:46:31 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 11 Jun 2025 07:46:31 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v8] In-Reply-To: References: Message-ID: On Wed, 11 Jun 2025 07:31:48 GMT, erifan wrote: >> test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java line 158: >> >>> 156: } else if (op == VectorOperators.UGT) { >>> 157: Asserts.assertEquals(compareUnsigned(a, b) <= 0, r); >>> 158: } >> >> Please refactor it as a `switch`. And add a `default` case where you throw some `RuntimeException`. just to make sure we are not missing anything :) > > `VectorOperators.XXX` is not compile time constants, we can't use `switch` here. Oh. Ok. Well at least add a `RuntimeException` to an `else` branch then, I would suggest :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2139441633 From dnsimon at openjdk.org Wed Jun 11 07:53:29 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 11 Jun 2025 07:53:29 GMT Subject: RFR: 8359064: Expose reason for marking nmethod non-entrant to JVMCI client [v2] In-Reply-To: References: Message-ID: On Wed, 11 Jun 2025 00:05:47 GMT, Cesar Soares Lucas wrote: >> We recently introduced a way to set the reason why a nmethod was being marked as `not entrant`, see [here](https://github.com/openjdk/jdk/pull/23980) and [here](https://github.com/openjdk/jdk/pull/25338). >> >> This PR is to expose in the JVMCI interface the reason why the nmethod was flagged as `not entrant`. This will allow JVMCI-based compilers to implement heuristics to handle re-compilations differently based on what happened to earlier versions of a method, for instance, this will likely be used to address this [RFE in Truffle](https://github.com/oracle/graal/issues/11045 >> ). >> >> Tested on Linux x86_64, ARM with JTREG tier 1-3. 
> > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Address PR feedback: add comments, refactor enum definition in the Java side. src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/code/InstalledCode.java line 152: > 150: */ > 151: public void invalidate() { > 152: invalidate(true, 0); This assigns `ChangeReason::C1_codepatch` to JVMCI invalidations which does not seem right. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25706#discussion_r2139454968 From dbriemann at openjdk.org Wed Jun 11 08:09:29 2025 From: dbriemann at openjdk.org (David Briemann) Date: Wed, 11 Jun 2025 08:09:29 GMT Subject: RFR: 8359126: [AIX] new test TestImplicitNullChecks.java fails In-Reply-To: References: Message-ID: On Tue, 10 Jun 2025 14:10:04 GMT, Martin Doerr wrote: > Switch off the IR rule which expects no explicit null check for a load operation for AIX. I had to enhance the IR code. Marked as reviewed by dbriemann (Author). ------------- PR Review: https://git.openjdk.org/jdk/pull/25728#pullrequestreview-2916086149 From dnsimon at openjdk.org Wed Jun 11 08:20:28 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 11 Jun 2025 08:20:28 GMT Subject: RFR: 8359064: Expose reason for marking nmethod non-entrant to JVMCI client [v2] In-Reply-To: References: Message-ID: On Wed, 11 Jun 2025 00:05:47 GMT, Cesar Soares Lucas wrote: >> We recently introduced a way to set the reason why a nmethod was being marked as `not entrant`, see [here](https://github.com/openjdk/jdk/pull/23980) and [here](https://github.com/openjdk/jdk/pull/25338). >> >> This PR is to expose in the JVMCI interface the reason why the nmethod was flagged as `not entrant`. 
This will allow JVMCI-based compilers to implement heuristics to handle re-compilations differently based on what happened to earlier versions of a method, for instance, this will likely be used to address this [RFE in Truffle](https://github.com/oracle/graal/issues/11045 >> ). >> >> Tested on Linux x86_64, ARM with JTREG tier 1-3. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Address PR feedback: add comments, refactor enum definition in the Java side. While the new ChangeReason JVMCI enum looks nice, I don't quite get how/where it is (or should be) used? It seems disconnected from the `InstalledCode.changeReason` field. In general, I don't find the name ChangeReason quite models the concept properly - wouldn't "InvalidationReason" be more accurate? "change" is a very broad concept. I think all JVMCI Java level tracking of change (or invalidation) reasons should be confined to `HotSpotNmethod` as it doesn't make much sense in the other `InstalledCode` (sub)classes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25706#issuecomment-2961675374 PR Comment: https://git.openjdk.org/jdk/pull/25706#issuecomment-2961676448 PR Comment: https://git.openjdk.org/jdk/pull/25706#issuecomment-2961682020 From duke at openjdk.org Wed Jun 11 08:22:31 2025 From: duke at openjdk.org (erifan) Date: Wed, 11 Jun 2025 08:22:31 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v8] In-Reply-To: References: Message-ID: On Wed, 11 Jun 2025 05:31:00 GMT, Emanuel Peter wrote: > You are checking IRNode.XOR_VL, "= 0". But you are comparing floats. Does that make sense? The bottom types of `float` and `double` vector masks are casted to `int` and `long`. Seems this is by design? So this is correct. As for `control test`, yes for now I didn't add any such kind of test, because I personally think it's unnecessary. 
For specific code, it won't trigger this optimization now, but new optimizations in the future may cause this test to fail. Anyway I've tested the case where `vectormaskcmp` is multi used locally, and this optimization won't be triggered. Do you think it's necessary to add such a control test? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2139512969 From mdoerr at openjdk.org Wed Jun 11 08:31:40 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 11 Jun 2025 08:31:40 GMT Subject: RFR: 8359126: [AIX] new test TestImplicitNullChecks.java fails In-Reply-To: References: Message-ID: On Tue, 10 Jun 2025 14:10:04 GMT, Martin Doerr wrote: > Switch off the IR rule which expects no explicit null check for a load operation for AIX. I had to enhance the IR code. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25728#issuecomment-2961707666 From mdoerr at openjdk.org Wed Jun 11 08:31:41 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 11 Jun 2025 08:31:41 GMT Subject: Integrated: 8359126: [AIX] new test TestImplicitNullChecks.java fails In-Reply-To: References: Message-ID: <7oZA7xU1gMql5X9gjvBMC2PGgdHdvO4sdbjbBikx580=.141f40c2-e744-480c-bef7-559a06056f27@github.com> On Tue, 10 Jun 2025 14:10:04 GMT, Martin Doerr wrote: > Switch off the IR rule which expects no explicit null check for a load operation for AIX. I had to enhance the IR code. This pull request has now been integrated. 
Changeset: abc76c6b Author: Martin Doerr URL: https://git.openjdk.org/jdk/commit/abc76c6b5b3e2eabd3fd3ceb96ffe02979dc8956 Stats: 8 lines in 2 files changed: 6 ins; 0 del; 2 mod 8359126: [AIX] new test TestImplicitNullChecks.java fails Reviewed-by: rcastanedalo, dbriemann ------------- PR: https://git.openjdk.org/jdk/pull/25728 From epeter at openjdk.org Wed Jun 11 08:33:32 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 11 Jun 2025 08:33:32 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v8] In-Reply-To: References: Message-ID: On Wed, 11 Jun 2025 08:20:20 GMT, erifan wrote: > > You are checking IRNode.XOR_VL, "= 0". But you are comparing floats. Does that make sense? > The bottom types of float and double vector masks are casted to int and long. Seems this is by design? So this is correct. This is a `float` test. What is the bottom type for the mask here? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2139534933 From mli at openjdk.org Wed Jun 11 08:34:30 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 11 Jun 2025 08:34:30 GMT Subject: RFR: 8358892: RISC-V: jvm crash when running dacapo sunflow after JDK-8352504 [v3] In-Reply-To: References: <9TAfBWFOsPTUIJPaUX9xcPqVsXLI6lsQBvcomuVqcQI=.1eb3ec5a-59ac-44ce-a46b-b984dda4605e@github.com> Message-ID: On Tue, 10 Jun 2025 16:13:44 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> >> Thanks! >> >> Currently, this issue is only reproducible with Dacapo sunflow. >> I tried to construct a simpler jtreg test to reproduce the issue, but can not find a way to do it till now, this task is tracked by https://bugs.openjdk.org/browse/JDK-8359045. >> >> So, currently I can only verify the code by reviewing it. >> Or maybe it's better to leave it until we find the test? 
> > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > typo > Hi, > > > We need a new implementation for BoolTest::ge/gt, because of NaN cases. It's done. > > Hmm, I don't understand why your first commit ([e5b06b5](https://github.com/openjdk/jdk/commit/e5b06b5621b01dbb557fc21881307c9131f4bb53)) won't work for NaN cases. I think the reason is that the original assumption about the behaviour of cmov_cmp_fp_ge/gt was not right. > Do you have more details or maybe a small test case to demo your concern? You can see failures when running the new tests if you revert to the first commit of the implementation. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25696#issuecomment-2961723788 From duke at openjdk.org Wed Jun 11 08:39:29 2025 From: duke at openjdk.org (duke) Date: Wed, 11 Jun 2025 08:39:29 GMT Subject: RFR: 8356751: IGV: clean up redundant field _should_send_method In-Reply-To: References: Message-ID: <-90Q6B4SfXB9PbOMiBV8oY8J5TgBUcsOLYupXbZZ6F4=.82005755-c66d-469b-a833-dfa7a9e87810@github.com> On Tue, 10 Jun 2025 08:58:15 GMT, Benoît Maillard wrote: > This PR removes private field `IdealGraphPrinter::_should_send_method` (as it was only ever assigned `true`) and all usages in conditions. > > ### Testing > > - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8356751) > - [x] tier1, and some Oracle internal testing > - [x] Ran IdealGraphVisualizer and made sure that no crash occurred > > Shout out to @mhaessig for introducing me to IGV > > Thanks! @benoitmaillard Your change (at version 2a1ac442211befbbc15b0849a54f4804704a9bbd) is now ready to be sponsored by a Committer.
------------- PR Comment: https://git.openjdk.org/jdk/pull/25714#issuecomment-2961738947 From xgong at openjdk.org Wed Jun 11 08:56:39 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 11 Jun 2025 08:56:39 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v8] In-Reply-To: References: Message-ID: On Fri, 6 Jun 2025 10:38:11 GMT, erifan wrote: >> This patch optimizes the following patterns: >> For integer types: >> >> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) >> => (VectorMaskCmp src1 src2 ncond) >> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) >> => (VectorMaskCmp src1 src2 ncond) >> >> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the negative comparison of cond. >> >> For float and double types: >> >> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> >> cond can be eq or ne. 
>> >> Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option `-XX:UseSVE=2`:
>>
>> Benchmark                   Unit   Before Score  Error        After Score  Error        Uplift
>> testCompareEQMaskNotByte    ops/s  7912127.225   2677.289518  10266136.26  8955.008548  1.29
>> testCompareEQMaskNotDouble  ops/s  884737.6799   446.963779   1179760.772  448.031844   1.33
>> testCompareEQMaskNotFloat   ops/s  1765045.787   682.332214   2359520.803  896.305743   1.33
>> testCompareEQMaskNotInt     ops/s  1787221.411   977.743935   2353952.519  960.069976   1.31
>> testCompareEQMaskNotLong    ops/s  895297.1974   673.44808    1178449.02   323.804205   1.31
>> testCompareEQMaskNotShort   ops/s  3339987.002   3415.2226    4712761.965  2110.862053  1.41
>> testCompareGEMaskNotByte    ops/s  7907615.16    4094.243652  10251646.9   9486.699831  1.29
>> testCompareGEMaskNotInt     ops/s  1683738.958   4233.813092  2352855.205  1251.952546  1.39
>> testCompareGEMaskNotLong    ops/s  854496.1561   8594.598885  1177811.493  521.1229     1.37
>> testCompareGEMaskNotShort   ops/s  3341860.309   1578.975338  4714008.434  1681.10365   1.41
>> testCompareGTMaskNotByte    ops/s  7910823.674   2993.367032  10245063.58  9774.75138   1.29
>> testCompareGTMaskNotInt     ops/s  1673393.928   3153.099431  2353654.521  1190.848583  1.4
>> testCompareGTMaskNotLong    ops/s  849405.9159   2432.858159  1177952.041  359.96413    1.38
>> testCompareGTMaskNotShort   ops/s  3339509.141   3339.976585  4711442.496  2673.364893  1.41
>> testCompareLEMaskNotByte    ops/s  7911340.004   3114.69191   10231626.5   27134.20035  1.29
>> testCompareLEMaskNotInt     ops/s  1675812.113   1340.969885  2353255.341  1452.4522    1.4
>> testCompareLEMaskNotLong    ops/s  848862.8036   6564.841731  1177763.623  539.290106   1.38
>> testCompareLEMaskNotShort   ops/s  3324951.54    2380.29473   4712116.251  1544.559684  1.41
>> testCompareLTMaskNotByte    ops/s  7910390.844   2630.861436  10239567.69  6487.441672  1.29
>> testCompareLTMaskNotInt     ops/s  16721...
> > erifan has updated the pull request incrementally with one additional commit since the last revision: > > Support negating unsigned comparison for BoolTest::mask > > Added a static method `negate_mask(mask btm)` into BoolTest class to > negate both signed and unsigned comparison. src/hotspot/share/opto/vectornode.cpp line 2221: > 2219: // XorV/XorVMask is commutative, swap VectorMaskCmp/VectorMaskCast to in1. > 2220: if (in2->Opcode() == Op_VectorMaskCmp || > 2221: (in2->Opcode() == Op_VectorMaskCast && in2->in(1)->Opcode() == Op_VectorMaskCmp)) { We may need to consider cases that a `VectorMaskCast` is generated between `compare + not`, such as `compare + cast + not`. For such cases, the element size maybe different for input and output of a `cast`. Although this patch's intention is not for the latter pattern, current change have also covered it well. Could you please add more test/jmh for all kinds of `cast` pattern here? And I think the scope of this PR could be also extended to `compare + cast + not`. WDYT? 
src/hotspot/share/opto/vectornode.cpp line 2237: > 2235: !VectorNode::is_all_ones_vector(in2)) { > 2236: return nullptr; > 2237: } This part can be refined more clearly:

    // Swap and put all_ones_vector to right
    if (!VectorNode::is_all_ones_vector(in1)) {
      swap(in1, in2);
    }
    // uncast mask
    bool need_cast = false;
    if (in1->Opcode() == Op_VectorMaskCast && in1->outcnt() == 1) {
      assert(in1->bottom_type()->eq(bottom_type()), "");
      need_cast = true;
      in1 = in1->in(1);
    }
    // Check mask cmp pattern
    if (in1->Opcode() != Op_VectorMaskCmp ||
        in1->outcnt() > 1 ||
        !in1->as_VectorMaskCmp()->predicate_can_be_negated()) {
      return nullptr;
    }
    // Convert VectorMaskCmp + not
    // Cast back
    if (need_cast) {
      res = new VectorMaskCastNode(phase->transform(res), vect_type());
    }

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2139568985 PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2139578186 From duke at openjdk.org Wed Jun 11 08:56:40 2025 From: duke at openjdk.org (erifan) Date: Wed, 11 Jun 2025 08:56:40 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v8] In-Reply-To: References: Message-ID: <9EOB4uZ1q0v_9xw_gUslYls90Np2WK7K6cOgJ8KKYBQ=.91ec53d2-912f-4832-8570-7edf5c50b65f@github.com> On Wed, 11 Jun 2025 08:30:51 GMT, Emanuel Peter wrote: >>> You are checking IRNode.XOR_VL, "= 0". But you are comparing floats. Does that make sense? >> >> The bottom types of `float` and `double` vector masks are casted to `int` and `long`. Seems this is by design? So this is correct. >> >> As for `control test`, yes for now I didn't add any such kind of test, because I personally think it's unnecessary. For specific code, it won't trigger this optimization now, but new optimizations in the future may cause this test to fail. >> >> Anyway I've tested the case where `vectormaskcmp` is multi used locally, and this optimization won't be triggered. >> >> Do you think it's necessary to add such a control test?
> >> > You are checking IRNode.XOR_VL, "= 0". But you are comparing floats. Does that make sense? > >> The bottom types of float and double vector masks are casted to int and long. Seems this is by design? So this is correct. > > This is a `float` test. What is the bottom type for the mask here? Oh, this is a stupid copy-paste mistake. Good catch, thanks! I'll double check them all. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2139582391 From mli at openjdk.org Wed Jun 11 09:09:25 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 11 Jun 2025 09:09:25 GMT Subject: RFR: 8358892: RISC-V: jvm crash when running dacapo sunflow after JDK-8352504 [v4] In-Reply-To: <9TAfBWFOsPTUIJPaUX9xcPqVsXLI6lsQBvcomuVqcQI=.1eb3ec5a-59ac-44ce-a46b-b984dda4605e@github.com> References: <9TAfBWFOsPTUIJPaUX9xcPqVsXLI6lsQBvcomuVqcQI=.1eb3ec5a-59ac-44ce-a46b-b984dda4605e@github.com> Message-ID: > Hi, > Can you help to review this patch? > > Thanks! > > Currently, this issue is only reproducible with Dacapo sunflow. > I tried to construct a simpler jtreg test to reproduce the issue, but can not find a way to do it till now, this task is tracked by https://bugs.openjdk.org/browse/JDK-8359045. > > So, currently I can only verify the code by reviewing it. > Or maybe it's better to leave it until we find the test? 
Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: debug ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25696/files - new: https://git.openjdk.org/jdk/pull/25696/files/7da631b8..73e81209 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25696&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25696&range=02-03 Stats: 21 lines in 1 file changed: 1 ins; 8 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/25696.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25696/head:pull/25696 PR: https://git.openjdk.org/jdk/pull/25696 From duke at openjdk.org Wed Jun 11 09:09:33 2025 From: duke at openjdk.org (erifan) Date: Wed, 11 Jun 2025 09:09:33 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v8] In-Reply-To: References: Message-ID: <-Njau47iLgFUOIQEnZDrkzZKIjPEk3ErFIenrs6AelM=.2624ea9a-6b6c-4a07-85e4-2fa7334754dd@github.com> On Wed, 11 Jun 2025 07:43:55 GMT, Emanuel Peter wrote: >> `VectorOperators.XXX` is not compile time constants, we can't use `switch` here. > > Oh. Ok. Well at least add a `RuntimeException` to an `else` branch then, I would suggest :) Make sense! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2139610469 From duke at openjdk.org Wed Jun 11 09:12:33 2025 From: duke at openjdk.org (erifan) Date: Wed, 11 Jun 2025 09:12:33 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v8] In-Reply-To: References: Message-ID: On Wed, 11 Jun 2025 08:47:56 GMT, Xiaohong Gong wrote: >> erifan has updated the pull request incrementally with one additional commit since the last revision: >> >> Support negating unsigned comparison for BoolTest::mask >> >> Added a static method `negate_mask(mask btm)` into BoolTest class to >> negate both signed and unsigned comparison. 
> > src/hotspot/share/opto/vectornode.cpp line 2221: > >> 2219: // XorV/XorVMask is commutative, swap VectorMaskCmp/VectorMaskCast to in1. >> 2220: if (in2->Opcode() == Op_VectorMaskCmp || >> 2221: (in2->Opcode() == Op_VectorMaskCast && in2->in(1)->Opcode() == Op_VectorMaskCmp)) { > > We may need to consider cases that a `VectorMaskCast` is generated between `compare + not`, such as `compare + cast + not`. For such cases, the element size maybe different for input and output of a `cast`. Although this patch's intention is not for the latter pattern, current change have also covered it well. Could you please add more test/jmh for all kinds of `cast` pattern here? And I think the scope of this PR could be also extended to `compare + cast + not`. WDYT? Good catch, I'll add more tests and check the correctness. Thanks~ ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2139615887 From mli at openjdk.org Wed Jun 11 09:23:20 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 11 Jun 2025 09:23:20 GMT Subject: RFR: 8358892: RISC-V: jvm crash when running dacapo sunflow after JDK-8352504 [v5] In-Reply-To: <9TAfBWFOsPTUIJPaUX9xcPqVsXLI6lsQBvcomuVqcQI=.1eb3ec5a-59ac-44ce-a46b-b984dda4605e@github.com> References: <9TAfBWFOsPTUIJPaUX9xcPqVsXLI6lsQBvcomuVqcQI=.1eb3ec5a-59ac-44ce-a46b-b984dda4605e@github.com> Message-ID: > Hi, > Can you help to review this patch? > > Thanks! > > Currently, this issue is only reproducible with Dacapo sunflow. > I tried to construct a simpler jtreg test to reproduce the issue, but can not find a way to do it till now, this task is tracked by https://bugs.openjdk.org/browse/JDK-8359045. > > So, currently I can only verify the code by reviewing it. > Or maybe it's better to leave it until we find the test? 
Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: golden values ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25696/files - new: https://git.openjdk.org/jdk/pull/25696/files/73e81209..423cf1cc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25696&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25696&range=03-04 Stats: 40 lines in 1 file changed: 32 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/25696.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25696/head:pull/25696 PR: https://git.openjdk.org/jdk/pull/25696 From snatarajan at openjdk.org Wed Jun 11 09:48:19 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Wed, 11 Jun 2025 09:48:19 GMT Subject: RFR: 8325478: Restructure the macro expansion compiler phase to not include macro elimination [v2] In-Reply-To: References: Message-ID: <_jMbdLzfV1MOFrUJH7J6-zXWLabQTOTsfb2hWvEL3Kc=.fede59e2-dd81-4128-9b7a-b8c47334a062@github.com> > This changeset restructures the macro expansion phase to not include macro elimination and also adds a flag StressMacroElimination which randomizes macro elimination ordering for stress testing purposes. > > Changes: > - Implemented a method `eliminate_opaque_looplimit_macro_nodes` that removes the functionality for eliminating Opaque and LoopLimit nodes from the `expand_macro_nodes ` method. > - Introduced compiler phases` PHASE_AFTER_MACRO_ELIMINATION` > - Added a new Ideal phase for individual macro elimination steps. > - Implemented the flag `StressMacroElimination`. Added functionality tests for `StressMacroElimination`, similar to previous stress flag `StressMacroExpansion` ([JDK-8317349](https://bugs.openjdk.org/browse/JDK-8317349)). > > Below is a sample screenshot (IGV print level ) mainly showing the new phase . 
> ![image](https://github.com/user-attachments/assets/16013cd4-6ec6-4939-ac66-33bb03d59cd6) > > Questions to reviewers: > - Is the new macro elimination phase OK, or should we change anything? > - In `compile.cpp`, `PHASE_ITER_GVN_AFTER_ELIMINATION` follows `PHASE_AFTER_MACRO_ELIMINATION` in the current fix. Should `PHASE_ITER_GVN_AFTER_ELIMINATION` be removed? > > Testing: > GitHub Actions > tier1 to tier5 on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. > Tested that thousands of graphs are correctly opened and visualized with IGV using the same test used in ([JDK-8317349](https://bugs.openjdk.org/browse/JDK-8317349)) Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: addressing review on code style and adding failing ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25682/files - new: https://git.openjdk.org/jdk/pull/25682/files/0a8d1375..aacb4245 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25682&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25682&range=00-01 Stats: 5 lines in 2 files changed: 4 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25682.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25682/head:pull/25682 PR: https://git.openjdk.org/jdk/pull/25682 From jbhateja at openjdk.org Wed Jun 11 10:22:13 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 11 Jun 2025 10:22:13 GMT Subject: RFR: 8351645: C2: ExpandBitsNode::Ideal hits assert because of TOP input [v2] In-Reply-To: References: Message-ID: > Bugfix patch adds missing safe type access checks in Expand/Compress Ideal transforms. Problem occurs during IGVN cleanups after partial peeling of loop. > > Test mentioned in the bug report has been included along with the patch. > > Kindly review. > > Best Regards, > Jatin Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed.
The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: 8351645: C2: ExpandBitsNode::Ideal hits assert because of TOP input ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25586/files - new: https://git.openjdk.org/jdk/pull/25586/files/abf83f84..821be711 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25586&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25586&range=00-01 Stats: 201 lines in 3 files changed: 129 ins; 70 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25586.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25586/head:pull/25586 PR: https://git.openjdk.org/jdk/pull/25586 From jbhateja at openjdk.org Wed Jun 11 10:22:13 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 11 Jun 2025 10:22:13 GMT Subject: RFR: 8351645: C2: ExpandBitsNode::Ideal hits assert because of TOP input In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 11:53:23 GMT, Jatin Bhateja wrote: > Bugfix patch adds missing safe type access checks in Expand/Compress Ideal transforms. Problem occurs during IGVN cleanups after partial peeling of loop. > > Test mentioned in the bug report has been included along with the patch. > > Kindly review. > > Best Regards, > Jatin Root Cause: Problem occurs during IGVN cleanup after partial peeling. Partial peeling rotates the loop by bringing out the peel section and creating a new loop which begins with the non-peel section, followed by the peel section loop back. To perform this translation, the compiler begins by cloning the original loop, brings the peel section into the new loop header, re-wires the new header to point to the start of the non-peel block (cut-point) of new loop and then stitches peel section of the cloned loop after non-peel section thereby rotating the original loop.
Since the peel section is the only usable part of the cloned loop, the remaining part of the loop is wiped out by the GVN cleanup. In this case, when the cleanup reaches the ExpandBits/CompressBits idealization, it hits an unsafe use (an is_* call) of the mask input, which was tied to the TOP node, and this results in an assertion failure. To fix the problem, this PR adds safe isa_* calls before the unsafe accesses. With default options, the problem only occurs with Long Expand/CompressBits because for integer variants, nodes get picked up in a different order from the IGVN worklist; we can use -XX:+StressIGVN to reproduce issues with integral variants. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25586#issuecomment-2961931670 From jbhateja at openjdk.org Wed Jun 11 10:28:31 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 11 Jun 2025 10:28:31 GMT Subject: RFR: 8357982: Fix several failing BMI tests with -XX:+UseAPX [v2] In-Reply-To: References: <8mE0O0QjyMJMK7UWtfMiFc5ZjIxFYqVNUeu0qYbzaz8=.75e13abf-a2c9-407b-898d-1174a85a06cf@github.com> <7fnB8ubV2eSkP88UkrVQ6qmNZcomS5Zby6mAUukJP4Y=.c8048c62-404b-4985-a614-9637f3fd03e9@github.com> Message-ID: On Tue, 10 Jun 2025 16:07:16 GMT, Emanuel Peter wrote: >> Hi @eme64, please initiate your test runs, we can have a second review from @sviswa7 once she is online. > > @jatin-bhateja @sviswa7 Running testing now. Hi @eme64 , let us know if this is good to land.
------------- PR Comment: https://git.openjdk.org/jdk/pull/25501#issuecomment-2962110015 From duke at openjdk.org Wed Jun 11 11:11:37 2025 From: duke at openjdk.org (Benoît Maillard) Date: Wed, 11 Jun 2025 11:11:37 GMT Subject: Integrated: 8356751: IGV: clean up redundant field _should_send_method In-Reply-To: References: Message-ID: <6zQMNc17zKfjrHF_UQIvXF1VFc2L3EokucSkoE4QBWc=.b3b8fd7b-5928-4799-8cb1-8408c30c4a89@github.com> On Tue, 10 Jun 2025 08:58:15 GMT, Benoît Maillard wrote: > This PR removes private field `IdealGraphPrinter::_should_send_method` (as it was only ever assigned `true`) and all usages in conditions. > > ### Testing > > - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8356751) > - [x] tier1, and some Oracle internal testing > - [x] Ran IdealGraphVisualizer and made sure that no crash occurred > > Shout out to @mhaessig for introducing me to IGV > > Thanks! This pull request has now been integrated. Changeset: bf7d40d0 Author: Benoît Maillard Committer: Damon Fenacci URL: https://git.openjdk.org/jdk/commit/bf7d40d0486b7b4e4820bb5d08a63c446ea3291d Stats: 11 lines in 2 files changed: 0 ins; 5 del; 6 mod 8356751: IGV: clean up redundant field _should_send_method Co-authored-by: Manuel Hässig Reviewed-by: mhaessig, thartmann, dfenacci ------------- PR: https://git.openjdk.org/jdk/pull/25714 From rcastanedalo at openjdk.org Wed Jun 11 11:14:31 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 11 Jun 2025 11:14:31 GMT Subject: RFR: 8275202: C2: optimize out more redundant conditions [v3] In-Reply-To: References: <978cgwy3Nb_x7yU6jZz0f6zhTBZfphstisAkBf1Vktc=.283d06eb-4f79-40cf-b8dd-a9c230e59902@github.com> Message-ID: On Tue, 10 Jun 2025 14:59:05 GMT, Roland Westrelin wrote: > Thanks for the reports but I can't reproduce any of them. Are there extra options that need to be passed to the JVM?
You are right, here is a more detailed report:

make run-test CONF=linux-x64-debug TEST="compiler/predicates/assertion/TestOpaqueInitializedAssertionPredicateNode.java" TEST_VM_OPTS="-XX:StressLongCountedLoop=2000000"
make run-test CONF=linux-x64-debug TEST="compiler/predicates/TestHoistedPredicateForNonRangeCheck.java" TEST_VM_OPTS="-XX:-TieredCompilation -XX:VerifyIterativeGVN=10"
make run-test CONF=linux-x64-debug TEST="compiler/loopconditionalpropagation/TestLoopConditionalPropagation.java" TEST_VM_OPTS="-XX:-TieredCompilation"
make run-test CONF=linux-x64-debug TEST="compiler/codegen/aes/TestCipherBlockChainingEncrypt.java" TEST_VM_OPTS="-XX:-TieredCompilation -XX:VerifyIterativeGVN=10"
make run-test CONF=linux-x64-debug TEST="compiler/predicates/TestHoistedPredicateForNonRangeCheck.java" TEST_VM_OPTS="-XX:-TieredCompilation -XX:VerifyIterativeGVN=10"

Hope that helps! ------------- PR Comment: https://git.openjdk.org/jdk/pull/14586#issuecomment-2962274522 From rcastanedalo at openjdk.org Wed Jun 11 11:33:33 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 11 Jun 2025 11:33:33 GMT Subject: RFR: 8356708: C2: loop strip mining expansion doesn't take sunk stores into account In-Reply-To: <4qen78-Aon9foRJSduZFQ47DrCMPUjuMF7MQj_uk7jI=.e7b5e0b7-04f4-4d5a-a224-a1c78ff89c09@github.com> References: <4qen78-Aon9foRJSduZFQ47DrCMPUjuMF7MQj_uk7jI=.e7b5e0b7-04f4-4d5a-a224-a1c78ff89c09@github.com> Message-ID: On Tue, 10 Jun 2025 18:51:49 GMT, Roberto Castañeda Lozano wrote: > Thanks for working on this, Roland. I just submitted some testing, will come back with the results in a day or two. Test results (tier1-5 in Oracle's internal test system) look good.
------------- PR Comment: https://git.openjdk.org/jdk/pull/25717#issuecomment-2962326109 From epeter at openjdk.org Wed Jun 11 12:09:33 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 11 Jun 2025 12:09:33 GMT Subject: RFR: 8357982: Fix several failing BMI tests with -XX:+UseAPX [v8] In-Reply-To: References: Message-ID: On Fri, 6 Jun 2025 07:46:32 GMT, Jatin Bhateja wrote: >> A) Patch extends the following tests with hard-coded encoding checks for various BMI instructions to cover REX2 or extended EVEX encodings supported by APX. >> >> >> compiler/intrinsics/bmi/verifycode/AndnTestI.java >> compiler/intrinsics/bmi/verifycode/AndnTestL.java >> compiler/intrinsics/bmi/verifycode/BzhiTestI2L.java >> compiler/intrinsics/bmi/verifycode/LZcntTestL.java >> compiler/intrinsics/bmi/verifycode/TZcntTestL.java >> >> >> B) After integration of JDK-8349582, which added APX NDD support, AndN instruction selection patterns that expect (Xor SRC, -1) as one of its operands were not getting selected because of a lower-cost generic immediate pattern match; patch fixes this issue through strict predicate checks. >> >> Above tests are now passing, validations were carried out using Intel Software Development emulator. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolutions @jatin-bhateja Testing passed :) I'll leave the responsibility to you to test on all sorts of x86 architectures, especially APX. I see you said that you tested this with SDE in the PR description, please make sure you have the latest changes tested too before integration. Thanks for the work @jatin-bhateja ------------- Marked as reviewed by epeter (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/25501#pullrequestreview-2916866897 From epeter at openjdk.org Wed Jun 11 12:16:33 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 11 Jun 2025 12:16:33 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v4] In-Reply-To: References: <96Ny_BPjRCbNlD14DNDUOuQ0IX-F8hx21gxQKVfim9M=.d502019a-27ed-4a35-81ef-bc2aec5e7557@github.com> <_IhK2U23lIUOtBKOt-WMxQ3L7b2t26RzclJRdqbIgms=.3ef9a630-f99c-4de7-994a-bcabf912230b@github.com> <9ABhENoZtR76wmsgRmzeEceDvCvoflfCcbDbK8H2rso=.e351f63f-1331-4e2e-8a02-763a8c0c4f70@github.com> Message-ID: On Mon, 9 Jun 2025 07:15:46 GMT, Kuai Wei wrote: >> @kuaiwei Thanks for your reply! >> >>> I think it's more easy to mark the combine operator checked. >> >> It may seem easier now. But over time, if multiple operations had such flags, things would become very messy. And now every node that can be such a `combine operator` has to have an additional flag, and consumes more memory. >> >>> I tried to use match pattern for MergePrimitiveLoads::has_no_merge_load_combine_below() . But I think it has some difficulty. For mergeable operators, they can be linked in different way, like: >>> (((item1 Or item2) Or item3) Or item4) >>> ((item1 Or item2) Or (item3 Or item4)) >>> ... >> >> Yes, we may have to deal with inputs being permuted. But I think we should be able to deal with the permutations, we do that in other places too. >> >>> To check the next Or operator is a valid last one of combine operator chain. We may check its all input recursively. I didn't find a good way to revolve it. If you have better idea, I will check it. >> >> I'm not sure I understood what you said here. >> >>> We may check its all input recursively >> >> You probably mean we could check all outputs? >> >> So if you are looking at the `OrINode`, and the pattern above it is already a `MergeLoad` pattern, then we should also look down, and see if we find other `OrINode`. 
For each of these output nodes, we should check if their other input could also be merged with what we already have. Do you not think this is possible? What exactly makes it difficult or impossible? > > @eme64 From your description, I may change like below. Could you check if I understand correct? Thanks. > When IGVN check the input combine operator, called it as `_checked`. We can go down the combine operators chain to find the _last one. > > for op from _last to _checked: // _checked is not include > collect merge_info_list by op > if it can be merged and _checked is in the list: > return // it will be merged when IGVN optimize this op > if can not merge or _checked is not in list: > continue; > // all successors of combine operators are checked, we can start to merge with _checked > ... > > I think it can work but there are some redundant `collect and check` work. And we can add a cache in IGVN to reduce it. Do you have other suggestion about it ? @kuaiwei I'm struggling to follow your pseudocode, can you please expand a little or try to describe in other words? It is good to reduce redundant work, but it has to be worth the complexity. But I'm skeptical of caching things between IGVN optimizations, it is just not something we do, as far as I know. Caching could also be tricky when the cached things are not accurate any more. What exactly would be your approach with caching? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24023#issuecomment-2962451732 From dbriemann at openjdk.org Wed Jun 11 12:35:03 2025 From: dbriemann at openjdk.org (David Briemann) Date: Wed, 11 Jun 2025 12:35:03 GMT Subject: RFR: 8359232: [PPC64] C2: Clean up ppc.ad: add instr sizes, remove comments Message-ID: Add missing sizes for some instructions. Clean up outdated Power7 comments. 
------------- Commit messages: - 8359232: [PPC64] C2: Clean up ppc.ad: add instr sizes, remove comments Changes: https://git.openjdk.org/jdk/pull/25752/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25752&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8359232 Stats: 30 lines in 1 file changed: 10 ins; 20 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25752.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25752/head:pull/25752 PR: https://git.openjdk.org/jdk/pull/25752 From dbriemann at openjdk.org Wed Jun 11 12:40:05 2025 From: dbriemann at openjdk.org (David Briemann) Date: Wed, 11 Jun 2025 12:40:05 GMT Subject: RFR: 8359232: [PPC64] C2: Clean up ppc.ad: add instr sizes, remove comments [v2] In-Reply-To: References: Message-ID: > Add missing sizes for some instructions. > Clean up outdated Power7 comments. David Briemann has updated the pull request incrementally with one additional commit since the last revision: re-add deleted token ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25752/files - new: https://git.openjdk.org/jdk/pull/25752/files/b30892be..315a7da5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25752&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25752&range=00-01 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25752.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25752/head:pull/25752 PR: https://git.openjdk.org/jdk/pull/25752 From mli at openjdk.org Wed Jun 11 12:50:14 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 11 Jun 2025 12:50:14 GMT Subject: RFR: 8358892: RISC-V: jvm crash when running dacapo sunflow after JDK-8352504 [v6] In-Reply-To: <9TAfBWFOsPTUIJPaUX9xcPqVsXLI6lsQBvcomuVqcQI=.1eb3ec5a-59ac-44ce-a46b-b984dda4605e@github.com> References: <9TAfBWFOsPTUIJPaUX9xcPqVsXLI6lsQBvcomuVqcQI=.1eb3ec5a-59ac-44ce-a46b-b984dda4605e@github.com> Message-ID: 
<1oKHolQiymAnc5OKC1RcR9fDlL9c_F0zW6gHJ3pQKWI=.6e5f69a6-f474-415d-8398-e5e2952d985e@github.com> > Hi, > Can you help to review this patch? > > Thanks! > > Currently, this issue is only reproducible with Dacapo sunflow. > I tried to construct a simpler jtreg test to reproduce the issue, but can not find a way to do it till now, this task is tracked by https://bugs.openjdk.org/browse/JDK-8359045. > > So, currently I can only verify the code by reviewing it. > Or maybe it's better to leave it until we find the test? Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: adjust arguments orders ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25696/files - new: https://git.openjdk.org/jdk/pull/25696/files/423cf1cc..9848cc28 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25696&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25696&range=04-05 Stats: 10 lines in 2 files changed: 0 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/25696.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25696/head:pull/25696 PR: https://git.openjdk.org/jdk/pull/25696 From dfenacci at openjdk.org Wed Jun 11 12:52:30 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 11 Jun 2025 12:52:30 GMT Subject: RFR: 8358756: [s390x] Test StartupOutput.java crash due to CodeCache size In-Reply-To: References: Message-ID: On Wed, 11 Jun 2025 04:33:30 GMT, Amit Kumar wrote: > There isn't enough initial cache present which can let the interpreter mode run freely. So before even we reach to the compiler phase and try to bail out, in case there isn't enough space left for the stub compilation, JVM crashes. Idea is to increase the Initial cache size and make it enough to run interpreter mode at least. Thanks @offamitkumar. 
The idea behind the [PR](https://github.com/openjdk/jdk/pull/23630) that changed this is that it would check randomly around the amount of code cache that would be just enough for the compilers to start (or not). So, before that PR it would sometimes crash instead of terminating gently. Does adding `800k` to the initial code cache for s390 do that? Did you try before that [PR](https://github.com/openjdk/jdk/pull/23630) (or temporarily reverting it) to see if it crashes? test/hotspot/jtreg/compiler/startup/StartupOutput.java line 66: > 64: Process[] pr = new Process[200]; > 65: for (int i = 0; i < 200; i++) { > 66: int initialCodeCacheSizeInKb = 800 + rand.nextInt(400) + (Platform.isS390x() ? 800 : 0); It is just a stylistic issue but I'd rather adapt the first constant (`800`) depending on the platform with something like: `int initialCodeCacheSizeInKb = minInitialSize + rand.nextInt(400);` and `minInitialSize` set depending on the platform before the loop. ------------- PR Review: https://git.openjdk.org/jdk/pull/25741#pullrequestreview-2916977944 PR Review Comment: https://git.openjdk.org/jdk/pull/25741#discussion_r2140032847 From mchevalier at openjdk.org Wed Jun 11 13:02:42 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 11 Jun 2025 13:02:42 GMT Subject: RFR: 8359121: C2: Region added by vectorizedMismatch intrinsic can survive as a dead node after IGVN Message-ID: <_wCd8NCYihFG3Sh7XRHPv4tYAQeMgbII-VXkDCNwoNE=.aafa7afa-49fd-48bb-aa87-0fc70a91c7bf@github.com> In `bool LibraryCallKit::inline_vectorizedMismatch()` the region created at: https://github.com/openjdk/jdk/blob/0582bd290d5a8b6344ae7ada36492cc2f33df050/src/hotspot/share/opto/library_call.cpp#L6502 may have only one input, and be a copy (that is no self-loop) and a single input. It is thus safe to remove. Yet, in the reproducer case, the node is short-circuited, but stays in the graph after IGVN. Left, after Parsing/before IGVN; right, after IGVN: On the left, the ?
is there because the Region doesn't have a self loop, which is expected for copies. On the right, it still doesn't have a self-loop, but IGV is also complaining the Region has no successor. This transformation comes from `IfNode::Ideal`, which calls `IfNode::Ideal_common`, which calls `Node::remove_dead_region`, which short-circuits a trivial Region input: https://github.com/openjdk/jdk/blob/0582bd290d5a8b6344ae7ada36492cc2f33df050/src/hotspot/share/opto/node.cpp#L1480-L1484 Yet, the Region node is never enqueued for IGVN, so it stays in the graph. This is not nice because it gives both: - a control node (50 Proj) has 2 successors, - a control node (46 Region) has 0 successors. At the end of IGVN, however, we would expect the graph to be cleaned up. There are a couple of ways to fix this, each sufficient by itself: 1. explicitly record for IGVN the region node in `LibraryCallKit::inline_vectorizedMismatch`, rather than hoping it gets collected as a side effect of something else 2. change `Node::remove_dead_region` to use `set_req_X` instead of `set_req`, so that if the Region goes dead, it will be removed. 3. not introduce the region node in `LibraryCallKit::inline_vectorizedMismatch` if we are going to have only one path, and thus avoid the problem entirely. Solution 3 is not easy and would require quite some code restructuring just to save the removal of one node. After discussing with @chhagedorn, we concluded that solution 1 was probably the best: - usually, functions similar to `inline_vectorizedMismatch` call `record_for_igvn` themselves, - unlike what I assumed at first seeing `remove_dead_region`, it's not a general problem at all: I couldn't find another case without using `inline_vectorizedMismatch` where the Region is put aside, and not entirely disconnected quickly after, but the Region is always processed in the same IGVN when `remove_dead_region` makes it dead. And then, we find that: This was found because it makes a check from JDK-8350864 fail.
My plan is to add this structural invariant check to the test once the flag is integrated, as for now, the test need manual inspection to see a difference. It's also not really possible to write an IR test with it: the Region node is not reachable by under (use to def traversal from Root), so the printout of the graph doesn't show the Region node, even if the `50 Proj` above is indeed printed to have 2 outputs, only the `57 If` is actually printed. ------------- Commit messages: - Add test - Record region and co for IGVN in inline_vectorizedMismatch Changes: https://git.openjdk.org/jdk/pull/25749/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25749&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8359121 Stats: 58 lines in 2 files changed: 58 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25749.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25749/head:pull/25749 PR: https://git.openjdk.org/jdk/pull/25749 From thartmann at openjdk.org Wed Jun 11 13:27:30 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 11 Jun 2025 13:27:30 GMT Subject: RFR: 8359121: C2: Region added by vectorizedMismatch intrinsic can survive as a dead node after IGVN In-Reply-To: <_wCd8NCYihFG3Sh7XRHPv4tYAQeMgbII-VXkDCNwoNE=.aafa7afa-49fd-48bb-aa87-0fc70a91c7bf@github.com> References: <_wCd8NCYihFG3Sh7XRHPv4tYAQeMgbII-VXkDCNwoNE=.aafa7afa-49fd-48bb-aa87-0fc70a91c7bf@github.com> Message-ID: On Wed, 11 Jun 2025 11:35:28 GMT, Marc Chevalier wrote: > In `bool LibraryCallKit::inline_vectorizedMismatch()` the region created at: > > https://github.com/openjdk/jdk/blob/0582bd290d5a8b6344ae7ada36492cc2f33df050/src/hotspot/share/opto/library_call.cpp#L6502 > > may have only one input, and be a copy (that is no self-loop) and a single input. It is thus safe to remove. Yet, in the reproducer case, the node is short-circuited, but stays in the graph after IGVN. > > Left, after Parsing/before IGVN; right, after IGVN: > > On the left, the ? 
is there because the Region doesn't have a self loop, which is expected for copies. On the right, it still doesn't have a self-loop, but IGV is also complaining the Region has no successor. > > This transformation comes from `IfNode::Ideal`, that calls `IfNode::Ideal_common`, that calls `Node::remove_dead_region`, that shortcuts a trivial Region input: > https://github.com/openjdk/jdk/blob/0582bd290d5a8b6344ae7ada36492cc2f33df050/src/hotspot/share/opto/node.cpp#L1480-L1484 > > Yet, the Region node is never enqueued for IGVN, so stays in the graph. This is not nice because it gives both: > - a control node (50 Proj) has 2 successors, > - a control node (46 Region) has 0 successors. > > While at the end of IGVN, we could expect the graph to be cleaned up. There are couple ways of doing, each being enough by itself: > 1. explicitly record for IGVN the region node in `LibraryCallKit::inline_vectorizedMismatch`, and not hoping it would be collected by another consequence > 2. change `Node::remove_dead_region` to use `set_req_X` instead of `set_req`, so that if the Region goes dead, it will be removed. > 3. not introduce the region node in `LibraryCallKit::inline_vectorizedMismatch` if we are going to have only one path, and thus avoid the problem entirely. > > The solution 3. is really not easy and would require quite some code restructuring for simply saving removing a node. After discussing with @chhagedorn, we concluded that the solution 1. was probably the best: > - usually, functions similar to `inline_vectorizedMismatch` call `record_for_igvn` themselves, > - unlike what I assumed at first seeing `remove_dead_region`, it's not a general problem at all: I couldn't find another case without using `inline_vectorizedMismatch` where the Region is put aside, and not entirely disconnected quickly after, but the Regi... Nice analysis Marc. I also prefer solution (1) and the fix looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25749#pullrequestreview-2917146779 From mdoerr at openjdk.org Wed Jun 11 13:29:29 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 11 Jun 2025 13:29:29 GMT Subject: RFR: 8359232: [PPC64] C2: Clean up ppc.ad: add instr sizes, remove comments [v2] In-Reply-To: References: Message-ID: On Wed, 11 Jun 2025 12:40:05 GMT, David Briemann wrote: >> Add missing sizes for some instructions. >> Clean up outdated Power7 comments. > > David Briemann has updated the pull request incrementally with one additional commit since the last revision: > > re-add deleted token LGTM. Thanks for cleaning this up! ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25752#pullrequestreview-2917156712 From chagedorn at openjdk.org Wed Jun 11 13:37:29 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 11 Jun 2025 13:37:29 GMT Subject: RFR: 8359121: C2: Region added by vectorizedMismatch intrinsic can survive as a dead node after IGVN In-Reply-To: <_wCd8NCYihFG3Sh7XRHPv4tYAQeMgbII-VXkDCNwoNE=.aafa7afa-49fd-48bb-aa87-0fc70a91c7bf@github.com> References: <_wCd8NCYihFG3Sh7XRHPv4tYAQeMgbII-VXkDCNwoNE=.aafa7afa-49fd-48bb-aa87-0fc70a91c7bf@github.com> Message-ID: On Wed, 11 Jun 2025 11:35:28 GMT, Marc Chevalier wrote: > In `bool LibraryCallKit::inline_vectorizedMismatch()` the region created at: > > https://github.com/openjdk/jdk/blob/0582bd290d5a8b6344ae7ada36492cc2f33df050/src/hotspot/share/opto/library_call.cpp#L6502 > > may have only one input, and be a copy (that is no self-loop) and a single input. It is thus safe to remove. Yet, in the reproducer case, the node is short-circuited, but stays in the graph after IGVN. > > Left, after Parsing/before IGVN; right, after IGVN: > > On the left, the ? is there because the Region doesn't have a self loop, which is expected for copies. On the right, it still doesn't have a self-loop, but IGV is also complaining the Region has no successor. 
> > This transformation comes from `IfNode::Ideal`, that calls `IfNode::Ideal_common`, that calls `Node::remove_dead_region`, that shortcuts a trivial Region input: > https://github.com/openjdk/jdk/blob/0582bd290d5a8b6344ae7ada36492cc2f33df050/src/hotspot/share/opto/node.cpp#L1480-L1484 > > Yet, the Region node is never enqueued for IGVN, so stays in the graph. This is not nice because it gives both: > - a control node (50 Proj) has 2 successors, > - a control node (46 Region) has 0 successors. > > While at the end of IGVN, we could expect the graph to be cleaned up. There are couple ways of doing, each being enough by itself: > 1. explicitly record for IGVN the region node in `LibraryCallKit::inline_vectorizedMismatch`, and not hoping it would be collected by another consequence > 2. change `Node::remove_dead_region` to use `set_req_X` instead of `set_req`, so that if the Region goes dead, it will be removed. > 3. not introduce the region node in `LibraryCallKit::inline_vectorizedMismatch` if we are going to have only one path, and thus avoid the problem entirely. > > The solution 3. is really not easy and would require quite some code restructuring for simply saving removing a node. After discussing with @chhagedorn, we concluded that the solution 1. was probably the best: > - usually, functions similar to `inline_vectorizedMismatch` call `record_for_igvn` themselves, > - unlike what I assumed at first seeing `remove_dead_region`, it's not a general problem at all: I couldn't find another case without using `inline_vectorizedMismatch` where the Region is put aside, and not entirely disconnected quickly after, but the Regi... Nice analysis and summary! As you've already mentioned in the description, we found solution 1 to be the best fit. ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25749#pullrequestreview-2917191481 From mdoerr at openjdk.org Wed Jun 11 13:37:30 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 11 Jun 2025 13:37:30 GMT Subject: RFR: 8359232: [PPC64] C2: Clean up ppc.ad: add instr sizes, remove comments [v2] In-Reply-To: References: Message-ID: On Wed, 11 Jun 2025 12:40:05 GMT, David Briemann wrote: >> Add missing sizes for some instructions. >> Clean up outdated Power7 comments. > > David Briemann has updated the pull request incrementally with one additional commit since the last revision: > > re-add deleted token I think we can treat it as trivial if nobody else has time for a review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25752#issuecomment-2962739818 From jbhateja at openjdk.org Wed Jun 11 13:47:50 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 11 Jun 2025 13:47:50 GMT Subject: RFR: 8352635: Improve inferencing of Float16 operations with constant inputs [v8] In-Reply-To: <44nVQBYgzCOB2mAB9xtAPvkUcOMJOITA2VjMdDFgm1g=.48266693-48bf-41db-8871-a7dcafe93509@github.com> References: <44nVQBYgzCOB2mAB9xtAPvkUcOMJOITA2VjMdDFgm1g=.48266693-48bf-41db-8871-a7dcafe93509@github.com> Message-ID: <60kkIRL2XznEXyYukVXVOoeixm2iGhoOxAbKJi5X0cY=.0268090e-a0d3-45fb-93f4-94caaf9b8497@github.com> > This is a follow-up PR#22755 to improve Float16 operations inferencing. > > The existing scheme to detect Float16 operations for some operations is based on pattern matching which expects to receive inputs through ConvHF2F IR, this patch extends matching to accept constant floating point inputs within the Float16 value range. 
> > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: removing deoptimization for golden result computation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24179/files - new: https://git.openjdk.org/jdk/pull/24179/files/18fb6dcb..f0f5998e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24179&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24179&range=06-07 Stats: 50 lines in 1 file changed: 19 ins; 16 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/24179.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24179/head:pull/24179 PR: https://git.openjdk.org/jdk/pull/24179 From jbhateja at openjdk.org Wed Jun 11 13:51:34 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 11 Jun 2025 13:51:34 GMT Subject: RFR: 8357982: Fix several failing BMI tests with -XX:+UseAPX [v2] In-Reply-To: References: <8mE0O0QjyMJMK7UWtfMiFc5ZjIxFYqVNUeu0qYbzaz8=.75e13abf-a2c9-407b-898d-1174a85a06cf@github.com> <7fnB8ubV2eSkP88UkrVQ6qmNZcomS5Zby6mAUukJP4Y=.c8048c62-404b-4985-a614-9637f3fd03e9@github.com> Message-ID: On Tue, 10 Jun 2025 16:07:16 GMT, Emanuel Peter wrote: >> Hi @eme64, please initiate your test runs, we can have a second review from @sviswa7 once she is online. > > @jatin-bhateja @sviswa7 Running testing now. Thanks @eme64 and @sviswa7 ------------- PR Comment: https://git.openjdk.org/jdk/pull/25501#issuecomment-2962830224 From jbhateja at openjdk.org Wed Jun 11 13:51:35 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 11 Jun 2025 13:51:35 GMT Subject: Integrated: 8357982: Fix several failing BMI tests with -XX:+UseAPX In-Reply-To: References: Message-ID: On Wed, 28 May 2025 16:21:56 GMT, Jatin Bhateja wrote: > A) Patch extends the following tests with hard-coded encoding checks for various BMI instructions to cover REX2 or extended EVEX encodings supported by APX. 
> > > compiler/intrinsics/bmi/verifycode/AndnTestI.java > compiler/intrinsics/bmi/verifycode/AndnTestL.java > compiler/intrinsics/bmi/verifycode/BzhiTestI2L.java > compiler/intrinsics/bmi/verifycode/LZcntTestL.java > compiler/intrinsics/bmi/verifycode/TZcntTestL.java > > > B) After integration of JDK-8349582, which added APX NDD support, AndN instruction selection patterns that expect (Xor SRC, -1) as one of its operands were not getting selected because of a lower-cost generic immediate pattern match; patch fixes this issue through strict predicate checks. > > Above tests are now passing, validations were carried out using Intel Software Development emulator. > > Kindly review and share your feedback. > > Best Regards, > Jatin This pull request has now been integrated. Changeset: c98dffa1 Author: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/c98dffa186d48c41e76fd3a60e0129a8da60310f Stats: 122 lines in 9 files changed: 108 ins; 0 del; 14 mod 8357982: Fix several failing BMI tests with -XX:+UseAPX Reviewed-by: epeter, sviswanathan ------------- PR: https://git.openjdk.org/jdk/pull/25501 From epeter at openjdk.org Wed Jun 11 14:02:28 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 11 Jun 2025 14:02:28 GMT Subject: RFR: 8351645: C2: ExpandBitsNode::Ideal hits assert because of TOP input In-Reply-To: References: Message-ID: On Wed, 11 Jun 2025 09:32:46 GMT, Jatin Bhateja wrote: >> Bugfix patch adds missing safe type access checks in Expand/Compress Ideal transforms. Problem occues during IGVN cleanups after partial peeling of loop. >> >> Test mentioned in the bug report has been included along with the patch. >> >> Kindly review. >> >> Best Regards, >> Jatin > > Root Cause: Problem occurs during IGVN cleanup after partial peeling. > > Partial peeling rotates the loop by bringing out the peel section and creating a new loop which begins with the non-peel section, followed by the peel section loop back. 
> > To perform this translation, the compiler begins by cloning the original loop, brings the peel section into the new loop header, re-wires the new header to point to the start of the non-peel block (cut-point) of the new loop, and then stitches the peel section of the cloned loop after the non-peel section, thereby rotating the original loop. Since the peel section is the only usable part of the cloned loop, the remaining part of the loop is wiped out by GVN cleanup. > > In this case, when control reaches the ExpandBits/CompressBits idealization during cleanup, it hits an unsafe use (is_* call) of the mask input, which was tied to the TOP node, resulting in an assertion failure. To fix the problem, this PR adds a safe isa_* call before the unsafe accesses. > > With default options, the problem only occurs with Long Expand/CompressBits because for integer variants, nodes get picked up in a different order from the IGVN worklist; we can use -XX:+StressIGVN to reproduce the issue with integral variants. @jatin-bhateja Thanks for adding me as a contributor! I'll run some testing now. Could you change the title to also include `compress`? Suggestion: `C2: handle TOP in Expand/CompressBitsNode::Ideal` ------------- PR Comment: https://git.openjdk.org/jdk/pull/25586#issuecomment-2962924737 From epeter at openjdk.org Wed Jun 11 14:18:36 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 11 Jun 2025 14:18:36 GMT Subject: RFR: 8352635: Improve inferencing of Float16 operations with constant inputs [v8] In-Reply-To: <60kkIRL2XznEXyYukVXVOoeixm2iGhoOxAbKJi5X0cY=.0268090e-a0d3-45fb-93f4-94caaf9b8497@github.com> References: <44nVQBYgzCOB2mAB9xtAPvkUcOMJOITA2VjMdDFgm1g=.48266693-48bf-41db-8871-a7dcafe93509@github.com> <60kkIRL2XznEXyYukVXVOoeixm2iGhoOxAbKJi5X0cY=.0268090e-a0d3-45fb-93f4-94caaf9b8497@github.com> Message-ID: On Wed, 11 Jun 2025 13:47:50 GMT, Jatin Bhateja wrote: >> This is a follow-up PR#22755 to improve Float16 operations inferencing.
>> >> The existing scheme to detect Float16 operations for some operations is based on pattern matching which expects to receive inputs through ConvHF2F IR, this patch extends matching to accept constant floating point inputs within the Float16 value range. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > removing deoptimization for golden result computation Thanks for the improvements @jatin-bhateja nice progress :) src/hotspot/share/opto/convertnode.cpp line 303: > 301: // c. The pattern being matched includes a Float to Float16 conversion after binary > 302: // expression, this downcast will still preserve the significand bits of binary32 NaN. > 303: bool isnan = g_isnan((jdouble)con); Suggestion: bool isnan = g_isnan((jdouble)con); ------------- PR Review: https://git.openjdk.org/jdk/pull/24179#pullrequestreview-2917388125 PR Review Comment: https://git.openjdk.org/jdk/pull/24179#discussion_r2140288522 From epeter at openjdk.org Wed Jun 11 14:18:40 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 11 Jun 2025 14:18:40 GMT Subject: RFR: 8352635: Improve inferencing of Float16 operations with constant inputs [v5] In-Reply-To: References: <44nVQBYgzCOB2mAB9xtAPvkUcOMJOITA2VjMdDFgm1g=.48266693-48bf-41db-8871-a7dcafe93509@github.com> <_IdYz769mq7-kTO802umUJX7Bmaz3Ds4GWLb75lAW8I=.0394a525-2288-407e-9201-7fb6b5f92353@github.com> Message-ID: On Wed, 11 Jun 2025 14:11:18 GMT, Emanuel Peter wrote: >> Right ok. The wording `enforce a pattern match` still does not make sense to me. You can `perform` a `pattern match`, but what does it mean to `enforce` it? Can you rephrase please? > > Suggestion: > Suggestion: > > // 1. conF must lie within Float16 value range, otherwise we would have rounding issues: > // Doing the operation in float32 and then rounding is not the same as > // rounding first and doing the operation in float16. 
Do you have tests where the constant is in float16? Which one? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24179#discussion_r2140312169 From epeter at openjdk.org Wed Jun 11 14:18:40 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 11 Jun 2025 14:18:40 GMT Subject: RFR: 8352635: Improve inferencing of Float16 operations with constant inputs [v5] In-Reply-To: References: <44nVQBYgzCOB2mAB9xtAPvkUcOMJOITA2VjMdDFgm1g=.48266693-48bf-41db-8871-a7dcafe93509@github.com> <_IdYz769mq7-kTO802umUJX7Bmaz3Ds4GWLb75lAW8I=.0394a525-2288-407e-9201-7fb6b5f92353@github.com> Message-ID: On Wed, 11 Jun 2025 14:05:34 GMT, Emanuel Peter wrote: >> import jdk.incubator.vector.*; >> public class verify_rounding { >> public static void check() { >> for (int i = 0; i < 65550; i++) { >> short post_rounding = Float.floatToFloat16(Float.float16ToFloat(Float.floatToFloat16((float)i)) * 2049.0f); >> short pre_rounding = Float16.float16ToRawShortBits(Float16.multiply(Float16.valueOf((float)i), Float16.valueOf((float)2049.0f))); >> if (pre_rounding != post_rounding) { >> System.out.println("Mismatch at val = " + (float)i); >> System.out.println("post_rounding val = " + post_rounding); >> System.out.println("pre_rounding val = " + pre_rounding); >> break; >> } >> } >> } >> >> public static void main(String [] args) { >> check(); >> } >> } >> >> >> CPROMPT>java --add-modules=jdk.incubator.vector -cp . verify_rounding >> WARNING: Using incubator modules: jdk.incubator.vector >> Mismatch at val = 3.0 >> post_rounding val = 28161 >> pre_rounding val = 28160 >> >> Since we intend to infer Float16 IR using patten match, hence it may be incorrect to transform post-rounding pattern to pre-rounding. > > Right ok. The wording `enforce a pattern match` still does not make sense to me. You can `perform` a `pattern match`, but what does it mean to `enforce` it? Can you rephrase please? Suggestion: Suggestion: // 1. 
conF must lie within Float16 value range, otherwise we would have rounding issues: // Doing the operation in float32 and then rounding is not the same as // rounding first and doing the operation in float16. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24179#discussion_r2140303597 From epeter at openjdk.org Wed Jun 11 14:18:39 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 11 Jun 2025 14:18:39 GMT Subject: RFR: 8352635: Improve inferencing of Float16 operations with constant inputs [v5] In-Reply-To: <_IdYz769mq7-kTO802umUJX7Bmaz3Ds4GWLb75lAW8I=.0394a525-2288-407e-9201-7fb6b5f92353@github.com> References: <44nVQBYgzCOB2mAB9xtAPvkUcOMJOITA2VjMdDFgm1g=.48266693-48bf-41db-8871-a7dcafe93509@github.com> <_IdYz769mq7-kTO802umUJX7Bmaz3Ds4GWLb75lAW8I=.0394a525-2288-407e-9201-7fb6b5f92353@github.com> Message-ID: On Wed, 4 Jun 2025 10:47:57 GMT, Jatin Bhateja wrote: >> src/hotspot/share/opto/convertnode.cpp line 294: >> >>> 292: // Conditions under which floating point constant can be considered for a pattern match. >>> 293: // 1. Constant must lie within Float16 value range, this will ensure that >>> 294: // we don't unintentially round off float constant to enforce a pattern match. >> >> What do you mean by `enforce a pattern match`? >> >> Are you just trying to say that we have to be careful with the pattern matching here, and we cannot just round off the float constant? Do you have an example where that rounding would lead to issues? 
> > import jdk.incubator.vector.*; > public class verify_rounding { > public static void check() { > for (int i = 0; i < 65550; i++) { > short post_rounding = Float.floatToFloat16(Float.float16ToFloat(Float.floatToFloat16((float)i)) * 2049.0f); > short pre_rounding = Float16.float16ToRawShortBits(Float16.multiply(Float16.valueOf((float)i), Float16.valueOf((float)2049.0f))); > if (pre_rounding != post_rounding) { > System.out.println("Mismatch at val = " + (float)i); > System.out.println("post_rounding val = " + post_rounding); > System.out.println("pre_rounding val = " + pre_rounding); > break; > } > } > } > > public static void main(String [] args) { > check(); > } > } > > > CPROMPT>java --add-modules=jdk.incubator.vector -cp . verify_rounding > WARNING: Using incubator modules: jdk.incubator.vector > Mismatch at val = 3.0 > post_rounding val = 28161 > pre_rounding val = 28160 > > Since we intend to infer Float16 IR using patten match, hence it may be incorrect to transform post-rounding pattern to pre-rounding. Right ok. The wording `enforce a pattern match` still does not make sense to me. You can `perform` a `pattern match`, but what does it mean to `enforce` it? Can you rephrase please? >> test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java line 335: >> >>> 333: res += Float.floatToFloat16(POSITIVE_ZERO_VAR.floatValue() - INEXACT_FP16); >>> 334: res += Float.floatToFloat16(INEXACT_FP16 * POSITIVE_ZERO_VAR.floatValue()); >>> 335: res += Float.floatToFloat16(POSITIVE_ZERO_VAR.floatValue() / INEXACT_FP16); >> >> Why is the mul case flipped here? > > To check for constant on either side of an expression. I see. It looks a little strange. You could leave a comment for the reader. 
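The rounding mismatch quoted above can also be reproduced without the incubator module. The following is a minimal sketch, assuming JDK 20 or later for `Float.floatToFloat16` and `Float.float16ToFloat`; the class and method names are illustrative and do not come from the PR. It relies on the assumption that the product of two float16 values is exactly representable in float32, so a single final rounding stands in for a float16 multiply.

```java
// Sketch only: RoundingOrder and its methods are hypothetical names, not from the PR.
// Requires JDK 20+ for Float.floatToFloat16 / Float.float16ToFloat.
final class RoundingOrder {
    // 2049 is not exactly representable in float16 (spacing is 2 in [2048, 4096)).
    static final float CON = 2049.0f;

    // Post-rounding: multiply in float32 with the unrounded constant,
    // then round the result to float16 once.
    static short postRounding(int i) {
        float x = Float.float16ToFloat(Float.floatToFloat16((float) i));
        return Float.floatToFloat16(x * CON);
    }

    // Pre-rounding: round the constant to float16 first, then multiply.
    // Two float16 values have 11-bit significands, so their product fits
    // in float32 exactly and the final conversion is the only rounding step.
    static short preRounding(int i) {
        float x = Float.float16ToFloat(Float.floatToFloat16((float) i));
        float c = Float.float16ToFloat(Float.floatToFloat16(CON));
        return Float.floatToFloat16(x * c);
    }

    public static void main(String[] args) {
        System.out.println("post_rounding val = " + postRounding(3)); // 28161
        System.out.println("pre_rounding val = " + preRounding(3));   // 28160
    }
}
```

For the input 3 this gives the same two raw values as the output quoted in the thread, which is why a float constant outside the exactly representable Float16 range cannot simply be rounded to make the pattern match.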
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24179#discussion_r2140288570 PR Review Comment: https://git.openjdk.org/jdk/pull/24179#discussion_r2140306802 From rcastanedalo at openjdk.org Wed Jun 11 14:25:32 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 11 Jun 2025 14:25:32 GMT Subject: RFR: 8356708: C2: loop strip mining expansion doesn't take sunk stores into account In-Reply-To: References: Message-ID: On Tue, 10 Jun 2025 10:17:11 GMT, Roland Westrelin wrote: > `test1()` has a counted loop with a `Store` to `field`. That `Store` > is sunk out of loop. When the `OuterStripMinedLoop` is expanded, only > `Phi`s that exist at the inner loop are added to the outer > loop. There's no `Phi` for the slice of the sunk `Store` (because > there's no `Store` left in the inner loop) so no `Phi` is added for > that slice to the outer loop. As a result, there's a missing anti > dependency for `Load` of `field` that's before the loop and it can be > scheduled inside the outer strip mined loop which is incorrect. > > `test2()` is the same as `test1()` but with a chain of 2 `Store`s. > > `test3()` is another variant where a `Store` is left in the inner loop > after one is sunk out of it so the inner loop still has a `Phi`. As a > result, the outer loop also gets a `Phi` but it's incorrectly wired as > the sunk `Store` should be the input along the backedge but is > not. That one doesn't cause any failure AFAICT. > > The fix I propose is some extra logic at expansion of the > `OuterStripMinedLoop` to handle these corner cases. Looks good otherwise! Another question: could `PhaseIdealLoop::try_move_store_before_loop` cause similar issues on strip-mined loops? src/hotspot/share/opto/loopnode.cpp line 3026: > 3024: // Do we already have a memory Phi for that slice on the outer loop? If that is the case, that Phi was created > 3025: // by cloning an inner loop Phi.
The inner loop Phi should have mem, the memory state of the first Store out of > 3026: // the inner loop as input on the backedge. So does the outer loop Phi given it's a clone. Suggestion: // the inner loop, as input on the backedge. So does the outer loop Phi given it's a clone. src/hotspot/share/opto/loopnode.cpp line 3043: > 3041: igvn->replace_input_of(first, MemNode::Memory, phi); > 3042: } else { > 3043: // Fix memory state along the backedge: it should be the last sunk Stores of the chain Suggestion: // Fix memory state along the backedge: it should be the last sunk Store of the chain test/hotspot/jtreg/compiler/loopstripmining/TestStoresSunkInOuterStripMinedLoop.java line 1: > 1: /* It would be good, for completeness, to add a "Couple stores sunk in outer loop, store in inner loop" test. ------------- Changes requested by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25717#pullrequestreview-2917408313 PR Review Comment: https://git.openjdk.org/jdk/pull/25717#discussion_r2140301451 PR Review Comment: https://git.openjdk.org/jdk/pull/25717#discussion_r2140302546 PR Review Comment: https://git.openjdk.org/jdk/pull/25717#discussion_r2140308882 From dfenacci at openjdk.org Wed Jun 11 15:04:35 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 11 Jun 2025 15:04:35 GMT Subject: RFR: 8358694: VM asserts if CodeCacheSegmentSize is not a power of 2 [v2] In-Reply-To: References: <_SfCjE8oyyD-grL6AAih23j1Qx7fbYefTVzl-BU5N2k=.7bd5d419-4127-4e78-926e-fece25a7d914@github.com> Message-ID: On Tue, 10 Jun 2025 08:41:16 GMT, Amit Kumar wrote: >> Makes sure that JVM exits gracefully when `CodeCacheSegmentSize` is not a power of 2. 
> > Amit Kumar has updated the pull request incrementally with two additional commits since the last revision: > > - fix > - move the changes in flag constraints specific file I might be a bit picky (sorry for that) but since the flag was triggering a crash I was wondering if we could have a small regression test to make sure the VM never crashes (possibly checking the error as well). ------------- PR Comment: https://git.openjdk.org/jdk/pull/25708#issuecomment-2963184963 From mchevalier at openjdk.org Wed Jun 11 16:18:38 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 11 Jun 2025 16:18:38 GMT Subject: Withdrawn: 8347901: C2 should remove unused leaf / pure runtime calls In-Reply-To: References: Message-ID: On Wed, 30 Apr 2025 13:18:33 GMT, Marc Chevalier wrote: > A first part toward a better support of pure functions. > > ## Pure Functions > > Pure functions (considered here) are functions that have no side effects, no effect on the control flow (no exception or such), cannot deopt etc.. It's really a function that you can execute anywhere, with whichever arguments without effect other than wasting time. Integer division is not pure as dividing by zero is throwing. But many floating point functions will just return `NaN` or `+/-infinity` in problematic cases. > > ## Scope > > We are not going all powerful for now! It's mostly about identifying some pure functions and being able to remove them if the result is unused. Some other things are not part of this PR, on purpose. Especially, this PR doesn't propose a way to move pure calls around. The reason is that pure calls are macro nodes later expanded into other, regular calls, which require a control input. To be able to do the expansion, we just keep the control in the pure call as well. > > ## Implementation Overview > > We created here some new node kind for pure calls that are expanded into regular calls during macro expansion. 
This also allows the removal of `ModD` and `ModF` nodes that have their pure equivalent now. They are surprisingly hard to unify with other floating point functions from an implementation point of view! > > IR framework and IGV needed a little bit of fixing. > > Thanks, > Marc This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/24966 From mchevalier at openjdk.org Wed Jun 11 16:18:37 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 11 Jun 2025 16:18:37 GMT Subject: RFR: 8347901: C2 should remove unused leaf / pure runtime calls In-Reply-To: References: Message-ID: On Tue, 20 May 2025 03:26:49 GMT, Vladimir Ivanov wrote: >> A first part toward a better support of pure functions. >> >> ## Pure Functions >> >> Pure functions (considered here) are functions that have no side effects, no effect on the control flow (no exception or such), cannot deopt etc.. It's really a function that you can execute anywhere, with whichever arguments without effect other than wasting time. Integer division is not pure as dividing by zero is throwing. But many floating point functions will just return `NaN` or `+/-infinity` in problematic cases. >> >> ## Scope >> >> We are not going all powerful for now! It's mostly about identifying some pure functions and being able to remove them if the result is unused. Some other things are not part of this PR, on purpose. Especially, this PR doesn't propose a way to move pure calls around. The reason is that pure calls are macro nodes later expanded into other, regular calls, which require a control input. To be able to do the expansion, we just keep the control in the pure call as well. >> >> ## Implementation Overview >> >> We created here some new node kind for pure calls that are expanded into regular calls during macro expansion. This also allows the removal of `ModD` and `ModF` nodes that have their pure equivalent now. 
They are surprisingly hard to unify with other floating point functions from an implementation point of view! >> >> IR framework and IGV needed a little bit of fixing. >> >> Thanks, >> Marc > > I'm just pointing out that delaying lowering decision till matching phase neither makes scheduling easier nor makes implementation simpler. > > For loop opts it is important to know when loops contain calls and act accordingly (by trying to hoist relevant nodes out of loops and disabling some optimizations when the calls are still there). > > The difference between CFG nodes effectively pinned AT some point and non-CFG nodes with control dependency (effectively pushing them UNDER their control input) becomes insignificant once CFG nodes depend solely on control. In other words, once a call node doesn't consume/produce memory and I/O states, it becomes straightforward to move it around in CFG when desired (between it's inputs and users). > > Speaking of scheduling, would default scheduling heuristics do a good job? The case of expensive nodes exemplifies the need of custom scheduling heuristics for such nodes. > > Implementation-wise, lowering during matching becomes platform-specific and requires each platform to introduce `effect(CALL)` AD instructions. Moreover, each call shape (determined by arity and argument kinds) has to be explicitly handled with a dedicated AD instruction. And it doesn't benefit from existing support of call nodes every platform already has. > > >> Ideally, what we want to do with expensive data nodes is to common them aggressively like any other data node. Then, during code motion, we can clone them if it is beneficial. > > The current implementation of expensive nodes can definitely be improved, but the nice property it has is that it only decreases the number of nodes through careful commoning during loop opts. Once cloning is allowed, there's a new problem to care about: the case of too many clones. 
> > A simple incremental improvement would be to teach `PhaseIdealLoop::process_expensive_nodes()` to push expensive nodes closer to their users if they are on less frequent code paths. Then it can be taught (how and when) to clone expensive nodes between multiple users. After patient guidance from @iwanowww, I came to a new version whose implementation has very little to do with this one. I'll close it and open a fresh one. Nevertheless, thanks to everyone who looked at it! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24966#issuecomment-2963437324 From mchevalier at openjdk.org Wed Jun 11 16:26:58 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 11 Jun 2025 16:26:58 GMT Subject: RFR: 8347901: C2 should remove unused leaf / pure runtime calls Message-ID: A first part toward a better support of pure functions, but this time, with guidance from @iwanowww. ## Pure Functions Pure functions (considered here) are functions that have no side effects, no effect on the control flow (no exception or such), cannot deopt etc.. It's really a function that you can execute anywhere, with whichever arguments without effect other than wasting time. Integer division is not pure as dividing by zero is throwing. But many floating point functions will just return `NaN` or `+/-infinity` in problematic cases. ## Scope We are not going all powerful for now! It's mostly about identifying some pure functions and being able to remove them if the result is unused. Some other things are not part of this PR, on purpose. Especially, this PR doesn't propose a way to move pure calls around. The reason is that pure calls are later expanded into regular calls, which require a control input. To be able to do the expansion, we just keep the control in the pure call as well. ## Implementation Overview We created here some new node kind for pure calls, inheriting leaf calls, that are expanded into regular leaf calls during final graph reshaping. 
The possibility to support pure call directly in AD file is left open. This PR also introduces `TupleNode` (largely based on an original idea/implem of @iwanowww), that just tie multiple input together and play well with `ProjNode`: the n-th projection of a `TupleNode` is the n-th input of the tuple. This is a convenient way to skip and remove nodes from the graph while delegating the difficulty of the surgery to the trusted IGVN's implementation. Thanks, Marc ------------- Commit messages: - Eliminate pure function calls Changes: https://git.openjdk.org/jdk/pull/25760/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25760&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8347901 Stats: 196 lines in 15 files changed: 132 ins; 33 del; 31 mod Patch: https://git.openjdk.org/jdk/pull/25760.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25760/head:pull/25760 PR: https://git.openjdk.org/jdk/pull/25760 From mdoerr at openjdk.org Wed Jun 11 16:52:30 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 11 Jun 2025 16:52:30 GMT Subject: RFR: 8347901: C2 should remove unused leaf / pure runtime calls In-Reply-To: References: Message-ID: On Wed, 11 Jun 2025 16:18:41 GMT, Marc Chevalier wrote: > A first part toward a better support of pure functions, but this time, with guidance from @iwanowww. > > ## Pure Functions > > Pure functions (considered here) are functions that have no side effects, no effect on the control flow (no exception or such), cannot deopt etc.. It's really a function that you can execute anywhere, with whichever arguments without effect other than wasting time. Integer division is not pure as dividing by zero is throwing. But many floating point functions will just return `NaN` or `+/-infinity` in problematic cases. > > ## Scope > > We are not going all powerful for now! It's mostly about identifying some pure functions and being able to remove them if the result is unused. Some other things are not part of this PR, on purpose. 
Especially, this PR doesn't propose a way to move pure calls around. The reason is that pure calls are later expanded into regular calls, which require a control input. To be able to do the expansion, we just keep the control in the pure call as well. > > ## Implementation Overview > > We created here some new node kind for pure calls, inheriting leaf calls, that are expanded into regular leaf calls during final graph reshaping. The possibility to support pure call directly in AD file is left open. > > This PR also introduces `TupleNode` (largely based on an original idea/implem of @iwanowww), that just tie multiple input together and play well with `ProjNode`: the n-th projection of a `TupleNode` is the n-th input of the tuple. This is a convenient way to skip and remove nodes from the graph while delegating the difficulty of the surgery to the trusted IGVN's implementation. > > Thanks, > Marc Just a side comment: "Integer division is not pure as dividing by zero is throwing." is only true for some platforms. See [JDK-8299857](https://bugs.openjdk.org/browse/JDK-8299857). ------------- PR Comment: https://git.openjdk.org/jdk/pull/25760#issuecomment-2963526781 From mchevalier at openjdk.org Wed Jun 11 16:57:30 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 11 Jun 2025 16:57:30 GMT Subject: RFR: 8347901: C2 should remove unused leaf / pure runtime calls In-Reply-To: References: Message-ID: On Wed, 11 Jun 2025 16:18:41 GMT, Marc Chevalier wrote: > A first part toward a better support of pure functions, but this time, with guidance from @iwanowww. > > ## Pure Functions > > Pure functions (considered here) are functions that have no side effects, no effect on the control flow (no exception or such), cannot deopt etc.. It's really a function that you can execute anywhere, with whichever arguments without effect other than wasting time. Integer division is not pure as dividing by zero is throwing. 
But many floating point functions will just return `NaN` or `+/-infinity` in problematic cases. > > ## Scope > > We are not going all powerful for now! It's mostly about identifying some pure functions and being able to remove them if the result is unused. Some other things are not part of this PR, on purpose. Especially, this PR doesn't propose a way to move pure calls around. The reason is that pure calls are later expanded into regular calls, which require a control input. To be able to do the expansion, we just keep the control in the pure call as well. > > ## Implementation Overview > > We created here some new node kind for pure calls, inheriting leaf calls, that are expanded into regular leaf calls during final graph reshaping. The possibility to support pure call directly in AD file is left open. > > This PR also introduces `TupleNode` (largely based on an original idea/implem of @iwanowww), that just tie multiple input together and play well with `ProjNode`: the n-th projection of a `TupleNode` is the n-th input of the tuple. This is a convenient way to skip and remove nodes from the graph while delegating the difficulty of the surgery to the trusted IGVN's implementation. > > Thanks, > Marc Right. Yet, it's safe to consider it as non-pure, as an over-approximation, but it could be refined. 
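The distinction the PR description draws between integer division and floating-point operations can be observed at the Java language level. The sketch below is only an illustration with hypothetical names; it says nothing about how C2 models these operations internally, and, as noted in the reply above, whether the hardware division instruction itself traps depends on the platform.

```java
// Illustrative sketch; PurityDemo and its methods are hypothetical names, not part of the PR.
final class PurityDemo {
    // Integer division by zero throws ArithmeticException in Java,
    // so the operation can change control flow and is not pure.
    static boolean intDivThrows(int a, int b) {
        try {
            int unused = a / b;
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    // Floating-point remainder never throws; a zero divisor yields NaN.
    static double floatRem(double a, double b) {
        return a % b;
    }

    public static void main(String[] args) {
        System.out.println(intDivThrows(1, 0));   // true
        System.out.println(floatRem(5.0, 0.0));   // NaN
        System.out.println(Math.sqrt(-1.0));      // NaN
        System.out.println(1.0 / 0.0);            // Infinity
    }
}
```

Because the floating-point cases only produce a value, a call whose result is unused can be dropped without changing observable behavior, which is the property the PR relies on.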
------------- PR Comment: https://git.openjdk.org/jdk/pull/25760#issuecomment-2963538364 From iklam at openjdk.org Wed Jun 11 18:32:41 2025 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 11 Jun 2025 18:32:41 GMT Subject: RFR: 8344556: [Graal] compiler/intrinsics/bmi/* fail when AOTCache cannot be loaded Message-ID: When running with $ make test-only JTREG=AOT_JDK=twostep \ TEST_VM_OPTS='-XX:+UnlockExperimentalVMOptions -XX:+UseGraalJIT' \ TEST=open/test/hotspot/jtreg/compiler/intrinsics/bmi/TestBlsmskL.java One of the -Xint or -Xcomp VMs launched by these tests will not be able to load the AOT cache (due to [JDK-8358738](https://bugs.openjdk.org/browse/JDK-8358738) -- at this point, we are not sure if this behavior is intended or not). Regardless of the decision about [JDK-8358738](https://bugs.openjdk.org/browse/JDK-8358738), we should make the bmi tests more resilient by filtering out AOT error logs, since the purpose of the bmi tests is to test intrinsics, not AOT. ------------- Commit messages: - 8344556: [Graal] compiler/intrinsics/bmi/* fail when AOTCache cannot be loaded Changes: https://git.openjdk.org/jdk/pull/25761/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25761&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8344556 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25761.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25761/head:pull/25761 PR: https://git.openjdk.org/jdk/pull/25761 From wkemper at openjdk.org Wed Jun 11 18:38:29 2025 From: wkemper at openjdk.org (William Kemper) Date: Wed, 11 Jun 2025 18:38:29 GMT Subject: RFR: 8358334: C2/Shenandoah: incorrect execution with Unsafe In-Reply-To: References: Message-ID: <_vTHjSU4jSzcb7EgFzGOk3Tp_uRff502Tfb_JViarl0=.08d55130-b250-4476-8805-8c114f940248@github.com> On Tue, 10 Jun 2025 14:13:21 GMT, Roland Westrelin wrote: > When a barrier is expanded, some control is picked as a location for > the barrier.
The control input of data nodes that depend on that > control are updated so the nodes are after the expanded barrier unless > the barrier itself depends on some of those nodes. > > In this particular failure, a raw memory `Store` is the input memory to > the barrier. That `Store` has an anti-dependent `Load`. All 3 nodes > (barrier, `Load` and `Store`) are at the same control. The `Store` is > an input to the barrier so it stays before the barrier. The `Load`'s > control is updated to be after the barrier which breaks the > anti-dependency. The bug is that the logic that sorts nodes that need > to be before the barrier and those that can be after ignores > anti-dependencies. The fix simply extends that logic to take them into > account. Thank you for this! ------------- Marked as reviewed by wkemper (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25729#pullrequestreview-2918306489 From cslucas at openjdk.org Wed Jun 11 19:04:29 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 11 Jun 2025 19:04:29 GMT Subject: RFR: 8359064: Expose reason for marking nmethod non-entrant to JVMCI client [v2] In-Reply-To: References: Message-ID: On Wed, 11 Jun 2025 07:50:43 GMT, Doug Simon wrote: >> Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: >> >> Address PR feedback: add comments, refactor enum definition in the Java side. > > src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/code/InstalledCode.java line 152: > >> 150: */ >> 151: public void invalidate() { >> 152: invalidate(true, 0); > > This assigns `ChangeReason::C1_codepatch` to JVMCI invalidations which does not seem right. I believe zero is mapped to `UNKNOWN` in the ChangeReason enum, if I'm not very mistaken here?!
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25706#discussion_r2140876714 From shade at openjdk.org Wed Jun 11 19:09:28 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 11 Jun 2025 19:09:28 GMT Subject: RFR: 8358334: C2/Shenandoah: incorrect execution with Unsafe In-Reply-To: References: Message-ID: On Tue, 10 Jun 2025 14:13:21 GMT, Roland Westrelin wrote: > When a barrier is expanded, some control is picked as a location for > the barrier. The control input of data nodes that depend on that > control are updated so the nodes are after the expanded barrier unless > the barrier itself depends on some of those nodes. > > In this particular failure, a raw memory `Store` is the input memory to > the barrier. That `Store` has an anti-dependent `Load`. All 3 nodes > (barrier, `Load` and `Store`) are at the same control. The `Store` is > an input to the barrier so it stays before the barrier. The `Load`'s > control is updated to be after the barrier which breaks the > anti-dependency. The bug is that the logic that sorts nodes that need > to be before the barrier and those that can be after ignores > anti-dependencies. The fix simply extends that logic to take them into > account. The patch makes sense. I ran `make test TEST=all TEST_VM_OPTS=-XX:+UseShenandoahGC` without new failures. ------------- Marked as reviewed by shade (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/25729#pullrequestreview-2918386137 From dnsimon at openjdk.org Wed Jun 11 19:17:32 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 11 Jun 2025 19:17:32 GMT Subject: RFR: 8359064: Expose reason for marking nmethod non-entrant to JVMCI client [v2] In-Reply-To: References: Message-ID: On Wed, 11 Jun 2025 19:01:22 GMT, Cesar Soares Lucas wrote: >> src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/code/InstalledCode.java line 152: >> >>> 150: */ >>> 151: public void invalidate() { >>> 152: invalidate(true, 0); >> >> This assigns `ChangeReason::C1_codepatch` to JVMCI invalidations which does not seem right. > > I believe zero is mapped to `UNKNOWN` in the ChangeReason enum, if I'm not very mistaken here?! Sorry, I was looking at nmethod.hpp in my local source without the changes in this PR. That said, it should probably map to `ChangeReason::JVMCI_invalidate_nmethod`, right? BTW, it seems like `ChangeReason::JVMCI_invalidate_nmethod_mirror` is unused and could be deleted. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25706#discussion_r2140896474 From dnsimon at openjdk.org Wed Jun 11 19:20:30 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 11 Jun 2025 19:20:30 GMT Subject: RFR: 8344556: [Graal] compiler/intrinsics/bmi/* fail when AOTCache cannot be loaded In-Reply-To: References: Message-ID: On Wed, 11 Jun 2025 18:28:35 GMT, Ioi Lam wrote: > When running with > > > $ make test-only JTREG=AOT_JDK=twostep \ > TEST_VM_OPTS='-XX:+UnlockExperimentalVMOptions -XX:+UseGraalJIT' \ > TEST=open/test/hotspot/jtreg/compiler/intrinsics/bmi/TestBlsmskL.java > > > One of the -Xint or -Xcomp VMs launched by these tests will not be able to load the AOT cache (due to [JDK-8358738](https://bugs.openjdk.org/browse/JDK-8358738) -- at this point, we are not sure if this behavior is intended or not).
> > Regardless of the decision about [JDK-8358738](https://bugs.openjdk.org/browse/JDK-8358738), we should make the bmi tests more resilient by filtering out AOT error logs, since the purpose of the bmi tests is to test intrinsics, not AOT. LGTM. ------------- Marked as reviewed by dnsimon (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25761#pullrequestreview-2918413542 From kvn at openjdk.org Wed Jun 11 19:24:32 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 11 Jun 2025 19:24:32 GMT Subject: RFR: 8344556: [Graal] compiler/intrinsics/bmi/* fail when AOTCache cannot be loaded In-Reply-To: References: Message-ID: On Wed, 11 Jun 2025 18:28:35 GMT, Ioi Lam wrote: > When running with > > > $ make test-only JTREG=AOT_JDK=twostep \ > TEST_VM_OPTS='-XX:+UnlockExperimentalVMOptions -XX:+UseGraalJIT' \ > TEST=open/test/hotspot/jtreg/compiler/intrinsics/bmi/TestBlsmskL.java > > > One of the -Xint or -Xcomp VMs launched by these tests will not be able to load the AOT cache (due to [JDK-8358738](https://bugs.openjdk.org/browse/JDK-8358738) -- at this point, we are not sure if this behavior is intended or not). > > Regardless of the decision about [JDK-8358738](https://bugs.openjdk.org/browse/JDK-8358738), we should make the bmi tests more resilient by filtering out AOT error logs, since the purpose of the bmi tests is to test intrinsics, not AOT. Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25761#pullrequestreview-2918423050 From cslucas at openjdk.org Wed Jun 11 19:30:28 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 11 Jun 2025 19:30:28 GMT Subject: RFR: 8359064: Expose reason for marking nmethod non-entrant to JVMCI client [v2] In-Reply-To: References: Message-ID: On Wed, 11 Jun 2025 08:15:52 GMT, Doug Simon wrote: > While the new ChangeReason JVMCI enum looks nice, I don't quite get how/where it is (or should be) used?
It seems disconnected from the `InstalledCode.changeReason` field. I agree, I also have that feeling because the type of the field is not the enum. I'll refactor that. The end goal here mainly (but not only) is to be able to use this "change or invalidation reason" to reset Truffle CallTarget profiles when its installed code was evicted from the code cache because it was "cold". See this [draft PR](https://github.com/JohnTortugo/graal/pull/2/files#diff-1f5e4cd4034f7f7571d9a021272192501800c9b19a41101bc26aaaeebaf14e15) that I'm preparing for Truffle - it contains some spurious changes right now. > In general, I don't find the name ChangeReason quite models the concept properly - wouldn't "InvalidationReason" be more accurate? "change" is a very broad concept. I agree that "change" is broad, but is it a bad thing in this context? Shouldn't both sides of JVMCI (if they want to) be able to "monitor" any change that the other side did to an installed code? "invalidationReason" may not be great as well because the field will be also set when the code is installed. > I think all JVMCI Java level tracking of change (or invalidation) reasons should be confined to HotSpotNmethod as it doesn't make much sense in the other InstalledCode (sub)classes. IMHO it seemed right to add the field in the "InstalledCode" class because having a simple way to communicate back-and-forth that a change occurred, and why it occurred, felt like a good thing to have overall. I understand that not all users of JVMCI may use it. If you feel strongly about this I can move the field to HotSpotNmethod. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25706#issuecomment-2963924270 From dnsimon at openjdk.org Wed Jun 11 20:03:28 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 11 Jun 2025 20:03:28 GMT Subject: RFR: 8359064: Expose reason for marking nmethod non-entrant to JVMCI client [v2] In-Reply-To: References: Message-ID: On Wed, 11 Jun 2025 19:27:38 GMT, Cesar Soares Lucas wrote: > I agree that "change" is broad, but is it a bad thing in this context? Shouldn't both sides of JVMCI (if they want to) be able to "monitor" any change that the other side did to an installed code? "invalidationReason" may not be great as well because the field will be also set when the code is installed. The only change we're talking about is invalidation via `nmethod::make_not_entrant` right? If so, then it really is *only* an invalidation reason. Why will the field be set when the code is installed? Do you mean it will be initialized to the default `int` value of 0? Then it should be initialized to `-1` to denote that no invalidation has occurred. All the current reasons are HotSpot specific and the very notion of "change" is exactly tied to `nmethod::make_not_entrant` so I still think this should all be confined to HotSpotNmethod until we have a good use case for lifting it up to InstalledCode. We've learnt through painful experience that making API too broad, too early almost always comes back to bite us. I would also drop the enum on the Java side. It really doesn't add any extra value as shown by `installedCode.getChangeReason() == HotSpotNmethod.ChangeReason.GC_UNLINKING_COLD.ordinal()` [here](https://github.com/JohnTortugo/graal/pull/2/files#diff-1f5e4cd4034f7f7571d9a021272192501800c9b19a41101bc26aaaeebaf14e15R189). That is, if you're using the ordinal of an enum, then the enum is probably just an unnecessary box for an `int`. 
While you could make the `HotSpotNmethod.invalidationReason` field itself be an enum, that makes setting the field in `JVMCINMethodData::invalidate_nmethod_mirror` significantly more complicated. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25706#issuecomment-2963996747 From dlong at openjdk.org Wed Jun 11 21:26:27 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 11 Jun 2025 21:26:27 GMT Subject: RFR: 8357782: JVM JIT Causes Static Initialization Order Issue In-Reply-To: References: Message-ID: On Tue, 10 Jun 2025 13:33:13 GMT, Manuel Hässig wrote: > # Issue Summary > > When C1 compiles a method that allocates a new instance of a class that is not fully initialized at compile time, it does not take into account that the `` might run a static initializer that might have side effects. Consider the following example: > > class A { > static class B { > static String field; > static void test() { > String tmp = field; > new C(field); > } > } > > static class C { > static { > B.field = "Hello"; > } > > C(String val) { > if (val == null) { > throw new RuntimeException("Should not reach here"); > } > } > } > } > > Here, `B.field` gets assigned in `C`'s static initializer. Since C1 believes that the `newinstance` does not have memory side effects, local value numbering eliminates the field access for the argument in `C.` because it believes that `B.field` is still the same as `tmp`. Hence, the assignment in `C.` gets effectively ignored and the code triggers the runtime exception. Because this only happens if `C` is not fully initialized when it is compiled, we need `-Xcomp` to reproduce this issue. > > # Changes > > To fix the illustrated issue, this PR ensures that `newinstance` kills the memory state in C1's LVN if the class might not be fully initialized.
Since we can not reliably detect if a class has a static initializer, we kill memory whenever a class is not yet loaded or, if it has already been loaded, when it has not been fully initialized, which is conservative and might kill memory when it is not necessary for correctness and have an impact on performance in the form of some additional field accesses. > > # Benchmark Results > > Since this might have an effect on startup, I ran some benchmarks. The results mostly did not show effects outside the run-to-run variance. > > # Testing > > - [x] [GHA](https://github.com/mhaessig/jdk/actions/runs/15560262225) > - [x] tier1 through tier3 plus Oracle internal testing on Oracle supported platforms and OSes > > # Acknowledgements > > Shout out to @TobiHartmann who wrote the reproducer that became the regression test and helped me find my way around C1 and narrow down the problem. src/hotspot/share/c1/c1_ValueMap.hpp line 193: > 191: ciInstanceKlass* c = x->klass(); > 192: if (c != nullptr && !c->is_initialized() && > 193: ((c->is_loaded() && c->has_class_initializer()) || !c->is_loaded())) { Suggestion: (!c->is_loaded() || c->has_class_initializer()) { ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25725#discussion_r2141142877 From snatarajan at openjdk.org Wed Jun 11 21:49:38 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Wed, 11 Jun 2025 21:49:38 GMT Subject: RFR: 8342941: IGV: Add new graph dumps for post loop, empty loop removal, and one iteration removal Message-ID: This changeset adds BEFORE/AFTER graph dumps for creating a post loop (`insert_post_loop()`), removing an empty loop (`do_remove_empty_loop()`), and removing a one iteration loop (`do_one_iteration_loop()`). Changes: - Added `BEFORE_POST_LOOP` and `AFTER_POST_LOOP` for dumping graphs before and after `insert_post_loop()`. - Added `BEFORE_REMOVE_EMPTY_LOOP` and `AFTER_REMOVE_EMPTY_LOOP` for dumping graphs before and after `do_remove_empty_loop()`. 
- Added `BEFORE_ONE_ITERATION_LOOP` and `AFTER_ONE_ITERATION_LOOP` for dumping graphs before and after `do_one_iteration_loop()`. Below are sample screenshots (IGV print level 4), mainly showing the new phases. 1. `BEFORE_POST_LOOP` and `AFTER_POST_LOOP` ![image](https://github.com/user-attachments/assets/1661cede-5d70-4e0d-abec-3d091c7675c8) 2. `BEFORE_POST_LOOP` and `AFTER_POST_LOOP` with SuperWordLoopUnrollAnalysis enabled ![image](https://github.com/user-attachments/assets/6a22e6f0-4e6c-4e9d-8b6b-2bf75fac783d) 3. `BEFORE_REMOVE_EMPTY_LOOP` and `AFTER_REMOVE_EMPTY_LOOP` ![image](https://github.com/user-attachments/assets/3281f00b-575e-4604-83dd-831037d8dd47) 4. `BEFORE_ONE_ITERATION_LOOP` and `AFTER_ONE_ITERATION_LOOP` ![image](https://github.com/user-attachments/assets/efddbc9a-64f7-403d-acfe-330d75a00911) Question to reviewers: Are the new compiler phases OK, or should we change anything? Testing: GitHub Actions tier1 to tier5 on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64.
Tested that thousands of graphs are correctly opened and visualized with IGV using the same test used in ([JDK-8317349](https://bugs.openjdk.org/browse/JDK-8317349)) ------------- Commit messages: - Initial Fix Changes: https://git.openjdk.org/jdk/pull/25756/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25756&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8342941 Stats: 19 lines in 3 files changed: 19 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25756.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25756/head:pull/25756 PR: https://git.openjdk.org/jdk/pull/25756 From cslucas at openjdk.org Wed Jun 11 22:46:27 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 11 Jun 2025 22:46:27 GMT Subject: RFR: 8359064: Expose reason for marking nmethod non-entrant to JVMCI client [v2] In-Reply-To: References: Message-ID: <5Zja0OdYDsycdWBaGsSt6oJWf7ByTRukZ84WaqX3dRs=.32ba25ba-acc1-46b3-b047-9d08f2fc137d@github.com> On Wed, 11 Jun 2025 00:05:47 GMT, Cesar Soares Lucas wrote: >> We recently introduced a way to set the reason why a nmethod was being marked as `not entrant`, see [here](https://github.com/openjdk/jdk/pull/23980) and [here](https://github.com/openjdk/jdk/pull/25338). >> >> This PR is to expose in the JVMCI interface the reason why the nmethod was flagged as `not entrant`. This will allow JVMCI-based compilers to implement heuristics to handle re-compilations differently based on what happened to earlier versions of a method, for instance, this will likely be used to address this [RFE in Truffle](https://github.com/oracle/graal/issues/11045 >> ). >> >> Tested on Linux x86_64, ARM with JTREG tier 1-3. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Address PR feedback: add comments, refactor enum definition in the Java side. > The only change we're talking about is invalidation via `nmethod::make_not_entrant` right? 
If so, then it really is _only_ an invalidation reason. > Why will the field be set when the code is installed? Do you mean it will be initialized to the default int value of 0? Then it should be initialized to -1 to denote that no invalidation has occurred. I was referring to the change made in `jvmciEnv.cpp` in this PR. > All the current reasons are HotSpot specific and the very notion of "change" is exactly tied to nmethod::make_not_entrant so I still think this should all be confined to HotSpotNmethod until we have a good use case for lifting it up to InstalledCode. We've learnt through painful experience that making API too broad, too early almost always comes back to bite us. I would also drop the enum on the Java side. It really doesn't add any extra value as shown by installedCode.getChangeReason() == HotSpotNmethod.ChangeReason.GC_UNLINKING_COLD.ordinal() [here](https://github.com/JohnTortugo/graal/pull/2/files#diff-1f5e4cd4034f7f7571d9a021272192501800c9b19a41101bc26aaaeebaf14e15R189). That is, if you're using the ordinal of an enum, then the enum is probably just an unnecessary box for an int. While you could make the HotSpotNmethod.invalidationReason field itself be an enum, that makes setting the field in JVMCINMethodData::invalidate_nmethod_mirror significantly more complicated. Sounds good to me. I'll rename the field to `invalidatedReason`, move it to `HotSpotNmethod` and remove the enum from Java side. 
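To make the agreed direction concrete, here is a minimal sketch of an `int` reason field with a `-1` sentinel, as suggested in the review. This is an illustration only: the class `NmethodMirrorSketch`, the constant `NOT_INVALIDATED`, and the method names are hypothetical and are not the actual `HotSpotNmethod` code from this PR.

```java
/** Hypothetical stand-in for a compiled-code mirror; NOT the real HotSpotNmethod. */
final class NmethodMirrorSketch {
    /** Sentinel meaning "no invalidation has occurred" (assumed value, per the review suggestion). */
    static final int NOT_INVALIDATED = -1;

    /** Plain int instead of an enum, so the VM side would only need to store a number. */
    private int invalidationReason = NOT_INVALIDATED;

    /** Records why the code was made not entrant; reasons are assumed to be non-negative. */
    void invalidate(int reason) {
        if (reason < 0) {
            throw new IllegalArgumentException("reason must be non-negative: " + reason);
        }
        invalidationReason = reason;
    }

    int getInvalidationReason() {
        return invalidationReason;
    }

    boolean isValid() {
        return invalidationReason == NOT_INVALIDATED;
    }
}
```

With a plain `int`, a client compares against a known constant directly rather than calling `ordinal()` on an enum, which is the simplification the review asked for.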
------------- PR Comment: https://git.openjdk.org/jdk/pull/25706#issuecomment-2964471872 From cslucas at openjdk.org Thu Jun 12 00:27:55 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Thu, 12 Jun 2025 00:27:55 GMT Subject: RFR: 8359064: Expose reason for marking nmethod non-entrant to JVMCI client [v3] In-Reply-To: References: Message-ID: <2ATJAQCD4VUVZ02RO2czrh4D40N0y4TR7XDRAhQ-PWE=.1bc8747d-0e2d-435c-a647-a8c718f6fcdc@github.com> > We recently introduced a way to set the reason why a nmethod was being marked as `not entrant`, see [here](https://github.com/openjdk/jdk/pull/23980) and [here](https://github.com/openjdk/jdk/pull/25338). > > This PR is to expose in the JVMCI interface the reason why the nmethod was flagged as `not entrant`. This will allow JVMCI-based compilers to implement heuristics to handle re-compilations differently based on what happened to earlier versions of a method, for instance, this will likely be used to address this [RFE in Truffle](https://github.com/oracle/graal/issues/11045 > ). > > Tested on Linux x86_64, ARM with JTREG tier 1-3. 
Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: Refactoring: move changes to HotSpotNmethod class; remove enum on Java side; rename field to invalidationReason ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25706/files - new: https://git.openjdk.org/jdk/pull/25706/files/5e4b8145..fcf838bd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25706&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25706&range=01-02 Stats: 138 lines in 15 files changed: 29 ins; 75 del; 34 mod Patch: https://git.openjdk.org/jdk/pull/25706.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25706/head:pull/25706 PR: https://git.openjdk.org/jdk/pull/25706 From iklam at openjdk.org Thu Jun 12 00:44:30 2025 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 12 Jun 2025 00:44:30 GMT Subject: RFR: 8344556: [Graal] compiler/intrinsics/bmi/* fail when AOTCache cannot be loaded In-Reply-To: References: Message-ID: On Wed, 11 Jun 2025 19:17:54 GMT, Doug Simon wrote: >> When running with >> >> >> $ make test-only JTREG=AOT_JDK=twostep \ >> TEST_VM_OPTS='-XX:+UnlockExperimentalVMOptions -XX:+UseGraalJIT' \ >> TEST=open/test/hotspot/jtreg/compiler/intrinsics/bmi/TestBlsmskL.java >> >> >> One of the -Xint or -Xcomp VMs launched by these tests will not be able to load the AOT cache (due to [JDK-8358738](https://bugs.openjdk.org/browse/JDK-8358738) -- at this point, we are not sure if this behavior is intended or not). >> >> Regardless of the decision about [JDK-8358738](https://bugs.openjdk.org/browse/JDK-8358738), we should make the bmi tests more resilient by filtering out AOT error logs, since the purpose of the bmi tests is to test intrinsics, not AOT. > > LGTM.
Thanks @dougxc @vnkozlov for the review ------------- PR Comment: https://git.openjdk.org/jdk/pull/25761#issuecomment-2964638511 From iklam at openjdk.org Thu Jun 12 00:44:31 2025 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 12 Jun 2025 00:44:31 GMT Subject: Integrated: 8344556: [Graal] compiler/intrinsics/bmi/* fail when AOTCache cannot be loaded In-Reply-To: References: Message-ID: On Wed, 11 Jun 2025 18:28:35 GMT, Ioi Lam wrote: > When running with > > > $ make test-only JTREG=AOT_JDK=twostep \ > TEST_VM_OPTS='-XX:+UnlockExperimentalVMOptions -XX:+UseGraalJIT' \ > TEST=open/test/hotspot/jtreg/compiler/intrinsics/bmi/TestBlsmskL.java > > > One of the -Xint or -Xcomp VMs launched by these tests will not be able to load the AOT cache (due to [JDK-8358738](https://bugs.openjdk.org/browse/JDK-8358738) -- at this point, we are not sure if this behavior is intended or not). > > Regardless of the decision about [JDK-8358738](https://bugs.openjdk.org/browse/JDK-8358738), we should make the bmi tests more resilient by filtering out AOT error logs, since the purpose of the bmi tests is to test intrinsics, not AOT. This pull request has now been integrated. Changeset: 3b32f6a8 Author: Ioi Lam URL: https://git.openjdk.org/jdk/commit/3b32f6a8ec37338764d3e6713247ff96e49bf5b3 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod 8344556: [Graal] compiler/intrinsics/bmi/* fail when AOTCache cannot be loaded Reviewed-by: dnsimon, kvn ------------- PR: https://git.openjdk.org/jdk/pull/25761 From cslucas at openjdk.org Thu Jun 12 01:05:13 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Thu, 12 Jun 2025 01:05:13 GMT Subject: RFR: 8359064: Expose reason for marking nmethod non-entrant to JVMCI client [v4] In-Reply-To: References: Message-ID: > We recently introduced a way to set the reason why a nmethod was being marked as `not entrant`, see [here](https://github.com/openjdk/jdk/pull/23980) and [here](https://github.com/openjdk/jdk/pull/25338).
> > This PR is to expose in the JVMCI interface the reason why the nmethod was flagged as `not entrant`. This will allow JVMCI-based compilers to implement heuristics to handle re-compilations differently based on what happened to earlier versions of a method, for instance, this will likely be used to address this [RFE in Truffle](https://github.com/oracle/graal/issues/11045 > ). > > Tested on Linux x86_64, ARM with JTREG tier 1-3. Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: Some clean up ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25706/files - new: https://git.openjdk.org/jdk/pull/25706/files/fcf838bd..1f3c2598 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25706&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25706&range=02-03 Stats: 13 lines in 5 files changed: 6 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/25706.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25706/head:pull/25706 PR: https://git.openjdk.org/jdk/pull/25706 From fyang at openjdk.org Thu Jun 12 02:10:11 2025 From: fyang at openjdk.org (Fei Yang) Date: Thu, 12 Jun 2025 02:10:11 GMT Subject: RFR: 8359270: C2: alignment check should consider base offset when emitting arraycopy runtime call Message-ID: Hi, please consider this change fixing the alignment check when emitting arraycopy runtime calls. There are currently four callsites of `StubRoutines::select_arraycopy_function` in hotspot C2 shared code where we emit arraycopy runtime calls [1-4]. Three of them [2-4] missed the base offset when calculating alignment for both source and destination array addresses. It seems they assume a base offset of 8 bytes, which is not always true. Base offset becomes 4 bytes under either `-XX:+UseCompactObjectHeaders` or `-XX:-UseCompressedClassPointers`. And `StubRoutines::select_arraycopy_function` selects the right arraycopy runtime based on this alignment.
As a result, it could see an incorrect `aligned` param about the array addresses and thus a wrong arraycopy runtime call is selected. This is causing performance regressions (like Dacapo spring) on linux-riscv64 platforms where misaligned memory access is very slow like Sifive Unmatched or Premier P550 SBCs. Proposed change fixes this issue by taking base offset into account when checking the alignment, which is very similar to [1]. Testing: - [x] Tier1-3 on linux-aarch64 (release & fastdebug) - [x] Tier1-3 on linux-riscv64 (release) - [x] Dacapo spring performance test on linux-riscv64 (w/wo `-XX:+UseCompactObjectHeaders` / `-XX:-UseCompressedClassPointers`) [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/macroArrayCopy.cpp#L341 [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/library_call.cpp#L1584 [3] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/library_call.cpp#L1666 [4] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/stringopts.cpp#L1481 ------------- Commit messages: - 8359270: C2: alignment check should consider base offset when emitting arraycopy runtime call Changes: https://git.openjdk.org/jdk/pull/25765/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25765&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8359270 Stats: 18 lines in 2 files changed: 10 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/25765.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25765/head:pull/25765 PR: https://git.openjdk.org/jdk/pull/25765 From amitkumar at openjdk.org Thu Jun 12 04:54:48 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 12 Jun 2025 04:54:48 GMT Subject: RFR: 8358694: VM asserts if CodeCacheSegmentSize is not a power of 2 [v3] In-Reply-To: <_SfCjE8oyyD-grL6AAih23j1Qx7fbYefTVzl-BU5N2k=.7bd5d419-4127-4e78-926e-fece25a7d914@github.com> References: <_SfCjE8oyyD-grL6AAih23j1Qx7fbYefTVzl-BU5N2k=.7bd5d419-4127-4e78-926e-fece25a7d914@github.com> 
Message-ID: > Makes sure that JVM exits gracefully when `CodeCacheSegmentSize` is not a power of 2. Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: add test case ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25708/files - new: https://git.openjdk.org/jdk/pull/25708/files/539b3fe4..d7aef4a9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25708&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25708&range=01-02 Stats: 59 lines in 1 file changed: 59 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25708.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25708/head:pull/25708 PR: https://git.openjdk.org/jdk/pull/25708 From amitkumar at openjdk.org Thu Jun 12 05:10:46 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 12 Jun 2025 05:10:46 GMT Subject: RFR: 8358694: VM asserts if CodeCacheSegmentSize is not a power of 2 [v4] In-Reply-To: <_SfCjE8oyyD-grL6AAih23j1Qx7fbYefTVzl-BU5N2k=.7bd5d419-4127-4e78-926e-fece25a7d914@github.com> References: <_SfCjE8oyyD-grL6AAih23j1Qx7fbYefTVzl-BU5N2k=.7bd5d419-4127-4e78-926e-fece25a7d914@github.com> Message-ID: > Makes sure that JVM exits gracefully when `CodeCacheSegmentSize` is not a power of 2. 
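As a side note on the check itself, the power-of-two condition can be sketched with the usual bit trick. This is a minimal illustration, not the HotSpot implementation: the class `SegmentSizeCheckSketch` and its methods are hypothetical, and the real fix validates the flag inside the VM rather than in Java.

```java
/** Hypothetical illustration of validating a size option; NOT the HotSpot code. */
final class SegmentSizeCheckSketch {
    /** A positive value is a power of two exactly when it has a single bit set. */
    static boolean isPowerOfTwo(long value) {
        return value > 0 && (value & (value - 1)) == 0;
    }

    /** Rejects bad values with an error instead of tripping an assert later. */
    static void validate(long segmentSize) {
        if (!isPowerOfTwo(segmentSize)) {
            throw new IllegalArgumentException(
                "segment size must be a power of 2, got " + segmentSize);
        }
    }
}
```

Reporting the error at option-parsing time is what lets the process exit gracefully instead of asserting deep inside code that assumes the invariant.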
Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: fix comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25708/files - new: https://git.openjdk.org/jdk/pull/25708/files/d7aef4a9..f8fbb4df Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25708&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25708&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25708.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25708/head:pull/25708 PR: https://git.openjdk.org/jdk/pull/25708 From chagedorn at openjdk.org Thu Jun 12 06:20:27 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 12 Jun 2025 06:20:27 GMT Subject: RFR: 8342941: IGV: Add new graph dumps for post loop, empty loop removal, and one iteration removal In-Reply-To: References: Message-ID: On Wed, 11 Jun 2025 13:54:20 GMT, Saranya Natarajan wrote: > This changeset adds BEFORE/AFTER graph dumps for creating a post loop (`insert_post_loop()`), removing an empty loop (`do_remove_empty_loop()`), and removing a one iteration loop (`do_one_iteration_loop()`). > > Changes: > - Added `BEFORE_POST_LOOP` and `AFTER_POST_LOOP` for dumping graphs before and after `insert_post_loop()`. > - Added `BEFORE_REMOVE_EMPTY_LOOP` and `AFTER_REMOVE_EMPTY_LOOP` for dumping graphs before and after `do_remove_empty_loop()`. > - Added `BEFORE_ONE_ITERATION_LOOP` and `AFTER_ONE_ITERATION_LOOP` for dumping graphs before and after `do_one_iteration_loop()`. > > Below are sample screenshots (IGV print level 4 ) mainly showing the new phase . > 1. `BEFORE_POST_LOOP` and `AFTER_POST_LOOP` > ![image](https://github.com/user-attachments/assets/1661cede-5d70-4e0d-abec-3d091c7675c8) > 2. 
`BEFORE_POST_LOOP` and `AFTER_POST_LOOP` with SuperWordLoopUnrollAnalysis enabled > ![image](https://github.com/user-attachments/assets/6a22e6f0-4e6c-4e9d-8b6b-2bf75fac783d) > 3. `BEFORE_REMOVE_EMPTY_LOOP` and `AFTER_REMOVE_EMPTY_LOOP` > ![image](https://github.com/user-attachments/assets/3281f00b-575e-4604-83dd-831037d8dd47) > 4. `BEFORE_ONE_ITERATION_LOOP` and `AFTER_ONE_ITERATION_LOOP` > ![image](https://github.com/user-attachments/assets/efddbc9a-64f7-403d-acfe-330d75a00911) > > Question to reviewers: > Are the new compiler phases OK, or should we change anything? > > Testing: > GitHub Actions > tier1 to tier5 on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. > Tested that thousands of graphs are correctly opened and visualized with IGV using the same test used in ([JDK-8317349](https://bugs.openjdk.org/browse/JDK-8317349)) Thanks for adding those! You are currently dumping the `CountedLoop` for the "before" and "after" dump. I think we could improve the "after" dump to show the actual change: - In the "after" dump of the post loop, we could dump the new post loop instead of the old one. - For the empty loop removal we could dump the `final_iv` instead: https://github.com/openjdk/jdk/blob/c5a1543ee3e68775f09ca29fb07efd9aebfdb33e/src/hotspot/share/opto/loopTransform.cpp#L3172 - For the one iteration removal, I'm not sure if it's worth printing anything. I think we can either print nothing or the init which replaces the iv: https://github.com/openjdk/jdk/blob/c5a1543ee3e68775f09ca29fb07efd9aebfdb33e/src/hotspot/share/opto/loopTransform.cpp#L3334 ------------- Changes requested by chagedorn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/25756#pullrequestreview-2919673432 From thartmann at openjdk.org Thu Jun 12 06:34:28 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 12 Jun 2025 06:34:28 GMT Subject: RFR: 8359270: C2: alignment check should consider base offset when emitting arraycopy runtime call In-Reply-To: References: Message-ID: On Thu, 12 Jun 2025 01:54:34 GMT, Fei Yang wrote: > Hi, please consider this change fixing the alignment check when emitting arraycopy runtime calls. > > There are currently four callsites of `StubRoutines::select_arraycopy_function` in hotspot C2 shared code where we emit arraycopy runtime calls [1-4]. Three of them [2-4] missed the base offset when calculating alignment for both source and destination array addresses. It seems they assume a base offset of 8 bytes, which is not always true. Base offset becomes 4 bytes under either `-XX:+UseCompactObjectHeaders` or `-XX:-UseCompressedClassPointers`. > > And `StubRoutines::select_arraycopy_function` selects the right arraycopy runtime based on this alignment. As a result, it could see an incorrect `aligned` param about the array addresses and thus a wrong arraycopy runtime call is selected. This is causing performance regressions (like Dacapo Spring) on linux-riscv64 platforms like Sifive Unmatched or Premier P550 SBCs where misaligned memory access is very slow. > > Proposed change fixes this issue by taking base offset into account when checking the alignment, which is very similar to [1].
> > Testing: > - [x] Tier1-3 on linux-aarch64 (release & fastdebug) > - [x] Tier1-3 on linux-riscv64 (release) > - [x] Dacapo spring performance test on linux-riscv64 (w/wo `-XX:+UseCompactObjectHeaders` / `-XX:-UseCompressedClassPointers`) > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/macroArrayCopy.cpp#L341 > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/library_call.cpp#L1584 > [3] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/library_call.cpp#L1666 > [4] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/stringopts.cpp#L1481 Would it be possible to add an IR Framework test for this, checking that the right stub is selected? ------------- PR Review: https://git.openjdk.org/jdk/pull/25765#pullrequestreview-2919701657 From thartmann at openjdk.org Thu Jun 12 06:49:07 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 12 Jun 2025 06:49:07 GMT Subject: [jdk25] RFR: 8357982: Fix several failing BMI tests with -XX:+UseAPX Message-ID: Hi all, This pull request contains a backport of commit [c98dffa1](https://github.com/openjdk/jdk/commit/c98dffa186d48c41e76fd3a60e0129a8da60310f) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. The commit being backported was authored by Jatin Bhateja on 11 Jun 2025 and was reviewed by Emanuel Peter and Sandhya Viswanathan. Thanks! 
------------- Commit messages: - Backport c98dffa186d48c41e76fd3a60e0129a8da60310f Changes: https://git.openjdk.org/jdk/pull/25769/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25769&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8357982 Stats: 122 lines in 9 files changed: 108 ins; 0 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/25769.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25769/head:pull/25769 PR: https://git.openjdk.org/jdk/pull/25769 From dfenacci at openjdk.org Thu Jun 12 06:50:29 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Thu, 12 Jun 2025 06:50:29 GMT Subject: RFR: 8357782: JVM JIT Causes Static Initialization Order Issue In-Reply-To: References: Message-ID: <1zXTCZhIIfTtKzQ4UxJopvKFE48wo4nQDZ8AtjqNNyw=.14c6a253-411f-4850-aa1c-1de543f5eb1e@github.com> On Tue, 10 Jun 2025 13:33:13 GMT, Manuel Hässig wrote: > # Issue Summary > > When C1 compiles a method that allocates a new instance of a class that is not fully initialized at compile time, it does not take into account that the `` might run a static initializer that might have side effects. Consider the following example: > > class A { > static class B { > static String field; > static void test() { > String tmp = field; > new C(field); > } > } > > static class C { > static { > B.field = "Hello"; > } > > C(String val) { > if (val == null) { > throw new RuntimeException("Should not reach here"); > } > } > } > } > > Here, `B.field` gets assigned in `C`'s static initializer. Since C1 believes that the `newinstance` does not have memory side effects, local value numbering eliminates the field access for the argument in `C.` because it believes that `B.field` is still the same as `tmp`. Hence, the assignment in `C.` gets effectively ignored and the code triggers the runtime exception. Because this only happens if `C` is not fully initialized when it is compiled, we need `-Xcomp` to reproduce this issue.
> > # Changes > > To fix the illustrated issue, this PR ensures that `newinstance` kills the memory state in C1's LVN if the class might not be fully initialized. Since we can not reliably detect if a class has a static initializer, we kill memory whenever a class is not yet loaded or, if it has already been loaded, when it has not been fully initialized, which is conservative and might kill memory when it is not necessary for correctness and have an impact on performance in the form of some additional field accesses. > > # Benchmark Results > > Since this might have an effect on startup, I ran some benchmarks. The results mostly did not show effects outside the run-to-run variance. > > # Testing > > - [x] [GHA](https://github.com/mhaessig/jdk/actions/runs/15560262225) > - [x] tier1 through tier3 plus Oracle internal testing on Oracle supported platforms and OSes > > # Acknowledgements > > Shout out to @TobiHartmann who wrote the reproducer that became the regression test and helped me find my way around C1 and narrow down the problem. Thanks for fixing this @mhaessig! Looks good to me! src/hotspot/share/ci/ciInstanceKlass.cpp line 553: > 551: > 552: bool ciInstanceKlass::has_class_initializer() { > 553: VM_ENTRY_MARK; Out of curiosity: do we strictly need to add `VM_ENTRY_MARK` here (since it is only called from the compiler) or is it mostly to ensure it is VM-safe if called from "outside"? ------------- Marked as reviewed by dfenacci (Committer).
PR Review: https://git.openjdk.org/jdk/pull/25725#pullrequestreview-2919733522 PR Review Comment: https://git.openjdk.org/jdk/pull/25725#discussion_r2141842986 From chagedorn at openjdk.org Thu Jun 12 06:53:39 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 12 Jun 2025 06:53:39 GMT Subject: RFR: 8347273: C2: VerifyIterativeGVN for Ideal and Identity [v7] In-Reply-To: References: Message-ID: <-CuBx8_SYEJ3zHZt4mmvofa65QOBAdSNexfmDXXBFvI=.c767b77a-0d4e-4fe1-9dc9-e796ee9a135b@github.com> On Tue, 10 Jun 2025 14:44:48 GMT, Emanuel Peter wrote: >> **Past Work** >> With https://github.com/openjdk/jdk/pull/11775 / [JDK-8298952](https://bugs.openjdk.org/browse/JDK-8298952) we added `Node::Value` verification. >> >> **This PR** >> I'm now adding verification for `Ideal` and `Identity`. I'm adding two bits to the flag `VerifyIterativeGVN`. >> >> I found many node types that hit my verification assert, i.e. that could still be optimized after IGVN is over, just because these nodes were not put on the worklist any more. >> >> My approach was to aggressively bail out for all nodes that had an issue. This way, we can address one by one in follow-up RFEs. For many, I did some initial assessment, and left some comments about what issues I encountered. >> >> **Future Work:** >> In many cases, the issue is just a missing notification when inputs of inputs are changed. These would be good starter tasks. But there are probably also more complicated cases. And there are surely cases where verification will be impossible, because it is possible that the Ideal / Identity optimizations traverse longer paths, and we cannot expect that notification makes it down that path. For those cases, we will have to leave the exception and document it well. >> >> I filed: >> [JDK-8359103](https://bugs.openjdk.org/browse/JDK-8359103) C2 VerifyIterativeGVN: Umbrella for extending Ideal and Identity verification (JDK-8347273) >> (We can file subtasks for the nodes we want to fix.
I don't want to file them all now, but we should file them as we are investigating, so that there is no duplicate work.) >> >> Testing passed tier1-3, with extra timeout factor 20. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > reorder flags for Christian Some more minor comments but otherwise looks good, thanks for the updates and for the offline discussion to share some background! > @chhagedorn I checked with @TobiHartmann : he said he does not have a strong opinion, but if he had to make a decision, he would prefer having everything in the comments. Then let's go with everything in the comments - might be better in the end since we eventually want to replace the individual comments with final verdicts or have the cases fixed anyway at some point :-) src/hotspot/share/opto/phaseX.cpp line 1204: > 1202: switch (n->Opcode()) { > 1203: // RangeCheckNode::Ideal looks up the chain for about 999 nodes > 1204: // see "Range-Check scan limit". So it is possible that something Suggestion: // (see "Range-Check scan limit"). So, it is possible that something src/hotspot/share/opto/phaseX.cpp line 1205: > 1203: // RangeCheckNode::Ideal looks up the chain for about 999 nodes > 1204: // see "Range-Check scan limit". So it is possible that something > 1205: // optimized in that input subgraph, and the RangeCheck was not Suggestion: // is optimized in that input subgraph, and the RangeCheck was not src/hotspot/share/opto/phaseX.cpp line 1256: > 1254: // "Useless" means that there is no code in either branch of the If. > 1255: // I found a case where this was not done yet during IGVN. > 1256: // Why does the Region not get added to IGVN worklist when the If diamond becomes useles? Suggestion: // Why does the Region not get added to IGVN worklist when the If diamond becomes useless? src/hotspot/share/opto/phaseX.cpp line 1316: > 1314: case Op_AddD: > 1315: //case Op_AddI: // Also affected for other reasons.
> 1316: //case Op_AddL: // Also affected for other reasons. Suggestion: //case Op_AddI: // Also affected for other reasons, see case further down. //case Op_AddL: // Also affected for other reasons, see case further down. src/hotspot/share/opto/phaseX.cpp line 1432: > 1430: // x + (0 - [8424 AddL]) > 1431: // but the AddL was not added to the IGVN worklist. Investigate why. > 1432: // There could be other issues too. For example with "commute", see above. Suggestion: // There could be other issues, too. For example with "commute", see above. src/hotspot/share/opto/phaseX.cpp line 1444: > 1442: // This has the effect that these new nodes end up on the IGVN worklist, > 1443: // but if we now leave verification and IGVN itself, we have nodes on the > 1444: // worklist, and that should not be (there are asserts against this). Sounds like we just need some exception when calling `Ideal` on a `SubTypeCheck` that we can have certain nodes still on the worklist like a `cmp`. Maybe we should add this as a suggestion to the comment? Suggestion: // but if we now leave verification and IGVN itself, we have nodes other // than 'n' still on the worklist. This will fail with an assert in // verify_empty_worklist(). Maybe we just need to add an exception and // check that only certain nodes like 'cmp' are still on the worklist. After // this check, we can clear the worklist such that verify_empty_worklist() // succeeds. src/hotspot/share/opto/phaseX.cpp line 1685: > 1683: // Found in tier1-3. > 1684: case Op_CMoveI: > 1685: return false; Maybe merge them together and add a comment that you have not investigated further (I assume?) since you found them all in tier1-3 without more specific details. ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/22970#pullrequestreview-2913983773 PR Review Comment: https://git.openjdk.org/jdk/pull/22970#discussion_r2138142398 PR Review Comment: https://git.openjdk.org/jdk/pull/22970#discussion_r2138143758 PR Review Comment: https://git.openjdk.org/jdk/pull/22970#discussion_r2138146868 PR Review Comment: https://git.openjdk.org/jdk/pull/22970#discussion_r2138150068 PR Review Comment: https://git.openjdk.org/jdk/pull/22970#discussion_r2141804882 PR Review Comment: https://git.openjdk.org/jdk/pull/22970#discussion_r2141810963 PR Review Comment: https://git.openjdk.org/jdk/pull/22970#discussion_r2141830386 From thartmann at openjdk.org Thu Jun 12 06:57:30 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 12 Jun 2025 06:57:30 GMT Subject: RFR: 8357782: JVM JIT Causes Static Initialization Order Issue In-Reply-To: <1zXTCZhIIfTtKzQ4UxJopvKFE48wo4nQDZ8AtjqNNyw=.14c6a253-411f-4850-aa1c-1de543f5eb1e@github.com> References: <1zXTCZhIIfTtKzQ4UxJopvKFE48wo4nQDZ8AtjqNNyw=.14c6a253-411f-4850-aa1c-1de543f5eb1e@github.com> Message-ID: On Thu, 12 Jun 2025 06:45:23 GMT, Damon Fenacci wrote: >> # Issue Summary >> >> When C1 compiles a method that allocates a new instance of a class that is not fully initialized at compile time, it does not take into account that the `` might run a static initializer that might have side effects. Consider the following example: >> >> class A { >> static class B { >> static String field; >> static void test() { >> String tmp = field; >> new C(field); >> } >> } >> >> static class C { >> static { >> B.field = "Hello"; >> } >> >> C(String val) { >> if (val == null) { >> throw new RuntimeException("Should not reach here"); >> } >> } >> } >> } >> >> Here, `B.field` gets assigned in `C`'s static initializer. 
Since C1 believes that the `newinstance` does not have memory side effects, local value numbering eliminates the field access for the argument in `C.<init>` because it believes that `B.field` is still the same as `tmp`. Hence, the assignment in `C.<clinit>` gets effectively ignored and the code triggers the runtime exception. Because this only happens if `C` is not fully initialized when it is compiled, we need `-Xcomp` to reproduce this issue. >> >> # Changes >> >> To fix the illustrated issue, this PR ensures that `newinstance` kills the memory state in C1's LVN if the class might not be fully initialized. Since we can not reliably detect if a class has a static initializer, we kill memory whenever a class is not yet loaded or, if it has already been loaded, when it has not been fully initialized, which is conservative and might kill memory when it is not necessary for correctness and have an impact on performance in the form of some additional field accesses. >> >> # Benchmark Results >> >> Since this might have an effect on startup, I ran some benchmarks. The results mostly did not show effects outside the run-to-run variance. >> >> # Testing >> >> - [x] [GHA](https://github.com/mhaessig/jdk/actions/runs/15560262225) >> - [x] tier1 through tier3 plus Oracle internal testing on Oracle supported platforms and OSes >> >> # Acknowledgements >> >> Shout out to @TobiHartmann who wrote the reproducer that became the regression test and helped me find my way around C1 and narrow down the problem. > > src/hotspot/share/ci/ciInstanceKlass.cpp line 553: > >> 551: >> 552: bool ciInstanceKlass::has_class_initializer() { >> 553: VM_ENTRY_MARK; > > Out of curiosity: do we strictly need to add `VM_ENTRY_MARK` here (since it is only called from the compiler) or it is mostly to ensure it is VM-safe if called from "outside'? But it's needed exactly because we are calling from compiler, right?
It will bring the compiler thread into the VM state which is required for accessing instance klass data. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25725#discussion_r2141859139 From thartmann at openjdk.org Thu Jun 12 07:01:29 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 12 Jun 2025 07:01:29 GMT Subject: RFR: 8357782: JVM JIT Causes Static Initialization Order Issue In-Reply-To: References: Message-ID: On Tue, 10 Jun 2025 13:33:13 GMT, Manuel Hässig wrote: > # Issue Summary > > When C1 compiles a method that allocates a new instance of a class that is not fully initialized at compile time, it does not take into account that the `` might run a static initializer that might have side effects. Consider the following example: > > class A { > static class B { > static String field; > static void test() { > String tmp = field; > new C(field); > } > } > > static class C { > static { > B.field = "Hello"; > } > > C(String val) { > if (val == null) { > throw new RuntimeException("Should not reach here"); > } > } > } > } > > Here, `B.field` gets assigned in `C`'s static initializer. Since C1 believes that the `newinstance` does not have memory side effects, local value numbering eliminates the field access for the argument in `C.<init>` because it believes that `B.field` is still the same as `tmp`. Hence, the assignment in `C.<clinit>` gets effectively ignored and the code triggers the runtime exception. Because this only happens if `C` is not fully initialized when it is compiled, we need `-Xcomp` to reproduce this issue. > > # Changes > > To fix the illustrated issue, this PR ensures that `newinstance` kills the memory state in C1's LVN if the class might not be fully initialized.
Since we can not reliably detect if a class has a static initializer, we kill memory whenever a class is not yet loaded or, if it has already been loaded, when it has not been fully initialized, which is conservative and might kill memory when it is not necessary for correctness and have an impact on performance in the form of some additional field accesses. > > # Benchmark Results > > Since this might have an effect on startup, I ran some benchmarks. The results mostly did not show effects outside the run-to-run variance. > > # Testing > > - [x] [GHA](https://github.com/mhaessig/jdk/actions/runs/15560262225) > - [x] tier1 through tier3 plus Oracle internal testing on Oracle supported platforms and OSes > > # Acknowledgements > > Shout out to @TobiHartmann who wrote the reproducer that became the regression test and helped me find my way around C1 and narrow down the problem. Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25725#pullrequestreview-2919772328 From dfenacci at openjdk.org Thu Jun 12 07:08:33 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Thu, 12 Jun 2025 07:08:33 GMT Subject: RFR: 8357782: JVM JIT Causes Static Initialization Order Issue In-Reply-To: References: <1zXTCZhIIfTtKzQ4UxJopvKFE48wo4nQDZ8AtjqNNyw=.14c6a253-411f-4850-aa1c-1de543f5eb1e@github.com> Message-ID: On Thu, 12 Jun 2025 06:54:32 GMT, Tobias Hartmann wrote: >> src/hotspot/share/ci/ciInstanceKlass.cpp line 553: >> >>> 551: >>> 552: bool ciInstanceKlass::has_class_initializer() { >>> 553: VM_ENTRY_MARK; >> >> Out of curiosity: do we strictly need to add `VM_ENTRY_MARK` here (since it is only called from the compiler) or it is mostly to ensure it is VM-safe if called from "outside'? > > But it's needed exactly because we are calling from compiler, right? It will bring the compiler thread into the VM state which is required for accessing instance klass data. Right! Thanks! 
I think I was seeing it the other way around. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25725#discussion_r2141879159 From epeter at openjdk.org Thu Jun 12 07:14:37 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 12 Jun 2025 07:14:37 GMT Subject: RFR: 8347273: C2: VerifyIterativeGVN for Ideal and Identity [v7] In-Reply-To: <-CuBx8_SYEJ3zHZt4mmvofa65QOBAdSNexfmDXXBFvI=.c767b77a-0d4e-4fe1-9dc9-e796ee9a135b@github.com> References: <-CuBx8_SYEJ3zHZt4mmvofa65QOBAdSNexfmDXXBFvI=.c767b77a-0d4e-4fe1-9dc9-e796ee9a135b@github.com> Message-ID: On Thu, 12 Jun 2025 06:38:14 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> reorder flags for Christian > > src/hotspot/share/opto/phaseX.cpp line 1685: > >> 1683: // Found in tier1-3. >> 1684: case Op_CMoveI: >> 1685: return false; > > Maybe merge them together and add a comment that you have not investigated further (I assume?) since you found them all in tier1-3 without more specific details. I would prefer to keep them separate, so it is easier to remove them individually without getting merge conflicts. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22970#discussion_r2141889139 From mhaessig at openjdk.org Thu Jun 12 07:24:44 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Thu, 12 Jun 2025 07:24:44 GMT Subject: RFR: 8354727: CompilationPolicy creates too many compiler threads when code cache space is scarce Message-ID: # Issue Summary Running `java -XX:+SegmentedCodeCache -XX:ReservedCodeCacheSize=10M -XX:NonNMethodCodeHeapSize=6M -XX:ProfiledCodeHeapSize=5M -XX:NonProfiledCodeHeapSize=5M -version` on a machine with more than 255 cores fails with the message that the specified `NonNMethodCodeHeapSize` is too small to fit all compiler buffers (instead of failing because the sum of the heaps is larger than the `ReservedCodeCacheSize`).
Hence, the calculated compiler count is too high. This is due to `CompilationPolicy::initialize()` checking how many compiler buffers fit into the `ReservedCodeCacheSize`. However, in the case above, this is significantly larger than `NonNMethodCodeHeapSize` and causes a new check introduced in #17244 to fail. # Changes This PR fixes the calculation of the `CICompilerCount` ergonomic. Firstly, @shipilev kindly provided a fix so that the compiler buffer size used in the calculation is also correct if we only have C2. Secondly, `NonNMethodCodeHeapSize` is used as the maximum size available for compiler buffers, instead of `ReservedCodeCacheSize`, if it was provided as a command-line flag. It might be debatable whether this is the correct fix, since the non-nmethod code heap can spill into the other heaps if it is too small. However, I am of the opinion that if `NonNMethodCodeHeapSize` is explicitly specified, then the compiler count should be calculated accordingly. # Testing - [ ] [GHA](https://github.com/mhaessig/jdk/actions/runs/15603409859) - [x] tier1 and tier2 plus Oracle internal testing on our supported platforms - [x] tier1 with a manually fixed core count of 288 (this reproduced the problem before the fix) ------------- Commit messages: - Respect NonNMethodCodeHeapSize during ergonomic compiler count selection - Correctly calculate the compiler buffer size Changes: https://git.openjdk.org/jdk/pull/25770/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25770&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8354727 Stats: 12 lines in 1 file changed: 8 ins; 1 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/25770.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25770/head:pull/25770 PR: https://git.openjdk.org/jdk/pull/25770 From mhaessig at openjdk.org Thu Jun 12 07:33:14 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Thu, 12 Jun 2025 07:33:14 GMT Subject: RFR: 8357782: JVM JIT Causes Static Initialization Order Issue [v2]
In-Reply-To: References: Message-ID: > # Issue Summary > > When C1 compiles a method that allocates a new instance of a class that is not fully initialized at compile time, it does not take into account that the `` might run a static initializer that might have side effects. Consider the following example: > > class A { > static class B { > static String field; > static void test() { > String tmp = field; > new C(field); > } > } > > static class C { > static { > B.field = "Hello"; > } > > C(String val) { > if (val == null) { > throw new RuntimeException("Should not reach here"); > } > } > } > } > > Here, `B.field` gets assigned in `C`'s static initializer. Since C1 believes that the `newinstance` does not have memory side effects, local value numbering eliminates the field access for the argument in `C.<init>` because it believes that `B.field` is still the same as `tmp`. Hence, the assignment in `C.<clinit>` gets effectively ignored and the code triggers the runtime exception. Because this only happens if `C` is not fully initialized when it is compiled, we need `-Xcomp` to reproduce this issue. > > # Changes > > To fix the illustrated issue, this PR ensures that `newinstance` kills the memory state in C1's LVN if the class might not be fully initialized. Since we can not reliably detect if a class has a static initializer, we kill memory whenever a class is not yet loaded or, if it has already been loaded, when it has not been fully initialized, which is conservative and might kill memory when it is not necessary for correctness and have an impact on performance in the form of some additional field accesses. > > # Benchmark Results > > Since this might have an effect on startup, I ran some benchmarks. The results mostly did not show effects outside the run-to-run variance.
> > # Testing > > - [x] [GHA](https://github.com/mhaessig/jdk/actions/runs/15560262225) > - [x] tier1 through tier3 plus Oracle internal testing on Oracle supported platforms and OSes > > # Acknowledgements > > Shout out to @TobiHartmann who wrote the reproducer that became the regression test and helped me find my way around C1 and narrow down the problem. Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: Logical simplification Co-authored-by: Dean Long <17332032+dean-long at users.noreply.github.com> ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25725/files - new: https://git.openjdk.org/jdk/pull/25725/files/ccdbdbdc..f6a4a290 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25725&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25725&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25725.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25725/head:pull/25725 PR: https://git.openjdk.org/jdk/pull/25725 From mhaessig at openjdk.org Thu Jun 12 07:33:15 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Thu, 12 Jun 2025 07:33:15 GMT Subject: RFR: 8357782: JVM JIT Causes Static Initialization Order Issue [v2] In-Reply-To: References: Message-ID: On Wed, 11 Jun 2025 21:24:18 GMT, Dean Long wrote: >> Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: >> >> Logical simplification >> >> Co-authored-by: Dean Long <17332032+dean-long at users.noreply.github.com> > > src/hotspot/share/c1/c1_ValueMap.hpp line 193: > >> 191: ciInstanceKlass* c = x->klass(); >> 192: if (c != nullptr && !c->is_initialized() && >> 193: ((c->is_loaded() && c->has_class_initializer()) || !c->is_loaded())) { > > Suggestion: > > (!c->is_loaded() || c->has_class_initializer()) { Thank you for this simplification. I forgot about short-circuiting in this case...
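The simplification discussed above relies on the identity `(A && B) || !A == !A || B`. As a minimal sketch (not code from the PR), the following Java snippet checks the two forms against each other over the whole truth table. The class name `ShortCircuitEquivalence` and the parameters `isLoaded` and `hasClassInitializer` are illustrative stand-ins for `c->is_loaded()` and `c->has_class_initializer()`; they are assumptions made for this example only.

```java
public class ShortCircuitEquivalence {
    // Shape of the condition before the review suggestion
    // (hypothetical stand-in for the C++ expression in c1_ValueMap.hpp).
    static boolean original(boolean isLoaded, boolean hasClassInitializer) {
        return (isLoaded && hasClassInitializer) || !isLoaded;
    }

    // Shape of the suggested condition: when !isLoaded is false, the class is
    // loaded, so the second operand no longer needs to re-check isLoaded.
    static boolean simplified(boolean isLoaded, boolean hasClassInitializer) {
        return !isLoaded || hasClassInitializer;
    }

    // Exhaustively compares both forms over all four input combinations.
    static boolean equivalentForAllInputs() {
        boolean[] values = {false, true};
        for (boolean isLoaded : values) {
            for (boolean hasClassInitializer : values) {
                if (original(isLoaded, hasClassInitializer)
                        != simplified(isLoaded, hasClassInitializer)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("equivalent: " + equivalentForAllInputs());
    }
}
```

In the real condition the order of operands also matters for side effects: with `||`, `has_class_initializer()` is only evaluated when the class is loaded, which the simplified form preserves.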
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25725#discussion_r2141930308 From fyang at openjdk.org Thu Jun 12 07:35:29 2025 From: fyang at openjdk.org (Fei Yang) Date: Thu, 12 Jun 2025 07:35:29 GMT Subject: RFR: 8358892: RISC-V: jvm crash when running dacapo sunflow after JDK-8352504 [v6] In-Reply-To: <1oKHolQiymAnc5OKC1RcR9fDlL9c_F0zW6gHJ3pQKWI=.6e5f69a6-f474-415d-8398-e5e2952d985e@github.com> References: <9TAfBWFOsPTUIJPaUX9xcPqVsXLI6lsQBvcomuVqcQI=.1eb3ec5a-59ac-44ce-a46b-b984dda4605e@github.com> <1oKHolQiymAnc5OKC1RcR9fDlL9c_F0zW6gHJ3pQKWI=.6e5f69a6-f474-415d-8398-e5e2952d985e@github.com> Message-ID: On Wed, 11 Jun 2025 12:50:14 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> >> Thanks! >> >> Currently, this issue is only reproducible with Dacapo sunflow. >> I tried to construct a simpler jtreg test to reproduce the issue, but can not find a way to do it till now, this task is tracked by https://bugs.openjdk.org/browse/JDK-8359045. >> >> So, currently I can only verify the code by reviewing it. >> Or maybe it's better to leave it until we find the test? > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > adjust arguments orders src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1385: > 1383: if (is_single) { > 1384: // jump if cmp1 < cmp2 or either is NaN > 1385: // not jump (i.e. move src to dst) if cmp1 >= cmp2 Or simply: `// fallthrough (i.e. move src to dst) if cmp1 >= cmp2`? Similar for other friends. test/hotspot/jtreg/compiler/c2/irTests/TestFPComparison2.java line 70: > 68: // Booltest::ge > 69: TestFramework framework = new TestFramework(Test_ge_1.class); > 70: framework.addFlags("-XX:-TieredCompilation", "-XX:+UseZicond", "-Xlog:jit+compilation=trace").start(); I see `-XX:+UseZicond` and `-XX:-UseZicond` options are used in this test. What if the testing platform doesn't have the Zicond extension? 
Maybe we can simply remove the use of these options, as `UseZicond` will be auto-enabled for fastdebug builds for test coverage purposes if this extension is available. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25696#discussion_r2141821676 PR Review Comment: https://git.openjdk.org/jdk/pull/25696#discussion_r2141819338 From epeter at openjdk.org Thu Jun 12 07:48:12 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 12 Jun 2025 07:48:12 GMT Subject: RFR: 8347273: C2: VerifyIterativeGVN for Ideal and Identity [v8] In-Reply-To: References: Message-ID: > **Past Work** > With https://github.com/openjdk/jdk/pull/11775 / [JDK-8298952](https://bugs.openjdk.org/browse/JDK-8298952) we added `Node::Value` verification. > > **This PR** > I'm now adding verification for `Ideal` and `Identity`. I'm adding two bits to the flag `VerifyIterativeGVN`. > > I found many node types that hit my verification assert, i.e. that could still be optimized after IGVN is over, just because these nodes were not put on the worklist any more. > > My approach was to aggressively bail out for all nodes that had an issue. This way, we can address them one by one in follow-up RFEs. For many, I did some initial assessment, and left some comments about what issues I encountered. > > **Future Work:** > In many cases, the issue is just a missing notification when inputs of inputs are changed. These would be good starter tasks. But there are probably also more complicated cases. And there are surely cases where verification will be impossible, because it is possible that the Ideal / Identity optimizations traverse longer paths, and we cannot expect that notification makes it down that path. For those cases, we will have to leave the exception and document it well.
> > I filed: > [JDK-8359103](https://bugs.openjdk.org/browse/JDK-8359103) C2 VerifyIterativeGVN: Umbrella for extending Ideal and Identity verification (JDK-8347273) > (We can file subtasks for the nodes we want to fix. I don't want to file them all now, but we should file them as we are investigating, so that there is no duplicate work.) > > Testing passed tier1-3, with extra timeout factor 20. Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Apply suggestions from code review Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22970/files - new: https://git.openjdk.org/jdk/pull/22970/files/ffc54f6e..abfd3a27 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22970&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22970&range=06-07 Stats: 6 lines in 1 file changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/22970.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22970/head:pull/22970 PR: https://git.openjdk.org/jdk/pull/22970 From epeter at openjdk.org Thu Jun 12 07:48:12 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 12 Jun 2025 07:48:12 GMT Subject: RFR: 8347273: C2: VerifyIterativeGVN for Ideal and Identity [v7] In-Reply-To: <-CuBx8_SYEJ3zHZt4mmvofa65QOBAdSNexfmDXXBFvI=.c767b77a-0d4e-4fe1-9dc9-e796ee9a135b@github.com> References: <-CuBx8_SYEJ3zHZt4mmvofa65QOBAdSNexfmDXXBFvI=.c767b77a-0d4e-4fe1-9dc9-e796ee9a135b@github.com> Message-ID: On Thu, 12 Jun 2025 06:24:40 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> reorder flags for Christian > > src/hotspot/share/opto/phaseX.cpp line 1444: > >> 1442: // This has the effect that these new nodes end up on the IGVN worklist, >> 1443: // but if we now leave verification and IGVN itself, we have nodes on the >> 1444: // worklist, and that should not be (there are 
asserts against this). > > Sounds like we just need some exception when calling `Ideal` on a `SubTypeCheck` that we can have certain nodes still on the worklist like a `cmp`. Maybe we should add this as a suggestion to the comment? > Suggestion: > > // but if we now leave verification and IGVN itself, we have nodes other > // than 'n' still on the worklist. This will fail with an assert in > // verify_empty_worklist(). Maybe we just need to add an exception and > // check that only certain nodes like 'cmp' are still on the worklist. After > // this check, we can clear the worklist such that verify_empty_worklist() > // succeeds. Thanks for the offline discussion, I'll write something new based on what we discussed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22970#discussion_r2141953002 From epeter at openjdk.org Thu Jun 12 07:48:12 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 12 Jun 2025 07:48:12 GMT Subject: RFR: 8347273: C2: VerifyIterativeGVN for Ideal and Identity [v7] In-Reply-To: References: <-CuBx8_SYEJ3zHZt4mmvofa65QOBAdSNexfmDXXBFvI=.c767b77a-0d4e-4fe1-9dc9-e796ee9a135b@github.com> Message-ID: On Thu, 12 Jun 2025 07:11:42 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/phaseX.cpp line 1685: >> >>> 1683: // Found in tier1-3. >>> 1684: case Op_CMoveI: >>> 1685: return false; >> >> Maybe merge them together and add a comment that you have not investigated further (I assume?) since you found them all in tier1-3 without more specific details. > > I would prefer to keep them separate, so it is easier to remove them individually without getting merge conflicts. I'll add a comment that I did not investigate further yet.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22970#discussion_r2141954735 From epeter at openjdk.org Thu Jun 12 07:54:02 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 12 Jun 2025 07:54:02 GMT Subject: RFR: 8347273: C2: VerifyIterativeGVN for Ideal and Identity [v9] In-Reply-To: References: Message-ID: > **Past Work** > With https://github.com/openjdk/jdk/pull/11775 / [JDK-8298952](https://bugs.openjdk.org/browse/JDK-8298952) we added `Node::Value` verification. > > **This PR** > I'm now adding verification for `Ideal` and `Identity`. I'm adding two bits to the flag `VerifyIterativeGVN`. > > I found many node types that hit my verification assert, i.e. that could still be optimized after IGVN is over, just because these nodes were not put on the worklist any more. > > My approach was to aggressively bail out for all nodes that had an issue. This way, we can address them one by one in follow-up RFEs. For many, I did some initial assessment, and left some comments about what issues I encountered. > > **Future Work:** > In many cases, the issue is just a missing notification when inputs of inputs are changed. These would be good starter tasks. But there are probably also more complicated cases. And there are surely cases where verification will be impossible, because it is possible that the Ideal / Identity optimizations traverse longer paths, and we cannot expect that notification makes it down that path. For those cases, we will have to leave the exception and document it well. > > I filed: > [JDK-8359103](https://bugs.openjdk.org/browse/JDK-8359103) C2 VerifyIterativeGVN: Umbrella for extending Ideal and Identity verification (JDK-8347273) > (We can file subtasks for the nodes we want to fix. I don't want to file them all now, but we should file them as we are investigating, so that there is no duplicate work.) > > Testing passed tier1-3, with extra timeout factor 20.
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: update comments for Christian ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22970/files - new: https://git.openjdk.org/jdk/pull/22970/files/abfd3a27..f54d851a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22970&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22970&range=07-08 Stats: 19 lines in 1 file changed: 13 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/22970.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22970/head:pull/22970 PR: https://git.openjdk.org/jdk/pull/22970 From epeter at openjdk.org Thu Jun 12 07:54:02 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 12 Jun 2025 07:54:02 GMT Subject: RFR: 8347273: C2: VerifyIterativeGVN for Ideal and Identity [v7] In-Reply-To: <-CuBx8_SYEJ3zHZt4mmvofa65QOBAdSNexfmDXXBFvI=.c767b77a-0d4e-4fe1-9dc9-e796ee9a135b@github.com> References: <-CuBx8_SYEJ3zHZt4mmvofa65QOBAdSNexfmDXXBFvI=.c767b77a-0d4e-4fe1-9dc9-e796ee9a135b@github.com> Message-ID: On Thu, 12 Jun 2025 06:50:25 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> reorder flags for Christian > > Some more minor comments but otherwise looks good, thanks for the updates and for the offline discussion to share some background! > >> @chhagedorn I checked with @TobiHartmann : he said he does not have a strong opinion, but if he had to make a decision, he would prefers having everything in the comments. > > Then let's go with everything in the comments - might be better in the end since we eventually want to replace the individual comments with final verdicts or have the cases fixed anyway at some point :-) @chhagedorn Thanks for reviewing and the suggestions! 
I addressed them all :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/22970#issuecomment-2965521510 From chagedorn at openjdk.org Thu Jun 12 08:08:30 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 12 Jun 2025 08:08:30 GMT Subject: RFR: 8358600: Template-Framework Library: Template for TestFramework test class [v3] In-Reply-To: References: Message-ID: On Thu, 5 Jun 2025 12:01:17 GMT, Emanuel Peter wrote: >> We might want to write many IR/TestFramework tests, and so I would like to integrate a Template that generates the class, and the user has to only generate a list of tests. >> >> This is a first extension for https://github.com/openjdk/jdk/pull/24217. I had already prototyped it earlier and plan to use it in multiple tests https://github.com/openjdk/jdk/pull/23418 (see `IRTestClass.java`). >> >> https://github.com/openjdk/jdk/blob/dc640cbd8fb8ec76920a7ab52dfe7955ed1d77f2/test/hotspot/jtreg/compiler/lib/template_framework/library/TestFrameworkClass.java#L36-L45 > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > review suggestions Otherwise, looks good! test/hotspot/jtreg/compiler/lib/template_framework/library/TestFrameworkClass.java line 95: > 93: // --- CLASS_HOOK insertions start --- > 94: """, > 95: Hooks.CLASS_HOOK.anchor( Ah, so I can insert here from my `testTemplateTokens` right? Do you also want to show an example in `TestWithTestFrameworkClass` that this is possible? Maybe we can also add something in the description above since I only got aware of that now when seeing this `anchor()`. ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25643#pullrequestreview-2919828653 PR Review Comment: https://git.openjdk.org/jdk/pull/25643#discussion_r2141902484 From chagedorn at openjdk.org Thu Jun 12 08:11:32 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 12 Jun 2025 08:11:32 GMT Subject: RFR: 8347273: C2: VerifyIterativeGVN for Ideal and Identity [v9] In-Reply-To: References: Message-ID: On Thu, 12 Jun 2025 07:54:02 GMT, Emanuel Peter wrote: >> **Past Work** >> With https://github.com/openjdk/jdk/pull/11775 / [JDK-8298952](https://bugs.openjdk.org/browse/JDK-8298952) we added `Node::Value` verification. >> >> **This PR** >> I'm now adding verification for `Ideal` and `Identity`. I'm adding two bits to the flag `VerifyIterativeGVN`. >> >> I found many node types that hit my verification assert, i.e. that could still be optimized after IGVN is over, just because these nodes were not put on the worklist any more. >> >> My approach was to aggressively bail out for all nodes that had an issue. This way, we can address them one by one in follow-up RFEs. For many, I did some initial assessment, and left some comments about what issues I encountered. >> >> **Future Work:** >> In many cases, the issue is just a missing notification when inputs of inputs are changed. These would be good starter tasks. But there are probably also more complicated cases. And there are surely cases where verification will be impossible, because it is possible that the Ideal / Identity optimizations traverse longer paths, and we cannot expect that notification makes it down that path. For those cases, we will have to leave the exception and document it well. >> >> I filed: >> [JDK-8359103](https://bugs.openjdk.org/browse/JDK-8359103) C2 VerifyIterativeGVN: Umbrella for extending Ideal and Identity verification (JDK-8347273) >> (We can file subtasks for the nodes we want to fix.
I don't want to file them all now, but we should file them as we are investigating, so that there is no duplicate work.) >> >> Testing passed tier1-3, with extra timeout factor 20. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > update comments for Christian Update looks good, thanks! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22970#pullrequestreview-2920002022 From mhaessig at openjdk.org Thu Jun 12 08:35:54 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Thu, 12 Jun 2025 08:35:54 GMT Subject: RFR: 8357782: JVM JIT Causes Static Initialization Order Issue [v3] In-Reply-To: References: Message-ID: > # Issue Summary > > When C1 compiles a method that allocates a new instance of a class that is not fully initialized at compile time, it does not take into account that the `` might run a static initializer that might have side effects. Consider the following example: > > class A { > static class B { > static String field; > static void test() { > String tmp = field; > new C(field); > } > } > > static class C { > static { > B.field = "Hello"; > } > > C(String val) { > if (val == null) { > throw new RuntimeException("Should not reach here"); > } > } > } > } > > Here, `B.field` gets assigned in `C`'s static initializer. Since C1 believes that the `newinstance` does not have memory side effects, local value numbering eliminates the field access for the argument in `C.<init>` because it believes that `B.field` is still the same as `tmp`. Hence, the assignment in `C.<clinit>` gets effectively ignored and the code triggers the runtime exception. Because this only happens if `C` is not fully initialized when it is compiled, we need `-Xcomp` to reproduce this issue. > > # Changes > > To fix the illustrated issue, this PR ensures that `newinstance` kills the memory state in C1's LVN if the class might not be fully initialized.
Since we can not reliably detect if a class has a static initializer, we kill memory whenever a class is not yet loaded or, if it has already been loaded, when it has not been fully initialized, which is conservative and might kill memory when it is not necessary for correctness and have an impact on performance in the form of some additional field accesses. > > # Benchmark Results > > Since this might have an effect on startup, I ran some benchmarks. The results mostly did not show effects outside the run-to-run variance. > > # Testing > > - [ ] [GHA](https://github.com/mhaessig/jdk/actions/runs/15560262225) > - [x] tier1 through tier3 plus Oracle internal testing on Oracle supported platforms and OSes > > # Acknowledgements > > Shout out to @TobiHartmann who wrote the reproducer that became the regression test and helped me find my way around C1 and narrow down the problem. Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: Fix syntax error ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25725/files - new: https://git.openjdk.org/jdk/pull/25725/files/f6a4a290..71b04b50 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25725&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25725&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25725.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25725/head:pull/25725 PR: https://git.openjdk.org/jdk/pull/25725 From dfenacci at openjdk.org Thu Jun 12 09:05:30 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Thu, 12 Jun 2025 09:05:30 GMT Subject: RFR: 8354727: CompilationPolicy creates too many compiler threads when code cache space is scarce In-Reply-To: References: Message-ID: On Thu, 12 Jun 2025 07:19:40 GMT, Manuel Hässig wrote: > # Issue Summary > > Running > > java -XX:+SegmentedCodeCache -XX:ReservedCodeCacheSize=10M -XX:NonNMethodCodeHeapSize=6M \ >
-XX:ProfiledCodeHeapSize=5M -XX:NonProfiledCodeHeapSize=5M -version > > on a machine with more than 255 cores, this would fail with the message that the specified `NonNMethodCodeHeapSize` is too small to fit all compiler buffers (instead of failing because the sum of the heaps is larger than the `ReservedCodeCacheSize`). Hence, the calculated compiler count is too high. This is due to `CompilationPolicy::initialize()` checking how many compiler buffers fit into the `ReservedCodeCacheSize`. However, in the case above, this is significantly larger than `NonNMethodCodeHeapSize` and causes a new check introduced in #17244 to fail. > > # Changes > > This PR fixes the calculation of the `CICompilerCount` ergonomic. Firstly, @shipilev kindly provided a fix so that the compiler buffer size used in the calculation is also correct if we only have C2. Secondly, `NonNMethodCodeHeapSize` is used as the maximum size available for compiler buffers instead of `ReservedCodeCacheSize` if it was provided as a command-line flag. > > It might be debatable if this is the correct fix, since the `NonNMethodHeap` can spill into the other heaps if it is too small. However, I am of the opinion that if the `NonNMethodCodeHeapSize` is explicitly specified, then the compiler count should be calculated accordingly. > > # Testing > > - [x] [GHA](https://github.com/mhaessig/jdk/actions/runs/15603409859) > - [x] tier1 and tier2 plus Oracle internal testing on our supported platforms > - [x] tier1 with a manually fixed core count of 288 (this reproduced the problem before the fix) Thanks for fixing this @mhaessig. I guess the issue cannot happen if `NonNMethodCodeHeapSize` is not given as a flag as it will be dynamically adapted to the number of compiler threads created, right? src/hotspot/share/compiler/compilationPolicy.cpp line 40: > 38: #include "runtime/deoptimization.hpp" > 39: #include "runtime/frame.hpp" > 40: #include "runtime/frame.inline.hpp" Was this removed on purpose?
It isn't used directly but doesn't seem to be related to your change... ------------- PR Review: https://git.openjdk.org/jdk/pull/25770#pullrequestreview-2920153480 PR Review Comment: https://git.openjdk.org/jdk/pull/25770#discussion_r2142107994 From thartmann at openjdk.org Thu Jun 12 09:09:31 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 12 Jun 2025 09:09:31 GMT Subject: RFR: 8357782: JVM JIT Causes Static Initialization Order Issue [v3] In-Reply-To: References: Message-ID: On Thu, 12 Jun 2025 08:35:54 GMT, Manuel H?ssig wrote: >> # Issue Summary >> >> When C1 compiles a method that allocates a new instance of a class that is not fully initialized at compile time, it does not take into account that the `` might run a static initializer that might have side effects. Consider the following example: >> >> class A { >> static class B { >> static String field; >> static void test() { >> String tmp = field; >> new C(field); >> } >> } >> >> static class C { >> static { >> B.field = "Hello"; >> } >> >> C(String val) { >> if (val == null) { >> throw new RuntimeException("Should not reach here"); >> } >> } >> } >> } >> >> Here, `B.field` gets assigned in `C`'s static initializer. Since C1 believes that the `newinstance` does not have memory side effects, local value numbering eliminates the field access for the argument in `C.` because it believes that `B.field` is still the same as `tmp`. Hence, the assignment in `C.` gets effectively ignored and the code triggers the runtime exception. Because this only happens if `C` is not fully initialized when it is compiled, we need `-Xcomp` to reproduce this issue. >> >> # Changes >> >> To fix the illustrated issue, this PR ensures that `newinstance` kills the memory state in C1's LVN if the class might not be fully initialized. 
Since we can not reliably detect if a class has a static initializer, we kill memory whenever a class is not yet loaded or, if it has already been loaded, when it has not been fully initialized, which is conservative and might kill memory when it is not necessary for correctness and have an impact on performance in the form of some additional field accesses. >> >> # Benchmark Results >> >> Since this might have an effect on startup, I ran some benchmarks. The results mostly did not show effects outside the run-to-run variance. >> >> # Testing >> >> - [ ] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15605637901) >> - [x] tier1 through tier3 plus Oracle internal testing on Oracle supported platforms and OSes >> >> # Acknowledgements >> >> Shout out to @TobiHartmann who wrote the reproducer that became the regression test and helped me find my way around C1 and narrow down the problem. > > Manuel H?ssig has updated the pull request incrementally with one additional commit since the last revision: > > Fix syntax error Marked as reviewed by thartmann (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25725#pullrequestreview-2920177669 From epeter at openjdk.org Thu Jun 12 09:12:03 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 12 Jun 2025 09:12:03 GMT Subject: RFR: 8347273: C2: VerifyIterativeGVN for Ideal and Identity [v10] In-Reply-To: References: Message-ID: <1C0ByMoDpDlOmbDQVgBTQg7yKI0UaLtX92Xmf0bta4E=.0c060c5a-d60d-4cbe-84c5-03884116ef34@github.com> > **Past Work** > With https://github.com/openjdk/jdk/pull/11775 / [JDK-8298952](https://bugs.openjdk.org/browse/JDK-8298952) we added `Node::Value` verification. > > **This PR** > I'm now adding verification for `Ideal` and `Identity`. I'm adding two bits to the flag `VerifyIterativeGVN`. > > I found many many node types that hit my verification assert, i.e. that could still be optimized after IGVN is over, just because these nodes were not put on the worklist any more. 
> > My approach was to aggressively bail out for all nodes that had an issue. This way, we can address them one by one in follow-up RFEs. For many, I did some initial assessment, and left some comments about what issues I encountered. > > **Future Work:** > In many cases, the issue is just a missing notification when inputs of inputs are changed. These would be good starter tasks. But there are probably also more complicated cases. And there are surely cases where verification will be impossible, because it is possible that the `Ideal` / `Identity` optimizations traverse longer paths, and we cannot expect that notification makes it down that path. For those cases, we will have to leave the exception and document it well. > > I filed: > [JDK-8359103](https://bugs.openjdk.org/browse/JDK-8359103) C2 VerifyIterativeGVN: Umbrella for extending Ideal and Identity verification (JDK-8347273) > (We can file subtasks for the nodes we want to fix. I don't want to file them all now, but we should file them as we are investigating, so that there is no duplicate work.) > > Testing passed tier1-3, with extra timeout factor 20. Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 79 additional commits since the last revision: - Merge branch 'master' into JDK-8347273-verify-IGVN-Ideal-Identity - update comments for Christian - Apply suggestions from code review Co-authored-by: Christian Hagedorn - reorder flags for Christian - max_modes - use stringStream instead of ttyLocker - assert(false) for Christian - rename for Christian - Update src/hotspot/share/opto/phaseX.cpp Co-authored-by: Manuel Hässig - review suggestions, and handled a few more edge cases - ...
and 69 more: https://git.openjdk.org/jdk/compare/84e59324...d9546d87 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22970/files - new: https://git.openjdk.org/jdk/pull/22970/files/f54d851a..d9546d87 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22970&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22970&range=08-09 Stats: 6953 lines in 245 files changed: 3281 ins; 3006 del; 666 mod Patch: https://git.openjdk.org/jdk/pull/22970.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22970/head:pull/22970 PR: https://git.openjdk.org/jdk/pull/22970 From epeter at openjdk.org Thu Jun 12 09:20:22 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 12 Jun 2025 09:20:22 GMT Subject: RFR: 8358600: Template-Framework Library: Template for TestFramework test class [v4] In-Reply-To: References: Message-ID: > We might want to write many IR/TestFramework tests, and so I would like to integrate a Template that generates the class, and the user has to only generate a list of tests. > > This is a first extension for https://github.com/openjdk/jdk/pull/24217. I had already prototyped it earlier and plan to use it in multiple tests https://github.com/openjdk/jdk/pull/23418 (see `IRTestClass.java`). 
> > https://github.com/openjdk/jdk/blob/dc640cbd8fb8ec76920a7ab52dfe7955ed1d77f2/test/hotspot/jtreg/compiler/lib/template_framework/library/TestFrameworkClass.java#L36-L45 Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: add hook example and comment for christian ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25643/files - new: https://git.openjdk.org/jdk/pull/25643/files/256d922c..55c6b22b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25643&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25643&range=02-03 Stats: 16 lines in 2 files changed: 15 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25643.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25643/head:pull/25643 PR: https://git.openjdk.org/jdk/pull/25643 From chagedorn at openjdk.org Thu Jun 12 09:20:22 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 12 Jun 2025 09:20:22 GMT Subject: RFR: 8358600: Template-Framework Library: Template for TestFramework test class [v4] In-Reply-To: References: Message-ID: On Thu, 12 Jun 2025 09:16:56 GMT, Emanuel Peter wrote: >> We might want to write many IR/TestFramework tests, and so I would like to integrate a Template that generates the class, and the user has to only generate a list of tests. >> >> This is a first extension for https://github.com/openjdk/jdk/pull/24217. I had already prototyped it earlier and plan to use it in multiple tests https://github.com/openjdk/jdk/pull/23418 (see `IRTestClass.java`). >> >> https://github.com/openjdk/jdk/blob/dc640cbd8fb8ec76920a7ab52dfe7955ed1d77f2/test/hotspot/jtreg/compiler/lib/template_framework/library/TestFrameworkClass.java#L36-L45 > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > add hook example and comment for christian Thanks for adding the hint! ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25643#pullrequestreview-2920208304 From epeter at openjdk.org Thu Jun 12 09:20:24 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 12 Jun 2025 09:20:24 GMT Subject: RFR: 8358600: Template-Framework Library: Template for TestFramework test class [v3] In-Reply-To: References: Message-ID: On Thu, 12 Jun 2025 08:06:11 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> review suggestions > > Otherwise, looks good! @chhagedorn I added a comment and example with the hook insertion :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/25643#issuecomment-2965787186 From mhaessig at openjdk.org Thu Jun 12 09:35:19 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Thu, 12 Jun 2025 09:35:19 GMT Subject: RFR: 8354727: CompilationPolicy creates too many compiler threads when code cache space is scarce [v2] In-Reply-To: References: Message-ID: > # Issue Summary > > Running > > java -XX:+SegmentedCodeCache -XX:ReservedCodeCacheSize=10M -XX:NonNMethodCodeHeapSize=6M \ > -XX:ProfiledCodeHeapSize=5M -XX:NonProfiledCodeHeapSize=5M -version > > on a machine with more than 255 cores, this would fail with the message that the specified `NonNMethodCodeHeapSize` is too small to fit all compiler buffers (instead of failing because the sum of the heaps is larger than the `ReservedCodeCacheSize`). Hence, the calculated compiler count is too high. This is due to `CompilationPolicy::initialize()` checking how many compiler buffers fit into the `ReservedCodeCacheSize`. However, in the case above, this is significantly larger than `NonNMethodCodeHeapSize` and causes a new check introduced in #17244 to fail. > > # Changes > > This PR fixes the calculation of the `CICompilerCount` ergonomic. 
Firstly, @shipilev kindly provided a fix so that the compiler buffer size used in the calculation is also correct if we only have C2. Secondly, `NonNMethodCodeHeapSize` is used as the maximum size available for compiler buffers instead of `ReservedCodeCacheSize` if it was provided as a command-line flag. > > It might be debatable if this is the correct fix, since the `NonNMethodHeap` can spill into the other heaps if it is too small. However, I am of the opinion that if the `NonNMethodCodeHeapSize` is explicitly specified, then the compiler count should be calculated accordingly. > > # Testing > > - [x] [GHA](https://github.com/mhaessig/jdk/actions/runs/15603409859) > - [x] tier1 and tier2 plus Oracle internal testing on our supported platforms > - [x] tier1 with a manually fixed core count of 288 (this reproduced the problem before the fix) Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: Fix inadvertantly removed header Co-developed-by: Damon Fenacci ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25770/files - new: https://git.openjdk.org/jdk/pull/25770/files/8d1b4f79..b62ef8ab Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25770&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25770&range=00-01 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25770.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25770/head:pull/25770 PR: https://git.openjdk.org/jdk/pull/25770 From mhaessig at openjdk.org Thu Jun 12 09:35:19 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Thu, 12 Jun 2025 09:35:19 GMT Subject: RFR: 8354727: CompilationPolicy creates too many compiler threads when code cache space is scarce [v2] In-Reply-To: References: Message-ID: On Thu, 12 Jun 2025 09:03:19 GMT, Damon Fenacci wrote: >> Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision:
>> >> Fix inadvertantly removed header >> >> Co-developed-by: Damon Fenacci > > Thanks for fixing this @mhaessig. > I guess the issue cannot happen if `NonNMethodCodeHeapSize` is not given as a flag as it will be dynamically adapted to the number of compiler threads created, right? Thank you for having a look @dafedafe! > I guess the issue cannot happen if NonNMethodCodeHeapSize is not given as a flag as it will be dynamically adapted to the number of compiler threads created, right? That is right. In this case, the heuristics in `CodeCache::initialize_heaps()` take over. > src/hotspot/share/compiler/compilationPolicy.cpp line 40: > >> 38: #include "runtime/deoptimization.hpp" >> 39: #include "runtime/flags/debug_globals.hpp" >> 40: #include "runtime/frame.hpp" > > Was this removed on purpose? It isn't used directly but doesn't seem to be related to your change... Thank you for pointing this out. No, this is a blunder on my part. I fixed it in b62ef8a. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25770#issuecomment-2965861219 PR Review Comment: https://git.openjdk.org/jdk/pull/25770#discussion_r2142185964 From epeter at openjdk.org Thu Jun 12 10:28:14 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 12 Jun 2025 10:28:14 GMT Subject: RFR: 8358772: Template-Framework Library: Primitive Types Message-ID: I would like to add primitive type support to the template framework library. In follow-up work, we will use these types in random expression generation - but they can also already be useful on their own now. I encountered an issue with some methods that return `Token` from the `TemplateFramework`, such as `Hook.insert` and `addDataName`. Since `Token` was package private, this class could not be used in some places where automatic type inference is required. I now refactored the code, so that the `Token` is just an empty interface, and all the methods are moved to a separate class `TokenParser`. 
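The refactoring described above (an empty `Token` interface, with the logic moved to a separate `TokenParser`) can be illustrated roughly as follows. This is only a sketch under assumptions: `SketchToken`, `StringToken`, `IntegerToken` and `SketchTokenParser` are hypothetical names chosen for illustration and are not the classes from the pull request.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; this is not the framework's real code.

// The token type is a deliberately empty marker interface: it carries no
// methods, so exposing it does not expose any parsing behavior.
interface SketchToken {}

record StringToken(String value) implements SketchToken {}
record IntegerToken(int value) implements SketchToken {}

// All conversion logic lives in a separate helper class.
final class SketchTokenParser {
    private SketchTokenParser() {}

    // Turns a mixed list of objects into tokens; existing tokens pass through.
    static List<SketchToken> parse(Object... parts) {
        List<SketchToken> tokens = new ArrayList<>();
        for (Object part : parts) {
            if (part instanceof SketchToken t) {
                tokens.add(t);
            } else if (part instanceof String s) {
                tokens.add(new StringToken(s));
            } else if (part instanceof Integer i) {
                tokens.add(new IntegerToken(i));
            } else {
                throw new IllegalArgumentException("Unsupported part: " + part);
            }
        }
        return tokens;
    }
}
```

With this split, callers only ever see the marker type, while the parser can change without touching it.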
Original experiments from here: https://github.com/openjdk/jdk/pull/23418 ------------- Commit messages: - parser refactor - rm previous changes - rm unnecessary file - documentation - improve test - wip test - size and boxing - update tests - add test, does not compile now - types - ... and 1 more: https://git.openjdk.org/jdk/compare/248341d3...08b9f674 Changes: https://git.openjdk.org/jdk/pull/25672/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25672&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8358772 Stats: 633 lines in 7 files changed: 593 ins; 35 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/25672.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25672/head:pull/25672 PR: https://git.openjdk.org/jdk/pull/25672 From dnsimon at openjdk.org Thu Jun 12 10:58:32 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 12 Jun 2025 10:58:32 GMT Subject: RFR: 8359064: Expose reason for marking nmethod non-entrant to JVMCI client [v3] In-Reply-To: <2ATJAQCD4VUVZ02RO2czrh4D40N0y4TR7XDRAhQ-PWE=.1bc8747d-0e2d-435c-a647-a8c718f6fcdc@github.com> References: <2ATJAQCD4VUVZ02RO2czrh4D40N0y4TR7XDRAhQ-PWE=.1bc8747d-0e2d-435c-a647-a8c718f6fcdc@github.com> Message-ID: <5SD6o4fpHcmxJnegKO-tp8H3N86_2DHJy0uvA5URD_o=.7f6a3999-cda0-497a-8dcf-a2548e46fada@github.com> On Thu, 12 Jun 2025 00:27:55 GMT, Cesar Soares Lucas wrote: >> We recently introduced a way to set the reason why a nmethod was being marked as `not entrant`, see [here](https://github.com/openjdk/jdk/pull/23980) and [here](https://github.com/openjdk/jdk/pull/25338). >> >> This PR is to expose in the JVMCI interface the reason why the nmethod was flagged as `not entrant`. This will allow JVMCI-based compilers to implement heuristics to handle re-compilations differently based on what happened to earlier versions of a method, for instance, this will likely be used to address this [RFE in Truffle](https://github.com/oracle/graal/issues/11045 >> ). 
>> >> Tested on Linux x86_64, ARM with JTREG tier 1-3. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Refactoring: move changes to HotSpotNmethod class; remove enum on Java side; rename field to invalidationReason src/hotspot/share/jvmci/jvmciEnv.cpp line 1747: > 1745: } > 1746: set_InstalledCode_address(installed_code, (jlong) cb); > 1747: set_HotSpotNmethod_invalidationReason(installed_code, static_cast(nmethod::ChangeReason::unknown)); Remove this - just let the `HotSpotNmethod` constructor do the right thing. src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotNmethod.java line 81: > 79: > 80: /** > 81: * Identify the reason that caused this nmethod to be invalidated. Update doc to note that the field will be `-1` until the nmethod is invalidated. src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotNmethod.java line 91: > 89: boolean inOopsTable = !IS_IN_NATIVE_IMAGE && !isDefault; > 90: this.compileIdSnapshot = inOopsTable ? 0L : compileId; > 91: this.invalidationReason = 0; It should be initialized to `-1` as we will never install invalid code. 
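A minimal sketch of the sentinel convention requested in these comments, assuming a hypothetical class `InstalledCodeSketch` (this is not `HotSpotNmethod`, and in the real code the VM, not Java code, would update the field):

```java
// Hypothetical illustration only; names and methods are assumptions.
final class InstalledCodeSketch {
    // -1 means "not invalidated yet", since invalid code is never installed.
    static final int NOT_INVALIDATED = -1;

    private int invalidationReason = NOT_INVALIDATED;

    // Stands in for the update performed when the code is marked non-entrant.
    void invalidate(int reason) {
        if (reason < 0) {
            throw new IllegalArgumentException("reason must be non-negative");
        }
        invalidationReason = reason;
    }

    boolean isInvalidated() {
        return invalidationReason != NOT_INVALIDATED;
    }

    int getInvalidationReason() {
        return invalidationReason;
    }
}
```

Starting at `-1` rather than `0` keeps "no reason recorded yet" distinct from any reason value that the VM might later store.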
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25706#discussion_r2142352178 PR Review Comment: https://git.openjdk.org/jdk/pull/25706#discussion_r2142348968 PR Review Comment: https://git.openjdk.org/jdk/pull/25706#discussion_r2142347083 From dnsimon at openjdk.org Thu Jun 12 11:03:32 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 12 Jun 2025 11:03:32 GMT Subject: RFR: 8359064: Expose reason for marking nmethod non-entrant to JVMCI client [v3] In-Reply-To: <2ATJAQCD4VUVZ02RO2czrh4D40N0y4TR7XDRAhQ-PWE=.1bc8747d-0e2d-435c-a647-a8c718f6fcdc@github.com> References: <2ATJAQCD4VUVZ02RO2czrh4D40N0y4TR7XDRAhQ-PWE=.1bc8747d-0e2d-435c-a647-a8c718f6fcdc@github.com> Message-ID: On Thu, 12 Jun 2025 00:27:55 GMT, Cesar Soares Lucas wrote: >> We recently introduced a way to set the reason why a nmethod was being marked as `not entrant`, see [here](https://github.com/openjdk/jdk/pull/23980) and [here](https://github.com/openjdk/jdk/pull/25338). >> >> This PR is to expose in the JVMCI interface the reason why the nmethod was flagged as `not entrant`. This will allow JVMCI-based compilers to implement heuristics to handle re-compilations differently based on what happened to earlier versions of a method, for instance, this will likely be used to address this [RFE in Truffle](https://github.com/oracle/graal/issues/11045 >> ). >> >> Tested on Linux x86_64, ARM with JTREG tier 1-3. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Refactoring: move changes to HotSpotNmethod class; remove enum on Java side; rename field to invalidationReason src/hotspot/share/code/nmethod.hpp line 476: > 474: // If you change anything in this enum please patch > 475: // vmStructs_jvmci.cpp accordingly. > 476: enum class ChangeReason : u1 { I would also suggest this enum be changed to `InvalidationReason` but that can be considered as a potential follow up. 
src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotNmethod.java line 213: > 211: * @return a String describing the reason why this nmethod was invalidated. > 212: */ > 213: public String getInvalidationReasonString() { I would have stuck with getInvalidationReasonDescription but can live with getInvalidationReasonString. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25706#discussion_r2142359399 PR Review Comment: https://git.openjdk.org/jdk/pull/25706#discussion_r2142360949 From mhaessig at openjdk.org Thu Jun 12 11:04:29 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Thu, 12 Jun 2025 11:04:29 GMT Subject: RFR: 8358772: Template-Framework Library: Primitive Types In-Reply-To: References: Message-ID: <4o25kzqQsbiZGMI45xGn7QZtUwhBWPXQC6Z6Fbwq0IQ=.d510b35e-a28d-4ad4-8a40-3a7b298f34d6@github.com> On Fri, 6 Jun 2025 13:25:48 GMT, Emanuel Peter wrote: > I would like to add primitive type support to the template framework library. > > In follow-up work, we will use these types in random expression generation - but they can also already be useful on their own now. > > I encountered an issue with some methods that return `Token` from the `TemplateFramework`, such as `Hook.insert` and `addDataName`. Since `Token` was package private, this class could not be used in some places where automatic type inference is required. I now refactored the code, so that the `Token` is just an empty interface, and all the methods are moved to a separate class `TokenParser`. > > Original experiments from here: https://github.com/openjdk/jdk/pull/23418 Another great addition to the Template Framework! Thank you for your continued effort, @eme64. The broad strokes look good to me, but I have some remarks about a few details. test/hotspot/jtreg/compiler/lib/template_framework/library/CodeGenerationDataNameType.java line 35: > 33: * additional functionality for code generation. 
These types with their extended > 34: * functionality can be used with many other code generation facilities in the > 35: * lbrary, such as generating random {@code Expression}s. Suggestion: * library, such as generating random {@code Expression}s. Typo nit test/hotspot/jtreg/compiler/lib/template_framework/library/PrimitiveType.java line 71: > 69: @Override > 70: public boolean isSubtypeOf(DataName.Type other) { > 71: return (other instanceof PrimitiveType pt) && pt.kind == kind; Perhaps it would be useful to implement the primitive type subtyping rules from [JLS §4.10.1](https://docs.oracle.com/javase/specs/jls/se24/html/jls-4.html#jls-4.10.1). I can imagine that it might help generate more diverse programs with random variables of random primitive types. That might help with fuzzing IGVN optimizations on ranges? test/hotspot/jtreg/compiler/lib/template_framework/library/PrimitiveType.java line 148: > 146: * @return true iff the type is a floating type. > 147: */ > 148: public boolean isFloating() { Suggestion: /** * Indicates if the type is a floating point type. * * @return true iff the type is a floating point type. */ public boolean isFloating() { Feel free to ignore: I would have called this `isFp()` or something like that, because my brain associates "floating" by itself as something that is not pinned in place. test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestPrimitiveTypes.java line 73: > 71: > 72: // p.xyz.InnerTest.main(); > 73: comp.invoke("p.xyz.InnerTest", "main", new Object[] {}); Personally, I would remove the comments here, since this is not a tutorial about the compile framework and the code is self-explanatory (the methods match the comments pretty well). test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestPrimitiveTypes.java line 76: > 74: } > 75: > 76: // Generate a source Java file as String Suggestion: // Generate a Java source file as String This way it's consistent with the comments above.
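For reference, the subtyping rules in JLS §4.10.1 mentioned above could be sketched like this. The class `PrimitiveSubtypingSketch` and its methods are hypothetical and are not part of the pull request; only the direct-supertype chain itself is taken from the JLS.

```java
// Hypothetical sketch, not the PrimitiveType class from the PR.
final class PrimitiveSubtypingSketch {
    enum Kind { BYTE, SHORT, CHAR, INT, LONG, FLOAT, DOUBLE, BOOLEAN }

    private PrimitiveSubtypingSketch() {}

    // Direct supertype per JLS 4.10.1; null means "no direct supertype".
    static Kind directSupertype(Kind kind) {
        return switch (kind) {
            case BYTE -> Kind.SHORT;
            case SHORT, CHAR -> Kind.INT;
            case INT -> Kind.LONG;
            case LONG -> Kind.FLOAT;
            case FLOAT -> Kind.DOUBLE;
            case DOUBLE, BOOLEAN -> null;
        };
    }

    // Subtyping is the reflexive, transitive closure of directSupertype.
    static boolean isSubtypeOf(Kind sub, Kind sup) {
        for (Kind k = sub; k != null; k = directSupertype(k)) {
            if (k == sup) {
                return true;
            }
        }
        return false;
    }
}
```

Under these rules `byte` is a subtype of `double`, while `char` and `short` are unrelated to each other and `boolean` is related only to itself.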
------------- Changes requested by mhaessig (Author). PR Review: https://git.openjdk.org/jdk/pull/25672#pullrequestreview-2920480403 PR Review Comment: https://git.openjdk.org/jdk/pull/25672#discussion_r2142310785 PR Review Comment: https://git.openjdk.org/jdk/pull/25672#discussion_r2142329195 PR Review Comment: https://git.openjdk.org/jdk/pull/25672#discussion_r2142341594 PR Review Comment: https://git.openjdk.org/jdk/pull/25672#discussion_r2142358396 PR Review Comment: https://git.openjdk.org/jdk/pull/25672#discussion_r2142345363 From chagedorn at openjdk.org Thu Jun 12 11:04:36 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 12 Jun 2025 11:04:36 GMT Subject: RFR: 8358772: Template-Framework Library: Primitive Types In-Reply-To: References: Message-ID: <20w4l8kt-plllhS4vj7CJE2v2SDnUH7LFCgfGelKhcQ=.927816ee-0001-43bd-a5b2-da32d4c5968f@github.com> On Fri, 6 Jun 2025 13:25:48 GMT, Emanuel Peter wrote: > I would like to add primitive type support to the template framework library. > > In follow-up work, we will use these types in random expression generation - but they can also already be useful on their own now. > > I encountered an issue with some methods that return `Token` from the `TemplateFramework`, such as `Hook.insert` and `addDataName`. Since `Token` was package private, this class could not be used in some places where automatic type inference is required. I now refactored the code, so that the `Token` is just an empty interface, and all the methods are moved to a separate class `TokenParser`. > > Original experiments from here: https://github.com/openjdk/jdk/pull/23418 Nice additions to the library! Lot of minor comments, otherwise, looks good. test/hotspot/jtreg/compiler/lib/template_framework/Token.java line 29: > 27: import java.util.ArrayList; > 28: import java.util.List; > 29: The imports can now be removed (cannot make a suggestion, it's hidden). 
test/hotspot/jtreg/compiler/lib/template_framework/library/CodeGenerationDataNameType.java line 35: > 33: * additional functionality for code generation. These types with their extended > 34: * functionality can be used with many other code generation facilities in the > 35: * lbrary, such as generating random {@code Expression}s. Suggestion: * library, such as generating random {@code Expression}s. test/hotspot/jtreg/compiler/lib/template_framework/library/CodeGenerationDataNameType.java line 45: > 43: * @return A random constant value. > 44: */ > 45: public Object con(); You can remove all the `public` modifiers since the methods are implicitly public. test/hotspot/jtreg/compiler/lib/template_framework/library/CodeGenerationDataNameType.java line 106: > 104: * List of all {@link PrimitiveType}s. > 105: */ > 106: public static final List PRIMITIVE_TYPES = List.of( Same with the constants: They are implicitly public static final. test/hotspot/jtreg/compiler/lib/template_framework/library/PrimitiveType.java line 52: > 50: private static enum Kind { BYTE, SHORT, CHAR, INT, LONG, FLOAT, DOUBLE, BOOLEAN }; > 51: > 52: // We have one static instance each, so we do not have duplicat instances. Suggestion: // We have one static instance each, so we do not have duplicated instances.
test/hotspot/jtreg/compiler/lib/template_framework/library/PrimitiveType.java line 76: > 74: @Override > 75: public String name() { > 76: return switch(kind) { Suggestion: return switch (kind) { test/hotspot/jtreg/compiler/lib/template_framework/library/PrimitiveType.java line 94: > 92: > 93: public Object con() { > 94: return switch(kind) { Suggestion: return switch (kind) { test/hotspot/jtreg/compiler/lib/template_framework/library/PrimitiveType.java line 102: > 100: case FLOAT -> GEN_FLOAT.next(); > 101: case DOUBLE -> GEN_DOUBLE.next(); > 102: case BOOLEAN -> RANDOM.nextInt() % 2 == 0; You can use: Suggestion: case BOOLEAN -> RANDOM.nextBoolean(); test/hotspot/jtreg/compiler/lib/template_framework/library/PrimitiveType.java line 113: > 111: */ > 112: public int byteSize() { > 113: return switch(kind) { Suggestion: return switch (kind) { test/hotspot/jtreg/compiler/lib/template_framework/library/PrimitiveType.java line 120: > 118: case LONG -> 8; > 119: case FLOAT -> 4; > 120: case DOUBLE -> 8; Up to you if you want to merge cases like that for more compactness: Suggestion: case BYTE -> 1; case SHORT, CHAR -> 2; case INT, FLOAT -> 4; case LONG, DOUBLE -> 8; test/hotspot/jtreg/compiler/lib/template_framework/library/PrimitiveType.java line 131: > 129: */ > 130: public String boxedTypeName() { > 131: return switch(kind) { Suggestion: return switch (kind) { test/hotspot/jtreg/compiler/lib/template_framework/library/PrimitiveType.java line 149: > 147: */ > 148: public boolean isFloating() { > 149: return switch(kind) { Suggestion: return switch (kind) { test/hotspot/jtreg/compiler/lib/template_framework/library/PrimitiveType.java line 157: > 155: case FLOAT -> true; > 156: case DOUBLE -> true; > 157: case BOOLEAN -> false; Same here, up to you: Suggestion: case BYTE, SHORT, CHAR, INT, LONG, BOOLEAN -> false; case FLOAT, DOUBLE -> true; test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestPrimitiveTypes.java line 42: > 40: > 41: import 
compiler.lib.compile_framework.*; > 42: import compiler.lib.template_framework.DataName; Unused: Suggestion: test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestPrimitiveTypes.java line 50: > 48: import static compiler.lib.template_framework.Template.$; > 49: import static compiler.lib.template_framework.Template.addDataName; > 50: import static compiler.lib.template_framework.Template.dataNames; Unused: Suggestion: test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestPrimitiveTypes.java line 79: > 77: public static String generate() { > 78: // Generate a list of test methods. > 79: Map tests = new HashMap<>(); Suggestion: Map tests = new HashMap<>(); test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestPrimitiveTypes.java line 166: > 164: CodeGenerationDataNameType.PRIMITIVE_TYPES.stream().map(type -> > 165: sampleTemplate.asToken(type) > 166: ).toList() Can be simplified to: Suggestion: CodeGenerationDataNameType.PRIMITIVE_TYPES.stream().map(sampleTemplate::asToken).toList() ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25672#pullrequestreview-2920519548 PR Review Comment: https://git.openjdk.org/jdk/pull/25672#discussion_r2142332525 PR Review Comment: https://git.openjdk.org/jdk/pull/25672#discussion_r2142333995 PR Review Comment: https://git.openjdk.org/jdk/pull/25672#discussion_r2142335015 PR Review Comment: https://git.openjdk.org/jdk/pull/25672#discussion_r2142336679 PR Review Comment: https://git.openjdk.org/jdk/pull/25672#discussion_r2142341330 PR Review Comment: https://git.openjdk.org/jdk/pull/25672#discussion_r2142344920 PR Review Comment: https://git.openjdk.org/jdk/pull/25672#discussion_r2142345127 PR Review Comment: https://git.openjdk.org/jdk/pull/25672#discussion_r2142343704 PR Review Comment: https://git.openjdk.org/jdk/pull/25672#discussion_r2142345326 PR Review Comment: https://git.openjdk.org/jdk/pull/25672#discussion_r2142347722 PR Review Comment: https://git.openjdk.org/jdk/pull/25672#discussion_r2142349338 PR Review Comment: https://git.openjdk.org/jdk/pull/25672#discussion_r2142350538 PR Review Comment: https://git.openjdk.org/jdk/pull/25672#discussion_r2142351955 PR Review Comment: https://git.openjdk.org/jdk/pull/25672#discussion_r2142353227 PR Review Comment: https://git.openjdk.org/jdk/pull/25672#discussion_r2142353655 PR Review Comment: https://git.openjdk.org/jdk/pull/25672#discussion_r2142356649 PR Review Comment: https://git.openjdk.org/jdk/pull/25672#discussion_r2142359970 From chagedorn at openjdk.org Thu Jun 12 11:05:28 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 12 Jun 2025 11:05:28 GMT Subject: [jdk25] RFR: 8357982: Fix several failing BMI tests with -XX:+UseAPX In-Reply-To: References: Message-ID: On Thu, 12 Jun 2025 06:44:25 GMT, Tobias Hartmann wrote: > Hi all, > > This pull request contains a backport of commit [c98dffa1](https://github.com/openjdk/jdk/commit/c98dffa186d48c41e76fd3a60e0129a8da60310f) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. 
> > The commit being backported was authored by Jatin Bhateja on 11 Jun 2025 and was reviewed by Emanuel Peter and Sandhya Viswanathan. > > Thanks! Looks good! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25769#pullrequestreview-2920571340 From thartmann at openjdk.org Thu Jun 12 11:14:34 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 12 Jun 2025 11:14:34 GMT Subject: [jdk25] RFR: 8357982: Fix several failing BMI tests with -XX:+UseAPX In-Reply-To: References: Message-ID: On Thu, 12 Jun 2025 06:44:25 GMT, Tobias Hartmann wrote: > Hi all, > > This pull request contains a backport of commit [c98dffa1](https://github.com/openjdk/jdk/commit/c98dffa186d48c41e76fd3a60e0129a8da60310f) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Jatin Bhateja on 11 Jun 2025 and was reviewed by Emanuel Peter and Sandhya Viswanathan. > > Thanks! Thanks Christian! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25769#issuecomment-2966200598 From thartmann at openjdk.org Thu Jun 12 11:14:35 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 12 Jun 2025 11:14:35 GMT Subject: [jdk25] Integrated: 8357982: Fix several failing BMI tests with -XX:+UseAPX In-Reply-To: References: Message-ID: On Thu, 12 Jun 2025 06:44:25 GMT, Tobias Hartmann wrote: > Hi all, > > This pull request contains a backport of commit [c98dffa1](https://github.com/openjdk/jdk/commit/c98dffa186d48c41e76fd3a60e0129a8da60310f) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Jatin Bhateja on 11 Jun 2025 and was reviewed by Emanuel Peter and Sandhya Viswanathan. > > Thanks! This pull request has now been integrated. 
Changeset: 839a91e1 Author: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/839a91e14b3d11f0baddcaff5eb98c1ebccd44f1 Stats: 122 lines in 9 files changed: 108 ins; 0 del; 14 mod 8357982: Fix several failing BMI tests with -XX:+UseAPX Reviewed-by: chagedorn Backport-of: c98dffa186d48c41e76fd3a60e0129a8da60310f ------------- PR: https://git.openjdk.org/jdk/pull/25769 From mhaessig at openjdk.org Thu Jun 12 11:16:12 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Thu, 12 Jun 2025 11:16:12 GMT Subject: RFR: 8354196: C2: reorder and capitalize phase definition Message-ID: This PR performs some cleanup and formatting around the phase definitions in C2: - the phase descriptions are capitalized according to the [MLA Handbook title case rules](https://en.wikipedia.org/wiki/Title_case#Modern_Language_Association_(MLA)_Handbook), - the phases are reordered to be more or less in the order of execution or occurrence in the code, - the definitions in `phasetype.hpp` and `CompilePhase.java` are synced, - `CompilePhase.java` is aligned for better readability. 
This change was tested on: - [x] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15605662671) - [x] tier1 plus some Oracle internal testing ------------- Commit messages: - Align CompilePhase.java - Sort phases approximately - Synchronize phase definitions - Capitalize phase description in MLA title case Changes: https://git.openjdk.org/jdk/pull/25778/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25778&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8354196 Stats: 149 lines in 2 files changed: 28 ins; 27 del; 94 mod Patch: https://git.openjdk.org/jdk/pull/25778.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25778/head:pull/25778 PR: https://git.openjdk.org/jdk/pull/25778 From epeter at openjdk.org Thu Jun 12 11:19:24 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 12 Jun 2025 11:19:24 GMT Subject: RFR: 8358772: Template-Framework Library: Primitive Types [v2] In-Reply-To: References: Message-ID: <7QK74MXw3WwKvoggzmg5hhFTzk6SgN0eA_DtYpGgeFs=.bc639a9e-b5b0-4581-91bb-5563c616f128@github.com> > I would like to add primitive type support to the template framework library. > > In follow-up work, we will use these types in random expression generation - but they can also already be useful on their own now. > > I encountered an issue with some methods that return `Token` from the `TemplateFramework`, such as `Hook.insert` and `addDataName`. Since `Token` was package private, this class could not be used in some places where automatic type inference is required. I now refactored the code, so that the `Token` is just an empty interface, and all the methods are moved to a separate class `TokenParser`. 
> > Original experiments from here: https://github.com/openjdk/jdk/pull/23418 Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: - Apply suggestions from code review Co-authored-by: Christian Hagedorn Co-authored-by: Manuel Hässig - Apply suggestions from code review Co-authored-by: Christian Hagedorn Co-authored-by: Manuel Hässig ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25672/files - new: https://git.openjdk.org/jdk/pull/25672/files/08b9f674..4c8d6312 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25672&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25672&range=00-01 Stats: 31 lines in 3 files changed: 0 ins; 13 del; 18 mod Patch: https://git.openjdk.org/jdk/pull/25672.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25672/head:pull/25672 PR: https://git.openjdk.org/jdk/pull/25672 From epeter at openjdk.org Thu Jun 12 11:19:26 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 12 Jun 2025 11:19:26 GMT Subject: RFR: 8358772: Template-Framework Library: Primitive Types [v2] In-Reply-To: <20w4l8kt-plllhS4vj7CJE2v2SDnUH7LFCgfGelKhcQ=.927816ee-0001-43bd-a5b2-da32d4c5968f@github.com> References: <20w4l8kt-plllhS4vj7CJE2v2SDnUH7LFCgfGelKhcQ=.927816ee-0001-43bd-a5b2-da32d4c5968f@github.com> Message-ID: <8fpQgh_zfbwwCtCljkzPqCuYKUDz0lfw9ufEqICFW7g=.24cd4822-6b6d-4a63-b0b8-a5ca1ae7a188@github.com> On Thu, 12 Jun 2025 10:46:25 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: >> >> - Apply suggestions from code review >> >> Co-authored-by: Christian Hagedorn >> Co-authored-by: Manuel Hässig >> - Apply suggestions from code review >> >> Co-authored-by: Christian Hagedorn >> Co-authored-by: Manuel Hässig > > test/hotspot/jtreg/compiler/lib/template_framework/Token.java line 29: > >> 27: import java.util.ArrayList; >> 28: import java.util.List; >> 29: > > The 
imports can now be removed (cannot make a suggestion, it's hidden). done > test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestPrimitiveTypes.java line 50: > >> 48: import static compiler.lib.template_framework.Template.$; >> 49: import static compiler.lib.template_framework.Template.addDataName; >> 50: import static compiler.lib.template_framework.Template.dataNames; > > Unused: > Suggestion: it is a duplicate actually! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25672#discussion_r2142384146 PR Review Comment: https://git.openjdk.org/jdk/pull/25672#discussion_r2142392171 From epeter at openjdk.org Thu Jun 12 11:23:54 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 12 Jun 2025 11:23:54 GMT Subject: RFR: 8358772: Template-Framework Library: Primitive Types [v3] In-Reply-To: <20w4l8kt-plllhS4vj7CJE2v2SDnUH7LFCgfGelKhcQ=.927816ee-0001-43bd-a5b2-da32d4c5968f@github.com> References: <20w4l8kt-plllhS4vj7CJE2v2SDnUH7LFCgfGelKhcQ=.927816ee-0001-43bd-a5b2-da32d4c5968f@github.com> Message-ID: On Thu, 12 Jun 2025 10:47:56 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> review suggestions applied > > test/hotspot/jtreg/compiler/lib/template_framework/library/CodeGenerationDataNameType.java line 45: > >> 43: * @return A random constant value. >> 44: */ >> 45: public Object con(); > > You can remove all the public identifies since the methods are implicitly public. good point, done :) > test/hotspot/jtreg/compiler/lib/template_framework/library/CodeGenerationDataNameType.java line 106: > >> 104: * List of all {@link PrimitiveType}s. >> 105: */ >> 106: public static final List PRIMITIVE_TYPES = List.of( > > Same with the constants: They are implicitly public static final. done! 
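A small sketch of why the modifiers can be dropped, assuming `CodeGenerationDataNameType` is an interface as the comments above imply: interface methods without a body are implicitly `public abstract`, and interface fields are implicitly `public static final`. The interface `TypeLike` below is a hypothetical stand-in, not the library type; reflection is only used to make the implicit modifiers visible.

```java
import java.lang.reflect.Modifier;
import java.util.List;

public class InterfaceModifiersSketch {
    // Hypothetical stand-in for an interface like CodeGenerationDataNameType.
    interface TypeLike {
        // No modifiers written: implicitly public static final.
        List<String> NAMES = List.of("int", "long");

        // No modifiers written: implicitly public abstract.
        Object con();
    }

    static boolean constantIsPublicStaticFinal() {
        try {
            int m = TypeLike.class.getDeclaredField("NAMES").getModifiers();
            return Modifier.isPublic(m) && Modifier.isStatic(m) && Modifier.isFinal(m);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    static boolean methodIsPublicAbstract() {
        try {
            int m = TypeLike.class.getDeclaredMethod("con").getModifiers();
            return Modifier.isPublic(m) && Modifier.isAbstract(m);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(constantIsPublicStaticFinal());
        System.out.println(methodIsPublicAbstract());
    }
}
```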
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25672#discussion_r2142406305 PR Review Comment: https://git.openjdk.org/jdk/pull/25672#discussion_r2142406768 From epeter at openjdk.org Thu Jun 12 11:23:55 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 12 Jun 2025 11:23:55 GMT Subject: RFR: 8358772: Template-Framework Library: Primitive Types [v3] In-Reply-To: <4o25kzqQsbiZGMI45xGn7QZtUwhBWPXQC6Z6Fbwq0IQ=.d510b35e-a28d-4ad4-8a40-3a7b298f34d6@github.com> References: <4o25kzqQsbiZGMI45xGn7QZtUwhBWPXQC6Z6Fbwq0IQ=.d510b35e-a28d-4ad4-8a40-3a7b298f34d6@github.com> Message-ID: On Thu, 12 Jun 2025 10:58:50 GMT, Manuel Hässig wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> review suggestions applied > > test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestPrimitiveTypes.java line 73: > >> 71: >> 72: // p.xyz.InnerTest.main(); >> 73: comp.invoke("p.xyz.InnerTest", "main", new Object[] {}); > > Personally, I would remove the comments here, since this is not a tutorial about the compile framework and the code is self-explanatory (the methods match the comments pretty well). It is still an example, so I'll leave it. It doesn't hurt. But I may drop them in the future :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25672#discussion_r2142411181 From epeter at openjdk.org Thu Jun 12 11:23:54 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 12 Jun 2025 11:23:54 GMT Subject: RFR: 8358772: Template-Framework Library: Primitive Types [v3] In-Reply-To: References: Message-ID: > I would like to add primitive type support to the template framework library. > > In follow-up work, we will use these types in random expression generation - but they can also already be useful on their own now. > > I encountered an issue with some methods that return `Token` from the `TemplateFramework`, such as `Hook.insert` and `addDataName`. 
Since `Token` was package private, this class could not be used in some places where automatic type inference is required. I now refactored the code, so that the `Token` is just an empty interface, and all the methods are moved to a separate class `TokenParser`. > > Original experiments from here: https://github.com/openjdk/jdk/pull/23418 Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: review suggestions applied ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25672/files - new: https://git.openjdk.org/jdk/pull/25672/files/4c8d6312..a1c84e80 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25672&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25672&range=01-02 Stats: 18 lines in 2 files changed: 0 ins; 4 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/25672.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25672/head:pull/25672 PR: https://git.openjdk.org/jdk/pull/25672 From chagedorn at openjdk.org Thu Jun 12 11:33:29 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 12 Jun 2025 11:33:29 GMT Subject: RFR: 8358772: Template-Framework Library: Primitive Types [v3] In-Reply-To: References: Message-ID: On Thu, 12 Jun 2025 11:23:54 GMT, Emanuel Peter wrote: >> I would like to add primitive type support to the template framework library. >> >> In follow-up work, we will use these types in random expression generation - but they can also already be useful on their own now. >> >> I encountered an issue with some methods that return `Token` from the `TemplateFramework`, such as `Hook.insert` and `addDataName`. Since `Token` was package private, this class could not be used in some places where automatic type inference is required. I now refactored the code, so that the `Token` is just an empty interface, and all the methods are moved to a separate class `TokenParser`. 
>> >> Original experiments from here: https://github.com/openjdk/jdk/pull/23418 > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > review suggestions applied Marked as reviewed by chagedorn (Reviewer). test/hotspot/jtreg/compiler/lib/template_framework/library/CodeGenerationDataNameType.java line 106: > 104: * List of all {@link PrimitiveType}s. > 105: */ > 106: static final List PRIMITIVE_TYPES = List.of( `static final` can also be removed for constants :-) Otherwise, looks good, thanks for the update! ------------- PR Review: https://git.openjdk.org/jdk/pull/25672#pullrequestreview-2920689284 PR Review Comment: https://git.openjdk.org/jdk/pull/25672#discussion_r2142444561 From epeter at openjdk.org Thu Jun 12 11:33:30 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 12 Jun 2025 11:33:30 GMT Subject: RFR: 8358772: Template-Framework Library: Primitive Types [v3] In-Reply-To: <20w4l8kt-plllhS4vj7CJE2v2SDnUH7LFCgfGelKhcQ=.927816ee-0001-43bd-a5b2-da32d4c5968f@github.com> References: <20w4l8kt-plllhS4vj7CJE2v2SDnUH7LFCgfGelKhcQ=.927816ee-0001-43bd-a5b2-da32d4c5968f@github.com> Message-ID: On Thu, 12 Jun 2025 11:01:22 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> review suggestions applied > > Nice additions to the library! Lot of minor comments, otherwise, looks good. @chhagedorn @mhaessig Thanks for the extremely quick reviews and the good suggestions. 
I addressed them all :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/25672#issuecomment-2966258912 From epeter at openjdk.org Thu Jun 12 11:33:31 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 12 Jun 2025 11:33:31 GMT Subject: RFR: 8358772: Template-Framework Library: Primitive Types [v3] In-Reply-To: <4o25kzqQsbiZGMI45xGn7QZtUwhBWPXQC6Z6Fbwq0IQ=.d510b35e-a28d-4ad4-8a40-3a7b298f34d6@github.com> References: <4o25kzqQsbiZGMI45xGn7QZtUwhBWPXQC6Z6Fbwq0IQ=.d510b35e-a28d-4ad4-8a40-3a7b298f34d6@github.com> Message-ID: On Thu, 12 Jun 2025 10:44:32 GMT, Manuel Hässig wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> review suggestions applied > > test/hotspot/jtreg/compiler/lib/template_framework/library/PrimitiveType.java line 71: > >> 69: @Override >> 70: public boolean isSubtypeOf(DataName.Type other) { >> 71: return (other instanceof PrimitiveType pt) && pt.kind == kind; > > Perhaps it would be useful to implement the primitive type subtyping rules from [JLS §4.10.1](https://docs.oracle.com/javase/specs/jls/se24/html/jls-4.html#jls-4.10.1). I can imagine that it might help generating more diverse programs with random variables of random primitive types. That might help with fuzzing IGVN optimizations on ranges? Very nice idea :) I'll file a separate RFE for this. It is not a prime feature I need now, but it would be a good extension. The user can still get exact behavior with `exactOf`, but also subtype behavior with `subtypeOf`. Nice! 
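To make the suggestion concrete, here is a rough sketch of what the JLS §4.10.1 rules could look like as a standalone check. It is not the `PrimitiveType` implementation from this PR: the `Kind` enum and both helper methods are hypothetical names chosen for illustration. The direct supertype relation in the JLS is `byte` to `short`, `short` and `char` to `int`, `int` to `long`, `long` to `float`, `float` to `double`; subtyping is its reflexive and transitive closure, and `boolean` is related only to itself.

```java
public class PrimitiveSubtypingSketch {
    // Hypothetical enum of primitive kinds; not the library's type.
    enum Kind { BYTE, SHORT, CHAR, INT, LONG, FLOAT, DOUBLE, BOOLEAN }

    // Direct supertype according to JLS 4.10.1, or null if there is none.
    static Kind directSupertype(Kind k) {
        return switch (k) {
            case BYTE -> Kind.SHORT;
            case SHORT, CHAR -> Kind.INT;
            case INT -> Kind.LONG;
            case LONG -> Kind.FLOAT;
            case FLOAT -> Kind.DOUBLE;
            case DOUBLE, BOOLEAN -> null;
        };
    }

    // Reflexive, transitive closure: walk up the supertype chain from sub.
    static boolean isSubtypeOf(Kind sub, Kind sup) {
        for (Kind k = sub; k != null; k = directSupertype(k)) {
            if (k == sup) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isSubtypeOf(Kind.BYTE, Kind.DOUBLE)); // byte reaches double via the chain
        System.out.println(isSubtypeOf(Kind.CHAR, Kind.SHORT));  // char and short are unrelated
    }
}
```

Because every kind has at most one direct supertype, a simple walk up the chain is enough; no graph search is needed.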
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25672#discussion_r2142433861 From mchevalier at openjdk.org Thu Jun 12 11:41:31 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 12 Jun 2025 11:41:31 GMT Subject: RFR: 8354196: C2: reorder and capitalize phase definition In-Reply-To: References: Message-ID: <9C_F58p4MNaffAvV8oL9Jk7STUCH9edl_MUFfZa5u9I=.a9535c4c-9caf-4e16-974f-b89cb49e695b@github.com> On Thu, 12 Jun 2025 11:10:38 GMT, Manuel Hässig wrote: > This PR performs some cleanup and formatting around the phase definitions in C2: > - the phase descriptions are capitalized according to the [MLA Handbook title case rules](https://en.wikipedia.org/wiki/Title_case#Modern_Language_Association_(MLA)_Handbook), > - the phases are reordered to be more or less in the order of execution or occurrence in the code, > - the definitions in `phasetype.hpp` and `CompilePhase.java` are synced, > - `CompilePhase.java` is aligned for better readability. > > This change was tested on: > - [x] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15605662671) > - [x] tier1 plus some Oracle internal testing More consistent, nice! Everything is still there, or seems to make sense (like I couldn't find anything lost while moving around). Not very sure of the order of phases, but it seems good. Tip for other reviewers: commit by commit is easier and clearer in this PR. ------------- Marked as reviewed by mchevalier (Committer). PR Review: https://git.openjdk.org/jdk/pull/25778#pullrequestreview-2920742632 From epeter at openjdk.org Thu Jun 12 11:42:26 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 12 Jun 2025 11:42:26 GMT Subject: RFR: 8358772: Template-Framework Library: Primitive Types [v4] In-Reply-To: References: Message-ID: > I would like to add primitive type support to the template framework library. 
> > In follow-up work, we will use these types in random expression generation - but they can also already be useful on their own now. > > I encountered an issue with some methods that return `Token` from the `TemplateFramework`, such as `Hook.insert` and `addDataName`. Since `Token` was package private, this class could not be used in some places where automatic type inference is required. I now refactored the code, so that the `Token` is just an empty interface, and all the methods are moved to a separate class `TokenParser`. > > Original experiments from here: https://github.com/openjdk/jdk/pull/23418 Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: rm static final ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25672/files - new: https://git.openjdk.org/jdk/pull/25672/files/a1c84e80..c39d4d6f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25672&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25672&range=02-03 Stats: 5 lines in 1 file changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/25672.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25672/head:pull/25672 PR: https://git.openjdk.org/jdk/pull/25672 From epeter at openjdk.org Thu Jun 12 11:42:30 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 12 Jun 2025 11:42:30 GMT Subject: RFR: 8358772: Template-Framework Library: Primitive Types [v3] In-Reply-To: References: Message-ID: On Thu, 12 Jun 2025 11:28:58 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> review suggestions applied > > test/hotspot/jtreg/compiler/lib/template_framework/library/CodeGenerationDataNameType.java line 106: > >> 104: * List of all {@link PrimitiveType}s. 
>> 105: */ >> 106: static final List PRIMITIVE_TYPES = List.of( > > `static final` can also be removed for constants :-) Otherwise, looks good, thanks for the update! Oh right! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25672#discussion_r2142470659 From epeter at openjdk.org Thu Jun 12 11:42:34 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 12 Jun 2025 11:42:34 GMT Subject: RFR: 8358772: Template-Framework Library: Primitive Types [v4] In-Reply-To: References: <4o25kzqQsbiZGMI45xGn7QZtUwhBWPXQC6Z6Fbwq0IQ=.d510b35e-a28d-4ad4-8a40-3a7b298f34d6@github.com> Message-ID: On Thu, 12 Jun 2025 11:26:31 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/lib/template_framework/library/PrimitiveType.java line 71: >> >>> 69: @Override >>> 70: public boolean isSubtypeOf(DataName.Type other) { >>> 71: return (other instanceof PrimitiveType pt) && pt.kind == kind; >> >> Perhaps it would be useful to implement the primitive type subtyping rules from [JLS §4.10.1](https://docs.oracle.com/javase/specs/jls/se24/html/jls-4.html#jls-4.10.1). I can imagine that it might help generating more diverse programs with random variables of random primitive types. That might help with fuzzing IGVN optimizations on ranges? > > Very nice idea :) > I'll file a separate RFE for this. It is not a prime feature I need now, but it would be a good extension. > The user can still get exact behavior with `exactOf`, but also subtype behavior with `subtypeOf`. Nice! 
Filed: [JDK-8359335](https://bugs.openjdk.org/browse/JDK-8359335) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25672#discussion_r2142467085 From mchevalier at openjdk.org Thu Jun 12 11:43:42 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 12 Jun 2025 11:43:42 GMT Subject: RFR: 8359121: C2: Region added by vectorizedMismatch intrinsic can survive as a dead node after IGVN In-Reply-To: <_wCd8NCYihFG3Sh7XRHPv4tYAQeMgbII-VXkDCNwoNE=.aafa7afa-49fd-48bb-aa87-0fc70a91c7bf@github.com> References: <_wCd8NCYihFG3Sh7XRHPv4tYAQeMgbII-VXkDCNwoNE=.aafa7afa-49fd-48bb-aa87-0fc70a91c7bf@github.com> Message-ID: <0RLmkluz50j7V3Qiv_lI13vLt_UuuYhVGf_JerIW96g=.9e374b92-a7b2-4240-853e-9fd9116e39bb@github.com> On Wed, 11 Jun 2025 11:35:28 GMT, Marc Chevalier wrote: > In `bool LibraryCallKit::inline_vectorizedMismatch()` the region created at: > > https://github.com/openjdk/jdk/blob/0582bd290d5a8b6344ae7ada36492cc2f33df050/src/hotspot/share/opto/library_call.cpp#L6502 > > may have only one input, and be a copy (that is no self-loop) and a single input. It is thus safe to remove. Yet, in the reproducer case, the node is short-circuited, but stays in the graph after IGVN. > > Left, after Parsing/before IGVN; right, after IGVN: > > On the left, the ? is there because the Region doesn't have a self loop, which is expected for copies. On the right, it still doesn't have a self-loop, but IGV is also complaining the Region has no successor. > > This transformation comes from `IfNode::Ideal`, that calls `IfNode::Ideal_common`, that calls `Node::remove_dead_region`, that shortcuts a trivial Region input: > https://github.com/openjdk/jdk/blob/0582bd290d5a8b6344ae7ada36492cc2f33df050/src/hotspot/share/opto/node.cpp#L1480-L1484 > > Yet, the Region node is never enqueued for IGVN, so stays in the graph. This is not nice because it gives both: > - a control node (50 Proj) has 2 successors, > - a control node (46 Region) has 0 successors. 
> > While at the end of IGVN, we could expect the graph to be cleaned up. There are couple ways of doing, each being enough by itself: > 1. explicitly record for IGVN the region node in `LibraryCallKit::inline_vectorizedMismatch`, and not hoping it would be collected by another consequence > 2. change `Node::remove_dead_region` to use `set_req_X` instead of `set_req`, so that if the Region goes dead, it will be removed. > 3. not introduce the region node in `LibraryCallKit::inline_vectorizedMismatch` if we are going to have only one path, and thus avoid the problem entirely. > > The solution 3. is really not easy and would require quite some code restructuring for simply saving removing a node. After discussing with @chhagedorn, we concluded that the solution 1. was probably the best: > - usually, functions similar to `inline_vectorizedMismatch` call `record_for_igvn` themselves, > - unlike what I assumed at first seeing `remove_dead_region`, it's not a general problem at all: I couldn't find another case without using `inline_vectorizedMismatch` where the Region is put aside, and not entirely disconnected quickly after, but the Regi... Thanks both for review! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25749#issuecomment-2966312178 From mchevalier at openjdk.org Thu Jun 12 11:43:43 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 12 Jun 2025 11:43:43 GMT Subject: Integrated: 8359121: C2: Region added by vectorizedMismatch intrinsic can survive as a dead node after IGVN In-Reply-To: <_wCd8NCYihFG3Sh7XRHPv4tYAQeMgbII-VXkDCNwoNE=.aafa7afa-49fd-48bb-aa87-0fc70a91c7bf@github.com> References: <_wCd8NCYihFG3Sh7XRHPv4tYAQeMgbII-VXkDCNwoNE=.aafa7afa-49fd-48bb-aa87-0fc70a91c7bf@github.com> Message-ID: On Wed, 11 Jun 2025 11:35:28 GMT, Marc Chevalier wrote: > In `bool LibraryCallKit::inline_vectorizedMismatch()` the region created at: > > https://github.com/openjdk/jdk/blob/0582bd290d5a8b6344ae7ada36492cc2f33df050/src/hotspot/share/opto/library_call.cpp#L6502 > > may have only one input, and be a copy (that is no self-loop) and a single input. It is thus safe to remove. Yet, in the reproducer case, the node is short-circuited, but stays in the graph after IGVN. > > Left, after Parsing/before IGVN; right, after IGVN: > > On the left, the ? is there because the Region doesn't have a self loop, which is expected for copies. On the right, it still doesn't have a self-loop, but IGV is also complaining the Region has no successor. > > This transformation comes from `IfNode::Ideal`, that calls `IfNode::Ideal_common`, that calls `Node::remove_dead_region`, that shortcuts a trivial Region input: > https://github.com/openjdk/jdk/blob/0582bd290d5a8b6344ae7ada36492cc2f33df050/src/hotspot/share/opto/node.cpp#L1480-L1484 > > Yet, the Region node is never enqueued for IGVN, so stays in the graph. This is not nice because it gives both: > - a control node (50 Proj) has 2 successors, > - a control node (46 Region) has 0 successors. > > While at the end of IGVN, we could expect the graph to be cleaned up. There are couple ways of doing, each being enough by itself: > 1. 
explicitly record for IGVN the region node in `LibraryCallKit::inline_vectorizedMismatch`, and not hoping it would be collected by another consequence > 2. change `Node::remove_dead_region` to use `set_req_X` instead of `set_req`, so that if the Region goes dead, it will be removed. > 3. not introduce the region node in `LibraryCallKit::inline_vectorizedMismatch` if we are going to have only one path, and thus avoid the problem entirely. > > The solution 3. is really not easy and would require quite some code restructuring for simply saving removing a node. After discussing with @chhagedorn, we concluded that the solution 1. was probably the best: > - usually, functions similar to `inline_vectorizedMismatch` call `record_for_igvn` themselves, > - unlike what I assumed at first seeing `remove_dead_region`, it's not a general problem at all: I couldn't find another case without using `inline_vectorizedMismatch` where the Region is put aside, and not entirely disconnected quickly after, but the Regi... This pull request has now been integrated. 
Changeset: b6ec93b0 Author: Marc Chevalier URL: https://git.openjdk.org/jdk/commit/b6ec93b038c411d0c49be671c3b44dd231d01305 Stats: 58 lines in 2 files changed: 58 ins; 0 del; 0 mod 8359121: C2: Region added by vectorizedMismatch intrinsic can survive as a dead node after IGVN Reviewed-by: thartmann, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/25749 From chagedorn at openjdk.org Thu Jun 12 11:45:30 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 12 Jun 2025 11:45:30 GMT Subject: RFR: 8354196: C2: reorder and capitalize phase definition In-Reply-To: References: Message-ID: On Thu, 12 Jun 2025 11:10:38 GMT, Manuel Hässig wrote: > This PR performs some cleanup and formatting around the phase definitions in C2: > - the phase descriptions are capitalized according to the [MLA Handbook title case rules](https://en.wikipedia.org/wiki/Title_case#Modern_Language_Association_(MLA)_Handbook), > - the phases are reordered to be more or less in the order of execution or occurrence in the code, > - the definitions in `phasetype.hpp` and `CompilePhase.java` are synced, > - `CompilePhase.java` is aligned for better readability. > > This change was tested on: > - [x] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15605662671) > - [x] tier1 plus some Oracle internal testing Thanks for cleaning it up, looks good! src/hotspot/share/opto/phasetype.hpp line 54: > 52: flags(PHASEIDEAL_BEFORE_EA, "PhaseIdealLoop before EA") \ > 53: flags(AFTER_EA, "After Escape Analysis") \ > 54: flags(ITER_GVN_AFTER_EA, "Iter GVN after EA") \ Not sure if this should also be part of this change, but we might want to consider "Iter GVN" -> IGVN (same for the left-hand sides of `flags()`). 
src/hotspot/share/opto/phasetype.hpp line 100: > 98: flags(OPTIMIZE_FINISHED, "Optimize Finished") \ > 99: flags(BEFORE_MATCHING, "Before Matching") \ > 100: flags(MATCHING, "After Matching") \ Could be named "AFTER_MATCHING" to match the name but as above, could also be part of a separate task. test/hotspot/jtreg/compiler/lib/ir_framework/CompilePhase.java line 73: > 71: AFTER_LOOP_UNROLLING( "After Loop Unrolling"), > 72: BEFORE_SPLIT_IF( "Before Split-if"), > 73: AFTER_SPLIT_IF( "After Split-if"), Nit: We might want to write it as "Split-If", since it is a name? At least we also call it that in `TraceLoopOpts`: https://github.com/openjdk/jdk/blob/65e63b6ab4241fc9d683e2ffa5bfe6e1a30059b6/src/hotspot/share/opto/loopopts.cpp#L1465 (same for `phasetype.hpp`) ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25778#pullrequestreview-2920735764 PR Review Comment: https://git.openjdk.org/jdk/pull/25778#discussion_r2142479956 PR Review Comment: https://git.openjdk.org/jdk/pull/25778#discussion_r2142483500 PR Review Comment: https://git.openjdk.org/jdk/pull/25778#discussion_r2142495035 From chagedorn at openjdk.org Thu Jun 12 11:49:28 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 12 Jun 2025 11:49:28 GMT Subject: RFR: 8354196: C2: reorder and capitalize phase definition In-Reply-To: References: Message-ID: On Thu, 12 Jun 2025 11:38:22 GMT, Christian Hagedorn wrote: >> This PR performs some cleanup and formatting around the phase definitions in C2: >> - the phase descriptions are capitalized according to the [MLA Handbook title case rules](https://en.wikipedia.org/wiki/Title_case#Modern_Language_Association_(MLA)_Handbook), >> - the phases are reordered to be more or less in the order of execution or occurrence in the code, >> - the definitions in `phasetype.hpp` and `CompilePhase.java` are synced, >> - `CompilePhase.java` is aligned for better readability. 
>> >> This change was tested on: >> - [x] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15605662671) >> - [x] tier1 plus some Oracle internal testing > > src/hotspot/share/opto/phasetype.hpp line 100: > >> 98: flags(OPTIMIZE_FINISHED, "Optimize Finished") \ >> 99: flags(BEFORE_MATCHING, "Before Matching") \ >> 100: flags(MATCHING, "After Matching") \ > > Could be named "AFTER_MATCHING" to match the name but as above, could also be part of a separate task. Nvm, there is already [JDK-8319599](https://bugs.openjdk.org/browse/JDK-8319599) for that. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25778#discussion_r2142515268 From duke at openjdk.org Thu Jun 12 11:54:09 2025 From: duke at openjdk.org (Benoît Maillard) Date: Thu, 12 Jun 2025 11:54:09 GMT Subject: RFR: 8357816: Add test from JDK-8350576 Message-ID: This PR adds a jtreg test for [JDK-8350576](https://bugs.openjdk.org/browse/JDK-8350576). The test consists of a code sample produced by the fuzzer, and it contains a loop that is supposed to get optimized. Thanks! 
### Testing - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8357816) - [ ] tier1 - [x] Ran the test with a debug build prior to the fix (JDK 25 build 16) and made sure it failed as a sanity check Shout out to @TobiHartmann for helping out with jtreg ------------- Commit messages: - 8357816: Update test summary - 8357816: Update copyright - 8357816: Add jtreg test for bug 8350576 Changes: https://git.openjdk.org/jdk/pull/25774/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25774&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8357816 Stats: 53 lines in 1 file changed: 53 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25774.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25774/head:pull/25774 PR: https://git.openjdk.org/jdk/pull/25774 From epeter at openjdk.org Thu Jun 12 11:54:32 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 12 Jun 2025 11:54:32 GMT Subject: RFR: 8356708: C2: loop strip mining expansion doesn't take sunk stores into account In-Reply-To: <4qen78-Aon9foRJSduZFQ47DrCMPUjuMF7MQj_uk7jI=.e7b5e0b7-04f4-4d5a-a224-a1c78ff89c09@github.com> References: <4qen78-Aon9foRJSduZFQ47DrCMPUjuMF7MQj_uk7jI=.e7b5e0b7-04f4-4d5a-a224-a1c78ff89c09@github.com> Message-ID: On Tue, 10 Jun 2025 18:51:49 GMT, Roberto Castañeda Lozano wrote: > Generally, I agree with your proposed approach of handling the case at expansion time as a low-risk fix for JDK 25. But as future work, would it be feasible to maintain regular SSA form for outer strip-mined loops (adding memory and data phi nodes at both loop levels) rather than omitting phi nodes for the outer loops and "repairing" SSA on macro expansion, or is there any fundamental obstacle in doing the former? It would have prevented issues like this, and feels like a more principled and robust approach in general. @robcasloz That would have also been my question. 
@rwestrel why did we omit those `Phi`s at the outer strip-mined loop in the first place? Is that not asking for all sorts of trouble and special handling? ------------- PR Comment: https://git.openjdk.org/jdk/pull/25717#issuecomment-2966381509 From jbhateja at openjdk.org Thu Jun 12 11:55:15 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 12 Jun 2025 11:55:15 GMT Subject: RFR: 8359327: Incorrect AVX3Threshold results into code buffer overflows on APX targets Message-ID: As per the latest Intel Architecture Instruction Set Extensions Programming Reference, version 57 [1], the upcoming Diamond Rapids server with the APX feature has a different CPU family ID (19) than prior Xeons (6). The recently integrated EEVEX to REX2 demotion support ([JDK-8351994](https://bugs.openjdk.org/browse/JDK-8351994)) already handles this through the newly defined _VM_Version::is_intel_server_family()_ API, but the existing AVX3Threshold setting is agnostic to this change, which causes code buffer overflows during arraycopy stub generation. The patch fixes this issue and also appropriately increases the final code buffer size to prevent buffer overruns during stub generation with a non-zero AVX3Threshold. 
[1] https://www.intel.com/content/www/us/en/content-details/851355/intel-architecture-instruction-set-extensions-programming-reference.html?wapkw=intel%20architecture%20instruction%20set%20extensions%20programming%20reference ------------- Commit messages: - 8359327: Incorrect AVX3Threshold results into code buffer overflows on APX targets Changes: https://git.openjdk.org/jdk/pull/25780/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25780&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8359327 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25780.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25780/head:pull/25780 PR: https://git.openjdk.org/jdk/pull/25780 From mhaessig at openjdk.org Thu Jun 12 11:58:29 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Thu, 12 Jun 2025 11:58:29 GMT Subject: RFR: 8358772: Template-Framework Library: Primitive Types [v4] In-Reply-To: References: Message-ID: On Thu, 12 Jun 2025 11:42:26 GMT, Emanuel Peter wrote: >> I would like to add primitive type support to the template framework library. >> >> In follow-up work, we will use these types in random expression generation - but they can also already be useful on their own now. >> >> I encountered an issue with some methods that return `Token` from the `TemplateFramework`, such as `Hook.insert` and `addDataName`. Since `Token` was package private, this class could not be used in some places where automatic type inference is required. I now refactored the code, so that the `Token` is just an empty interface, and all the methods are moved to a separate class `TokenParser`. >> >> Original experiments from here: https://github.com/openjdk/jdk/pull/23418 > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > rm static final Thank you for addressing our comments so quickly. Looks good! 
------------- Marked as reviewed by mhaessig (Author). PR Review: https://git.openjdk.org/jdk/pull/25672#pullrequestreview-2920833411 From mhaessig at openjdk.org Thu Jun 12 12:02:34 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Thu, 12 Jun 2025 12:02:34 GMT Subject: RFR: 8354196: C2: reorder and capitalize phase definition In-Reply-To: References: Message-ID: On Thu, 12 Jun 2025 11:37:20 GMT, Christian Hagedorn wrote: >> This PR performs some cleanup and formatting around the phase definitions in C2: >> - the phase descriptions are capitalized according to the [MLA Handbook title case rules](https://en.wikipedia.org/wiki/Title_case#Modern_Language_Association_(MLA)_Handbook), >> - the phases are reordered to be more or less in the order of execution or occurrence in the code, >> - the definitions in `phasetype.hpp` and `CompilePhase.java` are synced, >> - `CompilePhase.java` is aligned for better readability. >> >> This change was tested on: >> - [x] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15605662671) >> - [x] tier1 plus some Oracle internal testing > > src/hotspot/share/opto/phasetype.hpp line 54: > >> 52: flags(PHASEIDEAL_BEFORE_EA, "PhaseIdealLoop before EA") \ >> 53: flags(AFTER_EA, "After Escape Analysis") \ >> 54: flags(ITER_GVN_AFTER_EA, "Iter GVN after EA") \ > > Not sure if this should be also part of this change but we might want to consider "Iter GVN" -> IGVN (same for left hand sides of `flags()`. I will comment on JDK-8319599 to propose it. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25778#discussion_r2142552139 From epeter at openjdk.org Thu Jun 12 12:07:37 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 12 Jun 2025 12:07:37 GMT Subject: RFR: 8356708: C2: loop strip mining expansion doesn't take sunk stores into account In-Reply-To: References: Message-ID: On Tue, 10 Jun 2025 10:17:11 GMT, Roland Westrelin wrote: > `test1()` has a counted loop with a `Store` to `field`. That `Store` > is sunk out of loop. When the `OuterStripMinedLoop` is expanded, only > `Phi`s that exist at the inner loop are added to the outer > loop. There's no `Phi` for the slice of the sunk `Store` (because > there's no `Store` left in the inner loop) so no `Phi` is added for > that slice to the outer loop. As a result, there's a missing anti > dependency for `Load` of `field` that's before the loop and it can be > scheduled inside the outer strip mined loop which is incorrect. > > `test2()` is the same as `test1()` but with a chain of 2 `Store`s. > > `test3()` is another variant where a `Store` is left in the inner loop > after one is sunk out of it so the inner loop still has a `Phi`. As a > result, the outer loop also gets a `Phi` but it's incorrectly wired as > the sunk `Store` should be the input along the backedge but is > not. That one doesn't cause any failure AFAICT. > > The fix I propose is some extra logic at expansion of the > `OuterStripMinedLoop` to handle these corner cases. @rwestrel Thanks for looking into this! The fix seems reasonable given that we don't have phi's at the outer loop. But why don't we have those phis in the first place? src/hotspot/share/opto/loopnode.cpp line 3010: > 3008: Node* safepoint = outer_safepoint(); > 3009: Node* safepoint_mem = safepoint->in(TypeFunc::Memory); > 3010: if (safepoint_mem->is_MergeMem()) { I would have flipped the condition, and made an early exit condition from this. That way, the code is indented one level less. 
Just a suggestion, feel free to ignore :) src/hotspot/share/opto/loopnode.cpp line 3021: > 3019: first = mem; > 3020: mem = mem->in(MemNode::Memory); > 3021: } Suggestion: // Traverse up the chain of stores to find the first store pinned // at the loop exit projection. Node* last = mem; Node* first = nullptr; while (mem->is_Store() && mem->in(0) == cle_exit_proj) { DEBUG_ONLY(stores_in_outer_loop_cnt2++); first = mem; mem = mem->in(MemNode::Memory); } ------------- PR Review: https://git.openjdk.org/jdk/pull/25717#pullrequestreview-2920838151 PR Review Comment: https://git.openjdk.org/jdk/pull/25717#discussion_r2142549734 PR Review Comment: https://git.openjdk.org/jdk/pull/25717#discussion_r2142554505 From mchevalier at openjdk.org Thu Jun 12 12:08:28 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 12 Jun 2025 12:08:28 GMT Subject: RFR: 8357816: Add test from JDK-8350576 In-Reply-To: References: Message-ID: On Thu, 12 Jun 2025 09:08:24 GMT, Benoît Maillard wrote: > This PR adds a jtreg test for [JDK-8350576](https://bugs.openjdk.org/browse/JDK-8350576). The test consists of a code sample produced by the fuzzer, and it contains a loop that is supposed to get optimized. > > Thanks! 
> > ### Testing > > - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8357816) > - [ ] tier1 > - [x] Ran the test with a debug build prior to the fix (JDK 25 build 16) and made sure it failed as a sanity check > > Shout out to @TobiHartmann for helping out with jtreg test/hotspot/jtreg/compiler/loopopts/LoopReductionHasControlOrBadInput.java line 29: > 27: * @summary Optimization bails out and hits an assert: > 28: * assert(false) failed: reduction has ctrl or bad vector_input > 29: * @run main/othervm -XX:CompileCommand=compileonly,compiler.loopopts.LoopReductionHasControlOrBadInput::* -Xbatch -XX:-TieredCompilation compiler.loopopts.LoopReductionHasControlOrBadInput Learning from @eme64: You should probably add another `@run` without parameters to catch more things in the future. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25774#discussion_r2142563279 From mhaessig at openjdk.org Thu Jun 12 12:20:15 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Thu, 12 Jun 2025 12:20:15 GMT Subject: RFR: 8354196: C2: reorder and capitalize phase definition [v2] In-Reply-To: References: Message-ID: > This PR performs some cleanup and formatting around the phase definitions in C2: > - the phase descriptions are capitalized according to the [MLA Handbook title case rules](https://en.wikipedia.org/wiki/Title_case#Modern_Language_Association_(MLA)_Handbook), > - the phases are reordered to be more or less in the order of execution or occurrence in the code, > - the definitions in `phasetype.hpp` and `CompilePhase.java` are synced, > - `CompilePhase.java` is aligned for better readability. 
> > This change was tested on: > - [x] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15605662671) > - [x] tier1 plus some Oracle internal testing Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: Split-If is a proper name ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25778/files - new: https://git.openjdk.org/jdk/pull/25778/files/54dbaabf..9f3676e4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25778&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25778&range=00-01 Stats: 4 lines in 2 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/25778.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25778/head:pull/25778 PR: https://git.openjdk.org/jdk/pull/25778 From mhaessig at openjdk.org Thu Jun 12 12:20:16 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Thu, 12 Jun 2025 12:20:16 GMT Subject: RFR: 8354196: C2: reorder and capitalize phase definition [v2] In-Reply-To: References: Message-ID: On Thu, 12 Jun 2025 11:41:36 GMT, Christian Hagedorn wrote: >> Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: >> >> Split-If is a proper name > > test/hotspot/jtreg/compiler/lib/ir_framework/CompilePhase.java line 73: > >> 71: AFTER_LOOP_UNROLLING( "After Loop Unrolling"), >> 72: BEFORE_SPLIT_IF( "Before Split-if"), >> 73: AFTER_SPLIT_IF( "After Split-if"), > > Nit: We might want to name it "Split-If" as being a name? At least we also call it as such in `TraceLoopOpts`: > https://github.com/openjdk/jdk/blob/65e63b6ab4241fc9d683e2ffa5bfe6e1a30059b6/src/hotspot/share/opto/loopopts.cpp#L1465 > (same for `phasetype.hpp`) I like the proper name argument. 
Fixed in 9f3676e4a ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25778#discussion_r2142583277 From mhaessig at openjdk.org Thu Jun 12 12:20:15 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Thu, 12 Jun 2025 12:20:15 GMT Subject: RFR: 8354196: C2: reorder and capitalize phase definition [v2] In-Reply-To: <9C_F58p4MNaffAvV8oL9Jk7STUCH9edl_MUFfZa5u9I=.a9535c4c-9caf-4e16-974f-b89cb49e695b@github.com> References: <9C_F58p4MNaffAvV8oL9Jk7STUCH9edl_MUFfZa5u9I=.a9535c4c-9caf-4e16-974f-b89cb49e695b@github.com> Message-ID: On Thu, 12 Jun 2025 11:38:42 GMT, Marc Chevalier wrote: >> Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: >> >> Split-If is a proper name > > More consistent, nice! Everything is still there, or seems to make sense (like I couldn't find anything lost while moving around). Not very sure of the order of phases, but it seems good. > > Tip for other reviewers: commit by commit is easier and clearer in this PR. Thank you @marc-chevalier & @chhagedorn for your quick reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25778#issuecomment-2966472530 From duke at openjdk.org Thu Jun 12 12:25:55 2025 From: duke at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Thu, 12 Jun 2025 12:25:55 GMT Subject: RFR: 8357816: Add test from JDK-8350576 [v2] In-Reply-To: References: Message-ID: > This PR adds a jtreg test for [JDK-8350576](https://bugs.openjdk.org/browse/JDK-8350576). The test consists of a code sample produced by the fuzzer, and it contains a loop that is supposed to get optimized. > > Thanks! 
> > ### Testing > > - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8357816) > - [ ] tier1 > - [x] Ran the test with a debug build prior to the fix (JDK 25 build 16) and made sure it failed as a sanity check > > Shout out to @TobiHartmann for helping out with jtreg Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: 8357816: Add additional run without flags ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25774/files - new: https://git.openjdk.org/jdk/pull/25774/files/c6a8eb14..d946fcce Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25774&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25774&range=00-01 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25774.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25774/head:pull/25774 PR: https://git.openjdk.org/jdk/pull/25774 From duke at openjdk.org Thu Jun 12 12:25:56 2025 From: duke at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Thu, 12 Jun 2025 12:25:56 GMT Subject: RFR: 8357816: Add test from JDK-8350576 [v2] In-Reply-To: References: Message-ID: On Thu, 12 Jun 2025 12:05:42 GMT, Marc Chevalier wrote: >> Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: >> >> 8357816: Add additional run without flags > > test/hotspot/jtreg/compiler/loopopts/LoopReductionHasControlOrBadInput.java line 29: > >> 27: * @summary Optimization bails out and hits an assert: >> 28: * assert(false) failed: reduction has ctrl or bad vector_input >> 29: * @run main/othervm -XX:CompileCommand=compileonly,compiler.loopopts.LoopReductionHasControlOrBadInput::* -Xbatch -XX:-TieredCompilation compiler.loopopts.LoopReductionHasControlOrBadInput > > Learning from @eme64: You should probably add another `@run` without parameters to catch more things in the future. Thanks! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25774#discussion_r2142598540 From chagedorn at openjdk.org Thu Jun 12 12:54:30 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 12 Jun 2025 12:54:30 GMT Subject: RFR: 8354196: C2: reorder and capitalize phase definition [v2] In-Reply-To: References: Message-ID: On Thu, 12 Jun 2025 12:20:15 GMT, Manuel Hässig wrote: >> This PR performs some cleanup and formatting around the phase definitions in C2: >> - the phase descriptions are capitalized according to the [MLA Handbook title case rules](https://en.wikipedia.org/wiki/Title_case#Modern_Language_Association_(MLA)_Handbook), >> - the phases are reordered to be more or less in the order of execution or occurrence in the code, >> - the definitions in `phasetype.hpp` and `CompilePhase.java` are synced, >> - `CompilePhase.java` is aligned for better readability. >> >> This change was tested on: >> - [x] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15605662671) >> - [x] tier1 plus some Oracle internal testing > > Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: > > Split-If is a proper name Marked as reviewed by chagedorn (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/25778#pullrequestreview-2921022602 From chagedorn at openjdk.org Thu Jun 12 12:54:32 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 12 Jun 2025 12:54:32 GMT Subject: RFR: 8354196: C2: reorder and capitalize phase definition [v2] In-Reply-To: References: Message-ID: <-f75lABRKLqyrRBiNg-gDYe1stxzxnLLpHmKXEJbKI8=.7c101781-80cf-4961-98ab-61c7995d7eb0@github.com> On Thu, 12 Jun 2025 11:58:58 GMT, Manuel Hässig wrote: >> src/hotspot/share/opto/phasetype.hpp line 54: >> >>> 52: flags(PHASEIDEAL_BEFORE_EA, "PhaseIdealLoop before EA") \ >>> 53: flags(AFTER_EA, "After Escape Analysis") \ >>> 54: flags(ITER_GVN_AFTER_EA, "Iter GVN after EA") \ >> >> Not sure if this should be also part of this change but we might want to consider "Iter GVN" -> IGVN (same for left hand sides of `flags()`. > > I will comment on JDK-8319599 to propose it. Sounds good, thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25778#discussion_r2142661389 From epeter at openjdk.org Thu Jun 12 13:04:39 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 12 Jun 2025 13:04:39 GMT Subject: RFR: 8358129: compiler/startup/StartupOutput.java runs into out of memory on Windows after JDK-8347406 In-Reply-To: References: Message-ID: <2I9GIwtq6eb0fq1Xnj90-MmLbtk56abhai3pn1NeiMA=.43dffbf3-8f67-4a85-8b73-7f5512be16b6@github.com> On Mon, 2 Jun 2025 10:57:22 GMT, Damon Fenacci wrote: > The test `compiler/startup/StartupOutput.java` starts **200 VMs in a loop** , this can lead to resource shortages on some (Windows) machines. > > There is no real need to run those VMs concurrently (their run is short and basically check that the VM doesn't crash giving limited code cache). > > Running them **sequentially** should be OK and should avoid running out of memory. > > Testing: Tier1-3+ @dafedafe Thanks for looking into this, looks reasonable. ------------- Marked as reviewed by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25582#pullrequestreview-2921053804 From rcastanedalo at openjdk.org Thu Jun 12 13:06:42 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 12 Jun 2025 13:06:42 GMT Subject: RFR: 8020282: Generated code quality: redundant LEAs in the chained dereferences [v2] In-Reply-To: References: Message-ID: On Wed, 4 Jun 2025 11:13:35 GMT, Manuel Hässig wrote: >> ## Summary >> >> On x86, chained dereferences of narrow oops at a constant offset from the base oop can use a `lea` instruction to perform the address computation in one go using the `leaP8Narrow`, `leaP32Narrow`, and `leaPCompressedOopOffset` matching rules. However, the generated code contains an additional `lea` with an unused result: >> >> ; OptoAssembly >> 03d decode_heap_oop_not_null R8,R10 >> 041 leaq R10, [R12 + R10 << 3 + #12] (compressed oop addressing) ; ptr compressedoopoff32 >> >> ; x86 >> 0x00007f1f210625bd: lea (%r12,%r10,8),%r8 ; result is unused >> 0x00007f1f210625c1: lea 0xc(%r12,%r10,8),%r10 ; the same computation as decode, but with offset >> >> >> This PR adds a peephole optimization to remove such redundant `lea`s. >> >> ## The Issue in Detail >> >> The ideal subgraph producing redundant `lea`s, or rather redundant `decodeHeapOop_not_null`s, is `LoadN -> DecodeN -> AddP`, where both the address and base edge of the `AddP` originate from the `DecodeN`. After matching, this becomes >> >> LoadN -> decodeHeapOop_not_null -> leaP* >> ______________________________? >> >> where `leaP*` is either of `leaP8Narrow`, `leaP32Narrow`, or `leaPCompressedOopOffset` (depending on the heap location and size). Here, the base input of `leaP*` comes from the decode. 
Looking at the matching code path, we find that the `leaP*` rules match both the `AddP` and the `DecodeN`, since x86 can fold this, but the following code adds the decode back as the base input to `leaP*`: >> >> https://github.com/openjdk/jdk/blob/c29537740efb04e061732a700582d43b1956cff4/src/hotspot/share/opto/matcher.cpp#L1894-L1897 >> >> On its face, this is completely unnecessary if we matched a `leaP*`, since it already computes the result of the decode, so adding the `LoadN` node as base seems like the logical choice. However, if the derived oop computed by the `leaP*` gets added to an oop map, this `DecodeN` is needed as the base for the derived oop. Because as of now, derived oops in oop maps cannot have narrow base pointers. >> >> This leaves us with a handful of possible solutions: >> 1. implement narrow bases for derived oops in oop maps, >> 2. perform some dead code elimination after we know which oops are part of oop maps, >> 3. add a peephole optimization to simply remove unused `lea`s. >> >> Option 1 would have been ideal in the sense, that it is the earliest possible point to remove the decode, which would simplify the graph and reduce pressure on the regi... > > Manuel Hässig has updated the pull request incrementally with three additional commits since the last revision: > > - Add comment to benchmark as to why we fix the heap size > - Add missing null chec > - Fix typos Thank you for your thorough work here Manuel, in particular for carefully exploring and discussing alternative solutions! The peephole approach looks good to me, I just have a few coding suggestions and test questions. src/hotspot/cpu/x86/peephole_x86_64.cpp line 240: > 238: > 239: // This function removes redundant lea instructions that result from chained dereferences that > 240: // match to leaPCompressedOopOffset, leaP8Narow, or leaP32Narrow. This happens for ideal graphs Suggestion: // match to leaPCompressedOopOffset, leaP8Narrow, or leaP32Narrow. 
This happens for ideal graphs src/hotspot/cpu/x86/peephole_x86_64.cpp line 267: > 265: // | / \ > 266: // leaP* MachProj (leaf) > 267: // In this case where the common parent of the leaP* and the decode is one MemToRegSpill Copy Suggestion: // In this case where the common parent of the leaP* and the decode is one MemToRegSpillCopy src/hotspot/cpu/x86/peephole_x86_64.cpp line 288: > 286: bool is_spill = lea_derived_oop->in(AddPNode::Address) != decode->in(1) && > 287: lea_derived_oop->in(AddPNode::Address)->is_SpillCopy() && > 288: decode->in(1)->is_SpillCopy(); The logic around `is_spill` could be simplified by declaring pointers to the lea and decode address inputs and setting these either to their direct inputs or one level up in case of spilling. Something like this: Node* lea_address = lea_derived_oop->in(AddPNode::Address); Node* decode_address = decode->in(1); bool is_spill = lea_address != decode_address && lea_address->is_SpillCopy() && decode_address->is_SpillCopy(); if (is_spill) { decode_address = decode_address->in(1); lea_address = lea_address->in(1); } // The leaP* and the decode must have the same parent. If we have a spill, they must have // the same grandparent. if (lea_address != decode_address) { return false; } (...) This creates more opportunities for simplification below, e.g. rewiring the base address of the affected leas directly to `lea_address` (or `decode_address`). src/hotspot/cpu/x86/peephole_x86_64.cpp line 297: > 295: } > 296: > 297: // Ensure the decode only has the leaP*s with the same (grand)parent and a MachProj leaf as children. For unambiguity: Suggestion: // Ensure the decode only has the leaP*s (with the same (grand)parent) and a MachProj leaf as children. src/hotspot/cpu/x86/peephole_x86_64.cpp line 325: > 323: // Ensure the MachProj is in the same block as the decode and the lea. 
> 324: if (proj == nullptr || !block->contains(proj)) { > 325: return false; The only scenario in which I think this may be possible is when we stress scheduling, otherwise `RFLAGS` projections should always be scheduled immediately after their input. Consider adding an assertion like `assert(StressGCM, "should be scheduled contiguously otherwise");` or similar here. src/hotspot/cpu/x86/peephole_x86_64.cpp line 349: > 347: // Remove spill for the decode if it does not have any other uses. > 348: if (is_spill && decode->in(1)->is_Mach() && decode->in(1)->outcnt() == 1 && block->contains(decode->in(1))) { > 349: MachNode* decode_spill = decode->in(1)->as_Mach(); This could be simplified if `decode_spill` was defined already within the `if (is_spill)` statement proposed above. Furthermore, `decode->in(1)->is_Mach()` is implied by `is_spill`, so not needed. src/hotspot/cpu/x86/x86_64.ad line 6378: > 6376: %} > 6377: > 6378: instruct cmovI_rReg_rReg_memUCF_ndd(rRegI dst, cmpOpUCF cop, rFlagsRegUCF cr, rRegI src1, memory src2) Please do not touch unrelated lines. We could remove trailing whitespaces in a separate cleanup RFE. src/hotspot/cpu/x86/x86_64.ad line 6761: > 6759: %} > 6760: > 6761: instruct cmovL_regUCF_ndd(rRegL dst, cmpOpUCF cop, rFlagsRegUCF cr, rRegL src1, rRegL src2) Same here. src/hotspot/cpu/x86/x86_64.ad line 6868: > 6866: %} > 6867: > 6868: instruct cmovL_rReg_rReg_memUCF_ndd(rRegL dst, cmpOpUCF cop, rFlagsRegUCF cr, rRegL src1, memory src2) Same here. test/hotspot/jtreg/compiler/codegen/TestRedundantLea.java line 1: > 1: /* The new test cases are very thorough and well-documented, but the IR conditions checked are very specific, which might become a maintainability burden as library code, intrinsics, and platform definitions change. Is there anything that could be relaxed while still checking that the optimization is somehow applied (or not applied)? 
Two things that come to mind are 1) extracting standalone Java programs that trigger the same scenarios as the current library calls and 2) relaxing the IRNode definitions and the corresponding IR check preconditions (e.g. defining and matching a single `leaP.*` node instead of specialized versions for different heap configurations, or defining and matching a single `decodeHeapOop.*` node). test/hotspot/jtreg/compiler/codegen/TestRedundantLea.java line 97: > 95: // The following tests ensure that we do not generate a redundant lea instruction on x86. > 96: // These get generated on chained dereferences for the rules leaPCompressedOopOffset, > 97: // leaP8Narow, and leaP32Narrow and stem from a decodeHeapOopNotNull that is not needed Suggestion: // leaP8Narrow, and leaP32Narrow and stem from a decodeHeapOopNotNull that is not needed test/hotspot/jtreg/compiler/codegen/TestRedundantLea.java line 141: > 139: for (boolean negativeTest : new boolean[] {false, true}) { > 140: for (boolean compressedTest : new boolean[] {false, true}) { > 141: // leaPComperssedOopOffset leaP(8|32)Narrow Suggestion: // leaPCompressedOopOffset leaP(8|32)Narrow ------------- Changes requested by rcastanedalo (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25471#pullrequestreview-2920901718 PR Review Comment: https://git.openjdk.org/jdk/pull/25471#discussion_r2142587282 PR Review Comment: https://git.openjdk.org/jdk/pull/25471#discussion_r2142588194 PR Review Comment: https://git.openjdk.org/jdk/pull/25471#discussion_r2142604783 PR Review Comment: https://git.openjdk.org/jdk/pull/25471#discussion_r2142611102 PR Review Comment: https://git.openjdk.org/jdk/pull/25471#discussion_r2142620842 PR Review Comment: https://git.openjdk.org/jdk/pull/25471#discussion_r2142632453 PR Review Comment: https://git.openjdk.org/jdk/pull/25471#discussion_r2142642381 PR Review Comment: https://git.openjdk.org/jdk/pull/25471#discussion_r2142642593 PR Review Comment: https://git.openjdk.org/jdk/pull/25471#discussion_r2142642844 PR Review Comment: https://git.openjdk.org/jdk/pull/25471#discussion_r2142677949 PR Review Comment: https://git.openjdk.org/jdk/pull/25471#discussion_r2142637298 PR Review Comment: https://git.openjdk.org/jdk/pull/25471#discussion_r2142638189 From rcastanedalo at openjdk.org Thu Jun 12 13:06:43 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 12 Jun 2025 13:06:43 GMT Subject: RFR: 8020282: Generated code quality: redundant LEAs in the chained dereferences [v2] In-Reply-To: References: <-PFiiMlUghbFgg2fuU86vuEXKaexylDuk3kBdcBn9N8=.2c272bf1-10a7-4110-8919-f33ee0d491ba@github.com> Message-ID: On Wed, 4 Jun 2025 13:22:11 GMT, Manuel Hässig wrote: >> src/hotspot/cpu/x86/peephole_x86_64.cpp line 244: >> >>> 242: // the DecodeN. However, after matching the DecodeN is added back as the base for the leaP*, >>> 243: // which is nessecary if the oop derived by the leaP* gets added to an OopMap, because OopMaps >>> 244: // cannot contain derived oops with narrow oops as a base. >> >> Am I correct to assume that if it is referenced in OopMap (which is side table) it will by referenced by some Safepoint node in graph? > > Exactly. 
This is why I can get away with only checking the usages of the decode. Is this scenario exercised by any of the new tests? If not, would it be possible to construct an additional test where we verify that the peephole is not applied in this case? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25471#discussion_r2142650623 From mhaessig at openjdk.org Thu Jun 12 13:15:19 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Thu, 12 Jun 2025 13:15:19 GMT Subject: RFR: 8020282: Generated code quality: redundant LEAs in the chained dereferences [v3] In-Reply-To: References: Message-ID: > ## Summary > > On x86, chained dereferences of narrow oops at a constant offset from the base oop can use a `lea` instruction to perform the address computation in one go using the `leaP8Narrow`, `leaP32Narrow`, and `leaPCompressedOopOffset` matching rules. However, the generated code contains an additional `lea` with an unused result: > > ; OptoAssembly > 03d decode_heap_oop_not_null R8,R10 > 041 leaq R10, [R12 + R10 << 3 + #12] (compressed oop addressing) ; ptr compressedoopoff32 > > ; x86 > 0x00007f1f210625bd: lea (%r12,%r10,8),%r8 ; result is unused > 0x00007f1f210625c1: lea 0xc(%r12,%r10,8),%r10 ; the same computation as decode, but with offset > > > This PR adds a peephole optimization to remove such redundant `lea`s. > > ## The Issue in Detail > > The ideal subgraph producing redundant `lea`s, or rather redundant `decodeHeapOop_not_null`s, is `LoadN -> DecodeN -> AddP`, where both the address and base edge of the `AddP` originate from the `DecodeN`. After matching, this becomes > > LoadN -> decodeHeapOop_not_null -> leaP* > ______________________________? > > where `leaP*` is either of `leaP8Narrow`, `leaP32Narrow`, or `leaPCompressedOopOffset` (depending on the heap location and size). Here, the base input of `leaP*` comes from the decode. 
Looking at the matching code path, we find that the `leaP*` rules match both the `AddP` and the `DecodeN`, since x86 can fold this, but the following code adds the decode back as the base input to `leaP*`: > > https://github.com/openjdk/jdk/blob/c29537740efb04e061732a700582d43b1956cff4/src/hotspot/share/opto/matcher.cpp#L1894-L1897 > > On its face, this is completely unnecessary if we matched a `leaP*`, since it already computes the result of the decode, so adding the `LoadN` node as base seems like the logical choice. However, if the derived oop computed by the `leaP*` gets added to an oop map, this `DecodeN` is needed as the base for the derived oop. Because as of now, derived oops in oop maps cannot have narrow base pointers. > > This leaves us with a handful of possible solutions: > 1. implement narrow bases for derived oops in oop maps, > 2. perform some dead code elimination after we know which oops are part of oop maps, > 3. add a peephole optimization to simply remove unused `lea`s. > > Option 1 would have been ideal in the sense, that it is the earliest possible point to remove the decode, which would simplify the graph and reduce pressure on the register allocator. However, rewriting the oop map machinery to remove a... 
Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: Apply typo suggestions Co-authored-by: Roberto Castañeda Lozano ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25471/files - new: https://git.openjdk.org/jdk/pull/25471/files/67afb3ca..830b45e7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25471&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25471&range=01-02 Stats: 5 lines in 2 files changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/25471.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25471/head:pull/25471 PR: https://git.openjdk.org/jdk/pull/25471 From dbriemann at openjdk.org Thu Jun 12 13:34:31 2025 From: dbriemann at openjdk.org (David Briemann) Date: Thu, 12 Jun 2025 13:34:31 GMT Subject: RFR: 8359232: [PPC64] C2: Clean up ppc.ad: add instr sizes, remove comments [v2] In-Reply-To: References: Message-ID: <0PxrzfPDyOdmW_bZAI6Ve0f9NKpjwUhlFtIKaUygTM0=.aebf7b00-259b-4fb4-aa78-68da01e02505@github.com> On Wed, 11 Jun 2025 12:40:05 GMT, David Briemann wrote: >> Add missing sizes for some instructions. >> Clean up outdated Power7 comments. > > David Briemann has updated the pull request incrementally with one additional commit since the last revision: > > re-add deleted token Thanks for the review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25752#issuecomment-2966724502 From duke at openjdk.org Thu Jun 12 13:34:31 2025 From: duke at openjdk.org (duke) Date: Thu, 12 Jun 2025 13:34:31 GMT Subject: RFR: 8359232: [PPC64] C2: Clean up ppc.ad: add instr sizes, remove comments [v2] In-Reply-To: References: Message-ID: On Wed, 11 Jun 2025 12:40:05 GMT, David Briemann wrote: >> Add missing sizes for some instructions. >> Clean up outdated Power7 comments.
> > David Briemann has updated the pull request incrementally with one additional commit since the last revision: > > re-add deleted token @dbriemann Your change (at version 315a7da59a882f668f186583e0f195d25c67c799) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25752#issuecomment-2966729083 From mhaessig at openjdk.org Thu Jun 12 14:09:41 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Thu, 12 Jun 2025 14:09:41 GMT Subject: RFR: 8020282: Generated code quality: redundant LEAs in the chained dereferences [v3] In-Reply-To: References: Message-ID: On Thu, 12 Jun 2025 12:42:26 GMT, Roberto Casta?eda Lozano wrote: >> Manuel H?ssig has updated the pull request incrementally with one additional commit since the last revision: >> >> Apply typo suggestions >> >> Co-authored-by: Roberto Casta?eda Lozano > > src/hotspot/cpu/x86/x86_64.ad line 6378: > >> 6376: %} >> 6377: >> 6378: instruct cmovI_rReg_rReg_memUCF_ndd(rRegI dst, cmpOpUCF cop, rFlagsRegUCF cr, rRegI src1, memory src2) > > Please do not touch unrelated lines. We could remove trailing whitespaces in a separate cleanup RFE. Reverted in febaf7b1f9a > src/hotspot/cpu/x86/x86_64.ad line 6761: > >> 6759: %} >> 6760: >> 6761: instruct cmovL_regUCF_ndd(rRegL dst, cmpOpUCF cop, rFlagsRegUCF cr, rRegL src1, rRegL src2) > > Same here. Reverted in febaf7b1f9a > src/hotspot/cpu/x86/x86_64.ad line 6868: > >> 6866: %} >> 6867: >> 6868: instruct cmovL_rReg_rReg_memUCF_ndd(rRegL dst, cmpOpUCF cop, rFlagsRegUCF cr, rRegL src1, memory src2) > > Same here. 
Reverted in febaf7b1f9a ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25471#discussion_r2142866603 PR Review Comment: https://git.openjdk.org/jdk/pull/25471#discussion_r2142866907 PR Review Comment: https://git.openjdk.org/jdk/pull/25471#discussion_r2142867188 From epeter at openjdk.org Thu Jun 12 14:15:37 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 12 Jun 2025 14:15:37 GMT Subject: RFR: 8358600: Template-Framework Library: Template for TestFramework test class [v3] In-Reply-To: References: Message-ID: On Thu, 5 Jun 2025 12:07:44 GMT, Manuel Hässig wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> review suggestions > > Thank you for addressing my comments. Looks good to me! @mhaessig @chhagedorn Thanks for reviewing and for all the helpful suggestions ------------- PR Comment: https://git.openjdk.org/jdk/pull/25643#issuecomment-2966988695 From epeter at openjdk.org Thu Jun 12 14:15:38 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 12 Jun 2025 14:15:38 GMT Subject: Integrated: 8358600: Template-Framework Library: Template for TestFramework test class In-Reply-To: References: Message-ID: On Wed, 4 Jun 2025 15:50:17 GMT, Emanuel Peter wrote: > We might want to write many IR/TestFramework tests, and so I would like to integrate a Template that generates the class, and the user has to only generate a list of tests. > > This is a first extension for https://github.com/openjdk/jdk/pull/24217. I had already prototyped it earlier and plan to use it in multiple tests https://github.com/openjdk/jdk/pull/23418 (see `IRTestClass.java`). > > https://github.com/openjdk/jdk/blob/dc640cbd8fb8ec76920a7ab52dfe7955ed1d77f2/test/hotspot/jtreg/compiler/lib/template_framework/library/TestFrameworkClass.java#L36-L45 This pull request has now been integrated.
Changeset: b85fe02b Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/b85fe02be5966b72ea1a92bfb3faf088d310219a Stats: 282 lines in 2 files changed: 282 ins; 0 del; 0 mod 8358600: Template-Framework Library: Template for TestFramework test class Reviewed-by: chagedorn, mhaessig ------------- PR: https://git.openjdk.org/jdk/pull/25643 From epeter at openjdk.org Thu Jun 12 14:21:49 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 12 Jun 2025 14:21:49 GMT Subject: RFR: 8347273: C2: VerifyIterativeGVN for Ideal and Identity [v10] In-Reply-To: References: Message-ID: On Thu, 12 Jun 2025 08:08:25 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 79 additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8347273-verify-IGVN-Ideal-Identity >> - update comments for Christian >> - Apply suggestions from code review >> >> Co-authored-by: Christian Hagedorn >> - reorder flags for Christian >> - max_modes >> - use stringStream instead of ttyLocker >> - assert(false) for Christian >> - rename for Christian >> - Update src/hotspot/share/opto/phaseX.cpp >> >> Co-authored-by: Manuel H?ssig >> - review suggestions, and handled a few more edge cases >> - ... and 69 more: https://git.openjdk.org/jdk/compare/78bf9941...d9546d87 > > Update looks good, thanks! 
@chhagedorn @mhaessig Thank you for the reviews and all the helpful suggestions :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/22970#issuecomment-2967020432 From epeter at openjdk.org Thu Jun 12 14:21:49 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 12 Jun 2025 14:21:49 GMT Subject: Integrated: 8347273: C2: VerifyIterativeGVN for Ideal and Identity In-Reply-To: References: Message-ID: <4DpljUxScfOlta-_yDUbhiFPO3FPb7LkYhJcNjazwJ4=.b6ca86ff-551d-4f61-8fcf-322a3bff8464@github.com> On Wed, 8 Jan 2025 14:43:40 GMT, Emanuel Peter wrote: > **Past Work** > With https://github.com/openjdk/jdk/pull/11775 / [JDK-8298952](https://bugs.openjdk.org/browse/JDK-8298952) we added `Node::Value` verification. > > **This PR** > I'm now adding verification for `Ideal` and `Identity`. I'm adding two bits to the flag `VerifyIterativeGVN`. > > I found many many node types that hit my verification assert, i.e. that could still be optimized after IGVN is over, just because these nodes were not put on the worklist any more. > > My approach was to aggressively bail-out for all nodes that had an issue. This way, we can address one by one in follow-up RFEs. For many, I did some initial assessment, and left some comments about what issues I encountered. > > **Future Work:** > In many cases, the issue is just a missing notification when inputs of inputs are changed. These would be good starter tasks. But there are probably also more complicated cases. And there are surely cases where verification will be impossible, because it is possible that the Idea / Identity optimizations traverse longer paths, and we cannot expect that notification makes it down that path. For those cases, we will have to leave the exception and document it well. > > I filed: > [JDK-8359103](https://bugs.openjdk.org/browse/JDK-8359103) C2 VerifyIterativeGVN: Umbrella for extending Ideal and Identity verification (JDK-8347273) > (We can file subtasks for the nodes we want to fix. 
I don't want to file them all now, but we should file them as we are investigating, so that there is no duplicate work.) > > Testing passed tier1-3, with extra timeout factor 20. This pull request has now been integrated. Changeset: dd688290 Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/dd68829017c3adea4068d5311cab3fbef87b9577 Stats: 925 lines in 5 files changed: 900 ins; 0 del; 25 mod 8347273: C2: VerifyIterativeGVN for Ideal and Identity Reviewed-by: chagedorn, mhaessig ------------- PR: https://git.openjdk.org/jdk/pull/22970 From mhaessig at openjdk.org Thu Jun 12 14:26:36 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Thu, 12 Jun 2025 14:26:36 GMT Subject: RFR: 8020282: Generated code quality: redundant LEAs in the chained dereferences [v2] In-Reply-To: References: Message-ID: On Thu, 12 Jun 2025 12:26:01 GMT, Roberto Casta?eda Lozano wrote: >> Manuel H?ssig has updated the pull request incrementally with three additional commits since the last revision: >> >> - Add comment to benchmark as to why we fix the heap size >> - Add missing null chec >> - Fix typos > > src/hotspot/cpu/x86/peephole_x86_64.cpp line 288: > >> 286: bool is_spill = lea_derived_oop->in(AddPNode::Address) != decode->in(1) && >> 287: lea_derived_oop->in(AddPNode::Address)->is_SpillCopy() && >> 288: decode->in(1)->is_SpillCopy(); > > The logic around `is_spill` could be simplified by declaring pointers to the lea and decode address inputs and setting these either to their direct inputs or one level up in case of spilling. Something like this: > > > Node* lea_address = lea_derived_oop->in(AddPNode::Address); > Node* decode_address = decode->in(1); > > bool is_spill = lea_address != decode_address && > lea_address->is_SpillCopy() && > decode_address->is_SpillCopy(); > > if (is_spill) { > decode_address = decode_address->in(1); > lea_address = lea_address->in(1); > } > > // The leaP* and the decode must have the same parent. 
If we have a spill, they must have > // the same grandparent. > if (lea_address != decode_address) { > return false; > } > > (...) > > > > This creates more opportunities for simplification below, e.g. rewiring the base address of the affected leas directly to `lea_address` (or `decode_address`). That makes a lot of things much clearer. I like it! Thank you for the suggestion! Incorporated in 3d6f8972a58 > src/hotspot/cpu/x86/peephole_x86_64.cpp line 325: > >> 323: // Ensure the MachProj is in the same block as the decode and the lea. >> 324: if (proj == nullptr || !block->contains(proj)) { >> 325: return false; > > The only scenario in which I think this may be possible is when we stress scheduling, otherwise `RFLAGS` projections should always be scheduled immediately after their input. Consider adding an assertion like `assert(StressGCM, "should be scheduled contiguously otherwise");` or similar here. Interesting. I thought that this case would be unlikely, but did not know this. I added the assert in 0f464e131d4. > src/hotspot/cpu/x86/peephole_x86_64.cpp line 349: > >> 347: // Remove spill for the decode if it does not have any other uses. >> 348: if (is_spill && decode->in(1)->is_Mach() && decode->in(1)->outcnt() == 1 && block->contains(decode->in(1))) { >> 349: MachNode* decode_spill = decode->in(1)->as_Mach(); > > This could be simplified if `decode_spill` was defined already within the `if (is_spill)` statement proposed above. Furthermore, `decode->in(1)->is_Mach()` is implied by `is_spill`, so not needed. 
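The parent/grandparent comparison suggested above for `is_spill` can be modelled outside HotSpot. The following is only a toy sketch in Java: `Node`, `input` and `spillCopy` are hypothetical stand-ins for a machine node, its `in(1)` edge and `is_SpillCopy()`, and none of it is the code from the PR.

```java
// Toy model of the address comparison; not HotSpot code.
final class SpillCheckSketch {
    // Hypothetical stand-in for a machine node with one relevant input edge.
    static final class Node {
        final Node input;        // plays the role of in(1)
        final boolean spillCopy; // plays the role of is_SpillCopy()

        Node(Node input, boolean spillCopy) {
            this.input = input;
            this.spillCopy = spillCopy;
        }
    }

    // Returns true when both address inputs come from the same node,
    // looking one level up if both were routed through distinct spill copies.
    static boolean sameOrigin(Node leaAddress, Node decodeAddress) {
        boolean isSpill = leaAddress != decodeAddress
                && leaAddress.spillCopy
                && decodeAddress.spillCopy;
        if (isSpill) {
            // Compare grandparents instead of parents.
            leaAddress = leaAddress.input;
            decodeAddress = decodeAddress.input;
        }
        return leaAddress == decodeAddress;
    }
}
```

Resolving both addresses first leaves a single equality check, which is the simplification the suggestion is after.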
Incorporated in 3d6f8972a58 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25471#discussion_r2142912020 PR Review Comment: https://git.openjdk.org/jdk/pull/25471#discussion_r2142906368 PR Review Comment: https://git.openjdk.org/jdk/pull/25471#discussion_r2142913342 From epeter at openjdk.org Thu Jun 12 14:29:13 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 12 Jun 2025 14:29:13 GMT Subject: RFR: 8358772: Template-Framework Library: Primitive Types [v5] In-Reply-To: References: Message-ID: > I would like to add primitive type support to the template framework library. > > In follow-up work, we will use these types in random expression generation - but they can also already be useful on their own now. > > I encountered an issue with some methods that return `Token` from the `TemplateFramework`, such as `Hook.insert` and `addDataName`. Since `Token` was package private, this class could not be used in some places where automatic type inference is required. I now refactored the code, so that the `Token` is just an empty interface, and all the methods are moved to a separate class `TokenParser`. > > Original experiments from here: https://github.com/openjdk/jdk/pull/23418 Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits: - Merge branch 'master' into JDK-8358772-TemplateFramework-types-operations-expressions - rm static final - review suggestions applied - Apply suggestions from code review Co-authored-by: Christian Hagedorn Co-authored-by: Manuel Hässig - Apply suggestions from code review Co-authored-by: Christian Hagedorn Co-authored-by: Manuel Hässig - parser refactor - rm previous changes - rm unnecessary file - documentation - improve test - ...
and 6 more: https://git.openjdk.org/jdk/compare/dd688290...b07f752b ------------- Changes: https://git.openjdk.org/jdk/pull/25672/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25672&range=04 Stats: 624 lines in 7 files changed: 580 ins; 39 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/25672.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25672/head:pull/25672 PR: https://git.openjdk.org/jdk/pull/25672 From dfenacci at openjdk.org Thu Jun 12 14:39:29 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Thu, 12 Jun 2025 14:39:29 GMT Subject: RFR: 8354727: CompilationPolicy creates too many compiler threads when code cache space is scarce [v2] In-Reply-To: References: Message-ID: On Thu, 12 Jun 2025 09:35:19 GMT, Manuel Hässig wrote: >> # Issue Summary >> >> Running >> >> java -XX:+SegmentedCodeCache -XX:ReservedCodeCacheSize=10M -XX:NonNMethodCodeHeapSize=6M \ >> -XX:ProfiledCodeHeapSize=5M -XX:NonProfiledCodeHeapSize=5M -version >> >> on a machine with more than 255 cores fails with the message that the specified `NonNMethodCodeHeapSize` is too small to fit all compiler buffers (instead of failing because the sum of the heaps is larger than the `ReservedCodeCacheSize`). Hence, the calculated compiler count is too high. This is due to `CompilationPolicy::initialize()` checking how many compiler buffers fit into the `ReservedCodeCacheSize`. However, in the case above, this is significantly larger than `NonNMethodCodeHeapSize` and causes a new check introduced in #17244 to fail. >> >> # Changes >> >> This PR fixes the calculation of the `CICompilerCount` ergonomic. Firstly, @shipilev kindly provided a fix so that the compiler buffer size used in the calculation is also correct if we only have C2. Secondly, `NonNMethodHeapSize` is used as the maximum size available for compiler buffers instead of `ReservedCodeCacheSize` if it was provided as a command-line flag.
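The capping idea described under "Changes" can be sketched as plain arithmetic. This is a minimal illustration in Java, not the HotSpot implementation: the method name and the 64 KiB per-compiler buffer size are made-up assumptions, and only the "how many buffers fit into the available space" step comes from the description above.

```java
// Hypothetical illustration of capping a compiler thread count by buffer space.
final class CompilerCountSketch {
    // Returns the requested count, reduced to the number of compiler buffers
    // that fit into the space assumed to be available for them.
    static int fittingCompilerCount(int requested, long availableBytes, long bufferBytes) {
        if (bufferBytes <= 0) {
            throw new IllegalArgumentException("bufferBytes must be positive");
        }
        long fit = availableBytes / bufferBytes; // whole buffers that fit
        return (int) Math.min(requested, fit);
    }
}
```

With these assumed numbers, a request for 300 threads is capped at 160 when measured against a 10 MiB reserved code cache, but at 96 when measured against an explicitly specified 6 MiB non-nmethod heap. That gap is the kind of mismatch the issue summary describes.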
>> >> It might be debatable if this is the correct fix, since the `NonNMethodHeap` can spill into the other heaps if it is too small. However, I am of the opinion that if the `NonNMethodHeapSize` is explicitly specified, then the compiler count should be calculated accordingly. >> >> # Testing >> >> - [x] [GHA](https://github.com/mhaessig/jdk/actions/runs/15603409859) >> - [x] tier1 and tier2 plus Oracle internal testing on our supported platforms >> - [x] tier1 with a manually fixed core count of 288 (this reproduced the problem before the fix) > > Manuel H?ssig has updated the pull request incrementally with one additional commit since the last revision: > > Fix inadvertantly removed header > > Co-developed-by: Damon Fenacci Looks good to me. Thanks @mhaessig. ------------- Marked as reviewed by dfenacci (Committer). PR Review: https://git.openjdk.org/jdk/pull/25770#pullrequestreview-2921503211 From mhaessig at openjdk.org Thu Jun 12 14:46:25 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Thu, 12 Jun 2025 14:46:25 GMT Subject: RFR: 8020282: Generated code quality: redundant LEAs in the chained dereferences [v4] In-Reply-To: References: Message-ID: > ## Summary > > On x86, chained dereferences of narrow oops at a constant offset from the base oop can use a `lea` instruction to perform the address computation in one go using the `leaP8Narrow`, `leaP32Narrow`, and `leaPCompressedOopOffset` matching rules. However, the generated code contains an additional `lea` with an unused result: > > ; OptoAssembly > 03d decode_heap_oop_not_null R8,R10 > 041 leaq R10, [R12 + R10 << 3 + #12] (compressed oop addressing) ; ptr compressedoopoff32 > > ; x86 > 0x00007f1f210625bd: lea (%r12,%r10,8),%r8 ; result is unused > 0x00007f1f210625c1: lea 0xc(%r12,%r10,8),%r10 ; the same computation as decode, but with offset > > > This PR adds a peephole optimization to remove such redundant `lea`s. 
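The redundancy in the summary's x86 listing can be checked with plain arithmetic. The sketch below is an illustration in Java under stated assumptions: the class and method names are hypothetical, the heap base plays the role of `R12`, the narrow oop that of `R10`, and the shift of 3 corresponds to the scale factor 8 in the listing.

```java
// Arithmetic model of the two lea instructions; not HotSpot code.
final class NarrowOopAddressSketch {
    // Models `lea (%r12,%r10,8),%r8`: the decode of a narrow oop.
    static long decode(long heapBase, int narrowOop, int shift) {
        return heapBase + ((narrowOop & 0xFFFFFFFFL) << shift);
    }

    // Models `lea 0xc(%r12,%r10,8),%r10`: decode and field offset in one step.
    static long leaWithOffset(long heapBase, int narrowOop, int shift, int offset) {
        return heapBase + ((narrowOop & 0xFFFFFFFFL) << shift) + offset;
    }
}
```

Because `leaWithOffset` recomputes everything `decode` does, the separate decode result is only needed if something else uses it, which is the situation the PR discusses for oop maps.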
> > ## The Issue in Detail > > The ideal subgraph producing redundant `lea`s, or rather redundant `decodeHeapOop_not_null`s, is `LoadN -> DecodeN -> AddP`, where both the address and base edge of the `AddP` originate from the `DecodeN`. After matching, this becomes > > LoadN -> decodeHeapOop_not_null -> leaP* > ______________________________? > > where `leaP*` is either of `leaP8Narrow`, `leaP32Narrow`, or `leaPCompressedOopOffset` (depending on the heap location and size). Here, the base input of `leaP*` comes from the decode. Looking at the matching code path, we find that the `leaP*` rules match both the `AddP` and the `DecodeN`, since x86 can fold this, but the following code adds the decode back as the base input to `leaP*`: > > https://github.com/openjdk/jdk/blob/c29537740efb04e061732a700582d43b1956cff4/src/hotspot/share/opto/matcher.cpp#L1894-L1897 > > On its face, this is completely unnecessary if we matched a `leaP*`, since it already computes the result of the decode, so adding the `LoadN` node as base seems like the logical choice. However, if the derived oop computed by the `leaP*` gets added to an oop map, this `DecodeN` is needed as the base for the derived oop. Because as of now, derived oops in oop maps cannot have narrow base pointers. > > This leaves us with a handful of possible solutions: > 1. implement narrow bases for derived oops in oop maps, > 2. perform some dead code elimination after we know which oops are part of oop maps, > 3. add a peephole optimization to simply remove unused `lea`s. > > Option 1 would have been ideal in the sense, that it is the earliest possible point to remove the decode, which would simplify the graph and reduce pressure on the register allocator. However, rewriting the oop map machinery to remove a... Manuel H?ssig has updated the pull request incrementally with four additional commits since the last revision: - Factor out address nodes for simplification - Add assert to codepath only reachable with stressing. 
- Rename for clarity Confused myself.... - Revert change to unrelated lines This reverts commit d1c6a653770bfe578b1982ac726b258fa08d57b8. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25471/files - new: https://git.openjdk.org/jdk/pull/25471/files/830b45e7..3d6f8972 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25471&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25471&range=02-03 Stats: 21 lines in 2 files changed: 5 ins; 0 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/25471.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25471/head:pull/25471 PR: https://git.openjdk.org/jdk/pull/25471 From mhaessig at openjdk.org Thu Jun 12 14:49:36 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Thu, 12 Jun 2025 14:49:36 GMT Subject: RFR: 8020282: Generated code quality: redundant LEAs in the chained dereferences [v2] In-Reply-To: References: Message-ID: On Thu, 12 Jun 2025 12:59:51 GMT, Roberto Casta?eda Lozano wrote: >> Manuel H?ssig has updated the pull request incrementally with three additional commits since the last revision: >> >> - Add comment to benchmark as to why we fix the heap size >> - Add missing null chec >> - Fix typos > > test/hotspot/jtreg/compiler/codegen/TestRedundantLea.java line 1: > >> 1: /* > > The new test cases are very thorough and well-documented, but the IR conditions checked are very specific, which might become a maintainability burden as library code, intrinsics, and platform definitions change. Is there anything that could be relaxed while still checking that the optimization is somehow applied (or not applied)? > Two things that come to mind are 1) extracting standalone Java programs that trigger the same scenarios as the current library calls and 2) relaxing the IRNode definitions and the corresponding IR check preconditions (e.g. 
defining and matching a single `leaP.*` node instead of specialized versions for different heap configurations, or defining and matching a single `decodeHeapOop.*` node). The best way to get this to be maintainable would be the ability to say "between phase A and B I expect to see two fewer DecodeN's", which is currently not possible in the IR-Framework. I think that the number of `DecodeHeapOopOffset_not_null`s is currently the main maintenance burden. Matching `leaP*` Nodes is the next best option. The main reason I went for such specific IR conditions is to show that the peephole works for all `leaP*` variants. If that is not valuable enough, then this would get rid of all the heap size related cruft. The most finicky tests are `RegexFind` and `StringInflate`. These could be outright removed, because they mainly show interesting behavior if all VM intrinsics are disabled. I was also not able to extract these tests into standalone tests, despite some effort. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25471#discussion_r2142970431 From roland at openjdk.org Thu Jun 12 15:05:35 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 12 Jun 2025 15:05:35 GMT Subject: RFR: 8358334: C2/Shenandoah: incorrect execution with Unsafe In-Reply-To: <7ZHTFi99uOLv_qY4m0bsX8UY9uP2J0NGpfMl6uLcsTE=.0c34fe48-5ee8-407c-b51c-1290099b6761@github.com> References: <7ZHTFi99uOLv_qY4m0bsX8UY9uP2J0NGpfMl6uLcsTE=.0c34fe48-5ee8-407c-b51c-1290099b6761@github.com> Message-ID: On Wed, 11 Jun 2025 00:18:09 GMT, Cesar Soares Lucas wrote: >> When a barrier is expanded, some control is picked as a location for >> the barrier. The control input of data nodes that depend on that >> control are updated so the nodes are after the expanded barrier unless >> the barrier itself depends on some of those nodes. >> >> In this particular failure, a raw memory `Store` is the input memory to >> the barrier. That `Store` has an anti-dependent `Load`.
All 3 nodes >> (barrier, `Load` and `Store`) are at the same control. The `Store` is >> an input to the barrier so it stays before the barrier. The `Load`'s >> control is updated to be after the barrier which breaks the >> anti-dependency. The bug is that the logic that sorts nodes that need >> to be before the barrier and those that can be after ignores >> anti-dependencies. The fix simply extends that logic to take them into >> account. > > LGTM. Thanks. @JohnTortugo @earthling-amzn @shipilev thanks for the reviews and testing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25729#issuecomment-2967202208 From roland at openjdk.org Thu Jun 12 15:05:36 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 12 Jun 2025 15:05:36 GMT Subject: Integrated: 8358334: C2/Shenandoah: incorrect execution with Unsafe In-Reply-To: References: Message-ID: <-a0YpfCwzfGZ3VG1Pqgc6c_L1tk4-DPConFxZUu4Mr0=.dbef3b3e-f6b8-448a-b9de-d1b959acee33@github.com> On Tue, 10 Jun 2025 14:13:21 GMT, Roland Westrelin wrote: > When a barrier is expanded, some control is picked as a location for > the barrier. The control input of data nodes that depend on that > control are updated so the nodes are after the expanded barrier unless > the barrier itself depends on some of those nodes. > > In this particular failure, a raw memory `Store` is the input memory to > the barrier. That `Store` has an anti-dependent `Load`. All 3 nodes > (barrier, `Load` and `Store`) are at the same control. The `Store` is > an input to the barrier so it stays before the barrier. The `Load`'s > control is updated to be after the barrier which breaks the > anti-dependency. The bug is that the logic that sorts nodes that need > to be before the barrier and those that can be after ignores > anti-dependencies. The fix simply extends that logic to take them into > account. This pull request has now been integrated.
Changeset: 1fcede05 Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/1fcede053cca360c96606c1034b2a365a4fada82 Stats: 153 lines in 3 files changed: 117 ins; 26 del; 10 mod 8358334: C2/Shenandoah: incorrect execution with Unsafe Reviewed-by: wkemper, shade ------------- PR: https://git.openjdk.org/jdk/pull/25729 From mhaessig at openjdk.org Thu Jun 12 15:11:34 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Thu, 12 Jun 2025 15:11:34 GMT Subject: RFR: 8020282: Generated code quality: redundant LEAs in the chained dereferences [v4] In-Reply-To: References: Message-ID: On Thu, 12 Jun 2025 14:46:25 GMT, Manuel H?ssig wrote: >> ## Summary >> >> On x86, chained dereferences of narrow oops at a constant offset from the base oop can use a `lea` instruction to perform the address computation in one go using the `leaP8Narrow`, `leaP32Narrow`, and `leaPCompressedOopOffset` matching rules. However, the generated code contains an additional `lea` with an unused result: >> >> ; OptoAssembly >> 03d decode_heap_oop_not_null R8,R10 >> 041 leaq R10, [R12 + R10 << 3 + #12] (compressed oop addressing) ; ptr compressedoopoff32 >> >> ; x86 >> 0x00007f1f210625bd: lea (%r12,%r10,8),%r8 ; result is unused >> 0x00007f1f210625c1: lea 0xc(%r12,%r10,8),%r10 ; the same computation as decode, but with offset >> >> >> This PR adds a peephole optimization to remove such redundant `lea`s. >> >> ## The Issue in Detail >> >> The ideal subgraph producing redundant `lea`s, or rather redundant `decodeHeapOop_not_null`s, is `LoadN -> DecodeN -> AddP`, where both the address and base edge of the `AddP` originate from the `DecodeN`. After matching, this becomes >> >> LoadN -> decodeHeapOop_not_null -> leaP* >> ______________________________? >> >> where `leaP*` is either of `leaP8Narrow`, `leaP32Narrow`, or `leaPCompressedOopOffset` (depending on the heap location and size). Here, the base input of `leaP*` comes from the decode. 
Looking at the matching code path, we find that the `leaP*` rules match both the `AddP` and the `DecodeN`, since x86 can fold this, but the following code adds the decode back as the base input to `leaP*`: >> >> https://github.com/openjdk/jdk/blob/c29537740efb04e061732a700582d43b1956cff4/src/hotspot/share/opto/matcher.cpp#L1894-L1897 >> >> On its face, this is completely unnecessary if we matched a `leaP*`, since it already computes the result of the decode, so adding the `LoadN` node as base seems like the logical choice. However, if the derived oop computed by the `leaP*` gets added to an oop map, this `DecodeN` is needed as the base for the derived oop. Because as of now, derived oops in oop maps cannot have narrow base pointers. >> >> This leaves us with a handful of possible solutions: >> 1. implement narrow bases for derived oops in oop maps, >> 2. perform some dead code elimination after we know which oops are part of oop maps, >> 3. add a peephole optimization to simply remove unused `lea`s. >> >> Option 1 would have been ideal in the sense, that it is the earliest possible point to remove the decode, which would simplify the graph and reduce pressure on the regi... > > Manuel H?ssig has updated the pull request incrementally with four additional commits since the last revision: > > - Factor out address nodes for simplification > - Add assert to codepath only reachable with stressing. > - Rename for clarity > > Confused myself.... > - Revert change to unrelated lines > > This reverts commit d1c6a653770bfe578b1982ac726b258fa08d57b8. > Is this scenario exercised by any of the new tests? If not, would it be possible to construct an additional test where we verify that the peephole is not applied in this case? It is not. I was only able to find such a case once with all VM intrinsics disabled some time ago, but was not able to reproduce one since. I'll have another try to find one. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25471#issuecomment-2967229328 From roland at openjdk.org Thu Jun 12 15:16:16 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 12 Jun 2025 15:16:16 GMT Subject: RFR: 8356708: C2: loop strip mining expansion doesn't take sunk stores into account [v2] In-Reply-To: References: Message-ID: > `test1()` has a counted loop with a `Store` to `field`. That `Store` > is sunk out of loop. When the `OuterStripMinedLoop` is expanded, only > `Phi`s that exist at the inner loop are added to the outer > loop. There's no `Phi` for the slice of the sunk `Store` (because > there's no `Store` left in the inner loop) so no `Phi` is added for > that slice to the outer loop. As a result, there's a missing anti > dependency for `Load` of `field` that's before the loop and it can be > scheduled inside the outer strip mined loop which is incorrect. > > `test2()` is the same as `test1()` but with a chain of 2 `Store`s. > > `test3()` is another variant where a `Store` is left in the inner loop > after one is sunk out of it so the inner loop still has a `Phi`. As a > result, the outer loop also gets a `Phi` but it's incorrectly wired as > the sunk `Store` should be the input along the backedge but is > not. That one doesn't cause any failure AFAICT. > > The fix I propose is some extra logic at expansion of the > `OuterStripMinedLoop` to handle these corner cases. 
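For readers without the PR at hand, the shape described for `test1()` might look roughly like the following Java. This is a guess at the pattern only, not the actual test from the PR: the class name, the constant 42 and the loop bound are assumptions, and whether C2 really sinks the store depends on the compiler's heuristics.

```java
// Hypothetical shape of the scenario; not the test from the PR.
final class SunkStoreSketch {
    static int field;

    // Reads `field` before a counted loop whose only memory effect on that
    // slice is a store of a loop-invariant value, which a compiler may move
    // below the loop.
    static int test1(int n) {
        int before = field;           // Load of `field` ahead of the loop
        for (int i = 0; i < n; i++) {
            field = 42;               // Store that is a candidate for sinking
        }
        return before;                // must still be the pre-loop value
    }
}
```

The correctness requirement is visible at the Java level: the returned value must be what `field` held before the loop, so the load must not be scheduled after the sunk store.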
Roland Westrelin has updated the pull request incrementally with three additional commits since the last revision: - Update src/hotspot/share/opto/loopnode.cpp Co-authored-by: Emanuel Peter - Update src/hotspot/share/opto/loopnode.cpp Co-authored-by: Roberto Castañeda Lozano - Update src/hotspot/share/opto/loopnode.cpp Co-authored-by: Roberto Castañeda Lozano ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25717/files - new: https://git.openjdk.org/jdk/pull/25717/files/a0d7442c..c13cecdf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25717&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25717&range=00-01 Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25717.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25717/head:pull/25717 PR: https://git.openjdk.org/jdk/pull/25717 From roland at openjdk.org Thu Jun 12 15:16:16 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 12 Jun 2025 15:16:16 GMT Subject: RFR: 8356708: C2: loop strip mining expansion doesn't take sunk stores into account In-Reply-To: <4qen78-Aon9foRJSduZFQ47DrCMPUjuMF7MQj_uk7jI=.e7b5e0b7-04f4-4d5a-a224-a1c78ff89c09@github.com> References: <4qen78-Aon9foRJSduZFQ47DrCMPUjuMF7MQj_uk7jI=.e7b5e0b7-04f4-4d5a-a224-a1c78ff89c09@github.com> Message-ID: On Tue, 10 Jun 2025 18:51:49 GMT, Roberto Castañeda Lozano wrote: > Thanks for working on this, Roland. I just submitted some testing, will come back with the results in a day or two. > > Generally, I agree with your proposed approach of handling the case at expansion time as a low-risk fix for JDK 25. But as future work, would it be feasible to maintain regular SSA form for outer strip-mined loops (adding memory and data phi nodes at both loop levels) rather than omitting phi nodes for the outer loops and "repairing" SSA on macro expansion, or is there any fundamental obstacle in doing the former?
It would have prevented issues like this, and feels like a more principled and robust approach in general. I don't think there's a fundamental obstacle. The reason I implemented loop strip mining that way is that I was concerned it would be complicated for existing transformations to be supported without major code change and that there was a chance subtle issues would creep in (such as some optimizations not happening any more). So I tried to have a minimal set of extra nodes for strip mined loops initially so existing transformations would simply need to skip over the `OuterStripMinedLoop`. It's quite possible that having the full outer strip mined loop early on works fine and that there's no need for changes all over the loop optimizations. I suppose someone would need to give it a try. This said, I still think keeping the graph simple when crucial transformations happen has some merit. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25717#issuecomment-2967236228 From mhaessig at openjdk.org Thu Jun 12 15:18:36 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Thu, 12 Jun 2025 15:18:36 GMT Subject: RFR: 8020282: Generated code quality: redundant LEAs in the chained dereferences [v2] In-Reply-To: References: Message-ID: <8g5WrL-we24w2Qz2nuBeTvL0qxbpgES_gr1nKNAUnZc=.1c02c2b4-846e-45bb-a0c6-5c19be383d8e@github.com> On Thu, 12 Jun 2025 13:03:35 GMT, Roberto Casta?eda Lozano wrote: >> Manuel H?ssig has updated the pull request incrementally with three additional commits since the last revision: >> >> - Add comment to benchmark as to why we fix the heap size >> - Add missing null chec >> - Fix typos > > Thank you for your thorough work here Manuel, in particular for carefully exploring and discussing alternative solutions! The peephole approach looks good to me, I just have a few coding suggestions and test questions. @robcasloz, thank you for taking a look! 
I addressed your coding suggestions and provided some answers to your questions. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25471#issuecomment-2967255435 From epeter at openjdk.org Thu Jun 12 15:25:29 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 12 Jun 2025 15:25:29 GMT Subject: RFR: 8356708: C2: loop strip mining expansion doesn't take sunk stores into account In-Reply-To: References: <4qen78-Aon9foRJSduZFQ47DrCMPUjuMF7MQj_uk7jI=.e7b5e0b7-04f4-4d5a-a224-a1c78ff89c09@github.com> Message-ID: On Thu, 12 Jun 2025 15:11:05 GMT, Roland Westrelin wrote: >> Thanks for working on this, Roland. I just submitted some testing, will come back with the results in a day or two. >> >> Generally, I agree with your proposed approach of handling the case at expansion time as a low-risk fix for JDK 25. But as future work, would it be feasible to maintain regular SSA form for outer strip-mined loops (adding memory and data phi nodes at both loop levels) rather than omitting phi nodes for the outer loops and "repairing" SSA on macro expansion, or is there any fundamental obstacle in doing the former? It would have prevented issues like this, and feels like a more principled and robust approach in general. > >> Thanks for working on this, Roland. I just submitted some testing, will come back with the results in a day or two. >> >> Generally, I agree with your proposed approach of handling the case at expansion time as a low-risk fix for JDK 25. But as future work, would it be feasible to maintain regular SSA form for outer strip-mined loops (adding memory and data phi nodes at both loop levels) rather than omitting phi nodes for the outer loops and "repairing" SSA on macro expansion, or is there any fundamental obstacle in doing the former? It would have prevented issues like this, and feels like a more principled and robust approach in general. > > I don't think there's a fundamental obstacle. 
The reason I implemented loop strip mining that way is that I was concerned it would be complicated for existing transformations to be supported without major code change and that there was a chance subtle issues would creep in (such as some optimizations not happening any more). So I tried to have a minimal set of extra nodes for strip mined loops initially so existing transformations would simply need to skip over the `OuterStripMinedLoop`. > > It's quite possible that having the full outer strip mined loop early on works fine and that there's no need for changes all over the loop optimizations. I suppose someone would need to give it a try. This said, I still think keeping the graph simple when crucial transformations happen has some merit. @rwestrel Dropping the `Phi` means there are fewer nodes, and probably it is easier to let things float out from the inner loop. That probably works fine for data nodes, and loads that float up. But for sunk stores it doesn't work... at least not without your cleanup now. In some way, not having the `Phi` violates the assumption of the C2 IR. Namely that if you can change the value/memory during the loop, then you need a `Phi` to model that. That is a bit scary, to handle outer strip mined loops different... sure we want the old optimizations to still work, but we have no idea what "new" things now badly optimize, like in this case here. Any new optimization also has to be aware that in the case of strip mined loops, the absence of a phi does not mean there cannot be a mutation on that data/memory. Not great :/ Adding the `Phi`s in all cases would probably break some optimizations, as you say. Maybe we would have to add dedicated `skip_strip_mining` logic all over the place. It would be difficult to know where we are missing them. That's not great either :/ It's a difficult trade-off. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25717#issuecomment-2967280652 From dbriemann at openjdk.org Thu Jun 12 15:30:35 2025 From: dbriemann at openjdk.org (David Briemann) Date: Thu, 12 Jun 2025 15:30:35 GMT Subject: Integrated: 8359232: [PPC64] C2: Clean up ppc.ad: add instr sizes, remove comments In-Reply-To: References: Message-ID: <3wR8GLMMA4teu9p1eFJNWUlnzbbJjtFlZeEMQiteNWs=.00188002-9918-4af9-8ec1-505f434e31ff@github.com> On Wed, 11 Jun 2025 12:30:00 GMT, David Briemann wrote: > Add missing sizes for some instructions. > Clean up outdated Power7 comments. This pull request has now been integrated. Changeset: 3c53057f Author: David Briemann Committer: Martin Doerr URL: https://git.openjdk.org/jdk/commit/3c53057fa63e0f8bf3634e4286fe2085d2f4ee9e Stats: 29 lines in 1 file changed: 10 ins; 19 del; 0 mod 8359232: [PPC64] C2: Clean up ppc.ad: add instr sizes, remove comments Reviewed-by: mdoerr ------------- PR: https://git.openjdk.org/jdk/pull/25752 From roland at openjdk.org Thu Jun 12 15:38:53 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 12 Jun 2025 15:38:53 GMT Subject: RFR: 8356708: C2: loop strip mining expansion doesn't take sunk stores into account [v3] In-Reply-To: References: Message-ID: <6BFidU0d9xMpaKNH-AwBACbp4qZRTAVIf8IxE1uMPQE=.d619bf69-4ce2-4c01-a15a-5226e43d0970@github.com> > `test1()` has a counted loop with a `Store` to `field`. That `Store` > is sunk out of loop. When the `OuterStripMinedLoop` is expanded, only > `Phi`s that exist at the inner loop are added to the outer > loop. There's no `Phi` for the slice of the sunk `Store` (because > there's no `Store` left in the inner loop) so no `Phi` is added for > that slice to the outer loop. As a result, there's a missing anti > dependency for `Load` of `field` that's before the loop and it can be > scheduled inside the outer strip mined loop which is incorrect. > > `test2()` is the same as `test1()` but with a chain of 2 `Store`s. 
> > `test3()` is another variant where a `Store` is left in the inner loop > after one is sunk out of it so the inner loop still has a `Phi`. As a > result, the outer loop also gets a `Phi` but it's incorrectly wired as > the sunk `Store` should be the input along the backedge but is > not. That one doesn't cause any failure AFAICT. > > The fix I propose is some extra logic at expansion of the > `OuterStripMinedLoop` to handle these corner cases. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: - reviews - Merge branch 'master' into JDK-8356708 - Update src/hotspot/share/opto/loopnode.cpp Co-authored-by: Emanuel Peter - Update src/hotspot/share/opto/loopnode.cpp Co-authored-by: Roberto Castañeda Lozano - Update src/hotspot/share/opto/loopnode.cpp Co-authored-by: Roberto Castañeda Lozano - test & fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25717/files - new: https://git.openjdk.org/jdk/pull/25717/files/c13cecdf..2d1b1096 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25717&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25717&range=01-02 Stats: 34042 lines in 760 files changed: 26195 ins; 5045 del; 2802 mod Patch: https://git.openjdk.org/jdk/pull/25717.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25717/head:pull/25717 PR: https://git.openjdk.org/jdk/pull/25717 From roland at openjdk.org Thu Jun 12 15:38:53 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 12 Jun 2025 15:38:53 GMT Subject: RFR: 8356708: C2: loop strip mining expansion doesn't take sunk stores into account In-Reply-To: References: <4qen78-Aon9foRJSduZFQ47DrCMPUjuMF7MQj_uk7jI=.e7b5e0b7-04f4-4d5a-a224-a1c78ff89c09@github.com> Message-ID: On Wed, 11 Jun 2025 11:30:32 GMT, Roberto Castañeda Lozano wrote: >> Thanks for working on 
this, Roland. I just submitted some testing, will come back with the results in a day or two. >> >> Generally, I agree with your proposed approach of handling the case at expansion time as a low-risk fix for JDK 25. But as future work, would it be feasible to maintain regular SSA form for outer strip-mined loops (adding memory and data phi nodes at both loop levels) rather than omitting phi nodes for the outer loops and "repairing" SSA on macro expansion, or is there any fundamental obstacle in doing the former? It would have prevented issues like this, and feels like a more principled and robust approach in general. > >> Thanks for working on this, Roland. I just submitted some testing, will come back with the results in a day or two. > > Test results (tier1-5 in Oracle's internal test system) look good. Thanks for the reviews @robcasloz @eme64 Change is ready for another pass. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25717#issuecomment-2967328929 From roland at openjdk.org Thu Jun 12 15:38:53 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 12 Jun 2025 15:38:53 GMT Subject: RFR: 8356708: C2: loop strip mining expansion doesn't take sunk stores into account [v3] In-Reply-To: References: Message-ID: On Thu, 12 Jun 2025 11:57:38 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains six additional commits since the last revision: >> >> - reviews >> - Merge branch 'master' into JDK-8356708 >> - Update src/hotspot/share/opto/loopnode.cpp >> >> Co-authored-by: Emanuel Peter >> - Update src/hotspot/share/opto/loopnode.cpp >> >> Co-authored-by: Roberto Castañeda Lozano >> - Update src/hotspot/share/opto/loopnode.cpp >> >> Co-authored-by: Roberto Castañeda Lozano >> - test & fix > > src/hotspot/share/opto/loopnode.cpp line 3010: > >> 3008: Node* safepoint = outer_safepoint(); >> 3009: Node* safepoint_mem = safepoint->in(TypeFunc::Memory); >> 3010: if (safepoint_mem->is_MergeMem()) { > > I would have flipped the condition, and made an early exit condition from this. That way, the code is indented one level less. Just a suggestion, feel free to ignore :) Done in new commit. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25717#discussion_r2143081944 From roland at openjdk.org Thu Jun 12 15:38:53 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 12 Jun 2025 15:38:53 GMT Subject: RFR: 8356708: C2: loop strip mining expansion doesn't take sunk stores into account [v3] In-Reply-To: References: Message-ID: On Wed, 11 Jun 2025 14:13:33 GMT, Roberto Castañeda Lozano wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains six additional commits since the last revision: >> >> - reviews >> - Merge branch 'master' into JDK-8356708 >> - Update src/hotspot/share/opto/loopnode.cpp >> >> Co-authored-by: Emanuel Peter >> - Update src/hotspot/share/opto/loopnode.cpp >> >> Co-authored-by: Roberto Castañeda Lozano >> - Update src/hotspot/share/opto/loopnode.cpp >> >> Co-authored-by: Roberto Castañeda Lozano >> - test & fix > > test/hotspot/jtreg/compiler/loopstripmining/TestStoresSunkInOuterStripMinedLoop.java line 1: > >> 1: /* > > It would be good, for completeness, to add a "Couple stores sunk in outer loop, store in inner loop" test. Done in the new commit. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25717#discussion_r2143082613 From roland at openjdk.org Thu Jun 12 15:42:35 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 12 Jun 2025 15:42:35 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v8] In-Reply-To: <4374L3lkQK90wLxxOA7POBmIKNX2DFK-4pO4vj1bkuQ=.5b8d7825-a7f1-497f-ab66-02a85a266659@github.com> References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> <1gdeBnZ7YuIf9CgQW2bCXkDDBWPjUgRnickHts-fvzE=.e6e901ba-3e9f-41a2-9c68-167a879e9655@github.com> <2m1_XtiSsW_LaBRrkX4qv7AKtLOjNgnl4mUp3zisasE=.dda62164-7aa0-4c1a-b83f-fa40ba7902e5@github.com> <4374L3lkQK90wLxxOA7POBmIKNX2DFK-4pO4vj1bkuQ=.5b8d7825-a7f1-497f-ab66-02a85a266659@github.com> Message-ID: On Mon, 9 Jun 2025 06:54:34 GMT, Roberto Castañeda Lozano wrote: > I think it would be good (although not necessarily in the context of this PR) to establish the "no duplicate memory projection" invariant in the back-end, for sanity and to make sure we do not break any logic that might be implicitly relying on it. If you agree, could you file a follow-up RFE, ideally with a reproducer where the current logic fails to remove `NarrowMemProj`s? 
One way would be to simply assert that there's no `NarrowMemProj`s left during final graph reshape. Is that what you'd like? Stepping back, what's the concern here? The new projections should mostly be harmless. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24570#issuecomment-2967339435 From roland at openjdk.org Thu Jun 12 15:43:46 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 12 Jun 2025 15:43:46 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v28] In-Reply-To: References: <7r3C8BAViyHKVVJjv4w0YxfIUkfk9PmY0OEt73V_aRI=.baf51fc4-d996-44d0-a1f5-10cf6dc4de8d@github.com> Message-ID: On Tue, 27 May 2025 07:54:33 GMT, Emanuel Peter wrote: >>> A few minor last comments but otherwise, it looks good to me! Thanks for all the updates, the patience and the credit! >> >> Thanks for the careful review. I applied your suggestions. > > @rwestrel Let me know if you want us to run some extra testing. Christian said that you might be planning to wait until the JDK26 fork, and merge then, and then we can run testing. Up to you :) @eme64 in case you forgot about that one, it's ready for another round of reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21630#issuecomment-2967343277 From amitkumar at openjdk.org Thu Jun 12 15:58:21 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 12 Jun 2025 15:58:21 GMT Subject: RFR: 8358694: VM asserts if CodeCacheSegmentSize is not a power of 2 [v5] In-Reply-To: <_SfCjE8oyyD-grL6AAih23j1Qx7fbYefTVzl-BU5N2k=.7bd5d419-4127-4e78-926e-fece25a7d914@github.com> References: <_SfCjE8oyyD-grL6AAih23j1Qx7fbYefTVzl-BU5N2k=.7bd5d419-4127-4e78-926e-fece25a7d914@github.com> Message-ID: > Makes sure that JVM exits gracefully when `CodeCacheSegmentSize` is not a power of 2. 
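For readers unfamiliar with the check, a power-of-two test on a size value can be sketched as below. This is a generic illustration written in Java, not HotSpot's C++ argument-validation code; the class and method names are hypothetical.

```java
public class SegmentSizeCheck {
    // A positive value is a power of two when exactly one bit is set,
    // which is equivalent to (value & (value - 1)) == 0.
    static boolean isPowerOfTwo(long value) {
        return value > 0 && (value & (value - 1)) == 0;
    }

    // Rejects an invalid size with an ordinary error instead of tripping
    // an assertion later on (hypothetical helper for illustration).
    static void validate(long segmentSize) {
        if (!isPowerOfTwo(segmentSize)) {
            throw new IllegalArgumentException(
                "segment size must be a power of 2, got " + segmentSize);
        }
    }
}
```

Validating up front is what lets a bad setting be reported as a normal startup error rather than as a VM assert.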
Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: remove whitespace ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25708/files - new: https://git.openjdk.org/jdk/pull/25708/files/f8fbb4df..24511b00 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25708&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25708&range=03-04 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25708.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25708/head:pull/25708 PR: https://git.openjdk.org/jdk/pull/25708 From mli at openjdk.org Thu Jun 12 16:57:46 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 12 Jun 2025 16:57:46 GMT Subject: RFR: 8358892: RISC-V: jvm crash when running dacapo sunflow after JDK-8352504 [v7] In-Reply-To: <9TAfBWFOsPTUIJPaUX9xcPqVsXLI6lsQBvcomuVqcQI=.1eb3ec5a-59ac-44ce-a46b-b984dda4605e@github.com> References: <9TAfBWFOsPTUIJPaUX9xcPqVsXLI6lsQBvcomuVqcQI=.1eb3ec5a-59ac-44ce-a46b-b984dda4605e@github.com> Message-ID: > Hi, > Can you help to review this patch? > > Thanks! > > Currently, this issue is only reproducible with Dacapo sunflow. > I tried to construct a simpler jtreg test to reproduce the issue, but can not find a way to do it till now, this task is tracked by https://bugs.openjdk.org/browse/JDK-8359045. > > So, currently I can only verify the code by reviewing it. > Or maybe it's better to leave it until we find the test? 
Hamlin Li has updated the pull request incrementally with two additional commits since the last revision: - review Zicond - comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25696/files - new: https://git.openjdk.org/jdk/pull/25696/files/9848cc28..eb38253e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25696&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25696&range=05-06 Stats: 30 lines in 2 files changed: 0 ins; 20 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/25696.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25696/head:pull/25696 PR: https://git.openjdk.org/jdk/pull/25696 From mli at openjdk.org Thu Jun 12 16:57:46 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 12 Jun 2025 16:57:46 GMT Subject: RFR: 8358892: RISC-V: jvm crash when running dacapo sunflow after JDK-8352504 [v6] In-Reply-To: References: <9TAfBWFOsPTUIJPaUX9xcPqVsXLI6lsQBvcomuVqcQI=.1eb3ec5a-59ac-44ce-a46b-b984dda4605e@github.com> <1oKHolQiymAnc5OKC1RcR9fDlL9c_F0zW6gHJ3pQKWI=.6e5f69a6-f474-415d-8398-e5e2952d985e@github.com> Message-ID: On Thu, 12 Jun 2025 06:32:19 GMT, Fei Yang wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> adjust arguments orders > > src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1385: > >> 1383: if (is_single) { >> 1384: // jump if cmp1 < cmp2 or either is NaN >> 1385: // not jump (i.e. move src to dst) if cmp1 >= cmp2 > > Or simply: `// fallthrough (i.e. move src to dst) if cmp1 >= cmp2`? Similar for other friends. Yes, fixed. > test/hotspot/jtreg/compiler/c2/irTests/TestFPComparison2.java line 70: > >> 68: // Booltest::ge >> 69: TestFramework framework = new TestFramework(Test_ge_1.class); >> 70: framework.addFlags("-XX:-TieredCompilation", "-XX:+UseZicond", "-Xlog:jit+compilation=trace").start(); > > I see `-XX:+UseZicond` and `-XX:-UseZicond` options are used in this test. 
What if the testing platform doesn't have the Zicond extension? Maybe we can simply remove use of these options as option `UseZicond` will be auto-enabled for fastdebug builds for test coverage purpose if this extension is available. Makes sense, fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25696#discussion_r2143223776 PR Review Comment: https://git.openjdk.org/jdk/pull/25696#discussion_r2143223992 From dlong at openjdk.org Thu Jun 12 19:28:33 2025 From: dlong at openjdk.org (Dean Long) Date: Thu, 12 Jun 2025 19:28:33 GMT Subject: RFR: 8357782: JVM JIT Causes Static Initialization Order Issue [v3] In-Reply-To: References: Message-ID: <3OZ0EncGPrJQqk-B2a-4bl95a4zUm45dW4RDI2lGkBo=.e897032b-10ad-4480-b8cd-ee543c5d5395@github.com> On Thu, 12 Jun 2025 08:35:54 GMT, Manuel Hässig wrote: >> # Issue Summary >> >> When C1 compiles a method that allocates a new instance of a class that is not fully initialized at compile time, it does not take into account that the `` might run a static initializer that might have side effects. Consider the following example: >> >> class A { >> static class B { >> static String field; >> static void test() { >> String tmp = field; >> new C(field); >> } >> } >> >> static class C { >> static { >> B.field = "Hello"; >> } >> >> C(String val) { >> if (val == null) { >> throw new RuntimeException("Should not reach here"); >> } >> } >> } >> } >> >> Here, `B.field` gets assigned in `C`'s static initializer. Since C1 believes that the `newinstance` does not have memory side effects, local value numbering eliminates the field access for the argument in `C.` because it believes that `B.field` is still the same as `tmp`. Hence, the assignment in `C.` gets effectively ignored and the code triggers the runtime exception. Because this only happens if `C` is not fully initialized when it is compiled, we need `-Xcomp` to reproduce this issue. 
>> >> # Changes >> >> To fix the illustrated issue, this PR ensures that `newinstance` kills the memory state in C1's LVN if the class might not be fully initialized. Since we can not reliably detect if a class has a static initializer, we kill memory whenever a class is not yet loaded or, if it has already been loaded, when it has not been fully initialized, which is conservative and might kill memory when it is not necessary for correctness and have an impact on performance in the form of some additional field accesses. >> >> # Benchmark Results >> >> Since this might have an effect on startup, I ran some benchmarks. The results mostly did not show effects outside the run-to-run variance. >> >> # Testing >> >> - [ ] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15605637901) >> - [x] tier1 through tier3 plus Oracle internal testing on Oracle supported platforms and OSes >> >> # Acknowledgements >> >> Shout out to @TobiHartmann who wrote the reproducer that became the regression test and helped me find my way around C1 and narrow down the problem. > > Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: > > Fix syntax error Marked as reviewed by dlong (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25725#pullrequestreview-2922356811 From cslucas at openjdk.org Thu Jun 12 19:40:13 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Thu, 12 Jun 2025 19:40:13 GMT Subject: RFR: 8359064: Expose reason for marking nmethod non-entrant to JVMCI client [v5] In-Reply-To: References: Message-ID: > We recently introduced a way to set the reason why a nmethod was being marked as `not entrant`, see [here](https://github.com/openjdk/jdk/pull/23980) and [here](https://github.com/openjdk/jdk/pull/25338). > > This PR is to expose in the JVMCI interface the reason why the nmethod was flagged as `not entrant`. 
This will allow JVMCI-based compilers to implement heuristics to handle re-compilations differently based on what happened to earlier versions of a method, for instance, this will likely be used to address this [RFE in Truffle](https://github.com/oracle/graal/issues/11045 > ). > > Tested on Linux x86_64, ARM with JTREG tier 1-3. Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: Rename 'reasons' enum. Adjust default value for invalidationReason. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25706/files - new: https://git.openjdk.org/jdk/pull/25706/files/1f3c2598..831ba2f0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25706&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25706&range=03-04 Stats: 121 lines in 22 files changed: 1 ins; 2 del; 118 mod Patch: https://git.openjdk.org/jdk/pull/25706.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25706/head:pull/25706 PR: https://git.openjdk.org/jdk/pull/25706 From sparasa at openjdk.org Thu Jun 12 19:46:39 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Thu, 12 Jun 2025 19:46:39 GMT Subject: RFR: 8359386: Fix incorrect value for max_size of C2CodeStub when APX is used Message-ID: <5ou3DR0HW2MlRv79ufOutW0dpq29fGAgnFzWYDo6rhk=.33789d8d-4174-4a7e-aa0d-26603911f760@github.com> The goal of this PR is to fix the value of max_size of the C2CodeStub hardcoded in the C2_MacroAssembler::convertF2I() function when Intel APX instructions are used. Currently, max_size is hardcoded to 23 (introduced in [JDK-8306706](https://bugs.openjdk.org/browse/JDK-8306706)). However, this value is incorrect when Intel APX instructions with extended general-purpose registers (EGPRs) are used in the code stub as using EGPRs with APX instructions leads to an increase in the instruction encoding size by additional 4 bytes. 
Without this fix, we see the following errors for the C2 compiler tests below: compiler/vectorization/runner/ArrayTypeConvertTest.java compiler/intrinsics/zip/TestFpRegsABI.java # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (/src/hotspot/share/opto/c2_CodeStubs.cpp:50), pid=3961123, tid=3961332 # assert(max_size >= actual_size) failed: Expected stub size (23) must be larger than or equal to actual stub size (24) # # JRE version: OpenJDK Runtime Environment (26.0) (fastdebug build 26-internal-adhoc.parasa.jdkdemotion) # Java VM: OpenJDK 64-Bit Server VM (fastdebug 26-internal-adhoc.parasa.jdkdemotion, mixed mode, sharing, compressed oops, compressed class ptrs, g1 gc, linux-amd64) # Problematic frame: # V [libjvm.so+0x955a77] C2CodeStubList::emit(C2_MacroAssembler&)+0x227 # This PR fixes the errors in the above-mentioned tests. ------------- Commit messages: - 8359386: Fix incorrect value for max_size of C2CodeStub when APX is used Changes: https://git.openjdk.org/jdk/pull/25787/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25787&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8359386 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25787.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25787/head:pull/25787 PR: https://git.openjdk.org/jdk/pull/25787 From dnsimon at openjdk.org Thu Jun 12 19:55:31 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 12 Jun 2025 19:55:31 GMT Subject: RFR: 8359064: Expose reason for marking nmethod non-entrant to JVMCI client [v5] In-Reply-To: References: Message-ID: <-Xsyt7584x_m5CmbjFaE9zfnoFbYBnbVBpNIqH_QoZM=.81acb204-1e37-4eff-ba26-7f6b993e7883@github.com> On Thu, 12 Jun 2025 19:40:13 GMT, Cesar Soares Lucas wrote: >> We recently introduced a way to set the reason why a nmethod was being marked as `not entrant`, see [here](https://github.com/openjdk/jdk/pull/23980) and [here](https://github.com/openjdk/jdk/pull/25338). 
>> >> This PR is to expose in the JVMCI interface the reason why the nmethod was flagged as `not entrant`. This will allow JVMCI-based compilers to implement heuristics to handle re-compilations differently based on what happened to earlier versions of a method, for instance, this will likely be used to address this [RFE in Truffle](https://github.com/oracle/graal/issues/11045 >> ). >> >> Tested on Linux x86_64, ARM with JTREG tier 1-3. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Rename 'reasons' enum. Adjust default value for invalidationReason. src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotNmethod.java line 225: > 223: > 224: private static int unknownInvalidationReason() { > 225: return HotSpotJVMCIRuntime.runtime().config.getConstant("nmethod::ChangeReason::Unknown", Integer.class); `unknownInvalidationReason` -> `jvmciInvalidationReason` `"nmethod::ChangeReason::Unknown"` -> `"nmethod::InvalidationReason::JVMCI_INVALIDATE"` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25706#discussion_r2143540862 From cslucas at openjdk.org Thu Jun 12 20:18:10 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Thu, 12 Jun 2025 20:18:10 GMT Subject: RFR: 8359064: Expose reason for marking nmethod non-entrant to JVMCI client [v6] In-Reply-To: References: Message-ID: > We recently introduced a way to set the reason why a nmethod was being marked as `not entrant`, see [here](https://github.com/openjdk/jdk/pull/23980) and [here](https://github.com/openjdk/jdk/pull/25338). > > This PR is to expose in the JVMCI interface the reason why the nmethod was flagged as `not entrant`. This will allow JVMCI-based compilers to implement heuristics to handle re-compilations differently based on what happened to earlier versions of a method, for instance, this will likely be used to address this [RFE in Truffle](https://github.com/oracle/graal/issues/11045 > ). 
> > Tested on Linux x86_64, ARM with JTREG tier 1-3. Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: Fix some remaining renames. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25706/files - new: https://git.openjdk.org/jdk/pull/25706/files/831ba2f0..6f6d129b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25706&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25706&range=04-05 Stats: 8 lines in 3 files changed: 0 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/25706.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25706/head:pull/25706 PR: https://git.openjdk.org/jdk/pull/25706 From dnsimon at openjdk.org Thu Jun 12 20:38:29 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 12 Jun 2025 20:38:29 GMT Subject: RFR: 8359064: Expose reason for marking nmethod non-entrant to JVMCI client [v6] In-Reply-To: References: Message-ID: <2j19nj-UatNbeMCtCNwGZWijYFZYgEZQ-VbvcAulwBI=.b2edd200-063c-4b58-905d-76f31f3d8e97@github.com> On Thu, 12 Jun 2025 20:18:10 GMT, Cesar Soares Lucas wrote: >> We recently introduced a way to set the reason why a nmethod was being marked as `not entrant`, see [here](https://github.com/openjdk/jdk/pull/23980) and [here](https://github.com/openjdk/jdk/pull/25338). >> >> This PR is to expose in the JVMCI interface the reason why the nmethod was flagged as `not entrant`. This will allow JVMCI-based compilers to implement heuristics to handle re-compilations differently based on what happened to earlier versions of a method, for instance, this will likely be used to address this [RFE in Truffle](https://github.com/oracle/graal/issues/11045 >> ). >> >> Tested on Linux x86_64, ARM with JTREG tier 1-3. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Fix some remaining renames. Looks good to me now - thanks for all the changes. 
------------- Marked as reviewed by dnsimon (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25706#pullrequestreview-2922585106 From cslucas at openjdk.org Thu Jun 12 20:44:29 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Thu, 12 Jun 2025 20:44:29 GMT Subject: RFR: 8359064: Expose reason for marking nmethod non-entrant to JVMCI client [v6] In-Reply-To: References: Message-ID: On Thu, 12 Jun 2025 20:18:10 GMT, Cesar Soares Lucas wrote: >> We recently introduced a way to set the reason why a nmethod was being marked as `not entrant`, see [here](https://github.com/openjdk/jdk/pull/23980) and [here](https://github.com/openjdk/jdk/pull/25338). >> >> This PR is to expose in the JVMCI interface the reason why the nmethod was flagged as `not entrant`. This will allow JVMCI-based compilers to implement heuristics to handle re-compilations differently based on what happened to earlier versions of a method, for instance, this will likely be used to address this [RFE in Truffle](https://github.com/oracle/graal/issues/11045 >> ). >> >> Tested on Linux x86_64, ARM with JTREG tier 1-3. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Fix some remaining renames. Thank you for reviewing and all the suggestions! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25706#issuecomment-2968103752 From sviswanathan at openjdk.org Thu Jun 12 21:13:27 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 12 Jun 2025 21:13:27 GMT Subject: RFR: 8359327: Incorrect AVX3Threshold results into code buffer overflows on APX targets In-Reply-To: References: Message-ID: <38CK1bBonUdFBNcevm0ftHRzOCXKOmXJtnDGBlR9fK4=.dab3e741-eec4-41e7-92d6-a7c7d40f2c72@github.com> On Thu, 12 Jun 2025 11:49:41 GMT, Jatin Bhateja wrote: > As per the latest architecture-instruction-set-extensions-programming-reference manual version 57 [1], the upcoming Diamond Rapids server with APX feature has a different CPU family ID (19) than prior Xeons (6). > > Recently integrated EEVEX to REX2 demotion support with [JDK-8351994](https://bugs.openjdk.org/browse/JDK-8351994) already handles this through a newly defined _VM_Version::is_intel_server_family()_ API, but the existing AVX3Threshold setting is agnostic to this change, which causes code buffer overflows during arraycopy stub generation. > > Patch fixes this issue and also appropriately increments final code buffer size to prevent buffer overruns during stub generation with non-zero AVX3Threshold. > > [1] https://www.intel.com/content/www/us/en/content-details/851355/intel-architecture-instruction-set-extensions-programming-reference.html?wapkw=intel%20architecture%20instruction%20set%20extensions%20programming%20reference Looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/25780#pullrequestreview-2922666933 From sviswanathan at openjdk.org Thu Jun 12 21:14:29 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 12 Jun 2025 21:14:29 GMT Subject: RFR: 8359386: Fix incorrect value for max_size of C2CodeStub when APX is used In-Reply-To: <5ou3DR0HW2MlRv79ufOutW0dpq29fGAgnFzWYDo6rhk=.33789d8d-4174-4a7e-aa0d-26603911f760@github.com> References: <5ou3DR0HW2MlRv79ufOutW0dpq29fGAgnFzWYDo6rhk=.33789d8d-4174-4a7e-aa0d-26603911f760@github.com> Message-ID: <_jnRV2RRIkqKppZ1Rb1ICo06k5VexDw4cOP75zF0hPw=.fc74e280-9c6d-4013-8a18-e257e9536c88@github.com> On Thu, 12 Jun 2025 19:41:01 GMT, Srinivas Vamsi Parasa wrote: > The goal of this PR is to fix the value of max_size of the C2CodeStub hardcoded in the C2_MacroAssembler::convertF2I() function when Intel APX instructions are used. Currently, max_size is hardcoded to 23 (introduced in [JDK-8306706](https://bugs.openjdk.org/browse/JDK-8306706)). However, this value is incorrect when Intel APX instructions with extended general-purpose registers (EGPRs) are used in the code stub, as using EGPRs with APX instructions leads to an increase in the instruction encoding size by an additional 4 bytes.
> > Without this fix, we see the following errors for the C2 compiler tests below: > > compiler/vectorization/runner/ArrayTypeConvertTest.java > compiler/intrinsics/zip/TestFpRegsABI.java > > > > > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (/src/hotspot/share/opto/c2_CodeStubs.cpp:50), pid=3961123, tid=3961332 > # assert(max_size >= actual_size) failed: Expected stub size (23) must be larger than or equal to actual stub size (24) > # > # JRE version: OpenJDK Runtime Environment (26.0) (fastdebug build 26-internal-adhoc.parasa.jdkdemotion) > # Java VM: OpenJDK 64-Bit Server VM (fastdebug 26-internal-adhoc.parasa.jdkdemotion, mixed mode, sharing, compressed oops, compressed class ptrs, g1 gc, linux-amd64) > # Problematic frame: > # V [libjvm.so+0x955a77] C2CodeStubList::emit(C2_MacroAssembler&)+0x227 > # > > > This PR fixes the errors in the above-mentioned tests. Looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25787#pullrequestreview-2922669267 From snatarajan at openjdk.org Thu Jun 12 22:47:43 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Thu, 12 Jun 2025 22:47:43 GMT Subject: RFR: 8342941: IGV: Add new graph dumps for post loop, empty loop removal, and one iteration removal [v2] In-Reply-To: References: Message-ID: > This changeset adds BEFORE/AFTER graph dumps for creating a post loop (`insert_post_loop()`), removing an empty loop (`do_remove_empty_loop()`), and removing a one iteration loop (`do_one_iteration_loop()`). > > Changes: > - Added `BEFORE_POST_LOOP` and `AFTER_POST_LOOP` for dumping graphs before and after `insert_post_loop()`. > - Added `BEFORE_REMOVE_EMPTY_LOOP` and `AFTER_REMOVE_EMPTY_LOOP` for dumping graphs before and after `do_remove_empty_loop()`. > - Added `BEFORE_ONE_ITERATION_LOOP` and `AFTER_ONE_ITERATION_LOOP` for dumping graphs before and after `do_one_iteration_loop()`. 
> > Below are sample screenshots (IGV print level 4 ) mainly showing the new phase . > 1. `BEFORE_POST_LOOP` and `AFTER_POST_LOOP` > ![image](https://github.com/user-attachments/assets/1661cede-5d70-4e0d-abec-3d091c7675c8) > 2. `BEFORE_POST_LOOP` and `AFTER_POST_LOOP` with SuperWordLoopUnrollAnalysis enabled > ![image](https://github.com/user-attachments/assets/6a22e6f0-4e6c-4e9d-8b6b-2bf75fac783d) > 3.` BEFORE_REMOVE_EMPTY_LOOP `and `AFTER_REMOVE_EMPTY_LOOP` > ![image](https://github.com/user-attachments/assets/3281f00b-575e-4604-83dd-831037d8dd47) > 4. `BEFORE_ONE_ITERATION_LOOP` and `AFTER_ONE_ITERATION_LOOP` > ![image](https://github.com/user-attachments/assets/efddbc9a-64f7-403d-acfe-330d75a00911) > > Question to reviewers: > Are the new compiler phases OK, or should we change anything? > > Testing: > GitHub Actions > tier1 to tier5 on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. > Tested that thousands of graphs are correctly opened and visualized with IGV using the same test used in ([JDK-8317349](https://bugs.openjdk.org/browse/JDK-8317349)) Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: Addressing review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25756/files - new: https://git.openjdk.org/jdk/pull/25756/files/767326bf..e4dab565 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25756&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25756&range=00-01 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/25756.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25756/head:pull/25756 PR: https://git.openjdk.org/jdk/pull/25756 From qamai at openjdk.org Thu Jun 12 23:33:59 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 12 Jun 2025 23:33:59 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v65] In-Reply-To: 
<0LL62A7X3DeYMqkv4M7vz-OD79VuVCmNGy0CU_aCbkM=.e5fb80a1-cd23-4ffb-9839-ceaccd8695db@github.com> References: <0LL62A7X3DeYMqkv4M7vz-OD79VuVCmNGy0CU_aCbkM=.e5fb80a1-cd23-4ffb-9839-ceaccd8695db@github.com> Message-ID: On Mon, 9 Jun 2025 17:29:36 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. >> >> In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. >> >> This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. >> >> Please kindly review, thanks a lot. >> >> Testing >> >> - [x] GHA >> - [x] Linux x64, tier 1-4 > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 83 commits: > > - Merge branch 'master' into unsignedbounds > - add more intn_t tests > - Emanuel's reviews > - alignment wording > - refinement > - refine the cases where there does not exist a result > - add some more sanity static_asserts > - intn_t refinements > - Emanuel's reviews > - Emanuel's reviews > - ... and 73 more: https://git.openjdk.org/jdk/compare/cae1fd33...ad6ac7cb I think all the tests have passed, I will integrate this PR. Thank you a lot for reviewing this PR. Especially, thanks @eme64 for your rigorous reviews and valuable suggestions, it helps a lot in shaping this PR in a better manner. 
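Editorial illustration of the constraint set quoted above: the membership condition `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones` can be written out as a small Java sketch. The class name `IntConstraintsSketch` and its fields are hypothetical and are not taken from HotSpot's `TypeInt`; the sketch only checks membership and does not attempt the canonicalization the patch performs.

```java
// Hypothetical sketch, not the HotSpot TypeInt implementation.
// It models the six constraints from the PR description and tests membership.
final class IntConstraintsSketch {
    final int lo, hi;      // signed bounds
    final int ulo, uhi;    // unsigned bounds, stored as int bit patterns
    final int zeros, ones; // masks of bits known to be 0 and known to be 1

    IntConstraintsSketch(int lo, int hi, int ulo, int uhi, int zeros, int ones) {
        this.lo = lo;
        this.hi = hi;
        this.ulo = ulo;
        this.uhi = uhi;
        this.zeros = zeros;
        this.ones = ones;
    }

    // True if x satisfies the signed bounds, the unsigned bounds and the known bits.
    boolean contains(int x) {
        return x >= lo && x <= hi
            && Integer.compareUnsigned(x, ulo) >= 0
            && Integer.compareUnsigned(x, uhi) <= 0
            && (x & zeros) == 0
            && (x & ones) == ones;
    }

    public static void main(String[] args) {
        // Deliberately loose signed bounds [-5, 3]; the unsigned bounds [0, 3]
        // and the known-zero mask ~3 already exclude every negative value.
        IntConstraintsSketch t = new IntConstraintsSketch(-5, 3, 0, 3, ~3, 0);
        for (int x = -8; x <= 8; x++) {
            if (t.contains(x)) {
                System.out.println(x + " is a member");
            }
        }
    }
}
```

With the loose signed bounds used in `main`, only 0, 1, 2 and 3 are members, which is the kind of dependency between constraints that motivates tightening them before constructing a type.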
------------- PR Comment: https://git.openjdk.org/jdk/pull/17508#issuecomment-2968436381 From qamai at openjdk.org Fri Jun 13 01:09:02 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 13 Jun 2025 01:09:02 GMT Subject: Integrated: 8315066: Add unsigned bounds and known bits to TypeInt/Long In-Reply-To: References: Message-ID: <-MVck_5OGYP3MVBbMmacStOIWpSVVmalg9mOz7AdWaY=.2ddeafb0-b38f-4c59-8902-0f729688c5c7@github.com> On Sat, 20 Jan 2024 19:23:23 GMT, Quan Anh Mai wrote: > Hi, > > This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. > > In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. > > This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. > > Please kindly review, thanks a lot. > > Testing > > - [x] GHA > - [x] Linux x64, tier 1-4 This pull request has now been integrated. 
Changeset: 991097b7 Author: Quan Anh Mai URL: https://git.openjdk.org/jdk/commit/991097b7bf08cc1a4ceedb0c555b12948ae71885 Stats: 2570 lines in 13 files changed: 2007 ins; 331 del; 232 mod 8315066: Add unsigned bounds and known bits to TypeInt/Long Co-authored-by: Emanuel Peter Reviewed-by: epeter, kvn, jbhateja ------------- PR: https://git.openjdk.org/jdk/pull/17508 From syan at openjdk.org Fri Jun 13 02:04:35 2025 From: syan at openjdk.org (SendaoYan) Date: Fri, 13 Jun 2025 02:04:35 GMT Subject: RFR: 8357816: Add test from JDK-8350576 [v2] In-Reply-To: References: Message-ID: On Thu, 12 Jun 2025 12:25:55 GMT, Benoît Maillard wrote: >> This PR adds a jtreg test for [JDK-8350576](https://bugs.openjdk.org/browse/JDK-8350576). The test consists of a code sample produced by the fuzzer, and it contains a loop that is supposed to get optimized. >> >> Thanks! >> >> ### Testing >> >> - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8357816) >> - [ ] tier1-3, plus some internal testing >> - [x] Ran the test with a debug build prior to the fix (JDK 25 build 16) and made sure it failed as a sanity check >> >> Shout out to @TobiHartmann for helping out with jtreg > > Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: > > 8357816: Add additional run without flags Hi, `produced by the fuzzer` How can I use the fuzzer to produce test cases? Is there any manual or document? test/hotspot/jtreg/compiler/loopopts/LoopReductionHasControlOrBadInput.java line 29: > 27: * @summary Optimization bails out and hits an assert: > 28: * assert(false) failed: reduction has ctrl or bad vector_input > 29: * @run main/othervm -XX:CompileCommand=compileonly,compiler.loopopts.LoopReductionHasControlOrBadInput::* -Xbatch -XX:-TieredCompilation compiler.loopopts.LoopReductionHasControlOrBadInput Should we split this long line into two lines?
------------- PR Comment: https://git.openjdk.org/jdk/pull/25774#issuecomment-2968753712 PR Review Comment: https://git.openjdk.org/jdk/pull/25774#discussion_r2144062909 From syan at openjdk.org Fri Jun 13 02:07:28 2025 From: syan at openjdk.org (SendaoYan) Date: Fri, 13 Jun 2025 02:07:28 GMT Subject: RFR: 8357816: Add test from JDK-8350576 [v2] In-Reply-To: References: Message-ID: On Thu, 12 Jun 2025 12:25:55 GMT, Benoît Maillard wrote: >> This PR adds a jtreg test for [JDK-8350576](https://bugs.openjdk.org/browse/JDK-8350576). The test consists of a code sample produced by the fuzzer, and it contains a loop that is supposed to get optimized. >> >> Thanks! >> >> ### Testing >> >> - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8357816) >> - [ ] tier1-3, plus some internal testing >> - [x] Ran the test with a debug build prior to the fix (JDK 25 build 16) and made sure it failed as a sanity check >> >> Shout out to @TobiHartmann for helping out with jtreg > > Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: > > 8357816: Add additional run without flags The `tier1-3, plus some internal testing` tests seems do not needed? ------------- PR Comment: https://git.openjdk.org/jdk/pull/25774#issuecomment-2968757596 From fyang at openjdk.org Fri Jun 13 03:12:28 2025 From: fyang at openjdk.org (Fei Yang) Date: Fri, 13 Jun 2025 03:12:28 GMT Subject: RFR: 8359270: C2: alignment check should consider base offset when emitting arraycopy runtime call In-Reply-To: References: Message-ID: On Thu, 12 Jun 2025 06:31:52 GMT, Tobias Hartmann wrote: > Would it be possible to add an IR Framework test for this, checking that the right stub is selected? Hi Tobias, Thanks for taking a look! Here in this case, `StubRoutines::select_arraycopy_function` returns an entry address of the runtime call.
And this address is fed to `GraphKit::make_runtime_call` or `IdealKit::make_leaf_call_no_fp` to make a runtime call node. So with this change the only difference lies in the target address of the runtime call node. I am not sure, but I guess that it is not supported by the IR framework to detect such a difference for now? Please let me know if I missed anything. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25765#issuecomment-2968856485 From epeter at openjdk.org Fri Jun 13 06:05:03 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 13 Jun 2025 06:05:03 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v65] In-Reply-To: References: <0LL62A7X3DeYMqkv4M7vz-OD79VuVCmNGy0CU_aCbkM=.e5fb80a1-cd23-4ffb-9839-ceaccd8695db@github.com> Message-ID: On Thu, 12 Jun 2025 23:31:06 GMT, Quan Anh Mai wrote: >> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 83 commits: >> >> - Merge branch 'master' into unsignedbounds >> - add more intn_t tests >> - Emanuel's reviews >> - alignment wording >> - refinement >> - refine the cases where there does not exist a result >> - add some more sanity static_asserts >> - intn_t refinements >> - Emanuel's reviews >> - Emanuel's reviews >> - ... and 73 more: https://git.openjdk.org/jdk/compare/cae1fd33...ad6ac7cb > > I think all the tests have passed, I will integrate this PR. > > Thank you a lot for reviewing this PR. Especially, thanks @eme64 for your rigorous reviews and valuable suggestions, it helps a lot in shaping this PR in a better manner. @merykitty Thanks for the attribution. Congrats on shipping it. Looking forward to the improvements that come from this!
------------- PR Comment: https://git.openjdk.org/jdk/pull/17508#issuecomment-2969161890 From epeter at openjdk.org Fri Jun 13 06:12:29 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 13 Jun 2025 06:12:29 GMT Subject: RFR: 8357816: Add test from JDK-8350576 [v2] In-Reply-To: References: Message-ID: On Fri, 13 Jun 2025 02:04:33 GMT, SendaoYan wrote: >> Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: >> >> 8357816: Add additional run without flags > > The `tier1-3, plus some internal testing` tests seems do not needed? @sendaoYan We use multiple fuzzers, including `javafuzzer`, just google it. We run it with all sorts of extra VM flag combinations as well. When it fails, you have a reproducer that is hundreds of lines long. You can reduce the reproducer by hand, or using a tool such as `creduce`, then clean it up by hand. That's how we arrive at such nice small reproducers. @sendaoYan It is better to run more tests. Higher tiers run with extra flags, and that sometimes reveals other failure modes. So it is better to run more and be sure than break the CI ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/25774#issuecomment-2969176794 PR Comment: https://git.openjdk.org/jdk/pull/25774#issuecomment-2969179493 From syan at openjdk.org Fri Jun 13 06:32:28 2025 From: syan at openjdk.org (SendaoYan) Date: Fri, 13 Jun 2025 06:32:28 GMT Subject: RFR: 8357816: Add test from JDK-8350576 [v2] In-Reply-To: References: Message-ID: On Fri, 13 Jun 2025 02:04:33 GMT, SendaoYan wrote: >> Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: >> >> 8357816: Add additional run without flags > > The `tier1-3, plus some internal testing` tests seems do not needed? > @sendaoYan It is better to run more tests. Higher tiers run with extra flags, and that sometimes reveals other failure modes.
So it is better to run more and be sure than break the CI ;) I mean this PR only add a new test, and this test run with othervm mode, it won't break other tests in theory? ------------- PR Comment: https://git.openjdk.org/jdk/pull/25774#issuecomment-2969221045 From epeter at openjdk.org Fri Jun 13 06:42:37 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 13 Jun 2025 06:42:37 GMT Subject: RFR: 8357816: Add test from JDK-8350576 [v2] In-Reply-To: References: Message-ID: On Fri, 13 Jun 2025 06:29:37 GMT, SendaoYan wrote: > > @sendaoYan It is better to run more tests. Higher tiers run with extra flags, and that sometimes reveals other failure modes. So it is better to run more and be sure than break the CI ;) > > I mean this PR only add a new test, and this test run with othervm mode, it won't break other tests in theory? Sure. But it's not nice if this new test fails with higher tiers and additional flags either, right? ------------- PR Comment: https://git.openjdk.org/jdk/pull/25774#issuecomment-2969241316 From chagedorn at openjdk.org Fri Jun 13 06:46:32 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 13 Jun 2025 06:46:32 GMT Subject: RFR: 8358772: Template-Framework Library: Primitive Types [v5] In-Reply-To: References: Message-ID: On Thu, 12 Jun 2025 14:29:13 GMT, Emanuel Peter wrote: >> I would like to add primitive type support to the template framework library. >> >> In follow-up work, we will use these types in random expression generation - but they can also already be useful on their own now. >> >> I encountered an issue with some methods that return `Token` from the `TemplateFramework`, such as `Hook.insert` and `addDataName`. Since `Token` was package private, this class could not be used in some places where automatic type inference is required. I now refactored the code, so that the `Token` is just an empty interface, and all the methods are moved to a separate class `TokenParser`. 
>> >> Original experiments from here: https://github.com/openjdk/jdk/pull/23418 > > Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits: > > - Merge branch 'master' into JDK-8358772-TemplateFramework-types-operations-expressions > - rm static final > - review suggestions applied > - Apply suggestions from code review > > Co-authored-by: Christian Hagedorn > Co-authored-by: Manuel Hässig > - Apply suggestions from code review > > Co-authored-by: Christian Hagedorn > Co-authored-by: Manuel Hässig > - parser refactor > - rm previous changes > - rm unnecessary file > - documentation > - improve test > - ... and 6 more: https://git.openjdk.org/jdk/compare/dd688290...b07f752b Marked as reviewed by chagedorn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25672#pullrequestreview-2923586113 From fyang at openjdk.org Fri Jun 13 06:55:31 2025 From: fyang at openjdk.org (Fei Yang) Date: Fri, 13 Jun 2025 06:55:31 GMT Subject: RFR: 8358892: RISC-V: jvm crash when running dacapo sunflow after JDK-8352504 [v7] In-Reply-To: References: <9TAfBWFOsPTUIJPaUX9xcPqVsXLI6lsQBvcomuVqcQI=.1eb3ec5a-59ac-44ce-a46b-b984dda4605e@github.com> Message-ID: On Thu, 12 Jun 2025 16:57:46 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> >> Thanks! >> >> Currently, this issue is only reproducible with Dacapo sunflow. >> I tried to construct a simpler jtreg test to reproduce the issue, but can not find a way to do it till now, this task is tracked by https://bugs.openjdk.org/browse/JDK-8359045. >> >> So, currently I can only verify the code by reviewing it. >> Or maybe it's better to leave it until we find the test? > > Hamlin Li has updated the pull request incrementally with two additional commits since the last revision: > > - review Zicond > - comments Hi, I changed your test a bit and I see test failure on my linux-riscv64 platform (no Zicond).
diff --git a/test/hotspot/jtreg/compiler/c2/irTests/TestFPComparison2.java b/test/hotspot/jtreg/compiler/c2/irTests/TestFPComparison2.java index 472c38e009a..77398e1d88b 100644 --- a/test/hotspot/jtreg/compiler/c2/irTests/TestFPComparison2.java +++ b/test/hotspot/jtreg/compiler/c2/irTests/TestFPComparison2.java @@ -98,11 +98,11 @@ public static int test_float_BoolTest_ge(float x, float y) { // when neither is NaN, and x > y // return 0 // when neither is NaN, and x <= y - return !(x <= y) ? 1 : 0; + return !(x <= y) ? 10 : 20; } @DontCompile public static int golden_float_BoolTest_ge(float x, float y) { - return !(x <= y) ? 1 : 0; + return !(x <= y) ? 10 : 20; } @Test @@ -113,11 +113,11 @@ public static int test_double_BoolTest_ge(double x, double y) { // when neither is NaN, and x > y // return 0 // when neither is NaN, and x <= y - return !(x <= y) ? 1 : 0; + return !(x <= y) ? 10 : 20; } @DontCompile public static int golden_double_BoolTest_ge(double x, double y) { - return !(x <= y) ? 1 : 0; + return !(x <= y) ? 
10 : 20; } @Run(test = {"test_float_BoolTest_ge", "test_double_BoolTest_ge"}) $ make test TEST="test/hotspot/jtreg/compiler/c2/irTests/TestFPComparison2.java" JTREG="TIMEOUT_FACTOR=8" STDERR: java.lang.RuntimeException: Not trigger BoolTest::ge: expected true, was false at jdk.test.lib.Asserts.fail(Asserts.java:715) at jdk.test.lib.Asserts.assertTrue(Asserts.java:545) at compiler.c2.irTests.TestFPComparison2.main(TestFPComparison2.java:71) at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) at java.base/java.lang.reflect.Method.invoke(Method.java:565) at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:335) at java.base/java.lang.Thread.run(Thread.java:1474) JavaTest Message: Test threw exception: java.lang.RuntimeException JavaTest Message: shutting down test ------------- PR Comment: https://git.openjdk.org/jdk/pull/25696#issuecomment-2969282088 From epeter at openjdk.org Fri Jun 13 06:58:38 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 13 Jun 2025 06:58:38 GMT Subject: RFR: 8358772: Template-Framework Library: Primitive Types [v5] In-Reply-To: References: Message-ID: On Fri, 13 Jun 2025 06:43:34 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits: >> >> - Merge branch 'master' into JDK-8358772-TemplateFramework-types-operations-expressions >> - rm static final >> - review suggestions applied >> - Apply suggestions from code review >> >> Co-authored-by: Christian Hagedorn >> Co-authored-by: Manuel Hässig >> - Apply suggestions from code review >> >> Co-authored-by: Christian Hagedorn >> Co-authored-by: Manuel Hässig >> - parser refactor >> - rm previous changes >> - rm unnecessary file >> - documentation >> - improve test >> - ...
and 6 more: https://git.openjdk.org/jdk/compare/dd688290...b07f752b > > Marked as reviewed by chagedorn (Reviewer). @chhagedorn @mhaessig Thanks for the swift reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25672#issuecomment-2969288746 From epeter at openjdk.org Fri Jun 13 06:58:40 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 13 Jun 2025 06:58:40 GMT Subject: Integrated: 8358772: Template-Framework Library: Primitive Types In-Reply-To: References: Message-ID: On Fri, 6 Jun 2025 13:25:48 GMT, Emanuel Peter wrote: > I would like to add primitive type support to the template framework library. > > In follow-up work, we will use these types in random expression generation - but they can also already be useful on their own now. > > I encountered an issue with some methods that return `Token` from the `TemplateFramework`, such as `Hook.insert` and `addDataName`. Since `Token` was package private, this class could not be used in some places where automatic type inference is required. I now refactored the code, so that the `Token` is just an empty interface, and all the methods are moved to a separate class `TokenParser`. > > Original experiments from here: https://github.com/openjdk/jdk/pull/23418 This pull request has now been integrated.
Changeset: 6749c62b Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/6749c62b9e4261d25bea477e3c0840ab0ee9c73e Stats: 624 lines in 7 files changed: 580 ins; 39 del; 5 mod 8358772: Template-Framework Library: Primitive Types Reviewed-by: mhaessig, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/25672 From roland at openjdk.org Fri Jun 13 07:22:29 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 13 Jun 2025 07:22:29 GMT Subject: RFR: 8356708: C2: loop strip mining expansion doesn't take sunk stores into account [v3] In-Reply-To: References: Message-ID: On Wed, 11 Jun 2025 14:23:17 GMT, Roberto Castañeda Lozano wrote: > Another question: could `PhaseIdealLoop::try_move_store_before_loop` cause similar issues on strip-mined loops? That one moves stores out of the inner and outer loop so, no, I don't see a similar issue there. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25717#issuecomment-2969359888 From rcastanedalo at openjdk.org Fri Jun 13 07:36:31 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 13 Jun 2025 07:36:31 GMT Subject: RFR: 8356708: C2: loop strip mining expansion doesn't take sunk stores into account In-Reply-To: References: <4qen78-Aon9foRJSduZFQ47DrCMPUjuMF7MQj_uk7jI=.e7b5e0b7-04f4-4d5a-a224-a1c78ff89c09@github.com> Message-ID: <99Ghg2WmHcRGl8Rvty9tJqNX1mdrahuu6dbbh-AclBE=.93a6be90-eeae-40d2-9ce2-760d52695b4e@github.com> On Wed, 11 Jun 2025 11:30:32 GMT, Roberto Castañeda Lozano wrote: >> Thanks for working on this, Roland. I just submitted some testing, will come back with the results in a day or two. >> >> Generally, I agree with your proposed approach of handling the case at expansion time as a low-risk fix for JDK 25.
But as future work, would it be feasible to maintain regular SSA form for outer strip-mined loops (adding memory and data phi nodes at both loop levels) rather than omitting phi nodes for the outer loops and "repairing" SSA on macro expansion, or is there any fundamental obstacle in doing the former? It would have prevented issues like this, and feels like a more principled and robust approach in general. > >> Thanks for working on this, Roland. I just submitted some testing, will come back with the results in a day or two. > > Test results (tier1-5 in Oracle's internal test system) look good. > Thanks for the reviews @robcasloz @eme64 Change is ready for another pass. Thanks Roland, I'll re-run testing and come back with results on Monday. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25717#issuecomment-2969392465 From rcastanedalo at openjdk.org Fri Jun 13 08:04:28 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 13 Jun 2025 08:04:28 GMT Subject: RFR: 8356708: C2: loop strip mining expansion doesn't take sunk stores into account In-Reply-To: References: <4qen78-Aon9foRJSduZFQ47DrCMPUjuMF7MQj_uk7jI=.e7b5e0b7-04f4-4d5a-a224-a1c78ff89c09@github.com> Message-ID: On Thu, 12 Jun 2025 15:11:05 GMT, Roland Westrelin wrote: > > Thanks for working on this, Roland. I just submitted some testing, will come back with the results in a day or two. > > Generally, I agree with your proposed approach of handling the case at expansion time as a low-risk fix for JDK 25. But as future work, would it be feasible to maintain regular SSA form for outer strip-mined loops (adding memory and data phi nodes at both loop levels) rather than omitting phi nodes for the outer loops and "repairing" SSA on macro expansion, or is there any fundamental obstacle in doing the former? It would have prevented issues like this, and feels like a more principled and robust approach in general. > > I don't think there's a fundamental obstacle.
The reason I implemented loop strip mining that way is that I was concerned it would be complicated for existing transformations to be supported without major code change and that there was a chance subtle issues would creep in (such as some optimizations not happening any more). So I tried to have a minimal set of extra nodes for strip mined loops initially so existing transformations would simply need to skip over the `OuterStripMinedLoop`. > > It's quite possible that having the full outer strip mined loop early on works fine and that there's no need for changes all over the loop optimizations. I suppose someone would need to give it a try. This said, I still think keeping the graph simple when crucial transformations happen has some merit. Thanks for the background, Roland! I think it would be worth exploring this, but I agree that there is a risk of silently affecting other loop optimizations. Luckily, the IR test framework gives us now a means to improve our confidence that changes in this area do not affect expected optimizations. Unfortunately, our current IR test coverage of loop optimizations is incomplete, so a pre-condition to exploring full SSA for strip-mined loops (and something worth doing in any case IMO) would be adding more IR tests checking that at least basic optimizations like peeling, unswitching, unrolling, range check elimination, etc. happen as expected. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25717#issuecomment-2969458368 From hgreule at openjdk.org Fri Jun 13 08:05:28 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Fri, 13 Jun 2025 08:05:28 GMT Subject: RFR: 8356813: Improve Mod(I|L)Node::Value [v5] In-Reply-To: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com> References: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com> Message-ID: <7luvvsRIFw3U3rgZkbbmuL-o7gvsfkAbgGmPA2fy_5M=.0ae26d78-f811-4ea6-8cfc-10d3b1d513ab@github.com> > This change improves the precision of the `Mod(I|L)Node::Value()` functions. > > I reordered the structure a bit. First, we handle constants, afterwards, we handle ranges. The bottom checks seem to be excessive (`Type::BOTTOM` is covered by using `isa_(int|long)()`, the local bottom is just the full range). Given we can even give reasonable bounds if only one input has any bounds, we don't want to return early. > The changes after that are commented. Please let me know if the explanations are good, or if you have any suggestions. > > ### Monotonicity > > Before, a 0 divisor resulted in `Type(Int|Long)::POS`. Initially I wanted to keep it this way, but that violates monotonicity during PhaseCCP. As an example, if we see a 0 divisor first and a 3 afterwards, we might try to go from `>=0` to `-2..2`, but the meet of these would be `>=-2` rather than `-2..2`. Using `Type(Int|Long)::ZERO` instead (zero is always in the resulting value if we cover a range). > > ### Testing > > I added tests for cases around the relevant bounds. I also ran tier1, tier2, and tier3 but didn't see any related failures after addressing the monotonicity problem described above (I'm having a few unrelated failures on my system currently, so separate testing would be appreciated in case I missed something). > > Please review and let me know what you think. 
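Editorial illustration of the range reasoning described above: in Java, `lhs % rhs` takes the sign of `lhs` (or is zero), and its magnitude is at most `|lhs|` and strictly less than `|rhs|`. The sketch below derives conservative result bounds from that fact alone. `ModRangeSketch` and `modRange` are hypothetical names, the code is not the C2 `ModINode::Value` implementation, and it assumes the divisor range contains at least one non-zero value.

```java
// Hypothetical sketch, not the C2 implementation of Mod(I|L)Node::Value.
// Computes a conservative [lo, hi] for lhs % rhs with lhs in [llo, lhi]
// and rhs in [rlo, rhi]. Assumes some non-zero divisor exists in the range.
final class ModRangeSketch {
    static long[] modRange(int llo, int lhi, int rlo, int rhi) {
        // Widen to long so that Math.abs(Integer.MIN_VALUE) does not overflow.
        long maxAbsRhs = Math.max(Math.abs((long) rlo), Math.abs((long) rhi));
        long bound = maxAbsRhs - 1; // |lhs % rhs| <= |rhs| - 1

        // The result has the sign of lhs, so a non-negative lhs never yields
        // a negative result, and a non-positive lhs never yields a positive one.
        long lo = llo < 0 ? Math.max(llo, -bound) : 0;
        long hi = lhi > 0 ? Math.min(lhi, bound) : 0;
        return new long[] {lo, hi};
    }

    public static void main(String[] args) {
        // Brute-force check of the bounds on a small input space.
        long[] r = modRange(-20, 20, -7, 7);
        for (int lhs = -20; lhs <= 20; lhs++) {
            for (int rhs = -7; rhs <= 7; rhs++) {
                if (rhs == 0) {
                    continue; // division by zero throws in Java
                }
                int m = lhs % rhs;
                if (m < r[0] || m > r[1]) {
                    throw new AssertionError(lhs + " % " + rhs + " = " + m);
                }
            }
        }
        System.out.println("range: [" + r[0] + ", " + r[1] + "]");
    }
}
```

For a divisor fixed at 3 and an unconstrained dividend this gives `-2..2`, matching the example in the monotonicity paragraph; for a `char` dividend (never negative) it gives `0..2`.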
> > ### Other > > The `UMod(I|L)Node`s were adjusted to be more in line with its signed variants. This change diverges them again, but similar improvements could be made after #17508. > > During experimenting with these changes, I stumbled upon a few things that aren't directly related to this change, but might be worth to further look into: > - If the divisor is a constant, we will directly replace the `Mod(I|L)Node` with more but less expensive nodes in `::Ideal()`. Type analysis for these nodes combined is less precise, means we miss potential cases were this would help e.g., removing range checks. Would it make sense to delay the replacement? > - To force non-negative ranges, I'm using `char`. I noticed that method parameters of sub-int integer types all fall back to `TypeInt::INT`. This seems to be an intentional change of https://github.com/openjdk/jdk/commit/200784d505dd98444c48c9ccb7f2e4df36dcbb6a. The bug report is private, so I can't really judge if that part is necessary, but it seems odd. Hannes Greule has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 13 additional commits since the last revision: - Address more comments - Merge branch 'master' into improve-mod-value - Add randomized test - Use BasicType for shared implementation - Update ModL comment - Use TOP instead of ZERO - Apply suggested test changes - adapt uabs -> g_uabs name change - change range of mod by 0 for PhaseCCP - Improve ModLNode::Value - ... 
and 3 more: https://git.openjdk.org/jdk/compare/93328951...77134c1a ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25254/files - new: https://git.openjdk.org/jdk/pull/25254/files/80914319..77134c1a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25254&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25254&range=03-04 Stats: 160381 lines in 2395 files changed: 98466 ins; 43790 del; 18125 mod Patch: https://git.openjdk.org/jdk/pull/25254.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25254/head:pull/25254 PR: https://git.openjdk.org/jdk/pull/25254 From hgreule at openjdk.org Fri Jun 13 08:20:31 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Fri, 13 Jun 2025 08:20:31 GMT Subject: RFR: 8356813: Improve Mod(I|L)Node::Value [v2] In-Reply-To: References: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com> Message-ID: <0JeL-4EG6Gkc-DnYfU9GgM7RnfLL9CKUbs1ehaXp0mw=.1e66d845-5ec2-4252-86c0-ab5bd5e28adf@github.com> On Wed, 28 May 2025 09:53:32 GMT, Emanuel Peter wrote: >> Hannes Greule has updated the pull request incrementally with three additional commits since the last revision: >> >> - Update ModL comment >> - Use TOP instead of ZERO >> - Apply suggested test changes > > @SirYwell Thanks for looking into this, that looks promising! > > I have two bigger comments: > - Could we unify the L and I code, either using C++ templating or `BasicType`? It would reduce code duplication. > - Can we have some tests where the input ranges are random as well, and where we check the output ranges with some comparisons? > > ------------------ > Copied from the code comment: > >> Nice work with the examples you already have, and randomizing some of it! >> >> I would like to see one more generalized test. >> - compute `res = lhs % rhs` >> - Truncate both `lhs` and `rhs` with randomly produced bounds from Generators, like this: `lhs = Math.max(lo, Math.min(hi, lhs))`. 
>> - Below, add all sorts of comparisons with random constants, like this: `if (res < CON) { sum += 1; }`. If the output range is wrong, this could wrongly constant fold, and allow us to catch that. >> >> Then fuzz the generated method a few times with random inputs for `lhs` and `rhs`, and check that the `sum` and `res` value are the same for compiled and interpreted code. >> >> I hope that makes sense :) >> This is currently my best method to check if ranges are correct, and I think it is quite important because often tests are only written with constants in mind, but less so with ranges, and then we mess up the ranges because it is just too tricky. >> >> This is an example, where I asked someone to try this out as well: >> https://github.com/openjdk/jdk/pull/23089/files#diff-12bebea175a260a6ab62c22a3681ccae0c3d9027900d2fdbd8c5e856ae7d1123R404-R422 @eme64 I merged master and hopefully addressed your latest comments. Now that we have #17508 integrated, I could also directly update the unsigned variant, but I'm also fine with doing that separately. WDYT? I also checked the constant folding part again (or generally whenever the RHS is a constant), these code paths are indeed not used by PhaseGVN directly (but by PhaseCCP and PhaseIdealLoop). That makes it a bit difficult to test that part properly. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25254#issuecomment-2969498833 From thartmann at openjdk.org Fri Jun 13 08:29:26 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 13 Jun 2025 08:29:26 GMT Subject: RFR: 8359270: C2: alignment check should consider base offset when emitting arraycopy runtime call In-Reply-To: References: Message-ID: On Thu, 12 Jun 2025 01:54:34 GMT, Fei Yang wrote: > Hi, please consider this change fixing alignment check when emitting arraycopy runtime call. > > There are currently four callsites of `StubRoutines::select_arraycopy_function` in hotspot C2 shared code where we emit arraycopy runtime calls [1-4]. 
Three of them [2-4] missed base offset when calculation alignment for both source and destination array addresses. Seems they assume a base offset of 8 bytes, which is not always true. Base offset becomes 4 bytes under either `-XX:+UseCompactObjectHeaders` or `-XX:-UseCompressedClassPointers`. > > And `StubRoutines::select_arraycopy_function` selects the right arraycopy runtime call based on this alignment. As a result, it could see an incorrect `aligned` param about the array addresses and thus a wrong arraycopy runtime call is selected. This is causing performance regressions (like Dacapo Spring) on some linux-riscv64 platforms like Sifive Unmatched or Premier P550 SBCs where misaligned memory access is very slow. > > Proposed change fixes this issue by taking base offset into account when checking the alignment, which is very similar to [1]. > > Testing: > - [x] Tier1-3 on linux-aarch64 (release & fastdebug) > - [x] Tier1-3 on linux-riscv64 (release) > - [x] Dacapo spring performance test on linux-riscv64 (w/wo `-XX:+UseCompactObjectHeaders` / `-XX:-UseCompressedClassPointers`) > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/macroArrayCopy.cpp#L341 > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/library_call.cpp#L1584 > [3] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/library_call.cpp#L1666 > [4] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/stringopts.cpp#L1481 But `StubRoutines::select_arraycopy_function` also sets `copyfunc_name` which is printed for the `CallNode` and should therefore be matchable by the IR framework, right? 
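To make the alignment point from the quoted description concrete, here is a small Java sketch (not HotSpot code) of why the array base offset has to be part of the check. The base offsets 16 and 12 are illustrative assumptions for a 64-bit VM with the default header layout and with `-XX:+UseCompactObjectHeaders`, respectively. The sketch also assumes that the object itself starts at an 8-byte-aligned address, and all names are made up for the example.

```java
final class ArrayCopyAlignmentSketch {
    static final int HEAP_WORD_SIZE = 8; // assumption: 64-bit VM, objects start 8-byte aligned

    // A check that only looks at the element offset and ignores the base offset.
    static boolean alignedIgnoringBase(long index, int elemSize) {
        return (index * elemSize) % HEAP_WORD_SIZE == 0;
    }

    // A check that takes the array base offset into account.
    static boolean alignedWithBase(int baseOffset, long index, int elemSize) {
        return (baseOffset + index * elemSize) % HEAP_WORD_SIZE == 0;
    }

    public static void main(String[] args) {
        int defaultBase = 16; // illustrative value, not read from a running VM
        int compactBase = 12; // illustrative value, not read from a running VM

        // byte[] copy starting at index 0
        System.out.println(alignedIgnoringBase(0, 1));          // true
        System.out.println(alignedWithBase(defaultBase, 0, 1)); // true
        System.out.println(alignedWithBase(compactBase, 0, 1)); // false: 12 is not a multiple of 8
        System.out.println(alignedWithBase(compactBase, 4, 1)); // true: 12 + 4 = 16
    }
}
```

With a base offset that is a multiple of 8 the two checks agree, which is presumably why the omission went unnoticed. Once the base offset is 4 modulo 8 they disagree, and a stub chosen from the first check would be the wrong one.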
------------- PR Comment: https://git.openjdk.org/jdk/pull/25765#issuecomment-2969522865 From duke at openjdk.org Fri Jun 13 08:36:14 2025 From: duke at openjdk.org (Benoît Maillard) Date: Fri, 13 Jun 2025 08:36:14 GMT Subject: RFR: 8357816: Add test from JDK-8350576 [v3] In-Reply-To: References: Message-ID: > This PR adds a jtreg test for [JDK-8350576](https://bugs.openjdk.org/browse/JDK-8350576). The test consists of a code sample produced by the fuzzer, and it contains a loop that is supposed to get optimized. > > Thanks! > > ### Testing > > - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8357816) > - [x] tier1-3, plus some internal testing > - [x] Ran the test with a debug build prior to the fix (JDK 25 build 16) and made sure it failed as a sanity check > > Shout out to @TobiHartmann for helping out with jtreg Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: 8357816: Split long line into several ones ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25774/files - new: https://git.openjdk.org/jdk/pull/25774/files/d946fcce..535bae67 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25774&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25774&range=01-02 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25774.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25774/head:pull/25774 PR: https://git.openjdk.org/jdk/pull/25774 From duke at openjdk.org Fri Jun 13 08:36:15 2025 From: duke at openjdk.org (Benoît Maillard) Date: Fri, 13 Jun 2025 08:36:15 GMT Subject: RFR: 8357816: Add test from JDK-8350576 [v2] In-Reply-To: References: Message-ID: <2nGS1WLhLZkB86Pb3u2ORue6e0Vhk35atzNnADomxhA=.6b42efff-cd5d-4498-8d74-ccca63a07bc2@github.com> On Fri, 13 Jun 2025 02:00:30 GMT, SendaoYan wrote: >> Benoît Maillard has updated the pull request incrementally 
with one additional commit since the last revision: >> >> 8357816: Add additional run without flags > > test/hotspot/jtreg/compiler/loopopts/LoopReductionHasControlOrBadInput.java line 29: > >> 27: * @summary Optimization bails out and hits an assert: >> 28: * assert(false) failed: reduction has ctrl or bad vector_input >> 29: * @run main/othervm -XX:CompileCommand=compileonly,compiler.loopopts.LoopReductionHasControlOrBadInput::* -Xbatch -XX:-TieredCompilation compiler.loopopts.LoopReductionHasControlOrBadInput > > Should we split this long line into two lines? Done, thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25774#discussion_r2144493563 From duke at openjdk.org Fri Jun 13 08:37:48 2025 From: duke at openjdk.org (erifan) Date: Fri, 13 Jun 2025 08:37:48 GMT Subject: RFR: 8356760: VectorAPI: Optimize VectorMask.fromLong for all-true/all-false cases Message-ID: If the input long value `l` of `VectorMask.fromLong(SPECIES, l)` would set or unset all lanes, `VectorMask.fromLong(SPECIES, l)` is equivalent to `maskAll(true)` or `maskAll(false)`. But the cost of `maskAll` is relatively smaller than that of `fromLong`. This patch does the conversion for these cases if `l` is a compile-time constant. This conversion also enables further optimizations that recognize maskAll patterns, see [1]. Some JTReg test cases are added to ensure the optimization is effective. I tried many different ways to write a JMH benchmark, but failed. Since the input of `VectorMask.fromLong(SPECIES, l)` needs to be a specific compile-time constant, the statement will be hoisted out of the loop. If we don't use a loop, the hot spot shifts to other instructions, and no obvious performance change was observed. However, combined with the optimization of [1], we can observe a performance improvement of about 7% on both aarch64 and x64. The patch was tested on both aarch64 and x64; all tier1, tier2, and tier3 tests passed. 
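A minimal plain-Java sketch of the all-true/all-false condition described above, without using the Vector API itself. It assumes that only the low `laneCount` bits of the long are relevant to `fromLong` and that higher bits are ignored, which is my reading of the `fromLong` specification and not something taken from the patch. The class and method names are hypothetical.

```java
final class MaskAllSketch {
    // Bit pattern with the low laneCount bits set (laneCount in 1..64).
    static long laneBits(int laneCount) {
        return laneCount == 64 ? -1L : (1L << laneCount) - 1;
    }

    // True if l would set every lane, i.e. it behaves like maskAll(true).
    static boolean setsAllLanes(long l, int laneCount) {
        long bits = laneBits(laneCount);
        return (l & bits) == bits;
    }

    // True if l would clear every lane, i.e. it behaves like maskAll(false).
    static boolean clearsAllLanes(long l, int laneCount) {
        return (l & laneBits(laneCount)) == 0;
    }

    public static void main(String[] args) {
        System.out.println(setsAllLanes(0xFFL, 8));    // true: all 8 lane bits are set
        System.out.println(setsAllLanes(0x7FL, 8));    // false: lane 7 is not set
        System.out.println(clearsAllLanes(0x100L, 8)); // true: only a bit above the lanes is set
    }
}
```

The special case for 64 lanes avoids `1L << 64`, which in Java shifts by 0 and would yield an empty bit pattern.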
[1] https://github.com/openjdk/jdk/pull/24674 ------------- Commit messages: - 8356760: VectorAPI: Optimize VectorMask.fromLong for all-true/all-false cases Changes: https://git.openjdk.org/jdk/pull/25793/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25793&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8356760 Stats: 178 lines in 3 files changed: 174 ins; 3 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25793.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25793/head:pull/25793 PR: https://git.openjdk.org/jdk/pull/25793 From syan at openjdk.org Fri Jun 13 08:43:31 2025 From: syan at openjdk.org (SendaoYan) Date: Fri, 13 Jun 2025 08:43:31 GMT Subject: RFR: 8357816: Add test from JDK-8350576 [v3] In-Reply-To: References: Message-ID: On Fri, 13 Jun 2025 08:36:14 GMT, Beno?t Maillard wrote: >> This PR adds a jtreg test for [JDK-8350576](https://bugs.openjdk.org/browse/JDK-8350576). The test consists of a code sample produced by the fuzzer, and it contains a loop that is supposed to get optimized. >> >> Thanks! >> >> ### Testing >> >> - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8357816) >> - [x] tier1-3, plus some internal testing >> - [x] Ran the test with a debug build prior to the fix (JDK 25 build 16) and made sure it failed as a sanity check >> >> Shout out to @TobiHartmann for helping out with jtreg > > Beno?t Maillard has updated the pull request incrementally with one additional commit since the last revision: > > 8357816: Split long line into several ones Marked as reviewed by syan (Committer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/25774#pullrequestreview-2923877793 From syan at openjdk.org Fri Jun 13 08:43:31 2025 From: syan at openjdk.org (SendaoYan) Date: Fri, 13 Jun 2025 08:43:31 GMT Subject: RFR: 8357816: Add test from JDK-8350576 [v2] In-Reply-To: References: Message-ID: <6jplGiXL8s2sLSQ5IaNc4YEnhAfJKqano9evO-Ll5hk=.6c6b7e42-c1fd-4243-91d4-8751fd86713d@github.com> On Fri, 13 Jun 2025 06:39:22 GMT, Emanuel Peter wrote: >>> @sendaoYan It is better to run more tests. Higher tiers run with extra flags, and that sometimes reveals other failure modes. So it is better to run more and be sure than break the CI ;) >> >> I mean this PR only add a new test, and this test run with othervm mode, it won't break other tests in theory? > >> > @sendaoYan It is better to run more tests. Higher tiers run with extra flags, and that sometimes reveals other failure modes. So it is better to run more and be sure than break the CI ;) >> >> I mean this PR only add a new test, and this test run with othervm mode, it won't break other tests in theory? > > Sure. But it's not nice if this new test fails with higher tiers and additional flags either, right? @eme64 Thanks your detailed explanations. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25774#issuecomment-2969558961 From rcastanedalo at openjdk.org Fri Jun 13 08:57:34 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 13 Jun 2025 08:57:34 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v8] In-Reply-To: References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> <1gdeBnZ7YuIf9CgQW2bCXkDDBWPjUgRnickHts-fvzE=.e6e901ba-3e9f-41a2-9c68-167a879e9655@github.com> <2m1_XtiSsW_LaBRrkX4qv7AKtLOjNgnl4mUp3zisasE=.dda62164-7aa0-4c1a-b83f-fa40ba7902e5@github.com> <4374L3lkQK90wLxxOA7POBmIKNX2DFK-4pO4vj1bkuQ=.5b8d7825-a7f1-497f-ab66-02a85a266659@github.com> Message-ID: On Thu, 12 Jun 2025 15:39:35 GMT, Roland Westrelin wrote: > One way would be to simply assert that there's no `NarrowMemProj`s left during final graph reshape. Is that what you'd like? Yes, that would be great (and I think it is OK to leave it to a future RFE if fully enforcing it would further increase the complexity of this PR). > Stepping back, what's the concern here? The new projections should mostly be harmless. I do not have any concrete concern in mind, I just think it would be good to limit the number of C2 components that might be affected by this change, in this case the back-end. One can never be totally confident that there is no fragile analysis relying on hidden assumptions about the shape of the memory graph (in this case, single memory projections). 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24570#issuecomment-2969607350 From jbhateja at openjdk.org Fri Jun 13 09:03:35 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 13 Jun 2025 09:03:35 GMT Subject: RFR: 8359327: Incorrect AVX3Threshold results into code buffer overflows on APX targets In-Reply-To: References: Message-ID: <8glQ1MywpARTxkIL_gJ3WwxU9Lbdo21JQIvWmuB14wU=.bfe3520f-9ce4-4f9d-a6a2-976fadf2604c@github.com> On Thu, 12 Jun 2025 11:49:41 GMT, Jatin Bhateja wrote: > As per the latest architecture-instruction-set-extensions-programming-reference manual version 57 [1], the upcoming Diamond Rapids server with the APX feature has a different CPU family ID (19) than prior Xeons (6). > > The recently integrated EEVEX to REX2 demotion support with [JDK-8351994](https://bugs.openjdk.org/browse/JDK-8351994) already handles this through a newly defined _VM_Version::is_intel_server_family()_ API, but the existing AVX3Threshold setting is agnostic to this change, which causes code buffer overflows during arraycopy stubs generation. > > The patch fixes this issue and also appropriately increases the final code buffer size to prevent buffer overruns during stub generation with a non-zero AVX3Threshold. > > [1] https://www.intel.com/content/www/us/en/content-details/851355/intel-architecture-instruction-set-extensions-programming-reference.html?wapkw=intel%20architecture%20instruction%20set%20extensions%20programming%20reference -Xlog:stubs output before and after this patch. Array copy stubs use a 256-bit vector for non-zero AVX3Threshold, which results in a bulkier stub sequence and shows up in the final stub size. 
Before Patch:- [6.165s][info][stubs] StubRoutines (finalstubs) [0x00007fc74aac46e0, 0x00007fc74aad0f58] used: 31850, free: 19470 After Patch:- [7.273s][info][stubs] StubRoutines (finalstubs) [0x00007fbf3308eee0, 0x00007fbf3309bf28] used: 26186, free: 27134 ------------- PR Comment: https://git.openjdk.org/jdk/pull/25780#issuecomment-2969626248 From jbhateja at openjdk.org Fri Jun 13 09:03:35 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 13 Jun 2025 09:03:35 GMT Subject: Integrated: 8359327: Incorrect AVX3Threshold results into code buffer overflows on APX targets In-Reply-To: References: Message-ID: On Thu, 12 Jun 2025 11:49:41 GMT, Jatin Bhateja wrote: > As per the latest architecture-instruction-set-extensions-programming-reference manual version 57 [1], the upcoming Diamond Rapids server with the APX feature has a different CPU family ID (19) than prior Xeons (6). > > The recently integrated EEVEX to REX2 demotion support with [JDK-8351994](https://bugs.openjdk.org/browse/JDK-8351994) already handles this through a newly defined _VM_Version::is_intel_server_family()_ API, but the existing AVX3Threshold setting is agnostic to this change, which causes code buffer overflows during arraycopy stubs generation. > > The patch fixes this issue and also appropriately increases the final code buffer size to prevent buffer overruns during stub generation with a non-zero AVX3Threshold. > > [1] https://www.intel.com/content/www/us/en/content-details/851355/intel-architecture-instruction-set-extensions-programming-reference.html?wapkw=intel%20architecture%20instruction%20set%20extensions%20programming%20reference This pull request has now been integrated. 
Changeset: e7f63ba3 Author: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/e7f63ba3109adf614cee1bc392cfeef85e9ca778 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod 8359327: Incorrect AVX3Threshold results into code buffer overflows on APX targets Reviewed-by: sviswanathan ------------- PR: https://git.openjdk.org/jdk/pull/25780 From mhaessig at openjdk.org Fri Jun 13 09:04:32 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Fri, 13 Jun 2025 09:04:32 GMT Subject: RFR: 8357782: JVM JIT Causes Static Initialization Order Issue [v3] In-Reply-To: References: Message-ID: <34mEOeGoA7CjNhfTEb06sEX0zEL3OHb9pMlGKVcUKfY=.fca1b20c-5d1f-4a2f-85d3-bef07873482c@github.com> On Thu, 12 Jun 2025 08:35:54 GMT, Manuel H?ssig wrote: >> # Issue Summary >> >> When C1 compiles a method that allocates a new instance of a class that is not fully initialized at compile time, it does not take into account that the `` might run a static initializer that might have side effects. Consider the following example: >> >> class A { >> static class B { >> static String field; >> static void test() { >> String tmp = field; >> new C(field); >> } >> } >> >> static class C { >> static { >> B.field = "Hello"; >> } >> >> C(String val) { >> if (val == null) { >> throw new RuntimeException("Should not reach here"); >> } >> } >> } >> } >> >> Here, `B.field` gets assigned in `C`'s static initializer. Since C1 believes that the `newinstance` does not have memory side effects, local value numbering eliminates the field access for the argument in `C.` because it believes that `B.field` is still the same as `tmp`. Hence, the assignment in `C.` gets effectively ignored and the code triggers the runtime exception. Because this only happens if `C` is not fully initialized when it is compiled, we need `-Xcomp` to reproduce this issue. 
>> >> # Changes >> >> To fix the illustrated issue, this PR ensures that `newinstance` kills the memory state in C1's LVN if the class might not be fully initialized. Since we can not reliably detect if a class has a static initializer, we kill memory whenever a class is not yet loaded or, if it has already been loaded, when it has not been fully initialized, which is conservative and might kill memory when it is not necessary for correctness and have an impact on performance in the form of some additional field accesses. >> >> # Benchmark Results >> >> Since this might have an effect on startup, I ran some benchmarks. The results mostly did not show effects outside the run-to-run variance. >> >> # Testing >> >> - [ ] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15605637901) >> - [x] tier1 through tier3 plus Oracle internal testing on Oracle supported platforms and OSes >> >> # Acknowledgements >> >> Shout out to @TobiHartmann who wrote the reproducer that became the regression test and helped me find my way around C1 and narrow down the problem. > > Manuel H?ssig has updated the pull request incrementally with one additional commit since the last revision: > > Fix syntax error Thank you for your reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25725#issuecomment-2969627485 From duke at openjdk.org Fri Jun 13 09:04:32 2025 From: duke at openjdk.org (duke) Date: Fri, 13 Jun 2025 09:04:32 GMT Subject: RFR: 8357782: JVM JIT Causes Static Initialization Order Issue [v3] In-Reply-To: References: Message-ID: <_cmFAunVD6diZpigumLWFUIv8JF5ja_X83W4w-0kBqs=.53d9e5d6-8db0-4738-8f63-08a3552c29b4@github.com> On Thu, 12 Jun 2025 08:35:54 GMT, Manuel H?ssig wrote: >> # Issue Summary >> >> When C1 compiles a method that allocates a new instance of a class that is not fully initialized at compile time, it does not take into account that the `` might run a static initializer that might have side effects. 
Consider the following example: >> >> class A { >> static class B { >> static String field; >> static void test() { >> String tmp = field; >> new C(field); >> } >> } >> >> static class C { >> static { >> B.field = "Hello"; >> } >> >> C(String val) { >> if (val == null) { >> throw new RuntimeException("Should not reach here"); >> } >> } >> } >> } >> >> Here, `B.field` gets assigned in `C`'s static initializer. Since C1 believes that the `newinstance` does not have memory side effects, local value numbering eliminates the field access for the argument in `C.` because it believes that `B.field` is still the same as `tmp`. Hence, the assignment in `C.` gets effectively ignored and the code triggers the runtime exception. Because this only happens if `C` is not fully initialized when it is compiled, we need `-Xcomp` to reproduce this issue. >> >> # Changes >> >> To fix the illustrated issue, this PR ensures that `newinstance` kills the memory state in C1's LVN if the class might not be fully initialized. Since we can not reliably detect if a class has a static initializer, we kill memory whenever a class is not yet loaded or, if it has already been loaded, when it has not been fully initialized, which is conservative and might kill memory when it is not necessary for correctness and have an impact on performance in the form of some additional field accesses. >> >> # Benchmark Results >> >> Since this might have an effect on startup, I ran some benchmarks. The results mostly did not show effects outside the run-to-run variance. >> >> # Testing >> >> - [ ] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15605637901) >> - [x] tier1 through tier3 plus Oracle internal testing on Oracle supported platforms and OSes >> >> # Acknowledgements >> >> Shout out to @TobiHartmann who wrote the reproducer that became the regression test and helped me find my way around C1 and narrow down the problem. 
> > Manuel H?ssig has updated the pull request incrementally with one additional commit since the last revision: > > Fix syntax error @mhaessig Your change (at version 71b04b50283385c29e892fb58ac20b70c5125cb5) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25725#issuecomment-2969630647 From duke at openjdk.org Fri Jun 13 09:06:31 2025 From: duke at openjdk.org (duke) Date: Fri, 13 Jun 2025 09:06:31 GMT Subject: RFR: 8354196: C2: reorder and capitalize phase definition [v2] In-Reply-To: References: Message-ID: On Thu, 12 Jun 2025 12:20:15 GMT, Manuel H?ssig wrote: >> This PR performs some cleanup and formatting around the phase definitions in C2: >> - the phase descriptions are capitalized according to the [MLA Handbook title case rules](https://en.wikipedia.org/wiki/Title_case#Modern_Language_Association_(MLA)_Handbook), >> - the phases are reordered to be more or less in the order of execution or occurrence in the code, >> - the definitions in `phasetype.hpp` and `CompilePhase.java` are synced, >> - `CompilePhase.java` is aligned for better readability. >> >> This change was tested on: >> - [x] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15605662671) >> - [x] tier1 plus some Oracle internal testing > > Manuel H?ssig has updated the pull request incrementally with one additional commit since the last revision: > > Split-If is a proper name @mhaessig Your change (at version 9f3676e4a1fbb30a934e8507a86eafee13419131) is now ready to be sponsored by a Committer. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25778#issuecomment-2969636990 From rcastanedalo at openjdk.org Fri Jun 13 09:36:36 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 13 Jun 2025 09:36:36 GMT Subject: RFR: 8020282: Generated code quality: redundant LEAs in the chained dereferences [v2] In-Reply-To: References: Message-ID: On Thu, 12 Jun 2025 14:21:35 GMT, Manuel H?ssig wrote: >> src/hotspot/cpu/x86/peephole_x86_64.cpp line 325: >> >>> 323: // Ensure the MachProj is in the same block as the decode and the lea. >>> 324: if (proj == nullptr || !block->contains(proj)) { >>> 325: return false; >> >> The only scenario in which I think this may be possible is when we stress scheduling, otherwise `RFLAGS` projections should always be scheduled immediately after their input. Consider adding an assertion like `assert(StressGCM, "should be scheduled contiguously otherwise");` or similar here. > > Interesting. I thought that this case would be unlikely, but did not know this. I added the assert in 0f464e1 At least that would be my expectation. If we get disproved in the future, we can see it as an opportunity to improve quality of the generated code. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25471#discussion_r2144609747 From rcastanedalo at openjdk.org Fri Jun 13 09:40:34 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 13 Jun 2025 09:40:34 GMT Subject: RFR: 8020282: Generated code quality: redundant LEAs in the chained dereferences [v4] In-Reply-To: References: Message-ID: <8af6oDH21I6Hfvdplgxq6EeH41bihWdwGwy7V8mtokE=.4a20d5f8-84a4-4533-bf5c-932051d1b2a1@github.com> On Thu, 12 Jun 2025 15:08:54 GMT, Manuel H?ssig wrote: > > Is this scenario exercised by any of the new tests? If not, would it be possible to construct an additional test where we verify that the peephole is not applied in this case? > > It is not. 
I was only able to find such a case once with all VM intrinsics disabled some time ago, but was not able to reproduce one since. I'll have another try to find one. OK, that would be great! If you do not find one, I think the PR is still OK because it is easy to see how the peephole would handle the scenario. But it would be of course better to confirm it with an actual test case. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25471#issuecomment-2969748820 From shade at openjdk.org Fri Jun 13 09:46:30 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 13 Jun 2025 09:46:30 GMT Subject: RFR: 8359386: Fix incorrect value for max_size of C2CodeStub when APX is used In-Reply-To: <5ou3DR0HW2MlRv79ufOutW0dpq29fGAgnFzWYDo6rhk=.33789d8d-4174-4a7e-aa0d-26603911f760@github.com> References: <5ou3DR0HW2MlRv79ufOutW0dpq29fGAgnFzWYDo6rhk=.33789d8d-4174-4a7e-aa0d-26603911f760@github.com> Message-ID: On Thu, 12 Jun 2025 19:41:01 GMT, Srinivas Vamsi Parasa wrote: > The goal of this PR is to fix the value of max_size of the C2CodeStub hardcoded in the C2_MacroAssembler::convertF2I() function when Intel APX instructions are used. Currently, max_size is hardcoded to 23 (introduced in [JDK-8306706](https://bugs.openjdk.org/browse/JDK-8306706)). However, this value is incorrect when Intel APX instructions with extended general-purpose registers (EGPRs) are used in the code stub, as using EGPRs with APX instructions increases the instruction encoding size by an additional 4 bytes. 
> > Without this fix, we see the following error for the C2 compiler tests below: > > compiler/vectorization/runner/ArrayTypeConvertTest.java > compiler/intrinsics/zip/TestFpRegsABI.java > > > > > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (/src/hotspot/share/opto/c2_CodeStubs.cpp:50), pid=3961123, tid=3961332 > # assert(max_size >= actual_size) failed: Expected stub size (23) must be larger than or equal to actual stub size (24) > # > # JRE version: OpenJDK Runtime Environment (26.0) (fastdebug build 26-internal-adhoc.parasa.jdkdemotion) > # Java VM: OpenJDK 64-Bit Server VM (fastdebug 26-internal-adhoc.parasa.jdkdemotion, mixed mode, sharing, compressed oops, compressed class ptrs, g1 gc, linux-amd64) > # Problematic frame: > # V [libjvm.so+0x955a77] C2CodeStubList::emit(C2_MacroAssembler&)+0x227 > # > > > This PR fixes the errors in the above-mentioned tests. src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4691: > 4689: > 4690: // Using the APX extended general purpose registers increases the instruction encoding size by 4 bytes. > 4691: int max_size = dst->encoding() <= 15 ? 23 : 27; Do you want to write it in a more explicit way then? int max_size = 23 + (UseAPX ? 4 : 0); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25787#discussion_r2144636061 From rcastanedalo at openjdk.org Fri Jun 13 09:48:33 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 13 Jun 2025 09:48:33 GMT Subject: RFR: 8020282: Generated code quality: redundant LEAs in the chained dereferences [v2] In-Reply-To: References: Message-ID: On Thu, 12 Jun 2025 14:46:55 GMT, Manuel H?ssig wrote: > Matching leaP* Nodes is the next best option. The main reason I went for such specific IR conditions is to show that the peephole works for all leaP* variants. If that is not valuable enough, then this would get rid of all the heap size related cruft. 
Yes, I think it is worth doing this simplification. I think it is easy enough to see that, if the peephole works for one of the variants, it will work for the others. > The most finicky tests are RegexFind and StringInflate. These could be outright removed, because they mainly show interesting behavior if all VM intrinsics are disabled. I was also not able to extract these tests into standalone tests, despite some effort. OK, if extracting standalone tests is hard, then I would vote for going with what the PR has (except for the `leaP*` simplification). We can always simplify the test cases in the future if they prove to be difficult to maintain. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25471#discussion_r2144644956 From mhaessig at openjdk.org Fri Jun 13 09:54:11 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Fri, 13 Jun 2025 09:54:11 GMT Subject: RFR: 8354727: CompilationPolicy creates too many compiler threads when code cache space is scarce [v3] In-Reply-To: References: Message-ID: > # Issue Summary > > Running > > java -XX:+SegmentedCodeCache -XX:ReservedCodeCacheSize=10M -XX:NonNMethodCodeHeapSize=6M \ > -XX:ProfiledCodeHeapSize=5M -XX:NonProfiledCodeHeapSize=5M -version > > on a machine with more than 255 cores, this would fail with the message that the specified `NonNMethodCodeHeapSize` is too small to fit all compiler buffers (instead of failing because the sum of the heaps is larger than the `ReservedCodeCacheSize`). Hence, the calculated compiler count is too high. This is due to `CompilationPolicy::initialize()` checking how many compiler buffers fit into the `ReservedCodeCacheSize`. However, in the case above, this is significantly larger than `NonNMethodCodeHeapSize` and causes a new check introduced in #17244 to fail. > > # Changes > > This PR fixes the calculation of the `CICompilerCount` ergonomic. 
Firstly, @shipilev kindly provided a fix so that the compiler buffer size used in the calculation is also correct if we only have C2. Secondly, `NonNMethodCodeHeapSize` is used as the maximum size available for compiler buffers instead of `ReservedCodeCacheSize` if the former was provided as a command-line flag. > > It might be debatable if this is the correct fix, since the `NonNMethodHeap` can spill into the other heaps if it is too small. However, I am of the opinion that if the `NonNMethodCodeHeapSize` is explicitly specified, then the compiler count should be calculated accordingly. > > # Testing > > - [x] [GHA](https://github.com/mhaessig/jdk/actions/runs/15603409859) > - [x] tier1 and tier2 plus Oracle internal testing on our supported platforms > - [x] tier1 with a manually fixed core count of 288 (this reproduced the problem before the fix) Manuel Hässig has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: - Fix inadvertantly removed header Co-developed-by: Damon Fenacci - Respect NonNMethodCodeHeapSize during ergonomic compiler count selection - Correctly calculate the compiler buffer size Co-authored-by: Aleksey Shipilev ------------- Changes: https://git.openjdk.org/jdk/pull/25770/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25770&range=02 Stats: 11 lines in 1 file changed: 8 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/25770.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25770/head:pull/25770 PR: https://git.openjdk.org/jdk/pull/25770 From mhaessig at openjdk.org Fri Jun 13 09:54:12 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Fri, 13 Jun 2025 09:54:12 GMT Subject: RFR: 8354727: CompilationPolicy creates too many compiler threads when code cache space is scarce [v2] In-Reply-To: References: Message-ID: On Thu, 12 Jun 2025 09:35:19 GMT, Manuel Hässig wrote: >> # Issue Summary >> >> Running >> >> java -XX:+SegmentedCodeCache
-XX:ReservedCodeCacheSize=10M -XX:NonNMethodCodeHeapSize=6M \ >> -XX:ProfiledCodeHeapSize=5M -XX:NonProfiledCodeHeapSize=5M -version >> >> on a machine with more than 255 cores, this would fail with the message that the specified `NonNMethodCodeHeapSize` is too small to fit all compiler buffers (instead of failing because the sum of the heaps is larger than the `ReservedCodeCacheSize`). Hence, the calculated compiler count is too high. This is due to `CompilationPolicy::initialize()` checking how many compiler buffers fit into the `ReservedCodeCacheSize`. However, in the case above, this is significantly larger than `NonNMethodCodeHeapSize` and causes a new check introduced in #17244 to fail. >> >> # Changes >> >> This PR fixes the calculation of the `CICompilerCount` ergonomic. Firstly, @shipilev kindly provided a fix so that the compiler buffer size used in the calculation is also correct if we only have C2. Secondly, `NonNMethodCodeHeapSize` is used as the maximum size available for compiler buffers instead of `ReservedCodeCacheSize` if the former was provided as a command-line flag. >> >> It might be debatable if this is the correct fix, since the `NonNMethodHeap` can spill into the other heaps if it is too small. However, I am of the opinion that if the `NonNMethodCodeHeapSize` is explicitly specified, then the compiler count should be calculated accordingly. >> >> # Testing >> >> - [x] [GHA](https://github.com/mhaessig/jdk/actions/runs/15603409859) >> - [x] tier1 and tier2 plus Oracle internal testing on our supported platforms >> - [x] tier1 with a manually fixed core count of 288 (this reproduced the problem before the fix) > > Manuel Hässig has refreshed the contents of this pull request, and previous commits have been removed. Incremental views are not available. I made this dependent on #25791 and rebased onto it.
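A minimal sketch of the ergonomic described in this PR, not the HotSpot implementation: the compiler count is capped by how many compiler buffers fit into the available space, and that space is taken from `NonNMethodCodeHeapSize` only when the flag was set explicitly. The class name, method name, parameter names, and the 1M per-compiler buffer size are all hypothetical, and the real calculation in `CompilationPolicy::initialize()` may account for details that are omitted here.

```java
public final class CompilerCountSketch {
    static final long M = 1024L * 1024L;

    /**
     * Caps an ergonomically chosen compiler count by the space available for
     * compiler buffers. Every parameter is an illustrative stand-in, not a
     * HotSpot variable.
     */
    public static int cappedCompilerCount(int ergonomicCount,
                                          long reservedCodeCacheSize,
                                          long nonNMethodCodeHeapSize,
                                          boolean nonNMethodSizeSetOnCommandLine,
                                          long bufferSizePerCompiler) {
        // Use the non-nmethod heap size only if the user specified it explicitly.
        long available = nonNMethodSizeSetOnCommandLine
                ? nonNMethodCodeHeapSize
                : reservedCodeCacheSize;
        // Number of whole compiler buffers that fit into the available space.
        long fitting = available / bufferSizePerCompiler;
        // Never go below one compiler, never above the ergonomic choice.
        return (int) Math.max(1L, Math.min((long) ergonomicCount, fitting));
    }

    public static void main(String[] args) {
        // Hypothetical numbers: 12 compilers requested, 1M buffer per compiler.
        System.out.println(cappedCompilerCount(12, 10 * M, 6 * M, true, 1 * M));
        System.out.println(cappedCompilerCount(12, 10 * M, 6 * M, false, 1 * M));
    }
}
```

With the explicit 6M non-nmethod heap the cap is tighter than with the 10M reserved code cache, which is the behavioral difference the PR describes.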
------------- PR Comment: https://git.openjdk.org/jdk/pull/25770#issuecomment-2969785767 From mli at openjdk.org Fri Jun 13 10:03:48 2025 From: mli at openjdk.org (Hamlin Li) Date: Fri, 13 Jun 2025 10:03:48 GMT Subject: RFR: 8358892: RISC-V: jvm crash when running dacapo sunflow after JDK-8352504 [v7] In-Reply-To: References: <9TAfBWFOsPTUIJPaUX9xcPqVsXLI6lsQBvcomuVqcQI=.1eb3ec5a-59ac-44ce-a46b-b984dda4605e@github.com> Message-ID: <_9oszW6IHC-yYRpfL_VXotGt6Ufo9-wGPal4OGrzXak=.a95e0449-4d61-41cc-8fc0-dce0feb0061b@github.com> On Fri, 13 Jun 2025 06:52:35 GMT, Fei Yang wrote: > Hi, I changed your test a bit and I see test failure on my linux-riscv64 platform (no Zicond). This failure means it does not go to `BoolTest::ge` path, but the calculation is still correct. > $ make test TEST="test/hotspot/jtreg/compiler/c2/irTests/TestFPComparison2.java" JTREG="TIMEOUT_FACTOR=8" Did you have other vm options when trigger the tests? I just added more tests, can you try it in your environment? It passed in my local env. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25696#issuecomment-2969804697 From mli at openjdk.org Fri Jun 13 10:03:48 2025 From: mli at openjdk.org (Hamlin Li) Date: Fri, 13 Jun 2025 10:03:48 GMT Subject: RFR: 8358892: RISC-V: jvm crash when running dacapo sunflow after JDK-8352504 [v8] In-Reply-To: <9TAfBWFOsPTUIJPaUX9xcPqVsXLI6lsQBvcomuVqcQI=.1eb3ec5a-59ac-44ce-a46b-b984dda4605e@github.com> References: <9TAfBWFOsPTUIJPaUX9xcPqVsXLI6lsQBvcomuVqcQI=.1eb3ec5a-59ac-44ce-a46b-b984dda4605e@github.com> Message-ID: > Hi, > Can you help to review this patch? > > Thanks! > > Currently, this issue is only reproducible with Dacapo sunflow. > I tried to construct a simpler jtreg test to reproduce the issue, but can not find a way to do it till now, this task is tracked by https://bugs.openjdk.org/browse/JDK-8359045. > > So, currently I can only verify the code by reviewing it. > Or maybe it's better to leave it until we find the test? 
Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: more tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25696/files - new: https://git.openjdk.org/jdk/pull/25696/files/eb38253e..962f4944 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25696&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25696&range=06-07 Stats: 690 lines in 1 file changed: 648 ins; 0 del; 42 mod Patch: https://git.openjdk.org/jdk/pull/25696.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25696/head:pull/25696 PR: https://git.openjdk.org/jdk/pull/25696 From jbhateja at openjdk.org Fri Jun 13 10:16:17 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 13 Jun 2025 10:16:17 GMT Subject: RFR: 8352635: Improve inferencing of Float16 operations with constant inputs [v9] In-Reply-To: <44nVQBYgzCOB2mAB9xtAPvkUcOMJOITA2VjMdDFgm1g=.48266693-48bf-41db-8871-a7dcafe93509@github.com> References: <44nVQBYgzCOB2mAB9xtAPvkUcOMJOITA2VjMdDFgm1g=.48266693-48bf-41db-8871-a7dcafe93509@github.com> Message-ID: > This is a follow-up PR#22755 to improve Float16 operations inferencing. > > The existing scheme to detect Float16 operations for some operations is based on pattern matching which expects to receive inputs through ConvHF2F IR, this patch extends matching to accept constant floating point inputs within the Float16 value range. 
> > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with two additional commits since the last revision: - Update src/hotspot/share/opto/convertnode.cpp Co-authored-by: Emanuel Peter - Update src/hotspot/share/opto/convertnode.cpp Co-authored-by: Emanuel Peter ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24179/files - new: https://git.openjdk.org/jdk/pull/24179/files/f0f5998e..dd3262aa Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24179&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24179&range=07-08 Stats: 4 lines in 1 file changed: 1 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/24179.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24179/head:pull/24179 PR: https://git.openjdk.org/jdk/pull/24179 From jbhateja at openjdk.org Fri Jun 13 10:21:16 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 13 Jun 2025 10:21:16 GMT Subject: RFR: 8351645: C2: Assertion failures in Expand/CompressBits idealizations with TOP In-Reply-To: References: Message-ID: On Wed, 11 Jun 2025 09:32:46 GMT, Jatin Bhateja wrote: >> Bugfix patch adds missing safe type access checks in Expand/Compress Ideal transforms. The problem occurs during IGVN cleanups after partial peeling of a loop. >> >> Test mentioned in the bug report has been included along with the patch. >> >> Kindly review. >> >> Best Regards, >> Jatin > > Root Cause: Problem occurs during IGVN cleanup after partial peeling. > > Partial peeling rotates the loop by bringing out the peel section and creating a new loop which begins with the non-peel section, followed by the peel section loop back. > > To perform this translation, the compiler begins by cloning the original loop, brings the peel section into the new loop header, re-wires the new header to point to the start of the non-peel block (cut-point) of new loop and then stitches peel section of the cloned loop after non-peel section thereby rotating the original loop.
Since the peel section is the only usable part of the cloned loop, all remaining parts of the loop are wiped out by GVN cleanup. > > In this case, during cleanups when the control reaches the ExpandBits/CompressBits idealization, it hits upon an unsafe use (is_* call) of the mask input, which was tied to the TOP node and results in an assertion failure. To fix the problem, this PR adds safe isa_* calls before the unsafe accesses. > > With default options, the problem only occurs with Long Expand/CompressBits because for integer variants, nodes get picked up in a different order from the IGVN worklist; we can use -XX:+StressIGVN to reproduce the issue with integral variants. > @jatin-bhateja Thanks for adding me as a contributor! I'll run some testing now. > > Could you change the title to also include `compress`? Suggestion: `C2: handle TOP in Expand/CompressBitsNode::Ideal` Hi @eme64 , let me know if it's good to land. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25586#issuecomment-2969860982 From mli at openjdk.org Fri Jun 13 10:26:40 2025 From: mli at openjdk.org (Hamlin Li) Date: Fri, 13 Jun 2025 10:26:40 GMT Subject: RFR: 8358892: RISC-V: jvm crash when running dacapo sunflow after JDK-8352504 [v9] In-Reply-To: <9TAfBWFOsPTUIJPaUX9xcPqVsXLI6lsQBvcomuVqcQI=.1eb3ec5a-59ac-44ce-a46b-b984dda4605e@github.com> References: <9TAfBWFOsPTUIJPaUX9xcPqVsXLI6lsQBvcomuVqcQI=.1eb3ec5a-59ac-44ce-a46b-b984dda4605e@github.com> Message-ID: > Hi, > Can you help to review this patch? > > Thanks! > > Currently, this issue is only reproducible with Dacapo sunflow. > I tried to construct a simpler jtreg test to reproduce the issue, but can not find a way to do it till now, this task is tracked by https://bugs.openjdk.org/browse/JDK-8359045. > > So, currently I can only verify the code by reviewing it. > Or maybe it's better to leave it until we find the test?
Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: more log ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25696/files - new: https://git.openjdk.org/jdk/pull/25696/files/962f4944..d94c6317 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25696&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25696&range=07-08 Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25696.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25696/head:pull/25696 PR: https://git.openjdk.org/jdk/pull/25696 From mli at openjdk.org Fri Jun 13 10:35:58 2025 From: mli at openjdk.org (Hamlin Li) Date: Fri, 13 Jun 2025 10:35:58 GMT Subject: RFR: 8358892: RISC-V: jvm crash when running dacapo sunflow after JDK-8352504 [v9] In-Reply-To: References: <9TAfBWFOsPTUIJPaUX9xcPqVsXLI6lsQBvcomuVqcQI=.1eb3ec5a-59ac-44ce-a46b-b984dda4605e@github.com> Message-ID: On Fri, 13 Jun 2025 10:26:40 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> >> Thanks! >> >> Currently, this issue is only reproducible with Dacapo sunflow. >> I tried to construct a simpler jtreg test to reproduce the issue, but can not find a way to do it till now, this task is tracked by https://bugs.openjdk.org/browse/JDK-8359045. >> >> So, currently I can only verify the code by reviewing it. >> Or maybe it's better to leave it until we find the test? > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > more log Maybe we should not assert the log is actually outputted, but just print a message for reference. As what we really want is the correctness of the CMoveI + CmpF/D, but not the specific path it must go through. I'll change assert to a log message in test, to avoid possible false alarms in the future. 
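A minimal sketch of a result-only check of the kind proposed here: it validates the value produced by a conditional select that is fed by a floating-point comparison, without asserting which code path the JIT took. The class and method names are hypothetical and are not taken from `TestFPComparison2.java`; whether C2 actually emits a `CMoveI` for this shape depends on the platform and flags.

```java
public final class CMoveResultCheckSketch {
    // Candidate shape for a CMoveI fed by a CmpF: select an int based on a float compare.
    public static int selectGe(float a, float b, int ifTrue, int ifFalse) {
        return (a >= b) ? ifTrue : ifFalse;
    }

    // Reference result with explicit NaN handling: any comparison with NaN is false.
    public static int expectedGe(float a, float b, int ifTrue, int ifFalse) {
        if (Float.isNaN(a) || Float.isNaN(b)) {
            return ifFalse;
        }
        // For non-NaN operands, a >= b is equivalent to !(a < b).
        return !(a < b) ? ifTrue : ifFalse;
    }

    // Compares both versions over a few interesting inputs and counts mismatches.
    public static int countMismatches() {
        float[] values = {
            -1.0f, -0.0f, 0.0f, 1.0f, Float.NaN,
            Float.POSITIVE_INFINITY, Float.NEGATIVE_INFINITY
        };
        int mismatches = 0;
        for (float a : values) {
            for (float b : values) {
                if (selectGe(a, b, 10, 20) != expectedGe(a, b, 10, 20)) {
                    // Report for reference only; the caller decides whether to fail.
                    System.out.println("mismatch for a=" + a + ", b=" + b);
                    mismatches++;
                }
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches());
    }
}
```

The check only depends on the computed values, so it stays valid whichever branch or conditional-move sequence the backend selects.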
------------- PR Comment: https://git.openjdk.org/jdk/pull/25696#issuecomment-2969893453 From mli at openjdk.org Fri Jun 13 10:57:56 2025 From: mli at openjdk.org (Hamlin Li) Date: Fri, 13 Jun 2025 10:57:56 GMT Subject: RFR: 8358892: RISC-V: jvm crash when running dacapo sunflow after JDK-8352504 [v10] In-Reply-To: <9TAfBWFOsPTUIJPaUX9xcPqVsXLI6lsQBvcomuVqcQI=.1eb3ec5a-59ac-44ce-a46b-b984dda4605e@github.com> References: <9TAfBWFOsPTUIJPaUX9xcPqVsXLI6lsQBvcomuVqcQI=.1eb3ec5a-59ac-44ce-a46b-b984dda4605e@github.com> Message-ID: > Hi, > Can you help to review this patch? > > Thanks! > > Currently, this issue is only reproducible with Dacapo sunflow. > I tried to construct a simpler jtreg test to reproduce the issue, but can not find a way to do it till now, this task is tracked by https://bugs.openjdk.org/browse/JDK-8359045. > > So, currently I can only verify the code by reviewing it. > Or maybe it's better to leave it until we find the test? Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: no assert ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25696/files - new: https://git.openjdk.org/jdk/pull/25696/files/d94c6317..0f5ca7de Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25696&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25696&range=08-09 Stats: 13 lines in 1 file changed: 4 ins; 1 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/25696.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25696/head:pull/25696 PR: https://git.openjdk.org/jdk/pull/25696 From mhaessig at openjdk.org Fri Jun 13 11:19:47 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Fri, 13 Jun 2025 11:19:47 GMT Subject: Integrated: 8357782: JVM JIT Causes Static Initialization Order Issue In-Reply-To: References: Message-ID: On Tue, 10 Jun 2025 13:33:13 GMT, Manuel H?ssig wrote: > # Issue Summary > > When C1 compiles a method that allocates a new instance of a class that 
is not fully initialized at compile time, it does not take into account that the allocation might run a static initializer that might have side effects. Consider the following example: > > class A { > static class B { > static String field; > static void test() { > String tmp = field; > new C(field); > } > } > > static class C { > static { > B.field = "Hello"; > } > > C(String val) { > if (val == null) { > throw new RuntimeException("Should not reach here"); > } > } > } > } > > Here, `B.field` gets assigned in `C`'s static initializer. Since C1 believes that the `newinstance` does not have memory side effects, local value numbering eliminates the field access for the argument in `C.<init>` because it believes that `B.field` is still the same as `tmp`. Hence, the assignment in `C.<clinit>` gets effectively ignored and the code triggers the runtime exception. Because this only happens if `C` is not fully initialized when it is compiled, we need `-Xcomp` to reproduce this issue. > > # Changes > > To fix the illustrated issue, this PR ensures that `newinstance` kills the memory state in C1's LVN if the class might not be fully initialized. Since we cannot reliably detect if a class has a static initializer, we kill memory whenever a class is not yet loaded or, if it has already been loaded, when it has not been fully initialized. This is conservative: it might kill memory when that is not necessary for correctness and might have an impact on performance in the form of some additional field accesses. > > # Benchmark Results > > Since this might have an effect on startup, I ran some benchmarks. The results mostly did not show effects outside the run-to-run variance.
> > # Testing > > - [ ] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15605637901) > - [x] tier1 through tier3 plus Oracle internal testing on Oracle supported platforms and OSes > > # Acknowledgements > > Shout out to @TobiHartmann who wrote the reproducer that became the regression test and helped me find my way around C1 and narrow down the problem. This pull request has now been integrated. Changeset: e8ef93ae Author: Manuel Hässig Committer: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/e8ef93ae9de624f25166bdf010c915672b2c5cf4 Stats: 84 lines in 4 files changed: 82 ins; 0 del; 2 mod 8357782: JVM JIT Causes Static Initialization Order Issue Co-authored-by: Tobias Hartmann Reviewed-by: thartmann, dlong, dfenacci ------------- PR: https://git.openjdk.org/jdk/pull/25725 From thartmann at openjdk.org Fri Jun 13 11:27:23 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 13 Jun 2025 11:27:23 GMT Subject: [jdk25] RFR: 8357782: JVM JIT Causes Static Initialization Order Issue Message-ID: Hi all, This pull request contains a backport of commit [e8ef93ae](https://github.com/openjdk/jdk/commit/e8ef93ae9de624f25166bdf010c915672b2c5cf4) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. The commit being backported was authored by Manuel Hässig on 13 Jun 2025 and was reviewed by Tobias Hartmann, Dean Long and Damon Fenacci. Thanks!
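For reference, a self-contained variant of the reproducer from the original PR description, as a sketch rather than the regression test that was integrated. The outer class name, the `run()` helper, and the `val` field are hypothetical additions so that the example can be run and checked on its own. The PR description states that `-Xcomp` is needed to reproduce the problem; behavior on any particular build is not verified here.

```java
public class StaticInitOrderSketch {
    static class B {
        static String field;

        static String test() {
            String tmp = field;   // first read: null while C is still uninitialized
            C c = new C(field);   // allocation initializes C first, which writes B.field
            return c.val;
        }
    }

    static class C {
        static {
            // Side effect of class initialization that the compiler must not ignore.
            B.field = "Hello";
        }

        final String val;

        C(String val) {
            if (val == null) {
                throw new RuntimeException("Should not reach here");
            }
            this.val = val;
        }
    }

    // Hypothetical helper so callers outside the nested classes can run the scenario.
    public static String run() {
        return B.test();
    }

    public static void main(String[] args) {
        // On a correctly behaving JVM the second read of B.field sees "Hello".
        System.out.println(run());
    }
}
```

The second read of `B.field` must not be replaced by `tmp`, because initializing `C` between the two reads changes the field.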
------------- Commit messages: - Backport e8ef93ae9de624f25166bdf010c915672b2c5cf4 Changes: https://git.openjdk.org/jdk/pull/25798/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25798&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8357782 Stats: 84 lines in 4 files changed: 82 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25798.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25798/head:pull/25798 PR: https://git.openjdk.org/jdk/pull/25798 From jbhateja at openjdk.org Fri Jun 13 11:29:18 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 13 Jun 2025 11:29:18 GMT Subject: RFR: 8352635: Improve inferencing of Float16 operations with constant inputs [v10] In-Reply-To: <44nVQBYgzCOB2mAB9xtAPvkUcOMJOITA2VjMdDFgm1g=.48266693-48bf-41db-8871-a7dcafe93509@github.com> References: <44nVQBYgzCOB2mAB9xtAPvkUcOMJOITA2VjMdDFgm1g=.48266693-48bf-41db-8871-a7dcafe93509@github.com> Message-ID: > This is a follow-up PR#22755 to improve Float16 operations inferencing. > > The existing scheme to detect Float16 operations for some operations is based on pattern matching which expects to receive inputs through ConvHF2F IR, this patch extends matching to accept constant floating point inputs within the Float16 value range. 
> > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Adding descriptive comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24179/files - new: https://git.openjdk.org/jdk/pull/24179/files/dd3262aa..c8ced549 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24179&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24179&range=08-09 Stats: 11 lines in 1 file changed: 11 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24179.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24179/head:pull/24179 PR: https://git.openjdk.org/jdk/pull/24179 From jbhateja at openjdk.org Fri Jun 13 11:29:18 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 13 Jun 2025 11:29:18 GMT Subject: RFR: 8352635: Improve inferencing of Float16 operations with constant inputs [v5] In-Reply-To: References: <44nVQBYgzCOB2mAB9xtAPvkUcOMJOITA2VjMdDFgm1g=.48266693-48bf-41db-8871-a7dcafe93509@github.com> <_IdYz769mq7-kTO802umUJX7Bmaz3Ds4GWLb75lAW8I=.0394a525-2288-407e-9201-7fb6b5f92353@github.com> Message-ID: On Wed, 11 Jun 2025 14:15:00 GMT, Emanuel Peter wrote: >> Suggestion: >> Suggestion: >> >> // 1. conF must lie within Float16 value range, otherwise we would have rounding issues: >> // Doing the operation in float32 and then rounding is not the same as >> // rounding first and doing the operation in float16. > > Do you have tests where the constant is in float16? Which one? 
We have a test point "testExactFP16ConstantPatterns" for in-range constants https://github.com/openjdk/jdk/pull/24179/files#diff-3f8786f9f62662eda4b4a5c76c01fa04534c94d870d496501bfc20434ad45579R405 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24179#discussion_r2144880546 From shade at openjdk.org Fri Jun 13 11:34:33 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 13 Jun 2025 11:34:33 GMT Subject: [jdk25] RFR: 8357782: JVM JIT Causes Static Initialization Order Issue In-Reply-To: References: Message-ID: On Fri, 13 Jun 2025 11:20:44 GMT, Tobias Hartmann wrote: > Hi all, > > This pull request contains a backport of commit [e8ef93ae](https://github.com/openjdk/jdk/commit/e8ef93ae9de624f25166bdf010c915672b2c5cf4) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Manuel H?ssig on 13 Jun 2025 and was reviewed by Tobias Hartmann, Dean Long and Damon Fenacci. > > Thanks! Marked as reviewed by shade (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25798#pullrequestreview-2924496753 From rcastanedalo at openjdk.org Fri Jun 13 12:25:37 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 13 Jun 2025 12:25:37 GMT Subject: RFR: 8020282: Generated code quality: redundant LEAs in the chained dereferences [v4] In-Reply-To: References: Message-ID: On Thu, 12 Jun 2025 14:46:25 GMT, Manuel H?ssig wrote: >> ## Summary >> >> On x86, chained dereferences of narrow oops at a constant offset from the base oop can use a `lea` instruction to perform the address computation in one go using the `leaP8Narrow`, `leaP32Narrow`, and `leaPCompressedOopOffset` matching rules. 
However, the generated code contains an additional `lea` with an unused result: >> >> ; OptoAssembly >> 03d decode_heap_oop_not_null R8,R10 >> 041 leaq R10, [R12 + R10 << 3 + #12] (compressed oop addressing) ; ptr compressedoopoff32 >> >> ; x86 >> 0x00007f1f210625bd: lea (%r12,%r10,8),%r8 ; result is unused >> 0x00007f1f210625c1: lea 0xc(%r12,%r10,8),%r10 ; the same computation as decode, but with offset >> >> >> This PR adds a peephole optimization to remove such redundant `lea`s. >> >> ## The Issue in Detail >> >> The ideal subgraph producing redundant `lea`s, or rather redundant `decodeHeapOop_not_null`s, is `LoadN -> DecodeN -> AddP`, where both the address and base edge of the `AddP` originate from the `DecodeN`. After matching, this becomes >> >> LoadN -> decodeHeapOop_not_null -> leaP* >> ______________________________? >> >> where `leaP*` is either of `leaP8Narrow`, `leaP32Narrow`, or `leaPCompressedOopOffset` (depending on the heap location and size). Here, the base input of `leaP*` comes from the decode. Looking at the matching code path, we find that the `leaP*` rules match both the `AddP` and the `DecodeN`, since x86 can fold this, but the following code adds the decode back as the base input to `leaP*`: >> >> https://github.com/openjdk/jdk/blob/c29537740efb04e061732a700582d43b1956cff4/src/hotspot/share/opto/matcher.cpp#L1894-L1897 >> >> On its face, this is completely unnecessary if we matched a `leaP*`, since it already computes the result of the decode, so adding the `LoadN` node as base seems like the logical choice. However, if the derived oop computed by the `leaP*` gets added to an oop map, this `DecodeN` is needed as the base for the derived oop. Because as of now, derived oops in oop maps cannot have narrow base pointers. >> >> This leaves us with a handful of possible solutions: >> 1. implement narrow bases for derived oops in oop maps, >> 2. perform some dead code elimination after we know which oops are part of oop maps, >> 3. 
add a peephole optimization to simply remove unused `lea`s. >> >> Option 1 would have been ideal in the sense, that it is the earliest possible point to remove the decode, which would simplify the graph and reduce pressure on the regi... > > Manuel H?ssig has updated the pull request incrementally with four additional commits since the last revision: > > - Factor out address nodes for simplification > - Add assert to codepath only reachable with stressing. > - Rename for clarity > > Confused myself.... > - Revert change to unrelated lines > > This reverts commit d1c6a653770bfe578b1982ac726b258fa08d57b8. Thanks for addressing my comments, Manuel! The changeset looks good overall, I just have a couple of refactoring suggestions besides what we discussed about the tests. src/hotspot/cpu/x86/peephole_x86_64.cpp line 289: > 287: Node* decode_address = decode->in(1); > 288: > 289: bool is_spill = lea_address != decode->in(1) && Suggestion: bool is_spill = lea_address != decode_address && src/hotspot/cpu/x86/peephole_x86_64.cpp line 352: > 350: } > 351: > 352: // Remove spill for the decode if it does not have any other uses. For unambiguity: Suggestion: // Remove spill for the decode if the spill node does not have any other uses. 
------------- PR Review: https://git.openjdk.org/jdk/pull/25471#pullrequestreview-2924126746 PR Review Comment: https://git.openjdk.org/jdk/pull/25471#discussion_r2144653671 PR Review Comment: https://git.openjdk.org/jdk/pull/25471#discussion_r2144943134 From rcastanedalo at openjdk.org Fri Jun 13 12:25:38 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 13 Jun 2025 12:25:38 GMT Subject: RFR: 8020282: Generated code quality: redundant LEAs in the chained dereferences [v2] In-Reply-To: References: Message-ID: On Thu, 12 Jun 2025 14:23:43 GMT, Manuel H?ssig wrote: >> src/hotspot/cpu/x86/peephole_x86_64.cpp line 288: >> >>> 286: bool is_spill = lea_derived_oop->in(AddPNode::Address) != decode->in(1) && >>> 287: lea_derived_oop->in(AddPNode::Address)->is_SpillCopy() && >>> 288: decode->in(1)->is_SpillCopy(); >> >> The logic around `is_spill` could be simplified by declaring pointers to the lea and decode address inputs and setting these either to their direct inputs or one level up in case of spilling. Something like this: >> >> >> Node* lea_address = lea_derived_oop->in(AddPNode::Address); >> Node* decode_address = decode->in(1); >> >> bool is_spill = lea_address != decode_address && >> lea_address->is_SpillCopy() && >> decode_address->is_SpillCopy(); >> >> if (is_spill) { >> decode_address = decode_address->in(1); >> lea_address = lea_address->in(1); >> } >> >> // The leaP* and the decode must have the same parent. If we have a spill, they must have >> // the same grandparent. >> if (lea_address != decode_address) { >> return false; >> } >> >> (...) >> >> >> >> This creates more opportunities for simplification below, e.g. rewiring the base address of the affected leas directly to `lea_address` (or `decode_address`). > > That makes a lot of things much clearer. I like it! Thank you for the suggestion! 
Incorporated in 3d6f897 Thanks, now that you have defined `lea_address` and `decode_address`, the logic that distinguishes between the spill/no spill cases can be further simplified by letting these pointers step over the spill node in the case of spilling: if (is_spill) { decode_address = decode_address->in(1); lea_address = lea_address->in(1); } Then you can use them directly later on instead of conditionally using e.g. `decode_address` or `decode_address->in(1)` depending on the value of `is_spill`. Here a sketch of my suggested refactoring (not tested): https://github.com/openjdk/jdk/commit/2b75e85dbeedd380ab3ea02c64816c931b3dfe33. Feel free to apply it (or parts of it) if you agree that it makes the code more readable. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25471#discussion_r2144964718 From rcastanedalo at openjdk.org Fri Jun 13 12:45:31 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 13 Jun 2025 12:45:31 GMT Subject: RFR: 8356708: C2: loop strip mining expansion doesn't take sunk stores into account [v3] In-Reply-To: <6BFidU0d9xMpaKNH-AwBACbp4qZRTAVIf8IxE1uMPQE=.d619bf69-4ce2-4c01-a15a-5226e43d0970@github.com> References: <6BFidU0d9xMpaKNH-AwBACbp4qZRTAVIf8IxE1uMPQE=.d619bf69-4ce2-4c01-a15a-5226e43d0970@github.com> Message-ID: On Thu, 12 Jun 2025 15:38:53 GMT, Roland Westrelin wrote: >> `test1()` has a counted loop with a `Store` to `field`. That `Store` >> is sunk out of loop. When the `OuterStripMinedLoop` is expanded, only >> `Phi`s that exist at the inner loop are added to the outer >> loop. There's no `Phi` for the slice of the sunk `Store` (because >> there's no `Store` left in the inner loop) so no `Phi` is added for >> that slice to the outer loop. As a result, there's a missing anti >> dependency for `Load` of `field` that's before the loop and it can be >> scheduled inside the outer strip mined loop which is incorrect. 
>> >> `test2()` is the same as `test1()` but with a chain of 2 `Store`s. >> >> `test3()` is another variant where a `Store` is left in the inner loop >> after one is sunk out of it so the inner loop still has a `Phi`. As a >> result, the outer loop also gets a `Phi` but it's incorrectly wired as >> the sunk `Store` should be the input along the backedge but is >> not. That one doesn't cause any failure AFAICT. >> >> The fix I propose is some extra logic at expansion of the >> `OuterStripMinedLoop` to handle these corner cases. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - reviews > - Merge branch 'master' into JDK-8356708 > - Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Emanuel Peter > - Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Roberto Castañeda Lozano > - Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Roberto Castañeda Lozano > - test & fix Testing went faster than I thought and did not reveal any issue, looks good, thank you for fixing this! test/hotspot/jtreg/compiler/loopstripmining/TestStoresSunkInOuterStripMinedLoop.java line 81: > 79: } > 80: > 81: // Couple stores sunk in outer loop, no store in inner loop Suggestion: // Multiple stores sunk in outer loop, no store in inner loop test/hotspot/jtreg/compiler/loopstripmining/TestStoresSunkInOuterStripMinedLoop.java line 113: > 111: } > 112: > 113: // Couples stores sunk in outer loop, store in inner loop Suggestion: // Multiple stores sunk in outer loop, store in inner loop ------------- Marked as reviewed by rcastanedalo (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/25717#pullrequestreview-2924671681 PR Review Comment: https://git.openjdk.org/jdk/pull/25717#discussion_r2145011733 PR Review Comment: https://git.openjdk.org/jdk/pull/25717#discussion_r2145012350 From mhaessig at openjdk.org Fri Jun 13 13:05:38 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Fri, 13 Jun 2025 13:05:38 GMT Subject: Integrated: 8354196: C2: reorder and capitalize phase definition In-Reply-To: References: Message-ID: <-QI5-Oa2gWEKoW2wQBK_kLb7Betw0fP34Uc5DnDZ5G4=.738346e7-4ca4-420e-906e-e4353d2e4e30@github.com> On Thu, 12 Jun 2025 11:10:38 GMT, Manuel H?ssig wrote: > This PR performs some cleanup and formatting around the phase definitions in C2: > - the phase descriptions are capitalized according to the [MLA Handbook title case rules](https://en.wikipedia.org/wiki/Title_case#Modern_Language_Association_(MLA)_Handbook), > - the phases are reordered to be more or less in the order of execution or occurrence in the code, > - the definitions in `phasetype.hpp` and `CompilePhase.java` are synced, > - `CompilePhase.java` is aligned for better readability. > > This change was tested on: > - [x] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15605662671) > - [x] tier1 plus some Oracle internal testing This pull request has now been integrated. 
Changeset: b4c4496e Author: Manuel Hässig Committer: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/b4c4496ef8013df25b6368bdebf082d223d6afed Stats: 147 lines in 2 files changed: 28 ins; 27 del; 92 mod 8354196: C2: reorder and capitalize phase definition Reviewed-by: chagedorn, mchevalier ------------- PR: https://git.openjdk.org/jdk/pull/25778 From mli at openjdk.org Fri Jun 13 13:07:27 2025 From: mli at openjdk.org (Hamlin Li) Date: Fri, 13 Jun 2025 13:07:27 GMT Subject: RFR: 8358892: RISC-V: jvm crash when running dacapo sunflow after JDK-8352504 [v11] In-Reply-To: <9TAfBWFOsPTUIJPaUX9xcPqVsXLI6lsQBvcomuVqcQI=.1eb3ec5a-59ac-44ce-a46b-b984dda4605e@github.com> References: <9TAfBWFOsPTUIJPaUX9xcPqVsXLI6lsQBvcomuVqcQI=.1eb3ec5a-59ac-44ce-a46b-b984dda4605e@github.com> Message-ID: > Hi, > Can you help to review this patch? > > Thanks! > > Currently, this issue is only reproducible with Dacapo sunflow. > I tried to construct a simpler jtreg test to reproduce the issue, but can not find a way to do it till now, this task is tracked by https://bugs.openjdk.org/browse/JDK-8359045. > > So, currently I can only verify the code by reviewing it. > Or maybe it's better to leave it until we find the test?
Hamlin Li has updated the pull request incrementally with two additional commits since the last revision: - remove log - refine comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25696/files - new: https://git.openjdk.org/jdk/pull/25696/files/0f5ca7de..7638e272 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25696&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25696&range=09-10 Stats: 73 lines in 3 files changed: 11 ins; 62 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25696.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25696/head:pull/25696 PR: https://git.openjdk.org/jdk/pull/25696 From fyang at openjdk.org Fri Jun 13 13:13:30 2025 From: fyang at openjdk.org (Fei Yang) Date: Fri, 13 Jun 2025 13:13:30 GMT Subject: RFR: 8358892: RISC-V: jvm crash when running dacapo sunflow after JDK-8352504 [v11] In-Reply-To: References: <9TAfBWFOsPTUIJPaUX9xcPqVsXLI6lsQBvcomuVqcQI=.1eb3ec5a-59ac-44ce-a46b-b984dda4605e@github.com> Message-ID: <5bgfxXQ6pnGdxf93cXxMWMZtAJIpxRUr2dFUiFoFmYk=.2977a804-ed5f-42a2-a15e-ec21419e17e2@github.com> On Fri, 13 Jun 2025 13:07:27 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> >> Thanks! >> >> Currently, this issue is only reproducible with Dacapo sunflow. >> I tried to construct a simpler jtreg test to reproduce the issue, but can not find a way to do it till now, this task is tracked by https://bugs.openjdk.org/browse/JDK-8359045. >> >> So, currently I can only verify the code by reviewing it. >> Or maybe it's better to leave it until we find the test? > > Hamlin Li has updated the pull request incrementally with two additional commits since the last revision: > > - remove log > - refine comments src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1276: > 1274: // > 1275: // Set dst (CMoveI (Binary cop (CmpF/D op1 op2)) (Binary dst src)) > 1276: // 1. 
(op1 lt NaN) => true => CMove: dst = src I think you can rename `NaN` to `op2` and add some words before this like: `If one or both inputs to the compare is a NaN,`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25696#discussion_r2145060874 From mli at openjdk.org Fri Jun 13 13:21:13 2025 From: mli at openjdk.org (Hamlin Li) Date: Fri, 13 Jun 2025 13:21:13 GMT Subject: RFR: 8358892: RISC-V: jvm crash when running dacapo sunflow after JDK-8352504 [v12] In-Reply-To: <9TAfBWFOsPTUIJPaUX9xcPqVsXLI6lsQBvcomuVqcQI=.1eb3ec5a-59ac-44ce-a46b-b984dda4605e@github.com> References: <9TAfBWFOsPTUIJPaUX9xcPqVsXLI6lsQBvcomuVqcQI=.1eb3ec5a-59ac-44ce-a46b-b984dda4605e@github.com> Message-ID: > Hi, > Can you help to review this patch? > > Thanks! > > Currently, this issue is only reproducible with Dacapo sunflow. > I tried to construct a simpler jtreg test to reproduce the issue, but can not find a way to do it till now, this task is tracked by https://bugs.openjdk.org/browse/JDK-8359045. > > So, currently I can only verify the code by reviewing it. > Or maybe it's better to leave it until we find the test? 
Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25696/files - new: https://git.openjdk.org/jdk/pull/25696/files/7638e272..d3dff795 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25696&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25696&range=10-11 Stats: 7 lines in 1 file changed: 1 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/25696.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25696/head:pull/25696 PR: https://git.openjdk.org/jdk/pull/25696 From mli at openjdk.org Fri Jun 13 13:23:42 2025 From: mli at openjdk.org (Hamlin Li) Date: Fri, 13 Jun 2025 13:23:42 GMT Subject: RFR: 8358892: RISC-V: jvm crash when running dacapo sunflow after JDK-8352504 [v11] In-Reply-To: <5bgfxXQ6pnGdxf93cXxMWMZtAJIpxRUr2dFUiFoFmYk=.2977a804-ed5f-42a2-a15e-ec21419e17e2@github.com> References: <9TAfBWFOsPTUIJPaUX9xcPqVsXLI6lsQBvcomuVqcQI=.1eb3ec5a-59ac-44ce-a46b-b984dda4605e@github.com> <5bgfxXQ6pnGdxf93cXxMWMZtAJIpxRUr2dFUiFoFmYk=.2977a804-ed5f-42a2-a15e-ec21419e17e2@github.com> Message-ID: <2xJLVF2m3kiKWEhC3w5Y7SKcrP_v2IWhQ7n8emDk8gY=.99193baf-b23d-458d-b53a-3264cdd2acef@github.com> On Fri, 13 Jun 2025 13:09:51 GMT, Fei Yang wrote: >> Hamlin Li has updated the pull request incrementally with two additional commits since the last revision: >> >> - remove log >> - refine comments > > src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1276: > >> 1274: // >> 1275: // Set dst (CMoveI (Binary cop (CmpF/D op1 op2)) (Binary dst src)) >> 1276: // 1. (op1 lt NaN) => true => CMove: dst = src > > I think you can rename `NaN` to `op2` and add some words before this like: `If one or both inputs to the compare is a NaN,`. fixed. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25696#discussion_r2145082415 From kvn at openjdk.org Fri Jun 13 13:49:31 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 13 Jun 2025 13:49:31 GMT Subject: RFR: 8325478: Restructure the macro expansion compiler phase to not include macro elimination [v2] In-Reply-To: <_jMbdLzfV1MOFrUJH7J6-zXWLabQTOTsfb2hWvEL3Kc=.fede59e2-dd81-4128-9b7a-b8c47334a062@github.com> References: <_jMbdLzfV1MOFrUJH7J6-zXWLabQTOTsfb2hWvEL3Kc=.fede59e2-dd81-4128-9b7a-b8c47334a062@github.com> Message-ID: On Wed, 11 Jun 2025 09:48:19 GMT, Saranya Natarajan wrote: >> This changeset restructures the macro expansion phase to not include macro elimination and also adds a flag StressMacroElimination which randomizes macro elimination ordering for stress testing purposes. >> >> Changes: >> - Implemented a method `eliminate_opaque_looplimit_macro_nodes` that removes the functionality for eliminating Opaque and LoopLimit nodes from the `expand_macro_nodes ` method. >> - Introduced compiler phases` PHASE_AFTER_MACRO_ELIMINATION` >> - Added a new Ideal phase for individual macro elimination steps. >> - Implemented the flag `StressMacroElimination`. Added functionality tests for `StressMacroElimination`, similar to previous stress flag `StressMacroExpansion` ([JDK-8317349](https://bugs.openjdk.org/browse/JDK-8317349)). >> >> Below is a sample screenshot (IGV print level 4 ) mainly showing the new phase . >> ![image](https://github.com/user-attachments/assets/16013cd4-6ec6-4939-ac66-33bb03d59cd6) >> >> Questions to reviewers: >> - Is the new macro elimination phase OK, or should we change anything? >> - In `compile.cpp `, `PHASE_ITER_GVN_AFTER_ELIMINATION` follows `PHASE_AFTER_MACRO_ELIMINATION` in the current fix. Should `PHASE_ITER_GVN_AFTER_ELIMINATION` be removed ? >> >> Testing: >> GitHub Actions >> tier1 to tier5 on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. 
>> Tested that thousands of graphs are correctly opened and visualized with IGV using the same test used in ([JDK-8317349](https://bugs.openjdk.org/browse/JDK-8317349)) > > Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: > > addressing review on code style and adding failing Looks good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25682#pullrequestreview-2924905773 From eastigeevich at openjdk.org Fri Jun 13 14:04:42 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Fri, 13 Jun 2025 14:04:42 GMT Subject: RFR: 8359435: AArch64: add support for 8.5 SB instruction Message-ID: Speculation Barrier (SB) instruction can be used instead of a pair of DSB, ISB if supported. It should have better performance than DSB+ISB (https://developer.arm.com/documentation/102825/0100): > ... a DSB+ISB sequence is expected to have a significantly greater impact on performance than an SB ... CPUs supporting it: - Apple M2+ - Neoverse-N2 - Neoverse-V2 Tested: - Gtests passed ------------- Commit messages: - 8359435: AArch64: add support for 8.5 SB instruction Changes: https://git.openjdk.org/jdk/pull/25801/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25801&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8359435 Stats: 297 lines in 3 files changed: 5 ins; 0 del; 292 mod Patch: https://git.openjdk.org/jdk/pull/25801.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25801/head:pull/25801 PR: https://git.openjdk.org/jdk/pull/25801 From eastigeevich at openjdk.org Fri Jun 13 14:04:42 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Fri, 13 Jun 2025 14:04:42 GMT Subject: RFR: 8359435: AArch64: add support for 8.5 SB instruction In-Reply-To: References: Message-ID: On Fri, 13 Jun 2025 14:00:08 GMT, Evgeny Astigeevich wrote: > Speculation Barrier (SB) instruction can be used instead of a pair of DSB, ISB if supported. 
It should have better performance than DSB+ISB (https://developer.arm.com/documentation/102825/0100): > >> ... a DSB+ISB sequence is expected to have a significantly greater impact on performance than an SB ... > > CPUs supporting it: > - Apple M2+ > - Neoverse-N2 > - Neoverse-V2 > > Tested: > - Gtests passed Hi @theRealAph , Could you please take a look? ------------- PR Comment: https://git.openjdk.org/jdk/pull/25801#issuecomment-2970512577 From fyang at openjdk.org Fri Jun 13 14:07:31 2025 From: fyang at openjdk.org (Fei Yang) Date: Fri, 13 Jun 2025 14:07:31 GMT Subject: RFR: 8358892: RISC-V: jvm crash when running dacapo sunflow after JDK-8352504 [v12] In-Reply-To: References: <9TAfBWFOsPTUIJPaUX9xcPqVsXLI6lsQBvcomuVqcQI=.1eb3ec5a-59ac-44ce-a46b-b984dda4605e@github.com> Message-ID: On Fri, 13 Jun 2025 13:21:13 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> >> Thanks! >> >> Currently, this issue is only reproducible with Dacapo sunflow. >> I tried to construct a simpler jtreg test to reproduce the issue, but can not find a way to do it till now, this task is tracked by https://bugs.openjdk.org/browse/JDK-8359045. >> >> So, currently I can only verify the code by reviewing it. >> Or maybe it's better to leave it until we find the test? > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > comments Latest version looks great. Thanks! This should also be backported to jdk25 branch: https://github.com/openjdk/jdk/tree/jdk25 ------------- Marked as reviewed by fyang (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25696#pullrequestreview-2924960589 From aph at openjdk.org Fri Jun 13 14:24:27 2025 From: aph at openjdk.org (Andrew Haley) Date: Fri, 13 Jun 2025 14:24:27 GMT Subject: RFR: 8359435: AArch64: add support for 8.5 SB instruction In-Reply-To: References: Message-ID: On Fri, 13 Jun 2025 14:00:08 GMT, Evgeny Astigeevich wrote: > Speculation Barrier (SB) instruction can be used instead of a pair of DSB, ISB if supported. It should have better performance than DSB+ISB (https://developer.arm.com/documentation/102825/0100): > >> ... a DSB+ISB sequence is expected to have a significantly greater impact on performance than an SB ... > > CPUs supporting it: > - Apple M2+ > - Neoverse-N2 > - Neoverse-V2 > > Tested: > - Gtests passed I think this can wait until we have a use for `SB`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25801#issuecomment-2970563338 From bkilambi at openjdk.org Fri Jun 13 15:20:58 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Fri, 13 Jun 2025 15:20:58 GMT Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v2] In-Reply-To: References: Message-ID: > This patch adds aarch64 backend support for SelectFromTwoVector operation which was recently introduced in VectorAPI. > > It implements this operation using a two table vector lookup instruction - "tbl" which is available only in Neon and SVE2. > > For 128-bit vector length : Neon tbl instruction is generated if UseSVE < 2 and SVE2 "tbl" instruction is generated if UseSVE == 2. > > For > 128-bit vector length : Currently there are no machines which have vector length > 128-bit and support SVE2. For all those machines with vector length > 128-bit and UseSVE < 2, this operation is not supported. The inline expander for this operation would fail and lowered IR will be generated which is a mix of two rearrange and one blend operation. 
> > This patch also adds a boolean "need_load_shuffle" in the inline expander for this operation to test if the platform requires VectorLoadShuffle operation to be generated. Without this, the lowering IR was not being generated on aarch64 and the performance was quite poor. > > Performance numbers with this patch on a 128-bit, SVE2 supporting machine is shown below - > > > Benchmark (size) Mode Cnt Gain > SelectFromBenchmark.selectFromByteVector 1024 thrpt 9 1.43 > SelectFromBenchmark.selectFromByteVector 2048 thrpt 9 1.48 > SelectFromBenchmark.selectFromDoubleVector 1024 thrpt 9 68.55 > SelectFromBenchmark.selectFromDoubleVector 2048 thrpt 9 72.07 > SelectFromBenchmark.selectFromFloatVector 1024 thrpt 9 1.69 > SelectFromBenchmark.selectFromFloatVector 2048 thrpt 9 1.52 > SelectFromBenchmark.selectFromIntVector 1024 thrpt 9 1.50 > SelectFromBenchmark.selectFromIntVector 2048 thrpt 9 1.52 > SelectFromBenchmark.selectFromLongVector 1024 thrpt 9 85.38 > SelectFromBenchmark.selectFromLongVector 2048 thrpt 9 80.93 > SelectFromBenchmark.selectFromShortVector 1024 thrpt 9 1.48 > SelectFromBenchmark.selectFromShortVector 2048 thrpt 9 1.49 > > > Gain column refers to the ratio of thrpt between this patch and the master branch after applying changes in the inline expander. Bhavana Kilambi has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: - Merge master - 8348868: AArch64: Add backend support for SelectFromTwoVector This patch adds aarch64 backend support for SelectFromTwoVector operation which was recently introduced in VectorAPI. It implements this operation using a two table vector lookup instruction - "tbl" which is available only in Neon and SVE2. For 64-bit vector length : Neon tbl instruction is generated for T_SHORT and T_BYTE types only. For 128-bit vector length : Neon tbl instruction is generated if UseSVE < 2 and SVE2 "tbl" instruction is generated if UseSVE == 2. 
For > 128-bit vector length : Currently there are no machines which have vector length > 128-bit and support SVE2. For all those machines with vector length > 128-bit and UseSVE < 2, this operation is not supported. The inline expander for this operation would fail and lowered IR will be generated which is a mix of two rearrange and one blend operation.

This patch also adds a boolean "need_load_shuffle" in the inline expander for this operation to test if the platform requires VectorLoadShuffle operation to be generated. Without this, the lowering IR was not being generated on aarch64 and the performance was quite poor.

Performance numbers with this patch on a 128-bit, SVE2 supporting machine is shown below -

Benchmark                                   (size)  Mode   Cnt  Gain
SelectFromBenchmark.selectFromByteVector    1024    thrpt  9    1.43
SelectFromBenchmark.selectFromByteVector    2048    thrpt  9    1.48
SelectFromBenchmark.selectFromDoubleVector  1024    thrpt  9    68.55
SelectFromBenchmark.selectFromDoubleVector  2048    thrpt  9    72.07
SelectFromBenchmark.selectFromFloatVector   1024    thrpt  9    1.69
SelectFromBenchmark.selectFromFloatVector   2048    thrpt  9    1.52
SelectFromBenchmark.selectFromIntVector     1024    thrpt  9    1.50
SelectFromBenchmark.selectFromIntVector     2048    thrpt  9    1.52
SelectFromBenchmark.selectFromLongVector    1024    thrpt  9    85.38
SelectFromBenchmark.selectFromLongVector    2048    thrpt  9    80.93
SelectFromBenchmark.selectFromShortVector   1024    thrpt  9    1.48
SelectFromBenchmark.selectFromShortVector   2048    thrpt  9    1.49

Gain column refers to the ratio of thrpt between this patch and the master branch after applying changes in the inline expander.
------------- Changes: https://git.openjdk.org/jdk/pull/23570/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23570&range=01 Stats: 313 lines in 9 files changed: 220 ins; 0 del; 93 mod Patch: https://git.openjdk.org/jdk/pull/23570.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23570/head:pull/23570 PR: https://git.openjdk.org/jdk/pull/23570 From bkilambi at openjdk.org Fri Jun 13 15:20:59 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Fri, 13 Jun 2025 15:20:59 GMT Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v2] In-Reply-To: <2DXecFoDHdgQSnZFZ-gqmXRxXz0nU47Eg3clS5_q1bo=.822a4d7c-988c-4f08-ad22-a81cf9fd1484@github.com> References: <-exSdNf1CuxqYL--Mi4-L1m2Gop9bPIvdgqQEpAUIeM=.5f4936a7-31d4-45b7-bddf-e973b3687c18@github.com> <2DXecFoDHdgQSnZFZ-gqmXRxXz0nU47Eg3clS5_q1bo=.822a4d7c-988c-4f08-ad22-a81cf9fd1484@github.com> Message-ID: <07DfDtDNRGryfnlj6Q8rdON7s3QozKKRHCML3janIwc=.e2122cfb-241f-4bc1-a9c4-50b7bb94c63b@github.com> On Thu, 27 Feb 2025 02:04:24 GMT, Xiaohong Gong wrote: >> Thank you for your inputs. I'll look into this. > > Hi @Bhavana-Kilambi , I'v created a new PR https://github.com/openjdk/jdk/pull/23790 to implement the `VectorRearrange` for small lane count vector types like `2D`. I think the implementation is quite same with what we discussed here. Any feedback please let me know. Thanks! Hi @XiaohongGong , I just got back to working on this PR again! I have been trying to implement this operation for Doubles/Longs but the performance is 0.8x that of the default implementation (with two vector rearranges and a vector blend). 
The implementation using `bsl` that I used is given below -

    dup(tmp1, T2D, src1, 0);
    dup(tmp2, T2D, src1, 1);

    mov(tmp3, T2D, 0x01);
    andr(tmp4, T16B, index, tmp3);
    negr(tmp4, T2D, tmp4);
    orr(tmp5, T16B, tmp4, tmp4);

    bsl(tmp4, T16B, tmp2, tmp1);

    dup(tmp1, T2D, src2, 0);
    dup(tmp2, T2D, src2, 1);

    bsl(tmp5, T16B, tmp2, tmp1);

    sshr(dst, T2D, index, 1);
    andr(dst, T16B, dst, tmp3);
    negr(dst, T2D, dst);

    bsl(dst, T16B, tmp5, tmp4);

This is based on the fact that the index vector can only contain values = 0 to 3. If the first bit is 0/1 it refers to the first or second double/long and if the second bit is 0/1 it selects the source (either src1/src2).

    index = 00 -> choose first double/long of src1
            01 -> choose second double/long of src1
            10 -> choose first double/long of src2
            11 -> choose second double/long of src2

I am not able to avoid duplicating the source elements.
Would it be ok if I do not support SelectFromTwoVector for doubles/longs or do you have any suggestion on how I can improve my implementation?

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2123114619 From xgong at openjdk.org Fri Jun 13 15:20:59 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 13 Jun 2025 15:20:59 GMT Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v2] In-Reply-To: <07DfDtDNRGryfnlj6Q8rdON7s3QozKKRHCML3janIwc=.e2122cfb-241f-4bc1-a9c4-50b7bb94c63b@github.com> References: <-exSdNf1CuxqYL--Mi4-L1m2Gop9bPIvdgqQEpAUIeM=.5f4936a7-31d4-45b7-bddf-e973b3687c18@github.com> <2DXecFoDHdgQSnZFZ-gqmXRxXz0nU47Eg3clS5_q1bo=.822a4d7c-988c-4f08-ad22-a81cf9fd1484@github.com> <07DfDtDNRGryfnlj6Q8rdON7s3QozKKRHCML3janIwc=.e2122cfb-241f-4bc1-a9c4-50b7bb94c63b@github.com> Message-ID: On Tue, 3 Jun 2025 08:25:43 GMT, Bhavana Kilambi wrote: >> Hi @Bhavana-Kilambi , I'v created a new PR https://github.com/openjdk/jdk/pull/23790 to implement the `VectorRearrange` for small lane count vector types like `2D`.
I think the implementation is quite same with what we discussed here. Any feedback please let me know. Thanks! > > Hi @XiaohongGong , I just got back to working on this PR again! > I have been trying to implement this operation for Doubles/Longs but the performance is 0.8x that of the default implementation (with two vector rearranges and a vector blend). The implementation using `bsl` that I used is given below - > > > dup(tmp1, T2D, src1, 0); > dup(tmp2, T2D, src1, 1); > > mov(tmp3, T2D, 0x01); > andr(tmp4, T16B, index, tmp3); > negr(tmp4, T2D, tmp4); > orr(tmp5, T16B, tmp4, tmp4); > > bsl(tmp4, T16B, tmp2, tmp1); > > dup(tmp1, T2D, src2, 0); > dup(tmp2, T2D, src2, 1); > > bsl(tmp5, T16B, tmp2, tmp1); > > sshr(dst, T2D, index, 1); > andr(dst, T16B, dst, tmp3); > negr(dst, T2D, dst); > > bsl(dst, T16B, tmp5, tmp4); > > > > This is based on the fact that the index vector can only contain values = 0 to 3. If the first bit is 0/1 it refers to the first or second double/long and if the second bit is 0/1 it selects the source (either src1/src2). > index = 00 -> choose first double/long of src1 > 01 -> choose second double/long of src1 > 10 -> choose first double/long of src2 > 11 -> choose second double/long of src2 > > I am not able to avoid duplicating the source elements. > Would it be ok if I do not support SelectFromTwoVector for doubles/longs or do you have any suggestion on how I can improve my implementation? Oh, I forgot that we have the `blend + rearrange` pattern if this op is not supported directly. Since `VectorRearrange` for 2D have been implemented now, did you check the final codegen of the default pattern? I think we can revisit the codegen first with the default pattern (i.e. `VectorBlend + VectorRearrange + VectorRearrange`), and find whether there is further improvement opportunity for that. If so, we can implement the `SelectFromTwoVectors` op directly based on the improvement point. 
Otherwise, just keep using the default pattern will be fine to me. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2123214296 From bkilambi at openjdk.org Fri Jun 13 15:20:59 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Fri, 13 Jun 2025 15:20:59 GMT Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v2] In-Reply-To: References: <-exSdNf1CuxqYL--Mi4-L1m2Gop9bPIvdgqQEpAUIeM=.5f4936a7-31d4-45b7-bddf-e973b3687c18@github.com> <2DXecFoDHdgQSnZFZ-gqmXRxXz0nU47Eg3clS5_q1bo=.822a4d7c-988c-4f08-ad22-a81cf9fd1484@github.com> <07DfDtDNRGryfnlj6Q8rdON7s3QozKKRHCML3janIwc=.e2122cfb-241f-4bc1-a9c4-50b7bb94c63b@github.com> Message-ID: On Tue, 3 Jun 2025 09:09:21 GMT, Xiaohong Gong wrote: >> Hi @XiaohongGong , I just got back to working on this PR again! >> I have been trying to implement this operation for Doubles/Longs but the performance is 0.8x that of the default implementation (with two vector rearranges and a vector blend). The implementation using `bsl` that I used is given below - >> >> >> dup(tmp1, T2D, src1, 0); >> dup(tmp2, T2D, src1, 1); >> >> mov(tmp3, T2D, 0x01); >> andr(tmp4, T16B, index, tmp3); >> negr(tmp4, T2D, tmp4); >> orr(tmp5, T16B, tmp4, tmp4); >> >> bsl(tmp4, T16B, tmp2, tmp1); >> >> dup(tmp1, T2D, src2, 0); >> dup(tmp2, T2D, src2, 1); >> >> bsl(tmp5, T16B, tmp2, tmp1); >> >> sshr(dst, T2D, index, 1); >> andr(dst, T16B, dst, tmp3); >> negr(dst, T2D, dst); >> >> bsl(dst, T16B, tmp5, tmp4); >> >> >> >> This is based on the fact that the index vector can only contain values = 0 to 3. If the first bit is 0/1 it refers to the first or second double/long and if the second bit is 0/1 it selects the source (either src1/src2). >> index = 00 -> choose first double/long of src1 >> 01 -> choose second double/long of src1 >> 10 -> choose first double/long of src2 >> 11 -> choose second double/long of src2 >> >> I am not able to avoid duplicating the source elements. 
>> Would it be ok if I do not support SelectFromTwoVector for doubles/longs or do you have any suggestion on how I can improve my implementation? > > Oh, I forgot that we have the `blend + rearrange` pattern if this op is not supported directly. Since `VectorRearrange` for 2D have been implemented now, did you check the final codegen of the default pattern? I think we can revisit the codegen first with the default pattern (i.e. `VectorBlend + VectorRearrange + VectorRearrange`), and find whether there is further improvement opportunity for that. If so, we can implement the `SelectFromTwoVectors` op directly based on the improvement point. Otherwise, just keep using the default pattern will be fine to me.

Hi @XiaohongGong , thanks for the idea. I did check the codegen and I saw that the iota vectors were being loaded twice for both the source vectors which I felt could be eliminated. So I created a separate implementation for `SelectFromTwoVector` with the code for both the `VectorRearrange` and `VectorBlend` as shown below -

    lea(rscratch1,
        ExternalAddress(StubRoutines::aarch64::vector_iota_indices() + 48));
    ldrq(tmp1, rscratch1);
    mov(tmp2, T2D, 0x01);
    andr(tmp3, size1, index, tmp2);
    cm(EQ, tmp3, size2, tmp1, tmp3);
    orr(tmp1, T16B, tmp3, tmp3);
    ext(tmp4, size1, src1, src1, 8);
    ext(tmp5, size1, src2, src2, 8);

    cm(GE, dst, size2, tmp2, index);
    bsl(tmp3, size1, src1, tmp4);

    bsl(tmp1, size1, src2, tmp5);

    bsl(dst, size1, tmp3, tmp1);

I have rearranged the instructions and used `tmp5` (I could have reused `tmp4` in the second `ext`) to allow for more ILP.

This implementation is certainly better than my previous implementation by ~23% for `double` and 31% for `long` but the performance is not much different from the default implementation (VectorRearrange + VectorBlend). For `double`, the performance is exactly the same and for `long` it is 0.97x. I collected some perf numbers for the cases with and without this patch.
My implementation certainly executes fewer instructions compared to the default implementation but there is more ILP in the default implementation due to which its performance is either better or the same as my implementation. I feel we can use the default implementation for `doubles` and `longs`? WDYT? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2124204144 From xgong at openjdk.org Fri Jun 13 15:20:59 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 13 Jun 2025 15:20:59 GMT Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector In-Reply-To: References: <03mNhjjP_PvR9nxPUCaIkN5NF--gH7-AMqiHJlAzJW0=.e0e1cd1e-f236-4a6d-b9da-1459eed6077d@github.com> Message-ID: On Thu, 5 Jun 2025 08:03:02 GMT, Bhavana Kilambi wrote: > > Good job @Bhavana-Kilambi ! Generally looks good to me. Just some minor issues that I have left the comments. Besides, could you please add some IR tests for this optimization? Thanks! > > Hi @XiaohongGong , there are tests already for this operation under `jdk/jdk/incubator/vector` for all the types and sizes to verify the results. Did you mean IR tests for verifying if the correct backend match rule is being generated ? Yes, I think adding an IR check test for this operation will be better. I think checking the mid-end IR is enough.
------------- PR Comment: https://git.openjdk.org/jdk/pull/23570#issuecomment-2947606572 From xgong at openjdk.org Fri Jun 13 15:20:59 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 13 Jun 2025 15:20:59 GMT Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v2] In-Reply-To: References: <-exSdNf1CuxqYL--Mi4-L1m2Gop9bPIvdgqQEpAUIeM=.5f4936a7-31d4-45b7-bddf-e973b3687c18@github.com> <2DXecFoDHdgQSnZFZ-gqmXRxXz0nU47Eg3clS5_q1bo=.822a4d7c-988c-4f08-ad22-a81cf9fd1484@github.com> <07DfDtDNRGryfnlj6Q8rdON7s3QozKKRHCML3janIwc=.e2122cfb-241f-4bc1-a9c4-50b7bb94c63b@github.com> Message-ID: On Tue, 3 Jun 2025 15:17:07 GMT, Bhavana Kilambi wrote: >> Oh, I forgot that we have the `blend + rearrange` pattern if this op is not supported directly. Since `VectorRearrange` for 2D have been implemented now, did you check the final codegen of the default pattern? I think we can revisit the codegen first with the default pattern (i.e. `VectorBlend + VectorRearrange + VectorRearrange`), and find whether there is further improvement opportunity for that. If so, we can implement the `SelectFromTwoVectors` op directly based on the improvement point. Otherwise, just keep using the default pattern will be fine to me. > > Hi @XiaohongGong , thanks for the idea. I did check the codegen and I saw that the iota vectors were being loaded twice for both the source vectors which I felt could be eliminated. 
So I created a separate implementation for `SelectFromTwoVector` with the code for both the `VectorRearrange` and `VectorBlend` as show below - > > > lea(rscratch1, > ExternalAddress(StubRoutines::aarch64::vector_iota_indices() + 48)); > ldrq(tmp1, rscratch1); > mov(tmp2, T2D, 0x01); > andr(tmp3, size1, index, tmp2); > cm(EQ, tmp3, size2, tmp1, tmp3); > orr(tmp1, T16B, tmp3, tmp3); > ext(tmp4, size1, src1, src1, 8); > ext(tmp5, size1, src2, src2, 8); > > cm(GE, dst, size2, tmp2, index); > bsl(tmp3, size1, src1, tmp4); > > bsl(tmp1, size1, src2, tmp5); > > bsl(dst, size1, tmp3, tmp1); > > > > I have rearranged the instructions and used `tmp5` (I could have reused `tmp4` in the second `ext`) to allow for more ILP. > > This implementation is certainly better than my previous implementation by ~23% for `double` and 31% for `long` but the performance is not much different from the default implementation (VectorRearrange + VectorBlend). For `double`, the performance is exactly the same and for `long` it is 0.97x. I collected some perf numbers for the cases with and without this patch. My implementation certainly executes fewer instructions compared to the default implementation but there is more ILP in the default implementation due to which it's performance is either better or the same as my implementation. I feel we can use the default implementation for `doubles` and `longs`? WDYT? It's fine to me. Thanks for your testing! Using the mid-end IR pattern looks better that it may have other mid-end optimization opportunities in some case. 
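
For readers following this thread, the lane selection being discussed can be modelled in scalar Java. This is only a sketch under stated assumptions, not the HotSpot code or the Vector API implementation: the class and method names (`SelectFromTwoVectorModel`, `selectFrom`, `rearrange`, `rearrangeAndBlend`) are hypothetical, plain `long[]` arrays stand in for vectors, and out-of-range indices are assumed to wrap into `[0, 2 * lanes)`. It shows that, for two lanes, bit 0 of each index picks the lane and bit 1 picks the source, and that the default pattern mentioned above (two rearranges followed by one blend) yields the same lanes as a direct select.

```java
import java.util.Arrays;

// Hypothetical scalar model; arrays stand in for vector registers.
class SelectFromTwoVectorModel {

    // Direct select: indices in [0, n) read src1, indices in [n, 2n) read src2.
    // Wrapping the index into [0, 2n) is an assumption of this sketch.
    static long[] selectFrom(long[] index, long[] src1, long[] src2) {
        int n = src1.length;
        long[] dst = new long[n];
        for (int i = 0; i < n; i++) {
            int idx = (int) Math.floorMod(index[i], 2L * n);
            dst[i] = idx < n ? src1[idx] : src2[idx - n];
        }
        return dst;
    }

    // Single-source rearrange: only the lane part of the index is used.
    static long[] rearrange(long[] src, long[] index) {
        int n = src.length;
        long[] dst = new long[n];
        for (int i = 0; i < n; i++) {
            dst[i] = src[(int) Math.floorMod(index[i], (long) n)];
        }
        return dst;
    }

    // Default pattern: rearrange both sources, then blend per lane
    // depending on whether the wrapped index falls in the upper half.
    static long[] rearrangeAndBlend(long[] index, long[] src1, long[] src2) {
        int n = src1.length;
        long[] r1 = rearrange(src1, index);
        long[] r2 = rearrange(src2, index);
        long[] dst = new long[n];
        for (int i = 0; i < n; i++) {
            boolean fromSrc2 = Math.floorMod(index[i], 2L * n) >= n;
            dst[i] = fromSrc2 ? r2[i] : r1[i];
        }
        return dst;
    }

    public static void main(String[] args) {
        long[] src1 = {10L, 11L};
        long[] src2 = {20L, 21L};
        // Enumerate every two-lane index combination with values 0..3.
        for (long a = 0; a < 4; a++) {
            for (long b = 0; b < 4; b++) {
                long[] index = {a, b};
                long[] direct = selectFrom(index, src1, src2);
                long[] fallback = rearrangeAndBlend(index, src1, src2);
                if (!Arrays.equals(direct, fallback)) {
                    throw new AssertionError("mismatch for index " + Arrays.toString(index));
                }
            }
        }
        System.out.println("direct select and rearrange+blend agree");
    }
}
```

With `src1 = {10, 11}` and `src2 = {20, 21}`, an index of `{0, 3}` selects the first lane of `src1` and the second lane of `src2`, matching the `00`/`11` rows of the index table quoted earlier in the thread. The model says nothing about performance; it only pins down which lanes each approach is expected to produce.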
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2125241593 From bkilambi at openjdk.org Fri Jun 13 15:20:59 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Fri, 13 Jun 2025 15:20:59 GMT Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector In-Reply-To: <03mNhjjP_PvR9nxPUCaIkN5NF--gH7-AMqiHJlAzJW0=.e0e1cd1e-f236-4a6d-b9da-1459eed6077d@github.com> References: <03mNhjjP_PvR9nxPUCaIkN5NF--gH7-AMqiHJlAzJW0=.e0e1cd1e-f236-4a6d-b9da-1459eed6077d@github.com> Message-ID: On Mon, 31 Mar 2025 09:52:27 GMT, Bhavana Kilambi wrote: >> This patch adds aarch64 backend support for SelectFromTwoVector operation which was recently introduced in VectorAPI. >> >> It implements this operation using a two table vector lookup instruction - "tbl" which is available only in Neon and SVE2. >> >> For 128-bit vector length : Neon tbl instruction is generated if UseSVE < 2 and SVE2 "tbl" instruction is generated if UseSVE == 2. >> >> For > 128-bit vector length : Currently there are no machines which have vector length > 128-bit and support SVE2. For all those machines with vector length > 128-bit and UseSVE < 2, this operation is not supported. The inline expander for this operation would fail and lowered IR will be generated which is a mix of two rearrange and one blend operation. >> >> This patch also adds a boolean "need_load_shuffle" in the inline expander for this operation to test if the platform requires VectorLoadShuffle operation to be generated. Without this, the lowering IR was not being generated on aarch64 and the performance was quite poor. 
>> >> Performance numbers with this patch on a 128-bit, SVE2 supporting machine is shown below - >> >> >> Benchmark (size) Mode Cnt Gain >> SelectFromBenchmark.selectFromByteVector 1024 thrpt 9 1.43 >> SelectFromBenchmark.selectFromByteVector 2048 thrpt 9 1.48 >> SelectFromBenchmark.selectFromDoubleVector 1024 thrpt 9 68.55 >> SelectFromBenchmark.selectFromDoubleVector 2048 thrpt 9 72.07 >> SelectFromBenchmark.selectFromFloatVector 1024 thrpt 9 1.69 >> SelectFromBenchmark.selectFromFloatVector 2048 thrpt 9 1.52 >> SelectFromBenchmark.selectFromIntVector 1024 thrpt 9 1.50 >> SelectFromBenchmark.selectFromIntVector 2048 thrpt 9 1.52 >> SelectFromBenchmark.selectFromLongVector 1024 thrpt 9 85.38 >> SelectFromBenchmark.selectFromLongVector 2048 thrpt 9 80.93 >> SelectFromBenchmark.selectFromShortVector 1024 thrpt 9 1.48 >> SelectFromBenchmark.selectFromShortVector 2048 thrpt 9 1.49 >> >> >> Gain column refers to the ratio of thrpt between this patch and the master branch after applying changes in the inline expander. > > Hello, I would not be able to respond to comments until the next couple months or so due to some urgent tasks at work. Until then, I'd move this PR to draft status so that it would not be closed due to lack of activity. Thank you for the review! > Good job @Bhavana-Kilambi ! Generally looks good to me. Just some minor issues that I have left the comments. Besides, could you please add some IR tests for this optimization? Thanks! Hi @XiaohongGong , there are tests already for this operation under `jdk/jdk/incubator/vector` for all the types and sizes to verify the results. Did you mean IR tests for verifying if the correct backend match rule is being generated ? 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/23570#issuecomment-2943161704

From dlunden at openjdk.org Fri Jun 13 15:45:35 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Fri, 13 Jun 2025 15:45:35 GMT
Subject: RFR: 8325478: Restructure the macro expansion compiler phase to not include macro elimination [v2]
In-Reply-To: <_jMbdLzfV1MOFrUJH7J6-zXWLabQTOTsfb2hWvEL3Kc=.fede59e2-dd81-4128-9b7a-b8c47334a062@github.com>
References: <_jMbdLzfV1MOFrUJH7J6-zXWLabQTOTsfb2hWvEL3Kc=.fede59e2-dd81-4128-9b7a-b8c47334a062@github.com>
Message-ID:

On Wed, 11 Jun 2025 09:48:19 GMT, Saranya Natarajan wrote:

>> This changeset restructures the macro expansion phase to not include macro elimination and also adds a flag StressMacroElimination which randomizes macro elimination ordering for stress testing purposes.
>>
>> Changes:
>> - Implemented a method `eliminate_opaque_looplimit_macro_nodes` that removes the functionality for eliminating Opaque and LoopLimit nodes from the `expand_macro_nodes` method.
>> - Introduced compiler phases `PHASE_AFTER_MACRO_ELIMINATION`
>> - Added a new Ideal phase for individual macro elimination steps.
>> - Implemented the flag `StressMacroElimination`. Added functionality tests for `StressMacroElimination`, similar to previous stress flag `StressMacroExpansion` ([JDK-8317349](https://bugs.openjdk.org/browse/JDK-8317349)).
>>
>> Below is a sample screenshot (IGV print level 4) mainly showing the new phase.
>> ![image](https://github.com/user-attachments/assets/16013cd4-6ec6-4939-ac66-33bb03d59cd6)
>>
>> Questions to reviewers:
>> - Is the new macro elimination phase OK, or should we change anything?
>> - In `compile.cpp`, `PHASE_ITER_GVN_AFTER_ELIMINATION` follows `PHASE_AFTER_MACRO_ELIMINATION` in the current fix. Should `PHASE_ITER_GVN_AFTER_ELIMINATION` be removed?
>>
>> Testing:
>> GitHub Actions
>> tier1 to tier5 on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64.
>> Tested that thousands of graphs are correctly opened and visualized with IGV using the same test used in ([JDK-8317349](https://bugs.openjdk.org/browse/JDK-8317349)) > > Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: > > addressing review on code style and adding failing Nice cleanup! Please see my code comments. > Is the new macro elimination phase OK, or should we change anything? I think it looks reasonable, but please see my code comments. > In compile.cpp , PHASE_ITER_GVN_AFTER_ELIMINATION follows PHASE_AFTER_MACRO_ELIMINATION in the current fix. Should PHASE_ITER_GVN_AFTER_ELIMINATION be removed ? Let's leave it, as there is an IGVN run in between them. src/hotspot/share/opto/compile.cpp line 2535: > 2533: print_method(PHASE_BEFORE_MACRO_EXPANSION, 3); > 2534: PhaseMacroExpand mex(igvn); > 2535: //Last attempt to eliminate macro nodes before expand Suggestion: // Last attempt to eliminate macro nodes before expand src/hotspot/share/opto/macro.cpp line 2468: > 2466: } > 2467: } > 2468: Suggestion: src/hotspot/share/opto/macro.cpp line 2482: > 2480: return; > 2481: } > 2482: refine_strip_mined_loop_macro_nodes(); This call is later compared to before, right? In the previous version of `expand_macro_nodes`, it ran before the call to `eliminate_macro_nodes`. Is it safe to move it in this way? src/hotspot/share/opto/macro.cpp line 2554: > 2552: bool PhaseMacroExpand::expand_macro_nodes() { > 2553: // Do not allow new macro nodes once we started to expand > 2554: C->reset_allow_macro_nodes(); Same here, this call is later compared to before (it is at the top of the old `expand_macro_nodes`, before `eliminate_macro_nodes`). Is this a safe move? src/hotspot/share/opto/macro.hpp line 1: > 1: /* Needs a copyright update! 
src/utils/IdealGraphVisualizer/README.md line 36: > 34: * `N=3`: additionally, after every minor phase > 35: * `N=4`: additionally, after every loop optimization > 36: * `N=5`: additionally, after every effective IGVN, and every macro elimination and expansion step (slow) Suggestion: * `N=5`: additionally, after every effective IGVN, macro elimination, and macro expansion step (slow) ------------- Changes requested by dlunden (Committer). PR Review: https://git.openjdk.org/jdk/pull/25682#pullrequestreview-2925252596 PR Review Comment: https://git.openjdk.org/jdk/pull/25682#discussion_r2145350347 PR Review Comment: https://git.openjdk.org/jdk/pull/25682#discussion_r2145353088 PR Review Comment: https://git.openjdk.org/jdk/pull/25682#discussion_r2145361681 PR Review Comment: https://git.openjdk.org/jdk/pull/25682#discussion_r2145364636 PR Review Comment: https://git.openjdk.org/jdk/pull/25682#discussion_r2145368330 PR Review Comment: https://git.openjdk.org/jdk/pull/25682#discussion_r2145370791 From eastigeevich at openjdk.org Fri Jun 13 16:08:27 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Fri, 13 Jun 2025 16:08:27 GMT Subject: RFR: 8359435: AArch64: add support for 8.5 SB instruction In-Reply-To: References: Message-ID: On Fri, 13 Jun 2025 14:00:08 GMT, Evgeny Astigeevich wrote: > Speculation Barrier (SB) instruction can be used instead of a pair of DSB, ISB if supported. It should have better performance than DSB+ISB (https://developer.arm.com/documentation/102825/0100): > >> ... a DSB+ISB sequence is expected to have a significantly greater impact on performance than an SB ... > > CPUs supporting it: > - Apple M2+ > - Neoverse-N2 > - Neoverse-V2 > > Tested: > - Gtests passed The case is to use it for spin pauses instead of `ISB` on Neoverse-N2/V2. 
There is data showing that SB-based spin pauses are less disruptive than ISB-based ones on them, so performance is better:
- https://github.com/mysql/mysql-server/pull/611
- https://github.com/facebook/folly/pull/2390

There are discussions regarding using it for spin pauses:
- https://github.com/gperftools/gperftools/pull/1594
- https://github.com/haproxy/haproxy/pull/2974

Do you think it is better to have a PR combining this PR and use of SB for spin pauses?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25801#issuecomment-2970848883

From dcubed at openjdk.org Fri Jun 13 16:14:46 2025
From: dcubed at openjdk.org (Daniel D. Daugherty)
Date: Fri, 13 Jun 2025 16:14:46 GMT
Subject: RFR: JDK-8348574 : Simplify c1/c2_globals inclusions
In-Reply-To:
References:
Message-ID:

On Thu, 12 Jun 2025 08:32:56 GMT, Suchismith Roy wrote:

> JBS Issue : [JDK-8348574](https://bugs.openjdk.org/browse/JDK-8348574)
>
> c1_globals.hpp includes c1_globals_pd.hpp. c1_globals_pd.hpp includes the corresponding CPU_HEADER and OS_HEADER files. All of the c1_globals_.hpp files are essentially identical and basically empty. (They just include globalDefinitions.hpp and macros.hpp, and provide nothing additional.)
>
> This could be simplified by having c1_globals.hpp do the CPU_HEADER inclusion directly, and remove c1_globals_pd.hpp and all c1_globals_.hpp files.
>
> Even if there are some non-vacuous c1_globals_.hpp files in the future, c1_globals_pd.hpp seems unwarranted; just add the OS_HEADER include directly in c1_globals.hpp. The c1_globals_pd.hpp files really don't seem worth the extra indirection.
>
> Similarly for c2_globals.hpp &etc.

(Gotta love it when you typo a lable^H^H^H^H^H label).
------------- PR Comment: https://git.openjdk.org/jdk/pull/25773#issuecomment-2970862022 From eastigeevich at openjdk.org Fri Jun 13 16:20:31 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Fri, 13 Jun 2025 16:20:31 GMT Subject: RFR: 8359435: AArch64: add support for 8.5 SB instruction In-Reply-To: References: Message-ID: On Fri, 13 Jun 2025 14:00:08 GMT, Evgeny Astigeevich wrote: > Speculation Barrier (SB) instruction can be used instead of a pair of DSB, ISB if supported. It should have better performance than DSB+ISB (https://developer.arm.com/documentation/102825/0100): > >> ... a DSB+ISB sequence is expected to have a significantly greater impact on performance than an SB ... > > CPUs supporting it: > - Apple M2+ > - Neoverse-N2 > - Neoverse-V2 > > Tested: > - Gtests passed BTW Arm published a post in their blog about different implementations of spin pauses: https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/multi-threaded-applications-arm A high accuracy delay requires FEAT_SB (Armv8.5-A), FEAT_ECV (Armv8.6-A) and FEAT_WFxT (Armv8.7-A). ------------- PR Comment: https://git.openjdk.org/jdk/pull/25801#issuecomment-2970878177 From shade at openjdk.org Fri Jun 13 17:14:29 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 13 Jun 2025 17:14:29 GMT Subject: RFR: 8359435: AArch64: add support for 8.5 SB instruction In-Reply-To: References: Message-ID: On Fri, 13 Jun 2025 14:00:08 GMT, Evgeny Astigeevich wrote: > Speculation Barrier (SB) instruction can be used instead of a pair of DSB, ISB if supported. It should have better performance than DSB+ISB (https://developer.arm.com/documentation/102825/0100): > >> ... a DSB+ISB sequence is expected to have a significantly greater impact on performance than an SB ... > > CPUs supporting it: > - Apple M2+ > - Neoverse-N2 > - Neoverse-V2 > > Tested: > - Gtests passed FWIW, I don't mind the SB assembler support to go under this, separate PR. 
We sometimes do it to split the work in the series of atomic commits, where the commit like this should certainly be non-regressing. The actual use of SB (spin-pauses) can then come under separate RFE, and would require much more work (and have associated risk). So, it would be tad less confusing if we had a dependent RFE for using SB in spin pauses, so it was obvious why do we need it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25801#issuecomment-2970996306 From cslucas at openjdk.org Fri Jun 13 17:35:36 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Fri, 13 Jun 2025 17:35:36 GMT Subject: RFR: 8359064: Expose reason for marking nmethod non-entrant to JVMCI client [v6] In-Reply-To: References: Message-ID: <7bjdVQUAb1lGw-G0tICDh8GB-g_u-VqISmBAyzTwSj8=.8e5afe75-14cc-44f1-b46d-e62a4ffa0586@github.com> On Thu, 12 Jun 2025 20:18:10 GMT, Cesar Soares Lucas wrote: >> We recently introduced a way to set the reason why a nmethod was being marked as `not entrant`, see [here](https://github.com/openjdk/jdk/pull/23980) and [here](https://github.com/openjdk/jdk/pull/25338). >> >> This PR is to expose in the JVMCI interface the reason why the nmethod was flagged as `not entrant`. This will allow JVMCI-based compilers to implement heuristics to handle re-compilations differently based on what happened to earlier versions of a method, for instance, this will likely be used to address this [RFE in Truffle](https://github.com/oracle/graal/issues/11045 >> ). >> >> Tested on Linux x86_64, ARM with JTREG tier 1-3. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Fix some remaining renames. @shipilev , @vnkozlov - do you want take a look on this / any comment? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25706#issuecomment-2971069243 From never at openjdk.org Fri Jun 13 19:00:28 2025 From: never at openjdk.org (Tom Rodriguez) Date: Fri, 13 Jun 2025 19:00:28 GMT Subject: RFR: 8359064: Expose reason for marking nmethod non-entrant to JVMCI client [v6] In-Reply-To: References: Message-ID: On Thu, 12 Jun 2025 20:18:10 GMT, Cesar Soares Lucas wrote: >> We recently introduced a way to set the reason why a nmethod was being marked as `not entrant`, see [here](https://github.com/openjdk/jdk/pull/23980) and [here](https://github.com/openjdk/jdk/pull/25338). >> >> This PR is to expose in the JVMCI interface the reason why the nmethod was flagged as `not entrant`. This will allow JVMCI-based compilers to implement heuristics to handle re-compilations differently based on what happened to earlier versions of a method, for instance, this will likely be used to address this [RFE in Truffle](https://github.com/oracle/graal/issues/11045 >> ). >> >> Tested on Linux x86_64, ARM with JTREG tier 1-3. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Fix some remaining renames. This seems like a very complicated way for Truffle to tell itself that it invalidated some installed code. Couldn't this be done with bookkeeping purely on the Truffle side? A weak map of InstalledCode instances that were invalidated because they were cold would be completely sufficient, wouldn't it? 
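The client-side bookkeeping suggested here could look roughly like the sketch below. It is only an illustration under stated assumptions: `InstalledCodeStandIn` is a hypothetical stand-in for the real JVMCI `InstalledCode` class, `ColdInvalidationLog` and its methods are invented names, and the approach can only record invalidations that the client itself initiates, not ones decided inside HotSpot.

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical client-side bookkeeping; not Truffle or JVMCI code.
class ColdInvalidationLog {

    // Hypothetical stand-in for jdk.vm.ci.code.InstalledCode.
    static final class InstalledCodeStandIn {
        final String name;

        InstalledCodeStandIn(String name) {
            this.name = name;
        }
    }

    // Weak keys: an entry disappears once the installed code object
    // is no longer strongly reachable elsewhere.
    private final Map<InstalledCodeStandIn, String> reasons =
            Collections.synchronizedMap(new WeakHashMap<>());

    // Called by the client at the point where it invalidates the code itself.
    void recordInvalidation(InstalledCodeStandIn code, String reason) {
        reasons.put(code, reason);
    }

    // Returns the recorded reason, or null if this client never recorded one.
    String reasonFor(InstalledCodeStandIn code) {
        return reasons.get(code);
    }

    public static void main(String[] args) {
        ColdInvalidationLog log = new ColdInvalidationLog();
        InstalledCodeStandIn code = new InstalledCodeStandIn("foo()");
        log.recordInvalidation(code, "cold");
        System.out.println(log.reasonFor(code));
    }
}
```

A weak map keeps the log from pinning invalidated code objects in memory, which is presumably why it was suggested over a plain map.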
------------- PR Comment: https://git.openjdk.org/jdk/pull/25706#issuecomment-2971335423 From never at openjdk.org Fri Jun 13 19:08:29 2025 From: never at openjdk.org (Tom Rodriguez) Date: Fri, 13 Jun 2025 19:08:29 GMT Subject: RFR: 8359064: Expose reason for marking nmethod non-entrant to JVMCI client [v6] In-Reply-To: References: Message-ID: On Thu, 12 Jun 2025 20:18:10 GMT, Cesar Soares Lucas wrote: >> We recently introduced a way to set the reason why a nmethod was being marked as `not entrant`, see [here](https://github.com/openjdk/jdk/pull/23980) and [here](https://github.com/openjdk/jdk/pull/25338). >> >> This PR is to expose in the JVMCI interface the reason why the nmethod was flagged as `not entrant`. This will allow JVMCI-based compilers to implement heuristics to handle re-compilations differently based on what happened to earlier versions of a method, for instance, this will likely be used to address this [RFE in Truffle](https://github.com/oracle/graal/issues/11045 >> ). >> >> Tested on Linux x86_64, ARM with JTREG tier 1-3. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Fix some remaining renames. src/hotspot/share/jvmci/jvmciCompilerToVM.cpp line 1409: > 1407: C2V_VMENTRY(void, invalidateHotSpotNmethod, (JNIEnv* env, jobject, jobject hs_nmethod, jboolean deoptimize, jint invalidation_reason)) > 1408: JVMCIObject nmethod_mirror = JVMCIENV->wrap(hs_nmethod); > 1409: JVMCIENV->invalidate_nmethod_mirror(nmethod_mirror, deoptimize, static_cast(invalidation_reason), JVMCI_CHECK); There should probably be a check here that `invalidation_reason` is in the range of `nmethod::InvalidationReason`. 
src/hotspot/share/jvmci/jvmciRuntime.cpp line 818: > 816: HotSpotJVMCI::InstalledCode::set_address(jvmciEnv, nmethod_mirror, 0); > 817: HotSpotJVMCI::InstalledCode::set_entryPoint(jvmciEnv, nmethod_mirror, 0); > 818: HotSpotJVMCI::HotSpotNmethod::set_invalidationReason(jvmciEnv, nmethod_mirror, static_cast(invalidation_reason)); I think you're in danger of overwriting the invalidationReason here. `invalidate_nmethod_mirror` can be called for unloaded nmethods after they have been made not entrant. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25706#discussion_r2145899316 PR Review Comment: https://git.openjdk.org/jdk/pull/25706#discussion_r2145896972 From cslucas at openjdk.org Fri Jun 13 19:19:31 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Fri, 13 Jun 2025 19:19:31 GMT Subject: RFR: 8359064: Expose reason for marking nmethod non-entrant to JVMCI client [v6] In-Reply-To: References: Message-ID: On Fri, 13 Jun 2025 18:57:43 GMT, Tom Rodriguez wrote: >> Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix some remaining renames. > > This seems like a very complicated way for Truffle to tell itself that it invalidated some installed code. Couldn't this be done with bookkeeping purely on the Truffle side? A weak map of InstalledCode instances that were invalidated because they were cold would be completely sufficient, wouldn't it? Thank you for commenting @tkrodriguez . > This seems like a very complicated way for Truffle to tell itself that it invalidated some installed code. If by "it" you mean Truffle then the answer is no. The main point of this change is to communicate to Truffle when something in the HotSpot side invalidated the code, _in particular_ [I'm interested](https://github.com/oracle/graal/issues/11045) in the situation when the heuristics that monitor the code cache caused the eviction of a nmethod because it considered it _cold_. 
Truffle itself uses a different concept of hot/cold method... ------------- PR Comment: https://git.openjdk.org/jdk/pull/25706#issuecomment-2971400634 From never at openjdk.org Fri Jun 13 20:07:58 2025 From: never at openjdk.org (Tom Rodriguez) Date: Fri, 13 Jun 2025 20:07:58 GMT Subject: RFR: 8359064: Expose reason for marking nmethod non-entrant to JVMCI client [v6] In-Reply-To: References: Message-ID: On Thu, 12 Jun 2025 20:18:10 GMT, Cesar Soares Lucas wrote: >> We recently introduced a way to set the reason why a nmethod was being marked as `not entrant`, see [here](https://github.com/openjdk/jdk/pull/23980) and [here](https://github.com/openjdk/jdk/pull/25338). >> >> This PR is to expose in the JVMCI interface the reason why the nmethod was flagged as `not entrant`. This will allow JVMCI-based compilers to implement heuristics to handle re-compilations differently based on what happened to earlier versions of a method, for instance, this will likely be used to address this [RFE in Truffle](https://github.com/oracle/graal/issues/11045 >> ). >> >> Tested on Linux x86_64, ARM with JTREG tier 1-3. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Fix some remaining renames. Sorry for the confusion on my part. The lack of a PR that's consuming these changes makes it harder to know which parts are the important ones. src/hotspot/share/code/nmethod.cpp line 2125: > 2123: JVMCINMethodData* nmethod_data = jvmci_nmethod_data(); > 2124: if (nmethod_data != nullptr) { > 2125: nmethod_data->invalidate_nmethod_mirror(this, is_cold() ? nmethod::InvalidationReason::GC_UNLINKING_COLD : nmethod::InvalidationReason::GC_UNLINKING); So then this is the heart of what you're after? Maybe `GC_UNLINKING_COLD` should be `COLD_UNLOADING`? GC is doing it but it's not for GC reasons. `GC_UNLINKING` might better be `GC_UNLOADING`. I don't think the `nmethod::unlink` name is a good one to propagate into enum names. 
`is_cold` isn't completely reliable here as a method could both have dead oops and be cold. I guess that probably doesn't matter though.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25706#issuecomment-2971527074
PR Review Comment: https://git.openjdk.org/jdk/pull/25706#discussion_r2146005973

From cslucas at openjdk.org Fri Jun 13 20:41:33 2025
From: cslucas at openjdk.org (Cesar Soares Lucas)
Date: Fri, 13 Jun 2025 20:41:33 GMT
Subject: RFR: 8359064: Expose reason for marking nmethod non-entrant to JVMCI client [v6]
In-Reply-To:
References:
Message-ID:

On Fri, 13 Jun 2025 20:02:58 GMT, Tom Rodriguez wrote:

>> Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Fix some remaining renames.
>
> src/hotspot/share/code/nmethod.cpp line 2125:
>
>> 2123: JVMCINMethodData* nmethod_data = jvmci_nmethod_data();
>> 2124: if (nmethod_data != nullptr) {
>> 2125: nmethod_data->invalidate_nmethod_mirror(this, is_cold() ? nmethod::InvalidationReason::GC_UNLINKING_COLD : nmethod::InvalidationReason::GC_UNLINKING);
>
> So then this is the heart of what you're after? Maybe `GC_UNLINKING_COLD` should be `COLD_UNLOADING`? GC is doing it but it's not for GC reasons. `GC_UNLINKING` might better be `GC_UNLOADING`. I don't think the `nmethod::unlink` name is a good one to propagate into enum names.
>
> `is_cold` isn't completely reliable here as a method could both have dead oops and be cold. I guess that probably doesn't matter though.

I'll do the renaming, good point there.

> is_cold isn't completely reliable here as a method could both have dead oops and be cold. I guess that probably doesn't matter though.

I don't think that matters: since the nmethod is cold right now, it could have been eliminated for that reason at any moment.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25706#discussion_r2146059210 From fjiang at openjdk.org Sat Jun 14 02:50:38 2025 From: fjiang at openjdk.org (Feilong Jiang) Date: Sat, 14 Jun 2025 02:50:38 GMT Subject: RFR: 8358892: RISC-V: jvm crash when running dacapo sunflow after JDK-8352504 [v12] In-Reply-To: References: <9TAfBWFOsPTUIJPaUX9xcPqVsXLI6lsQBvcomuVqcQI=.1eb3ec5a-59ac-44ce-a46b-b984dda4605e@github.com> Message-ID: On Fri, 13 Jun 2025 13:21:13 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> >> Thanks! >> >> Currently, this issue is only reproducible with Dacapo sunflow. >> I tried to construct a simpler jtreg test to reproduce the issue, but can not find a way to do it till now, this task is tracked by https://bugs.openjdk.org/browse/JDK-8359045. >> >> So, currently I can only verify the code by reviewing it. >> Or maybe it's better to leave it until we find the test? > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > comments Looks good! ------------- Marked as reviewed by fjiang (Committer). PR Review: https://git.openjdk.org/jdk/pull/25696#pullrequestreview-2927049565 From aph at openjdk.org Sat Jun 14 15:56:34 2025 From: aph at openjdk.org (Andrew Haley) Date: Sat, 14 Jun 2025 15:56:34 GMT Subject: RFR: 8359435: AArch64: add support for 8.5 SB instruction In-Reply-To: References: Message-ID: On Fri, 13 Jun 2025 16:06:08 GMT, Evgeny Astigeevich wrote: > Do you think it is better to have a PR combining this PR and use of SB for spin pauses? Yes, definitely, otherwise we're pushing dead code. Thanks. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25801#issuecomment-2972844462 From aph at openjdk.org Sat Jun 14 16:03:30 2025 From: aph at openjdk.org (Andrew Haley) Date: Sat, 14 Jun 2025 16:03:30 GMT Subject: RFR: 8359435: AArch64: add support for 8.5 SB instruction In-Reply-To: References: Message-ID: On Fri, 13 Jun 2025 17:11:20 GMT, Aleksey Shipilev wrote: > So, it would be tad less confusing if we had a dependent RFE for using SB in spin pauses, so it was obvious why do we need it. Huh? The least confusing is when the SB support goes in the PR where it is used. That really is obvious, without any dependency chain. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25801#issuecomment-2972847134 From cslucas at openjdk.org Sun Jun 15 01:00:33 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Sun, 15 Jun 2025 01:00:33 GMT Subject: RFR: 8359064: Expose reason for marking nmethod non-entrant to JVMCI client [v6] In-Reply-To: References: Message-ID: On Fri, 13 Jun 2025 19:03:54 GMT, Tom Rodriguez wrote: >> Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix some remaining renames. > > src/hotspot/share/jvmci/jvmciRuntime.cpp line 818: > >> 816: HotSpotJVMCI::InstalledCode::set_address(jvmciEnv, nmethod_mirror, 0); >> 817: HotSpotJVMCI::InstalledCode::set_entryPoint(jvmciEnv, nmethod_mirror, 0); >> 818: HotSpotJVMCI::HotSpotNmethod::set_invalidationReason(jvmciEnv, nmethod_mirror, static_cast(invalidation_reason)); > > I think you're in danger of overwriting the invalidationReason here. `invalidate_nmethod_mirror` can be called for unloaded nmethods after they have been made not entrant. I see, thank you for the catch! I'm not sure how I can detect this case. My first idea was to just check if the "invalidationReason" field is already set or not, but looks like in order to do that I'd need to wrap the `nmethod_mirror` in a `JVMCIObject`. 
Unfortunately, there is a comment just a few lines above this change saying to _not_ do this wrapping. I'm not quite familiar with all nmethod state transitions. Would it be sufficient to just add a check for whether the nmethod is already non-entrant on line 818?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25706#discussion_r2147389125

From thartmann at openjdk.org Sun Jun 15 09:08:32 2025
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Sun, 15 Jun 2025 09:08:32 GMT
Subject: [jdk25] RFR: 8357782: JVM JIT Causes Static Initialization Order Issue
In-Reply-To:
References:
Message-ID: <64qNpwRFpvfzI6d-HHXJLlIpTJySMca3hxFz1qmwukg=.294a3fe2-cefe-4ad2-b012-1589087cc251@github.com>

On Fri, 13 Jun 2025 11:20:44 GMT, Tobias Hartmann wrote:

> Hi all,
>
> This pull request contains a backport of commit [e8ef93ae](https://github.com/openjdk/jdk/commit/e8ef93ae9de624f25166bdf010c915672b2c5cf4) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository.
>
> The commit being backported was authored by Manuel Hässig on 13 Jun 2025 and was reviewed by Tobias Hartmann, Dean Long and Damon Fenacci.
>
> Thanks!

Thanks Aleksey!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25798#issuecomment-2973605194

From thartmann at openjdk.org Sun Jun 15 09:08:33 2025
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Sun, 15 Jun 2025 09:08:33 GMT
Subject: [jdk25] Integrated: 8357782: JVM JIT Causes Static Initialization Order Issue
In-Reply-To:
References:
Message-ID:

On Fri, 13 Jun 2025 11:20:44 GMT, Tobias Hartmann wrote:

> Hi all,
>
> This pull request contains a backport of commit [e8ef93ae](https://github.com/openjdk/jdk/commit/e8ef93ae9de624f25166bdf010c915672b2c5cf4) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository.
>
> The commit being backported was authored by Manuel Hässig on 13 Jun 2025 and was reviewed by Tobias Hartmann, Dean Long and Damon Fenacci.
>
> Thanks!

This pull request has now been integrated.
Changeset: 3bd80fe3 Author: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/3bd80fe3bab41135e9362c915862e150942f94dd Stats: 84 lines in 4 files changed: 82 ins; 0 del; 2 mod 8357782: JVM JIT Causes Static Initialization Order Issue Reviewed-by: shade Backport-of: e8ef93ae9de624f25166bdf010c915672b2c5cf4 ------------- PR: https://git.openjdk.org/jdk/pull/25798 From fyang at openjdk.org Mon Jun 16 03:36:33 2025 From: fyang at openjdk.org (Fei Yang) Date: Mon, 16 Jun 2025 03:36:33 GMT Subject: RFR: 8359270: C2: alignment check should consider base offset when emitting arraycopy runtime call [v2] In-Reply-To: References: Message-ID: > Hi, please consider this change fixing alignment check when emitting arraycopy runtime call. > > There are currently four callsites of `StubRoutines::select_arraycopy_function` in hotspot C2 shared code where we emit arraycopy runtime calls [1-4]. Three of them [2-4] missed base offset when calculation alignment for both source and destination array addresses. Seems they assume a base offset of 8 bytes, which is not always true. Base offset becomes 4 bytes under either `-XX:+UseCompactObjectHeaders` or `-XX:-UseCompressedClassPointers`. > > And `StubRoutines::select_arraycopy_function` selects the right arraycopy runtime call based on this alignment. As a result, it could see an incorrect `aligned` param about the array addresses and thus a wrong arraycopy runtime call is selected. This is causing performance regressions (like Dacapo Spring) on some linux-riscv64 platforms like Sifive Unmatched or Premier P550 SBCs where misaligned memory access is very slow. > > Proposed change fixes this issue by taking base offset into account when checking the alignment, which is very similar to [1]. 
> > Testing: > - [x] Tier1-3 on linux-aarch64 (release & fastdebug) > - [x] Tier1-3 on linux-riscv64 (release) > - [x] Dacapo spring performance test on linux-riscv64 (w/wo `-XX:+UseCompactObjectHeaders` / `-XX:-UseCompressedClassPointers`) > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/macroArrayCopy.cpp#L341 > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/library_call.cpp#L1584 > [3] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/library_call.cpp#L1666 > [4] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/stringopts.cpp#L1481 Fei Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - add test - Merge remote-tracking branch 'upstream/master' into JDK-8359270 - 8359270: C2: alignment check should consider base offset when emitting arraycopy runtime call ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25765/files - new: https://git.openjdk.org/jdk/pull/25765/files/62218432..581f587a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25765&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25765&range=00-01 Stats: 10192 lines in 182 files changed: 8348 ins; 1025 del; 819 mod Patch: https://git.openjdk.org/jdk/pull/25765.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25765/head:pull/25765 PR: https://git.openjdk.org/jdk/pull/25765 From fyang at openjdk.org Mon Jun 16 03:46:33 2025 From: fyang at openjdk.org (Fei Yang) Date: Mon, 16 Jun 2025 03:46:33 GMT Subject: RFR: 8359270: C2: alignment check should consider base offset when emitting arraycopy runtime call In-Reply-To: References: Message-ID: <80GKJ8hRLgdGTgI4hiF067aG0QDaUBjAC-FbLZ2RlZ4=.c1670a81-9925-461c-8179-611a31fe69b1@github.com> On Fri, 13 Jun 2025 08:26:45 GMT, Tobias Hartmann wrote: > But 
`StubRoutines::select_arraycopy_function` also sets `copyfunc_name` which is printed for the `CallNode` and should therefore be matchable by the IR framework, right?

Yes. I missed that `StubRoutines::select_arraycopy_function` would return a `copyfunc_name`. I have added a test to cover the changes for all three callsites. Verified on three different platforms (x64, aarch64 and riscv64). With a fastdebug build, the test passes with this change and fails without it. Please take another look. Thanks for the suggestion!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25765#issuecomment-2975012411

From amitkumar at openjdk.org Mon Jun 16 04:07:30 2025
From: amitkumar at openjdk.org (Amit Kumar)
Date: Mon, 16 Jun 2025 04:07:30 GMT
Subject: RFR: 8358694: VM asserts if CodeCacheSegmentSize is not a power of 2 [v2]
In-Reply-To:
References: <_SfCjE8oyyD-grL6AAih23j1Qx7fbYefTVzl-BU5N2k=.7bd5d419-4127-4e78-926e-fece25a7d914@github.com>
Message-ID:

On Wed, 11 Jun 2025 15:02:12 GMT, Damon Fenacci wrote:

>> Amit Kumar has updated the pull request incrementally with two additional commits since the last revision:
>>
>> - fix
>> - move the changes in flag constraints specific file
>
> I might be a bit picky (sorry for that) but since the flag was triggering a crash I was wondering if we could have a small regression test to make sure the VM never crashes (possibly checking the error as well).

@dafedafe can you have a look again? I have added the test case.
------------- PR Comment: https://git.openjdk.org/jdk/pull/25708#issuecomment-2975036638 From jkarthikeyan at openjdk.org Mon Jun 16 05:32:19 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Mon, 16 Jun 2025 05:32:19 GMT Subject: RFR: 8350177: C2 SuperWord: Integer.numberOfLeadingZeros, numberOfTrailingZeros, reverse and bitCount have input types wrongly turncated for byte and short [v3] In-Reply-To: References: Message-ID: > Hi all, > This patch fixes cases in SuperWord when compiling subword types where vectorized code would be given a narrower type than expected, leading to miscompilation due to truncation. This fix is a generalization of the same fix applied for `Integer.reverseBytes` in [JDK-8305324](https://bugs.openjdk.org/browse/JDK-8305324). The patch introduces a check for nodes that are known to tolerate truncation, so that any future cases of subword truncation will avoid creating miscompiled code. > > The patch reuses the existing logic to set the type of the vectors to int, which currently disables vectorization for the affected patterns entirely. Once [JDK-8342095](https://bugs.openjdk.org/browse/JDK-8342095) is merged and automatic casting support is added the autovectorizer should automatically insert casts to and from int, maintaining correctness. > > I've added an IR test that checks for correctly compiled outputs. Thoughts and reviews would be appreciated! 
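To make the truncation hazard described above concrete, here is a minimal Java sketch. It is not taken from the patch: the class and method names are hypothetical, and `nlzInByteLane` only models what an 8-bit vector lane would compute if the operation were narrowed to byte width. It contrasts an operation that does not tolerate truncation (`Integer.numberOfLeadingZeros`) with one that does (addition).

```java
class SubwordTruncationDemo {
    // Java semantics: the byte is sign-extended to int before the call.
    static int nlzAsInt(byte b) {
        return Integer.numberOfLeadingZeros(b);
    }

    // Hypothetical model of an 8-bit lane: count leading zeros within 8 bits only.
    static int nlzInByteLane(byte b) {
        return Integer.numberOfLeadingZeros(b & 0xFF) - 24;
    }

    // Addition tolerates truncation: the low 8 bits of the result depend
    // only on the low 8 bits of the inputs.
    static byte addViaInt(byte a, byte b) {
        return (byte) (a + b);
    }

    // Same addition, but starting from the zero-extended low 8 bits only.
    static byte addInByteLane(byte a, byte b) {
        return (byte) ((a & 0xFF) + (b & 0xFF));
    }

    public static void main(String[] args) {
        byte one = 1;
        System.out.println(nlzAsInt(one));      // 31
        System.out.println(nlzInByteLane(one)); // 7
        byte x = 100;
        System.out.println(addViaInt(x, x) == addInByteLane(x, x)); // true
    }
}
```

The two leading-zero counts disagree for the same input, which is the kind of mismatch that would surface as a miscompilation if a vector were given a byte type for this operation, whereas the two additions always agree.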
Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: Add assert for unexpected node in truncation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25440/files - new: https://git.openjdk.org/jdk/pull/25440/files/da692994..e2ab39c4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25440&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25440&range=01-02 Stats: 24 lines in 1 file changed: 24 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25440.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25440/head:pull/25440 PR: https://git.openjdk.org/jdk/pull/25440 From jkarthikeyan at openjdk.org Mon Jun 16 05:32:19 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Mon, 16 Jun 2025 05:32:19 GMT Subject: RFR: 8350177: C2 SuperWord: Integer.numberOfLeadingZeros, numberOfTrailingZeros, reverse and bitCount have input types wrongly turncated for byte and short In-Reply-To: References: Message-ID: On Wed, 28 May 2025 07:46:12 GMT, Emanuel Peter wrote: >> Hi all, >> This patch fixes cases in SuperWord when compiling subword types where vectorized code would be given a narrower type than expected, leading to miscompilation due to truncation. This fix is a generalization of the same fix applied for `Integer.reverseBytes` in [JDK-8305324](https://bugs.openjdk.org/browse/JDK-8305324). The patch introduces a check for nodes that are known to tolerate truncation, so that any future cases of subword truncation will avoid creating miscompiled code. >> >> The patch reuses the existing logic to set the type of the vectors to int, which currently disables vectorization for the affected patterns entirely. Once [JDK-8342095](https://bugs.openjdk.org/browse/JDK-8342095) is merged and automatic casting support is added the autovectorizer should automatically insert casts to and from int, maintaining correctness. 
>> >> I've added an IR test that checks for correctly compiled outputs. Thoughts and reviews would be appreciated! > > And just for good measure: should we also add tests for `char`? @eme64 Thank for taking another look! I've pushed a commit that adds an assert so that we can detect when unexpected nodes pass through the truncation filter. BTW, how did the testing go? ------------- PR Comment: https://git.openjdk.org/jdk/pull/25440#issuecomment-2975152572 From jkarthikeyan at openjdk.org Mon Jun 16 05:32:19 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Mon, 16 Jun 2025 05:32:19 GMT Subject: RFR: 8350177: C2 SuperWord: Integer.numberOfLeadingZeros, numberOfTrailingZeros, reverse and bitCount have input types wrongly turncated for byte and short [v2] In-Reply-To: References: Message-ID: On Tue, 10 Jun 2025 05:18:19 GMT, Emanuel Peter wrote: >> I think this is an interesting point. My concern is that this might cause unexpected assert failures to occur in rare cases, which could create a large bug tail. The behavior in the non-truncation case should not cause miscompiles so I'm not sure that a general assert is warranted. The one outlying case is that unexpected `Type` nodes can cause the later `vt = container_type(in);` line to produce an invalid result since it uses the bottom type, so I think an assert for that specifically could be useful. > >> My concern is that this might cause unexpected assert failures to occur in rare cases, which could create a large bug tail. > > If it is only an assert, and you return the "conservative" answer in product, then it only affects debug. This means we don't have to be too worried about it when the assert fails. And it gives us a chance to look at that new node, and decide if truncation is ok or not. A good comment at the site of the assert would make fixing those assert bugs quite easy. It would be an assert that ensures we get better performance because we vectorize. 
I think this is easier than having to write IR tests for all possible nodes with all possible truncations, that would be the alternative I suppose. But with IR tests we might always miss an operation. That makes sense. I didn't consider that we can use it to find optimizations/regressions easier, I think it's worthwhile to have an assert for it. I've pushed a commit that adds it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25440#discussion_r2149072841 From epeter at openjdk.org Mon Jun 16 06:11:33 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 16 Jun 2025 06:11:33 GMT Subject: RFR: 8350177: C2 SuperWord: Integer.numberOfLeadingZeros, numberOfTrailingZeros, reverse and bitCount have input types wrongly turncated for byte and short [v3] In-Reply-To: References: Message-ID: <3hS55vekJ3n3KqQeHEW0P__Gvp2Z76az1MtgBxL2uHU=.6a51c235-b5fc-4e85-b73b-cf8db4539cd8@github.com> On Mon, 16 Jun 2025 05:32:19 GMT, Jasmine Karthikeyan wrote: >> Hi all, >> This patch fixes cases in SuperWord when compiling subword types where vectorized code would be given a narrower type than expected, leading to miscompilation due to truncation. This fix is a generalization of the same fix applied for `Integer.reverseBytes` in [JDK-8305324](https://bugs.openjdk.org/browse/JDK-8305324). The patch introduces a check for nodes that are known to tolerate truncation, so that any future cases of subword truncation will avoid creating miscompiled code. >> >> The patch reuses the existing logic to set the type of the vectors to int, which currently disables vectorization for the affected patterns entirely. Once [JDK-8342095](https://bugs.openjdk.org/browse/JDK-8342095) is merged and automatic casting support is added the autovectorizer should automatically insert casts to and from int, maintaining correctness. >> >> I've added an IR test that checks for correctly compiled outputs. Thoughts and reviews would be appreciated! 
> > Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: > > Add assert for unexpected node in truncation Thanks for the updates @jaskarth ! I have some more minor suggestions now. Testing was all good, but we'll have to re-run it with all the new asserts once you have addressed my suggestions :) src/hotspot/share/opto/superword.cpp line 2516: > 2514: case Op_XorI: > 2515: return true; > 2516: } Suggestion: // Can be truncated: switch (opc) { case Op_AddI: case Op_SubI: case Op_MulI: case Op_AndI: case Op_OrI: case Op_XorI: return true; } src/hotspot/share/opto/superword.cpp line 2521: > 2519: if (VectorNode::is_shift_opcode(opc)) { > 2520: return false; > 2521: } Is right shift also not truncatable? Can you add a comment why? src/hotspot/share/opto/superword.cpp line 2536: > 2534: case Op_CountLeadingZerosI: > 2535: case Op_CountTrailingZerosI: > 2536: break; Suggestion: switch (opc) { // Cannot be truncated: case Op_AbsI: case Op_DivI: case Op_MinI: case Op_MaxI: case Op_CMoveI: case Op_RotateRight: case Op_RotateLeft: case Op_PopCountI: case Op_ReverseBytesI: case Op_ReverseI: case Op_CountLeadingZerosI: case Op_CountTrailingZerosI: return false; Here you did a break in assert. For consistency, let's make it a return. I would also understand if you did not want to have any return in the assert code, to make sure product and debug do not have different behavior. Up to you. src/hotspot/share/opto/superword.cpp line 2538: > 2536: break; > 2537: default: > 2538: assert(false, "Unexpected node: %s", NodeClassNames[in->Opcode()]); Suggestion: // If this assert it hit, that means that we need to determine if the node can be safely truncated, // and then add it to the list of truncatable nodes or the list of non-truncatable ones just above. // In product, we just return false, which is always correct. 
assert(false, "Unexpected node: %s", NodeClassNames[in->Opcode()]); ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25440#pullrequestreview-2930728433 PR Review Comment: https://git.openjdk.org/jdk/pull/25440#discussion_r2149105906 PR Review Comment: https://git.openjdk.org/jdk/pull/25440#discussion_r2149107187 PR Review Comment: https://git.openjdk.org/jdk/pull/25440#discussion_r2149108667 PR Review Comment: https://git.openjdk.org/jdk/pull/25440#discussion_r2149114689 From epeter at openjdk.org Mon Jun 16 06:11:33 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 16 Jun 2025 06:11:33 GMT Subject: RFR: 8350177: C2 SuperWord: Integer.numberOfLeadingZeros, numberOfTrailingZeros, reverse and bitCount have input types wrongly turncated for byte and short [v3] In-Reply-To: <3hS55vekJ3n3KqQeHEW0P__Gvp2Z76az1MtgBxL2uHU=.6a51c235-b5fc-4e85-b73b-cf8db4539cd8@github.com> References: <3hS55vekJ3n3KqQeHEW0P__Gvp2Z76az1MtgBxL2uHU=.6a51c235-b5fc-4e85-b73b-cf8db4539cd8@github.com> Message-ID: On Mon, 16 Jun 2025 05:59:00 GMT, Emanuel Peter wrote: >> Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: >> >> Add assert for unexpected node in truncation > > src/hotspot/share/opto/superword.cpp line 2521: > >> 2519: if (VectorNode::is_shift_opcode(opc)) { >> 2520: return false; >> 2521: } > > Is right shift also not truncatable? Can you add a comment why? Here you did a return in assert. > src/hotspot/share/opto/superword.cpp line 2538: > >> 2536: break; >> 2537: default: >> 2538: assert(false, "Unexpected node: %s", NodeClassNames[in->Opcode()]); > > Suggestion: > > // If this assert it hit, that means that we need to determine if the node can be safely truncated, > // and then add it to the list of truncatable nodes or the list of non-truncatable ones just above. > // In product, we just return false, which is always correct. 
> assert(false, "Unexpected node: %s", NodeClassNames[in->Opcode()]); I'm fairly sure that we will hit this assert with a fuzzer or some other RFE soon, and then it will be nice to know quickly what kind of failure we have here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25440#discussion_r2149109714 PR Review Comment: https://git.openjdk.org/jdk/pull/25440#discussion_r2149115247 From epeter at openjdk.org Mon Jun 16 06:13:32 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 16 Jun 2025 06:13:32 GMT Subject: RFR: 8351645: C2: Assertion failures in Expand/CompressBits idealizations with TOP [v2] In-Reply-To: References: Message-ID: On Wed, 11 Jun 2025 10:22:13 GMT, Jatin Bhateja wrote: >> Bugfix patch adds missing safe type access checks in Expand/Compress Ideal transforms. Problem occurs during IGVN cleanups after partial peeling of loop. >> >> Test mentioned in the bug report has been included along with the patch. >> >> Kindly review. >> >> Best Regards, >> Jatin > > Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > 8351645: C2: ExpandBitsNode::Ideal hits assert because of TOP input @jatin-bhateja Thanks for the ping! The tests are all passing, perfect :) Thank you for your work, approved! ------------- Marked as reviewed by epeter (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/25586#pullrequestreview-2930750944 From dfenacci at openjdk.org Mon Jun 16 06:13:34 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 16 Jun 2025 06:13:34 GMT Subject: RFR: 8358129: compiler/startup/StartupOutput.java runs into out of memory on Windows after JDK-8347406 In-Reply-To: <-WFYyJVFxG0nhBwCJuRcpMZhBUtba6Nf1dHVrhNfFxU=.3b277750-d2ee-495f-8959-4837fcc0354b@github.com> References: <-WFYyJVFxG0nhBwCJuRcpMZhBUtba6Nf1dHVrhNfFxU=.3b277750-d2ee-495f-8959-4837fcc0354b@github.com> Message-ID: <8x9LL_gxK7mpdcvVfjCb4tdc01hDPVWog660mLNOuCM=.18fd985d-3793-4f5e-baa8-29c1a5d21608@github.com> On Tue, 3 Jun 2025 05:36:12 GMT, Tobias Hartmann wrote: >> The test `compiler/startup/StartupOutput.java` starts **200 VMs in a loop** , this can lead to resource shortages on some (Windows) machines. >> >> There is no real need to run those VMs concurrently (their run is short and basically check that the VM doesn't crash giving limited code cache). >> >> Running them **sequentially** should be OK and should avoid running out of memory. >> >> Testing: Tier1-3+ > > Looks good to me. @TobiHartmann @eme64 thanks for your reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25582#issuecomment-2975223776 From dfenacci at openjdk.org Mon Jun 16 06:13:34 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 16 Jun 2025 06:13:34 GMT Subject: Integrated: 8358129: compiler/startup/StartupOutput.java runs into out of memory on Windows after JDK-8347406 In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 10:57:22 GMT, Damon Fenacci wrote: > The test `compiler/startup/StartupOutput.java` starts **200 VMs in a loop** , this can lead to resource shortages on some (Windows) machines. > > There is no real need to run those VMs concurrently (their run is short and basically check that the VM doesn't crash giving limited code cache). > > Running them **sequentially** should be OK and should avoid running out of memory. 
> > Testing: Tier1-3+ This pull request has now been integrated. Changeset: 534a8605 Author: Damon Fenacci URL: https://git.openjdk.org/jdk/commit/534a8605e5f4d771be69426687b2188d5353c91e Stats: 11 lines in 2 files changed: 2 ins; 8 del; 1 mod 8358129: compiler/startup/StartupOutput.java runs into out of memory on Windows after JDK-8347406 Reviewed-by: thartmann, epeter ------------- PR: https://git.openjdk.org/jdk/pull/25582 From dfenacci at openjdk.org Mon Jun 16 06:41:30 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 16 Jun 2025 06:41:30 GMT Subject: RFR: 8358694: VM asserts if CodeCacheSegmentSize is not a power of 2 [v5] In-Reply-To: References: <_SfCjE8oyyD-grL6AAih23j1Qx7fbYefTVzl-BU5N2k=.7bd5d419-4127-4e78-926e-fece25a7d914@github.com> Message-ID: <1TQkKPqQfXPh6iDbnnrx3kWLwH2HbdXwmEiUiePMz8Q=.b0aaf45c-549d-4f3e-9993-6af7dc871620@github.com> On Thu, 12 Jun 2025 15:58:21 GMT, Amit Kumar wrote: >> Makes sure that JVM exits gracefully when `CodeCacheSegmentSize` is not a power of 2. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > remove whitespace Thanks for adding the test @offamitkumar. test/hotspot/jtreg/compiler/codecache/CodeCacheSegmentSizeTest.java line 38: > 36: public class CodeCacheSegmentSizeTest { > 37: public static void main(String[] args) throws Exception { > 38: String codeCacheSegmentSize = (Platform.isS390x() ? "67" : "36"); // invalid value (not power of two) I think we can check it with any platform (it is not a S390-only constraint). Maybe we could also do it twice, once with and once without a power-of-two size. What do you think? 
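As a rough illustration of the property such a test would exercise, the following plain-Java sketch classifies candidate segment sizes. It is an assumption-laden example, not the VM's flag constraint code and not the jtreg test from the PR: the class and method names are hypothetical, `36` and `67` are the invalid values quoted above, and `64` and `128` are merely examples of valid power-of-two values.

```java
class SegmentSizeCheck {
    // A positive value is a power of two exactly when it has a single bit set.
    static boolean isPowerOfTwo(long value) {
        return value > 0 && (value & (value - 1)) == 0;
    }

    public static void main(String[] args) {
        long[] candidates = {36, 67, 64, 128};
        for (long c : candidates) {
            System.out.println(c + " -> " + isPowerOfTwo(c));
        }
    }
}
```

A test that runs once with a rejected value and once with an accepted one, as suggested, would cover both the error path and the normal startup path.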
------------- PR Review: https://git.openjdk.org/jdk/pull/25708#pullrequestreview-2930798652 PR Review Comment: https://git.openjdk.org/jdk/pull/25708#discussion_r2149151599 From epeter at openjdk.org Mon Jun 16 06:45:34 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 16 Jun 2025 06:45:34 GMT Subject: RFR: 8356813: Improve Mod(I|L)Node::Value [v5] In-Reply-To: <7luvvsRIFw3U3rgZkbbmuL-o7gvsfkAbgGmPA2fy_5M=.0ae26d78-f811-4ea6-8cfc-10d3b1d513ab@github.com> References: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com> <7luvvsRIFw3U3rgZkbbmuL-o7gvsfkAbgGmPA2fy_5M=.0ae26d78-f811-4ea6-8cfc-10d3b1d513ab@github.com> Message-ID: On Fri, 13 Jun 2025 08:05:28 GMT, Hannes Greule wrote: >> This change improves the precision of the `Mod(I|L)Node::Value()` functions. >> >> I reordered the structure a bit. First, we handle constants, afterwards, we handle ranges. The bottom checks seem to be excessive (`Type::BOTTOM` is covered by using `isa_(int|long)()`, the local bottom is just the full range). Given we can even give reasonable bounds if only one input has any bounds, we don't want to return early. >> The changes after that are commented. Please let me know if the explanations are good, or if you have any suggestions. >> >> ### Monotonicity >> >> Before, a 0 divisor resulted in `Type(Int|Long)::POS`. Initially I wanted to keep it this way, but that violates monotonicity during PhaseCCP. As an example, if we see a 0 divisor first and a 3 afterwards, we might try to go from `>=0` to `-2..2`, but the meet of these would be `>=-2` rather than `-2..2`. Using `Type(Int|Long)::ZERO` instead (zero is always in the resulting value if we cover a range). >> >> ### Testing >> >> I added tests for cases around the relevant bounds. 
I also ran tier1, tier2, and tier3 but didn't see any related failures after addressing the monotonicity problem described above (I'm having a few unrelated failures on my system currently, so separate testing would be appreciated in case I missed something). >> >> Please review and let me know what you think. >> >> ### Other >> >> The `UMod(I|L)Node`s were adjusted to be more in line with its signed variants. This change diverges them again, but similar improvements could be made after #17508. >> >> During experimenting with these changes, I stumbled upon a few things that aren't directly related to this change, but might be worth to further look into: >> - If the divisor is a constant, we will directly replace the `Mod(I|L)Node` with more but less expensive nodes in `::Ideal()`. Type analysis for these nodes combined is less precise, means we miss potential cases were this would help e.g., removing range checks. Would it make sense to delay the replacement? >> - To force non-negative ranges, I'm using `char`. I noticed that method parameters of sub-int integer types all fall back to `TypeInt::INT`. This seems to be an intentional change of https://github.com/openjdk/jdk/commit/200784d505dd98444c48c9ccb7f2e4df36dcbb6a. The bug report is private, so I can't really judge if that part is necessary, but it seems odd. > > Hannes Greule has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 13 additional commits since the last revision: > > - Address more comments > - Merge branch 'master' into improve-mod-value > - Add randomized test > - Use BasicType for shared implementation > - Update ModL comment > - Use TOP instead of ZERO > - Apply suggested test changes > - adapt uabs -> g_uabs name change > - change range of mod by 0 for PhaseCCP > - Improve ModLNode::Value > - ... 
and 3 more: https://git.openjdk.org/jdk/compare/2453dbbb...77134c1a src/hotspot/share/opto/divnode.cpp line 1236: > 1234: // We must be modulo'ing 2 int constants. > 1235: // Check for min_jlong % '-1', result is defined to be '0' > 1236: // We don't need to check for min_jint % '-1' as its result is defined when using jlong. It seems both cases are "defined"... so it sounds a little strange when you say `... as its result is defined when using jlong.` Both are "defined", it would be nice if you said explicitly "how" they are defined. But wait... how does this work. We used to do the same trick above for `min_jint` when using `Jint`, correct? // We must be modulo'ing 2 float constants. // Check for min_jint % '-1', result is defined to be '0'. if( i1->get_con() == min_jint && i2->get_con() == -1 ) return TypeInt::ZERO; Is this case here really handling that? It doesn't look like it. Do we have tests for all these cases? src/hotspot/share/opto/divnode.cpp line 1244: > 1242: // The magnitude of the divisor is in range [1, 2^31] or [1, 2^63], depending on the BasicType. > 1243: // We know it isn't 0 as we handled that above. > 1244: // That means at least one value is nonzero, so its absolute value is bigger than zero. I'm actually struggling to follow this here. Can you define "magnitude" for the reader? Maybe there is some JVMS definition you can mention. And which "value" are you refering to, that is nonzero here? 
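For readers following the range argument, the Java-level semantics in question can be checked with a small sketch. This is only an illustration with hypothetical names, not code from the patch: it shows that `MIN_VALUE % -1` evaluates to `0` for both `int` and `long`, and that for a non-zero divisor the result's absolute value is strictly smaller than the divisor's absolute value and takes the sign of the dividend.

```java
class ModSemanticsDemo {
    // Checks the two properties the range reasoning relies on, for rhs != 0:
    // |lhs % rhs| < |rhs|, and a non-zero result has the sign of lhs.
    static boolean holds(int lhs, int rhs) {
        int r = lhs % rhs;
        // Widen first: Math.abs(Integer.MIN_VALUE) overflows in int arithmetic.
        long absR = Math.abs((long) r);
        long absRhs = Math.abs((long) rhs);
        boolean signOk = (r == 0) || ((r < 0) == (lhs < 0));
        return absR < absRhs && signOk;
    }

    public static void main(String[] args) {
        System.out.println(Integer.MIN_VALUE % -1); // 0
        System.out.println(Long.MIN_VALUE % -1L);   // 0
        System.out.println(-7 % 3);                 // -1
        System.out.println(7 % -3);                 // 1
        System.out.println(holds(Integer.MAX_VALUE, Integer.MIN_VALUE)); // true
    }
}
```

With a divisor of `3`, for example, every result lies in `-2..2`, which matches the range mentioned in the monotonicity discussion.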
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25254#discussion_r2149135993 PR Review Comment: https://git.openjdk.org/jdk/pull/25254#discussion_r2149146826 From epeter at openjdk.org Mon Jun 16 06:45:34 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 16 Jun 2025 06:45:34 GMT Subject: RFR: 8356813: Improve Mod(I|L)Node::Value [v5] In-Reply-To: References: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com> <7luvvsRIFw3U3rgZkbbmuL-o7gvsfkAbgGmPA2fy_5M=.0ae26d78-f811-4ea6-8cfc-10d3b1d513ab@github.com> Message-ID: On Mon, 16 Jun 2025 06:32:10 GMT, Emanuel Peter wrote: >> Hannes Greule has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 13 additional commits since the last revision: >> >> - Address more comments >> - Merge branch 'master' into improve-mod-value >> - Add randomized test >> - Use BasicType for shared implementation >> - Update ModL comment >> - Use TOP instead of ZERO >> - Apply suggested test changes >> - adapt uabs -> g_uabs name change >> - change range of mod by 0 for PhaseCCP >> - Improve ModLNode::Value >> - ... and 3 more: https://git.openjdk.org/jdk/compare/2453dbbb...77134c1a > > src/hotspot/share/opto/divnode.cpp line 1244: > >> 1242: // The magnitude of the divisor is in range [1, 2^31] or [1, 2^63], depending on the BasicType. >> 1243: // We know it isn't 0 as we handled that above. >> 1244: // That means at least one value is nonzero, so its absolute value is bigger than zero. > > I'm actually struggling to follow this here. Can you define "magnitude" for the reader? Maybe there is some JVMS definition you can mention. And which "value" are you refering to, that is nonzero here? Suggestion: // We checked that t2 is not the zero constant. 
Hence at least i2->_lo or i2->_hi must be non-zero, // and hence its absolute value is bigger than zero. Hence, the magnitude of the divisor (i.e. the // largest absolute value for any value in i2) must be in the range [1, 2^31] or [1, 2^63], depending // on the BasicType. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25254#discussion_r2149153165 From epeter at openjdk.org Mon Jun 16 06:59:35 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 16 Jun 2025 06:59:35 GMT Subject: RFR: 8356813: Improve Mod(I|L)Node::Value [v2] In-Reply-To: References: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com> Message-ID: <3BJWLK3FukQCp2FHGcyBDTZtbc5aS8VreNKYKAaQrdU=.43a7e821-8d56-4161-850a-9137d17d44de@github.com> On Wed, 28 May 2025 09:53:32 GMT, Emanuel Peter wrote: >> Hannes Greule has updated the pull request incrementally with three additional commits since the last revision: >> >> - Update ModL comment >> - Use TOP instead of ZERO >> - Apply suggested test changes > > @SirYwell Thanks for looking into this, that looks promising! > > I have two bigger comments: > - Could we unify the L and I code, either using C++ templating or `BasicType`? It would reduce code duplication. > - Can we have some tests where the input ranges are random as well, and where we check the output ranges with some comparisons? > > ------------------ > Copied from the code comment: > >> Nice work with the examples you already have, and randomizing some of it! >> >> I would like to see one more generalized test. >> - compute `res = lhs % rhs` >> - Truncate both `lhs` and `rhs` with randomly produced bounds from Generators, like this: `lhs = Math.max(lo, Math.min(hi, lhs))`. >> - Below, add all sorts of comparisons with random constants, like this: `if (res < CON) { sum += 1; }`. If the output range is wrong, this could wrongly constant fold, and allow us to catch that.
>> >> Then fuzz the generated method a few times with random inputs for `lhs` and `rhs`, and check that the `sum` and `res` value are the same for compiled and interpreted code. >> >> I hope that makes sense :) >> This is currently my best method to check if ranges are correct, and I think it is quite important because often tests are only written with constants in mind, but less so with ranges, and then we mess up the ranges because it is just too tricky. >> >> This is an example, where I asked someone to try this out as well: >> https://github.com/openjdk/jdk/pull/23089/files#diff-12bebea175a260a6ab62c22a3681ccae0c3d9027900d2fdbd8c5e856ae7d1123R404-R422 > @eme64 I merged master and hopefully addressed your latest comments. Now that we have #17508 integrated, I could also directly update the unsigned variant, but I'm also fine with doing that separately. WDYT? > > I also checked the constant folding part again (or generally whenever the RHS is a constant), these code paths are indeed not used by PhaseGVN directly (but by PhaseCCP and PhaseIdealLoop). That makes it a bit difficult to test that part properly. Let's keep the patch as it is. With #17508 we will have to also probably refactor and add more tests, if we want to do any unsigned and known-bit optimizations. 
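The generalized test described in the quoted comment could look roughly like the following. This is a plain-Java sketch under stated assumptions, not the IR-framework test from the PR: the bounds, the comparison constants, and all names are made up for illustration, and a real test would draw the bounds from the test library's Generators and compare compiled against interpreted results rather than against hand-derived expectations.

```java
import java.util.Random;

class ModRangeFuzzSketch {
    // Hypothetical fixed bounds; a real test would randomize them per run.
    static final int LHS_LO = -5, LHS_HI = 100;
    static final int RHS_LO = 1, RHS_HI = 7;

    static int clamp(int v, int lo, int hi) {
        return Math.max(lo, Math.min(hi, v));
    }

    // Returns a checksum built from comparisons against constants. If a compiler
    // assumed a wrong range for res, one of the comparisons could fold incorrectly
    // and the checksum would differ from the interpreter's.
    static int test(int lhs, int rhs) {
        lhs = clamp(lhs, LHS_LO, LHS_HI);
        rhs = clamp(rhs, RHS_LO, RHS_HI);
        int res = lhs % rhs;
        int sum = 0;
        if (res < -5) { sum += 1; } // never true for the bounds above
        if (res > 6)  { sum += 2; } // never true for the bounds above
        if (res < 0)  { sum += 4; }
        if (res > 3)  { sum += 8; }
        return sum;
    }

    public static void main(String[] args) {
        Random random = new Random(42);
        for (int i = 0; i < 10_000; i++) {
            int sum = test(random.nextInt(), random.nextInt());
            if ((sum & 3) != 0) {
                throw new AssertionError("result left the expected range");
            }
        }
        System.out.println("all results stayed within the expected range");
    }
}
```

The first two comparisons encode the range one would derive by hand for these bounds (a result between `-5` and `6`), so they act as a sanity check on the reasoning, while the other two are the kind of comparison that could be folded incorrectly if the computed range were wrong.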
---------------- @SirYwell Thanks for the updates, I had a few more comments, but we are getting there :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/25254#issuecomment-2975322731 From epeter at openjdk.org Mon Jun 16 06:59:42 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 16 Jun 2025 06:59:42 GMT Subject: RFR: 8356813: Improve Mod(I|L)Node::Value [v5] In-Reply-To: References: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com> <7luvvsRIFw3U3rgZkbbmuL-o7gvsfkAbgGmPA2fy_5M=.0ae26d78-f811-4ea6-8cfc-10d3b1d513ab@github.com> Message-ID: <5lraXEv3sIvIharQbnucsEZCGt9ZaFH9naNQ8ycTCGw=.eb3a4504-e763-4fc9-8277-4136f4f38974@github.com> On Mon, 16 Jun 2025 06:23:41 GMT, Emanuel Peter wrote: >> Hannes Greule has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 13 additional commits since the last revision: >> >> - Address more comments >> - Merge branch 'master' into improve-mod-value >> - Add randomized test >> - Use BasicType for shared implementation >> - Update ModL comment >> - Use TOP instead of ZERO >> - Apply suggested test changes >> - adapt uabs -> g_uabs name change >> - change range of mod by 0 for PhaseCCP >> - Improve ModLNode::Value >> - ... and 3 more: https://git.openjdk.org/jdk/compare/f8ff7d46...77134c1a > > src/hotspot/share/opto/divnode.cpp line 1236: > >> 1234: // We must be modulo'ing 2 int constants. >> 1235: // Check for min_jlong % '-1', result is defined to be '0' >> 1236: // We don't need to check for min_jint % '-1' as its result is defined when using jlong. > > It seems both cases are "defined"... so it sounds a little strange when you say `... as its result is defined when using jlong.` Both are "defined", it would be nice if you said explicitly "how" they are defined. > > But wait... how does this work. 
We used to do the same trick above for `min_jint` when using `Jint`, correct? > > > // We must be modulo'ing 2 float constants. > // Check for min_jint % '-1', result is defined to be '0'. > if( i1->get_con() == min_jint && i2->get_con() == -1 ) > return TypeInt::ZERO; > > > Is this case here really handling that? It doesn't look like it. > Do we have tests for all these cases? Hmm, seems we have discussed this before... Maybe it is best to just keep the old behavior and do the test for `min_jint` as well if we have `T_INT`. I'd rather be safe. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25254#discussion_r2149175279 From epeter at openjdk.org Mon Jun 16 06:59:42 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 16 Jun 2025 06:59:42 GMT Subject: RFR: 8356813: Improve Mod(I|L)Node::Value [v4] In-Reply-To: References: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com> <5FnA_gZNzRom3MBShwfbdCffeRGogf1cyKo0nF40c4I=.9db6f973-e6a5-4852-b82e-24ccc198bcb9@github.com> Message-ID: On Mon, 2 Jun 2025 12:52:47 GMT, Hannes Greule wrote: >> @SirYwell I'm not 100% sure here, so please correct me if I'm wrong. >> You are now always passing in a `jlong` value, so you always use `static inline julong g_uabs(jlong n) { return g_uabs((julong)n); }`, even for `T_INT`. > > Yes that's correct, and it should still work due to how negation works for negative inputs. I think I was quite a bit confused here. Sounds good to me now though :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25254#discussion_r2149178732 From thartmann at openjdk.org Mon Jun 16 07:00:35 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 16 Jun 2025 07:00:35 GMT Subject: RFR: 8357816: Add test from JDK-8350576 [v3] In-Reply-To: References: Message-ID: On Fri, 13 Jun 2025 08:36:14 GMT, Benoît Maillard wrote: >> This PR adds a jtreg test for [JDK-8350576](https://bugs.openjdk.org/browse/JDK-8350576).
The test consists of a code sample produced by the fuzzer, and it contains a loop that is supposed to get optimized. >> >> Thanks! >> >> ### Testing >> >> - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8357816) >> - [x] tier1-3, plus some internal testing >> - [x] Ran the test with a debug build prior to the fix (JDK 25 build 16) and made sure it failed as a sanity check >> >> Shout out to @TobiHartmann for helping out with jtreg > > Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: > > 8357816: Split long line into several ones Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25774#pullrequestreview-2930845772 From kwei at openjdk.org Mon Jun 16 07:10:40 2025 From: kwei at openjdk.org (Kuai Wei) Date: Mon, 16 Jun 2025 07:10:40 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v4] In-Reply-To: References: <96Ny_BPjRCbNlD14DNDUOuQ0IX-F8hx21gxQKVfim9M=.d502019a-27ed-4a35-81ef-bc2aec5e7557@github.com> <_IhK2U23lIUOtBKOt-WMxQ3L7b2t26RzclJRdqbIgms=.3ef9a630-f99c-4de7-994a-bcabf912230b@github.com> <9ABhENoZtR76wmsgRmzeEceDvCvoflfCcbDbK8H2rso=.e351f63f-1331-4e2e-8a02-763a8c0c4f70@github.com> Message-ID: On Wed, 11 Jun 2025 12:13:22 GMT, Emanuel Peter wrote: >> @eme64 From your description, I may change like below. Could you check if I understand correct? Thanks. >> When IGVN check the input combine operator, called it as `_checked`. We can go down the combine operators chain to find the _last one.
>>
>> for op from _last to _checked: // _checked is not included
>>   collect merge_info_list by op
>>   if it can be merged and _checked is in the list:
>>     return // it will be merged when IGVN optimizes this op
>>   if can not merge or _checked is not in list:
>>     continue;
>> // all successors of combine operators are checked, we can start to merge with _checked
>> ...
>>
>> I think it can work but there are some redundant `collect and check` work. And we can add a cache in IGVN to reduce it. Do you have other suggestion about it ?
>
> @kuaiwei I'm struggling to follow your pseudocode, can you please expand a little or try to describe in other words?
>
> It is good to reduce redundant work, but it has to be worth the complexity. But I'm skeptical of caching things between IGVN optimizations, it is just not something we do, as far as I know. Caching could also be tricky when the cached things are not accurate any more. What exactly would be your approach with caching?

@eme64 I added some detail to the pseudocode. I hope it explains my design. A difficult part of the optimization is that we need to find the right combine operator, which can be replaced with the merged load, especially when it has successor combine operators. We need to find which one is the better candidate. For a given combine operator, `MergePrimitiveLoad::run()` will do the following:

```c++
void MergePrimitiveLoad::run(AddNode* _check) {
  // go down through the unique output of _check to collect successor combine operators
  GrowableArray rest_combine_operators = collect_successor_combine_operators(_check)

  if ( !rest_combine_operators.is_empty() ) {
    for (AddNode* op in rest_combine_operators) {
      // collect mergeable load info list from this combine operator
      merge_list = collect_merge_info_list(op)
      if (_check is in merge_list) {
        // we will not optimize _check, it will be merged when op is optimized
        return;
      }
    }
  }

  // all successor combine operators are checked, we can start to optimize the given _check operator
  ...
}

// For a given combine operator, collect mergeable load info list. Every item
// in the result list is a tuple of (Load, combine, shift_value)
// It's similar with MergePrimitive::collect_merge_list() in previous patch, more detail can be found in code.

GrowableArray collect_merge_info_list(AddNode* combine) {
  // get the load from input of combine
  LoadNode* load = get_load_from_input(combine);
  if (load == nullptr) return;

  MemNode* mem = memory input of load;

  for every output of mem {
    if (output->isa_load()) {
      // make a merged info for this load node
      info = merge_load_info(output);
    }
    if ( !info->is_invalid() ) {
      append info into result list
    }
  }

  // check the merged bytes and if they are adjacent.
  ...

  return result list;
}
```

------------- PR Comment: https://git.openjdk.org/jdk/pull/24023#issuecomment-2969508990 From epeter at openjdk.org Mon Jun 16 07:10:42 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 16 Jun 2025 07:10:42 GMT Subject: RFR: 8357816: Add test from JDK-8350576 [v3] In-Reply-To: References: Message-ID: On Fri, 13 Jun 2025 08:36:14 GMT, Benoît Maillard wrote: >> This PR adds a jtreg test for [JDK-8350576](https://bugs.openjdk.org/browse/JDK-8350576). The test consists of a code sample produced by the fuzzer, and it contains a loop that is supposed to get optimized. >> >> Thanks! >> >> ### Testing >> >> - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8357816) >> - [x] tier1-3, plus some internal testing >> - [x] Ran the test with a debug build prior to the fix (JDK 25 build 16) and made sure it failed as a sanity check >> >> Shout out to @TobiHartmann for helping out with jtreg > > Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: > > 8357816: Split long line into several ones Looks good to me too :) ------------- Marked as reviewed by epeter (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/25774#pullrequestreview-2930870734 From epeter at openjdk.org Mon Jun 16 07:10:41 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 16 Jun 2025 07:10:41 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v4] In-Reply-To: References: <96Ny_BPjRCbNlD14DNDUOuQ0IX-F8hx21gxQKVfim9M=.d502019a-27ed-4a35-81ef-bc2aec5e7557@github.com> <_IhK2U23lIUOtBKOt-WMxQ3L7b2t26RzclJRdqbIgms=.3ef9a630-f99c-4de7-994a-bcabf912230b@github.com> <9ABhENoZtR76wmsgRmzeEceDvCvoflfCcbDbK8H2rso=.e351f63f-1331-4e2e-8a02-763a8c0c4f70@github.com> Message-ID: On Fri, 13 Jun 2025 08:21:51 GMT, Kuai Wei wrote: >> @kuaiwei I'm struggling to follow your pseudocode, can you please expand a little or try to describe in other words? >> >> It is good to reduce redundant work, but it has to be worth the complexity. But I'm skeptical of caching things between IGVN optimizations, it is just not something we do, as far as I know. Caching could also be tricky when the cached things are not accurate any more. What exactly would be your approach with caching? > > @eme64 I added some detail to pseudo code. I hope it can explain my design. > A difficult part of the optimization is that we need find the right combine operator which can be replaced with merged load. Especially it has successor combine operator. We need to find which one is better candidate. 
> For a given combine operator, the `MergePrimitiveLoad::run()` will do like this:
>
> ```c++
> void MergePrimitiveLoad::run(AddNode* _check) {
>   // go down through the unique output of _check to collect successor combine operators
>   GrowableArray rest_combine_operators = collect_successor_combine_operators(_check)
>
>   if ( !rest_combine_operators.is_empty() ) {
>     for (AddNode* op in rest_combine_operators) {
>       // collect mergeable load info list from this combine operator
>       merge_list = collect_merge_info_list(op)
>       if (_check is in merge_list) {
>         // we will not optimize _check, it will be merged when op is optimized
>         return;
>       }
>     }
>   }
>
>   // all successor combine operators are checked, we can start to optimize the given _check operator
>   ...
> }
>
> // For a given combine operator, collect mergeable load info list. Every item
> // in the result list is a tuple of (Load, combine, shift_value)
> // It's similar with MergePrimitive::collect_merge_list() in previous patch, more detail can be found in code.
>
> GrowableArray collect_merge_info_list(AddNode* combine) {
>   // get the load from input of combine
>   LoadNode* load = get_load_from_input(combine);
>   if (load == nullptr) return;
>
>   MemNode* mem = memory input of load;
>
>   for every output of mem {
>     if (output->isa_load()) {
>       // make a merged info for this load node
>       info = merge_load_info(output);
>     }
>     if ( !info->is_invalid() ) {
>       append info into result list
>     }
>   }
>
>   // check the merged bytes and if they are adjacent.
>   ...
>
>   return result list;
> }
> ```

@kuaiwei Thanks for the additional detail, that was helpful! To me, it seems that this test here is almost too rigorous, and hence it might be more expensive than necessary. And that is also what you were worried about, right? You were worried that some checks were done over and over again, and that is why you wanted to cache something. Correct?

    for (AddNode* op in rest_combine_operators) {
      // collect mergeable load info list from this combine operator
      merge_list = collect_merge_info_list(op)
      if (_check is in merge_list) {
        // we will not optimize _check, it will be merged when op is optimized
        return;
      }
    }

Can you say why you need to do all the traversals you do in `collect_merge_info_list`? Why do you need to check all other uses of the mem-input of the load there? I was wondering if it is not sufficient to just check if the structure on the other side of `op` has the correct shift and load address offset, so that it could be merged in with the merge list. Do you know what I mean? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24023#issuecomment-2975347439 From duke at openjdk.org Mon Jun 16 07:23:37 2025 From: duke at openjdk.org (duke) Date: Mon, 16 Jun 2025 07:23:37 GMT Subject: RFR: 8357816: Add test from JDK-8350576 [v3] In-Reply-To: References: Message-ID: On Fri, 13 Jun 2025 08:36:14 GMT, Benoît Maillard wrote: >> This PR adds a jtreg test for [JDK-8350576](https://bugs.openjdk.org/browse/JDK-8350576). The test consists of a code sample produced by the fuzzer, and it contains a loop that is supposed to get optimized. >> >> Thanks! >> >> ### Testing >> >> - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8357816) >> - [x] tier1-3, plus some internal testing >> - [x] Ran the test with a debug build prior to the fix (JDK 25 build 16) and made sure it failed as a sanity check >> >> Shout out to @TobiHartmann for helping out with jtreg > > Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: > > 8357816: Split long line into several ones @benoitmaillard Your change (at version 535bae67e413756d363ef7e22067941473b6e496) is now ready to be sponsored by a Committer.
------------- PR Comment: https://git.openjdk.org/jdk/pull/25774#issuecomment-2975381134 From duke at openjdk.org Mon Jun 16 07:31:36 2025 From: duke at openjdk.org (Benoît Maillard) Date: Mon, 16 Jun 2025 07:31:36 GMT Subject: Integrated: 8357816: Add test from JDK-8350576 In-Reply-To: References: Message-ID: On Thu, 12 Jun 2025 09:08:24 GMT, Benoît Maillard wrote: > This PR adds a jtreg test for [JDK-8350576](https://bugs.openjdk.org/browse/JDK-8350576). The test consists of a code sample produced by the fuzzer, and it contains a loop that is supposed to get optimized. > > Thanks! > > ### Testing > > - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8357816) > - [x] tier1-3, plus some internal testing > - [x] Ran the test with a debug build prior to the fix (JDK 25 build 16) and made sure it failed as a sanity check > > Shout out to @TobiHartmann for helping out with jtreg This pull request has now been integrated. Changeset: d8c3533a Author: Benoît Maillard Committer: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/d8c3533a91aa9c3a0b76846fe425c72bda9bd66c Stats: 56 lines in 1 file changed: 56 ins; 0 del; 0 mod 8357816: Add test from JDK-8350576 Co-authored-by: Tobias Hartmann Reviewed-by: syan, thartmann, epeter ------------- PR: https://git.openjdk.org/jdk/pull/25774 From dfenacci at openjdk.org Mon Jun 16 08:11:05 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 16 Jun 2025 08:11:05 GMT Subject: [jdk25] RFR: 8358129: compiler/startup/StartupOutput.java runs into out of memory on Windows after JDK-8347406 Message-ID: Hi all, This pull request contains a backport of commit [534a8605](https://github.com/openjdk/jdk/commit/534a8605e5f4d771be69426687b2188d5353c91e) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. The commit being backported was authored by Damon Fenacci on 16 Jun 2025 and was reviewed by Tobias Hartmann and Emanuel Peter. Thanks!
------------- Commit messages: - Backport 534a8605e5f4d771be69426687b2188d5353c91e Changes: https://git.openjdk.org/jdk/pull/25821/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25821&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8358129 Stats: 11 lines in 2 files changed: 2 ins; 8 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25821.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25821/head:pull/25821 PR: https://git.openjdk.org/jdk/pull/25821 From sroy at openjdk.org Mon Jun 16 08:33:36 2025 From: sroy at openjdk.org (Suchismith Roy) Date: Mon, 16 Jun 2025 08:33:36 GMT Subject: Integrated: JDK-8348574 : Simplify c1/c2_globals inclusions In-Reply-To: References: Message-ID: On Thu, 12 Jun 2025 08:32:56 GMT, Suchismith Roy wrote: > JBS Issue : [JDK-8348574](https://bugs.openjdk.org/browse/JDK-8348574) > > c1_globals.hpp includes c1_globals_pd.hpp. c1_globals_pd.hpp includes the corresponding CPU_HEADER and OS_HEADER files. All of the c1_globals_.hpp files are essentially identical and basically empty. (They just include globalDefinitions.hpp and macros.hpp, and provide nothing additional.) > > This could be simplified by having c1_globals.hpp do the CPU_HEADER inclusion directly, and remove c1_globals_pd.hpp and all c1_globals_.hpp files. > > Even if there are some non-vacuous c1_globals_.hpp files in the future, c1_globals_pd.hpp seems unwarranted; just add the OS_HEADER include directly in c1_globals.hpp. The c1_globals_pd.hpp files really don't seem worth the extra indirection. > > Similarly for c2_globals.hpp &etc. This pull request has now been integrated. 
Changeset: 79497ef7 Author: Suchismith Roy Committer: Varada M URL: https://git.openjdk.org/jdk/commit/79497ef7f55ef445b31348ae9d3d6dff6d3b6a54 Stats: 366 lines in 13 files changed: 2 ins; 360 del; 4 mod 8348574: Simplify c1/c2_globals inclusions Reviewed-by: mhaessig, kbarrett ------------- PR: https://git.openjdk.org/jdk/pull/25773 From shade at openjdk.org Mon Jun 16 08:43:28 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 16 Jun 2025 08:43:28 GMT Subject: RFR: 8359435: AArch64: add support for 8.5 SB instruction In-Reply-To: References: Message-ID: <8ROhbgz6aKOAWffgFsdnZT619eoaea99p17R_fE9xxo=.df2714a8-04db-4355-9748-76ce70ca470c@github.com> On Sat, 14 Jun 2025 16:00:55 GMT, Andrew Haley wrote: > > So, it would be tad less confusing if we had a dependent RFE for using SB in spin pauses, so it was obvious why do we need it. > > Huh? The least confusing is when the SB support goes in the PR where it is used. That really is obvious, without any dependency chain. I am flexible to have it either way. One of the drawbacks of piling up the instruction support and the feature that uses these instructions: if there is _ever_ a second feature that depends on the same instruction support, we would effectively bind two commits (commit A: instruction support + feature A; commit B: feature B) together with an accidental dependency. Which gets extra funky if you ever go with bisects, backouts, backports. Atomic commits rule, and I personally strive to do them, even if there is a window when some code appears dead momentarily. But as I said, I would not quibble here. SB looks like something that we would solely use for spin-wait hints. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25801#issuecomment-2975611656 From amitkumar at openjdk.org Mon Jun 16 08:49:16 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 16 Jun 2025 08:49:16 GMT Subject: RFR: 8358694: VM asserts if CodeCacheSegmentSize is not a power of 2 [v6] In-Reply-To: <_SfCjE8oyyD-grL6AAih23j1Qx7fbYefTVzl-BU5N2k=.7bd5d419-4127-4e78-926e-fece25a7d914@github.com> References: <_SfCjE8oyyD-grL6AAih23j1Qx7fbYefTVzl-BU5N2k=.7bd5d419-4127-4e78-926e-fece25a7d914@github.com> Message-ID: > Makes sure that JVM exits gracefully when `CodeCacheSegmentSize` is not a power of 2. Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: updates testcase ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25708/files - new: https://git.openjdk.org/jdk/pull/25708/files/24511b00..520e3512 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25708&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25708&range=04-05 Stats: 24 lines in 1 file changed: 20 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/25708.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25708/head:pull/25708 PR: https://git.openjdk.org/jdk/pull/25708 From amitkumar at openjdk.org Mon Jun 16 08:51:30 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 16 Jun 2025 08:51:30 GMT Subject: RFR: 8358694: VM asserts if CodeCacheSegmentSize is not a power of 2 [v5] In-Reply-To: <1TQkKPqQfXPh6iDbnnrx3kWLwH2HbdXwmEiUiePMz8Q=.b0aaf45c-549d-4f3e-9993-6af7dc871620@github.com> References: <_SfCjE8oyyD-grL6AAih23j1Qx7fbYefTVzl-BU5N2k=.7bd5d419-4127-4e78-926e-fece25a7d914@github.com> <1TQkKPqQfXPh6iDbnnrx3kWLwH2HbdXwmEiUiePMz8Q=.b0aaf45c-549d-4f3e-9993-6af7dc871620@github.com> Message-ID: On Mon, 16 Jun 2025 06:35:57 GMT, Damon Fenacci wrote: > Maybe we could also do it twice, once with and once without a power-of-two size. What do you think? That was a great suggestion. 
updated; ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25708#discussion_r2149393385 From mhaessig at openjdk.org Mon Jun 16 08:56:43 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Mon, 16 Jun 2025 08:56:43 GMT Subject: Withdrawn: 8354727: CompilationPolicy creates too many compiler threads when code cache space is scarce In-Reply-To: References: Message-ID: On Thu, 12 Jun 2025 07:19:40 GMT, Manuel Hässig wrote:
> # Issue Summary
>
> Running
>
>     java -XX:+SegmentedCodeCache -XX:ReservedCodeCacheSize=10M -XX:NonNMethodCodeHeapSize=6M \
>          -XX:ProfiledCodeHeapSize=5M -XX:NonProfiledCodeHeapSize=5M -version
>
> on a machine with more than 255 cores fails with the message that the specified `NonNMethodCodeHeapSize` is too small to fit all compiler buffers (instead of failing because the sum of the heaps is larger than the `ReservedCodeCacheSize`). Hence, the calculated compiler count is too high. This is due to `CompilationPolicy::initialize()` checking how many compiler buffers fit into the `ReservedCodeCacheSize`. However, in the case above, this is significantly larger than `NonNMethodCodeHeapSize` and causes a new check introduced in #17244 to fail.
>
> # Changes
>
> This PR fixes the calculation of the `CICompilerCount` ergonomic. Firstly, @shipilev kindly provided a fix so that the compiler buffer size used in the calculation is also correct if we only have C2. Secondly, `NonNMethodCodeHeapSize` is used as the maximum size available for compiler buffers instead of `ReservedCodeCacheSize` if it was provided as a command-line flag.
>
> It might be debatable if this is the correct fix, since the `NonNMethodHeap` can spill into the other heaps if it is too small. However, I am of the opinion that if the `NonNMethodCodeHeapSize` is explicitly specified, then the compiler count should be calculated accordingly.
> > # Testing > > - [x] [GHA](https://github.com/mhaessig/jdk/actions/runs/15603409859) > - [x] tier1 and tier2 plus Oracle internal testing on our supported platforms > - [x] tier1 with a manually fixed core count of 288 (this reproduced the problem before the fix) This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/25770 From shade at openjdk.org Mon Jun 16 09:22:31 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 16 Jun 2025 09:22:31 GMT Subject: [jdk25] RFR: 8358129: compiler/startup/StartupOutput.java runs into out of memory on Windows after JDK-8347406 In-Reply-To: References: Message-ID: On Mon, 16 Jun 2025 07:30:37 GMT, Damon Fenacci wrote: > Hi all, > > This pull request contains a backport of commit [534a8605](https://github.com/openjdk/jdk/commit/534a8605e5f4d771be69426687b2188d5353c91e) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Damon Fenacci on 16 Jun 2025 and was reviewed by Tobias Hartmann and Emanuel Peter. > > Thanks! Marked as reviewed by shade (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/25821#pullrequestreview-2931281968 From hgreule at openjdk.org Mon Jun 16 09:50:32 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Mon, 16 Jun 2025 09:50:32 GMT Subject: RFR: 8356813: Improve Mod(I|L)Node::Value [v5] In-Reply-To: <5lraXEv3sIvIharQbnucsEZCGt9ZaFH9naNQ8ycTCGw=.eb3a4504-e763-4fc9-8277-4136f4f38974@github.com> References: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com> <7luvvsRIFw3U3rgZkbbmuL-o7gvsfkAbgGmPA2fy_5M=.0ae26d78-f811-4ea6-8cfc-10d3b1d513ab@github.com> <5lraXEv3sIvIharQbnucsEZCGt9ZaFH9naNQ8ycTCGw=.eb3a4504-e763-4fc9-8277-4136f4f38974@github.com> Message-ID: <692KDUziVbOBILhkGodBa2hNKWLXUxiJvvViJjla4H0=.5caa5a7b-e49f-4dd6-a634-b90a5fdeae5d@github.com> On Mon, 16 Jun 2025 06:53:00 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/divnode.cpp line 1236: >> >>> 1234: // We must be modulo'ing 2 int constants. >>> 1235: // Check for min_jlong % '-1', result is defined to be '0' >>> 1236: // We don't need to check for min_jint % '-1' as its result is defined when using jlong. >> >> It seems both cases are "defined"... so it sounds a little strange when you say `... as its result is defined when using jlong.` Both are "defined", it would be nice if you said explicitly "how" they are defined. >> >> But wait... how does this work. We used to do the same trick above for `min_jint` when using `Jint`, correct? >> >> >> // We must be modulo'ing 2 float constants. >> // Check for min_jint % '-1', result is defined to be '0'. >> if( i1->get_con() == min_jint && i2->get_con() == -1 ) >> return TypeInt::ZERO; >> >> >> Is this case here really handling that? It doesn't look like it. >> Do we have tests for all these cases? > > Hmm, seems we have discussed this before... Maybe it is best to just keep the old behavior and do the test for `min_jint` as well if we have `T_INT`. I'd rather be safe. I can add `min_jint` as a special case again. 
But I just had a different idea, as `x % -1 == 0` for any `x`, I could also generalize the check and only test for `-1`. WDYT? >> src/hotspot/share/opto/divnode.cpp line 1244: >> >>> 1242: // The magnitude of the divisor is in range [1, 2^31] or [1, 2^63], depending on the BasicType. >>> 1243: // We know it isn't 0 as we handled that above. >>> 1244: // That means at least one value is nonzero, so its absolute value is bigger than zero. >> >> I'm actually struggling to follow this here. Can you define "magnitude" for the reader? Maybe there is some JVMS definition you can mention. And which "value" are you referring to, that is nonzero here? > > Suggestion: > > // We checked that t2 is not the zero constant. Hence at least i2->_lo or i2->_hi must be non-zero, > // and hence its absolute value is bigger than zero. Hence, the magnitude of the divisor (i.e. the > // largest absolute value for any value in i2) must be in the range [1, 2^31] or [1, 2^63], depending > // on the BasicType. Magnitude is what the JVMS uses, that's why I used it. But I like your suggested wording, I'll adapt it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25254#discussion_r2149519211 PR Review Comment: https://git.openjdk.org/jdk/pull/25254#discussion_r2149522952 From epeter at openjdk.org Mon Jun 16 09:51:39 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 16 Jun 2025 09:51:39 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v3] In-Reply-To: References: Message-ID: On Fri, 23 May 2025 22:43:35 GMT, Vladimir Ivanov wrote: >> This PR introduces C2 support for `Reference.reachabilityFence()`. >> >> After [JDK-8199462](https://bugs.openjdk.org/browse/JDK-8199462) went in, it was discovered that C2 may break the invariant the fix relied upon [1]. So, this is an attempt to introduce proper support for `Reference.reachabilityFence()` in C2. C1 is left intact for now, because there are no signs yet it is affected.
>> >> `Reference.reachabilityFence()` can be used in performance critical code, so the primary goal for C2 is to reduce its runtime overhead as much as possible. The ultimate goal is to ensure liveness information is attached to interfering safepoints, but it takes multiple steps to properly propagate the information through compilation pipeline without negatively affecting generated code quality. >> >> Also, I don't consider this fix as complete. It does fix the reported problem, but it doesn't provide any strong guarantees yet. In particular, since `ReachabilityFence` is CFG-only node, nothing explicitly forbids memory operations to float past `Reference.reachabilityFence()` and potentially reaching some other safepoints current analysis treats as non-interfering. Representing `ReachabilityFence` as memory barrier (e.g., `MemBarCPUOrder`) would solve the issue, but performance costs are prohibitively high. Alternatively, the optimization proposed in this PR can be improved to conservatively extend referent's live range beyond `ReachabilityFence` nodes associated with it. It would meet performance criteria, but I prefer to implement it as a followup fix. >> >> Another known issue relates to reachability fences on constant oops. If such constant is GCed (most likely, due to a bug in Java code), similar reachability issues may arise. For now, RFs on constants are treated as no-ops, but there's a diagnostic flag `PreserveReachabilityFencesOnConstants` to keep the fences. I plan to address it separately. >> >> [1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ref/Reference.java#L667 >> "HotSpot JVM retains the ref and does not GC it before a call to this method, because the JIT-compilers do not have GC-only safepoints." 
>> >> Testing: >> - [x] hs-tier1 - hs-tier8 >> - [x] hs-tier1 - hs-tier6 w/ -XX:+StressReachabilityFences -XX:+VerifyLoopOptimizations >> - [x] java/lang/foreign microbenchmarks > > Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: > > renaming I'm on my first pass through the code, just leaving a few first observations / comments :) src/hotspot/share/opto/callGenerator.cpp line 617: > 615: uint endoff = call->jvms()->endoff(); > 616: if (C->inlining_incrementally()) { > 617: assert(endoff == call->req(), ""); // assert in SafePointNode::grow_stack What exactly are you asserting here? And what is the comment for? src/hotspot/share/opto/callGenerator.cpp line 620: > 618: } else { > 619: if (call->req() > endoff) { > 620: assert(OptimizeReachabilityFences, ""); Can you please add a helpful assert message? src/hotspot/share/opto/callnode.cpp line 950: > 948: case CatchProjNode::catch_all_index: projs->catchall_catchproj = cpn; break; > 949: default: { > 950: assert(cpn->_con > 1, ""); // exception table; rethrow case Can we please turn this into a helpful assert message? src/hotspot/share/opto/callnode.hpp line 497: > 495: // Are we guaranteed that this node is a safepoint? Not true for leaf calls and > 496: // for some macro nodes whose expansion does not have a safepoint on the fast path. > 497: virtual bool guaranteed_safepoint() { return true; } I see you only copied it. It makes me a little nervous when we call the "default" case safe. Because when you add more cases, you just assume it is safe... and if it is not we first have to discover that through a bug. What do you think? src/hotspot/share/opto/compile.cpp line 2526: > 2524: print_method(PHASE_ELIMINATE_REACHABILITY_FENCES, 2); > 2525: if (failing()) return; > 2526: } You will have to check the impact on compile time here. Running an extra round of loop opts might be significant. Do you really need to use a loop opts phase for this? 
Or would something like `process_for_post_loop_opts_igvn` work for it? src/hotspot/share/opto/compile.cpp line 3958: > 3956: Node* rf = C->reachability_fence(i); > 3957: Node* in = rf->in(1); > 3958: if (in->is_DecodeN()) { Why not: Suggestion: ReachabilityFence* rf = C->reachability_fence(i); DecodeNNode* dn = rf->in(1)->isa_DecodeN(); if (dn != nullptr) { src/hotspot/share/opto/compile.hpp line 381: > 379: GrowableArray _template_assertion_predicate_opaques; > 380: GrowableArray _expensive_nodes; // List of nodes that are expensive to compute and that we'd better not let the GVN freely common > 381: GrowableArray _reachability_fences; // List of reachability fences Why not: Suggestion: GrowableArray _reachability_fences; // List of all reachability fences src/hotspot/share/opto/compile.hpp line 741: > 739: void remove_reachability_fence(Node* n) { > 740: _reachability_fences.remove_if_existing(n); > 741: } You could also add the type `ReachabilityFenceNode*` here. ------------- PR Review: https://git.openjdk.org/jdk/pull/25315#pullrequestreview-2931260265 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2149442595 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2149444306 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2149464647 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2149488305 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2149503924 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2149514942 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2149518776 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2149519545 From epeter at openjdk.org Mon Jun 16 09:51:40 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 16 Jun 2025 09:51:40 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v3] In-Reply-To: References: Message-ID: On Mon, 16 Jun 2025 09:24:08 GMT, 
Emanuel Peter wrote: >> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: >> >> renaming > > src/hotspot/share/opto/callnode.cpp line 950: > >> 948: case CatchProjNode::catch_all_index: projs->catchall_catchproj = cpn; break; >> 949: default: { >> 950: assert(cpn->_con > 1, ""); // exception table; rethrow case > > Can we please turn this into a helpful assert message? Can you quickly comment why you changed this? > src/hotspot/share/opto/compile.cpp line 2526: > >> 2524: print_method(PHASE_ELIMINATE_REACHABILITY_FENCES, 2); >> 2525: if (failing()) return; >> 2526: } > > You will have to check the impact on compile time here. Running an extra round of loop opts might be significant. > > Do you really need to use a loop opts phase for this? Or would something like `process_for_post_loop_opts_igvn` work for it? Might be helpful if you write in a comment if this eliminates all or just some of the reachability fences. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2149481075 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2149507017 From epeter at openjdk.org Mon Jun 16 09:51:40 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 16 Jun 2025 09:51:40 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v3] In-Reply-To: References: Message-ID: <_yDpYorDH_2ox5RaGm_JdCk4uYbiUYanemuUGR2LCp4=.33c1414a-7c61-45bb-9632-dbff88711fde@github.com> On Mon, 16 Jun 2025 09:39:23 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/compile.cpp line 2526: >> >>> 2524: print_method(PHASE_ELIMINATE_REACHABILITY_FENCES, 2); >>> 2525: if (failing()) return; >>> 2526: } >> >> You will have to check the impact on compile time here. Running an extra round of loop opts might be significant. >> >> Do you really need to use a loop opts phase for this? Or would something like `process_for_post_loop_opts_igvn` work for it? 
> > Might be helpful if you write in a comment if this eliminates all or just some of the reachability fences. Can we limit it to cases where we actually have reachability fences? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2149509147 From epeter at openjdk.org Mon Jun 16 09:51:41 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 16 Jun 2025 09:51:41 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v3] In-Reply-To: References: Message-ID: On Wed, 28 May 2025 19:35:09 GMT, Vladimir Ivanov wrote: >> Right, nevermind about `DecodeNKlass` then. My question on heap loads still stands: do we actually get `reachabilityFence(someField)` from anywhere? > > Are you asking specifically about `ReachabilityFence -> DecodeN -> LoadN` shape? Yes, it's common, especially after inlining. @iwanowww Can you add a code comment why this is safe to look through the ReachabilityFence? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2149517512 From bkilambi at openjdk.org Mon Jun 16 10:12:14 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Mon, 16 Jun 2025 10:12:14 GMT Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v3] In-Reply-To: References: Message-ID: > This patch adds aarch64 backend support for SelectFromTwoVector operation which was recently introduced in VectorAPI. > > It implements this operation using a two table vector lookup instruction - "tbl" which is available only in Neon and SVE2. > > For 128-bit vector length : Neon tbl instruction is generated if UseSVE < 2 and SVE2 "tbl" instruction is generated if UseSVE == 2. > > For > 128-bit vector length : Currently there are no machines which have vector length > 128-bit and support SVE2. For all those machines with vector length > 128-bit and UseSVE < 2, this operation is not supported. 
The inline expander for this operation would fail and lowered IR will be generated which is a mix of two rearrange and one blend operation.
>
> This patch also adds a boolean "need_load_shuffle" in the inline expander for this operation to test if the platform requires VectorLoadShuffle operation to be generated. Without this, the lowering IR was not being generated on aarch64 and the performance was quite poor.
>
> Performance numbers with this patch on a 128-bit, SVE2 supporting machine is shown below -
>
>
> Benchmark                                   (size)   Mode  Cnt   Gain
> SelectFromBenchmark.selectFromByteVector      1024  thrpt    9   1.43
> SelectFromBenchmark.selectFromByteVector      2048  thrpt    9   1.48
> SelectFromBenchmark.selectFromDoubleVector    1024  thrpt    9  68.55
> SelectFromBenchmark.selectFromDoubleVector    2048  thrpt    9  72.07
> SelectFromBenchmark.selectFromFloatVector     1024  thrpt    9   1.69
> SelectFromBenchmark.selectFromFloatVector     2048  thrpt    9   1.52
> SelectFromBenchmark.selectFromIntVector       1024  thrpt    9   1.50
> SelectFromBenchmark.selectFromIntVector       2048  thrpt    9   1.52
> SelectFromBenchmark.selectFromLongVector      1024  thrpt    9  85.38
> SelectFromBenchmark.selectFromLongVector      2048  thrpt    9  80.93
> SelectFromBenchmark.selectFromShortVector     1024  thrpt    9   1.48
> SelectFromBenchmark.selectFromShortVector     2048  thrpt    9   1.49
>
>
> Gain column refers to the ratio of thrpt between this patch and the master branch after applying changes in the inline expander.
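
For readers unfamiliar with the operation, here is a rough scalar model in plain Java. It is only a sketch and not code from the patch. It assumes that each index lane is wrapped into the range [0, 2*VLEN), that wrapped indices below VLEN select from the first source, and that the remaining indices select from the second. The class and method names are hypothetical. The second method models the fallback mentioned in the description (two rearranges followed by one blend) so the two paths can be compared.

```java
import java.util.Arrays;

// Hypothetical scalar model; arrays stand in for vectors of VLEN lanes.
final class SelectFromTwoVectorSketch {

    // Direct two-table lookup: src1 and src2 together act as one table of 2*VLEN entries.
    static int[] selectFromTwo(int[] index, int[] src1, int[] src2) {
        int vlen = src1.length;
        int[] dst = new int[vlen];
        for (int i = 0; i < vlen; i++) {
            int idx = Math.floorMod(index[i], 2 * vlen); // assumed wrapping rule
            dst[i] = (idx < vlen) ? src1[idx] : src2[idx - vlen];
        }
        return dst;
    }

    // Single-table rearrange with indices wrapped into [0, VLEN).
    static int[] rearrange(int[] src, int[] index) {
        int vlen = src.length;
        int[] dst = new int[vlen];
        for (int i = 0; i < vlen; i++) {
            dst[i] = src[Math.floorMod(index[i], vlen)];
        }
        return dst;
    }

    // Fallback shape: rearrange both sources, then blend lane by lane.
    static int[] viaRearrangeAndBlend(int[] index, int[] src1, int[] src2) {
        int vlen = src1.length;
        int[] fromSrc1 = rearrange(src1, index);
        int[] fromSrc2 = rearrange(src2, index);
        int[] dst = new int[vlen];
        for (int i = 0; i < vlen; i++) {
            boolean useSrc2 = Math.floorMod(index[i], 2 * vlen) >= vlen;
            dst[i] = useSrc2 ? fromSrc2[i] : fromSrc1[i];
        }
        return dst;
    }

    public static void main(String[] args) {
        int[] src1 = {10, 11, 12, 13};
        int[] src2 = {20, 21, 22, 23};
        int[] index = {0, 5, 2, 7};
        // Both lines print the same lanes.
        System.out.println(Arrays.toString(selectFromTwo(index, src1, src2)));
        System.out.println(Arrays.toString(viaRearrangeAndBlend(index, src1, src2)));
    }
}
```

The model says nothing about performance. It only shows why a single two-table lookup can stand in for the three-operation fallback.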
Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: Addressed review comments and added a JTREG test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23570/files - new: https://git.openjdk.org/jdk/pull/23570/files/70e88489..aa9e53e1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23570&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23570&range=01-02 Stats: 679 lines in 10 files changed: 520 ins; 18 del; 141 mod Patch: https://git.openjdk.org/jdk/pull/23570.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23570/head:pull/23570 PR: https://git.openjdk.org/jdk/pull/23570 From bkilambi at openjdk.org Mon Jun 16 10:12:15 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Mon, 16 Jun 2025 10:12:15 GMT Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v3] In-Reply-To: References: Message-ID: On Tue, 18 Feb 2025 14:02:04 GMT, Hao Sun wrote: >> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: >> >> Addressed review comments and added a JTREG test > > src/hotspot/cpu/aarch64/aarch64.ad line 889: > >> 887: ); >> 888: >> 889: // Class for vector register v18 > > nit: use upper case > > Suggestion: > > // Class for vector register V18 Done. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2149566439 From bkilambi at openjdk.org Mon Jun 16 10:12:15 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Mon, 16 Jun 2025 10:12:15 GMT Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v3] In-Reply-To: References: Message-ID: On Wed, 12 Feb 2025 02:34:20 GMT, Xiaohong Gong wrote: >> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: >> >> Addressed review comments and added a JTREG test > > src/hotspot/cpu/aarch64/aarch64_vector.ad line 6729: > >> 6727: // --------------------------------SelectFromTwoVector ----------------------------- >> 6728: >> 6729: instruct vselect_from_two_vectors_SIFNeon(vReg dst, vReg_V17 src1, vReg_V18 src2, > > We have a similar rule for `VectorRearrange` such as `rearrange_HS_neon`. To unify, can we use the similar name style for this rule? > > Suggestion: > > instruct vselect_from_two_vectors_HS_neon(vReg dst, vReg_V17 src1, vReg_V18 src2, Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2149568922 From bkilambi at openjdk.org Mon Jun 16 10:12:15 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Mon, 16 Jun 2025 10:12:15 GMT Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v3] In-Reply-To: References: Message-ID: On Wed, 12 Feb 2025 13:57:18 GMT, Andrew Haley wrote: >> src/hotspot/cpu/aarch64/aarch64_vector_ad.m4 line 4955: >> >>> 4953: match(Set dst (SelectFromTwoVector (Binary index src1) src2)); >>> 4954: effect(TEMP_DEF dst, TEMP tmp1, TEMP tmp2); >>> 4955: format %{ "vselect_from_two_vectors_SIF $dst, $src1, $src2, $index\t# vector (4S/8S/2I/4I/2F/4F). KILL $tmp1, $tmp2" %} >> >> Be careful here. `select_from_two_vectors_SIFNeon` seems to alter `src1`, so you need a `USE_KILL` effect. > >> @theRealAph Thanks for the suggestion! 
makes sense to add USE_KILL for the src1 usage here. I am getting into some errors when I do that. I'll resolve them and get back soon. Thanks! > > Maybe that should be USE_DEF or TEMP_DEF. I have used another temp variable instead of modifying the source directly. Please review if it's acceptable. Thank you! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2149567699 From bkilambi at openjdk.org Mon Jun 16 10:14:30 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Mon, 16 Jun 2025 10:14:30 GMT Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v3] In-Reply-To: References: Message-ID: On Mon, 16 Jun 2025 10:12:14 GMT, Bhavana Kilambi wrote: >> This patch adds aarch64 backend support for SelectFromTwoVector operation which was recently introduced in VectorAPI. >> >> It implements this operation using a two table vector lookup instruction - "tbl" which is available only in Neon and SVE2. >> >> For 128-bit vector length : Neon tbl instruction is generated if UseSVE < 2 and SVE2 "tbl" instruction is generated if UseSVE == 2. >> >> For > 128-bit vector length : Currently there are no machines which have vector length > 128-bit and support SVE2. For all those machines with vector length > 128-bit and UseSVE < 2, this operation is not supported. The inline expander for this operation would fail and lowered IR will be generated which is a mix of two rearrange and one blend operation. >> >> This patch also adds a boolean "need_load_shuffle" in the inline expander for this operation to test if the platform requires VectorLoadShuffle operation to be generated. Without this, the lowering IR was not being generated on aarch64 and the performance was quite poor. 
>> >> Performance numbers with this patch on a 128-bit, SVE2 supporting machine is shown below - >> >> >> Benchmark (size) Mode Cnt Gain >> SelectFromBenchmark.selectFromByteVector 1024 thrpt 9 1.43 >> SelectFromBenchmark.selectFromByteVector 2048 thrpt 9 1.48 >> SelectFromBenchmark.selectFromDoubleVector 1024 thrpt 9 68.55 >> SelectFromBenchmark.selectFromDoubleVector 2048 thrpt 9 72.07 >> SelectFromBenchmark.selectFromFloatVector 1024 thrpt 9 1.69 >> SelectFromBenchmark.selectFromFloatVector 2048 thrpt 9 1.52 >> SelectFromBenchmark.selectFromIntVector 1024 thrpt 9 1.50 >> SelectFromBenchmark.selectFromIntVector 2048 thrpt 9 1.52 >> SelectFromBenchmark.selectFromLongVector 1024 thrpt 9 85.38 >> SelectFromBenchmark.selectFromLongVector 2048 thrpt 9 80.93 >> SelectFromBenchmark.selectFromShortVector 1024 thrpt 9 1.48 >> SelectFromBenchmark.selectFromShortVector 2048 thrpt 9 1.49 >> >> >> Gain column refers to the ratio of thrpt between this patch and the master branch after applying changes in the inline expander. > > Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: > > Addressed review comments and added a JTREG test test/hotspot/jtreg/compiler/vectorapi/TestSelectFromTwoVectorOp.java line 196: > 194: applyIfCPUFeatureAnd = {"asimd", "true", "sve2", "false"}, > 195: applyIf = {"MaxVectorSize", ">=32"}) > 196: @IR(counts = {IRNode.SELECT_FROM_TWO_VECTOR_VB, IRNode.VECTOR_SIZE_32, ">0"}, I know that we currently do not have aarch64 machines with vec_length > 16B and SVE2 enabled but for the sake of completeness I added these tests as well because if we ever do have a 32B or 64B machine with SVE2, we will be able to generate `SelectFromTwoVector` IR and the lowered SVE2 `tbl` instruction. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2149575954 From mli at openjdk.org Mon Jun 16 10:23:42 2025 From: mli at openjdk.org (Hamlin Li) Date: Mon, 16 Jun 2025 10:23:42 GMT Subject: RFR: 8358892: RISC-V: jvm crash when running dacapo sunflow after JDK-8352504 [v7] In-Reply-To: References: <9TAfBWFOsPTUIJPaUX9xcPqVsXLI6lsQBvcomuVqcQI=.1eb3ec5a-59ac-44ce-a46b-b984dda4605e@github.com> Message-ID: On Fri, 13 Jun 2025 06:52:35 GMT, Fei Yang wrote: >> Hamlin Li has updated the pull request incrementally with two additional commits since the last revision: >> >> - review Zicond >> - comments > > Hi, I changed your test a bit and I see test failure on my linux-riscv64 platform (no Zicond). > > > diff --git a/test/hotspot/jtreg/compiler/c2/irTests/TestFPComparison2.java b/test/hotspot/jtreg/compiler/c2/irTests/TestFPComparison2.java > index 472c38e009a..77398e1d88b 100644 > --- a/test/hotspot/jtreg/compiler/c2/irTests/TestFPComparison2.java > +++ b/test/hotspot/jtreg/compiler/c2/irTests/TestFPComparison2.java > @@ -98,11 +98,11 @@ public static int test_float_BoolTest_ge(float x, float y) { > // when neither is NaN, and x > y > // return 0 > // when neither is NaN, and x <= y > - return !(x <= y) ? 1 : 0; > + return !(x <= y) ? 10 : 20; > } > @DontCompile > public static int golden_float_BoolTest_ge(float x, float y) { > - return !(x <= y) ? 1 : 0; > + return !(x <= y) ? 10 : 20; > } > > @Test > @@ -113,11 +113,11 @@ public static int test_double_BoolTest_ge(double x, double y) { > // when neither is NaN, and x > y > // return 0 > // when neither is NaN, and x <= y > - return !(x <= y) ? 1 : 0; > + return !(x <= y) ? 10 : 20; > } > @DontCompile > public static int golden_double_BoolTest_ge(double x, double y) { > - return !(x <= y) ? 1 : 0; > + return !(x <= y) ? 
10 : 20; > } > > @Run(test = {"test_float_BoolTest_ge", "test_double_BoolTest_ge"}) > > > $ make test TEST="test/hotspot/jtreg/compiler/c2/irTests/TestFPComparison2.java" JTREG="TIMEOUT_FACTOR=8" > > > STDERR: > java.lang.RuntimeException: Not trigger BoolTest::ge: expected true, was false > at jdk.test.lib.Asserts.fail(Asserts.java:715) > at jdk.test.lib.Asserts.assertTrue(Asserts.java:545) > at compiler.c2.irTests.TestFPComparison2.main(TestFPComparison2.java:71) > at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) > at java.base/java.lang.reflect.Method.invoke(Method.java:565) > at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:335) > at java.base/java.lang.Thread.run(Thread.java:1474) > > JavaTest Message: Test threw exception: java.lang.RuntimeException > JavaTest Message: shutting down test Thank you @RealFYang @feilongjiang for reviewing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25696#issuecomment-2975973413 From mli at openjdk.org Mon Jun 16 10:23:42 2025 From: mli at openjdk.org (Hamlin Li) Date: Mon, 16 Jun 2025 10:23:42 GMT Subject: Integrated: 8358892: RISC-V: jvm crash when running dacapo sunflow after JDK-8352504 In-Reply-To: <9TAfBWFOsPTUIJPaUX9xcPqVsXLI6lsQBvcomuVqcQI=.1eb3ec5a-59ac-44ce-a46b-b984dda4605e@github.com> References: <9TAfBWFOsPTUIJPaUX9xcPqVsXLI6lsQBvcomuVqcQI=.1eb3ec5a-59ac-44ce-a46b-b984dda4605e@github.com> Message-ID: On Mon, 9 Jun 2025 16:27:54 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > > Thanks! > > Currently, this issue is only reproducible with Dacapo sunflow. > I tried to construct a simpler jtreg test to reproduce the issue, but can not find a way to do it till now, this task is tracked by https://bugs.openjdk.org/browse/JDK-8359045. > > So, currently I can only verify the code by reviewing it. > Or maybe it's better to leave it until we find the test? 
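
As background for the `TestFPComparison2` change quoted earlier in this message, the sketch below shows in plain Java why `!(x <= y)` is not the same test as `x > y` once NaN is involved. It is an illustration only. The class and method names are hypothetical, and the constants 10 and 20 are simply the ones used in the quoted diff.

```java
// Hypothetical illustration of the comparison shape used in the quoted test.
final class FpCompareNaNSketch {

    // Returns 10 when either operand is NaN or x > y, and 20 otherwise.
    static int notLessOrEqual(float x, float y) {
        return !(x <= y) ? 10 : 20;
    }

    // Returns 10 only when x > y; any comparison with NaN is false, so NaN yields 20.
    static int greaterThan(float x, float y) {
        return (x > y) ? 10 : 20;
    }

    public static void main(String[] args) {
        float[][] inputs = {{2f, 1f}, {1f, 2f}, {1f, 1f}, {Float.NaN, 1f}, {1f, Float.NaN}};
        for (float[] in : inputs) {
            System.out.println(in[0] + ", " + in[1]
                + " -> !(x <= y): " + notLessOrEqual(in[0], in[1])
                + ", x > y: " + greaterThan(in[0], in[1]));
        }
    }
}
```

The two methods agree on ordinary inputs and differ only when an operand is NaN, which is the case the comments in the quoted test describe.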
This pull request has now been integrated. Changeset: 9d060574 Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/9d060574e5dbd13e634f00d749d0108ceff1fae8 Stats: 1093 lines in 4 files changed: 1059 ins; 20 del; 14 mod 8358892: RISC-V: jvm crash when running dacapo sunflow after JDK-8352504 8359045: RISC-V: construct test to verify invocation of C2_MacroAssembler::enc_cmove_cmp_fp => BoolTest::ge/gt Co-authored-by: Fei Yang Reviewed-by: fyang, fjiang ------------- PR: https://git.openjdk.org/jdk/pull/25696 From dfenacci at openjdk.org Mon Jun 16 10:52:30 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 16 Jun 2025 10:52:30 GMT Subject: RFR: 8358694: VM asserts if CodeCacheSegmentSize is not a power of 2 [v6] In-Reply-To: References: <_SfCjE8oyyD-grL6AAih23j1Qx7fbYefTVzl-BU5N2k=.7bd5d419-4127-4e78-926e-fece25a7d914@github.com> Message-ID: <6QFg3WwVa3z27icUaAnNja5NJQYeArTU5uZn1vCgW44=.068ea9f8-08ca-4ab1-939f-ef73038d7205@github.com> On Mon, 16 Jun 2025 08:49:16 GMT, Amit Kumar wrote: >> Makes sure that JVM exits gracefully when `CodeCacheSegmentSize` is not a power of 2. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > updates testcase Thanks for changing the test @offamitkumar. test/hotspot/jtreg/compiler/codecache/CodeCacheSegmentSizeTest.java line 2: > 1: /* > 2: * Copyright (c) 2024 IBM Corporation. All rights reserved. Suggestion: * Copyright (c) 2025 IBM Corporation. All rights reserved. test/hotspot/jtreg/compiler/codecache/CodeCacheSegmentSizeTest.java line 74: > 72: OutputAnalyzer output = new OutputAnalyzer(pb.start()); > 73: > 74: output.shouldContain("openjdk version"); // typical first line The test fails here for me (the output has the string "java version", not "openjdk version"). Maybe instead of checking this string we could check that there is no error? 
------------- PR Review: https://git.openjdk.org/jdk/pull/25708#pullrequestreview-2931604110 PR Review Comment: https://git.openjdk.org/jdk/pull/25708#discussion_r2149650252 PR Review Comment: https://git.openjdk.org/jdk/pull/25708#discussion_r2149649320 From amitkumar at openjdk.org Mon Jun 16 11:37:29 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 16 Jun 2025 11:37:29 GMT Subject: RFR: 8358694: VM asserts if CodeCacheSegmentSize is not a power of 2 [v6] In-Reply-To: <6QFg3WwVa3z27icUaAnNja5NJQYeArTU5uZn1vCgW44=.068ea9f8-08ca-4ab1-939f-ef73038d7205@github.com> References: <_SfCjE8oyyD-grL6AAih23j1Qx7fbYefTVzl-BU5N2k=.7bd5d419-4127-4e78-926e-fece25a7d914@github.com> <6QFg3WwVa3z27icUaAnNja5NJQYeArTU5uZn1vCgW44=.068ea9f8-08ca-4ab1-939f-ef73038d7205@github.com> Message-ID: On Mon, 16 Jun 2025 10:48:20 GMT, Damon Fenacci wrote: >> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: >> >> updates testcase > > test/hotspot/jtreg/compiler/codecache/CodeCacheSegmentSizeTest.java line 74: > >> 72: OutputAnalyzer output = new OutputAnalyzer(pb.start()); >> 73: >> 74: output.shouldContain("openjdk version"); // typical first line > > The test fails here for me (the output has the string "java version", not "openjdk version"). > Maybe instead of checking this string we could check that there is no error? This was the string for me: amit at a3560042:~/jdk$ ./build/linux-s390x-server-fastdebug/jdk/bin/java -XX:+UnlockExperimentalVMOptions -XX:CodeCacheSegmentSize=64 -version openjdk version "26-internal" 2026-03-17 OpenJDK Runtime Environment (fastdebug build 26-internal-adhoc.amit.jdk) OpenJDK 64-Bit Server VM (fastdebug build 26-internal-adhoc.amit.jdk, mixed mode) amit at a3560042:~/jdk$ But I will update it and check for error only. 
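
The property this PR is about (the segment size must be a power of 2) can be sketched in plain Java as below. This is not HotSpot's flag-validation code, and the class and method names are hypothetical. It only shows the usual bit trick for such a check, under the assumption that zero and negative values are rejected as well.

```java
// Hypothetical stand-in for a power-of-2 constraint check on a size value.
final class SegmentSizeCheckSketch {

    // A positive value is a power of 2 exactly when it has a single bit set.
    static boolean isPowerOfTwo(long value) {
        return value > 0 && (value & (value - 1)) == 0;
    }

    public static void main(String[] args) {
        long[] candidates = {64, 48, 128, 0, 1};
        for (long c : candidates) {
            System.out.println(c + " is a power of 2: " + isPowerOfTwo(c));
        }
    }
}
```

With a check of this kind, a value such as 64 (the one used in the command above) passes, while 48 would be rejected up front instead of tripping an assert later.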
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25708#discussion_r2149725350 From amitkumar at openjdk.org Mon Jun 16 11:45:09 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 16 Jun 2025 11:45:09 GMT Subject: RFR: 8358694: VM asserts if CodeCacheSegmentSize is not a power of 2 [v7] In-Reply-To: <_SfCjE8oyyD-grL6AAih23j1Qx7fbYefTVzl-BU5N2k=.7bd5d419-4127-4e78-926e-fece25a7d914@github.com> References: <_SfCjE8oyyD-grL6AAih23j1Qx7fbYefTVzl-BU5N2k=.7bd5d419-4127-4e78-926e-fece25a7d914@github.com> Message-ID: > Makes sure that JVM exits gracefully when `CodeCacheSegmentSize` is not a power of 2. Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: remove string check & update copyright header ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25708/files - new: https://git.openjdk.org/jdk/pull/25708/files/520e3512..2db6f325 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25708&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25708&range=05-06 Stats: 3 lines in 1 file changed: 0 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25708.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25708/head:pull/25708 PR: https://git.openjdk.org/jdk/pull/25708 From mhaessig at openjdk.org Mon Jun 16 13:39:35 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Mon, 16 Jun 2025 13:39:35 GMT Subject: RFR: 8020282: Generated code quality: redundant LEAs in the chained dereferences [v2] In-Reply-To: References: Message-ID: On Fri, 13 Jun 2025 12:12:08 GMT, Roberto Castañeda Lozano wrote: >> That makes a lot of things much clearer. I like it! Thank you for the suggestion!
Incorporated in 3d6f897
>
> Thanks, now that you have defined `lea_address` and `decode_address`, the logic that distinguishes between the spill/no spill cases can be further simplified by letting these pointers step over the spill node in the case of spilling:
>
>
> if (is_spill) {
>   decode_address = decode_address->in(1);
>   lea_address = lea_address->in(1);
> }
>
>
> Then you can use them directly later on instead of conditionally using e.g. `decode_address` or `decode_address->in(1)` depending on the value of `is_spill`. Here a sketch of my suggested refactoring (not tested): https://github.com/openjdk/jdk/commit/2b75e85dbeedd380ab3ea02c64816c931b3dfe33. Feel free to apply it (or parts of it) if you agree that it makes the code more readable.

That is quite an improvement! I applied it. Thank you for your detailed suggestion!

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25471#discussion_r2150032824

From mhaessig at openjdk.org Mon Jun 16 14:30:33 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Mon, 16 Jun 2025 14:30:33 GMT
Subject: RFR: 8020282: Generated code quality: redundant LEAs in the chained dereferences [v4]
In-Reply-To:
References: <-PFiiMlUghbFgg2fuU86vuEXKaexylDuk3kBdcBn9N8=.2c272bf1-10a7-4110-8919-f33ee0d491ba@github.com>
Message-ID:

On Thu, 12 Jun 2025 12:46:44 GMT, Roberto Castañeda Lozano wrote:

>> Exactly. This is why I can get away with only checking the usages of the decode.
>
> Is this scenario exercised by any of the new tests? If not, would it be possible to construct an additional test where we verify that the peephole is not applied in this case?

This scenario looks like this:
![image](https://github.com/user-attachments/assets/0c012810-c158-46dc-997d-6b5040c16377)

Test coming soon.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25471#discussion_r2150154593 From bkilambi at openjdk.org Mon Jun 16 14:59:32 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Mon, 16 Jun 2025 14:59:32 GMT Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v3] In-Reply-To: References: Message-ID: On Mon, 16 Jun 2025 10:12:14 GMT, Bhavana Kilambi wrote: >> This patch adds aarch64 backend support for SelectFromTwoVector operation which was recently introduced in VectorAPI. >> >> It implements this operation using a two table vector lookup instruction - "tbl" which is available only in Neon and SVE2. >> >> For 128-bit vector length : Neon tbl instruction is generated if UseSVE < 2 and SVE2 "tbl" instruction is generated if UseSVE == 2. >> >> For > 128-bit vector length : Currently there are no machines which have vector length > 128-bit and support SVE2. For all those machines with vector length > 128-bit and UseSVE < 2, this operation is not supported. The inline expander for this operation would fail and lowered IR will be generated which is a mix of two rearrange and one blend operation. >> >> This patch also adds a boolean "need_load_shuffle" in the inline expander for this operation to test if the platform requires VectorLoadShuffle operation to be generated. Without this, the lowering IR was not being generated on aarch64 and the performance was quite poor. 
>> >> Performance numbers with this patch on a 128-bit, SVE2 supporting machine is shown below - >> >> >> Benchmark (size) Mode Cnt Gain >> SelectFromBenchmark.selectFromByteVector 1024 thrpt 9 1.43 >> SelectFromBenchmark.selectFromByteVector 2048 thrpt 9 1.48 >> SelectFromBenchmark.selectFromDoubleVector 1024 thrpt 9 68.55 >> SelectFromBenchmark.selectFromDoubleVector 2048 thrpt 9 72.07 >> SelectFromBenchmark.selectFromFloatVector 1024 thrpt 9 1.69 >> SelectFromBenchmark.selectFromFloatVector 2048 thrpt 9 1.52 >> SelectFromBenchmark.selectFromIntVector 1024 thrpt 9 1.50 >> SelectFromBenchmark.selectFromIntVector 2048 thrpt 9 1.52 >> SelectFromBenchmark.selectFromLongVector 1024 thrpt 9 85.38 >> SelectFromBenchmark.selectFromLongVector 2048 thrpt 9 80.93 >> SelectFromBenchmark.selectFromShortVector 1024 thrpt 9 1.48 >> SelectFromBenchmark.selectFromShortVector 2048 thrpt 9 1.49 >> >> >> Gain column refers to the ratio of thrpt between this patch and the master branch after applying changes in the inline expander. > > Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: > > Addressed review comments and added a JTREG test Hi @theRealAph @XiaohongGong @shqking I have addressed your review comments. Please do review the new patch. Thanks a lot! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23570#issuecomment-2976988254 From roland at openjdk.org Mon Jun 16 15:22:36 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 16 Jun 2025 15:22:36 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v8] In-Reply-To: References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> <1gdeBnZ7YuIf9CgQW2bCXkDDBWPjUgRnickHts-fvzE=.e6e901ba-3e9f-41a2-9c68-167a879e9655@github.com> <2m1_XtiSsW_LaBRrkX4qv7AKtLOjNgnl4mUp3zisasE=.dda62164-7aa0-4c1a-b83f-fa40ba7902e5@github.com> <4374L3lkQK90wLxxOA7POBmIKNX2DFK-4pO4vj1bkuQ=.5b8d7825-a7f1-497f-ab66-02a85a266659@github.com> Message-ID: On Fri, 13 Jun 2025 08:54:43 GMT, Roberto Castañeda Lozano wrote: > > One way would be to simply assert that there's no `NarrowMemProj`s left during final graph reshape. Is that what you'd like? > > Yes, that would be great (and I think it is OK to leave it to a future RFE if fully enforcing it would further increase the complexity of this PR). Ok. I propose to file the follow up RFE if/once this PR has integrated as there's no follow up work until this code is in. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24570#issuecomment-2977066799 From kxu at openjdk.org Mon Jun 16 15:25:53 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Mon, 16 Jun 2025 15:25:53 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v6] In-Reply-To: References: Message-ID: > This PR refactors `PhaseIdealLoop::is_counted_loop()` into (mostly) `CountedLoopConverter::is_counted_loop()` and `CountedLoopConverter::convert()` to decouple the detection and conversion code. This enables us to try different loop configurations easily and finally convert once a counted loop is found.
>
> A nested `PhaseIdealLoop::CountedLoopConverter` class is created to handle the context, but I'm not sure if this is the best name or place for it. Please let me know what you think.
>
> Blocks [JDK-8336759](https://bugs.openjdk.org/browse/JDK-8336759).

Kangcheng Xu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 22 commits:

 - Merge branch 'master' into counted-loop-refactor

   # Conflicts:
   #	src/hotspot/share/opto/loopnode.cpp
   #	src/hotspot/share/opto/loopnode.hpp
   #	src/hotspot/share/opto/loopopts.cpp
 - Merge remote-tracking branch 'origin/master' into counted-loop-refactor
 - further refactor is_counted_loop() by extracting functions
 - WIP: refactor is_counted_loop()
 - WIP: refactor is_counted_loop()
 - WIP: review followups
 - reviewer suggested changes
 - line break
 - remove TODOs
 - Revert "improve formatting, naming, comments"

   This reverts commit fd6071761bdc47ab5695559dffd1e1dd6038d9f7.
 - ... and 12 more: https://git.openjdk.org/jdk/compare/9d060574...fd93998b

-------------

Changes: https://git.openjdk.org/jdk/pull/24458/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24458&range=05
  Stats: 924 lines in 3 files changed: 418 ins; 208 del; 298 mod
  Patch: https://git.openjdk.org/jdk/pull/24458.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24458/head:pull/24458

PR: https://git.openjdk.org/jdk/pull/24458

From roland at openjdk.org Mon Jun 16 15:26:03 2025
From: roland at openjdk.org (Roland Westrelin)
Date: Mon, 16 Jun 2025 15:26:03 GMT
Subject: RFR: 8356708: C2: loop strip mining expansion doesn't take sunk stores into account [v4]
In-Reply-To:
References:
Message-ID: <4pxoG8BH9QOLHb9O7XqofwEcrOXqSSLIbbunGEE5UYg=.17621f96-6eaa-476e-8043-cba003880f2b@github.com>

> `test1()` has a counted loop with a `Store` to `field`. That `Store`
> is sunk out of loop. When the `OuterStripMinedLoop` is expanded, only
> `Phi`s that exist at the inner loop are added to the outer
> loop.
There's no `Phi` for the slice of the sunk `Store` (because
> there's no `Store` left in the inner loop) so no `Phi` is added for
> that slice to the outer loop. As a result, there's a missing anti
> dependency for `Load` of `field` that's before the loop and it can be
> scheduled inside the outer strip mined loop which is incorrect.
>
> `test2()` is the same as `test1()` but with a chain of 2 `Store`s.
>
> `test3()` is another variant where a `Store` is left in the inner loop
> after one is sunk out of it so the inner loop still has a `Phi`. As a
> result, the outer loop also gets a `Phi` but it's incorrectly wired as
> the sunk `Store` should be the input along the backedge but is
> not. That one doesn't cause any failure AFAICT.
>
> The fix I propose is some extra logic at expansion of the
> `OuterStripMinedLoop` to handle these corner cases.

Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision:

 - Update test/hotspot/jtreg/compiler/loopstripmining/TestStoresSunkInOuterStripMinedLoop.java

   Co-authored-by: Roberto Castañeda Lozano
 - Update test/hotspot/jtreg/compiler/loopstripmining/TestStoresSunkInOuterStripMinedLoop.java

   Co-authored-by: Roberto Castañeda Lozano

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/25717/files
  - new: https://git.openjdk.org/jdk/pull/25717/files/2d1b1096..ae62f7d3

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=25717&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25717&range=02-03

  Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/25717.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/25717/head:pull/25717

PR: https://git.openjdk.org/jdk/pull/25717

From roland at openjdk.org Mon Jun 16 15:26:03 2025
From: roland at openjdk.org (Roland Westrelin)
Date: Mon, 16 Jun 2025 15:26:03 GMT
Subject: RFR: 8356708: C2: loop strip mining expansion doesn't take sunk stores into account
In-Reply-To:
References: <4qen78-Aon9foRJSduZFQ47DrCMPUjuMF7MQj_uk7jI=.e7b5e0b7-04f4-4d5a-a224-a1c78ff89c09@github.com> Message-ID: On Fri, 13 Jun 2025 08:01:52 GMT, Roberto Castañeda Lozano wrote: >>> Thanks for working on this, Roland. I just submitted some testing, will come back with the results in a day or two. >>> >>> Generally, I agree with your proposed approach of handling the case at expansion time as a low-risk fix for JDK 25. But as future work, would it be feasible to maintain regular SSA form for outer strip-mined loops (adding memory and data phi nodes at both loop levels) rather than omitting phi nodes for the outer loops and "repairing" SSA on macro expansion, or is there any fundamental obstacle in doing the former? It would have prevented issues like this, and feels like a more principled and robust approach in general. >> >> I don't think there's a fundamental obstacle. The reason I implemented loop strip mining that way is that I was concerned it would be complicated for existing transformations to be supported without major code change and that there was a chance subtle issues would creep in (such as some optimizations not happening any more). So I tried to have a minimal set of extra nodes for strip mined loops initially so existing transformations would simply need to skip over the `OuterStripMinedLoop`. >> >> It's quite possible that having the full outer strip mined loop early on works fine and that there's no need for changes all over the loop optimizations. I suppose someone would need to give it a try. This said, I still think keeping the graph simple when crucial transformations happen has some merit. > >> > Thanks for working on this, Roland. I just submitted some testing, will come back with the results in a day or two. >> > Generally, I agree with your proposed approach of handling the case at expansion time as a low-risk fix for JDK 25.
But as future work, would it be feasible to maintain regular SSA form for outer strip-mined loops (adding memory and data phi nodes at both loop levels) rather than omitting phi nodes for the outer loops and "repairing" SSA on macro expansion, or is there any fundamental obstacle in doing the former? It would have prevented issues like this, and feels like a more principled and robust approach in general. >> >> I don't think there's a fundamental obstacle. The reason I implemented loop strip mining that way is that I was concerned it would be complicated for existing transformations to be supported without major code change and that there was a chance subtle issues would creep in (such as some optimizations not happening any more). So I tried to have a minimal set of extra nodes for strip mined loops initially so existing transformations would simply need to skip over the `OuterStripMinedLoop`. >> >> It's quite possible that having the full outer strip mined loop early on works fine and that there's no need for changes all over the loop optimizations. I suppose someone would need to give it a try. This said, I still think keeping the graph simple when crucial transformations happen has some merit. > > Thanks for the background, Roland! > > I think it would be worth exploring this, but I agree that there is a risk of silently affecting other loop optimizations. Luckily, the IR test framework gives us now a means to improve our confidence that changes in this area do not affect expected optimizations. Unfortunately, our current IR test coverage of loop optimizations is incomplete, so a pre-condition to exploring full SSA for strip-mined loops (and something worth doing in any case IMO) would be adding more IR tests checking that at least basic optimizations like peeling, unswitching, unrolling, range check elimination, etc. happen as expected. 
Thanks for the review @robcasloz ------------- PR Comment: https://git.openjdk.org/jdk/pull/25717#issuecomment-2977070215 From yzheng at openjdk.org Mon Jun 16 16:12:31 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Mon, 16 Jun 2025 16:12:31 GMT Subject: RFR: 8350177: C2 SuperWord: Integer.numberOfLeadingZeros, numberOfTrailingZeros, reverse and bitCount have input types wrongly turncated for byte and short [v3] In-Reply-To: References: Message-ID: On Mon, 16 Jun 2025 05:32:19 GMT, Jasmine Karthikeyan wrote: >> Hi all, >> This patch fixes cases in SuperWord when compiling subword types where vectorized code would be given a narrower type than expected, leading to miscompilation due to truncation. This fix is a generalization of the same fix applied for `Integer.reverseBytes` in [JDK-8305324](https://bugs.openjdk.org/browse/JDK-8305324). The patch introduces a check for nodes that are known to tolerate truncation, so that any future cases of subword truncation will avoid creating miscompiled code. >> >> The patch reuses the existing logic to set the type of the vectors to int, which currently disables vectorization for the affected patterns entirely. Once [JDK-8342095](https://bugs.openjdk.org/browse/JDK-8342095) is merged and automatic casting support is added the autovectorizer should automatically insert casts to and from int, maintaining correctness. >> >> I've added an IR test that checks for correctly compiled outputs. Thoughts and reviews would be appreciated! > > Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: > > Add assert for unexpected node in truncation Since the base is before the branch off, could you please merge master before integration and see if GHA still passes? This also helps us testing libgraal compilation. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25440#issuecomment-2977229836 From never at openjdk.org Mon Jun 16 17:04:16 2025 From: never at openjdk.org (Tom Rodriguez) Date: Mon, 16 Jun 2025 17:04:16 GMT Subject: RFR: 8359064: Expose reason for marking nmethod non-entrant to JVMCI client [v6] In-Reply-To: References: Message-ID: <36Lioxk5dWw3HwSMEy7ZXUjaUkm_3jNuD2zU2eULEi4=.62a4e77d-f18b-47a3-82f8-e2c735b65acb@github.com> On Sun, 15 Jun 2025 00:57:55 GMT, Cesar Soares Lucas wrote: >> src/hotspot/share/jvmci/jvmciRuntime.cpp line 818: >> >>> 816: HotSpotJVMCI::InstalledCode::set_address(jvmciEnv, nmethod_mirror, 0); >>> 817: HotSpotJVMCI::InstalledCode::set_entryPoint(jvmciEnv, nmethod_mirror, 0); >>> 818: HotSpotJVMCI::HotSpotNmethod::set_invalidationReason(jvmciEnv, nmethod_mirror, static_cast(invalidation_reason)); >> >> I think you're in danger of overwriting the invalidationReason here. `invalidate_nmethod_mirror` can be called for unloaded nmethods after they have been made not entrant. > > I see, thank you for the catch! I'm not sure how I can detect this case. My first idea was to just check if the "invalidationReason" field is already set or not, but looks like in order to do that I'd need to wrap the `nmethod_mirror` in a `JVMCIObject`. Unfortunately, there is a comment just a few lines above this change saying to _not_ do this wrapping. I'm not quite familiar with all nmethod state transitions. Would it be sufficient to just add a check for whether the nmethod is already non-entrant on line 818? You should be able to use `HotSpotJVMCI::HotSpotNmethod::get_invalidationReason`. JVMCIObject handles dispatching if you don't know which runtime you are attached to but we know we are talking to HotSpot. 
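The suggestion above amounts to reading the current reason before writing a new one. A minimal sketch of that guard, under stated assumptions: `NmethodMirror`, `kNoReason` and `invalidate_mirror` are hypothetical stand-ins for illustration, not the JVMCI accessors or the code in the PR.

```cpp
#include <cassert>
#include <cstdint>

// Assumed sentinel meaning "no invalidation reason recorded yet" (hypothetical).
constexpr int32_t kNoReason = -1;

// Hypothetical stand-in for the Java-side nmethod mirror object.
struct NmethodMirror {
  int64_t address = 0x1000;
  int64_t entry_point = 0x1040;
  int32_t invalidation_reason = kNoReason;
};

// Clears the code pointers unconditionally, but records the reason only
// if none has been recorded, so a later call cannot overwrite the first one.
void invalidate_mirror(NmethodMirror& mirror, int32_t reason) {
  mirror.address = 0;
  mirror.entry_point = 0;
  if (mirror.invalidation_reason == kNoReason) {
    mirror.invalidation_reason = reason;
  }
}
```

With this shape, a second invalidation (for example for an unloaded nmethod that was already made not entrant) leaves the originally recorded reason in place.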
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25706#discussion_r2150464134 From kxu at openjdk.org Mon Jun 16 17:36:29 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Mon, 16 Jun 2025 17:36:29 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v3] In-Reply-To: References: Message-ID: On Tue, 22 Apr 2025 07:42:42 GMT, Christian Hagedorn wrote: >> Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: >> >> WIP: review followups > > Was out last week but I'm seeing your last commit mentions WIP. Let me know when it's ready to have another look again :-) Resolved conflict with [JDK-8357951](https://bugs.openjdk.org/browse/JDK-8357951). @chhagedorn I'd appreciate a re-review. Thank you so much! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24458#issuecomment-2977462355 From duke at openjdk.org Mon Jun 16 19:24:34 2025 From: duke at openjdk.org (duke) Date: Mon, 16 Jun 2025 19:24:34 GMT Subject: Withdrawn: 8211759: C2: Graph after optimizations should not have dead nodes In-Reply-To: References: Message-ID: On Sun, 23 Mar 2025 08:23:43 GMT, Zihao Lin wrote: > Move the check_no_dead_use() call after the final_graph_reshaping() call to catch dead nodes. This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.org/jdk/pull/24175 From sparasa at openjdk.org Mon Jun 16 19:42:54 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Mon, 16 Jun 2025 19:42:54 GMT Subject: RFR: 8359386: Fix incorrect value for max_size of C2CodeStub when APX is used [v2] In-Reply-To: <5ou3DR0HW2MlRv79ufOutW0dpq29fGAgnFzWYDo6rhk=.33789d8d-4174-4a7e-aa0d-26603911f760@github.com> References: <5ou3DR0HW2MlRv79ufOutW0dpq29fGAgnFzWYDo6rhk=.33789d8d-4174-4a7e-aa0d-26603911f760@github.com> Message-ID: > The goal of this PR is to fix the value of max_size of the C2CodeStub hardcoded in the C2_MacroAssembler::convertF2I() function when Intel APX instructions are used. Currently, max_size is hardcoded to 23 (introduced in [JDK-8306706](https://bugs.openjdk.org/browse/JDK-8306706)). However, this value is incorrect when Intel APX instructions with extended general-purpose registers (EGPRs) are used in the code stub, because using EGPRs with APX instructions increases the instruction encoding size by an additional 4 bytes. > > Without this fix, we see the following error for the C2 compiler tests below: > > compiler/vectorization/runner/ArrayTypeConvertTest.java > compiler/intrinsics/zip/TestFpRegsABI.java > > > > > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (/src/hotspot/share/opto/c2_CodeStubs.cpp:50), pid=3961123, tid=3961332 > # assert(max_size >= actual_size) failed: Expected stub size (23) must be larger than or equal to actual stub size (24) > # > # JRE version: OpenJDK Runtime Environment (26.0) (fastdebug build 26-internal-adhoc.parasa.jdkdemotion) > # Java VM: OpenJDK 64-Bit Server VM (fastdebug 26-internal-adhoc.parasa.jdkdemotion, mixed mode, sharing, compressed oops, compressed class ptrs, g1 gc, linux-amd64) > # Problematic frame: > # V [libjvm.so+0x955a77] C2CodeStubList::emit(C2_MacroAssembler&)+0x227 > # > > > This PR fixes the errors in the above-mentioned tests.
Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: update max_size computation to be explicit ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25787/files - new: https://git.openjdk.org/jdk/pull/25787/files/0eca30db..37dede6e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25787&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25787&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25787.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25787/head:pull/25787 PR: https://git.openjdk.org/jdk/pull/25787 From sparasa at openjdk.org Mon Jun 16 19:42:55 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Mon, 16 Jun 2025 19:42:55 GMT Subject: RFR: 8359386: Fix incorrect value for max_size of C2CodeStub when APX is used [v2] In-Reply-To: References: <5ou3DR0HW2MlRv79ufOutW0dpq29fGAgnFzWYDo6rhk=.33789d8d-4174-4a7e-aa0d-26603911f760@github.com> Message-ID: On Fri, 13 Jun 2025 09:44:14 GMT, Aleksey Shipilev wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: >> >> update max_size computation to be explicit > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4691: > >> 4689: >> 4690: // Using the APX extended general purpose registers increases the instruction encoding size by 4 bytes. >> 4691: int max_size = dst->encoding() <= 15 ? 23 : 27; > > Do you want to write it in a more explicit way then? > > > int max_size = 23 + (UseAPX ? 4 : 0); Thank you for the suggestion! Please see suggested change incorporated in the updated code. 
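To make the bound being discussed concrete, here is a small standalone sketch of the computation and of the condition the failing assert checks. The helper names are hypothetical; 23 and the extra 4 bytes are the values quoted in this revision of the PR, and 24 is the actual stub size from the crash log above.

```cpp
#include <cassert>

constexpr int kBaseStubSize = 23;   // hardcoded bound from JDK-8306706
constexpr int kApxExtraBytes = 4;   // extra bytes assumed in this PR revision

// Hypothetical helper: upper bound on the stub size, widened when APX is on.
int max_stub_size(bool use_apx) {
  return kBaseStubSize + (use_apx ? kApxExtraBytes : 0);
}

// Same condition as the assert in the crash log: the declared bound must
// be at least as large as what was actually emitted.
bool stub_fits(int max_size, int actual_size) {
  return max_size >= actual_size;
}
```

Under these assumptions, `stub_fits(23, 24)` is false (the reported failure), while `stub_fits(max_stub_size(true), 24)` is true.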
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25787#discussion_r2150729597 From sparasa at openjdk.org Mon Jun 16 22:05:42 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Mon, 16 Jun 2025 22:05:42 GMT Subject: RFR: 8306706: Support out-of-line code generation for MachNodes [v5] In-Reply-To: References: <3haQdXHxlUHKAqi4MNWaVz3gVcFB9M8A20tGPQIok3c=.940d6d13-9764-449a-a9e1-36247f08b68e@github.com> Message-ID: On Tue, 23 May 2023 16:12:27 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds supports for MachNodes to emit an out-of-line piece of code in the stub section of the compiled method. This allows the separation of the uncommon path from the common one, which speeds up the common path a little bit and increases compiled code density. Please take a look and leave reviews. >> >> Thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > comments describe C2GeneralStub src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4339: > 4337: } > 4338: > 4339: auto stub = C2CodeStub::make(dst, src, slowpath_target, 23, convertF2I_slowpath); Hi @merykitty, could you please explain how the size 23 was computed? This value does not work with APX and I created a PR (https://github.com/openjdk/jdk/pull/25787) for that. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13602#discussion_r2150975884 From cslucas at openjdk.org Mon Jun 16 22:22:48 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Mon, 16 Jun 2025 22:22:48 GMT Subject: RFR: 8359064: Expose reason for marking nmethod non-entrant to JVMCI client [v6] In-Reply-To: <36Lioxk5dWw3HwSMEy7ZXUjaUkm_3jNuD2zU2eULEi4=.62a4e77d-f18b-47a3-82f8-e2c735b65acb@github.com> References: <36Lioxk5dWw3HwSMEy7ZXUjaUkm_3jNuD2zU2eULEi4=.62a4e77d-f18b-47a3-82f8-e2c735b65acb@github.com> Message-ID: On Mon, 16 Jun 2025 17:00:49 GMT, Tom Rodriguez wrote: >> I see, thank you for the catch! 
I'm not sure how I can detect this case. My first idea was to just check if the "invalidationReason" field is already set or not, but looks like in order to do that I'd need to wrap the `nmethod_mirror` in a `JVMCIObject`. Unfortunately, there is a comment just a few lines above this change saying to _not_ do this wrapping. I'm not quite familiar with all nmethod state transitions. Would it be sufficient to just add a check for whether the nmethod is already non-entrant on line 818? > > You should be able to use `HotSpotJVMCI::HotSpotNmethod::get_invalidationReason`. JVMCIObject handles dispatching if you don't know which runtime you are attached to but we know we are talking to HotSpot. Thank you for clarifying. I just pushed a change addressing your comment. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25706#discussion_r2150991011 From cslucas at openjdk.org Mon Jun 16 22:22:48 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Mon, 16 Jun 2025 22:22:48 GMT Subject: RFR: 8359064: Expose reason for marking nmethod non-entrant to JVMCI client [v7] In-Reply-To: References: Message-ID: > We recently introduced a way to set the reason why a nmethod was being marked as `not entrant`, see [here](https://github.com/openjdk/jdk/pull/23980) and [here](https://github.com/openjdk/jdk/pull/25338). > > This PR is to expose in the JVMCI interface the reason why the nmethod was flagged as `not entrant`. This will allow JVMCI-based compilers to implement heuristics to handle re-compilations differently based on what happened to earlier versions of a method, for instance, this will likely be used to address this [RFE in Truffle](https://github.com/oracle/graal/issues/11045 > ). > > Tested on Linux x86_64, ARM with JTREG tier 1-3. Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: Prevent overriding invalidation reason. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/25706/files - new: https://git.openjdk.org/jdk/pull/25706/files/6f6d129b..c0e970f1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25706&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25706&range=05-06 Stats: 28 lines in 5 files changed: 20 ins; 2 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/25706.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25706/head:pull/25706 PR: https://git.openjdk.org/jdk/pull/25706 From cslucas at openjdk.org Tue Jun 17 00:39:54 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Tue, 17 Jun 2025 00:39:54 GMT Subject: RFR: 8359064: Expose reason for marking nmethod non-entrant to JVMCI client [v8] In-Reply-To: References: Message-ID: > We recently introduced a way to set the reason why a nmethod was being marked as `not entrant`, see [here](https://github.com/openjdk/jdk/pull/23980) and [here](https://github.com/openjdk/jdk/pull/25338). > > This PR is to expose in the JVMCI interface the reason why the nmethod was flagged as `not entrant`. This will allow JVMCI-based compilers to implement heuristics to handle re-compilations differently based on what happened to earlier versions of a method, for instance, this will likely be used to address this [RFE in Truffle](https://github.com/oracle/graal/issues/11045 > ). > > Tested on Linux x86_64, ARM with JTREG tier 1-3. 
Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: Remove extra space ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25706/files - new: https://git.openjdk.org/jdk/pull/25706/files/c0e970f1..68271c7a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25706&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25706&range=06-07 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25706.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25706/head:pull/25706 PR: https://git.openjdk.org/jdk/pull/25706 From duke at openjdk.org Tue Jun 17 01:40:39 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Tue, 17 Jun 2025 01:40:39 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v24] In-Reply-To: References: Message-ID: <8eMagllT-Sxnvp6tnIkYNyUe7PetzaHXqhiqHnAiApU=.3b4c422a-e9ad-492f-a82d-98ee16f053dd@github.com> On Tue, 3 Jun 2025 12:06:31 GMT, Andrew Haley wrote: >> Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix test copywrite > > src/hotspot/cpu/aarch64/relocInfo_aarch64.cpp line 89: > >> 87: x = trampoline; >> 88: } >> 89: call->set_destination(x); > > I think I see what you're doing here, but it doesn't look right. At the very least it's a trap for maintainers, who don't expect the destination address to be discarded if the call doesn't reach. > > When the call doesn't reach, I believe you're fixing up an internal call to point to its target in the new copy of the code. But this isn't right when calls are PC relative, is it? In that case it makes more sense to leave the call instruction alone rather than rewrite it. This code specifically is for calls that were close enough to have an immediate offset before relocation but are now too far and require a trampoline. 
I agree this is probably not the best location for it and it should be moved up in the call stack (probably in `CallRelocation::fix_relocation_after_move`). The code that fixes up the internal calls is [here](https://github.com/chadrako/jdk/blob/4e80e35959829ecf1579efc65b9525b2aff2be1f/src/hotspot/share/code/relocInfo.cpp#L380-L385). The issue is that the calls are PC relative but the logic in `CallRelocation::fix_relocation_after_move` changes the offset to point to the old nmethod's code, assuming it is an external routine. So it is incorrectly updating the offset when it needs to stay the same. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2151154202 From kvn at openjdk.org Tue Jun 17 02:41:44 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 17 Jun 2025 02:41:44 GMT Subject: RFR: 8359646: C1 crash in AOTCodeAddressTable::add_C_string Message-ID: <5V_lRRQZEltF8kJWp7PV8InPXcGOSzxgE_-bHLGKeb8=.27a4eeb2-7a09-4151-89b3-5c467df1d508@github.com> It is a concurrency issue. The call to `AOTCodeAddressTable::add_C_string()` happened after the checks that the AOT code cache is still open. But, because there is no synchronization, another thread (VM) closed/deleted the AOT code cache (after dumping) before the code in `add_C_string()` accessed it. Added the missing AOTCodeCStrings_lock in places where we modify, store and delete the AOT strings table. Moved the MutexLocker from `AOTCodeAddressTable::add_C_string()` to its caller and added an additional check after it. I also noticed that we missed a similar check after Compile_lock when we are storing AOT code.
Tested hs-tier1-6,hs-tier10-rt,stress,xcomp ------------- Commit messages: - 8359646: C1 crash in AOTCodeAddressTable::add_C_string Changes: https://git.openjdk.org/jdk/pull/25841/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25841&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8359646 Stats: 15 lines in 1 file changed: 9 ins; 4 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25841.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25841/head:pull/25841 PR: https://git.openjdk.org/jdk/pull/25841 From kvn at openjdk.org Tue Jun 17 02:41:44 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 17 Jun 2025 02:41:44 GMT Subject: RFR: 8359646: C1 crash in AOTCodeAddressTable::add_C_string In-Reply-To: <5V_lRRQZEltF8kJWp7PV8InPXcGOSzxgE_-bHLGKeb8=.27a4eeb2-7a09-4151-89b3-5c467df1d508@github.com> References: <5V_lRRQZEltF8kJWp7PV8InPXcGOSzxgE_-bHLGKeb8=.27a4eeb2-7a09-4151-89b3-5c467df1d508@github.com> Message-ID: On Tue, 17 Jun 2025 02:34:24 GMT, Vladimir Kozlov wrote: > It is a concurrency issue. The call to `AOTCodeAddressTable::add_C_string()` happened after the checks that the AOT code cache is still open. But, because there is no synchronization, another thread (VM) closed/deleted the AOT code cache (after dumping) before the code in `add_C_string()` accessed it. > > Added the missing AOTCodeCStrings_lock in places where we modify, store and delete the AOT strings table. Moved the MutexLocker from `AOTCodeAddressTable::add_C_string()` to its caller and added an additional check after it. > > I also noticed that we missed a similar check after Compile_lock when we are storing AOT code. > > Tested hs-tier1-6,hs-tier10-rt,stress,xcomp @adinn and @ashu-mehra please look. This is a very rare case because the window for the race is very narrow. It happens during the assembly phase.
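The pattern behind the fix (take the lock in the caller, then check again that the cache is still open before touching the table) can be sketched in plain C++. This is only an illustration under assumptions: `Cache`, `StringTable` and their members are hypothetical names, and `std::mutex` stands in for HotSpot's `MutexLocker`/`AOTCodeCStrings_lock`.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for the AOT C-strings table.
struct StringTable {
  std::vector<std::string> entries;
};

class Cache {
 public:
  Cache() : table_(std::make_unique<StringTable>()) {}

  // Returns true if the string was recorded, false if the cache is closed.
  bool add_string(const std::string& s) {
    if (!open_.load()) return false;            // early check, may be stale
    std::lock_guard<std::mutex> guard(lock_);   // serialize with close()
    if (!open_.load()) return false;            // re-check under the lock
    table_->entries.push_back(s);
    return true;
  }

  // Closes the cache and deletes the table while holding the same lock.
  void close() {
    std::lock_guard<std::mutex> guard(lock_);
    open_.store(false);
    table_.reset();
  }

  std::size_t size() {
    std::lock_guard<std::mutex> guard(lock_);
    return table_ ? table_->entries.size() : 0;
  }

 private:
  std::mutex lock_;
  std::atomic<bool> open_{true};
  std::unique_ptr<StringTable> table_;
};
```

Without the second check, a thread that passed the early check could still dereference the table after another thread closed and deleted it, which is the narrow window described above.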
------------- PR Comment: https://git.openjdk.org/jdk/pull/25841#issuecomment-2978737640 PR Comment: https://git.openjdk.org/jdk/pull/25841#issuecomment-2978738150 From xgong at openjdk.org Tue Jun 17 03:04:31 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 17 Jun 2025 03:04:31 GMT Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v3] In-Reply-To: References: Message-ID: On Mon, 16 Jun 2025 10:12:14 GMT, Bhavana Kilambi wrote: >> This patch adds aarch64 backend support for SelectFromTwoVector operation which was recently introduced in VectorAPI. >> >> It implements this operation using a two table vector lookup instruction - "tbl" which is available only in Neon and SVE2. >> >> For 128-bit vector length : Neon tbl instruction is generated if UseSVE < 2 and SVE2 "tbl" instruction is generated if UseSVE == 2. >> >> For > 128-bit vector length : Currently there are no machines which have vector length > 128-bit and support SVE2. For all those machines with vector length > 128-bit and UseSVE < 2, this operation is not supported. The inline expander for this operation would fail and lowered IR will be generated which is a mix of two rearrange and one blend operation. >> >> This patch also adds a boolean "need_load_shuffle" in the inline expander for this operation to test if the platform requires VectorLoadShuffle operation to be generated. Without this, the lowering IR was not being generated on aarch64 and the performance was quite poor. 
>> >> Performance numbers with this patch on a 128-bit, SVE2 supporting machine is shown below - >> >> >> Benchmark (size) Mode Cnt Gain >> SelectFromBenchmark.selectFromByteVector 1024 thrpt 9 1.43 >> SelectFromBenchmark.selectFromByteVector 2048 thrpt 9 1.48 >> SelectFromBenchmark.selectFromDoubleVector 1024 thrpt 9 68.55 >> SelectFromBenchmark.selectFromDoubleVector 2048 thrpt 9 72.07 >> SelectFromBenchmark.selectFromFloatVector 1024 thrpt 9 1.69 >> SelectFromBenchmark.selectFromFloatVector 2048 thrpt 9 1.52 >> SelectFromBenchmark.selectFromIntVector 1024 thrpt 9 1.50 >> SelectFromBenchmark.selectFromIntVector 2048 thrpt 9 1.52 >> SelectFromBenchmark.selectFromLongVector 1024 thrpt 9 85.38 >> SelectFromBenchmark.selectFromLongVector 2048 thrpt 9 80.93 >> SelectFromBenchmark.selectFromShortVector 1024 thrpt 9 1.48 >> SelectFromBenchmark.selectFromShortVector 2048 thrpt 9 1.49 >> >> >> Gain column refers to the ratio of thrpt between this patch and the master branch after applying changes in the inline expander. > > Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: > > Addressed review comments and added a JTREG test src/hotspot/cpu/aarch64/aarch64.ad line 4990: > 4988: %{ > 4989: constraint(ALLOC_IN_RC(v17_veca_reg)); > 4990: match(vReg); Not sure whether it's better to use `match(VecA)` here. src/hotspot/cpu/aarch64/aarch64_vector.ad line 7176: > 7174: vReg index, vReg tmp1) %{ > 7175: predicate((Matcher::vector_element_basic_type(n) == T_SHORT || > 7176: type2aelembytes(Matcher::vector_element_basic_type(n)) == 4) && To use the same basic type check condition, can we use `type2aelembytes(Matcher::vector_element_basic_type(n)) == 2` instead of `Matcher::vector_element_basic_type(n) == T_SHORT` here? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2151208483 PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2151217232 From xgong at openjdk.org Tue Jun 17 03:04:31 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 17 Jun 2025 03:04:31 GMT Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v3] In-Reply-To: References: Message-ID: On Tue, 17 Jun 2025 02:48:54 GMT, Xiaohong Gong wrote: >> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: >> >> Addressed review comments and added a JTREG test > > src/hotspot/cpu/aarch64/aarch64_vector.ad line 7176: > >> 7174: vReg index, vReg tmp1) %{ >> 7175: predicate((Matcher::vector_element_basic_type(n) == T_SHORT || >> 7176: type2aelembytes(Matcher::vector_element_basic_type(n)) == 4) && > > To use the same basic type check condition, can we use `type2aelembytes(Matcher::vector_element_basic_type(n)) == 2` instead of `Matcher::vector_element_basic_type(n) == T_SHORT` here? How about just using the negate condition of rule `vselect_from_two_vectors` ? predicate(Matcher::vector_element_basic_type(n) != T_BYTE && (UseSVE < 2 || Matcher::vector_length_in_bytes(n) < 16)); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2151227373 From xgong at openjdk.org Tue Jun 17 03:14:32 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 17 Jun 2025 03:14:32 GMT Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v3] In-Reply-To: References: Message-ID: On Mon, 16 Jun 2025 10:12:14 GMT, Bhavana Kilambi wrote: >> This patch adds aarch64 backend support for SelectFromTwoVector operation which was recently introduced in VectorAPI. >> >> It implements this operation using a two table vector lookup instruction - "tbl" which is available only in Neon and SVE2. 
>> >> For 128-bit vector length : Neon tbl instruction is generated if UseSVE < 2 and SVE2 "tbl" instruction is generated if UseSVE == 2. >> >> For > 128-bit vector length : Currently there are no machines which have vector length > 128-bit and support SVE2. For all those machines with vector length > 128-bit and UseSVE < 2, this operation is not supported. The inline expander for this operation would fail and lowered IR will be generated which is a mix of two rearrange and one blend operation. >> >> This patch also adds a boolean "need_load_shuffle" in the inline expander for this operation to test if the platform requires VectorLoadShuffle operation to be generated. Without this, the lowering IR was not being generated on aarch64 and the performance was quite poor. >> >> Performance numbers with this patch on a 128-bit, SVE2 supporting machine is shown below - >> >> >> Benchmark (size) Mode Cnt Gain >> SelectFromBenchmark.selectFromByteVector 1024 thrpt 9 1.43 >> SelectFromBenchmark.selectFromByteVector 2048 thrpt 9 1.48 >> SelectFromBenchmark.selectFromDoubleVector 1024 thrpt 9 68.55 >> SelectFromBenchmark.selectFromDoubleVector 2048 thrpt 9 72.07 >> SelectFromBenchmark.selectFromFloatVector 1024 thrpt 9 1.69 >> SelectFromBenchmark.selectFromFloatVector 2048 thrpt 9 1.52 >> SelectFromBenchmark.selectFromIntVector 1024 thrpt 9 1.50 >> SelectFromBenchmark.selectFromIntVector 2048 thrpt 9 1.52 >> SelectFromBenchmark.selectFromLongVector 1024 thrpt 9 85.38 >> SelectFromBenchmark.selectFromLongVector 2048 thrpt 9 80.93 >> SelectFromBenchmark.selectFromShortVector 1024 thrpt 9 1.48 >> SelectFromBenchmark.selectFromShortVector 2048 thrpt 9 1.49 >> >> >> Gain column refers to the ratio of thrpt between this patch and the master branch after applying changes in the inline expander. 
> > Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: > > Addressed review comments and added a JTREG test src/hotspot/cpu/aarch64/aarch64_vector.ad line 251: > 249: // false if vector length > 16B but supported SVE version < 2. > 250: // For vector length of 16B, generate SVE2 "tbl" instruction if SVE2 is supported, else > 251: // generate Neon "tbl" instruction to select from two vectors. So for <16B vectors, it will generate the NEON version even if UseSVE == 2, right? Since the implementation is complex for NEON's non-byte types, can we consider using the SVE2 version for such cases? Or did you compare the performance between different implementations for 64-bit species vectors? src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2861: > 2859: FloatRegister tmp1, BasicType bt, bool isQ) { > 2860: > 2861: assert_different_registers(dst, src1, src2, tmp1); Missing the `index` register? src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.hpp line 191: > 189: FloatRegister one, FloatRegister vtmp, PRegister pgtmp, SIMD_RegVariant T); > 190: > 191: void verify_int_in_range(uint idx, const TypeInt* t, Register val, Register tmp); Please revert this change. Thanks! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2151232101 PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2151233024 PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2151234621 From dlong at openjdk.org Tue Jun 17 03:44:42 2025 From: dlong at openjdk.org (Dean Long) Date: Tue, 17 Jun 2025 03:44:42 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v24] In-Reply-To: <8eMagllT-Sxnvp6tnIkYNyUe7PetzaHXqhiqHnAiApU=.3b4c422a-e9ad-492f-a82d-98ee16f053dd@github.com> References: <8eMagllT-Sxnvp6tnIkYNyUe7PetzaHXqhiqHnAiApU=.3b4c422a-e9ad-492f-a82d-98ee16f053dd@github.com> Message-ID: On Tue, 17 Jun 2025 01:37:22 GMT, Chad Rakoczy wrote: >> src/hotspot/cpu/aarch64/relocInfo_aarch64.cpp line 89: >> >>> 87: x = trampoline; >>> 88: } >>> 89: call->set_destination(x); >> >> I think I see what you're doing here, but it doesn't look right. At the very least it's a trap for maintainers, who don't expect the destination address to be discarded if the call doesn't reach. >> >> When the call doesn't reach, I believe you're fixing up an internal call to point to its target in the new copy of the code. But this isn't right when calls are PC relative, is it? In that case it makes more sense to leave the call instruction alone rather than rewrite it. > > This code specifically is for calls that were close enough to have an immediate offset before relocation but are now too far and require a trampoline. I agree this is probably not the best location for it and it should be moved up in the call stack (probably in `CallRelocation::fix_relocation_after_move`). > > The code that fixes up the internal calls is [here](https://github.com/chadrako/jdk/blob/4e80e35959829ecf1579efc65b9525b2aff2be1f/src/hotspot/share/code/relocInfo.cpp#L380-L385). 
The issue is that the calls are PC relative but the logic in `CallRelocation::fix_relocation_after_move` changes the offset to point to the old nmethod's code, assuming it is an external routine. So it is incorrectly updating the offset when it needs to stay the same. Isn't this logic only required because of Graal (JDK-8358096)? For HotSpot, there should always be a trampoline if one is needed [1], even taking into account possible nmethod relocation, because MacroAssembler::trampoline_call() uses is_always_within_branch_range(), which makes its decision based only on the codecache size, boundaries, and the target, not the address of the call site. Further, my understanding is that the Graal vs HotSpot issue is because Graal uses the A64 ISA "hard" reachability limit, while HotSpot uses a "soft" limit thanks to this constant with a lower value for debug: static const uint64_t branch_range = NOT_DEBUG(128 * M) DEBUG_ONLY(2 * M); [1] By "needed" here, I mean the hard A64 ISA limit. It might be useful to have a version of reachable_from_branch_at() that used the hard limit when working with JVMCI-generated code, because ca > So it is incorrectly updating the offset when it needs to stay the same. I don't understand what exactly is going on with that, but after working on JDK-8321509, I wouldn't be surprised if there were assumptions that break when trying to relocate, and you indeed might need to move the logic up a level, or even take into account what type of relocation it is, or the fact that the source is a finalized nmethod and not a BufferBlob.
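The point that the trampoline decision depends only on the code cache bounds and the target, not on the call site, can be illustrated with a simplified model. This is a sketch under assumptions, not HotSpot's implementation: `always_within_branch_range` is a hypothetical free function, and only the product value of `branch_range` (128 * M) from the constant quoted above is used.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

constexpr uint64_t M = 1024 * 1024;
constexpr uint64_t kBranchRange = 128 * M;  // product value quoted above

// Hypothetical model: a direct branch is safe from *any* call site in the
// code cache [low, high] if the target is within range of both ends, since
// every call site lies between those two addresses.
bool always_within_branch_range(uint64_t low, uint64_t high, uint64_t target) {
  uint64_t dist_low = target > low ? target - low : low - target;
  uint64_t dist_high = target > high ? target - high : high - target;
  return std::max(dist_low, dist_high) < kBranchRange;
}
```

Because the call-site address never enters the computation, the answer does not change when the code containing the call is moved elsewhere inside the same code cache.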
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2151256939 From sparasa at openjdk.org Tue Jun 17 04:07:14 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Tue, 17 Jun 2025 04:07:14 GMT Subject: RFR: 8359386: Fix incorrect value for max_size of C2CodeStub when APX is used [v3] In-Reply-To: <5ou3DR0HW2MlRv79ufOutW0dpq29fGAgnFzWYDo6rhk=.33789d8d-4174-4a7e-aa0d-26603911f760@github.com> References: <5ou3DR0HW2MlRv79ufOutW0dpq29fGAgnFzWYDo6rhk=.33789d8d-4174-4a7e-aa0d-26603911f760@github.com> Message-ID: > The goal of this PR is to fix the value of max_size of the C2CodeStub hardcoded in the C2_MacroAssembler::convertF2I() function when Intel APX instructions are used. Currently, max_size is hardcoded to 23 (introduced in [JDK-8306706](https://bugs.openjdk.org/browse/JDK-8306706)). However, this value is incorrect when Intel APX instructions with extended general-purpose registers (EGPRs) are used in the code stub, because using EGPRs with APX instructions increases the instruction encoding size by an additional 1 byte.
> > Without this fix, we see the following error for the C2 compiler tests below: > > compiler/vectorization/runner/ArrayTypeConvertTest.java > compiler/intrinsics/zip/TestFpRegsABI.java > > > > > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (/src/hotspot/share/opto/c2_CodeStubs.cpp:50), pid=3961123, tid=3961332 > # assert(max_size >= actual_size) failed: Expected stub size (23) must be larger than or equal to actual stub size (24) > # > # JRE version: OpenJDK Runtime Environment (26.0) (fastdebug build 26-internal-adhoc.parasa.jdkdemotion) > # Java VM: OpenJDK 64-Bit Server VM (fastdebug 26-internal-adhoc.parasa.jdkdemotion, mixed mode, sharing, compressed oops, compressed class ptrs, g1 gc, linux-amd64) > # Problematic frame: > # V [libjvm.so+0x955a77] C2CodeStubList::emit(C2_MacroAssembler&)+0x227 > # > > > This PR fixes the errors in the above-mentioned tests. Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: Fix the change in the stub size by 1 byte ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25787/files - new: https://git.openjdk.org/jdk/pull/25787/files/37dede6e..4fe56be6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25787&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25787&range=01-02 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25787.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25787/head:pull/25787 PR: https://git.openjdk.org/jdk/pull/25787 From dlong at openjdk.org Tue Jun 17 04:10:38 2025 From: dlong at openjdk.org (Dean Long) Date: Tue, 17 Jun 2025 04:10:38 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v24] In-Reply-To: References: <8eMagllT-Sxnvp6tnIkYNyUe7PetzaHXqhiqHnAiApU=.3b4c422a-e9ad-492f-a82d-98ee16f053dd@github.com> Message-ID: On Tue, 17 Jun 2025 03:42:02 GMT, Dean Long wrote: >> This code 
specifically is for calls that were close enough to have an immediate offset before relocation but are now too far and require a trampoline. I agree this is probably not the best location for it and it should be moved up in the call stack (probably in `CallRelocation::fix_relocation_after_move`).
>>
>> The code that fixes up the internal calls is [here](https://github.com/chadrako/jdk/blob/4e80e35959829ecf1579efc65b9525b2aff2be1f/src/hotspot/share/code/relocInfo.cpp#L380-L385). The issue is that the calls are PC-relative, but the logic in `CallRelocation::fix_relocation_after_move` changes the offset to point to the old nmethod's code, assuming it is an external routine. So it is incorrectly updating the offset when it needs to stay the same.
>
> Isn't this logic only required because of Graal (JDK-8358096)? For HotSpot, there should always be a trampoline if one is needed [1], even taking into account possible nmethod relocation, because MacroAssembler::trampoline_call() uses is_always_within_branch_range(), which makes its decision based only on the codecache size, boundaries, and the target, not the address of the call site.
> Further, my understanding is that the Graal vs HotSpot issue is because Graal uses the A64 ISA "hard" reachability limit, while HotSpot uses a "soft" limit thanks to this constant with a lower value for debug:
>
>     static const uint64_t branch_range = NOT_DEBUG(128 * M) DEBUG_ONLY(2 * M);
>
> [1] By "needed" here, I mean the hard A64 ISA limit. It might be useful to have a version of reachable_from_branch_at() that used the hard limit when working with JVMCI-generate code, because ca
>
>> So it is incorrectly updating the offset when it needs to stay the same.
>
> I don't understand what exactly is going on with that, but after working on JDK-8321509, I wouldn't be surprised if there were assumptions that break when trying to relocate, and you indeed might need to move the logic up a level, or even take into account what type of relocation it is, or the fact that the source is a finalized nmethod and not a BufferBlob.

I took another look, and I think the problem with CallRelocation::fix_relocation_after_move() is the ambiguity of "destination" for a trampoline call. There is the near instruction destination inside the same nmethod, and there is the effective final far destination contained in the trampoline stub. For nmethod to nmethod relocation, I think you want a customized version of this function instead of trying to use it as-is. The customized version could continue to use the near destination, but fixed up with new_addr_for. Maybe simpler would be to use the final far destination, which is always going to be outside the nmethod if you clear inline caches first. I think you would need to do that anyway, otherwise recursive self calls to the Java method would not get relocated correctly.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2151276212

From sparasa at openjdk.org Tue Jun 17 04:26:31 2025
From: sparasa at openjdk.org (Srinivas Vamsi Parasa)
Date: Tue, 17 Jun 2025 04:26:31 GMT
Subject: RFR: 8359386: Fix incorrect value for max_size of C2CodeStub when APX is used [v3]
In-Reply-To:
References: <5ou3DR0HW2MlRv79ufOutW0dpq29fGAgnFzWYDo6rhk=.33789d8d-4174-4a7e-aa0d-26603911f760@github.com>
Message-ID:

On Tue, 17 Jun 2025 04:07:14 GMT, Srinivas Vamsi Parasa wrote:

>> The goal of this PR is to fix the value of max_size of the C2CodeStub hardcoded in the C2_MacroAssembler::convertF2I() function when Intel APX instructions are used. Currently, max_size is hardcoded to 23 (introduced in [JDK-8306706](https://bugs.openjdk.org/browse/JDK-8306706)).
However, this value is incorrect when Intel APX instructions with extended general-purpose registers (EGPRs) are used in the code stub as using EGPRs with APX instructions leads to an increase in the instruction encoding size by additional 1 byte. >> >> Without this fix, we see the following error for the C2 compiler tests below: >> >> compiler/vectorization/runner/ArrayTypeConvertTest.java >> compiler/intrinsics/zip/TestFpRegsABI.java >> >> >> >> >> # A fatal error has been detected by the Java Runtime Environment: >> # >> # Internal Error (/src/hotspot/share/opto/c2_CodeStubs.cpp:50), pid=3961123, tid=3961332 >> # assert(max_size >= actual_size) failed: Expected stub size (23) must be larger than or equal to actual stub size (24) >> # >> # JRE version: OpenJDK Runtime Environment (26.0) (fastdebug build 26-internal-adhoc.parasa.jdkdemotion) >> # Java VM: OpenJDK 64-Bit Server VM (fastdebug 26-internal-adhoc.parasa.jdkdemotion, mixed mode, sharing, compressed oops, compressed class ptrs, g1 gc, linux-amd64) >> # Problematic frame: >> # V [libjvm.so+0x955a77] C2CodeStubList::emit(C2_MacroAssembler&)+0x227 >> # >> >> >> This PR fixes the errors in the above-mentioned tests. >> >> Currently, the ConvertF2I macro works as follows: >> >> >> vcvttss2si %xmm1,%eax >> cmp $0x80000000,%eax >> je STUB >> CONTINUE: >> >> STUB: >> sub $0x8,%rsp >> vmovss %xmm1,(%rsp) >> call Stub::f2i_fixup ; {runtime_call StubRoutines (initial stubs)} >> pop %rax >> jmp CONTINUE >> >> >> The maximum size (max_size) of the stub is precomputed as 23. However, as seen in the convertF2I_slowpath implementation (below), the usage of pop(dst) instruction increases the instruction encoding size by 1 byte if dst is an extended general-purpose register (R16-R31) . >> >> For example, `pop (r15)` is encoded as `41 5f`, whereas `pop(r21)` is encoded as `d5 10 5d`. >> >> >> >> >> static void convertF2I_slowpath(C2_MacroAssembler& masm, C2GeneralStub& stub) { >> #define __ masm. 
>>   Register dst = stub.data<0>();
>>   XMMRegister src = stub.data<1>();
>>   address target = stub.data<2>();
>>   __ bind(stub.entry());
>>   __ subptr(rsp, 8);
>>   __ mo...
>
> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision:
>
>   Fix the change in the stub size by 1 byte

Please see the updated version which fixes the error in the initial PR. While it's true that the size of the convert with truncation and the compare instruction in the main common path increases, it does not contribute to the increase in the stub size. As explained in the updated PR description, the increase in size by 1 byte is caused by the `pop(dst)` instruction in the stub.

Hello Jatin (@jatin-bhateja),

Please see the updated PR description which fixed the error in the initial PR.

Thanks,
Vamsi

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25787#issuecomment-2978885267
PR Comment: https://git.openjdk.org/jdk/pull/25787#issuecomment-2978888298

From amitkumar at openjdk.org Tue Jun 17 05:40:14 2025
From: amitkumar at openjdk.org (Amit Kumar)
Date: Tue, 17 Jun 2025 05:40:14 GMT
Subject: RFR: 8358756: [s390x] Test StartupOutput.java crash due to CodeCache size [v2]
In-Reply-To:
References:
Message-ID:

> There isn't enough initial code cache to let the interpreter mode run freely. So before we even reach the compiler phase and try to bail out because there isn't enough space left for the stub compilation, the JVM crashes. The idea is to increase the initial cache size so that it is at least enough to run in interpreter mode.

Amit Kumar has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains three additional commits since the last revision: - Merge branch 'master' into testfix - take the platform change out of loop - fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25741/files - new: https://git.openjdk.org/jdk/pull/25741/files/bcf22368..12b60494 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25741&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25741&range=00-01 Stats: 16395 lines in 319 files changed: 11344 ins; 3707 del; 1344 mod Patch: https://git.openjdk.org/jdk/pull/25741.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25741/head:pull/25741 PR: https://git.openjdk.org/jdk/pull/25741 From amitkumar at openjdk.org Tue Jun 17 05:40:14 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 17 Jun 2025 05:40:14 GMT Subject: RFR: 8358756: [s390x] Test StartupOutput.java crash due to CodeCache size [v2] In-Reply-To: References: Message-ID: On Wed, 11 Jun 2025 12:49:41 GMT, Damon Fenacci wrote: > Thanks @offamitkumar. The idea behind the [PR](https://github.com/openjdk/jdk/pull/23630) that changed this is that it would check randomly around the amount of code cache that would be just enough for the compilers to start (or not). So, before that PR it would sometimes crash instead of terminating gently. Does adding `800k` to the initial code cache for s390 do that? Did you try before that [PR](https://github.com/openjdk/jdk/pull/23630) (or temporarily reverting it) to see if it crashes? Just for my understanding. Even if test passes we still want to see this warning: [warning][codecache] CodeCache is full. Compiler has been disabled. Before the PR, I don't test crashing or even producing this warning. Even with my changes same behaviour is going on. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25741#issuecomment-2965282688 From amitkumar at openjdk.org Tue Jun 17 05:40:14 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 17 Jun 2025 05:40:14 GMT Subject: RFR: 8358756: [s390x] Test StartupOutput.java crash due to CodeCache size [v2] In-Reply-To: References: Message-ID: <6d3ozqsSN87qzIzvY01J5IjKlM2mV1EYUDyxzIUAReo=.1b189622-3989-4f29-b3a1-542c9c95d09d@github.com> On Thu, 12 Jun 2025 06:29:26 GMT, Amit Kumar wrote: > Thanks @offamitkumar. The idea behind the [PR](https://github.com/openjdk/jdk/pull/23630) that changed this is that it would check randomly around the amount of code cache that would be just enough for the compilers to start (or not). So, before that PR it would sometimes crash instead of terminating gently. Does adding `800k` to the initial code cache for s390 do that? Did you try before that [PR](https://github.com/openjdk/jdk/pull/23630) (or temporarily reverting it) to see if it crashes? I tried to see the output from all thread, and modified the test a bit to verify that: java.lang.RuntimeException: 'CodeCache is full' found in stdout at jdk.test.lib.process.OutputAnalyzer.stdoutShouldNotContain(OutputAnalyzer.java:337) at compiler.startup.StartupOutput.main(StartupOutput.java:76) at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) at java.base/java.lang.reflect.Method.invoke(Method.java:565) at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:335) at java.base/java.lang.Thread.run(Thread.java:1474) Above crash I got, once i modified the test case and then ran it: diff --git a/test/hotspot/jtreg/compiler/startup/StartupOutput.java b/test/hotspot/jtreg/compiler/startup/StartupOutput.java index 68cfaece2a5..02f3437c27b 100644 --- a/test/hotspot/jtreg/compiler/startup/StartupOutput.java +++ b/test/hotspot/jtreg/compiler/startup/StartupOutput.java @@ -73,7 +73,7 @@ public static void 
main(String[] args) throws Exception { for (int i = 0; i < 200; i++) { out = new OutputAnalyzer(pr[i]); // The VM should not crash but will probably fail with a "CodeCache is full. Compiler has been disabled." message - out.stdoutShouldNotContain("# A fatal error"); + out.stdoutShouldNotContain("CodeCache is full"); exitCode = out.getExitValue(); if (exitCode != 1 && exitCode != 0) { throw new Exception("VM crashed with exit code " + exitCode); So I think compiler is bailing out with current changes. Please let me know if this is incorrect or any other question you have. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25741#issuecomment-2978978871 From epeter at openjdk.org Tue Jun 17 07:17:35 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 17 Jun 2025 07:17:35 GMT Subject: RFR: 8356708: C2: loop strip mining expansion doesn't take sunk stores into account [v4] In-Reply-To: <4pxoG8BH9QOLHb9O7XqofwEcrOXqSSLIbbunGEE5UYg=.17621f96-6eaa-476e-8043-cba003880f2b@github.com> References: <4pxoG8BH9QOLHb9O7XqofwEcrOXqSSLIbbunGEE5UYg=.17621f96-6eaa-476e-8043-cba003880f2b@github.com> Message-ID: <46PnJrx9oCCkTFd9tICMrsEXm_wb3dMaGQKT-xSv8Ec=.1046ecc3-9e7e-4fc2-a0bd-da2363dfed22@github.com> On Mon, 16 Jun 2025 15:26:03 GMT, Roland Westrelin wrote: >> `test1()` has a counted loop with a `Store` to `field`. That `Store` >> is sunk out of loop. When the `OuterStripMinedLoop` is expanded, only >> `Phi`s that exist at the inner loop are added to the outer >> loop. There's no `Phi` for the slice of the sunk `Store` (because >> there's no `Store` left in the inner loop) so no `Phi` is added for >> that slice to the outer loop. As a result, there's a missing anti >> dependency for `Load` of `field` that's before the loop and it can be >> scheduled inside the outer strip mined loop which is incorrect. >> >> `test2()` is the same as `test1()` but with a chain of 2 `Store`s. 
>>
>> `test3()` is another variant where a `Store` is left in the inner loop
>> after one is sunk out of it so the inner loop still has a `Phi`. As a
>> result, the outer loop also gets a `Phi` but it's incorrectly wired as
>> the sunk `Store` should be the input along the backedge but is
>> not. That one doesn't cause any failure AFAICT.
>>
>> The fix I propose is some extra logic at expansion of the
>> `OuterStripMinedLoop` to handle these corner cases.
>
> Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision:
>
>  - Update test/hotspot/jtreg/compiler/loopstripmining/TestStoresSunkInOuterStripMinedLoop.java
>
>    Co-authored-by: Roberto Castañeda Lozano
>  - Update test/hotspot/jtreg/compiler/loopstripmining/TestStoresSunkInOuterStripMinedLoop.java
>
>    Co-authored-by: Roberto Castañeda Lozano

@rwestrel Thanks for the work here :)

As mentioned previously, it feels like we have allowed a violation of C2 IR assumptions, namely that there is always a Phi if we mutate the memory state. And now we need to clean up things after that violation. Not great, but I understand why we got here: for simplicity of the outer strip mined loop, and not affecting other optimizations.

Like I ask in one of my code comments below: Is there any place where we describe why we do not have Phis at the outer loop, and why we think that should be ok? It would be good to have those assumptions documented. And then you can refer to the method `OuterStripMinedLoopNode::handle_sunk_stores_at_expansion`, where we have to clean things up, and from there also back to the main description.

I see you have some minimal comments at `PhaseIdealLoop::create_outer_strip_mined_loop`:

    // to the loop head. The inner strip mined loop is left as it is. Only
    // once loop optimizations are over, do we adjust the inner loop exit
    // condition to limit its number of iterations, set the outer loop
    // exit condition and add Phis to the outer loop head.
I think we should add some more info there, and link to and from `OuterStripMinedLoopNode::handle_sunk_stores_at_expansion`. Additionally / alternatively, you could also comment directly at the `OuterStripMinedLoopNode` class. I just would like to prevent the situation where you are the only person who is able to understand how the outer strip mined loop works ;) src/hotspot/share/opto/loopnode.cpp line 2876: > 2874: PhaseIdealLoop* iloop) const { > 2875: CountedLoopNode* inner_cl = inner_counted_loop(); > 2876: Node* cle_out = inner_loop_exit(); Suggestion: IfFalseNode* cle_out = inner_loop_exit(); Optional :) src/hotspot/share/opto/loopnode.cpp line 2992: > 2990: } > 2991: > 2992: // Sunk stores should be referenced from an outer loop memory Phi I think you really need to give some longer explanation here why we need to do what you do here. Also: is there anywhere a description why we do not have the phi already by default for outer loops? Because I think we should really describe that somewhere, and state our assumptions. And then you could also refer to that description from here, and from there to here. src/hotspot/share/opto/loopnode.cpp line 2993: > 2991: > 2992: // Sunk stores should be referenced from an outer loop memory Phi > 2993: void OuterStripMinedLoopNode::handle_sunk_stores_at_expansion(PhaseIterGVN* igvn) { What does the word "expansion" refer to? Could you also mention that in your code comment above, please? src/hotspot/share/opto/loopnode.cpp line 2994: > 2992: // Sunk stores should be referenced from an outer loop memory Phi > 2993: void OuterStripMinedLoopNode::handle_sunk_stores_at_expansion(PhaseIterGVN* igvn) { > 2994: Node* cle_exit_proj = inner_loop_exit(); Suggestion: IfFalseNode* cle_exit_proj = inner_loop_exit(); Optional src/hotspot/share/opto/loopnode.cpp line 2996: > 2994: Node* cle_exit_proj = inner_loop_exit(); > 2995: > 2996: // Sunk stores are pinned on the loop exit projection of the inner loop Can you add a description why? 
src/hotspot/share/opto/loopnode.cpp line 3007:

> 3005: #endif
> 3006:
> 3007:   // Sunk stores are reachable from the memory state of the outer loop safepoint

Is it true that the control of the safepoint is the `cle_exit_proj`? Could we add an assert for that? So we are just looking for all memory between those two control nodes? Or is it more complicated?

src/hotspot/share/opto/loopnode.cpp line 3008:

> 3006:
> 3007:   // Sunk stores are reachable from the memory state of the outer loop safepoint
> 3008:   Node* safepoint = outer_safepoint();

Suggestion:

    SafePointNode* safepoint = outer_safepoint();

Optional

src/hotspot/share/opto/loopnode.cpp line 3014:

> 3012:     return;
> 3013:   }
> 3014:   MergeMemNode* mm = safepoint_mem->as_MergeMem();

You don't use `safepoint_mem` for anything else than checking if it is a `MergeMem`. So why not do this:

Suggestion:

    MergeMemNode* mm = safepoint->in(TypeFunc::Memory)->isa_MergeMem();
    if (mm == nullptr) {
      // There is no MergeMem, which should only happen if there was no memory node
      // sunk out of the loop.
      assert(stores_in_outer_loop_cnt == 0, "inconsistent");
      return;
    }

-------------

Changes requested by epeter (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/25717#pullrequestreview-2934351620
PR Review Comment: https://git.openjdk.org/jdk/pull/25717#discussion_r2151459471
PR Review Comment: https://git.openjdk.org/jdk/pull/25717#discussion_r2151462464
PR Review Comment: https://git.openjdk.org/jdk/pull/25717#discussion_r2151487902
PR Review Comment: https://git.openjdk.org/jdk/pull/25717#discussion_r2151459759
PR Review Comment: https://git.openjdk.org/jdk/pull/25717#discussion_r2151463314
PR Review Comment: https://git.openjdk.org/jdk/pull/25717#discussion_r2151465652
PR Review Comment: https://git.openjdk.org/jdk/pull/25717#discussion_r2151466480
PR Review Comment: https://git.openjdk.org/jdk/pull/25717#discussion_r2151472016

From epeter at openjdk.org Tue Jun 17 07:17:36 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 17 Jun 2025 07:17:36 GMT
Subject: RFR: 8356708: C2: loop strip mining expansion doesn't take sunk stores into account [v4]
In-Reply-To: <46PnJrx9oCCkTFd9tICMrsEXm_wb3dMaGQKT-xSv8Ec=.1046ecc3-9e7e-4fc2-a0bd-da2363dfed22@github.com>
References: <4pxoG8BH9QOLHb9O7XqofwEcrOXqSSLIbbunGEE5UYg=.17621f96-6eaa-476e-8043-cba003880f2b@github.com> <46PnJrx9oCCkTFd9tICMrsEXm_wb3dMaGQKT-xSv8Ec=.1046ecc3-9e7e-4fc2-a0bd-da2363dfed22@github.com>
Message-ID: <_vOrMUGCh3sKq7Xp1oSKGIGJ1JjtXAUWZER2Kt0xstI=.370fcf7f-0001-44d6-a80f-961cc03aa196@github.com>

On Tue, 17 Jun 2025 06:57:13 GMT, Emanuel Peter wrote:

>> Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision:
>>
>>  - Update test/hotspot/jtreg/compiler/loopstripmining/TestStoresSunkInOuterStripMinedLoop.java
>>
>>    Co-authored-by: Roberto Castañeda Lozano
>>  - Update test/hotspot/jtreg/compiler/loopstripmining/TestStoresSunkInOuterStripMinedLoop.java
>>
>>    Co-authored-by: Roberto Castañeda Lozano
>
> src/hotspot/share/opto/loopnode.cpp line 3014:
>
>> 3012:     return;
>> 3013:   }
>> 3014:   MergeMemNode* mm = safepoint_mem->as_MergeMem();
>
> You don't use
`safepoint_mem` for anything else than checking if it is a `MergeMem`. So why not do this:
> Suggestion:
>
>     MergeMemNode* mm = safepoint->in(TypeFunc::Memory)->isa_MergeMem();
>     if (mm == nullptr) {
>       // There is no MergeMem, which should only happen if there was no memory node
>       // sunk out of the loop.
>       assert(stores_in_outer_loop_cnt == 0, "inconsistent");
>       return;
>     }

I also added a comment for this exit condition.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25717#discussion_r2151472808

From jbhateja at openjdk.org Tue Jun 17 07:19:38 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Tue, 17 Jun 2025 07:19:38 GMT
Subject: Integrated: 8351645: C2: Assertion failures in Expand/CompressBits idealizations with TOP
In-Reply-To:
References:
Message-ID:

On Mon, 2 Jun 2025 11:53:23 GMT, Jatin Bhateja wrote:

> Bugfix patch adds missing safe type access checks in Expand/Compress Ideal transforms. The problem occurs during IGVN cleanups after partial peeling of a loop.
>
> Test mentioned in the bug report has been included along with the patch.
>
> Kindly review.
>
> Best Regards,
> Jatin

This pull request has now been integrated.
Changeset: ff75f763
Author: Jatin Bhateja
URL: https://git.openjdk.org/jdk/commit/ff75f763c0a91534ab593a43e2ace741d05b0ccb
Stats: 133 lines in 2 files changed: 129 ins; 0 del; 4 mod

8351645: C2: Assertion failures in Expand/CompressBits idealizations with TOP

Co-authored-by: Emanuel Peter
Reviewed-by: epeter, sviswanathan

-------------

PR: https://git.openjdk.org/jdk/pull/25586

From mhaessig at openjdk.org Tue Jun 17 08:03:08 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Tue, 17 Jun 2025 08:03:08 GMT
Subject: RFR: 8354727: CompilationPolicy creates too many compiler threads when code cache space is scarce
Message-ID:

# Issue Summary

Running

    java -XX:+SegmentedCodeCache -XX:ReservedCodeCacheSize=10M -XX:NonNMethodCodeHeapSize=6M \
         -XX:ProfiledCodeHeapSize=5M -XX:NonProfiledCodeHeapSize=5M -version

on a machine with more than 255 cores fails with the message that the specified `NonNMethodCodeHeapSize` is too small to fit all compiler buffers (instead of failing because the sum of the heaps is larger than the `ReservedCodeCacheSize`). Hence, the calculated compiler count is too high.

This is due to CompilationPolicy::initialize() checking how many compiler buffers fit into the `ReservedCodeCacheSize`. However, in the case above, this is significantly larger than `NonNMethodCodeHeapSize` and causes a check updated in #17244 to fail. That check should ensure that `NonNMethodCodeHeapSize` is at least `CodeCacheMinimumUseSpace` plus the size of all `CICompilerCount` compiler buffers. This is stronger than before #17244, where the check merely required `NonNMethodCodeHeapSize >= CodeCacheMinimumUseSpace`, because compiler buffers can also use the rest of the code cache if the non-nmethod heap is not sufficient.
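The difference between the two checks can be sketched as follows. This is only an illustration of the inequalities described above, not the HotSpot implementation: `CodeHeapConfig`, `strict_check`, and `relaxed_check` are hypothetical names, and any concrete buffer size fed into them is an assumption.

```cpp
#include <cstddef>

// Hypothetical stand-ins for the quantities discussed above; the real
// HotSpot code and constants differ.
struct CodeHeapConfig {
  size_t non_nmethod_heap_size;  // -XX:NonNMethodCodeHeapSize
  size_t min_use_space;          // CodeCacheMinimumUseSpace
  size_t compiler_buffer_size;   // size of one compiler buffer (assumed uniform)
  size_t compiler_count;         // ergonomic CICompilerCount
};

// Stronger check (as after #17244): the minimum use space plus every
// compiler buffer must fit inside the non-nmethod heap.
constexpr bool strict_check(const CodeHeapConfig& c) {
  return c.non_nmethod_heap_size >=
         c.min_use_space + c.compiler_count * c.compiler_buffer_size;
}

// Weaker check (as before #17244): only the minimum use space must fit,
// since buffers may spill into the rest of the code cache.
constexpr bool relaxed_check(const CodeHeapConfig& c) {
  return c.non_nmethod_heap_size >= c.min_use_space;
}
```

With a 6M non-nmethod heap and a compiler count derived from a much larger reserved size, a configuration can pass `relaxed_check` while failing `strict_check`, which matches the failure mode described in the summary.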
# Change Rationale

This PR reverts the failing check so that it only ensures `NonNMethodCodeHeapSize >= CodeCacheMinimumUseSpace`, since the computation of the ergonomic `CICompilerCount` in `CompilationPolicy::initialize()` does not support the assumption that all compiler buffers must always fit inside the non-nmethod code heap. This change required adjusting a test, because with the weaker check it is currently not possible to trigger it from the command line.

# Testing

- [x] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15681197877)
- [x] tier1 and tier2 plus some Oracle internal testing

-------------

Commit messages:
 - Fix test for new semantics
 - Set minimum size for non-nmethod code heap to CodeCacheMinimumUseSize

Changes: https://git.openjdk.org/jdk/pull/25830/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25830&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8354727
Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod
Patch: https://git.openjdk.org/jdk/pull/25830.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/25830/head:pull/25830

PR: https://git.openjdk.org/jdk/pull/25830

From adinn at openjdk.org Tue Jun 17 08:13:33 2025
From: adinn at openjdk.org (Andrew Dinn)
Date: Tue, 17 Jun 2025 08:13:33 GMT
Subject: RFR: 8359646: C1 crash in AOTCodeAddressTable::add_C_string
In-Reply-To: <5V_lRRQZEltF8kJWp7PV8InPXcGOSzxgE_-bHLGKeb8=.27a4eeb2-7a09-4151-89b3-5c467df1d508@github.com>
References: <5V_lRRQZEltF8kJWp7PV8InPXcGOSzxgE_-bHLGKeb8=.27a4eeb2-7a09-4151-89b3-5c467df1d508@github.com>
Message-ID:

On Tue, 17 Jun 2025 02:34:24 GMT, Vladimir Kozlov wrote:

> It is a concurrency issue. The call to `AOTCodeAddressTable::add_C_string()` happened after the checks that the AOT code cache is still open. But, because there is no synchronization, another thread (VM) closed/deleted the AOT code cache (after dumping) before the code in `add_C_string()` accessed it.
>
> Added the missing AOTCodeCStrings_lock in places where we modify, store and delete the AOT strings table.
Moved MutexLocker from `AOTCodeAddressTable::add_C_string()` to its caller and do additional check after it. > > I also noticed that we missed similar check after Compile_lock when we are storing AOT code. > > Tested hs-tier1-6,hs-tier10-rt,stress,xcomp Looks good src/hotspot/share/code/aotCodeCache.cpp line 823: > 821: // and the main thread generating adapter > 822: MutexLocker ml(Compile_lock); > 823: if (!is_on()) { Just for the record: I can see why this is needed to stop compiler threads dereferencing a null cache pointer or null table pointer when some other thread might concurrently be closing the cache. That led me to wonder why we don't need to further synchronize concurrent execution of non-compiler threads. I convinced myself that whenever a non-compiler thread calls `AOTCodeCache::close()` the only other running threads that might try to access the AOT cache are compiler threads -- calls to the `close()` method are from `MetaspaceShared::preload_and_dump_impl` (metaspaceShared.cpp), `before_exit` (java.cpp) and `Threads::create_vm` (threads.cpp). (Well, modulo a rogue jcmd, perhaps?). ------------- Marked as reviewed by adinn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25841#pullrequestreview-2934593364 PR Review Comment: https://git.openjdk.org/jdk/pull/25841#discussion_r2151615622 From roland at openjdk.org Tue Jun 17 08:15:15 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 17 Jun 2025 08:15:15 GMT Subject: RFR: 8356708: C2: loop strip mining expansion doesn't take sunk stores into account [v5] In-Reply-To: References: Message-ID: > `test1()` has a counted loop with a `Store` to `field`. That `Store` > is sunk out of loop. When the `OuterStripMinedLoop` is expanded, only > `Phi`s that exist at the inner loop are added to the outer > loop. There's no `Phi` for the slice of the sunk `Store` (because > there's no `Store` left in the inner loop) so no `Phi` is added for > that slice to the outer loop. 
As a result, there's a missing anti > dependency for `Load` of `field` that's before the loop and it can be > scheduled inside the outer strip mined loop which is incorrect. > > `test2()` is the same as `test1()` but with a chain of 2 `Store`s. > > `test3()` is another variant where a `Store` is left in the inner loop > after one is sunk out of it so the inner loop still has a `Phi`. As a > result, the outer loop also gets a `Phi` but it's incorrectly wired as > the sunk `Store` should be the input along the backedge but is > not. That one doesn't cause any failure AFAICT. > > The fix I propose is some extra logic at expansion of the > `OuterStripMinedLoop` to handle these corner cases. Roland Westrelin has updated the pull request incrementally with four additional commits since the last revision: - Update src/hotspot/share/opto/loopnode.cpp Co-authored-by: Emanuel Peter - Update src/hotspot/share/opto/loopnode.cpp Co-authored-by: Emanuel Peter - Update src/hotspot/share/opto/loopnode.cpp Co-authored-by: Emanuel Peter - Update src/hotspot/share/opto/loopnode.cpp Co-authored-by: Emanuel Peter ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25717/files - new: https://git.openjdk.org/jdk/pull/25717/files/ae62f7d3..e7289215 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25717&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25717&range=03-04 Stats: 8 lines in 1 file changed: 2 ins; 1 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/25717.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25717/head:pull/25717 PR: https://git.openjdk.org/jdk/pull/25717 From bkilambi at openjdk.org Tue Jun 17 08:18:38 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Tue, 17 Jun 2025 08:18:38 GMT Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v3] In-Reply-To: References: Message-ID: On Tue, 17 Jun 2025 03:00:59 GMT, Xiaohong Gong wrote: >> src/hotspot/cpu/aarch64/aarch64_vector.ad line 7176: >> >>> 7174: 
vReg index, vReg tmp1) %{ >>> 7175: predicate((Matcher::vector_element_basic_type(n) == T_SHORT || >>> 7176: type2aelembytes(Matcher::vector_element_basic_type(n)) == 4) && >> >> To use the same basic type check condition, can we use `type2aelembytes(Matcher::vector_element_basic_type(n)) == 2` instead of `Matcher::vector_element_basic_type(n) == T_SHORT` here? > > How about just using the negate condition of rule `vselect_from_two_vectors` ? > > predicate(Matcher::vector_element_basic_type(n) != T_BYTE && > (UseSVE < 2 || Matcher::vector_length_in_bytes(n) < 16)); Hi @XiaohongGong , thanks for your prompt review. I can't use the negate condition here as it will include `T_DOUBLE` and `T_LONG` as well which are not supported. I'll change it to the same basic type check instead. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2151627348 From xgong at openjdk.org Tue Jun 17 08:28:32 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 17 Jun 2025 08:28:32 GMT Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v3] In-Reply-To: References: Message-ID: On Tue, 17 Jun 2025 08:23:32 GMT, Bhavana Kilambi wrote: >> src/hotspot/cpu/aarch64/aarch64.ad line 4990: >> >>> 4988: %{ >>> 4989: constraint(ALLOC_IN_RC(v17_veca_reg)); >>> 4990: match(vReg); >> >> Not sure whether it's better to use `match(VecA)` here. > > `VecA` is also matched in `vReg`. I thought we want to also match `VecD` (for 64bit) and `VecX` (for 128-bit) as well and `vReg` matches all the three of them. What do you think? I'm fine with both. I pointed out this just because I noticed the reg is limited to `veca` in line-4989. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2151651950 From xgong at openjdk.org Tue Jun 17 08:28:32 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 17 Jun 2025 08:28:32 GMT Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v3] In-Reply-To: References: Message-ID: <856mxqvMplFj15Pi59PuVNq3pEYA-GDygT8MHUZoiz4=.30952418-dacf-4820-b88f-7256db109de9@github.com> On Tue, 17 Jun 2025 08:16:01 GMT, Bhavana Kilambi wrote: >> How about just using the negate condition of rule `vselect_from_two_vectors` ? >> >> predicate(Matcher::vector_element_basic_type(n) != T_BYTE && >> (UseSVE < 2 || Matcher::vector_length_in_bytes(n) < 16)); > > Hi @XiaohongGong , thanks for your prompt review. I can't use the negate condition here as it will include `T_DOUBLE` and `T_LONG` as well which are not supported. I'll change it to the same basic type check instead. Yeah, I noticed this as well. Since these types for NEON has been excluded in above function `match_rule_supported_vector()`. Can we just ignore such types? Besides, you have also added the type assertion in the macro assembler implementation. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2151640353 From bkilambi at openjdk.org Tue Jun 17 08:28:31 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Tue, 17 Jun 2025 08:28:31 GMT Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v3] In-Reply-To: References: Message-ID: On Tue, 17 Jun 2025 02:41:26 GMT, Xiaohong Gong wrote: >> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: >> >> Addressed review comments and added a JTREG test > > src/hotspot/cpu/aarch64/aarch64.ad line 4990: > >> 4988: %{ >> 4989: constraint(ALLOC_IN_RC(v17_veca_reg)); >> 4990: match(vReg); > > Not sure whether it's better to use `match(VecA)` here. `VecA` is also matched in `vReg`. 
I thought we would also want to match `VecD` (for 64-bit) and `VecX` (for 128-bit), and `vReg` matches all three of them. What do you think? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2151645302 From mhaessig at openjdk.org Tue Jun 17 08:40:38 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Tue, 17 Jun 2025 08:40:38 GMT Subject: RFR: 8356865: C2: Unreasonable values for debug flag FastAllocateSizeLimit can lead to left-shift-overflow, which is UB In-Reply-To: References: Message-ID: On Mon, 16 Jun 2025 14:50:46 GMT, Benoît Maillard wrote: > This PR adds a range constraint for the `-XX:FastAllocateSizeLimit` debug flag. This prevents undefined behavior caused by left-shift overflow of the flag value in `GraphKit::new_array`. > > ### Testing > - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8356865) > - [x] tier1-3, plus some internal testing > - [x] Manual testing with values known to previously cause undefined behavior > > Thanks! Thank you for working on this, @benoitmaillard. This looks good to me. src/hotspot/share/runtime/globals.hpp line 1100: > 1098: /* Note: This value is zero mod 1<<13 for a cheap sparc set. */ \ > 1099: "Inline allocations larger than this in doublewords must go slow")\ > 1100: range(0, (1 << (BitsPerInt - LogBytesPerLong - 1)) - 1) \ It would be good to add a comment as to why this specific upper limit is necessary. ------------- Marked as reviewed by mhaessig (Author).
PR Review: https://git.openjdk.org/jdk/pull/25834#pullrequestreview-2934700344 PR Review Comment: https://git.openjdk.org/jdk/pull/25834#discussion_r2151674726 From aph at openjdk.org Tue Jun 17 08:51:33 2025 From: aph at openjdk.org (Andrew Haley) Date: Tue, 17 Jun 2025 08:51:33 GMT Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v3] In-Reply-To: References: Message-ID: On Mon, 16 Jun 2025 10:12:14 GMT, Bhavana Kilambi wrote: >> This patch adds aarch64 backend support for SelectFromTwoVector operation which was recently introduced in VectorAPI. >> >> It implements this operation using a two table vector lookup instruction - "tbl" which is available only in Neon and SVE2. >> >> For 128-bit vector length : Neon tbl instruction is generated if UseSVE < 2 and SVE2 "tbl" instruction is generated if UseSVE == 2. >> >> For > 128-bit vector length : Currently there are no machines which have vector length > 128-bit and support SVE2. For all those machines with vector length > 128-bit and UseSVE < 2, this operation is not supported. The inline expander for this operation would fail and lowered IR will be generated which is a mix of two rearrange and one blend operation. >> >> This patch also adds a boolean "need_load_shuffle" in the inline expander for this operation to test if the platform requires VectorLoadShuffle operation to be generated. Without this, the lowering IR was not being generated on aarch64 and the performance was quite poor. 
>>
>> Performance numbers with this patch on a 128-bit, SVE2 supporting machine is shown below -
>>
>>
>> Benchmark                                   (size)   Mode  Cnt   Gain
>> SelectFromBenchmark.selectFromByteVector      1024  thrpt    9   1.43
>> SelectFromBenchmark.selectFromByteVector      2048  thrpt    9   1.48
>> SelectFromBenchmark.selectFromDoubleVector    1024  thrpt    9  68.55
>> SelectFromBenchmark.selectFromDoubleVector    2048  thrpt    9  72.07
>> SelectFromBenchmark.selectFromFloatVector     1024  thrpt    9   1.69
>> SelectFromBenchmark.selectFromFloatVector     2048  thrpt    9   1.52
>> SelectFromBenchmark.selectFromIntVector       1024  thrpt    9   1.50
>> SelectFromBenchmark.selectFromIntVector       2048  thrpt    9   1.52
>> SelectFromBenchmark.selectFromLongVector      1024  thrpt    9  85.38
>> SelectFromBenchmark.selectFromLongVector      2048  thrpt    9  80.93
>> SelectFromBenchmark.selectFromShortVector     1024  thrpt    9   1.48
>> SelectFromBenchmark.selectFromShortVector     2048  thrpt    9   1.49
>>
>>
>> Gain column refers to the ratio of thrpt between this patch and the master branch after applying changes in the inline expander.
>
> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision:
>
>   Addressed review comments and added a JTREG test

src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2906:

> 2904: tbl(dst, size1, src1, 2, dst);
> 2905: } else { // vector length == 8
> 2906: // We need to fit both the source vectors (src1, src2) in a 128-bit register as the

Suggestion:

    // We need to fit both the source vectors (src1, src2) in a 128-bit register because the

Reason: "as" is ambiguous when used in this way. It can either mean "while" or "because". "Because" is stronger.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2151700739 From aph at openjdk.org Tue Jun 17 08:57:31 2025 From: aph at openjdk.org (Andrew Haley) Date: Tue, 17 Jun 2025 08:57:31 GMT Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v3] In-Reply-To: References: Message-ID: On Mon, 16 Jun 2025 10:12:14 GMT, Bhavana Kilambi wrote: >> This patch adds aarch64 backend support for SelectFromTwoVector operation which was recently introduced in VectorAPI. >> >> It implements this operation using a two table vector lookup instruction - "tbl" which is available only in Neon and SVE2. >> >> For 128-bit vector length : Neon tbl instruction is generated if UseSVE < 2 and SVE2 "tbl" instruction is generated if UseSVE == 2. >> >> For > 128-bit vector length : Currently there are no machines which have vector length > 128-bit and support SVE2. For all those machines with vector length > 128-bit and UseSVE < 2, this operation is not supported. The inline expander for this operation would fail and lowered IR will be generated which is a mix of two rearrange and one blend operation. >> >> This patch also adds a boolean "need_load_shuffle" in the inline expander for this operation to test if the platform requires VectorLoadShuffle operation to be generated. Without this, the lowering IR was not being generated on aarch64 and the performance was quite poor. 
>>
>> Performance numbers with this patch on a 128-bit, SVE2 supporting machine is shown below -
>>
>>
>> Benchmark                                   (size)   Mode  Cnt   Gain
>> SelectFromBenchmark.selectFromByteVector      1024  thrpt    9   1.43
>> SelectFromBenchmark.selectFromByteVector      2048  thrpt    9   1.48
>> SelectFromBenchmark.selectFromDoubleVector    1024  thrpt    9  68.55
>> SelectFromBenchmark.selectFromDoubleVector    2048  thrpt    9  72.07
>> SelectFromBenchmark.selectFromFloatVector     1024  thrpt    9   1.69
>> SelectFromBenchmark.selectFromFloatVector     2048  thrpt    9   1.52
>> SelectFromBenchmark.selectFromIntVector       1024  thrpt    9   1.50
>> SelectFromBenchmark.selectFromIntVector       2048  thrpt    9   1.52
>> SelectFromBenchmark.selectFromLongVector      1024  thrpt    9  85.38
>> SelectFromBenchmark.selectFromLongVector      2048  thrpt    9  80.93
>> SelectFromBenchmark.selectFromShortVector     1024  thrpt    9   1.48
>> SelectFromBenchmark.selectFromShortVector     2048  thrpt    9   1.49
>>
>>
>> Gain column refers to the ratio of thrpt between this patch and the master branch after applying changes in the inline expander.
>
> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision:
>
>   Addressed review comments and added a JTREG test

src/hotspot/cpu/aarch64/aarch64_vector_ad.m4 line 5155:

> 5153: // values across function calls and usually used for long-lived values), we can use any two volatile
> 5154: // registers between V16-V31.
> 5155: instruct vselect_from_two_vectors_HS_Neon(vReg dst, vReg_V17 src1, vReg_V18 src2,

I think it's worth replicating this pattern a couple of times with more vector registers. The allocator might prefer some in the set v8-v15.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2151713846 From aph at openjdk.org Tue Jun 17 09:25:32 2025 From: aph at openjdk.org (Andrew Haley) Date: Tue, 17 Jun 2025 09:25:32 GMT Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v3] In-Reply-To: References: Message-ID: On Mon, 16 Jun 2025 10:12:14 GMT, Bhavana Kilambi wrote: >> This patch adds aarch64 backend support for SelectFromTwoVector operation which was recently introduced in VectorAPI. >> >> It implements this operation using a two table vector lookup instruction - "tbl" which is available only in Neon and SVE2. >> >> For 128-bit vector length : Neon tbl instruction is generated if UseSVE < 2 and SVE2 "tbl" instruction is generated if UseSVE == 2. >> >> For > 128-bit vector length : Currently there are no machines which have vector length > 128-bit and support SVE2. For all those machines with vector length > 128-bit and UseSVE < 2, this operation is not supported. The inline expander for this operation would fail and lowered IR will be generated which is a mix of two rearrange and one blend operation. >> >> This patch also adds a boolean "need_load_shuffle" in the inline expander for this operation to test if the platform requires VectorLoadShuffle operation to be generated. Without this, the lowering IR was not being generated on aarch64 and the performance was quite poor. 
>>
>> Performance numbers with this patch on a 128-bit, SVE2 supporting machine is shown below -
>>
>>
>> Benchmark                                   (size)   Mode  Cnt   Gain
>> SelectFromBenchmark.selectFromByteVector      1024  thrpt    9   1.43
>> SelectFromBenchmark.selectFromByteVector      2048  thrpt    9   1.48
>> SelectFromBenchmark.selectFromDoubleVector    1024  thrpt    9  68.55
>> SelectFromBenchmark.selectFromDoubleVector    2048  thrpt    9  72.07
>> SelectFromBenchmark.selectFromFloatVector     1024  thrpt    9   1.69
>> SelectFromBenchmark.selectFromFloatVector     2048  thrpt    9   1.52
>> SelectFromBenchmark.selectFromIntVector       1024  thrpt    9   1.50
>> SelectFromBenchmark.selectFromIntVector       2048  thrpt    9   1.52
>> SelectFromBenchmark.selectFromLongVector      1024  thrpt    9  85.38
>> SelectFromBenchmark.selectFromLongVector      2048  thrpt    9  80.93
>> SelectFromBenchmark.selectFromShortVector     1024  thrpt    9   1.48
>> SelectFromBenchmark.selectFromShortVector     2048  thrpt    9   1.49
>>
>>
>> Gain column refers to the ratio of thrpt between this patch and the master branch after applying changes in the inline expander.
>
> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision:
>
>   Addressed review comments and added a JTREG test

src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 4306:

> 4304: f(0b001010, 15, 10), rf(Zn1, 5), rf(Zd, 0);
> 4305: }
> 4306:

Please merge this with the earlier `sve_tbl`.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2151779333

From epeter at openjdk.org Tue Jun 17 09:29:32 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 17 Jun 2025 09:29:32 GMT
Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v3]
In-Reply-To:
References:
Message-ID:

On Fri, 23 May 2025 22:43:35 GMT, Vladimir Ivanov wrote:

>> This PR introduces C2 support for `Reference.reachabilityFence()`.
>>
>> After [JDK-8199462](https://bugs.openjdk.org/browse/JDK-8199462) went in, it was discovered that C2 may break the invariant the fix relied upon [1].
So, this is an attempt to introduce proper support for `Reference.reachabilityFence()` in C2. C1 is left intact for now, because there are no signs yet it is affected. >> >> `Reference.reachabilityFence()` can be used in performance critical code, so the primary goal for C2 is to reduce its runtime overhead as much as possible. The ultimate goal is to ensure liveness information is attached to interfering safepoints, but it takes multiple steps to properly propagate the information through compilation pipeline without negatively affecting generated code quality. >> >> Also, I don't consider this fix as complete. It does fix the reported problem, but it doesn't provide any strong guarantees yet. In particular, since `ReachabilityFence` is CFG-only node, nothing explicitly forbids memory operations to float past `Reference.reachabilityFence()` and potentially reaching some other safepoints current analysis treats as non-interfering. Representing `ReachabilityFence` as memory barrier (e.g., `MemBarCPUOrder`) would solve the issue, but performance costs are prohibitively high. Alternatively, the optimization proposed in this PR can be improved to conservatively extend referent's live range beyond `ReachabilityFence` nodes associated with it. It would meet performance criteria, but I prefer to implement it as a followup fix. >> >> Another known issue relates to reachability fences on constant oops. If such constant is GCed (most likely, due to a bug in Java code), similar reachability issues may arise. For now, RFs on constants are treated as no-ops, but there's a diagnostic flag `PreserveReachabilityFencesOnConstants` to keep the fences. I plan to address it separately. >> >> [1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ref/Reference.java#L667 >> "HotSpot JVM retains the ref and does not GC it before a call to this method, because the JIT-compilers do not have GC-only safepoints." 
>> >> Testing: >> - [x] hs-tier1 - hs-tier8 >> - [x] hs-tier1 - hs-tier6 w/ -XX:+StressReachabilityFences -XX:+VerifyLoopOptimizations >> - [x] java/lang/foreign microbenchmarks > > Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: > > renaming src/hotspot/share/opto/loopTransform.cpp line 78: > 76: } > 77: return unique_loop_exit; > 78: } `proj_out_or_null` returns a `ProjNode` (it is probably a `IfTrue` or `IfFalse`, right?) and `outer_loop_exit` returns a `IfFalseNode`. So we should be able to return a `IfProjNode` from this method. What do you think? What is the benefit of the `unique_loop_exit` variable here? Why not return immediately? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2149677594 From mhaessig at openjdk.org Tue Jun 17 09:37:56 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Tue, 17 Jun 2025 09:37:56 GMT Subject: RFR: 8020282: Generated code quality: redundant LEAs in the chained dereferences [v5] In-Reply-To: References: Message-ID: <5JnyMjRPWGHMw6WQq6GInd0OVvnKLyy4BxQ6Q-XZwz4=.12ca2a0d-0327-4fdc-8b6c-7aa682f36072@github.com> > ## Summary > > On x86, chained dereferences of narrow oops at a constant offset from the base oop can use a `lea` instruction to perform the address computation in one go using the `leaP8Narrow`, `leaP32Narrow`, and `leaPCompressedOopOffset` matching rules. However, the generated code contains an additional `lea` with an unused result: > > ; OptoAssembly > 03d decode_heap_oop_not_null R8,R10 > 041 leaq R10, [R12 + R10 << 3 + #12] (compressed oop addressing) ; ptr compressedoopoff32 > > ; x86 > 0x00007f1f210625bd: lea (%r12,%r10,8),%r8 ; result is unused > 0x00007f1f210625c1: lea 0xc(%r12,%r10,8),%r10 ; the same computation as decode, but with offset > > > This PR adds a peephole optimization to remove such redundant `lea`s. 
> > ## The Issue in Detail > > The ideal subgraph producing redundant `lea`s, or rather redundant `decodeHeapOop_not_null`s, is `LoadN -> DecodeN -> AddP`, where both the address and base edge of the `AddP` originate from the `DecodeN`. After matching, this becomes > > LoadN -> decodeHeapOop_not_null -> leaP* > ______________________________? > > where `leaP*` is either of `leaP8Narrow`, `leaP32Narrow`, or `leaPCompressedOopOffset` (depending on the heap location and size). Here, the base input of `leaP*` comes from the decode. Looking at the matching code path, we find that the `leaP*` rules match both the `AddP` and the `DecodeN`, since x86 can fold this, but the following code adds the decode back as the base input to `leaP*`: > > https://github.com/openjdk/jdk/blob/c29537740efb04e061732a700582d43b1956cff4/src/hotspot/share/opto/matcher.cpp#L1894-L1897 > > On its face, this is completely unnecessary if we matched a `leaP*`, since it already computes the result of the decode, so adding the `LoadN` node as base seems like the logical choice. However, if the derived oop computed by the `leaP*` gets added to an oop map, this `DecodeN` is needed as the base for the derived oop. Because as of now, derived oops in oop maps cannot have narrow base pointers. > > This leaves us with a handful of possible solutions: > 1. implement narrow bases for derived oops in oop maps, > 2. perform some dead code elimination after we know which oops are part of oop maps, > 3. add a peephole optimization to simply remove unused `lea`s. > > Option 1 would have been ideal in the sense, that it is the earliest possible point to remove the decode, which would simplify the graph and reduce pressure on the register allocator. However, rewriting the oop map machinery to remove a... Manuel H?ssig has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 29 additional commits since the last revision: - Merge branch 'master' into JDK-8020282-lea - Merge branch 'JDK-8020282-lea' of github.com:mhaessig/jdk into JDK-8020282-lea - Factor out address nodes for simplification - Add assert to codepath only reachable with stressing. - Rename for clarity Confused myself.... - Revert change to unrelated lines This reverts commit d1c6a653770bfe578b1982ac726b258fa08d57b8. - Apply typo suggestions Co-authored-by: Roberto Casta?eda Lozano - Add comment to benchmark as to why we fix the heap size - Add missing null chec - Fix typos - ... and 19 more: https://git.openjdk.org/jdk/compare/faaac6a9...0976824c ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25471/files - new: https://git.openjdk.org/jdk/pull/25471/files/3d6f8972..0976824c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25471&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25471&range=03-04 Stats: 92117 lines in 1600 files changed: 57473 ins; 22535 del; 12109 mod Patch: https://git.openjdk.org/jdk/pull/25471.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25471/head:pull/25471 PR: https://git.openjdk.org/jdk/pull/25471 From mhaessig at openjdk.org Tue Jun 17 09:37:56 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Tue, 17 Jun 2025 09:37:56 GMT Subject: RFR: 8020282: Generated code quality: redundant LEAs in the chained dereferences [v5] In-Reply-To: References: <-PFiiMlUghbFgg2fuU86vuEXKaexylDuk3kBdcBn9N8=.2c272bf1-10a7-4110-8919-f33ee0d491ba@github.com> Message-ID: <7IBRbt2cA-bO8w9mt1vsgT1pl5ErKe3UMWdtmadmCcM=.e017c863-d279-49b6-9395-2e3e3fade235@github.com> On Mon, 16 Jun 2025 14:27:37 GMT, Manuel H?ssig wrote: >> Is this scenario exercised by any of the new tests? If not, would it be possible to construct an additional test where we verify that the peephole is not applied in this case? 
> > This scenario looks like this: > ![image](https://github.com/user-attachments/assets/0c012810-c158-46dc-997d-6b5040c16377) > > Test coming soon. I spoke too soon. I am not able to reduce a test that reliably safepoints right after a `lea`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25471#discussion_r2151802824 From mhaessig at openjdk.org Tue Jun 17 09:42:59 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Tue, 17 Jun 2025 09:42:59 GMT Subject: RFR: 8020282: Generated code quality: redundant LEAs in the chained dereferences [v6] In-Reply-To: References: Message-ID: > ## Summary > > On x86, chained dereferences of narrow oops at a constant offset from the base oop can use a `lea` instruction to perform the address computation in one go using the `leaP8Narrow`, `leaP32Narrow`, and `leaPCompressedOopOffset` matching rules. However, the generated code contains an additional `lea` with an unused result: > > ; OptoAssembly > 03d decode_heap_oop_not_null R8,R10 > 041 leaq R10, [R12 + R10 << 3 + #12] (compressed oop addressing) ; ptr compressedoopoff32 > > ; x86 > 0x00007f1f210625bd: lea (%r12,%r10,8),%r8 ; result is unused > 0x00007f1f210625c1: lea 0xc(%r12,%r10,8),%r10 ; the same computation as decode, but with offset > > > This PR adds a peephole optimization to remove such redundant `lea`s. > > ## The Issue in Detail > > The ideal subgraph producing redundant `lea`s, or rather redundant `decodeHeapOop_not_null`s, is `LoadN -> DecodeN -> AddP`, where both the address and base edge of the `AddP` originate from the `DecodeN`. After matching, this becomes > > LoadN -> decodeHeapOop_not_null -> leaP* > ______________________________? > > where `leaP*` is either of `leaP8Narrow`, `leaP32Narrow`, or `leaPCompressedOopOffset` (depending on the heap location and size). Here, the base input of `leaP*` comes from the decode.
Looking at the matching code path, we find that the `leaP*` rules match both the `AddP` and the `DecodeN`, since x86 can fold this, but the following code adds the decode back as the base input to `leaP*`: > > https://github.com/openjdk/jdk/blob/c29537740efb04e061732a700582d43b1956cff4/src/hotspot/share/opto/matcher.cpp#L1894-L1897 > > On its face, this is completely unnecessary if we matched a `leaP*`, since it already computes the result of the decode, so adding the `LoadN` node as base seems like the logical choice. However, if the derived oop computed by the `leaP*` gets added to an oop map, this `DecodeN` is needed as the base for the derived oop. Because as of now, derived oops in oop maps cannot have narrow base pointers. > > This leaves us with a handful of possible solutions: > 1. implement narrow bases for derived oops in oop maps, > 2. perform some dead code elimination after we know which oops are part of oop maps, > 3. add a peephole optimization to simply remove unused `lea`s. > > Option 1 would have been ideal in the sense, that it is the earliest possible point to remove the decode, which would simplify the graph and reduce pressure on the register allocator. However, rewriting the oop map machinery to remove a... Manuel H?ssig has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. 
The pull request contains one new commit since the last revision: Merge branch 'master' into JDK-8020282-lea ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25471/files - new: https://git.openjdk.org/jdk/pull/25471/files/0976824c..92e8a56c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25471&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25471&range=04-05 Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25471.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25471/head:pull/25471 PR: https://git.openjdk.org/jdk/pull/25471 From mhaessig at openjdk.org Tue Jun 17 11:15:49 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Tue, 17 Jun 2025 11:15:49 GMT Subject: RFR: 8020282: Generated code quality: redundant LEAs in the chained dereferences [v7] In-Reply-To: References: Message-ID: <2iyvUQeSbTQ0KsYs4qJKMDdlQ_IV9_v1_T-2dyaqcbs=.702b2d74-9767-467e-b8f9-ce04cfc91c08@github.com> > ## Summary > > On x86, chained dereferences of narrow oops at a constant offset from the base oop can use a `lea` instruction to perform the address computation in one go using the `leaP8Narrow`, `leaP32Narrow`, and `leaPCompressedOopOffset` matching rules. However, the generated code contains an additional `lea` with an unused result: > > ; OptoAssembly > 03d decode_heap_oop_not_null R8,R10 > 041 leaq R10, [R12 + R10 << 3 + #12] (compressed oop addressing) ; ptr compressedoopoff32 > > ; x86 > 0x00007f1f210625bd: lea (%r12,%r10,8),%r8 ; result is unused > 0x00007f1f210625c1: lea 0xc(%r12,%r10,8),%r10 ; the same computation as decode, but with offset > > > This PR adds a peephole optimization to remove such redundant `lea`s. > > ## The Issue in Detail > > The ideal subgraph producing redundant `lea`s, or rather redundant `decodeHeapOop_not_null`s, is `LoadN -> DecodeN -> AddP`, where both the address and base edge of the `AddP` originate from the `DecodeN`. 
After matching, this becomes > > LoadN -> decodeHeapOop_not_null -> leaP* > ______________________________? > > where `leaP*` is either of `leaP8Narrow`, `leaP32Narrow`, or `leaPCompressedOopOffset` (depending on the heap location and size). Here, the base input of `leaP*` comes from the decode. Looking at the matching code path, we find that the `leaP*` rules match both the `AddP` and the `DecodeN`, since x86 can fold this, but the following code adds the decode back as the base input to `leaP*`: > > https://github.com/openjdk/jdk/blob/c29537740efb04e061732a700582d43b1956cff4/src/hotspot/share/opto/matcher.cpp#L1894-L1897 > > On its face, this is completely unnecessary if we matched a `leaP*`, since it already computes the result of the decode, so adding the `LoadN` node as base seems like the logical choice. However, if the derived oop computed by the `leaP*` gets added to an oop map, this `DecodeN` is needed as the base for the derived oop. Because as of now, derived oops in oop maps cannot have narrow base pointers. > > This leaves us with a handful of possible solutions: > 1. implement narrow bases for derived oops in oop maps, > 2. perform some dead code elimination after we know which oops are part of oop maps, > 3. add a peephole optimization to simply remove unused `lea`s. > > Option 1 would have been ideal in the sense, that it is the earliest possible point to remove the decode, which would simplify the graph and reduce pressure on the register allocator. However, rewriting the oop map machinery to remove a... 
Manuel H?ssig has updated the pull request incrementally with one additional commit since the last revision: Fix test flags ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25471/files - new: https://git.openjdk.org/jdk/pull/25471/files/92e8a56c..6f3cfcb7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25471&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25471&range=05-06 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25471.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25471/head:pull/25471 PR: https://git.openjdk.org/jdk/pull/25471 From mhaessig at openjdk.org Tue Jun 17 11:15:49 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Tue, 17 Jun 2025 11:15:49 GMT Subject: RFR: 8020282: Generated code quality: redundant LEAs in the chained dereferences [v4] In-Reply-To: <8af6oDH21I6Hfvdplgxq6EeH41bihWdwGwy7V8mtokE=.4a20d5f8-84a4-4533-bf5c-932051d1b2a1@github.com> References: <8af6oDH21I6Hfvdplgxq6EeH41bihWdwGwy7V8mtokE=.4a20d5f8-84a4-4533-bf5c-932051d1b2a1@github.com> Message-ID: On Fri, 13 Jun 2025 09:38:17 GMT, Roberto Casta?eda Lozano wrote: >>> Is this scenario exercised by any of the new tests? If not, would it be possible to construct an additional test where we verify that the peephole is not applied in this case? >> >> It is not. I was only able to find such a case once with all VM intrinsics disabled some time ago, but was not able to reproduce one since. I'll have another try to find one. > >> > Is this scenario exercised by any of the new tests? If not, would it be possible to construct an additional test where we verify that the peephole is not applied in this case? >> >> It is not. I was only able to find such a case once with all VM intrinsics disabled some time ago, but was not able to reproduce one since. I'll have another try to find one. > > OK, that would be great! 
If you do not find one, I think the PR is still OK because it is easy to see how the peephole would handle the scenario. But it would be of course better to confirm it with an actual test case. @robcasloz, I integrated all your suggestions and simplified the IR-tests. But I was unfortunately not able to create a test that reliably safepoints. I kicked off testing again: - [ ] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15705629166) - [ ] tier1 - tier5 on x64 ------------- PR Comment: https://git.openjdk.org/jdk/pull/25471#issuecomment-2979945850 From kwei at openjdk.org Tue Jun 17 11:24:32 2025 From: kwei at openjdk.org (Kuai Wei) Date: Tue, 17 Jun 2025 11:24:32 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v4] In-Reply-To: References: <96Ny_BPjRCbNlD14DNDUOuQ0IX-F8hx21gxQKVfim9M=.d502019a-27ed-4a35-81ef-bc2aec5e7557@github.com> <_IhK2U23lIUOtBKOt-WMxQ3L7b2t26RzclJRdqbIgms=.3ef9a630-f99c-4de7-994a-bcabf912230b@github.com> <9ABhENoZtR76wmsgRmzeEceDvCvoflfCcbDbK8H2rso=.e351f63f-1331-4e2e-8a02-763a8c0c4f70@github.com> Message-ID: On Mon, 16 Jun 2025 07:07:37 GMT, Emanuel Peter wrote: >> @eme64 I added some detail to pseudo code. I hope it can explain my design. >> A difficult part of the optimization is that we need find the right combine operator which can be replaced with merged load. Especially it has successor combine operator. We need to find which one is better candidate. 
>> For a given combine operator, the `MergePrimitiveLoad::run()` will do like this: >> >> ```c++ >> void MergePrimitiveLoad::run(AddNode* _check) { >> // go down through the unique output of _check to collect successor combine operators >> GrowableArray rest_combine_operators = collect_successor_combine_operators(_check) >> >> if ( !rest_combine_operator.is_empty() ) { >> for (AddNode* op in rest_combine_operators) { >> // collect mergeable load info list from this combine operator >> merge_list = collect_merge_info_list(op) >> if (_check is in merge_list) { >> // we will not optimize _check, it will be merged when op is optimized >> return; >> } >> } >> } >> >> // all successor combine operators are checked, we can start to optimize the given _check operator >> ... >> } >> >> // For a given combine operator, collect mergeable load info list. Every item >> // in the result list is a tuple of (Load, combine, shift_value) >> // It's similar with MergePrimitive::collect_merge_list() in previous patch, more detail can be found in code. >> >> GrowableArray collect_merge_info_list(AddNode* combine) { >> // get the load from input of combine >> LoadNode* load = get_load_from_input(combine); >> if (load == nullptr) return; >> >> MemNode* mem = memory input of load; >> >> for every output of mem { >> if (output->isa_load()) { >> // make a merged info for this load node >> info = merge_load_info(output); >> } >> if ( !info->is_invalid() ) { >> append info into result list >> } >> } >> >> // check the merged bytes and if they are adjacent. >> ... >> >> return result list; >> } > > @kuaiwei Thanks for the additional detail, that was helpful! > > To me, it seems that this test here is almost too rigorous, and hence it might be more expensive than necessary. And that is also what you were worried about, right? You were worried that some checks were done over and over again, and that is why you wanted to cache something. Correct? 
>
>     for (AddNode* op in rest_combine_operators) {
>       // collect mergeable load info list from this combine operator
>       merge_list = collect_merge_info_list(op)
>       if (_check is in merge_list) {
>         // we will not optimize _check, it will be merged when op is optimized
>         return;
>       }
>     }
>
>
> Can you say why you need to do all the traversals you do in `collect_merge_info_list`? Why do you need to check all other uses of the mem-input of the load there?
>
> I was wondering if it is not sufficient to just check if the structure on the other side of `op` has the correct shift and load address offset, so that it could be merged in with the merge list. Do you know what I mean?

Hi @eme64, as you guessed, I think `collect_merge_info_list()` will be invoked multiple times for the same node. If a cache can be used, it can save a lot of that cost.

My understanding of your idea is to check the successor `OrNode` for its load address and shift offset, to see whether it is adjacent to the current combine operator.

I have difficulty with cases like the following. Suppose there are 8 `LoadByte` nodes that can be merged into one `LoadLong`, so there are 8 groups of merge info (load, combine, shift). If the current combine operator is in group 4 and the successor combine operator is in group 6, they are not adjacent. The pattern may be rare, but it is still a valid graph. From the viewpoint of group 4, I do not know whether they can be merged or not. I need to keep checking the outputs of the successor to see if we can find the missing part, and it may also be located in the input chain of the current combine operator. So I need to check both directions (input and output) of the current combine operator.

So my design is to start from the memory node, collect all mergeable groups, and pick the largest merged group.
------------- PR Comment: https://git.openjdk.org/jdk/pull/24023#issuecomment-2979972756 From epeter at openjdk.org Tue Jun 17 11:36:32 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 17 Jun 2025 11:36:32 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v4] In-Reply-To: References: <96Ny_BPjRCbNlD14DNDUOuQ0IX-F8hx21gxQKVfim9M=.d502019a-27ed-4a35-81ef-bc2aec5e7557@github.com> <_IhK2U23lIUOtBKOt-WMxQ3L7b2t26RzclJRdqbIgms=.3ef9a630-f99c-4de7-994a-bcabf912230b@github.com> <9ABhENoZtR76wmsgRmzeEceDvCvoflfCcbDbK8H2rso=.e351f63f-1331-4e2e-8a02-763a8c0c4f70@github.com> Message-ID: On Tue, 17 Jun 2025 11:21:48 GMT, Kuai Wei wrote: >> @kuaiwei Thanks for the additional detail, that was helpful! >> >> To me, it seems that this test here is almost too rigorous, and hence it might be more expensive than necessary. And that is also what you were worried about, right? You were worried that some checks were done over and over again, and that is why you wanted to cache something. Correct? >> >> for (AddNode* op in rest_combine_operators) { >> // collect mergeable load info list from this combine operator >> merge_list = collect_merge_info_list(op) >> if (_check is in merge_list) { >> // we will not optimize _check, it will be merged when op is optimized >> return; >> } >> } >> >> >> Can you say why you need to do all the traversals you do in `collect_merge_info_list`? Why do you need to check all all other uses of the mem-input of the load there? >> >> I was wondering if it is not sufficient to just check if the structure on the other side of `op` has the correct shift and load address offset, so that it could be merged in with the merge list. Do you know what I mean? > > Hi @eme64 , as you guess, I think `collect_merge_info_list()` will be invoke multiple times for the same node. If a cache can be used, it can save much cost. 
> > My understanding about your idea is checking the successor `OrNode` to find the load address and shift offset, to see if it can be adjacent to current combine operator. > > I have difficult in such a case. For example, there are 8 `LoadByte` and they can be merged as a `LoadLong`. So there are 8 groups of merge info (load, combine, shift) . If current combine operator is in group 4, and the successor combine operator is in group 6. They are not adjacent. The pattern may be rare but it is still a valid graph. From the viewpoint of group 4 , I didn't know if they can be merged or not. I need continue to check output of the successor to see if we can find the missing part. And it may locate in the input chain of current combine operator. So I need check both direction ( input and output) of current combine operator. > > So my design is from the memory node and collect all mergeable group, and get the max merged group. @kuaiwei I see. If there are multiple groups, then things look more difficult. @merykitty Once proposed the idea of not doing MergeStores / MergeLoads as IGVN optimizations, but rather to just have a separate and dedicated phase. At the time, I was against it, because I had already implemented `MergeStores` quite far. But now I'm starting to see it as a possibly better alternative. That would allow you to take a global view, collect all loads (and stores), put them in a big list, and then make groups that belong together. And then see which groups could be legally replaced with a single load / store. In a way, that is a global vectorizer. And we could handle other patterns than just merging loads and stores: we could also merge copy patterns, for example. That could be much more powerful than the current approach. And it would avoid the issue with having to determine if the current node in IGVN is the best "candidate", or if we should look for another node further down. I don't know what you think about this complete "rethink" of the approach. 
But I do think it would be more powerful, and also avoid having to cache results during IGVN. All the "cached" results are local to that dedicated "MergeMemopsPhase" or whatever we would call it. What do you think? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24023#issuecomment-2980007259 From wenanjian at openjdk.org Tue Jun 17 12:06:37 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Tue, 17 Jun 2025 12:06:37 GMT Subject: RFR: 8359801: RISC-V: Simplify Interpreter::profile_taken_branch Message-ID: Do the same thing as [JDK-8357434](https://bugs.openjdk.org/browse/JDK-8357434) and [JDK-8358655](https://bugs.openjdk.org/browse/JDK-8358655), but for RISC-V. `Interpreter::profile_taken_branch` has the same sbbptr pattern as [JDK-8358105](https://bugs.openjdk.org/browse/JDK-8358105). The counter is 64-bit, never practically overflows, and no other code cares about it, so we can remove the overflow check. ------------- Commit messages: - RISC-V: Simplify Interpreter::profile_taken_branch Changes: https://git.openjdk.org/jdk/pull/25848/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25848&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8359801 Stats: 16 lines in 3 files changed: 0 ins; 10 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/25848.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25848/head:pull/25848 PR: https://git.openjdk.org/jdk/pull/25848 From roland at openjdk.org Tue Jun 17 12:53:48 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 17 Jun 2025 12:53:48 GMT Subject: RFR: 8356708: C2: loop strip mining expansion doesn't take sunk stores into account [v6] In-Reply-To: References: Message-ID: > `test1()` has a counted loop with a `Store` to `field`. That `Store` > is sunk out of loop. When the `OuterStripMinedLoop` is expanded, only > `Phi`s that exist at the inner loop are added to the outer > loop.
There's no `Phi` for the slice of the sunk `Store` (because > there's no `Store` left in the inner loop) so no `Phi` is added for > that slice to the outer loop. As a result, there's a missing anti > dependency for `Load` of `field` that's before the loop and it can be > scheduled inside the outer strip mined loop which is incorrect. > > `test2()` is the same as `test1()` but with a chain of 2 `Store`s. > > `test3()` is another variant where a `Store` is left in the inner loop > after one is sunk out of it so the inner loop still has a `Phi`. As a > result, the outer loop also gets a `Phi` but it's incorrectly wired as > the sunk `Store` should be the input along the backedge but is > not. That one doesn't cause any failure AFAICT. > > The fix I propose is some extra logic at expansion of the > `OuterStripMinedLoop` to handle these corner cases. Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: - review - more ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25717/files - new: https://git.openjdk.org/jdk/pull/25717/files/e7289215..fa550f23 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25717&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25717&range=04-05 Stats: 60 lines in 2 files changed: 37 ins; 1 del; 22 mod Patch: https://git.openjdk.org/jdk/pull/25717.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25717/head:pull/25717 PR: https://git.openjdk.org/jdk/pull/25717 From roland at openjdk.org Tue Jun 17 12:53:49 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 17 Jun 2025 12:53:49 GMT Subject: RFR: 8356708: C2: loop strip mining expansion doesn't take sunk stores into account [v4] In-Reply-To: <46PnJrx9oCCkTFd9tICMrsEXm_wb3dMaGQKT-xSv8Ec=.1046ecc3-9e7e-4fc2-a0bd-da2363dfed22@github.com> References: <4pxoG8BH9QOLHb9O7XqofwEcrOXqSSLIbbunGEE5UYg=.17621f96-6eaa-476e-8043-cba003880f2b@github.com> 
<46PnJrx9oCCkTFd9tICMrsEXm_wb3dMaGQKT-xSv8Ec=.1046ecc3-9e7e-4fc2-a0bd-da2363dfed22@github.com> Message-ID: On Tue, 17 Jun 2025 07:14:51 GMT, Emanuel Peter wrote: > I see you have some minimal comments at `PhaseIdealLoop::create_outer_strip_mined_loop`: > > ``` > // to the loop head. The inner strip mined loop is left as it is. Only > // once loop optimizations are over, do we adjust the inner loop exit > // condition to limit its number of iterations, set the outer loop > // exit condition and add Phis to the outer loop head. > ``` > > I think we should add some more info there, and link to and from `OuterStripMinedLoopNode::handle_sunk_stores_at_expansion`. Done in new commit. Can you have another look @eme64 ? > src/hotspot/share/opto/loopnode.cpp line 2992: > >> 2990: } >> 2991: >> 2992: // Sunk stores should be referenced from an outer loop memory Phi > > I think you really need to give some longer explanation here why we need to do what you do here. > > Also: is there anywhere a description why we do not have the phi already by default for outer loops? Because I think we should really describe that somewhere, and state our assumptions. And then you could also refer to that description from here, and from there to here. Updated in new commit. > src/hotspot/share/opto/loopnode.cpp line 2993: > >> 2991: >> 2992: // Sunk stores should be referenced from an outer loop memory Phi >> 2993: void OuterStripMinedLoopNode::handle_sunk_stores_at_expansion(PhaseIterGVN* igvn) { > > What does the word "expansion" refer to? Could you also mention that in your code comment above, please? I renamed that one. > src/hotspot/share/opto/loopnode.cpp line 2996: > >> 2994: Node* cle_exit_proj = inner_loop_exit(); >> 2995: >> 2996: // Sunk stores are pinned on the loop exit projection of the inner loop > > Can you add a description why? New commit should address that one as well. 
> src/hotspot/share/opto/loopnode.cpp line 3007: > >> 3005: #endif >> 3006: >> 3007: // Sunk stores are reachable from the memory state of the outer loop safepoint > > Is it true that the control of the safepoint is the `cle_exit_proj`? Could we add an assert for that? So we are just looking for all memory between those two control nodes? Or is it more complicated? Yes, it is. I added a call to `verify_strip_mined()` that checks the shape of the outer loop including the control flow nodes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25717#issuecomment-2980261940 PR Review Comment: https://git.openjdk.org/jdk/pull/25717#discussion_r2152184342 PR Review Comment: https://git.openjdk.org/jdk/pull/25717#discussion_r2152190214 PR Review Comment: https://git.openjdk.org/jdk/pull/25717#discussion_r2152185916 PR Review Comment: https://git.openjdk.org/jdk/pull/25717#discussion_r2152189746 From dfenacci at openjdk.org Tue Jun 17 13:12:40 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 17 Jun 2025 13:12:40 GMT Subject: [jdk25] RFR: 8358129: compiler/startup/StartupOutput.java runs into out of memory on Windows after JDK-8347406 In-Reply-To: References: Message-ID: On Mon, 16 Jun 2025 09:19:47 GMT, Aleksey Shipilev wrote: >> Hi all, >> >> This pull request contains a backport of commit [534a8605](https://github.com/openjdk/jdk/commit/534a8605e5f4d771be69426687b2188d5353c91e) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. >> >> The commit being backported was authored by Damon Fenacci on 16 Jun 2025 and was reviewed by Tobias Hartmann and Emanuel Peter. >> >> Thanks! > > Marked as reviewed by shade (Reviewer). Thanks for the review @shipilev. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25821#issuecomment-2980323016 From dfenacci at openjdk.org Tue Jun 17 13:12:41 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 17 Jun 2025 13:12:41 GMT Subject: [jdk25] Integrated: 8358129: compiler/startup/StartupOutput.java runs into out of memory on Windows after JDK-8347406 In-Reply-To: References: Message-ID: On Mon, 16 Jun 2025 07:30:37 GMT, Damon Fenacci wrote: > Hi all, > > This pull request contains a backport of commit [534a8605](https://github.com/openjdk/jdk/commit/534a8605e5f4d771be69426687b2188d5353c91e) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Damon Fenacci on 16 Jun 2025 and was reviewed by Tobias Hartmann and Emanuel Peter. > > Thanks! This pull request has now been integrated. Changeset: cc4e9716 Author: Damon Fenacci URL: https://git.openjdk.org/jdk/commit/cc4e9716acd9339c66176e4181e6444f65873016 Stats: 11 lines in 2 files changed: 2 ins; 8 del; 1 mod 8358129: compiler/startup/StartupOutput.java runs into out of memory on Windows after JDK-8347406 Reviewed-by: shade Backport-of: 534a8605e5f4d771be69426687b2188d5353c91e ------------- PR: https://git.openjdk.org/jdk/pull/25821 From snatarajan at openjdk.org Tue Jun 17 14:01:55 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Tue, 17 Jun 2025 14:01:55 GMT Subject: RFR: 8325478: Restructure the macro expansion compiler phase to not include macro elimination [v3] In-Reply-To: References: Message-ID: > This changeset restructures the macro expansion phase to not include macro elimination and also adds a flag StressMacroElimination which randomizes macro elimination ordering for stress testing purposes. > > Changes: > - Implemented a method `eliminate_opaque_looplimit_macro_nodes` that removes the functionality for eliminating Opaque and LoopLimit nodes from the `expand_macro_nodes ` method. 
> - Introduced compiler phases` PHASE_AFTER_MACRO_ELIMINATION` > - Added a new Ideal phase for individual macro elimination steps. > - Implemented the flag `StressMacroElimination`. Added functionality tests for `StressMacroElimination`, similar to previous stress flag `StressMacroExpansion` ([JDK-8317349](https://bugs.openjdk.org/browse/JDK-8317349)). > > Below is a sample screenshot (IGV print level 4 ) mainly showing the new phase . > ![image](https://github.com/user-attachments/assets/16013cd4-6ec6-4939-ac66-33bb03d59cd6) > > Questions to reviewers: > - Is the new macro elimination phase OK, or should we change anything? > - In `compile.cpp `, `PHASE_ITER_GVN_AFTER_ELIMINATION` follows `PHASE_AFTER_MACRO_ELIMINATION` in the current fix. Should `PHASE_ITER_GVN_AFTER_ELIMINATION` be removed ? > > Testing: > GitHub Actions > tier1 to tier5 on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. > Tested that thousands of graphs are correctly opened and visualized with IGV using the same test used in ([JDK-8317349](https://bugs.openjdk.org/browse/JDK-8317349)) Saranya Natarajan has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains four commits: - addressing review comments - Merge master - addressing review on code style and adding failing - Initial Fix ------------- Changes: https://git.openjdk.org/jdk/pull/25682/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25682&range=02 Stats: 78 lines in 11 files changed: 54 ins; 8 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/25682.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25682/head:pull/25682 PR: https://git.openjdk.org/jdk/pull/25682 From snatarajan at openjdk.org Tue Jun 17 14:06:11 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Tue, 17 Jun 2025 14:06:11 GMT Subject: RFR: 8325478: Restructure the macro expansion compiler phase to not include macro elimination [v4] In-Reply-To: References: Message-ID: > This changeset restructures the macro expansion phase to not include macro elimination and also adds a flag StressMacroElimination which randomizes macro elimination ordering for stress testing purposes. > > Changes: > - Implemented a method `eliminate_opaque_looplimit_macro_nodes` that removes the functionality for eliminating Opaque and LoopLimit nodes from the `expand_macro_nodes ` method. > - Introduced compiler phases` PHASE_AFTER_MACRO_ELIMINATION` > - Added a new Ideal phase for individual macro elimination steps. > - Implemented the flag `StressMacroElimination`. Added functionality tests for `StressMacroElimination`, similar to previous stress flag `StressMacroExpansion` ([JDK-8317349](https://bugs.openjdk.org/browse/JDK-8317349)). > > Below is a sample screenshot (IGV print level 4 ) mainly showing the new phase . > ![image](https://github.com/user-attachments/assets/16013cd4-6ec6-4939-ac66-33bb03d59cd6) > > Questions to reviewers: > - Is the new macro elimination phase OK, or should we change anything? > - In `compile.cpp `, `PHASE_ITER_GVN_AFTER_ELIMINATION` follows `PHASE_AFTER_MACRO_ELIMINATION` in the current fix. 
Should `PHASE_ITER_GVN_AFTER_ELIMINATION` be removed? > > Testing: > GitHub Actions > tier1 to tier5 on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. > Tested that thousands of graphs are correctly opened and visualized with IGV using the same test used in ([JDK-8317349](https://bugs.openjdk.org/browse/JDK-8317349)) Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: update to IGV README based on review comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25682/files - new: https://git.openjdk.org/jdk/pull/25682/files/cd79d4a7..fd734aa9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25682&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25682&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25682.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25682/head:pull/25682 PR: https://git.openjdk.org/jdk/pull/25682 From snatarajan at openjdk.org Tue Jun 17 14:13:39 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Tue, 17 Jun 2025 14:13:39 GMT Subject: RFR: 8325478: Restructure the macro expansion compiler phase to not include macro elimination [v2] In-Reply-To: References: <_jMbdLzfV1MOFrUJH7J6-zXWLabQTOTsfb2hWvEL3Kc=.fede59e2-dd81-4128-9b7a-b8c47334a062@github.com> Message-ID: On Fri, 13 Jun 2025 15:36:26 GMT, Daniel Lundén wrote: >> Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: >> >> addressing review on code style and adding failing > > src/hotspot/share/opto/macro.cpp line 2482: > >> 2480: return; >> 2481: } >> 2482: refine_strip_mined_loop_macro_nodes(); > > This call is later compared to before, right? In the previous version of `expand_macro_nodes`, it ran before the call to `eliminate_macro_nodes`. Is it safe to move it in this way?
This placement of ` refine_strip_mined_loop_macro_nodes()` is ok as it only affects the functionality in the loop of the `eliminate_opaque_looplimit_macro_nodes` method. > src/hotspot/share/opto/macro.cpp line 2554: > >> 2552: bool PhaseMacroExpand::expand_macro_nodes() { >> 2553: // Do not allow new macro nodes once we started to expand >> 2554: C->reset_allow_macro_nodes(); > > Same here, this call is later compared to before (it is at the top of the old `expand_macro_nodes`, before `eliminate_macro_nodes`). Is this a safe move? I believe, this is also fine. However, I have moved it to `compile.cpp` to mimic the scenario as done previously before this PR. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25682#discussion_r2152381926 PR Review Comment: https://git.openjdk.org/jdk/pull/25682#discussion_r2152383946 From asmehra at openjdk.org Tue Jun 17 14:18:31 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Tue, 17 Jun 2025 14:18:31 GMT Subject: RFR: 8359646: C1 crash in AOTCodeAddressTable::add_C_string In-Reply-To: <5V_lRRQZEltF8kJWp7PV8InPXcGOSzxgE_-bHLGKeb8=.27a4eeb2-7a09-4151-89b3-5c467df1d508@github.com> References: <5V_lRRQZEltF8kJWp7PV8InPXcGOSzxgE_-bHLGKeb8=.27a4eeb2-7a09-4151-89b3-5c467df1d508@github.com> Message-ID: On Tue, 17 Jun 2025 02:34:24 GMT, Vladimir Kozlov wrote: > It is concurrency issue. Call to `AOTCodeAddressTable::add_C_string()` happened after checks that AOT code cache is still opened. But, because there is no synchronization, other thread (VM) closed/delete AOT code cache (after dumping) before code in `add_C_string()` accessed it. > > Added missed AOTCodeCStrings_lock in places where we modify, store and delete AOT strings table. Moved MutexLocker from `AOTCodeAddressTable::add_C_string()` to its caller and do additional check after it. > > I also noticed that we missed similar check after Compile_lock when we are storing AOT code. 
> > Tested hs-tier1-6,hs-tier10-rt,stress,xcomp Unless I missed something, I think there is still synchronization problem between the thread adding a string to the AOTCodeCache and the thread closing the AOTCodeCache. See the following pattern of execution: t0: Thread T1 calls add_C_string(), checks for `is_on_for_dump` which returns true t1: Thread T2 calls close() and completes the call to finish_write() t2: T1 acquires AOTCodeCStrings_lock and checks that _table is not null which returns true t3: T2 clears the table and sets it to null t4: T1 tries to call `table->add_C_string` I think at the time of shutting down the AOTCodeCache, the thread should hold both `Compile_lock` and `AOTCodeCStrings_lock`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25841#issuecomment-2980562587 From dfenacci at openjdk.org Tue Jun 17 15:12:30 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 17 Jun 2025 15:12:30 GMT Subject: RFR: 8358756: [s390x] Test StartupOutput.java crash due to CodeCache size [v2] In-Reply-To: References: Message-ID: <5EregFPpep4Y8cL1v_GnvR6vq415jVK-u_6MuCPNfm4=.8b2ac0ec-47fe-4f50-9815-a61ea400f58f@github.com> On Thu, 12 Jun 2025 06:29:26 GMT, Amit Kumar wrote: > Just for my understanding. Even if test passes we still want to see this warning: > > ``` > [warning][codecache] CodeCache is full. Compiler has been disabled. > ``` The test passes with and without that message. When the randomly chosen amount of code cache is not enough to start the compiler(s) it should print that message, when it is enough to start both compilers, you don't see that message. The important thing is that there is no crash when compilers are trying to reserve code cache (they should be just shut down). 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25741#issuecomment-2980763219 From eastigeevich at openjdk.org Tue Jun 17 15:12:41 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Tue, 17 Jun 2025 15:12:41 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v19] In-Reply-To: References: <17al0aeFhm0iZHoHHGiqB03RfPeSrIHIoZuapOHPuy4=.a2ff2d67-392b-40f0-b6d9-6e3a7f396e8a@github.com> Message-ID: On Tue, 3 Jun 2025 15:49:30 GMT, Tom Rodriguez wrote: > Since this PR doesn't actually perform any relocation, I'm not sure what the plan is here. The plan is to use this functionality in [JDK-8326205](https://bugs.openjdk.org/browse/JDK-8326205) > The most aggressive thing that could be done is to invalidate all frames which have the old nmethod on stack, but that still leaves the nmethod live for the purposes of deopt. It would probably be ok to synthesize an unload after the deopt since there should be no actual execution in those nmethods, but you will then have to suppress the one that's normally done during nmethod::unlink. This might have negative performance impact because we will be relocating hot nmethods. IMO it's better to let calls of the original nmethod to finish. New calls will be using the copy. It looks like the implementation does not move code in the terms of the JVMTI spec. The JVMTI spec expects moving code to unload it from memory: > Compiled Method Unload > > Sent when a compiled method is unloaded from memory. As we don't want to unload code from memory, we cannot send Compiled Method Unload event. I think we can generate just Compiled Method Load event because of the note: > Note that a single method may have multiple compiled forms, and that this event will be sent for each form. Alternatively, we can update the JVMTI spec to say Compiled Method Load event can be a result of code copied. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23573#issuecomment-2980761814 From kvn at openjdk.org Tue Jun 17 15:13:31 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 17 Jun 2025 15:13:31 GMT Subject: RFR: 8359646: C1 crash in AOTCodeAddressTable::add_C_string In-Reply-To: References: <5V_lRRQZEltF8kJWp7PV8InPXcGOSzxgE_-bHLGKeb8=.27a4eeb2-7a09-4151-89b3-5c467df1d508@github.com> Message-ID: On Tue, 17 Jun 2025 14:16:21 GMT, Ashutosh Mehra wrote: > t3: T2 clears the table and sets it to null It can't because T1 holds the lock. See that I added the lock before `delete _table`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25841#issuecomment-2980765800 From kvn at openjdk.org Tue Jun 17 15:13:32 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 17 Jun 2025 15:13:32 GMT Subject: RFR: 8359646: C1 crash in AOTCodeAddressTable::add_C_string In-Reply-To: References: <5V_lRRQZEltF8kJWp7PV8InPXcGOSzxgE_-bHLGKeb8=.27a4eeb2-7a09-4151-89b3-5c467df1d508@github.com> Message-ID: On Tue, 17 Jun 2025 08:11:12 GMT, Andrew Dinn wrote: >> It is concurrency issue. Call to `AOTCodeAddressTable::add_C_string()` happened after checks that AOT code cache is still opened. But, because there is no synchronization, other thread (VM) closed/delete AOT code cache (after dumping) before code in `add_C_string()` accessed it. >> >> Added missed AOTCodeCStrings_lock in places where we modify, store and delete AOT strings table. Moved MutexLocker from `AOTCodeAddressTable::add_C_string()` to its caller and do additional check after it. >> >> I also noticed that we missed similar check after Compile_lock when we are storing AOT code. >> >> Tested hs-tier1-6,hs-tier10-rt,stress,xcomp > > Looks good Thank you @adinn and @ashu-mehra for review. 
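For readers following the 8359646 discussion, the pattern described there (an unsynchronized check, then taking the lock, then a second check before touching the table) can be sketched roughly as below. This is a minimal standalone illustration under stated assumptions, not the HotSpot change itself: `CacheSketch`, its members, and the use of `std::mutex` in place of `AOTCodeCStrings_lock` are all hypothetical.

```c++
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for the AOT code cache; not the HotSpot class.
class CacheSketch {
 public:
  CacheSketch() : _open(true), _table(new std::vector<std::string>()) {}

  // Returns false when the cache has been closed, before or during the call.
  bool add_string(const std::string& s) {
    if (!_open.load()) return false;           // cheap unsynchronized check
    std::lock_guard<std::mutex> guard(_lock);  // serialize with close()
    if (_table == nullptr) return false;       // re-check under the lock
    _table->push_back(s);
    return true;
  }

  // Deletes the table while holding the same lock that add_string() takes.
  void close() {
    _open.store(false);
    std::lock_guard<std::mutex> guard(_lock);
    _table.reset();
  }

  // Number of stored strings, or 0 once the cache is closed.
  std::size_t size() {
    std::lock_guard<std::mutex> guard(_lock);
    return _table == nullptr ? 0 : _table->size();
  }

 private:
  std::atomic<bool> _open;
  std::mutex _lock;
  std::unique_ptr<std::vector<std::string>> _table;
};
```

In this sketch, a thread that passes the first check but loses the race to `close()` sees a null table once it holds the lock and simply returns, which mirrors the "additional check after it" mentioned in the PR description.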
------------- PR Comment: https://git.openjdk.org/jdk/pull/25841#issuecomment-2980766938 From kvn at openjdk.org Tue Jun 17 15:18:27 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 17 Jun 2025 15:18:27 GMT Subject: RFR: 8359646: C1 crash in AOTCodeAddressTable::add_C_string In-Reply-To: References: <5V_lRRQZEltF8kJWp7PV8InPXcGOSzxgE_-bHLGKeb8=.27a4eeb2-7a09-4151-89b3-5c467df1d508@github.com> Message-ID: On Tue, 17 Jun 2025 08:10:43 GMT, Andrew Dinn wrote: >> It is concurrency issue. Call to `AOTCodeAddressTable::add_C_string()` happened after checks that AOT code cache is still opened. But, because there is no synchronization, other thread (VM) closed/delete AOT code cache (after dumping) before code in `add_C_string()` accessed it. >> >> Added missed AOTCodeCStrings_lock in places where we modify, store and delete AOT strings table. Moved MutexLocker from `AOTCodeAddressTable::add_C_string()` to its caller and do additional check after it. >> >> I also noticed that we missed similar check after Compile_lock when we are storing AOT code. >> >> Tested hs-tier1-6,hs-tier10-rt,stress,xcomp > > src/hotspot/share/code/aotCodeCache.cpp line 823: > >> 821: // and the main thread generating adapter >> 822: MutexLocker ml(Compile_lock); >> 823: if (!is_on()) { > > Just for the record: > > I can see why this is needed to stop compiler threads dereferencing a null cache pointer or null table pointer when some other thread might concurrently be closing the cache. > > That led me to wonder why we don't need to further synchronize concurrent execution of non-compiler threads. I convinced myself that whenever a non-compiler thread calls `AOTCodeCache::close()` the only other running threads that might try to access the AOT cache are compiler threads -- calls to the `close()` method are from `MetaspaceShared::preload_and_dump_impl` (metaspaceShared.cpp), `before_exit` (java.cpp) and `Threads::create_vm` (threads.cpp). (Well, modulo a rogue jcmd, perhaps?). Yes. 
And with `jcmd`, as @iklam pointed out in another PR, "that would create far worse problems than the bug that we are trying to fix here" (https://git.openjdk.org/jdk/pull/25816#issuecomment-2975238342) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25841#discussion_r2152539611 From iklam at openjdk.org Tue Jun 17 15:33:28 2025 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 17 Jun 2025 15:33:28 GMT Subject: RFR: 8359646: C1 crash in AOTCodeAddressTable::add_C_string In-Reply-To: <5V_lRRQZEltF8kJWp7PV8InPXcGOSzxgE_-bHLGKeb8=.27a4eeb2-7a09-4151-89b3-5c467df1d508@github.com> References: <5V_lRRQZEltF8kJWp7PV8InPXcGOSzxgE_-bHLGKeb8=.27a4eeb2-7a09-4151-89b3-5c467df1d508@github.com> Message-ID: On Tue, 17 Jun 2025 02:34:24 GMT, Vladimir Kozlov wrote: > It is concurrency issue. Call to `AOTCodeAddressTable::add_C_string()` happened after checks that AOT code cache is still opened. But, because there is no synchronization, other thread (VM) closed/delete AOT code cache (after dumping) before code in `add_C_string()` accessed it. > > Added missed AOTCodeCStrings_lock in places where we modify, store and delete AOT strings table. Moved MutexLocker from `AOTCodeAddressTable::add_C_string()` to its caller and do additional check after it. > > I also noticed that we missed similar check after Compile_lock when we are storing AOT code. > > Tested hs-tier1-6,hs-tier10-rt,stress,xcomp LGTM ------------- Marked as reviewed by iklam (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/25841#pullrequestreview-2936158875 From amitkumar at openjdk.org Tue Jun 17 15:37:27 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 17 Jun 2025 15:37:27 GMT Subject: RFR: 8358756: [s390x] Test StartupOutput.java crash due to CodeCache size [v2] In-Reply-To: <5EregFPpep4Y8cL1v_GnvR6vq415jVK-u_6MuCPNfm4=.8b2ac0ec-47fe-4f50-9815-a61ea400f58f@github.com> References: <5EregFPpep4Y8cL1v_GnvR6vq415jVK-u_6MuCPNfm4=.8b2ac0ec-47fe-4f50-9815-a61ea400f58f@github.com> Message-ID: On Tue, 17 Jun 2025 15:09:42 GMT, Damon Fenacci wrote: > > Just for my understanding. Even if test passes we still want to see this warning: > > ``` > > [warning][codecache] CodeCache is full. Compiler has been disabled. > > ``` > > The test passes with and without that message. When the randomly chosen amount of code cache is not enough to start the compiler(s) it should print that message, when it is enough to start both compilers, you don't see that message. The important thing is that there is no crash when compilers are trying to reserve code cache (they should be just shut down). What I wanted to verify with above expected crash is that current number are not enough for the compilers and we saw the output is containing that "Codecache is full" message. Now we can claim that this message is there in the log output and test passes successfully. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25741#issuecomment-2980855102 From kvn at openjdk.org Tue Jun 17 15:40:27 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 17 Jun 2025 15:40:27 GMT Subject: RFR: 8359646: C1 crash in AOTCodeAddressTable::add_C_string In-Reply-To: References: <5V_lRRQZEltF8kJWp7PV8InPXcGOSzxgE_-bHLGKeb8=.27a4eeb2-7a09-4151-89b3-5c467df1d508@github.com> Message-ID: On Tue, 17 Jun 2025 15:30:28 GMT, Ioi Lam wrote: >> It is concurrency issue. Call to `AOTCodeAddressTable::add_C_string()` happened after checks that AOT code cache is still opened. 
But, because there is no synchronization, other thread (VM) closed/delete AOT code cache (after dumping) before code in `add_C_string()` accessed it. >> >> Added missed AOTCodeCStrings_lock in places where we modify, store and delete AOT strings table. Moved MutexLocker from `AOTCodeAddressTable::add_C_string()` to its caller and do additional check after it. >> >> I also noticed that we missed similar check after Compile_lock when we are storing AOT code. >> >> Tested hs-tier1-6,hs-tier10-rt,stress,xcomp > > LGTM Thank you, @iklam ------------- PR Comment: https://git.openjdk.org/jdk/pull/25841#issuecomment-2980863197 From kvn at openjdk.org Tue Jun 17 15:58:33 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 17 Jun 2025 15:58:33 GMT Subject: Integrated: 8359646: C1 crash in AOTCodeAddressTable::add_C_string In-Reply-To: <5V_lRRQZEltF8kJWp7PV8InPXcGOSzxgE_-bHLGKeb8=.27a4eeb2-7a09-4151-89b3-5c467df1d508@github.com> References: <5V_lRRQZEltF8kJWp7PV8InPXcGOSzxgE_-bHLGKeb8=.27a4eeb2-7a09-4151-89b3-5c467df1d508@github.com> Message-ID: <5L29WJisOEHylesU5hja2eThPzrYmeH2tNB1cIf8Y2k=.cd4563b1-f219-4323-b79f-9dd38d12300e@github.com> On Tue, 17 Jun 2025 02:34:24 GMT, Vladimir Kozlov wrote: > It is concurrency issue. Call to `AOTCodeAddressTable::add_C_string()` happened after checks that AOT code cache is still opened. But, because there is no synchronization, other thread (VM) closed/delete AOT code cache (after dumping) before code in `add_C_string()` accessed it. > > Added missed AOTCodeCStrings_lock in places where we modify, store and delete AOT strings table. Moved MutexLocker from `AOTCodeAddressTable::add_C_string()` to its caller and do additional check after it. > > I also noticed that we missed similar check after Compile_lock when we are storing AOT code. > > Tested hs-tier1-6,hs-tier10-rt,stress,xcomp This pull request has now been integrated. 
Changeset: 96070212 Author: Vladimir Kozlov URL: https://git.openjdk.org/jdk/commit/96070212adfd15acd99edf6e180db6228ee7b4ff Stats: 15 lines in 1 file changed: 9 ins; 4 del; 2 mod 8359646: C1 crash in AOTCodeAddressTable::add_C_string Reviewed-by: adinn, iklam ------------- PR: https://git.openjdk.org/jdk/pull/25841 From kvn at openjdk.org Tue Jun 17 16:11:31 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 17 Jun 2025 16:11:31 GMT Subject: RFR: 8357473: Compilation spike leaves many CompileTasks in free list [v4] In-Reply-To: References: Message-ID: On Tue, 10 Jun 2025 10:53:42 GMT, Aleksey Shipilev wrote: >> See bug for more discussion. >> >> This PR implements the "all the way" solution by removing the free list completely. It complements https://github.com/openjdk/jdk/pull/25364, and can go either first, or second. We will remerge the other one once either integrates. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `compiler` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - Merge branch 'master' into JDK-8357473-compile-task-free-list > - Merge branch 'master' into JDK-8357473-compile-task-free-list > - Merge branch 'master' into JDK-8357473-compile-task-free-list > - Also free the lock! > - Comments and indenting > - Basic deletion Re-approve ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25409#pullrequestreview-2936283747 From eastigeevich at openjdk.org Tue Jun 17 16:42:32 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Tue, 17 Jun 2025 16:42:32 GMT Subject: RFR: 8356865: C2: Unreasonable values for debug flag FastAllocateSizeLimit can lead to left-shift-overflow, which is UB In-Reply-To: References: Message-ID: On Mon, 16 Jun 2025 14:50:46 GMT, Benoît Maillard wrote: > This PR adds a range constraint for the `-XX:FastAllocateSizeLimit` debug flag. This prevents undefined behavior caused by left-shift overflow of the flag value in `GraphKit::new_array`. > > ### Testing > - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8356865) > - [x] tier1-3, plus some internal testing > - [x] Manual testing with values known to previously cause undefined behavior > > Thanks! src/hotspot/share/opto/graphKit.cpp line 3807: > 3805: int log2_esize = Klass::layout_helper_log2_element_size(layout_con); > 3806: fast_size_limit <<= (LogBytesPerLong - log2_esize); > 3807: assert (fast_size_limit > 0, "increasing the size limit should not produce negative values"); Prior to C++14, a left shift producing a negative value is undefined behavior: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2161.pdf Do we compile C++ source specifying the C++ standard? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25834#discussion_r2152715500 From shade at openjdk.org Tue Jun 17 16:42:33 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 17 Jun 2025 16:42:33 GMT Subject: RFR: 8357473: Compilation spike leaves many CompileTasks in free list [v4] In-Reply-To: References: Message-ID: On Tue, 10 Jun 2025 10:53:42 GMT, Aleksey Shipilev wrote: >> See bug for more discussion. >> >> This PR implements the "all the way" solution by removing the free list completely. It complements https://github.com/openjdk/jdk/pull/25364, and can go either first, or second.
We will remerge the other one once either integrates. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `compiler` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - Merge branch 'master' into JDK-8357473-compile-task-free-list > - Merge branch 'master' into JDK-8357473-compile-task-free-list > - Merge branch 'master' into JDK-8357473-compile-task-free-list > - Also free the lock! > - Comments and indenting > - Basic deletion Thanks! I need a second Reviewer, I think. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25409#issuecomment-2981065189 From dlong at openjdk.org Tue Jun 17 17:07:35 2025 From: dlong at openjdk.org (Dean Long) Date: Tue, 17 Jun 2025 17:07:35 GMT Subject: RFR: 8356865: C2: Unreasonable values for debug flag FastAllocateSizeLimit can lead to left-shift-overflow, which is UB In-Reply-To: References: Message-ID: On Tue, 17 Jun 2025 16:39:56 GMT, Evgeny Astigeevich wrote: >> This PR adds a range constraint for the `-XX:FastAllocateSizeLimit` debug flag. This prevents undefined behavior caused by left-shift overflow of the flag value in `GraphKit::new_array`. >> >> ### Testing >> - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8356865) >> - [x] tier1-3, plus some internal testing >> - [x] Manual testing with values known to previously cause undefined behavior >> >> Thanks! 
> > src/hotspot/share/opto/graphKit.cpp line 3807: > >> 3805: int log2_esize = Klass::layout_helper_log2_element_size(layout_con); >> 3806: fast_size_limit <<= (LogBytesPerLong - log2_esize); >> 3807: assert (fast_size_limit > 0, "increasing the size limit should not produce negative values"); > > Prior to C++14, a left shift producing a negative value is undefined behavior: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2161.pdf > > Do we compile C++ source specifying the C++ standard? Yes we use -std=c++14, but creating a negative value in this way still feels like a kind of overflow to me. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25834#discussion_r2152762023 From asmehra at openjdk.org Tue Jun 17 17:39:33 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Tue, 17 Jun 2025 17:39:33 GMT Subject: RFR: 8359646: C1 crash in AOTCodeAddressTable::add_C_string In-Reply-To: References: <5V_lRRQZEltF8kJWp7PV8InPXcGOSzxgE_-bHLGKeb8=.27a4eeb2-7a09-4151-89b3-5c467df1d508@github.com> Message-ID: On Tue, 17 Jun 2025 15:10:29 GMT, Vladimir Kozlov wrote: > It can't because T1 holds the lock. See that I added the lock before delete _table. yup, I missed that lock. All good then.
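For readers following the 8359646 thread: the fix described there takes `AOTCodeCStrings_lock` in the caller of `add_C_string()` and re-checks the cache state once the lock is held, while the thread that deletes the table holds the same lock. A minimal C++ sketch of that check-after-lock pattern follows. It assumes nothing about the real HotSpot classes: `Cache`, `StringTable`, `add_string`, and `close` are hypothetical names, and `std::mutex` stands in for HotSpot's own mutex types.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for the AOT strings table; not the HotSpot class.
class StringTable {
 public:
  void add(const std::string& s) { _strings.push_back(s); }
  std::size_t size() const { return _strings.size(); }
 private:
  std::vector<std::string> _strings;
};

// Hypothetical cache owner. `_lock` plays the role of AOTCodeCStrings_lock.
class Cache {
 public:
  Cache() : _table(new StringTable()) {}

  // Called by compiler threads. Returns false if the cache is already closed.
  bool add_string(const std::string& s) {
    std::lock_guard<std::mutex> guard(_lock);
    // Re-check under the lock: another thread may have closed the cache
    // between the caller's earlier unlocked check and this point.
    if (_table == nullptr) {
      return false;
    }
    _table->add(s);
    return true;
  }

  // Called by the closing thread. Deletes the table under the same lock,
  // so no add_string() call can be using the table while it is freed.
  void close() {
    std::lock_guard<std::mutex> guard(_lock);
    _table.reset();
  }

  bool is_open() {
    std::lock_guard<std::mutex> guard(_lock);
    return _table != nullptr;
  }

 private:
  std::mutex _lock;
  std::unique_ptr<StringTable> _table;
};
```

In this sketch a late `add_string()` after `close()` simply returns false instead of touching freed memory, which is the behavior the thread converges on ("It can't because T1 holds the lock").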
------------- PR Comment: https://git.openjdk.org/jdk/pull/25841#issuecomment-2981228780 From rcastanedalo at openjdk.org Tue Jun 17 17:51:34 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 17 Jun 2025 17:51:34 GMT Subject: RFR: 8020282: Generated code quality: redundant LEAs in the chained dereferences [v7] In-Reply-To: <2iyvUQeSbTQ0KsYs4qJKMDdlQ_IV9_v1_T-2dyaqcbs=.702b2d74-9767-467e-b8f9-ce04cfc91c08@github.com> References: <2iyvUQeSbTQ0KsYs4qJKMDdlQ_IV9_v1_T-2dyaqcbs=.702b2d74-9767-467e-b8f9-ce04cfc91c08@github.com> Message-ID: On Tue, 17 Jun 2025 11:15:49 GMT, Manuel Hässig wrote: >> ## Summary >> >> On x86, chained dereferences of narrow oops at a constant offset from the base oop can use a `lea` instruction to perform the address computation in one go using the `leaP8Narrow`, `leaP32Narrow`, and `leaPCompressedOopOffset` matching rules. However, the generated code contains an additional `lea` with an unused result: >> >> ; OptoAssembly >> 03d decode_heap_oop_not_null R8,R10 >> 041 leaq R10, [R12 + R10 << 3 + #12] (compressed oop addressing) ; ptr compressedoopoff32 >> >> ; x86 >> 0x00007f1f210625bd: lea (%r12,%r10,8),%r8 ; result is unused >> 0x00007f1f210625c1: lea 0xc(%r12,%r10,8),%r10 ; the same computation as decode, but with offset >> >> >> This PR adds a peephole optimization to remove such redundant `lea`s. >> >> ## The Issue in Detail >> >> The ideal subgraph producing redundant `lea`s, or rather redundant `decodeHeapOop_not_null`s, is `LoadN -> DecodeN -> AddP`, where both the address and base edge of the `AddP` originate from the `DecodeN`. After matching, this becomes >> >> LoadN -> decodeHeapOop_not_null -> leaP* >> ______________________________? >> >> where `leaP*` is either of `leaP8Narrow`, `leaP32Narrow`, or `leaPCompressedOopOffset` (depending on the heap location and size). Here, the base input of `leaP*` comes from the decode.
Looking at the matching code path, we find that the `leaP*` rules match both the `AddP` and the `DecodeN`, since x86 can fold this, but the following code adds the decode back as the base input to `leaP*`: >> >> https://github.com/openjdk/jdk/blob/c29537740efb04e061732a700582d43b1956cff4/src/hotspot/share/opto/matcher.cpp#L1894-L1897 >> >> On its face, this is completely unnecessary if we matched a `leaP*`, since it already computes the result of the decode, so adding the `LoadN` node as base seems like the logical choice. However, if the derived oop computed by the `leaP*` gets added to an oop map, this `DecodeN` is needed as the base for the derived oop. Because as of now, derived oops in oop maps cannot have narrow base pointers. >> >> This leaves us with a handful of possible solutions: >> 1. implement narrow bases for derived oops in oop maps, >> 2. perform some dead code elimination after we know which oops are part of oop maps, >> 3. add a peephole optimization to simply remove unused `lea`s. >> >> Option 1 would have been ideal in the sense, that it is the earliest possible point to remove the decode, which would simplify the graph and reduce pressure on the regi... > > Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: > > Fix test flags Looks good! src/hotspot/cpu/x86/peephole_x86_64.cpp line 351: > 349: Node* dependant_lea = decode->fast_out(i); > 350: if (dependant_lea->is_Mach() && dependant_lea->as_Mach()->ideal_Opcode() == Op_AddP) { > 351: Nit: you could remove this empty line.
test/hotspot/jtreg/compiler/codegen/TestRedundantLea.java line 146: > 144: } > 145: i += 1; > 146: } Now that you only have one dimension, I think it is simpler to replace this loop with: scenarios[0] = new Scenario(0, "-XX:+IgnoreUnrecognizedVMOptions", "-XX:-OptoPeephole"); scenarios[1] = new Scenario(1, "-XX:+IgnoreUnrecognizedVMOptions", "-XX:+OptoPeephole"); test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 895: > 893: machOnly(LEA_P_32_NARROW, "leaP32Narrow"); > 894: } > 895: This rule is unused and could be removed. ------------- Marked as reviewed by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25471#pullrequestreview-2936529256 PR Review Comment: https://git.openjdk.org/jdk/pull/25471#discussion_r2152811481 PR Review Comment: https://git.openjdk.org/jdk/pull/25471#discussion_r2152823062 PR Review Comment: https://git.openjdk.org/jdk/pull/25471#discussion_r2152818974 From rcastanedalo at openjdk.org Tue Jun 17 17:51:34 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 17 Jun 2025 17:51:34 GMT Subject: RFR: 8020282: Generated code quality: redundant LEAs in the chained dereferences [v4] In-Reply-To: <8af6oDH21I6Hfvdplgxq6EeH41bihWdwGwy7V8mtokE=.4a20d5f8-84a4-4533-bf5c-932051d1b2a1@github.com> References: <8af6oDH21I6Hfvdplgxq6EeH41bihWdwGwy7V8mtokE=.4a20d5f8-84a4-4533-bf5c-932051d1b2a1@github.com> Message-ID: <_0u23WPM6B2JmmIaS2VQe7iV5UtXglH-BWPhjvjX02A=.140ac707-df93-410b-b0a3-ad572ca3248a@github.com> On Fri, 13 Jun 2025 09:38:17 GMT, Roberto Castañeda Lozano wrote: >>> Is this scenario exercised by any of the new tests? If not, would it be possible to construct an additional test where we verify that the peephole is not applied in this case? >> >> It is not. I was only able to find such a case once with all VM intrinsics disabled some time ago, but was not able to reproduce one since. I'll have another try to find one.
> >> > Is this scenario exercised by any of the new tests? If not, would it be possible to construct an additional test where we verify that the peephole is not applied in this case? >> >> It is not. I was only able to find such a case once with all VM intrinsics disabled some time ago, but was not able to reproduce one since. I'll have another try to find one. > > OK, that would be great! If you do not find one, I think the PR is still OK because it is easy to see how the peephole would handle the scenario. But it would be of course better to confirm it with an actual test case. > @robcasloz, I integrated all your suggestions and simplified the IR-tests. Thanks! > But I was unfortunately not able to create a test that reliably safepoints. Fair enough, thanks for trying. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25471#issuecomment-2981273397 From rcastanedalo at openjdk.org Tue Jun 17 17:52:32 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 17 Jun 2025 17:52:32 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v8] In-Reply-To: References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> <1gdeBnZ7YuIf9CgQW2bCXkDDBWPjUgRnickHts-fvzE=.e6e901ba-3e9f-41a2-9c68-167a879e9655@github.com> <2m1_XtiSsW_LaBRrkX4qv7AKtLOjNgnl4mUp3zisasE=.dda62164-7aa0-4c1a-b83f-fa40ba7902e5@github.com> <4374L3lkQK90wLxxOA7POBmIKNX2DFK-4pO4vj1bkuQ=.5b8d7825-a7f1-497f-ab66-02a85a266659@github.com> Message-ID: On Mon, 16 Jun 2025 15:19:59 GMT, Roland Westrelin wrote: > > > One way would be to simply assert that there's no `NarrowMemProj`s left during final graph reshape. Is that what you'd like? > > > > > > Yes, that would be great (and I think it is OK to leave it to a future RFE if fully enforcing it would further increase the complexity of this PR). > > Ok.
I propose to file the follow up RFE if/once this PR has integrated as there's no follow up work until this code is in. Sure, that makes sense, thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24570#issuecomment-2981281578 From cslucas at openjdk.org Tue Jun 17 19:18:33 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Tue, 17 Jun 2025 19:18:33 GMT Subject: RFR: 8341293: Split field loads through Nested Phis [v9] In-Reply-To: References: Message-ID: On Tue, 8 Apr 2025 20:21:34 GMT, Dhamoder Nalla wrote: >> This enhances the changes introduced in [JDK PR 12897](https://github.com/openjdk/jdk/pull/12897) by handling nested Phi nodes (phi -> phi -> AddP -> Load*) during scalar replacement. The primary goal is to split field loads (AddP -> Load*) involving nested Phi parent nodes, thereby increasing opportunities for scalar replacement and reducing memory allocations. >> >> >> **Here is an illustration of the sequence of Ideal Graph Transformations applied to split through nested `Phi` nodes.** >> >> **1. Initial State (Before Transformation)** >> The graph contains a nested Phi structure where two Allocate nodes merge via a Phi node. >> >> ![image](https://github.com/user-attachments/assets/c18e5ca0-c554-475c-814a-7cb288d96569) >> >> **2. After Splitting Through Child Phi** >> The transformation separates field loads by introducing additional AddP and Load nodes for each Allocate input. >> >> ![image](https://github.com/user-attachments/assets/b279b5f2-9ec6-4d9b-a627-506451f1cf81) >> >> **3. After Splitting Load Field Through Parent Phi** >> The field load operation (Load) is pushed even further up in the graph. >> >> Instead of merging AddP pointers in a Phi node and then performing a Load, the transformation ensures that each path has its AddP -> Load sequence before merging. >> >> This further eliminates the need to perform field loads on a Phi node, making the graph more conducive to scalar replacement. 
>> >> ![image](https://github.com/user-attachments/assets/f506b918-2dd0-4dbe-a440-ff253afa3961) >> >> ### JMH Benchmark Results: >> >> #### With Disabled RAM >> >> | Benchmark | Mode | Count | Score | Error | Units | >> |-----------|------|-------|-------|-------|-------| >> | testBailOut_runner | avgt | 15 | 13.969 | ± 0.248 | ms/op | >> | testFieldEscapeWithMerge_runner | avgt | 15 | 80.300 | ± 4.306 | ms/op | >> | testMerge_TryCatchFinally_runner | avgt | 15 | 72.182 | ± 1.781 | ms/op | >> | testMultiParentPhi_runner | avgt | 15 | 2.983 | ± 0.001 | ms/op | >> | testNestedPhiPolymorphic_runner | avgt | 15 | 18.342 | ± 0.731 | ms/op | >> | testNestedPhiProcessOrder_runner | avgt | 15 | 14.315 | ± 0.443 | ms/op | >> | testNestedPhiWithLambda_runner | avgt | 15 | 18.511 | ± 1.212 | ms/op | >> | testNestedPhiWithTrap_runner | avgt | 15 | 66.277 | ± 1.478 | ms/op | >> | testNestedPhi_FieldLoad_runner | avgt | 15 | 17.968 | ± 0.306 | ms/op | >> | testNestedPhi_TryCatch_runner | avgt | 15 | 14.186 | ± 0.247 | ms/op | >> | testRematerialize_MultiObj_runner | avgt | 15 | 88.435 | ± 4.869 ... > > Dhamoder Nalla has updated the pull request incrementally with one additional commit since the last revision: > > address CR comments Thank you for persisting on this @dhanalla . I just did a quick look. I'll look again and run tests as soon as I get some time. src/hotspot/share/opto/escape.cpp line 465: > 463: bool ConnectionGraph::can_reduce_phi_check_inputs(PhiNode* ophi) const { > 464: bool found_sr_allocate = false; > 465: int nof_input_phi_nodes = 0; Looks like this can be just a boolean variable.
src/hotspot/share/opto/escape.cpp line 569: > 567: } else if (use->is_Phi()) { > 568: if (n->_idx == use->_idx) { > 569: NOT_PRODUCT(if (TraceReduceAllocationMerges) tty->print_cr("Cannot reduce Self loop nested Phi");) NIT: "self" src/hotspot/share/opto/escape.cpp line 571: > 569: NOT_PRODUCT(if (TraceReduceAllocationMerges) tty->print_cr("Cannot reduce Self loop nested Phi");) > 570: return false; > 571: } else if (!can_reduce_phi_check_inputs(use->as_Phi()) || !can_reduce_check_users(use->as_Phi(), phi_nest_level+1)) { Maybe worth printing another trace message here? Saying we are not reducing the parent Phi because we can't reduce the child phi? src/hotspot/share/opto/escape.cpp line 1310: > 1308: Node* use = ophi->fast_out(i); > 1309: if (use->is_Phi()) { > 1310: assert(use->_idx != ophi->_idx, "Unexpected selfloop Phi."); Should we bailout of the reduction process if we somehow end up in this situation? I.e., in a debug build we'll assert, but in a product build you're just ignoring the problem. ------------- PR Review: https://git.openjdk.org/jdk/pull/21270#pullrequestreview-2936791060 PR Review Comment: https://git.openjdk.org/jdk/pull/21270#discussion_r2152984586 PR Review Comment: https://git.openjdk.org/jdk/pull/21270#discussion_r2152989734 PR Review Comment: https://git.openjdk.org/jdk/pull/21270#discussion_r2152992496 PR Review Comment: https://git.openjdk.org/jdk/pull/21270#discussion_r2153000431 From cslucas at openjdk.org Tue Jun 17 19:23:33 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Tue, 17 Jun 2025 19:23:33 GMT Subject: RFR: 8341293: Split field loads through Nested Phis [v9] In-Reply-To: References: Message-ID: On Tue, 8 Apr 2025 20:21:34 GMT, Dhamoder Nalla wrote: >> This enhances the changes introduced in [JDK PR 12897](https://github.com/openjdk/jdk/pull/12897) by handling nested Phi nodes (phi -> phi -> AddP -> Load*) during scalar replacement. 
The primary goal is to split field loads (AddP -> Load*) involving nested Phi parent nodes, thereby increasing opportunities for scalar replacement and reducing memory allocations. >> >> >> **Here is an illustration of the sequence of Ideal Graph Transformations applied to split through nested `Phi` nodes.** >> >> **1. Initial State (Before Transformation)** >> The graph contains a nested Phi structure where two Allocate nodes merge via a Phi node. >> >> ![image](https://github.com/user-attachments/assets/c18e5ca0-c554-475c-814a-7cb288d96569) >> >> **2. After Splitting Through Child Phi** >> The transformation separates field loads by introducing additional AddP and Load nodes for each Allocate input. >> >> ![image](https://github.com/user-attachments/assets/b279b5f2-9ec6-4d9b-a627-506451f1cf81) >> >> **3. After Splitting Load Field Through Parent Phi** >> The field load operation (Load) is pushed even further up in the graph. >> >> Instead of merging AddP pointers in a Phi node and then performing a Load, the transformation ensures that each path has its AddP -> Load sequence before merging. >> >> This further eliminates the need to perform field loads on a Phi node, making the graph more conducive to scalar replacement. >> >> ![image](https://github.com/user-attachments/assets/f506b918-2dd0-4dbe-a440-ff253afa3961) >> >> ### JMH Benchmark Results: >> >> #### With Disabled RAM >> >> | Benchmark | Mode | Count | Score | Error | Units | >> |-----------|------|-------|-------|-------|-------| >> | testBailOut_runner | avgt | 15 | 13.969 | ± 0.248 | ms/op | >> | testFieldEscapeWithMerge_runner | avgt | 15 | 80.300 | ± 4.306 | ms/op | >> | testMerge_TryCatchFinally_runner | avgt | 15 | 72.182 | ± 1.781 | ms/op | >> | testMultiParentPhi_runner | avgt | 15 | 2.983 | ± 0.001 | ms/op | >> | testNestedPhiPolymorphic_runner | avgt | 15 | 18.342 | ± 0.731 | ms/op | >> | testNestedPhiProcessOrder_runner | avgt | 15 | 14.315 | ±
0.443 | ms/op | >> | testNestedPhiWithLambda_runner | avgt | 15 | 18.511 | ± 1.212 | ms/op | >> | testNestedPhiWithTrap_runner | avgt | 15 | 66.277 | ± 1.478 | ms/op | >> | testNestedPhi_FieldLoad_runner | avgt | 15 | 17.968 | ± 0.306 | ms/op | >> | testNestedPhi_TryCatch_runner | avgt | 15 | 14.186 | ± 0.247 | ms/op | >> | testRematerialize_MultiObj_runner | avgt | 15 | 88.435 | ± 4.869 ... > > Dhamoder Nalla has updated the pull request incrementally with one additional commit since the last revision: > > address CR comments test/hotspot/jtreg/compiler/c2/irTests/scalarReplacement/AllocationMergesNestedPhiTests.java line 563: > 561: @IR(counts = { IRNode.ALLOC, ">=3" }, > 562: phase = CompilePhase.PHASEIDEAL_BEFORE_EA, > 563: applyIfPlatform = {"64-bit", "true"}, Looks like all tests should be run on 64bit platform? If so perhaps you can just add the requirement at the top of this file, in the annotations section. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21270#discussion_r2153009447 From duke at openjdk.org Tue Jun 17 20:15:25 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Tue, 17 Jun 2025 20:15:25 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v25] In-Reply-To: References: Message-ID: > This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). > > When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. > > This change does not modify existing code paths and therefore does not benefit much from existing tests.
New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. Chad Rakoczy has updated the pull request incrementally with four additional commits since the last revision: - Move far branch fix to fix_relocation_after_move - Add test to verify JVMTI events during nmethod relocation - Log relocated nmethod - Publish JVMTI events ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23573/files - new: https://git.openjdk.org/jdk/pull/23573/files/4e80e359..40917404 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=24 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=23-24 Stats: 437 lines in 11 files changed: 425 ins; 5 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/23573.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23573/head:pull/23573 PR: https://git.openjdk.org/jdk/pull/23573 From duke at openjdk.org Tue Jun 17 20:21:00 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Tue, 17 Jun 2025 20:21:00 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v26] In-Reply-To: References: Message-ID: > This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). > > When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. > > This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. 
Chad Rakoczy has updated the pull request incrementally with two additional commits since the last revision: - Only check branch distance for aarch64 and riscv - Move far branch fix to fix_relocation_after_move ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23573/files - new: https://git.openjdk.org/jdk/pull/23573/files/40917404..03bfce87 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=25 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=24-25 Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23573.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23573/head:pull/23573 PR: https://git.openjdk.org/jdk/pull/23573 From duke at openjdk.org Tue Jun 17 20:29:39 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Tue, 17 Jun 2025 20:29:39 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v24] In-Reply-To: References: <8eMagllT-Sxnvp6tnIkYNyUe7PetzaHXqhiqHnAiApU=.3b4c422a-e9ad-492f-a82d-98ee16f053dd@github.com> Message-ID: On Tue, 17 Jun 2025 04:08:18 GMT, Dean Long wrote: >> Isn't this logic only required because of Graal (JDK-8358096)? For HotSpot, there should always be a trampoline if one is needed [1], even taking into account possible nmethod relocation, because MacroAssembler::trampoline_call() uses is_always_within_branch_range(), which makes its decision based only on the codecache size, boundaries, and the target, not the address of the call site. >> Further, my understanding is that the Graal vs HotSpot issue is because Graal uses the A64 ISA "hard" limit reachability limit, while HotSpot uses a "soft" limit thanks to this constant with a lower value for debug: >> >> static const uint64_t branch_range = NOT_DEBUG(128 * M) DEBUG_ONLY(2 * M); >> >> [1] By "needed" here, I mean the hard A64 ISA limit. 
It might be useful to have a version of reachable_from_branch_at() that used the hard limit when working with JVMCI-generated code, because ca >> >>> So it is incorrectly updating the offset when it needs to stay the same. >> >> I don't understand what exactly is going on with that, but after working on JDK-8321509, I wouldn't be surprised if there were assumptions that break when trying to relocate, and you indeed might need to move the logic up a level, or even take into account what type of relocation it is, or the fact that the source is a finalized nmethod and not a BufferBlob. > > I took another look, and I think the problem with CallRelocation::fix_relocation_after_move() is the ambiguity of "destination" for a trampoline call. There is the near instruction destination inside the same nmethod, and there is the effective final far destination contained in the trampoline stub. For nmethod to nmethod relocation, I think you want a customized version of this function instead of trying to use it as-is. The customized version could continue to use the near destination, but fixed up with new_addr_for, or maybe simpler would be to use the final far destination, which is always going to be outside the nmethod if you clear inline caches first, which I think you would need to do, otherwise recursive self calls to the Java method would not get relocated correctly. > Isn't this logic only required because of Graal (JDK-8358096)? I think it is needed for HotSpot as well. Just because a trampoline was generated for a call does not mean it is being used. We still need this check in the event that there is a direct call that no longer reaches. > For nmethod to nmethod relocation, I think you want a customized version of this function instead of trying to use it as-is. I moved this logic to [CallRelocation::fix_relocation_after_move()](https://github.com/chadrako/jdk/blob/03bfce8779a0f9c9a2c276fc6cf084698e6725d7/src/hotspot/share/code/relocInfo.cpp#L373-L402).
It first checks if the destination is within the source and also check if the call is too far. > otherwise recursive self calls to the Java method would not get relocated correctly What do you mean by this? I don't see how recursive calls would behave differently. There is a check for intra-nmethod calls which I think would cover this case ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2153109852 From duke at openjdk.org Tue Jun 17 20:40:57 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Tue, 17 Jun 2025 20:40:57 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v27] In-Reply-To: References: Message-ID: > This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). > > When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. > > This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. Chad Rakoczy has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 83 commits: - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final - Only check branch distance for aarch64 and riscv - Move far branch fix to fix_relocation_after_move - Move far branch fix to fix_relocation_after_move - Add test to verify JVMTI events during nmethod relocation - Log relocated nmethod - Publish JVMTI events - Fix test copywrite - Update immutable_data_references naming - Update ImmutableDataReferences - ... 
and 73 more: https://git.openjdk.org/jdk/compare/afa52e46...6173fdbb ------------- Changes: https://git.openjdk.org/jdk/pull/23573/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=26 Stats: 1603 lines in 23 files changed: 1567 ins; 1 del; 35 mod Patch: https://git.openjdk.org/jdk/pull/23573.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23573/head:pull/23573 PR: https://git.openjdk.org/jdk/pull/23573 From dlong at openjdk.org Tue Jun 17 21:07:38 2025 From: dlong at openjdk.org (Dean Long) Date: Tue, 17 Jun 2025 21:07:38 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v24] In-Reply-To: References: <8eMagllT-Sxnvp6tnIkYNyUe7PetzaHXqhiqHnAiApU=.3b4c422a-e9ad-492f-a82d-98ee16f053dd@github.com> Message-ID: On Tue, 17 Jun 2025 20:26:29 GMT, Chad Rakoczy wrote: > What do you mean by this? I don't see how recursive calls would behave differently. There is a check for intra-nmethod calls which I think would cover this case OK, I didn't see that check when I took a quick look. If it already works as intended, great. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2153168807 From duke at openjdk.org Tue Jun 17 21:14:56 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Tue, 17 Jun 2025 21:14:56 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v28] In-Reply-To: References: Message-ID: > This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). > > When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. 
> > This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: Use new _metadata_size instead of _jvmci_data_size ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23573/files - new: https://git.openjdk.org/jdk/pull/23573/files/6173fdbb..d5e566c1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=27 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=26-27 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23573.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23573/head:pull/23573 PR: https://git.openjdk.org/jdk/pull/23573 From dlong at openjdk.org Tue Jun 17 21:22:36 2025 From: dlong at openjdk.org (Dean Long) Date: Tue, 17 Jun 2025 21:22:36 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v24] In-Reply-To: References: <8eMagllT-Sxnvp6tnIkYNyUe7PetzaHXqhiqHnAiApU=.3b4c422a-e9ad-492f-a82d-98ee16f053dd@github.com> Message-ID: On Tue, 17 Jun 2025 21:05:12 GMT, Dean Long wrote: >>> Isn't this logic only required because of Graal (JDK-8358096)? >> >> I think it is needed for HotSpot as well. Just because a trampoline was generated for a call does mean it is being used. We still need this check in the event that there is a direct call that no longer reaches. >> >>> For nmethod to nmethod relocation, I think you want a customized version of this function instead of trying to use it as-is. >> >> I moved this logic to [CallRelocation::fix_relocation_after_move()](https://github.com/chadrako/jdk/blob/03bfce8779a0f9c9a2c276fc6cf084698e6725d7/src/hotspot/share/code/relocInfo.cpp#L373-L402). It first checks if the destination is within the source and also check if the call is too far. 
>> >>> otherwise recursive self calls to the Java method would not get relocated correctly >> >> What do you mean by this? I don't see how recursive calls would behave differently. There is a check for intra-nmethod calls which I think would cover this case > >> What do you mean by this? I don't see how recursive calls would behave differently. There is a check for intra-nmethod calls which I think would cover this case > > OK, I didn't see that check when I took a quick look. If it already works as intended, great. > We still need this check in the event that there is a direct call that no longer reaches. OK, I didn't realize that was what Relocation::pd_set_call_destination() was doing. I think it would be better for the CPU-specific code to take care of that, rather than the shared code. We already have functions like NativeCall::set_destination_mt_safe() that do the right thing regarding trampolines. I think this could be refactored into a common function that Relocation::pd_set_call_destination() could also use. Sorry for the churn, but hopefully we are converging on a solution. I thought I had done the refactoring for 8321509, but it looks like I went with the simple fix at the time of adding a parameter to set_destination_mt_safe() to make it lock-free. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2153188035 From fyang at openjdk.org Wed Jun 18 00:59:44 2025 From: fyang at openjdk.org (Fei Yang) Date: Wed, 18 Jun 2025 00:59:44 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v28] In-Reply-To: References: Message-ID: <9nobEwalgtW3at_rR9OV1Y-rZdHxL0DJ0-UWCt0wA10=.eb7bb18c-7d1f-458c-b377-f9d348a38155@github.com> On Tue, 17 Jun 2025 21:14:56 GMT, Chad Rakoczy wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694).
It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). >> >> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. >> >> This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. > > Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: > > Use new _metadata_size instead of _jvmci_data_size src/hotspot/share/code/relocInfo.cpp line 415: > 413: // We must check that the new offset can still fit in the instruction > 414: // for architectures that have small branch ranges > 415: #if defined(AARCH64) || defined(RISV) Should be `RISC-V` instead of `RISV`. src/hotspot/share/code/relocInfo.cpp line 419: > 417: if (NativeCall::is_call_at(addr())) { > 418: NativeCall* call = nativeCall_at(addr()); > 419: address trampoline = call->get_trampoline(); We don't have this `call->get_trampoline()` method for RISC-V now. Trampoline call for RISC-V was deprecated by https://bugs.openjdk.org/browse/JDK-8332689 and later removed by https://bugs.openjdk.org/browse/JDK-8343430. So I guess RISC-V is not affected here in this case? 
CC: @robehn ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2153410156 PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2153416320 From duke at openjdk.org Wed Jun 18 01:48:39 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Wed, 18 Jun 2025 01:48:39 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v29] In-Reply-To: References: Message-ID: > This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). > > When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. > > This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. 
Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: Print address as pointer ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23573/files - new: https://git.openjdk.org/jdk/pull/23573/files/d5e566c1..50f0edb3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=28 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=27-28 Stats: 7 lines in 3 files changed: 1 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/23573.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23573/head:pull/23573 PR: https://git.openjdk.org/jdk/pull/23573 From syan at openjdk.org Wed Jun 18 02:06:32 2025 From: syan at openjdk.org (SendaoYan) Date: Wed, 18 Jun 2025 02:06:32 GMT Subject: RFR: 8359801: RISC-V: Simplify Interpreter::profile_taken_branch In-Reply-To: References: Message-ID: <3ilxiWL1ARe_osTID0jmnn05ftUZSUk2q4EkDvtDDro=.7b05d1ac-7321-482f-a163-81a490d67d00@github.com> On Tue, 17 Jun 2025 08:41:41 GMT, Anjian Wen wrote: > Do the same thing as [JDK-8357434](https://bugs.openjdk.org/browse/JDK-8357434) [JDK-8358655](https://bugs.openjdk.org/browse/JDK-8358655) but for riscv > > The Interpreter::profile_taken_branch has the same sbbptr pattern as [JDK-8358105](https://bugs.openjdk.org/browse/JDK-8358105). The counter is 64-bit, never practically overflows, and no other code cares about it, so we can remove the overflow check src/hotspot/cpu/riscv/interp_masm_riscv.cpp line 1077: > 1075: test_method_data_pointer(mdp, profile_continue); > 1076: > 1077: // We are not taking a branch. Increment the not taken count.
Seems an extra whitespace before `Increment` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25848#discussion_r2153476650 From jkarthikeyan at openjdk.org Wed Jun 18 04:02:36 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Wed, 18 Jun 2025 04:02:36 GMT Subject: RFR: 8350177: C2 SuperWord: Integer.numberOfLeadingZeros, numberOfTrailingZeros, reverse and bitCount have input types wrongly turncated for byte and short [v4] In-Reply-To: References: Message-ID: <1PL_T4eEjPTpI9lHg3nHmNzgxsXJpesG4Rxva4iZi60=.d73a5da9-704b-4bc7-8677-47e654a5c7c4@github.com> > Hi all, > This patch fixes cases in SuperWord when compiling subword types where vectorized code would be given a narrower type than expected, leading to miscompilation due to truncation. This fix is a generalization of the same fix applied for `Integer.reverseBytes` in [JDK-8305324](https://bugs.openjdk.org/browse/JDK-8305324). The patch introduces a check for nodes that are known to tolerate truncation, so that any future cases of subword truncation will avoid creating miscompiled code. > > The patch reuses the existing logic to set the type of the vectors to int, which currently disables vectorization for the affected patterns entirely. Once [JDK-8342095](https://bugs.openjdk.org/browse/JDK-8342095) is merged and automatic casting support is added the autovectorizer should automatically insert casts to and from int, maintaining correctness. > > I've added an IR test that checks for correctly compiled outputs. Thoughts and reviews would be appreciated! Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: - Merge, address code review - Merge branch 'master' into vector-truncation - Add assert for unexpected node in truncation - Reformat, add comments and char tests - Fix vector truncation with subword types ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25440/files - new: https://git.openjdk.org/jdk/pull/25440/files/e2ab39c4..ce16e2de Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25440&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25440&range=02-03 Stats: 207689 lines in 3722 files changed: 125627 ins; 56108 del; 25954 mod Patch: https://git.openjdk.org/jdk/pull/25440.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25440/head:pull/25440 PR: https://git.openjdk.org/jdk/pull/25440 From jkarthikeyan at openjdk.org Wed Jun 18 04:02:37 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Wed, 18 Jun 2025 04:02:37 GMT Subject: RFR: 8350177: C2 SuperWord: Integer.numberOfLeadingZeros, numberOfTrailingZeros, reverse and bitCount have input types wrongly turncated for byte and short [v3] In-Reply-To: References: <3hS55vekJ3n3KqQeHEW0P__Gvp2Z76az1MtgBxL2uHU=.6a51c235-b5fc-4e85-b73b-cf8db4539cd8@github.com> Message-ID: On Mon, 16 Jun 2025 06:06:23 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/superword.cpp line 2538: >> >>> 2536: break; >>> 2537: default: >>> 2538: assert(false, "Unexpected node: %s", NodeClassNames[in->Opcode()]); >> >> Suggestion: >> >> // If this assert it hit, that means that we need to determine if the node can be safely truncated, >> // and then add it to the list of truncatable nodes or the list of non-truncatable ones just above. >> // In product, we just return false, which is always correct. 
>> assert(false, "Unexpected node: %s", NodeClassNames[in->Opcode()]); > > I'm fairly sure that we will hit this assert with a fuzzer or some other RFE soon, and then it will be nice to know quickly what kind of failure we have here. This is a good point, I've made this change. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25440#discussion_r2153586302 From jkarthikeyan at openjdk.org Wed Jun 18 04:06:28 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Wed, 18 Jun 2025 04:06:28 GMT Subject: RFR: 8350177: C2 SuperWord: Integer.numberOfLeadingZeros, numberOfTrailingZeros, reverse and bitCount have input types wrongly turncated for byte and short In-Reply-To: References: Message-ID: On Wed, 28 May 2025 07:46:12 GMT, Emanuel Peter wrote: >> Hi all, >> This patch fixes cases in SuperWord when compiling subword types where vectorized code would be given a narrower type than expected, leading to miscompilation due to truncation. This fix is a generalization of the same fix applied for `Integer.reverseBytes` in [JDK-8305324](https://bugs.openjdk.org/browse/JDK-8305324). The patch introduces a check for nodes that are known to tolerate truncation, so that any future cases of subword truncation will avoid creating miscompiled code. >> >> The patch reuses the existing logic to set the type of the vectors to int, which currently disables vectorization for the affected patterns entirely. Once [JDK-8342095](https://bugs.openjdk.org/browse/JDK-8342095) is merged and automatic casting support is added the autovectorizer should automatically insert casts to and from int, maintaining correctness. >> >> I've added an IR test that checks for correctly compiled outputs. Thoughts and reviews would be appreciated! > > And just for good measure: should we also add tests for `char`? @eme64 I've updated the patch to address the comments, let me know what you think! @mur47x111 Thanks for the comment! I've merged from master. 
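To make the truncation problem in this thread concrete, here is a minimal sketch in C++ (not code from the patch). The helper names `leading_zeros`, `nlz_as_int` and `nlz_as_byte_lane` are hypothetical and exist only for illustration. The sketch shows that counting leading zeros on a byte widened to 32 bits gives a different answer than counting them inside an 8-bit lane, which is why such an operation cannot simply be run on a narrower vector element type.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: count leading zero bits in the low `bits` bits of `value`.
constexpr int leading_zeros(uint32_t value, int bits) {
  int count = 0;
  for (int i = bits - 1; i >= 0; --i) {
    if ((value >> i) & 1u) {
      break;  // found the highest set bit
    }
    ++count;
  }
  return count;
}

// Scalar Java semantics: the byte is widened to int before the int operation runs.
constexpr int nlz_as_int(int8_t b) {
  return leading_zeros(static_cast<uint32_t>(static_cast<int32_t>(b)), 32);
}

// What a lane-wise operation on 8-bit elements would compute instead.
constexpr int nlz_as_byte_lane(int8_t b) {
  return leading_zeros(static_cast<uint8_t>(b), 8);
}

// The two results disagree for the same input, so the narrow type is not safe here.
static_assert(nlz_as_int(1) == 31, "32-bit view of the byte value 1");
static_assert(nlz_as_byte_lane(1) == 7, "8-bit view of the byte value 1");
```

Operations such as addition do not have this problem, because the low 8 bits of the 32-bit result equal the 8-bit result; that difference is presumably what the "known to tolerate truncation" check distinguishes.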
------------- PR Comment: https://git.openjdk.org/jdk/pull/25440#issuecomment-2982600423 From jkarthikeyan at openjdk.org Wed Jun 18 04:06:30 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Wed, 18 Jun 2025 04:06:30 GMT Subject: RFR: 8350177: C2 SuperWord: Integer.numberOfLeadingZeros, numberOfTrailingZeros, reverse and bitCount have input types wrongly turncated for byte and short [v3] In-Reply-To: References: <3hS55vekJ3n3KqQeHEW0P__Gvp2Z76az1MtgBxL2uHU=.6a51c235-b5fc-4e85-b73b-cf8db4539cd8@github.com> Message-ID: On Mon, 16 Jun 2025 06:01:17 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/superword.cpp line 2521: >> >>> 2519: if (VectorNode::is_shift_opcode(opc)) { >>> 2520: return false; >>> 2521: } >> >> Is right shift also not truncatable? Can you add a comment why? > > Here you did a return in assert. Shifts are the special case I mentioned earlier, where we consider them non-truncating to get a more precise type from input loads. I've added a comment mentioning it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25440#discussion_r2153589837 From amitkumar at openjdk.org Wed Jun 18 04:36:29 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 18 Jun 2025 04:36:29 GMT Subject: RFR: 8358694: VM asserts if CodeCacheSegmentSize is not a power of 2 [v2] In-Reply-To: References: <_SfCjE8oyyD-grL6AAih23j1Qx7fbYefTVzl-BU5N2k=.7bd5d419-4127-4e78-926e-fece25a7d914@github.com> Message-ID: On Wed, 11 Jun 2025 15:02:12 GMT, Damon Fenacci wrote: >> Amit Kumar has updated the pull request incrementally with two additional commits since the last revision: >> >> - fix >> - move the changes in flag constraints specific file > > I might be a bit picky (sorry for that) but since the flag was triggering a crash I was wondering if we could have a small regression test to make sure the VM never crashes (possibly checking the error as well). 
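For readers following the `CodeCacheSegmentSize` discussion, the property being validated can be sketched as below. This is only an illustration under the assumption that the constraint boils down to a plain power-of-two test; `is_power_of_two` is a hypothetical name used here, not the function from the PR.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the flag constraint: true only for 1, 2, 4, 8, ...
// A power of two has exactly one bit set, so clearing the lowest set bit yields zero.
constexpr bool is_power_of_two(uint64_t value) {
  return value != 0 && (value & (value - 1)) == 0;
}

static_assert(is_power_of_two(64), "64 is a power of two");
static_assert(is_power_of_two(128), "128 is a power of two");
static_assert(!is_power_of_two(96), "96 has two bits set");
static_assert(!is_power_of_two(0), "zero is rejected explicitly");
```

A regression test along the lines requested would then start the VM with a value such as 96 and expect a clean error exit rather than an assert.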
@dafedafe do you have further requests :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/25708#issuecomment-2982641404 From enikitin at openjdk.org Wed Jun 18 06:04:16 2025 From: enikitin at openjdk.org (Evgeny Nikitin) Date: Wed, 18 Jun 2025 06:04:16 GMT Subject: RFR: 8357739: [jittester] disable the hashCode method Message-ID: JITTester often uses the `hashCode` method (in fact, in almost every generated test). Given that the method can be unstable between runs or in interpreted vs compiled runs, it can create false positives. This PR fixes the issue by adding support for method templates (similar to the ones used in CompilerCommands). For example, all of the following exclude templates match (and exclude) `String.indexOf(String)`: java/lang/::*(Ljava/lang/String;I) *String::indexOf(*) java/lang/*::indexOf Additionally, the PR adds support for comments (starting with '#') and empty lines in the excludes file. ------------- Commit messages: - Add unit tests - 8357739: [jittester] disable the hashCode method Changes: https://git.openjdk.org/jdk/pull/25859/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25859&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8357739 Stats: 556 lines in 4 files changed: 402 ins; 121 del; 33 mod Patch: https://git.openjdk.org/jdk/pull/25859.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25859/head:pull/25859 PR: https://git.openjdk.org/jdk/pull/25859 From epeter at openjdk.org Wed Jun 18 06:13:35 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 18 Jun 2025 06:13:35 GMT Subject: RFR: 8341293: Split field loads through Nested Phis [v9] In-Reply-To: References: Message-ID: <15yeg5mhhgl_0k-ZvjRcrqUGtNaILSsgv2zmJ8L3MI4=.0fe56528-1cf6-4d3f-99a2-925399790ced@github.com> On Tue, 17 Jun 2025 19:19:20 GMT, Cesar Soares Lucas wrote: >> Dhamoder Nalla has updated the pull request incrementally with one additional commit since the last revision: >> >> address CR comments > > 
test/hotspot/jtreg/compiler/c2/irTests/scalarReplacement/AllocationMergesNestedPhiTests.java line 563: > >> 561: @IR(counts = { IRNode.ALLOC, ">=3" }, >> 562: phase = CompilePhase.PHASEIDEAL_BEFORE_EA, >> 563: applyIfPlatform = {"64-bit", "true"}, > > Looks like all tests should be run on 64bit platform? If so perhaps you can just add the requirement at the top of this file, in the annotations section. @JohnTortugo I'd generally advise against restricting files to limited platforms. The tests can still run on other platforms, and possibly find bugs there. Limiting IR rules allows us to make strong assertions on limited platforms (e.g. asserting that some optimizations are done), while at least testing weaker properties (correctness) on all platforms. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21270#discussion_r2153726516 From dfenacci at openjdk.org Wed Jun 18 06:24:31 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 18 Jun 2025 06:24:31 GMT Subject: RFR: 8358694: VM asserts if CodeCacheSegmentSize is not a power of 2 [v7] In-Reply-To: References: <_SfCjE8oyyD-grL6AAih23j1Qx7fbYefTVzl-BU5N2k=.7bd5d419-4127-4e78-926e-fece25a7d914@github.com> Message-ID: On Mon, 16 Jun 2025 11:45:09 GMT, Amit Kumar wrote: >> Makes sure that JVM exits gracefully when `CodeCacheSegmentSize` is not a power of 2. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > remove string check & update copyright header Looks good to me. Thanks again @offamitkumar! ------------- Marked as reviewed by dfenacci (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/25708#pullrequestreview-2937877542 From wenanjian at openjdk.org Wed Jun 18 06:45:29 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Wed, 18 Jun 2025 06:45:29 GMT Subject: RFR: 8359801: RISC-V: Simplify Interpreter::profile_taken_branch In-Reply-To: <3ilxiWL1ARe_osTID0jmnn05ftUZSUk2q4EkDvtDDro=.7b05d1ac-7321-482f-a163-81a490d67d00@github.com> References: <3ilxiWL1ARe_osTID0jmnn05ftUZSUk2q4EkDvtDDro=.7b05d1ac-7321-482f-a163-81a490d67d00@github.com> Message-ID: On Wed, 18 Jun 2025 02:03:43 GMT, SendaoYan wrote: >> Do the same thing as [JDK-8357434](https://bugs.openjdk.org/browse/JDK-8357434) [JDK-8358655](https://bugs.openjdk.org/browse/JDK-8358655) but for riscv >> >> The Interpreter::profile_taken_branch has the same sbbptr pattern as [JDK-8358105](https://bugs.openjdk.org/browse/JDK-8358105). The counter is 64-bit, never practically overflows, and no other code cares about it, so we can remove the overflow check > > src/hotspot/cpu/riscv/interp_masm_riscv.cpp line 1077: > >> 1075: test_method_data_pointer(mdp, profile_continue); >> 1076: >> 1077: // We are not taking a branch. Increment the not taken count. > > Seems an extra whitespace before `Increment` Thanks for the review. The same extra whitespace appears in the x86 and aarch64 versions, and there are about 29 occurrences of extra whitespace before 'Increment' under 'src/hotspot/cpu/', so I'm not sure whether it is a formatting convention.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25848#discussion_r2153771406 From thartmann at openjdk.org Wed Jun 18 06:58:31 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 18 Jun 2025 06:58:31 GMT Subject: RFR: 8359270: C2: alignment check should consider base offset when emitting arraycopy runtime call [v2] In-Reply-To: References: Message-ID: <6F52A2UrVFrub3bVZQ4Zrlmw4wCuIdW9YwphmtYe4qg=.51b3c501-ba87-4e3e-a0de-210e0aa41ea4@github.com> On Mon, 16 Jun 2025 03:36:33 GMT, Fei Yang wrote: >> Hi, please consider this change fixing alignment check when emitting arraycopy runtime call. >> >> There are currently four callsites of `StubRoutines::select_arraycopy_function` in hotspot C2 shared code where we emit arraycopy runtime calls [1-4]. Three of them [2-4] missed base offset when calculation alignment for both source and destination array addresses. Seems they assume a base offset of 8 bytes, which is not always true. Base offset becomes 4 bytes under either `-XX:+UseCompactObjectHeaders` or `-XX:-UseCompressedClassPointers`. >> >> And `StubRoutines::select_arraycopy_function` selects the right arraycopy runtime call based on this alignment. As a result, it could see an incorrect `aligned` param about the array addresses and thus a wrong arraycopy runtime call is selected. This is causing performance regressions (like Dacapo Spring) on some linux-riscv64 platforms like Sifive Unmatched or Premier P550 SBCs where misaligned memory access is very slow. >> >> Proposed change fixes this issue by taking base offset into account when checking the alignment, which is very similar to [1]. 
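The alignment point made above can be illustrated with a small sketch. It is not the C2 code from the PR: `copy_start_is_aligned` is a hypothetical helper, the 8-byte alignment is an assumption, and the base offsets 16 and 12 are example values chosen only because they are 0 and 4 modulo 8 respectively.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: is the first copied element aligned, once the array
// base offset (header size) is included in the address computation?
constexpr bool copy_start_is_aligned(int64_t base_offset, int64_t start_index,
                                     int64_t element_size, int64_t alignment = 8) {
  return (base_offset + start_index * element_size) % alignment == 0;
}

// With an example base offset of 16, element 0 of a byte array is aligned.
static_assert(copy_start_is_aligned(16, 0, 1), "offset 16 is 8-byte aligned");
// With an example base offset of 12, the same index is no longer aligned ...
static_assert(!copy_start_is_aligned(12, 0, 1), "offset 12 is not 8-byte aligned");
// ... while index 4 becomes aligned, so looking at the index alone gives the wrong answer.
static_assert(copy_start_is_aligned(12, 4, 1), "12 + 4 is 8-byte aligned");
static_assert(!copy_start_is_aligned(16, 4, 1), "16 + 4 is not 8-byte aligned");
```

A check that ignores `base_offset` implicitly assumes it is a multiple of the alignment, which matches the description of the three call sites in the quoted text.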
>> >> Testing: >> - [x] Tier1-3 on linux-x64 (release & fastdebug) >> - [x] Tier1-3 on linux-aarch64 (release & fastdebug) >> - [x] Tier1-3 on linux-riscv64 (release) >> - [x] Dacapo spring performance test on linux-riscv64 (w/wo `-XX:+UseCompactObjectHeaders` / `-XX:-UseCompressedClassPointers`) >> >> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/macroArrayCopy.cpp#L341 >> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/library_call.cpp#L1584 >> [3] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/library_call.cpp#L1666 >> [4] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/stringopts.cpp#L1481 > > Fei Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - add test > - Merge remote-tracking branch 'upstream/master' into JDK-8359270 > - 8359270: C2: alignment check should consider base offset when emitting arraycopy runtime call Thanks for adding a test! I'll run some testing on our side and report back once it passed. ------------- PR Review: https://git.openjdk.org/jdk/pull/25765#pullrequestreview-2937956176 From epeter at openjdk.org Wed Jun 18 07:19:34 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 18 Jun 2025 07:19:34 GMT Subject: RFR: 8356708: C2: loop strip mining expansion doesn't take sunk stores into account [v6] In-Reply-To: References: Message-ID: <3lJNOMkNydPpod5qe4WDihDBokdoNSgQR376FgYkeWc=.8d363894-80bd-4c55-9b76-257ea3cca35f@github.com> On Tue, 17 Jun 2025 12:53:48 GMT, Roland Westrelin wrote: >> `test1()` has a counted loop with a `Store` to `field`. That `Store` >> is sunk out of loop. When the `OuterStripMinedLoop` is expanded, only >> `Phi`s that exist at the inner loop are added to the outer >> loop. 
There's no `Phi` for the slice of the sunk `Store` (because >> there's no `Store` left in the inner loop) so no `Phi` is added for >> that slice to the outer loop. As a result, there's a missing anti >> dependency for `Load` of `field` that's before the loop and it can be >> scheduled inside the outer strip mined loop which is incorrect. >> >> `test2()` is the same as `test1()` but with a chain of 2 `Store`s. >> >> `test3()` is another variant where a `Store` is left in the inner loop >> after one is sunk out of it so the inner loop still has a `Phi`. As a >> result, the outer loop also gets a `Phi` but it's incorrectly wired as >> the sunk `Store` should be the input along the backedge but is >> not. That one doesn't cause any failure AFAICT. >> >> The fix I propose is some extra logic at expansion of the >> `OuterStripMinedLoop` to handle these corner cases. > > Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: > > - review > - more Thanks for the updates and improved documentation! I have a few more minor suggestions :) src/hotspot/share/opto/loopnode.cpp line 333: > 331: // > 332: // As loop optimizations transform the inner loop, the outer strip mined loop stays mostly unchanged. The only exception > 333: // is nodes referenced from the SafePoint and sunk from the inner loop: they end up in the outer strip mined loop. Do you want to reference `handle_sunk_stores_when_finishing_construction`? src/hotspot/share/opto/loopnode.cpp line 338: > 336: // to be mostly unaffected by the outer strip mined loop: the only extra step needed in most cases is to step over the > 337: // OuterStripMinedLoop. The main drawback is that once loop optimizations are over, an extra step is needed to finish > 338: // constructing the outer loop. This is handled by OuterStripMinedLoopNode::adjust_strip_mined_loop().
You should probably say explicitly which C2 IR rule we violate: whenever a memory slice is mutated in a loop, there needs to be a corresponding phi. src/hotspot/share/opto/loopnode.cpp line 3024: > 3022: // for each chain of sunk Stores for a particular memory slice. If some Stores were sunk and some left in the inner loop, > 3023: // a Phi was already created in the outer loop but its backedge input wasn't wired correctly to the last Store of the > 3024: // chain. Suggestion: // We're now in the process of finishing the construction of the outer loop. For each Phi in the inner loop, a Phi in // the outer loop was just now created. However, Sunk Stores cause an extra challenge: // 1) If all Stores in the inner loop were sunk for a particular memory slice, there's no Phi left for that memory // slice in the inner loop any more, and hence we did not yet add a Phi for the outer loop. So an extra Phi // must now be added for each chain of sunk Stores for a particular memory slice. // 2) If some Stores were sunk and some left in the inner loop, a Phi was already created in the outer loop but // its backedge input wasn't wired correctly to the last Store of the chain. We had wired the memory state at // the inner loop exit to the Phi backedge, but we should have taken the last Store of the chain instead. We // will now have to fix that too. src/hotspot/share/opto/loopnode.cpp line 3216: > 3214: } > 3215: > 3216: handle_sunk_stores_when_finishing_construction(igvn); Above, where you insert the phis, you may want to say something about the case of Sunk Stores as well. ------------- Changes requested by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25717#pullrequestreview-2937907190 PR Review Comment: https://git.openjdk.org/jdk/pull/25717#discussion_r2153762057 PR Review Comment: https://git.openjdk.org/jdk/pull/25717#discussion_r2153765691 PR Review Comment: https://git.openjdk.org/jdk/pull/25717#discussion_r2153826690 PR Review Comment: https://git.openjdk.org/jdk/pull/25717#discussion_r2153822355 From rcastanedalo at openjdk.org Wed Jun 18 07:19:34 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 18 Jun 2025 07:19:34 GMT Subject: RFR: 8356708: C2: loop strip mining expansion doesn't take sunk stores into account [v6] In-Reply-To: References: Message-ID: On Tue, 17 Jun 2025 12:53:48 GMT, Roland Westrelin wrote: >> `test1()` has a counted loop with a `Store` to `field`. That `Store` >> is sunk out of loop. When the `OuterStripMinedLoop` is expanded, only >> `Phi`s that exist at the inner loop are added to the outer >> loop. There's no `Phi` for the slice of the sunk `Store` (because >> there's no `Store` left in the inner loop) so no `Phi` is added for >> that slice to the outer loop. As a result, there's a missing anti >> dependency for `Load` of `field` that's before the loop and it can be >> scheduled inside the outer strip mined loop which is incorrect. >> >> `test2()` is the same as `test1()` but with a chain of 2 `Store`s. >> >> `test3()` is another variant where a `Store` is left in the inner loop >> after one is sunk out of it so the inner loop still has a `Phi`. As a >> result, the outer loop also gets a `Phi` but it's incorrectly wired as >> the sunk `Store` should be the input along the backedge but is >> not. That one doesn't cause any failure AFAICT. >> >> The fix I propose is some extra logic at expansion of the >> `OuterStripMinedLoop` to handle these corner cases. 
> > Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: > > - review > - more Re-running Oracle-internal testing... ------------- PR Comment: https://git.openjdk.org/jdk/pull/25717#issuecomment-2981215994 From epeter at openjdk.org Wed Jun 18 07:19:35 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 18 Jun 2025 07:19:35 GMT Subject: RFR: 8356708: C2: loop strip mining expansion doesn't take sunk stores into account [v6] In-Reply-To: <3lJNOMkNydPpod5qe4WDihDBokdoNSgQR376FgYkeWc=.8d363894-80bd-4c55-9b76-257ea3cca35f@github.com> References: <3lJNOMkNydPpod5qe4WDihDBokdoNSgQR376FgYkeWc=.8d363894-80bd-4c55-9b76-257ea3cca35f@github.com> Message-ID: On Wed, 18 Jun 2025 06:38:05 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: >> >> - review >> - more > > src/hotspot/share/opto/loopnode.cpp line 338: > >> 336: // to be mostly unaffected by the outer strip mined loop: the only extra step needed in most cases is to step over the >> 337: // OuterStripMinedLoop. The main drawback is that once loop optimizations are over, an extra step is needed to finish >> 338: // constructing the outer loop. This is handled by OuterStripMinedLoopNode::adjust_strip_mined_loop(). > > You should probably say explicitly which C2 IR rule we violate: whenever a memory slice is mutated in a loop, there needs to be a corresponding phi. Suggestion: // Not adding Phis to the outer loop head from the beginning, and only adding them after loop optimizations // does not conform to C2's IR rules: any variable or memory slice that is mutated in a loop should have a Phi. // The main motivation for such a design that doesn't conform to C2's IR rules is to allow existing loop optimizations // to be mostly unaffected by the outer strip mined loop: the only extra step needed in most cases is to step over the // OuterStripMinedLoop. 
The main drawback is that once loop optimizations are over, an extra step is needed to finish // constructing the outer loop. This is handled by OuterStripMinedLoopNode::adjust_strip_mined_loop(). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25717#discussion_r2153793259 From epeter at openjdk.org Wed Jun 18 07:19:36 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 18 Jun 2025 07:19:36 GMT Subject: RFR: 8356708: C2: loop strip mining expansion doesn't take sunk stores into account [v4] In-Reply-To: References: <4pxoG8BH9QOLHb9O7XqofwEcrOXqSSLIbbunGEE5UYg=.17621f96-6eaa-476e-8043-cba003880f2b@github.com> <46PnJrx9oCCkTFd9tICMrsEXm_wb3dMaGQKT-xSv8Ec=.1046ecc3-9e7e-4fc2-a0bd-da2363dfed22@github.com> Message-ID: On Tue, 17 Jun 2025 12:49:14 GMT, Roland Westrelin wrote: >> src/hotspot/share/opto/loopnode.cpp line 3007: >> >>> 3005: #endif >>> 3006: >>> 3007: // Sunk stores are reachable from the memory state of the outer loop safepoint >> >> Is it true that the control of the safepoint is the `cle_exit_proj`? Could we add an assert for that? So we are just looking for all memory between those two control nodes? Or is it more complicated? > > Yes, it is. I added a call to `verify_strip_mined()` that checks the shape of the outer loop including the control flow nodes. 
Great, thanks :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25717#discussion_r2153830113 From jbhateja at openjdk.org Wed Jun 18 07:26:30 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 18 Jun 2025 07:26:30 GMT Subject: RFR: 8352635: Improve inferencing of Float16 operations with constant inputs [v8] In-Reply-To: References: <44nVQBYgzCOB2mAB9xtAPvkUcOMJOITA2VjMdDFgm1g=.48266693-48bf-41db-8871-a7dcafe93509@github.com> <60kkIRL2XznEXyYukVXVOoeixm2iGhoOxAbKJi5X0cY=.0268090e-a0d3-45fb-93f4-94caaf9b8497@github.com> Message-ID: <2F9hnA72JKqq9hJchevuSQ8XHveZ51F6tJnb7IcNw30=.1da69bab-23be-473b-92c1-1786916364b9@github.com> On Wed, 11 Jun 2025 14:16:05 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> removing deoptimization for golden result computation > > Thanks for the improvements @jatin-bhateja nice progress :) Hi @eme64 , your comments have been addressed. Lets us know if its ok to land now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24179#issuecomment-2983001978 From bmaillard at openjdk.org Wed Jun 18 07:29:09 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Wed, 18 Jun 2025 07:29:09 GMT Subject: RFR: 8356865: C2: Unreasonable values for debug flag FastAllocateSizeLimit can lead to left-shift-overflow, which is UB [v2] In-Reply-To: References: Message-ID: > This PR adds a range constraint for the `-XX:FastAllocateSizeLimit` debug flag. This prevents undefined behavior caused by left-shift overflow of the flag value in `GraphKit::new_array`. > > ### Testing > - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8356865) > - [x] tier1-3, plus some internal testing > - [x] Manual testing with values known to previously cause undefined behavior > > Thanks! 
Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: 8356865: Add comment to clarify the flag range ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25834/files - new: https://git.openjdk.org/jdk/pull/25834/files/486ddcb6..c8904a29 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25834&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25834&range=00-01 Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25834.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25834/head:pull/25834 PR: https://git.openjdk.org/jdk/pull/25834 From bmaillard at openjdk.org Wed Jun 18 07:29:09 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Wed, 18 Jun 2025 07:29:09 GMT Subject: RFR: 8356865: C2: Unreasonable values for debug flag FastAllocateSizeLimit can lead to left-shift-overflow, which is UB [v2] In-Reply-To: References: Message-ID: On Tue, 17 Jun 2025 08:36:39 GMT, Manuel Hässig wrote: >> Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: >> >> 8356865: Add comment to clarify the flag range > > src/hotspot/share/runtime/globals.hpp line 1100: > >> 1098: /* Note: This value is zero mod 1<<13 for a cheap sparc set. */ \ >> 1099: "Inline allocations larger than this in doublewords must go slow")\ >> 1100: range(0, (1 << (BitsPerInt - LogBytesPerLong - 1)) - 1) \ > > It would be good to add a comment as to why this specific upper limit is necessary. Thanks, done!
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25834#discussion_r2153848744 From bmaillard at openjdk.org Wed Jun 18 07:38:33 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Wed, 18 Jun 2025 07:38:33 GMT Subject: RFR: 8356865: C2: Unreasonable values for debug flag FastAllocateSizeLimit can lead to left-shift-overflow, which is UB [v2] In-Reply-To: References: Message-ID: <8nXpApdLxXidwKfFpcVbKjpYgOn5EfhUvKNQRKvv2o0=.252bc291-3219-4d77-9a4d-8fe75952c2f6@github.com> On Tue, 17 Jun 2025 17:04:25 GMT, Dean Long wrote: >> src/hotspot/share/opto/graphKit.cpp line 3807: >> >>> 3805: int log2_esize = Klass::layout_helper_log2_element_size(layout_con); >>> 3806: fast_size_limit <<= (LogBytesPerLong - log2_esize); >>> 3807: assert (fast_size_limit > 0, "increasing the size limit should not produce negative values"); >> >> Prior to C++14, a left shift producing a negative value is undefined behavior: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2161.pdf >> >> Do we compile C++ source specifying the C++ standard? > > Yes, we use -std=c++14, but creating a negative value in this way still feels like a kind of overflow to me. Thanks for the comments! I added the assert because the issue in the JBS mentioned a specific case where we ended up with negative values. Should I leave it like this, or rather convert it to a more specific check (i.e., making sure that the `LogBytesPerLong - log2_esize` most significant bits are not used **before** shifting)?
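For readers following the thread, the "check before shifting" idea raised above can be sketched roughly as follows. This is only an illustration, not the code in the PR: `shift_fits_in_int` and `kMaxFastAllocateSizeLimit` are hypothetical names, and `BitsPerInt` and `LogBytesPerLong` are local stand-ins whose values (32 and 3) are assumed here rather than taken from HotSpot headers.

```cpp
#include <cassert>
#include <climits>

// Local stand-ins for the HotSpot constants of the same name.
// The values are assumptions for this sketch.
constexpr int BitsPerInt = 32;
constexpr int LogBytesPerLong = 3;

// Upper bound taken from the range(...) expression quoted from globals.hpp.
constexpr int kMaxFastAllocateSizeLimit =
    (1 << (BitsPerInt - LogBytesPerLong - 1)) - 1;

// Hypothetical helper: returns true if (value << shift) still fits in a
// non-negative int. The test is done on the unshifted value, so no
// overflowing shift is ever evaluated.
constexpr bool shift_fits_in_int(int value, int shift) {
  return value >= 0 && shift >= 0 && shift < BitsPerInt - 1 &&
         value <= (INT_MAX >> shift);
}

// The largest allowed flag value survives the largest shift used in
// GraphKit::new_array (LogBytesPerLong - 0), while one more would not.
static_assert(shift_fits_in_int(kMaxFastAllocateSizeLimit, LogBytesPerLong),
              "range upper bound must be shift-safe");
static_assert(!shift_fits_in_int(kMaxFastAllocateSizeLimit + 1, LogBytesPerLong),
              "one past the bound would overflow");
```

Under these assumptions the flag's range constraint and the pre-shift check express the same bound, so an assert built on such a helper would mostly serve as documentation once the range is enforced.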
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25834#discussion_r2153869915 From epeter at openjdk.org Wed Jun 18 07:40:30 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 18 Jun 2025 07:40:30 GMT Subject: RFR: 8352635: Improve inferencing of Float16 operations with constant inputs [v8] In-Reply-To: <2F9hnA72JKqq9hJchevuSQ8XHveZ51F6tJnb7IcNw30=.1da69bab-23be-473b-92c1-1786916364b9@github.com> References: <44nVQBYgzCOB2mAB9xtAPvkUcOMJOITA2VjMdDFgm1g=.48266693-48bf-41db-8871-a7dcafe93509@github.com> <60kkIRL2XznEXyYukVXVOoeixm2iGhoOxAbKJi5X0cY=.0268090e-a0d3-45fb-93f4-94caaf9b8497@github.com> <2F9hnA72JKqq9hJchevuSQ8XHveZ51F6tJnb7IcNw30=.1da69bab-23be-473b-92c1-1786916364b9@github.com> Message-ID: On Wed, 18 Jun 2025 07:23:34 GMT, Jatin Bhateja wrote: >> Thanks for the improvements @jatin-bhateja nice progress :) > > Hi @eme64 , your comments have been addressed. Lets us know if its ok to land now. @jatin-bhateja The patch now looks good to me, nice work! I'll run some internal testing. However: I do not have hardware to test Float16 on. So I'll rely on you to do thorough testing on relevant hardware, or alternatively SDE.
------------- PR Comment: https://git.openjdk.org/jdk/pull/24179#issuecomment-2983041746 From epeter at openjdk.org Wed Jun 18 07:45:38 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 18 Jun 2025 07:45:38 GMT Subject: RFR: 8350177: C2 SuperWord: Integer.numberOfLeadingZeros, numberOfTrailingZeros, reverse and bitCount have input types wrongly turncated for byte and short [v4] In-Reply-To: <1PL_T4eEjPTpI9lHg3nHmNzgxsXJpesG4Rxva4iZi60=.d73a5da9-704b-4bc7-8677-47e654a5c7c4@github.com> References: <1PL_T4eEjPTpI9lHg3nHmNzgxsXJpesG4Rxva4iZi60=.d73a5da9-704b-4bc7-8677-47e654a5c7c4@github.com> Message-ID: On Wed, 18 Jun 2025 04:02:36 GMT, Jasmine Karthikeyan wrote: >> Hi all, >> This patch fixes cases in SuperWord when compiling subword types where vectorized code would be given a narrower type than expected, leading to miscompilation due to truncation. This fix is a generalization of the same fix applied for `Integer.reverseBytes` in [JDK-8305324](https://bugs.openjdk.org/browse/JDK-8305324). The patch introduces a check for nodes that are known to tolerate truncation, so that any future cases of subword truncation will avoid creating miscompiled code. >> >> The patch reuses the existing logic to set the type of the vectors to int, which currently disables vectorization for the affected patterns entirely. Once [JDK-8342095](https://bugs.openjdk.org/browse/JDK-8342095) is merged and automatic casting support is added the autovectorizer should automatically insert casts to and from int, maintaining correctness. >> >> I've added an IR test that checks for correctly compiled outputs. Thoughts and reviews would be appreciated! > > Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: > > - Merge, address code review > - Merge branch 'master' into vector-truncation > - Add assert for unexpected node in truncation > - Reformat, add comments and char tests > - Fix vector truncation with subword types src/hotspot/share/opto/superword.cpp line 2549: > 2547: > 2548: // For shorts and chars, check an additional set of nodes. > 2549: if (type->isa_int() == TypeInt::SHORT || type->isa_int() == TypeInt::CHAR) { What is this change for? src/hotspot/share/opto/superword.cpp line 2594: > 2592: // and then add it to the list of truncating nodes or the list of non-truncating ones just above. > 2593: // In product, we just return false, which is always correct. > 2594: assert(false, "Unexpected node in SuperWord truncation: %s", NodeClassNames[in->Opcode()]); Suggestion: // If this assert is hit, that means that we need to determine if the node can be safely truncated, // and then add it to the list of truncating nodes or the list of non-truncating ones just above. // In product, we just return false, which is always correct. assert(false, "Unexpected node in SuperWord truncation: %s", NodeClassNames[in->Opcode()]); Suggestion: // If this assert it hit, that means that we need to determine if the node can be safely truncated, // and then add it to the list of truncating nodes or the list of non-truncating ones just above. // In product, we just return false, which is always correct. assert(false, "Unexpected node in SuperWord truncation: %s", NodeClassNames[in->Opcode()]); Probably my bad previous suggestion is the culprit here.
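As background on why some nodes do not tolerate truncation, here is a small scalar model in C++ of the `Integer.numberOfLeadingZeros` case named in the PR title. It is a sketch under stated assumptions, not SuperWord code: the function names are hypothetical, and the "truncated lane" variant is only a guess at what a byte-lane vector operation would compute if the input type were narrowed. The one fact relied on is that Java widens a `byte` to `int` before calling `Integer.numberOfLeadingZeros`.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative leading-zero count over the low `bits` bits of `x`.
inline int leading_zeros(uint32_t x, int bits) {
  int n = 0;
  for (int i = bits - 1; i >= 0 && ((x >> i) & 1u) == 0; --i) {
    ++n;
  }
  return n;
}

// Scalar Java semantics for
//   dst[i] = (byte) Integer.numberOfLeadingZeros(src[i]);
// The byte is sign-extended to int first, then the 32-bit count is narrowed.
inline int8_t scalar_result(int8_t b) {
  const uint32_t widened = static_cast<uint32_t>(static_cast<int32_t>(b));
  return static_cast<int8_t>(leading_zeros(widened, 32));
}

// Assumed behavior of a byte-lane vector operation if the input type were
// truncated to byte: the count only sees 8 bits.
inline int8_t truncated_lane_result(int8_t b) {
  return static_cast<int8_t>(leading_zeros(static_cast<uint8_t>(b), 8));
}
```

In this model an input of 1 gives 31 for the scalar path and 7 for the truncated lane, which is the kind of mismatch that keeping the vector type at int avoids.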
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25440#discussion_r2153885586 PR Review Comment: https://git.openjdk.org/jdk/pull/25440#discussion_r2153880820 From xgong at openjdk.org Wed Jun 18 07:46:21 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 18 Jun 2025 07:46:21 GMT Subject: RFR: 8357726: Improve C2 to recognize counted loops with multiple casts in trip counter [v4] In-Reply-To: <-SKyhptjFPhuOPflySOZXJloR_Vgr4sC-xB5dSQXxZU=.fd6922bc-2498-4f4e-873a-999f82cd0a1a@github.com> References: <-SKyhptjFPhuOPflySOZXJloR_Vgr4sC-xB5dSQXxZU=.fd6922bc-2498-4f4e-873a-999f82cd0a1a@github.com> Message-ID: > C2 compiler fails to recognize counted loops when the induction variable is constrained by multiple consecutive `CastII` nodes. > This prevents optimizations like range check elimination, loop unrolling and auto-vectorization for these loops. Please refer > to the detailed discussion for a related performance issue from [1]. > > The ideal graph of such a loop typically looks like: > > > /-----------| > | | > | ConI | > loop | / / > | | / / > \ AddI / > RangeCheck \ / | > | \ / | > IfTrue Phi | > \ | | > RangeCheck \ | | > \ CastII / <- Range check #1 > | | / > IfTrue | | > \ | | > CastII | <- Range check #2 > | / > |-------/ > > > > For a counted loop, the loop induction variable (i.e `Phi`) should be the input of `AddI` ideally. However, in above case, it is used > by two consecutive `CastII` nodes generated by two different range check operations. Compiler should skip all such kind of `CastII` when recognizing a counted loop. > > This patch modifies the counted loop recognition code to iteratively uncast the loop `iv` until no `CastII` nodes remain, enabling proper counted loop recognition even when the induction variable undergoes multiple range constraint operations. > > Test: > - Tested tier1, tier2, tier3, and no regressions are found. > - An additional test case is added to verify the fix. 
> > Performance: > Here is the performance gain on a NVIDIA Grace machine which is an AArch64 architecture: > > > Benchmark Mode Cnt Unit Before After Gain > CountedLoopCastIV.loop_iv_int thrpt 30 ops/s 941482.597 4389292.439 4.66 > CountedLoopCastIV.loop_iv_long thrpt 30 ops/s 884563.232 1441485.455 1.62 > > > We can also observe the similar uplift on a x86_64 machine. > > [1] https://github.com/openjdk/jdk/pull/25138#issuecomment-2892720654 Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Merge branch 'jdk:master' into JDK-8357726 Change-Id: I0c10a563a3873b2220ce4d4c9b999c52159f578f - Address reivew comments on IR test - Address review comments on jtreg and jmh tests - 8357726: C2 fails to recognize the counted loop when induction variable range is changed multiple times ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25539/files - new: https://git.openjdk.org/jdk/pull/25539/files/08538543..7e04f5b0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25539&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25539&range=02-03 Stats: 105150 lines in 1783 files changed: 65465 ins; 25468 del; 14217 mod Patch: https://git.openjdk.org/jdk/pull/25539.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25539/head:pull/25539 PR: https://git.openjdk.org/jdk/pull/25539 From epeter at openjdk.org Wed Jun 18 07:47:35 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 18 Jun 2025 07:47:35 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v3] In-Reply-To: References: Message-ID: <8KN8bTHLwatNL8hUzMmyenyoc1PKkJctQISBNVUEKv8=.41d1dc43-0838-4a8c-b6ff-9c9b3f0ce638@github.com> On Fri, 23 May 2025 22:43:35 GMT, Vladimir Ivanov wrote: >> This PR introduces C2 support for `Reference.reachabilityFence()`. 
>> >> After [JDK-8199462](https://bugs.openjdk.org/browse/JDK-8199462) went in, it was discovered that C2 may break the invariant the fix relied upon [1]. So, this is an attempt to introduce proper support for `Reference.reachabilityFence()` in C2. C1 is left intact for now, because there are no signs yet it is affected. >> >> `Reference.reachabilityFence()` can be used in performance critical code, so the primary goal for C2 is to reduce its runtime overhead as much as possible. The ultimate goal is to ensure liveness information is attached to interfering safepoints, but it takes multiple steps to properly propagate the information through compilation pipeline without negatively affecting generated code quality. >> >> Also, I don't consider this fix as complete. It does fix the reported problem, but it doesn't provide any strong guarantees yet. In particular, since `ReachabilityFence` is CFG-only node, nothing explicitly forbids memory operations to float past `Reference.reachabilityFence()` and potentially reaching some other safepoints current analysis treats as non-interfering. Representing `ReachabilityFence` as memory barrier (e.g., `MemBarCPUOrder`) would solve the issue, but performance costs are prohibitively high. Alternatively, the optimization proposed in this PR can be improved to conservatively extend referent's live range beyond `ReachabilityFence` nodes associated with it. It would meet performance criteria, but I prefer to implement it as a followup fix. >> >> Another known issue relates to reachability fences on constant oops. If such constant is GCed (most likely, due to a bug in Java code), similar reachability issues may arise. For now, RFs on constants are treated as no-ops, but there's a diagnostic flag `PreserveReachabilityFencesOnConstants` to keep the fences. I plan to address it separately. 
>> >> [1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ref/Reference.java#L667 >> "HotSpot JVM retains the ref and does not GC it before a call to this method, because the JIT-compilers do not have GC-only safepoints." >> >> Testing: >> - [x] hs-tier1 - hs-tier8 >> - [x] hs-tier1 - hs-tier6 w/ -XX:+StressReachabilityFences -XX:+VerifyLoopOptimizations >> - [x] java/lang/foreign microbenchmarks > > Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: > > renaming src/hotspot/share/opto/loopnode.hpp line 1485: > 1483: void remove_rf(Node* rf); > 1484: #ifdef ASSERT > 1485: bool has_redundant_rfs(Unique_Node_List& ignored_rfs, bool rf_only); I would prefer if all the method names spelled out `reachability_fences` instead of `rf / rfs`. src/hotspot/share/opto/macro.cpp line 983: > 981: _igvn._worklist.push(ac); > 982: } else if (use->is_ReachabilityFence() && OptimizeReachabilityFences) { > 983: _igvn.replace_input_of(use, 1, _igvn.makecon(TypePtr::NULL_PTR)); // reset; redundant fence Can you quickly explain in a code comment how this does a "reset"? What happens with it next? src/hotspot/share/opto/node.hpp line 701: > 699: DEFINE_CLASS_ID(MemBar, Multi, 3) > 700: DEFINE_CLASS_ID(Initialize, MemBar, 0) > 701: DEFINE_CLASS_ID(MemBarStoreStore, MemBar, 1) Suggestion: DEFINE_CLASS_ID(Initialize, MemBar, 0) DEFINE_CLASS_ID(MemBarStoreStore, MemBar, 1) I don't think you needed to touch the lines above, right? src/hotspot/share/opto/parse.hpp line 361: > 359: bool _wrote_fields; // Did we write any field? 
> 360: Node* _alloc_with_final_or_stable; // An allocation node with final or @Stable field > 361: Node* _stress_rf_hook; // StressReachabilityFences support You could write out the `rf` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2151855619 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2152601876 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2152636797 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2152638671 From epeter at openjdk.org Wed Jun 18 07:47:36 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 18 Jun 2025 07:47:36 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v3] In-Reply-To: <8KN8bTHLwatNL8hUzMmyenyoc1PKkJctQISBNVUEKv8=.41d1dc43-0838-4a8c-b6ff-9c9b3f0ce638@github.com> References: <8KN8bTHLwatNL8hUzMmyenyoc1PKkJctQISBNVUEKv8=.41d1dc43-0838-4a8c-b6ff-9c9b3f0ce638@github.com> Message-ID: <93FqbSxL2lah-_ZMHgPe3NLNQ7nlILU7Id9hzCXBWaI=.15ac84dc-5795-43ce-bc7d-69ad7f2b8f9a@github.com> On Tue, 17 Jun 2025 09:58:05 GMT, Emanuel Peter wrote: >> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: >> >> renaming > > src/hotspot/share/opto/loopnode.hpp line 1485: > >> 1483: void remove_rf(Node* rf); >> 1484: #ifdef ASSERT >> 1485: bool has_redundant_rfs(Unique_Node_List& ignored_rfs, bool rf_only); > > I would prefer if all the method names spelled out `reachability_fences` instead of `rf / rfs`. The arguments are less important for me. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2151856157 From bkilambi at openjdk.org Wed Jun 18 07:48:34 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Wed, 18 Jun 2025 07:48:34 GMT Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v3] In-Reply-To: <856mxqvMplFj15Pi59PuVNq3pEYA-GDygT8MHUZoiz4=.30952418-dacf-4820-b88f-7256db109de9@github.com> References: <856mxqvMplFj15Pi59PuVNq3pEYA-GDygT8MHUZoiz4=.30952418-dacf-4820-b88f-7256db109de9@github.com> Message-ID: On Tue, 17 Jun 2025 08:21:31 GMT, Xiaohong Gong wrote: >> Hi @XiaohongGong , thanks for your prompt review. I can't use the negate condition here as it will include `T_DOUBLE` and `T_LONG` as well which are not supported. I'll change it to the same basic type check instead. > > Yeah, I noticed this as well. Since these types for NEON has been excluded in above function `match_rule_supported_vector()`. Can we just ignore such types? Besides, you have also added the type assertion in the macro assembler implementation. Ok, I can do that (although I hope it won't be ambiguous/confusing for someone who is just going through the match rules to understand the implementation). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2153894406 From bkilambi at openjdk.org Wed Jun 18 07:52:30 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Wed, 18 Jun 2025 07:52:30 GMT Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v3] In-Reply-To: References: Message-ID: On Tue, 17 Jun 2025 03:07:21 GMT, Xiaohong Gong wrote: >> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: >> >> Addressed review comments and added a JTREG test > > src/hotspot/cpu/aarch64/aarch64_vector.ad line 251: > >> 249: // false if vector length > 16B but supported SVE version < 2. 
>> 250: // For vector length of 16B, generate SVE2 "tbl" instruction if SVE2 is supported, else >> 251: // generate Neon "tbl" instruction to select from two vectors. > > So for <16B vectors, it will generate the NEON version even if UseSVE == 2, right? Since the implementation is complex for NEON's non-byte types, can we consider using the SVE2 version for such cases? Or did you compare the performance between different implementations for 64-bit species vectors? Yes, for 64-bit vectors, it generates Neon tbl for single vector lookup even for `UseSVE == 2`. We currently do not have an aarch64 machine with max vector length of 64 bits. All aarch64 machines have at least 128 bits and Neon enabled at the very least. So if we want to run with 64-bit vectors, either we can set `MaxVectorSize = 64` in the command line for auto-vectorization or use the 64Vector species for VectorAPI. It will use a 64-bit vector (`d` reg) but the underlying vector is in fact (at least) 128 bits, right (e.g. Grace)? Then the SVE2 `tbl` does not have an 8B variant. It performs the lookup throughout the register (in this case the `z` register). So the inputs will be loaded into a 64-bit register and `tbl` (SVE2) will be performed on the (at least) 128-bit `z` register. That will lead to incorrect values? We may still have to move the lower 64-bit value from `src1` and `src2` into `tmp1`, making it a full 128-bit reg, and then generate the SVE `tbl` instruction for single vector lookup for `T_INT`, `T_SHORT` and `T_FLOAT` so that we can avoid the instructions that compute the offsets for each byte. This can be done for machines with SVE >= 1. On machines with SVE = 1 and MaxVectorSize > 16B, I think it should still work. What do you think?
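The concern about a whole-register lookup on a partial vector can be made concrete with a scalar model. The sketch below is an assumption-laden illustration in C++, not the AArch64 backend code: `select_from_two` models the intended SelectFromTwoVector result for in-range indices, and `full_register_tbl2` is a simplified model of a two-register table lookup that indexes across the full register width and yields 0 for out-of-range indices. Both names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed reference semantics for `lanes`-wide vectors: an index below
// `lanes` picks from src1, otherwise from src2. Indices are assumed to be
// in [0, 2 * lanes).
inline std::vector<int> select_from_two(const std::vector<int>& src1,
                                        const std::vector<int>& src2,
                                        const std::vector<int>& index) {
  const std::size_t lanes = src1.size();
  std::vector<int> dst(lanes);
  for (std::size_t i = 0; i < lanes; ++i) {
    const std::size_t idx = static_cast<std::size_t>(index[i]);
    dst[i] = idx < lanes ? src1[idx] : src2[idx - lanes];
  }
  return dst;
}

// Simplified model of a lookup spanning whole registers: the table is reg1
// followed by reg2, each reg1.size() lanes wide.
inline std::vector<int> full_register_tbl2(const std::vector<int>& reg1,
                                           const std::vector<int>& reg2,
                                           const std::vector<int>& index) {
  const std::size_t reg_lanes = reg1.size();
  std::vector<int> dst(index.size());
  for (std::size_t i = 0; i < index.size(); ++i) {
    const std::size_t idx = static_cast<std::size_t>(index[i]);
    if (idx < reg_lanes) {
      dst[i] = reg1[idx];
    } else if (idx < 2 * reg_lanes) {
      dst[i] = reg2[idx - reg_lanes];
    } else {
      dst[i] = 0;
    }
  }
  return dst;
}
```

With 2-lane vectors held in 4-lane registers (upper lanes marked -1), index 2 should select `src2[0]`, but the whole-register model reads the unused upper half of the first register instead. That is the mismatch described above, and it is why merging the live halves of `src1` and `src2` into one register before a single-table lookup looks attractive.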
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2153902686 From xgong at openjdk.org Wed Jun 18 07:52:31 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 18 Jun 2025 07:52:31 GMT Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v3] In-Reply-To: References: <856mxqvMplFj15Pi59PuVNq3pEYA-GDygT8MHUZoiz4=.30952418-dacf-4820-b88f-7256db109de9@github.com> Message-ID: On Wed, 18 Jun 2025 07:46:09 GMT, Bhavana Kilambi wrote: >> Yeah, I noticed this as well. Since these types for NEON has been excluded in above function `match_rule_supported_vector()`. Can we just ignore such types? Besides, you have also added the type assertion in the macro assembler implementation. > > Ok, I can do that (although I hope it won't be ambiguous/confusing for someone who is just going through the match rules to understand the implementation). Thanks! It's up to you. Current version just makes the predication look more complex and un-readable. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2153905481 From epeter at openjdk.org Wed Jun 18 08:05:35 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 18 Jun 2025 08:05:35 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v3] In-Reply-To: References: Message-ID: On Fri, 23 May 2025 22:43:35 GMT, Vladimir Ivanov wrote: >> This PR introduces C2 support for `Reference.reachabilityFence()`. >> >> After [JDK-8199462](https://bugs.openjdk.org/browse/JDK-8199462) went in, it was discovered that C2 may break the invariant the fix relied upon [1]. So, this is an attempt to introduce proper support for `Reference.reachabilityFence()` in C2. C1 is left intact for now, because there are no signs yet it is affected. >> >> `Reference.reachabilityFence()` can be used in performance critical code, so the primary goal for C2 is to reduce its runtime overhead as much as possible. 
The ultimate goal is to ensure liveness information is attached to interfering safepoints, but it takes multiple steps to properly propagate the information through compilation pipeline without negatively affecting generated code quality. >> >> Also, I don't consider this fix as complete. It does fix the reported problem, but it doesn't provide any strong guarantees yet. In particular, since `ReachabilityFence` is CFG-only node, nothing explicitly forbids memory operations to float past `Reference.reachabilityFence()` and potentially reaching some other safepoints current analysis treats as non-interfering. Representing `ReachabilityFence` as memory barrier (e.g., `MemBarCPUOrder`) would solve the issue, but performance costs are prohibitively high. Alternatively, the optimization proposed in this PR can be improved to conservatively extend referent's live range beyond `ReachabilityFence` nodes associated with it. It would meet performance criteria, but I prefer to implement it as a followup fix. >> >> Another known issue relates to reachability fences on constant oops. If such constant is GCed (most likely, due to a bug in Java code), similar reachability issues may arise. For now, RFs on constants are treated as no-ops, but there's a diagnostic flag `PreserveReachabilityFencesOnConstants` to keep the fences. I plan to address it separately. >> >> [1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ref/Reference.java#L667 >> "HotSpot JVM retains the ref and does not GC it before a call to this method, because the JIT-compilers do not have GC-only safepoints." 
>> >> Testing: >> - [x] hs-tier1 - hs-tier8 >> - [x] hs-tier1 - hs-tier6 w/ -XX:+StressReachabilityFences -XX:+VerifyLoopOptimizations >> - [x] java/lang/foreign microbenchmarks > > Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: > > renaming src/hotspot/share/opto/parse1.cpp line 379: > 377: _stress_rf_hook->add_req(loc); > 378: } > 379: } Can you add a short code comment describing what you are doing here, please? src/hotspot/share/opto/parse1.cpp line 394: > 392: _stress_rf_hook->add_req(stk); > 393: } > 394: } A short code comment would be helpful src/hotspot/share/opto/parse1.cpp line 2222: > 2220: > 2221: if (StressReachabilityFences) { > 2222: // Keep all oop arguments alive until method return. Why? Can you extend the comment a little? src/hotspot/share/opto/reachability.cpp line 44: > 42: * (0) initial set of RFs is materialized during parsing; > 43: * (1) optimization pass during loop opts which eliminates redundant nodes and > 44: * moves loop-invariant ones outside loops; Suggestion: * (1) optimization pass during loop opts which eliminates redundant nodes and * moves loop-invariant ones outside loops; I'd prever consistent indentation, but optional/question of taste src/hotspot/share/opto/reachability.cpp line 46: > 44: * moves loop-invariant ones outside loops; > 45: * (2) reachability information is transferred to safepoint nodes (appended as edges after debug info); > 46: * (3) reachability information from safepoints materialized as RF nodes attached to the safepoint node. Can you expand the explanation a little, please? I don't really understand. Why do you do this? What does it achieve? src/hotspot/share/opto/reachability.cpp line 51: > 49: * > 50: * It looks attractive to get rid of RF nodes early and transfer to safepoint-attached representation, > 51: * but it is not correct until loop opts are done. Why is it not correct? What could go wrong? 
Why is it safe to do it after loop opts? src/hotspot/share/opto/reachability.cpp line 55: > 53: * RF nodes may interfere with RA, so stand-alone RF nodes are eliminated and reachability information is > 54: * transferred to corresponding safepoints. When safepoints are pruned during macro expansion, corresponding > 55: * reachability info also goes away. Is it ok that this info goes away? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2153895839 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2153896761 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2153906423 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2153913766 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2153924032 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2153925167 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2153927605 From epeter at openjdk.org Wed Jun 18 08:05:35 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 18 Jun 2025 08:05:35 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v3] In-Reply-To: References: Message-ID: On Wed, 18 Jun 2025 07:58:24 GMT, Emanuel Peter wrote: >> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: >> >> renaming > > src/hotspot/share/opto/reachability.cpp line 46: > >> 44: * moves loop-invariant ones outside loops; >> 45: * (2) reachability information is transferred to safepoint nodes (appended as edges after debug info); >> 46: * (3) reachability information from safepoints materialized as RF nodes attached to the safepoint node. > > Can you expand the explanation a little, please? I don't really understand. Why do you do this? What does it achieve? It could be helpful if you wrote a paragraph (maybe at the top), about the interaction of SafePoint and ReachabilityFence. 
And you should also define "reachability information", I don't yet understand what that entails. > src/hotspot/share/opto/reachability.cpp line 55: > >> 53: * RF nodes may interfere with RA, so stand-alone RF nodes are eliminated and reachability information is >> 54: * transferred to corresponding safepoints. When safepoints are pruned during macro expansion, corresponding >> 55: * reachability info also goes away. > > Is it ok that this info goes away? Is this what could currently go wrong with your implementation, or what would go wrong if we only used Safepoints? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2153933246 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2153929325 From epeter at openjdk.org Wed Jun 18 08:10:32 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 18 Jun 2025 08:10:32 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v2] In-Reply-To: References: <0WKwHjzEn5dxYLkonrk4h9yfMI3r3bKDdqgG06J69N4=.e19e9441-6197-4d53-a4f4-b196a81f69d8@github.com> Message-ID: <1RkOZpprxf0iTWKQUVeW1qBKsNZnI7DNjEbTK7SUlwM=.21ff7a9c-e95a-47d0-bc76-2c50032e09e8@github.com> On Fri, 23 May 2025 04:42:08 GMT, Vladimir Ivanov wrote: >> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: >> >> review feedback > >>> Representing ReachabilityFence as memory barrier (e.g., MemBarCPUOrder) would solve the issue, but performance costs are prohibitively high. > >> How bad is it? MemBarCPUOrder pinches all memory, so I assume this breaks a lot of optimizations when RF is sitting in the hot loop? I remember we went through a similar exercise with Blackholes: [JDK-8296545](https://bugs.openjdk.org/browse/JDK-8296545) -- and decided to pinch only the control. I guessing this is not enough to fix RF, or is it? > > Yes, if a barrier stays inside loop body, it breaks a lot of important optimizations. 
It may end up almost as bad as a full-blown call (except a barrier can be moved around while a call can't). And moving a node when it depends both on control and memory is more complicated than just a CFG node. Moreover, as you can see in the proposed solution, even CFG-only representation is problematic for loop opts, so additional care is needed to ensure RFs are moved out of loops. > > As an alternative approach, I thought about reifying RF as a data node (think of `CastPP`) and then linking its referent to all safepoints it dominates after loop opts are over. But that would only affect `optimize_reachability_fences()`. Everything else would stay the same. So, I decided to stay with CFG-only representation for now. @iwanowww I reviewed half of the code. I left lots of minor comments, about code style and little code comments. They are not super important now. I think most important to work on now is the description on the top of `reachability.cpp`. I struggle to understand the basic concepts / ideas. It would be nice if you described things in more detail, and added some definitions. I think that would be a better basis for me to jump into the implementation afterward :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/25315#issuecomment-2983147675 From xgong at openjdk.org Wed Jun 18 08:23:29 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 18 Jun 2025 08:23:29 GMT Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v3] In-Reply-To: References: Message-ID: On Wed, 18 Jun 2025 07:48:39 GMT, Bhavana Kilambi wrote: > Then the SVE2 tbl does not have an 8B variant. It performs the lookup throughout the register (in this case z register). So the inputs will be loaded into a 64 bit register and tbl (SVE2) will be performed on the 128-bit (atleast) z register. That will lead to incorrect values? It is an partial operation in SVE. 
But I think it's fine because we will generate a mask for some IRs which may be influenced by unused higher lanes. And for most 64-bit operations, we choose NEON instructions, for which the vector size can be 64-bit anyway. > We may still have to move lower 64bit value from src1 and src2 into tmp1 making it a full 128-bit reg and then generate the SVE tbl instruction for single vector lookup for T_INT, T_SHORT and T_FLOAT so that we can avoid the instructions that compute the offsets for each byte. This can be done for machines with SVE >= 1. On machines with SVE = 1 and MaxVectorSize > 16B, I think it should still work. What do you think? This is a good idea. I just noticed that it would be a common issue for this op with partial vector size on SVE (vector_size < max_vector_size). It's not just for 64 bits. Consider a vector type with 128 bits where the max vector size is 256 bits: would the result be incorrect if using the current SVE2 `tbl` instruction? The higher part is expected to be selected from `src2`, but actually it may come from the higher bits of `src1`, because the values in `index` would be inside the vector length of 256 bits? Not sure whether I understand this op correctly. If this issue does exist, maybe we should recognize such partial IRs and implement them by merging `src1` and `src2`. The codegen will be much more complex. I just checked SVE `tbl`, and it is an unpredicated instruction, which is different from others. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2153969348 From xgong at openjdk.org Wed Jun 18 08:33:37 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 18 Jun 2025 08:33:37 GMT Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v3] In-Reply-To: References: Message-ID: On Wed, 18 Jun 2025 08:20:26 GMT, Xiaohong Gong wrote: >> Yes for 64bit vectors, it generates Neon tbl for single vector lookup even for `UseSVE == 2`.
>> We currently do not have an aarch64 machine with max vector length of 64bits. All aarch64 machines have atleast 128 bit and Neon enabled at the very least. So if we want to run with 64 bit vectors, either we can set `MaxVectorSize = 64` in the command line for auto-vectorization or use the 64Vector species for VectorAPI. It will use 64 bits vector (`d` reg) but the underlying vector is infact (at least) 128bit right (ex. Grace)? >> Then the SVE2 `tbl` does not have an 8B variant. It performs the lookup throughout the register (in this case `z` register). So the inputs will be loaded into a 64 bit register and `tbl` (SVE2) will be performed on the 128-bit (atleast) `z` register. That will lead to incorrect values? >> We may still have to move lower 64bit value from `src1` and `src2` into `tmp1` making it a full 128-bit reg and then generate the SVE `tbl` instruction for single vector lookup for `T_INT`, `T_SHORT` and `T_FLOAT` so that we can avoid the instructions that compute the offsets for each byte. This can be done for machines with SVE >= 1. On machines with SVE = 1 and MaxVectorSize > 16B, I think it should still work. What do you think? > >> Then the SVE2 tbl does not have an 8B variant. It performs the lookup throughout the register (in this case z register). So the inputs will be loaded into a 64 bit register and tbl (SVE2) will be performed on the 128-bit (atleast) z register. That will lead to incorrect values? > > It is an partial operation in SVE. But I think it's fine because we will generate a mask for the some IRs which may be influenced with unused higher lanes. And for most 64-bit operations, we choose NEON instructions which the vector size can be 64-bit anyway. > >> We may still have to move lower 64bit value from src1 and src2 into tmp1 making it a full 128-bit reg and then generate the SVE tbl instruction for single vector lookup for T_INT, T_SHORT and T_FLOAT so that we can avoid the instructions that compute the offsets for each byte. 
This can be done for machines with SVE >= 1. On machines with SVE = 1 and MaxVectorSize > 16B, I think it should still work. What do you think? > > This is a good idea. > > I just noticed that it would be a common issue for this op with partial vector size on SVE (vector_size < max_vector_size). It's not just for 64bits. Consider a vector type with 128bits, and the max vector size is 256bits, the result would be incorrect if using current SVE2 `tbl` instruction? The higher part is expected to be selected from the `src2`, but actually it may be from higher bits of `src1`, because the values in `index` would be inside the vector length of 256bits? > > Not sure whether I understand this op correctly. If it do exist this issue, maybe we should recognize such kind of partial IRs and implement it by merging `src1` and `src2`. The codegen will be much more complex. I just checked SVE `tbl`, and it is an unpredicated instruction, which is different from others. To check whether it is an issue, you can use 64bits and 128bits as an example. And change to use SVE2's `tbl` for this op with 64bits. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2153991566 From mhaessig at openjdk.org Wed Jun 18 08:35:42 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Wed, 18 Jun 2025 08:35:42 GMT Subject: RFR: 8020282: Generated code quality: redundant LEAs in the chained dereferences [v7] In-Reply-To: References: <2iyvUQeSbTQ0KsYs4qJKMDdlQ_IV9_v1_T-2dyaqcbs=.702b2d74-9767-467e-b8f9-ce04cfc91c08@github.com> Message-ID: On Tue, 17 Jun 2025 17:34:08 GMT, Roberto Castañeda Lozano wrote: >> Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix test flags > > src/hotspot/cpu/x86/peephole_x86_64.cpp line 351: > >> 349: Node* dependant_lea = decode->fast_out(i); >> 350: if (dependant_lea->is_Mach() && dependant_lea->as_Mach()->ideal_Opcode() == Op_AddP) { >> 351: > > Nit: you could remove this empty line. Removed in bd35803. Thanks for pointing it out. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25471#discussion_r2153995826 From bkilambi at openjdk.org Wed Jun 18 08:41:29 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Wed, 18 Jun 2025 08:41:29 GMT Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v3] In-Reply-To: References: Message-ID: <9BVeq_H7Z-IsR6RwD_piXdaA-T_uSFm-vMqgv-0Vf_s=.effe2d1d-d180-4e57-9657-fbd615e4d7df@github.com> On Wed, 18 Jun 2025 08:30:36 GMT, Xiaohong Gong wrote: > To check whether it is an issue, you can use 64bits and 128bits as an example. And change to use SVE2's tbl for this op with 64bits. Yes, I tried that and it does give incorrect results. It does an ldr into `d` regs but the tbl is on `z` regs. The values in the `index` are not a problem. They will be generated according to the vector length (in this case 64-bit) but since the `tbl` does a full register lookup (and not partial), it can return the higher bits of the `z` register as well.
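
The effect described in this message can be followed with a small scalar model. The sketch below is not HotSpot code: the class name, the method names and the fixed 16-byte register size are assumptions made for illustration, and the upper register bytes are assumed to be zero after a 64-bit load. It models a two-table lookup that always indexes across the full registers, which is why an index of 8 lands in the upper half of the first register instead of lane 0 of the second source.

```java
// Scalar model of a full-register, two-table byte lookup (illustrative only).
final class FullRegisterTblModel {
    static final int REG_BYTES = 16; // assumed 128-bit register

    // Places a 64-bit (8-byte) vector in the low half of a 16-byte register.
    // Assumption: the upper half is zero after the load.
    static byte[] loadLow64(byte[] vec8) {
        byte[] reg = new byte[REG_BYTES];
        System.arraycopy(vec8, 0, reg, 0, vec8.length);
        return reg;
    }

    // Lookup over the concatenation of two FULL registers:
    // index in [0, 16) reads table1, [16, 32) reads table2, anything else gives 0.
    static byte[] tblFullRegisters(byte[] table1, byte[] table2, byte[] index) {
        byte[] out = new byte[REG_BYTES];
        for (int i = 0; i < REG_BYTES; i++) {
            int idx = index[i] & 0xFF;
            if (idx < REG_BYTES) {
                out[i] = table1[idx];
            } else if (idx < 2 * REG_BYTES) {
                out[i] = table2[idx - REG_BYTES];
            } else {
                out[i] = 0;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] src1 = loadLow64(new byte[] {1, 2, 3, 4, 5, 6, 7, 8});
        byte[] src2 = loadLow64(new byte[] {11, 12, 13, 14, 15, 16, 17, 18});
        // Indexes generated for a 64-bit byte vector: 8..15 are meant to pick from src2.
        byte[] index = loadLow64(new byte[] {0, 8, 7, 15, 1, 9, 2, 10});
        System.out.println(java.util.Arrays.toString(tblFullRegisters(src1, src2, index)));
    }
}
```

In this model the lanes whose index is 8 or higher read the zeroed upper half of the first register rather than the second source, which matches the kind of incorrect result reported above.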
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2154003610 From bkilambi at openjdk.org Wed Jun 18 08:41:30 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Wed, 18 Jun 2025 08:41:30 GMT Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v3] In-Reply-To: <9BVeq_H7Z-IsR6RwD_piXdaA-T_uSFm-vMqgv-0Vf_s=.effe2d1d-d180-4e57-9657-fbd615e4d7df@github.com> References: <9BVeq_H7Z-IsR6RwD_piXdaA-T_uSFm-vMqgv-0Vf_s=.effe2d1d-d180-4e57-9657-fbd615e4d7df@github.com> Message-ID: On Wed, 18 Jun 2025 08:36:13 GMT, Bhavana Kilambi wrote: >> To check whether it is an issue, you can use 64bits and 128bits as an example. And change to use SVE2's `tbl` for this op with 64bits. > >> To check whether it is an issue, you can use 64bits and 128bits as an example. And change to use SVE2's tbl for this op with 64bits. > > Yes, I tried that and it does give incorrect results. > It does an ldr into `d` regs but the tbl is on `z` regs. The values in the `index` are not a problem. They will be generated according to the vector length (in this case 64-bit) but since the `tbl` does a full register lookup (and not partial), it can return the higher bits of the `z` register as well. > Consider a vector type with 128bits, and the max vector size is 256bits, the result would be incorrect if using current SVE2 tbl instruction? Yes, it would be incorrect. I tried to use SVE2 tbl instruction for 64-bit on a 128-bit machine and the results are incorrect. But we currently do not have an SVE2 machine with 256-bit to test our implementation on. > values in index would be inside the vector length of 256bits No, the index should contain values according to the vector length being used. 
I could see that for 64-bit Byte, it generates numbers between 0-15 and not 0-31 (for vector length of 128) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2154009056 From thartmann at openjdk.org Wed Jun 18 08:47:30 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 18 Jun 2025 08:47:30 GMT Subject: RFR: 8359386: Fix incorrect value for max_size of C2CodeStub when APX is used [v3] In-Reply-To: References: <5ou3DR0HW2MlRv79ufOutW0dpq29fGAgnFzWYDo6rhk=.33789d8d-4174-4a7e-aa0d-26603911f760@github.com> Message-ID: On Tue, 17 Jun 2025 04:07:14 GMT, Srinivas Vamsi Parasa wrote: >> The goal of this PR is to fix the value of max_size of the C2CodeStub hardcoded in the C2_MacroAssembler::convertF2I() function when Intel APX instrucitons are used. Currently, max_size is hardcoded to 23 (introduced in [JDK-8306706](https://bugs.openjdk.org/browse/JDK-8306706)) . However, this value is incorrect when Intel APX instructions with extended general-purpose registers (EGPRs) are used in the code stub as using EGPRs with APX instructions leads to an increase in the instruction encoding size by additional 1 byte. 
>> >> Without this fix, we see the following error for the C2 compiler tests below: >> >> compiler/vectorization/runner/ArrayTypeConvertTest.java >> compiler/intrinsics/zip/TestFpRegsABI.java >> >> >> >> >> # A fatal error has been detected by the Java Runtime Environment: >> # >> # Internal Error (/src/hotspot/share/opto/c2_CodeStubs.cpp:50), pid=3961123, tid=3961332 >> # assert(max_size >= actual_size) failed: Expected stub size (23) must be larger than or equal to actual stub size (24) >> # >> # JRE version: OpenJDK Runtime Environment (26.0) (fastdebug build 26-internal-adhoc.parasa.jdkdemotion) >> # Java VM: OpenJDK 64-Bit Server VM (fastdebug 26-internal-adhoc.parasa.jdkdemotion, mixed mode, sharing, compressed oops, compressed class ptrs, g1 gc, linux-amd64) >> # Problematic frame: >> # V [libjvm.so+0x955a77] C2CodeStubList::emit(C2_MacroAssembler&)+0x227 >> # >> >> >> This PR fixes the errors in the above-mentioned tests. >> >> Currently, the ConvertF2I macro works as follows: >> >> >> vcvttss2si %xmm1,%eax >> cmp $0x80000000,%eax >> je STUB >> CONTINUE: >> >> STUB: >> sub $0x8,%rsp >> vmovss %xmm1,(%rsp) >> call Stub::f2i_fixup ; {runtime_call StubRoutines (initial stubs)} >> pop %rax >> jmp CONTINUE >> >> >> The maximum size (max_size) of the stub is precomputed as 23. However, as seen in the convertF2I_slowpath implementation (below), the usage of pop(dst) instruction increases the instruction encoding size by 1 byte if dst is an extended general-purpose register (R16-R31) . >> >> For example, `pop (r15)` is encoded as `41 5f`, whereas `pop(r21)` is encoded as `d5 10 5d`. >> >> >> >> >> static void convertF2I_slowpath(C2_MacroAssembler& masm, C2GeneralStub& stub) { >> #define __ masm. >> Register dst = stub.data<0>(); >> XMMRegister src = stub.data<1>(); >> address target = stub.data<2>(); >> __ bind(stub.entry()); >> __ subptr(rsp, 8); >> __ mo... 
> > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > Fix the change in the stub size by 1 byte Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25787#pullrequestreview-2938303089 From xgong at openjdk.org Wed Jun 18 08:51:35 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 18 Jun 2025 08:51:35 GMT Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v3] In-Reply-To: References: <9BVeq_H7Z-IsR6RwD_piXdaA-T_uSFm-vMqgv-0Vf_s=.effe2d1d-d180-4e57-9657-fbd615e4d7df@github.com> Message-ID: On Wed, 18 Jun 2025 08:38:55 GMT, Bhavana Kilambi wrote: > Yes, it would be incorrect. I tried to use SVE2 tbl instruction for 64-bit on a 128-bit machine and the results are incorrect. But we currently do not have an SVE2 machine with 256-bit to test our implementation on. It maybe possible in future, right? We can just use the 64bits and 128bits as an example to do the test. > No, the index should contain values according to the vector length being used. I could see that for 64-bit Byte, it generates numbers between 0-15 and not 0-31 (for vector length of 128) The valid index for a 64-bit byte should be 0-8, right? From the API level of `SelectFromTwoVector`, it seems the values inside of the `index` vector is in range of VLEN or 2 * VLEN ? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2154030712 From xgong at openjdk.org Wed Jun 18 08:55:30 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 18 Jun 2025 08:55:30 GMT Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v3] In-Reply-To: References: <9BVeq_H7Z-IsR6RwD_piXdaA-T_uSFm-vMqgv-0Vf_s=.effe2d1d-d180-4e57-9657-fbd615e4d7df@github.com> Message-ID: On Wed, 18 Jun 2025 08:48:37 GMT, Xiaohong Gong wrote: >>> Consider a vector type with 128bits, and the max vector size is 256bits, the result would be incorrect if using current SVE2 tbl instruction? >> >> Yes, it would be incorrect. I tried to use SVE2 tbl instruction for 64-bit on a 128-bit machine and the results are incorrect. But we currently do not have an SVE2 machine with 256-bit to test our implementation on. >> >> >>> values in index would be inside the vector length of 256bits >> >> No, the index should contain values according to the vector length being used. I could see that for 64-bit Byte, it generates numbers between 0-15 and not 0-31 (for vector length of 128) > >> Yes, it would be incorrect. I tried to use SVE2 tbl instruction for 64-bit on a 128-bit machine and the results are incorrect. But we currently do not have an SVE2 machine with 256-bit to test our implementation on. > > It maybe possible in future, right? We can just use the 64bits and 128bits as an example to do the test. > >> No, the index should contain values according to the vector length being used. I could see that for 64-bit Byte, it generates numbers between 0-15 and not 0-31 (for vector length of 128) > > The valid index for a 64-bit byte should be 0-8, right? From the API level of `SelectFromTwoVector`, it seems the values inside of the `index` vector is in range of VLEN or 2 * VLEN ? Or, maybe we can just disable/unsupport the cases for this op that the vector size is smaller than the max vector size and the max vector size >128-bits. 
And keep the 64bits as it is. We should add comment and revisit this part once 256bits SVE2 machine is supported in future. WDYT? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2154038247 From bkilambi at openjdk.org Wed Jun 18 09:02:59 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Wed, 18 Jun 2025 09:02:59 GMT Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v3] In-Reply-To: References: <9BVeq_H7Z-IsR6RwD_piXdaA-T_uSFm-vMqgv-0Vf_s=.effe2d1d-d180-4e57-9657-fbd615e4d7df@github.com> Message-ID: On Wed, 18 Jun 2025 08:51:47 GMT, Xiaohong Gong wrote: >>> Yes, it would be incorrect. I tried to use SVE2 tbl instruction for 64-bit on a 128-bit machine and the results are incorrect. But we currently do not have an SVE2 machine with 256-bit to test our implementation on. >> >> It maybe possible in future, right? We can just use the 64bits and 128bits as an example to do the test. >> >>> No, the index should contain values according to the vector length being used. I could see that for 64-bit Byte, it generates numbers between 0-15 and not 0-31 (for vector length of 128) >> >> The valid index for a 64-bit byte should be 0-8, right? From the API level of `SelectFromTwoVector`, it seems the values inside of the `index` vector is in range of VLEN or 2 * VLEN ? > > Or, maybe we can just disable/unsupport the cases for this op that the vector size is smaller than the max vector size and the max vector size >128-bits. And keep the 64bits as it is. We should add comment and revisit this part once 256bits SVE2 machine is supported in future. WDYT? > The valid index for a 64-bit byte should be 0-8, right? From the API level of SelectFromTwoVector, it seems the values inside of the index vector is in range of VLEN or 2 * VLEN ? 
Yes, 0-7 for 64-bit Byte but this op is selecting from two vectors so the index should contain values in the range of [0, 2 * VLEN) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2154048486 From bkilambi at openjdk.org Wed Jun 18 09:02:59 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Wed, 18 Jun 2025 09:02:59 GMT Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v3] In-Reply-To: References: <9BVeq_H7Z-IsR6RwD_piXdaA-T_uSFm-vMqgv-0Vf_s=.effe2d1d-d180-4e57-9657-fbd615e4d7df@github.com> Message-ID: <44iknSSilWHvQqTaChPgdFcmnKf3xgrlam6ybLiUdSU=.04c8b881-c7c7-4f68-94ae-746b8d583675@github.com> On Wed, 18 Jun 2025 08:55:31 GMT, Bhavana Kilambi wrote: >> Or, maybe we can just disable/unsupport the cases for this op that the vector size is smaller than the max vector size and the max vector size >128-bits. And keep the 64bits as it is. We should add comment and revisit this part once 256bits SVE2 machine is supported in future. WDYT? > >> The valid index for a 64-bit byte should be 0-8, right? From the API level of SelectFromTwoVector, it seems the values inside of the index vector is in range of VLEN or 2 * VLEN ? > > Yes, 0-7 for 64-bit Byte but this op is selecting from two vectors so the index should contain values in the range of [0, 2 * VLEN) > Or, maybe we can just disable/unsupport the cases for this op that the vector size is smaller than the max vector size and the max vector size >128-bits. And keep the 64bits as it is. We should add comment and revisit this part once 256bits SVE2 machine is supported in future. WDYT? This sounds better than adding complicated code which may not likely be executed in the near future. We can revisit when adequate support is added. 
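
The index range agreed on here can be written down as a scalar reference model. This is only a sketch with plain arrays, not the Vector API or the C2 implementation; the class and method names are hypothetical, and it assumes every index is already within [0, 2 * VLEN).

```java
// Scalar reference model of a two-vector select (hypothetical helper, not JDK code).
final class TwoVectorSelectModel {
    // An index in [0, VLEN) picks that lane of src1;
    // an index in [VLEN, 2 * VLEN) picks lane (index - VLEN) of src2.
    static byte[] selectFromTwoVectors(byte[] index, byte[] src1, byte[] src2) {
        int vlen = src1.length;
        if (src2.length != vlen || index.length != vlen) {
            throw new IllegalArgumentException("all three vectors must have the same length");
        }
        byte[] result = new byte[vlen];
        for (int i = 0; i < vlen; i++) {
            int idx = index[i];
            if (idx < 0 || idx >= 2 * vlen) {
                // Assumption of this sketch: out-of-range indexes are rejected.
                throw new IndexOutOfBoundsException("index " + idx);
            }
            result[i] = idx < vlen ? src1[idx] : src2[idx - vlen];
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] src1 = {10, 11, 12, 13, 14, 15, 16, 17};   // 64-bit byte vector, VLEN = 8
        byte[] src2 = {20, 21, 22, 23, 24, 25, 26, 27};
        byte[] index = {0, 8, 7, 15, 3, 12, 1, 9};        // values in [0, 16)
        System.out.println(java.util.Arrays.toString(selectFromTwoVectors(index, src1, src2)));
    }
}
```

For a 64-bit byte vector the valid indexes are therefore 0-15, with 0-7 addressing the first source and 8-15 the second.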
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2154051232 From xgong at openjdk.org Wed Jun 18 09:02:59 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 18 Jun 2025 09:02:59 GMT Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v3] In-Reply-To: <44iknSSilWHvQqTaChPgdFcmnKf3xgrlam6ybLiUdSU=.04c8b881-c7c7-4f68-94ae-746b8d583675@github.com> References: <9BVeq_H7Z-IsR6RwD_piXdaA-T_uSFm-vMqgv-0Vf_s=.effe2d1d-d180-4e57-9657-fbd615e4d7df@github.com> <44iknSSilWHvQqTaChPgdFcmnKf3xgrlam6ybLiUdSU=.04c8b881-c7c7-4f68-94ae-746b8d583675@github.com> Message-ID: <3s-wkJqhURc_L_gnOqAJlmqXmVHV1B_e0C40mEysRss=.41e0a58d-8f49-44c9-a1a2-bd7e9fd5006c@github.com> On Wed, 18 Jun 2025 08:56:39 GMT, Bhavana Kilambi wrote: >>> The valid index for a 64-bit byte should be 0-8, right? From the API level of SelectFromTwoVector, it seems the values inside of the index vector is in range of VLEN or 2 * VLEN ? >> >> Yes, 0-7 for 64-bit Byte but this op is selecting from two vectors so the index should contain values in the range of [0, 2 * VLEN) > >> Or, maybe we can just disable/unsupport the cases for this op that the vector size is smaller than the max vector size and the max vector size >128-bits. And keep the 64bits as it is. We should add comment and revisit this part once 256bits SVE2 machine is supported in future. WDYT? > > This sounds better than adding complicated code which may not likely be executed in the near future. We can revisit when adequate support is added. > Yes, 0-7 for 64-bit Byte but this op is selecting from two vectors so the index should contain values in the range of [0, 2 * VLEN) Yes, that's the problem. Thanks! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2154053214 From xgong at openjdk.org Wed Jun 18 09:03:00 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 18 Jun 2025 09:03:00 GMT Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v3] In-Reply-To: <3s-wkJqhURc_L_gnOqAJlmqXmVHV1B_e0C40mEysRss=.41e0a58d-8f49-44c9-a1a2-bd7e9fd5006c@github.com> References: <9BVeq_H7Z-IsR6RwD_piXdaA-T_uSFm-vMqgv-0Vf_s=.effe2d1d-d180-4e57-9657-fbd615e4d7df@github.com> <44iknSSilWHvQqTaChPgdFcmnKf3xgrlam6ybLiUdSU=.04c8b881-c7c7-4f68-94ae-746b8d583675@github.com> <3s-wkJqhURc_L_gnOqAJlmqXmVHV1B_e0C40mEysRss=.41e0a58d-8f49-44c9-a1a2-bd7e9fd5006c@github.com> Message-ID: On Wed, 18 Jun 2025 08:57:33 GMT, Xiaohong Gong wrote: >>> Or, maybe we can just disable/unsupport the cases for this op that the vector size is smaller than the max vector size and the max vector size >128-bits. And keep the 64bits as it is. We should add comment and revisit this part once 256bits SVE2 machine is supported in future. WDYT? >> >> This sounds better than adding complicated code which may not likely be executed in the near future. We can revisit when adequate support is added. > >> Yes, 0-7 for 64-bit Byte but this op is selecting from two vectors so the index should contain values in the range of [0, 2 * VLEN) > > Yes, that's the problem. Thanks! > > Or, maybe we can just disable/unsupport the cases for this op that the vector size is smaller than the max vector size and the max vector size >128-bits. And keep the 64bits as it is. We should add comment and revisit this part once 256bits SVE2 machine is supported in future. WDYT? > > This sounds better than adding complicated code which may not likely be executed in the near future. We can revisit when adequate support is added. Agree. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2154053859 From bkilambi at openjdk.org Wed Jun 18 09:03:00 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Wed, 18 Jun 2025 09:03:00 GMT Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v3] In-Reply-To: References: <9BVeq_H7Z-IsR6RwD_piXdaA-T_uSFm-vMqgv-0Vf_s=.effe2d1d-d180-4e57-9657-fbd615e4d7df@github.com> <44iknSSilWHvQqTaChPgdFcmnKf3xgrlam6ybLiUdSU=.04c8b881-c7c7-4f68-94ae-746b8d583675@github.com> <3s-wkJqhURc_L_gnOqAJlmqXmVHV1B_e0C40mEysRss=.41e0a58d-8f49-44c9-a1a2-bd7e9fd5006c@github.com> Message-ID: On Wed, 18 Jun 2025 08:57:45 GMT, Xiaohong Gong wrote: >>> Yes, 0-7 for 64-bit Byte but this op is selecting from two vectors so the index should contain values in the range of [0, 2 * VLEN) >> >> Yes, that's the problem. Thanks! > >> > Or, maybe we can just disable/unsupport the cases for this op that the vector size is smaller than the max vector size and the max vector size >128-bits. And keep the 64bits as it is. We should add comment and revisit this part once 256bits SVE2 machine is supported in future. WDYT? >> >> This sounds better than adding complicated code which may not likely be executed in the near future. We can revisit when adequate support is added. > > Agree. I will add the necessary code to enable/disable these scenarios. 
Thanks for your valuable insights into this @XiaohongGong ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2154058041 From aph at openjdk.org Wed Jun 18 09:12:31 2025 From: aph at openjdk.org (Andrew Haley) Date: Wed, 18 Jun 2025 09:12:31 GMT Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v3] In-Reply-To: References: <9BVeq_H7Z-IsR6RwD_piXdaA-T_uSFm-vMqgv-0Vf_s=.effe2d1d-d180-4e57-9657-fbd615e4d7df@github.com> <44iknSSilWHvQqTaChPgdFcmnKf3xgrlam6ybLiUdSU=.04c8b881-c7c7-4f68-94ae-746b8d583675@github.com> <3s-wkJqhURc_L_gnOqAJlmqXmVHV1B_e0C40mEysRss=.41e0a58d-8f49-44c9-a1a2-bd7e9fd5006c@github.com> Message-ID: On Wed, 18 Jun 2025 08:59:25 GMT, Bhavana Kilambi wrote: >>> > Or, maybe we can just disable/unsupport the cases for this op that the vector size is smaller than the max vector size and the max vector size >128-bits. And keep the 64bits as it is. We should add comment and revisit this part once 256bits SVE2 machine is supported in future. WDYT? >>> >>> This sounds better than adding complicated code which may not likely be executed in the near future. We can revisit when adequate support is added. >> >> Agree. > > I will add the necessary code to enable/disable these scenarios. Thanks for your valuable insights into this @XiaohongGong > > Or, maybe we can just disable/unsupport the cases for this op that the vector size is smaller than the max vector size and the max vector size >128-bits. And keep the 64bits as it is. We should add comment and revisit this part once 256bits SVE2 machine is supported in future. WDYT? > > This sounds better than adding complicated code which may not likely be executed in the near future. We can revisit when adequate support is added. I agree. Please treat it as a general rule that we do not ship code that cannot be tested. Also, quoth Dijkstra: "Simplicity is a great virtue but it requires hard work to achieve it and education to appreciate it.
And to make matters worse: complexity sells better." It's not worth adding complexity to the compiler when the advantage is so rare (and the effect so small) that no one will ever significantly benefit from it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2154080977 From mhaessig at openjdk.org Wed Jun 18 09:19:55 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Wed, 18 Jun 2025 09:19:55 GMT Subject: RFR: 8020282: Generated code quality: redundant LEAs in the chained dereferences [v8] In-Reply-To: References: Message-ID: > ## Summary > > On x86, chained dereferences of narrow oops at a constant offset from the base oop can use a `lea` instruction to perform the address computation in one go using the `leaP8Narrow`, `leaP32Narrow`, and `leaPCompressedOopOffset` matching rules. However, the generated code contains an additional `lea` with an unused result: > > ; OptoAssembly > 03d decode_heap_oop_not_null R8,R10 > 041 leaq R10, [R12 + R10 << 3 + #12] (compressed oop addressing) ; ptr compressedoopoff32 > > ; x86 > 0x00007f1f210625bd: lea (%r12,%r10,8),%r8 ; result is unused > 0x00007f1f210625c1: lea 0xc(%r12,%r10,8),%r10 ; the same computation as decode, but with offset > > > This PR adds a peephole optimization to remove such redundant `lea`s. > > ## The Issue in Detail > > The ideal subgraph producing redundant `lea`s, or rather redundant `decodeHeapOop_not_null`s, is `LoadN -> DecodeN -> AddP`, where both the address and base edge of the `AddP` originate from the `DecodeN`. After matching, this becomes > > LoadN -> decodeHeapOop_not_null -> leaP* > ______________________________? > > where `leaP*` is either of `leaP8Narrow`, `leaP32Narrow`, or `leaPCompressedOopOffset` (depending on the heap location and size). Here, the base input of `leaP*` comes from the decode.
Looking at the matching code path, we find that the `leaP*` rules match both the `AddP` and the `DecodeN`, since x86 can fold this, but the following code adds the decode back as the base input to `leaP*`: > > https://github.com/openjdk/jdk/blob/c29537740efb04e061732a700582d43b1956cff4/src/hotspot/share/opto/matcher.cpp#L1894-L1897 > > On its face, this is completely unnecessary if we matched a `leaP*`, since it already computes the result of the decode, so adding the `LoadN` node as base seems like the logical choice. However, if the derived oop computed by the `leaP*` gets added to an oop map, this `DecodeN` is needed as the base for the derived oop. Because as of now, derived oops in oop maps cannot have narrow base pointers. > > This leaves us with a handful of possible solutions: > 1. implement narrow bases for derived oops in oop maps, > 2. perform some dead code elimination after we know which oops are part of oop maps, > 3. add a peephole optimization to simply remove unused `lea`s. > > Option 1 would have been ideal in the sense, that it is the earliest possible point to remove the decode, which would simplify the graph and reduce pressure on the register allocator. However, rewriting the oop map machinery to remove a... 
Manuel Hässig has updated the pull request incrementally with three additional commits since the last revision: - Simplify and explain scenarios - Simplify tests further - Remove empty line ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25471/files - new: https://git.openjdk.org/jdk/pull/25471/files/6f3cfcb7..f0e64687 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25471&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25471&range=06-07 Stats: 35 lines in 3 files changed: 0 ins; 30 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/25471.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25471/head:pull/25471 PR: https://git.openjdk.org/jdk/pull/25471 From mhaessig at openjdk.org Wed Jun 18 09:19:56 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Wed, 18 Jun 2025 09:19:56 GMT Subject: RFR: 8020282: Generated code quality: redundant LEAs in the chained dereferences [v4] In-Reply-To: <_0u23WPM6B2JmmIaS2VQe7iV5UtXglH-BWPhjvjX02A=.140ac707-df93-410b-b0a3-ad572ca3248a@github.com> References: <8af6oDH21I6Hfvdplgxq6EeH41bihWdwGwy7V8mtokE=.4a20d5f8-84a4-4533-bf5c-932051d1b2a1@github.com> <_0u23WPM6B2JmmIaS2VQe7iV5UtXglH-BWPhjvjX02A=.140ac707-df93-410b-b0a3-ad572ca3248a@github.com> Message-ID: <2Ox7or75eh6TIJw7FLBu3xy5wREZu8u06J4SX8S2dJg=.7fb40b2a-1128-4771-beec-7bfb286f954c@github.com> On Tue, 17 Jun 2025 17:46:59 GMT, Roberto Castañeda Lozano wrote: >>> > Is this scenario exercised by any of the new tests? If not, would it be possible to construct an additional test where we verify that the peephole is not applied in this case? >>> >>> It is not. I was only able to find such a case once with all VM intrinsics disabled some time ago, but was not able to reproduce one since. I'll have another try to find one. >> >> OK, that would be great! If you do not find one, I think the PR is still OK because it is easy to see how the peephole would handle the scenario.
But it would be of course better to confirm it with an actual test case. > >> @robcasloz, I integrated all your suggestions and simplified the IR-tests. > > Thanks! > >> But I was unfortunately not able to create a test that reliably safepoints. > > Fair enough, thanks for trying. Thank you for your thorough reviews @robcasloz and @vnkozlov! I addressed all remaining comments. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25471#issuecomment-2983370308 From mhaessig at openjdk.org Wed Jun 18 09:19:57 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Wed, 18 Jun 2025 09:19:57 GMT Subject: RFR: 8020282: Generated code quality: redundant LEAs in the chained dereferences [v7] In-Reply-To: References: <2iyvUQeSbTQ0KsYs4qJKMDdlQ_IV9_v1_T-2dyaqcbs=.702b2d74-9767-467e-b8f9-ce04cfc91c08@github.com> Message-ID: On Tue, 17 Jun 2025 17:40:47 GMT, Roberto Castañeda Lozano wrote: >> Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix test flags > > test/hotspot/jtreg/compiler/codegen/TestRedundantLea.java line 146: > >> 144: } >> 145: i += 1; >> 146: } > > Now that you only have one dimension, I think it is simpler to replace this loop with: > > scenarios[0] = new Scenario(0, "-XX:+IgnoreUnrecognizedVMOptions", "-XX:-OptoPeephole"); > scenarios[1] = new Scenario(1, "-XX:+IgnoreUnrecognizedVMOptions", "-XX:+OptoPeephole"); Good point. Fixed in f0e64687 > test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 895: > >> 893: machOnly(LEA_P_32_NARROW, "leaP32Narrow"); >> 894: } >> 895: > > This rule is unused and could be removed. All `leaP*` rules except `LEA_P` should be unused. I simply missed simplifying one test case. I fixed this in 8197af5.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25471#discussion_r2154097596 PR Review Comment: https://git.openjdk.org/jdk/pull/25471#discussion_r2154098431 From roland at openjdk.org Wed Jun 18 09:24:47 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 18 Jun 2025 09:24:47 GMT Subject: RFR: 8356708: C2: loop strip mining expansion doesn't take sunk stores into account [v7] In-Reply-To: References: Message-ID: > `test1()` has a counted loop with a `Store` to `field`. That `Store` > is sunk out of loop. When the `OuterStripMinedLoop` is expanded, only > `Phi`s that exist at the inner loop are added to the outer > loop. There's no `Phi` for the slice of the sunk `Store` (because > there's no `Store` left in the inner loop) so no `Phi` is added for > that slice to the outer loop. As a result, there's a missing anti > dependency for `Load` of `field` that's before the loop and it can be > scheduled inside the outer strip mined loop which is incorrect. > > `test2()` is the same as `test1()` but with a chain of 2 `Store`s. > > `test3()` is another variant where a `Store` is left in the inner loop > after one is sunk out of it so the inner loop still has a `Phi`. As a > result, the outer loop also gets a `Phi` but it's incorrectly wired as > the sunk `Store` should be the input along the backedge but is > not. That one doesn't cause any failure AFAICT. > > The fix I propose is some extra logic at expansion of the > `OuterStripMinedLoop` to handle these corner cases. 
Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: - Update src/hotspot/share/opto/loopnode.cpp Co-authored-by: Emanuel Peter - Update src/hotspot/share/opto/loopnode.cpp Co-authored-by: Emanuel Peter ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25717/files - new: https://git.openjdk.org/jdk/pull/25717/files/fa550f23..347cffd6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25717&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25717&range=05-06 Stats: 10 lines in 1 file changed: 5 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/25717.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25717/head:pull/25717 PR: https://git.openjdk.org/jdk/pull/25717 From mhaessig at openjdk.org Wed Jun 18 09:25:43 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Wed, 18 Jun 2025 09:25:43 GMT Subject: Withdrawn: 8354727: CompilationPolicy creates too many compiler threads when code cache space is scarce In-Reply-To: References: Message-ID: On Mon, 16 Jun 2025 13:10:34 GMT, Manuel Hässig wrote: > # Issue Summary > > Running > > java -XX:+SegmentedCodeCache -XX:ReservedCodeCacheSize=10M -XX:NonNMethodCodeHeapSize=6M \ > -XX:ProfiledCodeHeapSize=5M -XX:NonProfiledCodeHeapSize=5M -version > > on a machine with more than 255 cores fails with the message that the specified `NonNMethodCodeHeapSize` is too small to fit all compiler buffers (instead of failing because the sum of the heaps is larger than the `ReservedCodeCacheSize`). Hence, the calculated compiler count is too high. This is due to CompilationPolicy::initialize() checking how many compiler buffers fit into the `ReservedCodeCacheSize`.
However, in the case above, this is significantly larger than `NonNMethodCodeHeapSize` and causes a check updated in #17244 to fail that should ensure that `NonNMethodCodeHeapSize` is at least `CodeCacheMinimumUseSpace` plus the size of all `CICompilerCount` compiler buffers. This is stronger than before #17244 where this check merely required `NonNMethodCodeHeapSize >= CodeCacheMinimumUseSpace` due to the fact that compiler buffers can also use the rest of the code cache if the non-nmethod heap is not sufficient. > > # Change Rationale > > This PR reverts the failing check to ensuring `NonNMethodCodeHeapSize >= CodeCacheMinimumUseSpace` since the computation of the ergonomic `CICompilerCount` in `CompilationPolicy::initialize()` does not support the assumption that all compiler buffers must always fit inside the non-nmethod code heap. > > This change required adjusting a test because, with the weaker check, it is currently not possible to trigger it from the command line. > > # Testing > > - [x] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15681197877) > - [x] tier1 and tier2 plus some Oracle internal testing This pull request has been closed without being integrated.
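To make the two variants of the check discussed in the quoted description concrete, here is a minimal sketch. It is not the HotSpot implementation: the function names and any sizes passed to them are hypothetical. Only the shape of the two conditions and the heap sizes from the command line are taken from the description.

```cpp
#include <cstddef>

constexpr std::size_t M = 1024 * 1024;

// Weaker check (the form described as preceding #17244): the non-nmethod
// heap only has to cover the minimum use space.
constexpr bool weak_check(std::size_t non_nmethod_size,
                          std::size_t min_use_space) {
  return non_nmethod_size >= min_use_space;
}

// Stronger check (the form described for #17244): the minimum use space
// plus all compiler buffers must fit. compiler_count and buffer_size are
// hypothetical parameters for this sketch.
constexpr bool strong_check(std::size_t non_nmethod_size,
                            std::size_t min_use_space,
                            std::size_t compiler_count,
                            std::size_t buffer_size) {
  return non_nmethod_size >= min_use_space + compiler_count * buffer_size;
}

// Whether the three code heap segments fit into the reserved size.
constexpr bool segments_fit(std::size_t reserved, std::size_t non_nmethod,
                            std::size_t profiled, std::size_t non_profiled) {
  return non_nmethod + profiled + non_profiled <= reserved;
}

// The command line above asks for 6M + 5M + 5M inside a 10M reservation.
static_assert(!segments_fit(10 * M, 6 * M, 5 * M, 5 * M),
              "6M + 5M + 5M exceeds 10M");
```

With a large enough `compiler_count`, `strong_check` fails while `weak_check` still passes, which matches the situation the description attributes to machines with many cores.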
------------- PR: https://git.openjdk.org/jdk/pull/25830 From amitkumar at openjdk.org Wed Jun 18 10:16:34 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 18 Jun 2025 10:16:34 GMT Subject: RFR: 8358694: VM asserts if CodeCacheSegmentSize is not a power of 2 [v2] In-Reply-To: References: <_SfCjE8oyyD-grL6AAih23j1Qx7fbYefTVzl-BU5N2k=.7bd5d419-4127-4e78-926e-fece25a7d914@github.com> Message-ID: On Wed, 11 Jun 2025 15:02:12 GMT, Damon Fenacci wrote: >> Amit Kumar has updated the pull request incrementally with two additional commits since the last revision: >> >> - fix >> - move the changes in flag constraints specific file > > I might be a bit picky (sorry for that) but since the flag was triggering a crash I was wondering if we could have a small regression test to make sure the VM never crashes (possibly checking the error as well). Thanks @dafedafe, @shipilev for the reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25708#issuecomment-2983595519 From amitkumar at openjdk.org Wed Jun 18 10:16:35 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 18 Jun 2025 10:16:35 GMT Subject: RFR: 8358694: VM asserts if CodeCacheSegmentSize is not a power of 2 [v7] In-Reply-To: References: <_SfCjE8oyyD-grL6AAih23j1Qx7fbYefTVzl-BU5N2k=.7bd5d419-4127-4e78-926e-fece25a7d914@github.com> Message-ID: <3ScaLmhfsn03Y4xkbc0KmONgPrFTJSkbAXBHUnrAcXI=.05254ee2-2c74-4516-9e4d-e1b5fe4b643c@github.com> On Mon, 16 Jun 2025 11:45:09 GMT, Amit Kumar wrote: >> Makes sure that JVM exits gracefully when `CodeCacheSegmentSize` is not a power of 2. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > remove string check & update copyright header Oh, sorry, I mistook Damon as reviewer. My mistake. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25708#issuecomment-2983599489 From epeter at openjdk.org Wed Jun 18 11:19:30 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 18 Jun 2025 11:19:30 GMT Subject: RFR: 8356708: C2: loop strip mining expansion doesn't take sunk stores into account [v7] In-Reply-To: References: Message-ID: On Wed, 18 Jun 2025 09:24:47 GMT, Roland Westrelin wrote: >> `test1()` has a counted loop with a `Store` to `field`. That `Store` >> is sunk out of loop. When the `OuterStripMinedLoop` is expanded, only >> `Phi`s that exist at the inner loop are added to the outer >> loop. There's no `Phi` for the slice of the sunk `Store` (because >> there's no `Store` left in the inner loop) so no `Phi` is added for >> that slice to the outer loop. As a result, there's a missing anti >> dependency for `Load` of `field` that's before the loop and it can be >> scheduled inside the outer strip mined loop which is incorrect. >> >> `test2()` is the same as `test1()` but with a chain of 2 `Store`s. >> >> `test3()` is another variant where a `Store` is left in the inner loop >> after one is sunk out of it so the inner loop still has a `Phi`. As a >> result, the outer loop also gets a `Phi` but it's incorrectly wired as >> the sunk `Store` should be the input along the backedge but is >> not. That one doesn't cause any failure AFAICT. >> >> The fix I propose is some extra logic at expansion of the >> `OuterStripMinedLoop` to handle these corner cases. > > Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: > > - Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Emanuel Peter > - Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Emanuel Peter @TobiHartmann Had 2 questions: - Is this really something we want to put into JDK 25? Feels high-risk and it's an old issue after all. Maybe we can push this to JDK26 first, and backport a little later?
- Is this really a regression from [JDK-8281322](https://bugs.openjdk.org/browse/JDK-8281322)? If not, the affects version in JBS should be updated such that we'll consider this for backporting. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25717#issuecomment-2983777833 From shade at openjdk.org Wed Jun 18 11:31:28 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 18 Jun 2025 11:31:28 GMT Subject: RFR: 8359386: Fix incorrect value for max_size of C2CodeStub when APX is used [v3] In-Reply-To: References: <5ou3DR0HW2MlRv79ufOutW0dpq29fGAgnFzWYDo6rhk=.33789d8d-4174-4a7e-aa0d-26603911f760@github.com> Message-ID: On Tue, 17 Jun 2025 04:07:14 GMT, Srinivas Vamsi Parasa wrote: >> The goal of this PR is to fix the value of max_size of the C2CodeStub hardcoded in the C2_MacroAssembler::convertF2I() function when Intel APX instructions are used. Currently, max_size is hardcoded to 23 (introduced in [JDK-8306706](https://bugs.openjdk.org/browse/JDK-8306706)). However, this value is incorrect when Intel APX instructions with extended general-purpose registers (EGPRs) are used in the code stub, as using EGPRs with APX instructions increases the instruction encoding size by 1 additional byte.
>> >> Without this fix, we see the following error for the C2 compiler tests below: >> >> compiler/vectorization/runner/ArrayTypeConvertTest.java >> compiler/intrinsics/zip/TestFpRegsABI.java >> >> >> >> >> # A fatal error has been detected by the Java Runtime Environment: >> # >> # Internal Error (/src/hotspot/share/opto/c2_CodeStubs.cpp:50), pid=3961123, tid=3961332 >> # assert(max_size >= actual_size) failed: Expected stub size (23) must be larger than or equal to actual stub size (24) >> # >> # JRE version: OpenJDK Runtime Environment (26.0) (fastdebug build 26-internal-adhoc.parasa.jdkdemotion) >> # Java VM: OpenJDK 64-Bit Server VM (fastdebug 26-internal-adhoc.parasa.jdkdemotion, mixed mode, sharing, compressed oops, compressed class ptrs, g1 gc, linux-amd64) >> # Problematic frame: >> # V [libjvm.so+0x955a77] C2CodeStubList::emit(C2_MacroAssembler&)+0x227 >> # >> >> >> This PR fixes the errors in the above-mentioned tests. >> >> Currently, the ConvertF2I macro works as follows: >> >> >> vcvttss2si %xmm1,%eax >> cmp $0x80000000,%eax >> je STUB >> CONTINUE: >> >> STUB: >> sub $0x8,%rsp >> vmovss %xmm1,(%rsp) >> call Stub::f2i_fixup ; {runtime_call StubRoutines (initial stubs)} >> pop %rax >> jmp CONTINUE >> >> >> The maximum size (max_size) of the stub is precomputed as 23. However, as seen in the convertF2I_slowpath implementation (below), the usage of pop(dst) instruction increases the instruction encoding size by 1 byte if dst is an extended general-purpose register (R16-R31) . >> >> For example, `pop (r15)` is encoded as `41 5f`, whereas `pop(r21)` is encoded as `d5 10 5d`. >> >> >> >> >> static void convertF2I_slowpath(C2_MacroAssembler& masm, C2GeneralStub& stub) { >> #define __ masm. >> Register dst = stub.data<0>(); >> XMMRegister src = stub.data<1>(); >> address target = stub.data<2>(); >> __ bind(stub.entry()); >> __ subptr(rsp, 8); >> __ mo... 
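The one-byte growth described in the quoted text can be reproduced with a small standalone sketch. It is not HotSpot code: `encode_pop` is a hypothetical helper that follows the x86-64 encoding of `pop r64` (opcode `58+rd`, a REX prefix with only B set for R8-R15, and the APX REX2 prefix for R16-R31). It is only meant to match the two encodings quoted in the description.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of HotSpot): encode `pop r64` for register
// number reg (0..31), following the x86-64 and APX REX2 encoding rules.
std::vector<uint8_t> encode_pop(unsigned reg) {
  assert(reg < 32);
  const uint8_t opcode = static_cast<uint8_t>(0x58 + (reg & 7));  // 58+rd
  if (reg < 8) {
    return {opcode};        // legacy register: no prefix needed
  }
  if (reg < 16) {
    return {0x41, opcode};  // REX prefix with only the B bit set
  }
  // REX2: escape byte 0xD5, then a payload byte laid out as
  // M0 R4 X4 B4 W R3 X3 B3 (most significant bit first).
  const uint8_t b3 = (reg >> 3) & 1;
  const uint8_t b4 = (reg >> 4) & 1;
  const uint8_t payload = static_cast<uint8_t>((b4 << 4) | b3);
  return {0xD5, payload, opcode};
}
```

Under these rules, `encode_pop(15)` gives `41 5f` and `encode_pop(21)` gives `d5 10 5d`, the two byte sequences quoted above. The EGPR form is one byte longer than the REX form, which is consistent with the assertion reporting an actual size of 24 against an expected 23.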
> > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > Fix the change in the stub size by 1 byte Marked as reviewed by shade (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25787#pullrequestreview-2938833265 From rcastanedalo at openjdk.org Wed Jun 18 12:10:29 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 18 Jun 2025 12:10:29 GMT Subject: RFR: 8356708: C2: loop strip mining expansion doesn't take sunk stores into account [v6] In-Reply-To: References: Message-ID: On Tue, 17 Jun 2025 17:33:01 GMT, Roberto Castañeda Lozano wrote: > Re-running Oracle-internal testing... Testing (commit fa550f23bf8bd6469e3e109060165f8b04d7d143 applied on top of jdk-26+2) passed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25717#issuecomment-2983935784 From rcastanedalo at openjdk.org Wed Jun 18 12:24:36 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 18 Jun 2025 12:24:36 GMT Subject: RFR: 8020282: Generated code quality: redundant LEAs in the chained dereferences [v8] In-Reply-To: References: Message-ID: <-sIfkDUUle8cDr90vZkj1MZNpTyYEafXRBdeDGSjcz8=.07694331-6c06-4a54-acfc-807607657f73@github.com> On Wed, 18 Jun 2025 09:19:55 GMT, Manuel Hässig wrote: >> ## Summary >> >> On x86, chained dereferences of narrow oops at a constant offset from the base oop can use a `lea` instruction to perform the address computation in one go using the `leaP8Narrow`, `leaP32Narrow`, and `leaPCompressedOopOffset` matching rules.
However, the generated code contains an additional `lea` with an unused result: >> >> ; OptoAssembly >> 03d decode_heap_oop_not_null R8,R10 >> 041 leaq R10, [R12 + R10 << 3 + #12] (compressed oop addressing) ; ptr compressedoopoff32 >> >> ; x86 >> 0x00007f1f210625bd: lea (%r12,%r10,8),%r8 ; result is unused >> 0x00007f1f210625c1: lea 0xc(%r12,%r10,8),%r10 ; the same computation as decode, but with offset >> >> >> This PR adds a peephole optimization to remove such redundant `lea`s. >> >> ## The Issue in Detail >> >> The ideal subgraph producing redundant `lea`s, or rather redundant `decodeHeapOop_not_null`s, is `LoadN -> DecodeN -> AddP`, where both the address and base edge of the `AddP` originate from the `DecodeN`. After matching, this becomes >> >> LoadN -> decodeHeapOop_not_null -> leaP* >> ______________________________? >> >> where `leaP*` is either of `leaP8Narrow`, `leaP32Narrow`, or `leaPCompressedOopOffset` (depending on the heap location and size). Here, the base input of `leaP*` comes from the decode. Looking at the matching code path, we find that the `leaP*` rules match both the `AddP` and the `DecodeN`, since x86 can fold this, but the following code adds the decode back as the base input to `leaP*`: >> >> https://github.com/openjdk/jdk/blob/c29537740efb04e061732a700582d43b1956cff4/src/hotspot/share/opto/matcher.cpp#L1894-L1897 >> >> On its face, this is completely unnecessary if we matched a `leaP*`, since it already computes the result of the decode, so adding the `LoadN` node as base seems like the logical choice. However, if the derived oop computed by the `leaP*` gets added to an oop map, this `DecodeN` is needed as the base for the derived oop. Because as of now, derived oops in oop maps cannot have narrow base pointers. >> >> This leaves us with a handful of possible solutions: >> 1. implement narrow bases for derived oops in oop maps, >> 2. perform some dead code elimination after we know which oops are part of oop maps, >> 3. 
add a peephole optimization to simply remove unused `lea`s. >> >> Option 1 would have been ideal in the sense that it is the earliest possible point to remove the decode, which would simplify the graph and reduce pressure on the regi... > > Manuel Hässig has updated the pull request incrementally with three additional commits since the last revision: > > - Simplify and explain scenarios > - Simplify tests further > - Remove empty line Marked as reviewed by rcastanedalo (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25471#pullrequestreview-2938986477 From jbhateja at openjdk.org Wed Jun 18 12:53:29 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 18 Jun 2025 12:53:29 GMT Subject: RFR: 8359386: Fix incorrect value for max_size of C2CodeStub when APX is used [v3] In-Reply-To: References: <5ou3DR0HW2MlRv79ufOutW0dpq29fGAgnFzWYDo6rhk=.33789d8d-4174-4a7e-aa0d-26603911f760@github.com> Message-ID: <_DRTA-1YMbiBv-976Fv-2_biPAr2zqPDnz2A-uUKnx0=.251fbfaf-cc53-4703-97b1-16a25b8e3b7b@github.com> On Tue, 17 Jun 2025 04:07:14 GMT, Srinivas Vamsi Parasa wrote: >> The goal of this PR is to fix the value of max_size of the C2CodeStub hardcoded in the C2_MacroAssembler::convertF2I() function when Intel APX instructions are used. Currently, max_size is hardcoded to 23 (introduced in [JDK-8306706](https://bugs.openjdk.org/browse/JDK-8306706)). However, this value is incorrect when Intel APX instructions with extended general-purpose registers (EGPRs) are used in the code stub, as using EGPRs with APX instructions increases the instruction encoding size by 1 additional byte.
>> >> Without this fix, we see the following error for the C2 compiler tests below: >> >> compiler/vectorization/runner/ArrayTypeConvertTest.java >> compiler/intrinsics/zip/TestFpRegsABI.java >> >> >> >> >> # A fatal error has been detected by the Java Runtime Environment: >> # >> # Internal Error (/src/hotspot/share/opto/c2_CodeStubs.cpp:50), pid=3961123, tid=3961332 >> # assert(max_size >= actual_size) failed: Expected stub size (23) must be larger than or equal to actual stub size (24) >> # >> # JRE version: OpenJDK Runtime Environment (26.0) (fastdebug build 26-internal-adhoc.parasa.jdkdemotion) >> # Java VM: OpenJDK 64-Bit Server VM (fastdebug 26-internal-adhoc.parasa.jdkdemotion, mixed mode, sharing, compressed oops, compressed class ptrs, g1 gc, linux-amd64) >> # Problematic frame: >> # V [libjvm.so+0x955a77] C2CodeStubList::emit(C2_MacroAssembler&)+0x227 >> # >> >> >> This PR fixes the errors in the above-mentioned tests. >> >> Currently, the ConvertF2I macro works as follows: >> >> >> vcvttss2si %xmm1,%eax >> cmp $0x80000000,%eax >> je STUB >> CONTINUE: >> >> STUB: >> sub $0x8,%rsp >> vmovss %xmm1,(%rsp) >> call Stub::f2i_fixup ; {runtime_call StubRoutines (initial stubs)} >> pop %rax >> jmp CONTINUE >> >> >> The maximum size (max_size) of the stub is precomputed as 23. However, as seen in the convertF2I_slowpath implementation (below), the usage of pop(dst) instruction increases the instruction encoding size by 1 byte if dst is an extended general-purpose register (R16-R31) . >> >> For example, `pop (r15)` is encoded as `41 5f`, whereas `pop(r21)` is encoded as `d5 10 5d`. >> >> >> >> >> static void convertF2I_slowpath(C2_MacroAssembler& masm, C2GeneralStub& stub) { >> #define __ masm. >> Register dst = stub.data<0>(); >> XMMRegister src = stub.data<1>(); >> address target = stub.data<2>(); >> __ bind(stub.entry()); >> __ subptr(rsp, 8); >> __ mo... 
> > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > Fix the change in the stub size by 1 byte My testing passed. LGTM ------------- Marked as reviewed by jbhateja (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25787#pullrequestreview-2939079041 From roland at openjdk.org Wed Jun 18 13:11:46 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 18 Jun 2025 13:11:46 GMT Subject: RFR: 8356708: C2: loop strip mining expansion doesn't take sunk stores into account [v8] In-Reply-To: References: Message-ID: > `test1()` has a counted loop with a `Store` to `field`. That `Store` > is sunk out of loop. When the `OuterStripMinedLoop` is expanded, only > `Phi`s that exist at the inner loop are added to the outer > loop. There's no `Phi` for the slice of the sunk `Store` (because > there's no `Store` left in the inner loop) so no `Phi` is added for > that slice to the outer loop. As a result, there's a missing anti > dependency for `Load` of `field` that's before the loop and it can be > scheduled inside the outer strip mined loop which is incorrect. > > `test2()` is the same as `test1()` but with a chain of 2 `Store`s. > > `test3()` is another variant where a `Store` is left in the inner loop > after one is sunk out of it so the inner loop still has a `Phi`. As a > result, the outer loop also gets a `Phi` but it's incorrectly wired as > the sunk `Store` should be the input along the backedge but is > not. That one doesn't cause any failure AFAICT. > > The fix I propose is some extra logic at expansion of the > `OuterStripMinedLoop` to handle these corner cases. 
Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25717/files - new: https://git.openjdk.org/jdk/pull/25717/files/347cffd6..90af607a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25717&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25717&range=06-07 Stats: 18 lines in 1 file changed: 6 ins; 0 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/25717.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25717/head:pull/25717 PR: https://git.openjdk.org/jdk/pull/25717 From roland at openjdk.org Wed Jun 18 13:11:47 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 18 Jun 2025 13:11:47 GMT Subject: RFR: 8356708: C2: loop strip mining expansion doesn't take sunk stores into account [v6] In-Reply-To: <3lJNOMkNydPpod5qe4WDihDBokdoNSgQR376FgYkeWc=.8d363894-80bd-4c55-9b76-257ea3cca35f@github.com> References: <3lJNOMkNydPpod5qe4WDihDBokdoNSgQR376FgYkeWc=.8d363894-80bd-4c55-9b76-257ea3cca35f@github.com> Message-ID: <-FQgy632Pq4KjdXxFQnaz1Vh4KrVTKRORqAW8OtztNw=.9805a90a-4993-44f3-baaf-891dc75ad77c@github.com> On Wed, 18 Jun 2025 07:16:49 GMT, Emanuel Peter wrote: > I have a few more minor suggestions :) New commit should cover your latest comments. Can you please have another look @eme64 ? ------------- PR Comment: https://git.openjdk.org/jdk/pull/25717#issuecomment-2984141100 From roland at openjdk.org Wed Jun 18 13:11:47 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 18 Jun 2025 13:11:47 GMT Subject: RFR: 8356708: C2: loop strip mining expansion doesn't take sunk stores into account [v7] In-Reply-To: References: Message-ID: <5Ux_6rUt9dtmICBrFavCIGGQp8shno1zYG7mbVtO5zs=.de8b7a55-a452-42df-a9e7-553addfb51a7@github.com> On Wed, 18 Jun 2025 11:17:03 GMT, Emanuel Peter wrote: > * Is this really something we want to put into JDK 25? Feels high-risk and it's an old issue after all.
Maybe we can push this to JDK26 first, and backport a little later? Either way is fine with me. It does feel like a nasty issue that wouldn't be easy to diagnose if someone runs into it in the wild. > * Is this really a regression from [JDK-8281322](https://bugs.openjdk.org/browse/JDK-8281322)? If not, the affects version in JBS should be updated such that we'll consider this for backporting. I don't think it is. It's an issue that has existed for as long as loop strip mining has, AFAICT. I haven't checked how far back the test reproduces it though. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25717#issuecomment-2984149787 From syan at openjdk.org Wed Jun 18 13:32:05 2025 From: syan at openjdk.org (SendaoYan) Date: Wed, 18 Jun 2025 13:32:05 GMT Subject: RFR: 8359922: Incorrect comment for variable NMethodSizeLimit Message-ID: Hi all, This PR fixes an incorrect comment for the variable `NMethodSizeLimit`. The default value of `NMethodSizeLimit` is `64K * wordSize`. Trivial fix, no risk. ------------- Commit messages: - 8359922: Incorrect comment for variable NMethodSizeLimit Changes: https://git.openjdk.org/jdk/pull/25873/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25873&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8359922 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25873.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25873/head:pull/25873 PR: https://git.openjdk.org/jdk/pull/25873 From thartmann at openjdk.org Wed Jun 18 14:15:29 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 18 Jun 2025 14:15:29 GMT Subject: RFR: 8359922: Incorrect comment for variable NMethodSizeLimit In-Reply-To: References: Message-ID: On Wed, 18 Jun 2025 13:26:36 GMT, SendaoYan wrote: > Hi all, > > This PR fixes an incorrect comment for the variable `NMethodSizeLimit`. The [default value](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/c1/c1_globals.hpp#L279) of `NMethodSizeLimit` is `64K * wordSize`.
> > Trivial fix, no risk. I think this flag will go away with [JDK-8358578](https://bugs.openjdk.org/browse/JDK-8358578), so it's not worth fixing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25873#issuecomment-2984391894 From mhaessig at openjdk.org Wed Jun 18 14:19:41 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Wed, 18 Jun 2025 14:19:41 GMT Subject: RFR: 8354727: CompilationPolicy creates too many compiler threads when code cache space is scarce Message-ID: <29uhzjC6hIy_YrycL72lqIott9dPDIYrWibtvFjvVsg=.ef3b2c29-e848-4acd-a264-d18cec07bd44@github.com> Running java -XX:+SegmentedCodeCache -XX:ReservedCodeCacheSize=10M -XX:NonNMethodCodeHeapSize=6M \ -XX:ProfiledCodeHeapSize=5M -XX:NonProfiledCodeHeapSize=5M -version on a machine with more than 285 cores fails with the message that the specified `NonNMethodCodeHeapSize` is too small to fit all compiler buffers (instead of failing because the sum of the heaps is larger than the `ReservedCodeCacheSize`). Hence, the calculated compiler count is too high. This is due to CompilationPolicy::initialize() checking how many compiler buffers fit into the `ReservedCodeCacheSize`. However, in the case above, this is significantly larger than `NonNMethodCodeHeapSize` (especially on a debug build) and causes a check changed in #17244 to fail. That check was changed to check that all compiler buffers fit into the `NonNMethodCodeHeap` instead of the `NonNMethodCodeHeap` having at least a size of `CodeCacheMinimumUseSpace`. # Changes This PR fixes the calculation of the `CICompilerCount` ergonomic. Firstly, @shipilev kindly provided a fix so that the compiler buffer size used in the calculation is also correct if we only have C2. Secondly, `NonNMethodHeapSize` is used as the maximum buffer size available for compiler buffers in the calculation of the maximum number of compiler threads instead of `ReservedCodeCacheSize`.
Therefore, the check failing in the explanation above can never fail because we set the number of compiler threads only so high that they will always fit into the `NonNMethodCodeHeap`. This changes how many compiler threads are created by the `CICompilerCount` ergonomic. For the default value `NonNMethodCodeHeapSize=5M`, this limit is 24 compiler threads on a system with 285 cores or more for product builds and 20 threads on a system with 145 cores or more for debug builds. # Testing - [ ] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15733154809) - [ ] tier1 through tier3 plus Oracle internal testing on our supported platforms ------------- Depends on: https://git.openjdk.org/jdk/pull/25791 Commit messages: - Calculate buffer size correctly for c2_only - Caclulate how many compiler buffers fit into NonNMethodCodeHeap - Clarify endif Changes: https://git.openjdk.org/jdk/pull/25872/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25872&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8354727 Stats: 10 lines in 1 file changed: 7 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/25872.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25872/head:pull/25872 PR: https://git.openjdk.org/jdk/pull/25872 From sparasa at openjdk.org Wed Jun 18 14:51:32 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Wed, 18 Jun 2025 14:51:32 GMT Subject: RFR: 8359386: Fix incorrect value for max_size of C2CodeStub when APX is used [v3] In-Reply-To: References: <5ou3DR0HW2MlRv79ufOutW0dpq29fGAgnFzWYDo6rhk=.33789d8d-4174-4a7e-aa0d-26603911f760@github.com> Message-ID: On Wed, 18 Jun 2025 08:45:00 GMT, Tobias Hartmann wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix the change in the stub size by 1 byte > > Looks good to me. Thank you Tobias (@TobiHartmann), Aleksey (@shipilev) and Jatin (@jatin-bhateja) for approving the changes!
------------- PR Comment: https://git.openjdk.org/jdk/pull/25787#issuecomment-2984520823 From duke at openjdk.org Wed Jun 18 14:51:33 2025 From: duke at openjdk.org (duke) Date: Wed, 18 Jun 2025 14:51:33 GMT Subject: RFR: 8359386: Fix incorrect value for max_size of C2CodeStub when APX is used [v3] In-Reply-To: References: <5ou3DR0HW2MlRv79ufOutW0dpq29fGAgnFzWYDo6rhk=.33789d8d-4174-4a7e-aa0d-26603911f760@github.com> Message-ID: On Tue, 17 Jun 2025 04:07:14 GMT, Srinivas Vamsi Parasa wrote: >> The goal of this PR is to fix the value of max_size of the C2CodeStub hardcoded in the C2_MacroAssembler::convertF2I() function when Intel APX instructions are used. Currently, max_size is hardcoded to 23 (introduced in [JDK-8306706](https://bugs.openjdk.org/browse/JDK-8306706)). However, this value is incorrect when Intel APX instructions with extended general-purpose registers (EGPRs) are used in the code stub, as using EGPRs with APX instructions increases the instruction encoding size by 1 additional byte. >> >> Without this fix, we see the following error for the C2 compiler tests below: >> >> compiler/vectorization/runner/ArrayTypeConvertTest.java >> compiler/intrinsics/zip/TestFpRegsABI.java >> >> >> >> >> # A fatal error has been detected by the Java Runtime Environment: >> # >> # Internal Error (/src/hotspot/share/opto/c2_CodeStubs.cpp:50), pid=3961123, tid=3961332 >> # assert(max_size >= actual_size) failed: Expected stub size (23) must be larger than or equal to actual stub size (24) >> # >> # JRE version: OpenJDK Runtime Environment (26.0) (fastdebug build 26-internal-adhoc.parasa.jdkdemotion) >> # Java VM: OpenJDK 64-Bit Server VM (fastdebug 26-internal-adhoc.parasa.jdkdemotion, mixed mode, sharing, compressed oops, compressed class ptrs, g1 gc, linux-amd64) >> # Problematic frame: >> # V [libjvm.so+0x955a77] C2CodeStubList::emit(C2_MacroAssembler&)+0x227 >> # >> >> >> This PR fixes the errors in the above-mentioned tests.
>> >> Currently, the ConvertF2I macro works as follows: >> >> >> vcvttss2si %xmm1,%eax >> cmp $0x80000000,%eax >> je STUB >> CONTINUE: >> >> STUB: >> sub $0x8,%rsp >> vmovss %xmm1,(%rsp) >> call Stub::f2i_fixup ; {runtime_call StubRoutines (initial stubs)} >> pop %rax >> jmp CONTINUE >> >> >> The maximum size (max_size) of the stub is precomputed as 23. However, as seen in the convertF2I_slowpath implementation (below), the usage of pop(dst) instruction increases the instruction encoding size by 1 byte if dst is an extended general-purpose register (R16-R31) . >> >> For example, `pop (r15)` is encoded as `41 5f`, whereas `pop(r21)` is encoded as `d5 10 5d`. >> >> >> >> >> static void convertF2I_slowpath(C2_MacroAssembler& masm, C2GeneralStub& stub) { >> #define __ masm. >> Register dst = stub.data<0>(); >> XMMRegister src = stub.data<1>(); >> address target = stub.data<2>(); >> __ bind(stub.entry()); >> __ subptr(rsp, 8); >> __ mo... > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > Fix the change in the stub size by 1 byte @vamsi-parasa Your change (at version 4fe56be6bf1845b305d76c78633780381450190f) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25787#issuecomment-2984525959 From epeter at openjdk.org Wed Jun 18 14:51:38 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 18 Jun 2025 14:51:38 GMT Subject: RFR: 8347901: C2 should remove unused leaf / pure runtime calls In-Reply-To: References: Message-ID: <2f8GCEXnY2mwaz9N9dQK_APi0A31z8Y7CM6YPms4Pp0=.a0651468-def0-4101-94bb-8fabd6d79b1b@github.com> On Wed, 11 Jun 2025 16:18:41 GMT, Marc Chevalier wrote: > A first part toward a better support of pure functions, but this time, with guidance from @iwanowww. > > ## Pure Functions > > Pure functions (considered here) are functions that have no side effects, no effect on the control flow (no exception or such), cannot deopt etc.. 
It's really a function that you can execute anywhere, with whichever arguments without effect other than wasting time. Integer division is not pure as dividing by zero is throwing. But many floating point functions will just return `NaN` or `+/-infinity` in problematic cases. > > ## Scope > > We are not going all powerful for now! It's mostly about identifying some pure functions and being able to remove them if the result is unused. Some other things are not part of this PR, on purpose. Especially, this PR doesn't propose a way to move pure calls around. The reason is that pure calls are later expanded into regular calls, which require a control input. To be able to do the expansion, we just keep the control in the pure call as well. > > ## Implementation Overview > > We created here some new node kind for pure calls, inheriting leaf calls, that are expanded into regular leaf calls during final graph reshaping. The possibility to support pure call directly in AD file is left open. > > This PR also introduces `TupleNode` (largely based on an original idea/implem of @iwanowww), that just tie multiple input together and play well with `ProjNode`: the n-th projection of a `TupleNode` is the n-th input of the tuple. This is a convenient way to skip and remove nodes from the graph while delegating the difficulty of the surgery to the trusted IGVN's implementation. > > Thanks, > Marc @marc-chevalier Nice work! I have a first set of qestions / suggestions :) src/hotspot/share/opto/callnode.cpp line 1306: > 1304: > 1305: //============================================================================= > 1306: bool CallLeafPureNode::is_unused() const { Can you add a quick comment why this check implies that the node is not used, i.e. what that means? src/hotspot/share/opto/callnode.cpp line 1309: > 1307: return proj_out_or_null(TypeFunc::Parms) == nullptr; > 1308: } > 1309: // We make a tuple of the global input state + TOP for the output values. 
Suggestion: } // We make a tuple of the global input state + TOP for the output values. Nit: there should be a newline between the methods. src/hotspot/share/opto/callnode.cpp line 1335: > 1333: if (can_reshape && is_unused()) { > 1334: return make_tuple_of_input_state_and_top_return_values(phase->C); > 1335: } Can you add a code comment what this does? src/hotspot/share/opto/compile.cpp line 3302: > 3300: break; > 3301: case Op_CallLeafPure: { > 3302: if (!Matcher::match_rule_supported(Op_CallLeafPure)) { Suggestion: // If the pure call is not supported, then lower to a CallLeaf. if (!Matcher::match_rule_supported(Op_CallLeafPure)) { src/hotspot/share/opto/compile.cpp line 3313: > 3311: for (unsigned int i = 0; i < call->tf()->domain()->cnt() - TypeFunc::Parms; i++) { > 3312: new_call->init_req(TypeFunc::Parms + i, call->in(TypeFunc::Parms + i)); > 3313: } The `TypeFunc::Parms` offsets are a bit confusing / unnecessary. Why not: Suggestion: // Copy all . for (unsigned int i = TypeFunc::Parms; i < call->tf()->domain()->cnt(); i++) { new_call->init_req(i, call->in(i)); } src/hotspot/share/opto/divnode.cpp line 1523: > 1521: return nullptr; > 1522: } > 1523: if (proj_out_or_null(TypeFunc::Control) == nullptr) { // dead node Interesting. You have extracted a method for `is_unused`, which you use multiple times. But you did not do it for this check which you also use multiple times. You could also wrap this in some method like `is_dead`. src/hotspot/share/opto/divnode.cpp line 1530: > 1528: > 1529: if (is_unused()) { > 1530: return make_tuple_of_input_state_and_top_return_values(igvn->C); What does this do? Add a short comment :) src/hotspot/share/opto/divnode.cpp line 1642: > 1640: } > 1641: assert(projs.catchall_ioproj == nullptr, "no exceptions from floating mod"); > 1642: assert(projs.catchall_catchproj == nullptr, "no exceptions from floating mod"); Why were you able to remove this? 
src/hotspot/share/opto/graphKit.cpp line 1916: > 1914: if (call->is_CallLeafPure()) { > 1915: return; > 1916: } Suggestion: if (call->is_CallLeafPure()) { // return; } src/hotspot/share/opto/multnode.cpp line 126: > 124: // Jumping over Tuples: the i-th projection of a Tuple is the i-th input of the Tuple. > 125: ctrl = ctrl->in(_con); > 126: } Do you need to special-case this here? Why does the `ProjNode::Identity` not suffice? Are there potentially other locations where we now would need this special logic? src/hotspot/share/opto/multnode.cpp line 177: > 175: } > 176: return this; > 177: } What would happen if we miss to do this optimization? I suppose we would have a Tuple left in the graph and get a bad AD file assert / deopt in product? src/hotspot/share/opto/multnode.hpp line 109: > 107: }; > 108: > 109: //------------------------------TupleNode--------------------------------------- These lines are rather unnecessary. More useful would be a description of what this Tuple is, and what it is used for. Are there any invariants about it? When do we expect them to appear or disappear? src/hotspot/share/opto/multnode.hpp line 113: > 111: const TypeTuple* _tf; > 112: > 113: template Does this need to be a template? Or would a type like `Node*` or `Node` suffice? ------------- Changes requested by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25760#pullrequestreview-2938357614 PR Review Comment: https://git.openjdk.org/jdk/pull/25760#discussion_r2154786291 PR Review Comment: https://git.openjdk.org/jdk/pull/25760#discussion_r2154059786 PR Review Comment: https://git.openjdk.org/jdk/pull/25760#discussion_r2154386828 PR Review Comment: https://git.openjdk.org/jdk/pull/25760#discussion_r2154770330 PR Review Comment: https://git.openjdk.org/jdk/pull/25760#discussion_r2154767905 PR Review Comment: https://git.openjdk.org/jdk/pull/25760#discussion_r2154784765 PR Review Comment: https://git.openjdk.org/jdk/pull/25760#discussion_r2154788329 PR Review Comment: https://git.openjdk.org/jdk/pull/25760#discussion_r2154790934 PR Review Comment: https://git.openjdk.org/jdk/pull/25760#discussion_r2154795672 PR Review Comment: https://git.openjdk.org/jdk/pull/25760#discussion_r2154804710 PR Review Comment: https://git.openjdk.org/jdk/pull/25760#discussion_r2154806581 PR Review Comment: https://git.openjdk.org/jdk/pull/25760#discussion_r2154808639 PR Review Comment: https://git.openjdk.org/jdk/pull/25760#discussion_r2154812833 From epeter at openjdk.org Wed Jun 18 14:51:38 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 18 Jun 2025 14:51:38 GMT Subject: RFR: 8347901: C2 should remove unused leaf / pure runtime calls In-Reply-To: <2f8GCEXnY2mwaz9N9dQK_APi0A31z8Y7CM6YPms4Pp0=.a0651468-def0-4101-94bb-8fabd6d79b1b@github.com> References: <2f8GCEXnY2mwaz9N9dQK_APi0A31z8Y7CM6YPms4Pp0=.a0651468-def0-4101-94bb-8fabd6d79b1b@github.com> Message-ID: On Wed, 18 Jun 2025 09:00:06 GMT, Emanuel Peter wrote: >> A first part toward a better support of pure functions, but this time, with guidance from @iwanowww. >> >> ## Pure Functions >> >> Pure functions (considered here) are functions that have no side effects, no effect on the control flow (no exception or such), cannot deopt etc.. 
It's really a function that you can execute anywhere, with whichever arguments without effect other than wasting time. Integer division is not pure as dividing by zero is throwing. But many floating point functions will just return `NaN` or `+/-infinity` in problematic cases. >> >> ## Scope >> >> We are not going all powerful for now! It's mostly about identifying some pure functions and being able to remove them if the result is unused. Some other things are not part of this PR, on purpose. Especially, this PR doesn't propose a way to move pure calls around. The reason is that pure calls are later expanded into regular calls, which require a control input. To be able to do the expansion, we just keep the control in the pure call as well. >> >> ## Implementation Overview >> >> We created here some new node kind for pure calls, inheriting leaf calls, that are expanded into regular leaf calls during final graph reshaping. The possibility to support pure call directly in AD file is left open. >> >> This PR also introduces `TupleNode` (largely based on an original idea/implem of @iwanowww), that just tie multiple input together and play well with `ProjNode`: the n-th projection of a `TupleNode` is the n-th input of the tuple. This is a convenient way to skip and remove nodes from the graph while delegating the difficulty of the surgery to the trusted IGVN's implementation. >> >> Thanks, >> Marc > > src/hotspot/share/opto/callnode.cpp line 1309: > >> 1307: return proj_out_or_null(TypeFunc::Parms) == nullptr; >> 1308: } >> 1309: // We make a tuple of the global input state + TOP for the output values. > > Suggestion: > > } > > // We make a tuple of the global input state + TOP for the output values. > > Nit: there should be a newline between the methods. Also: the comment says exactly what the method name says ... I'd suggest deleting the comment or writing something with additional information. 
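A minimal toy sketch of the projection rule quoted above (the n-th projection of a tuple is the n-th input of the tuple). This is not HotSpot code: `ToyNode` and `proj_identity` are hypothetical names made up for illustration, and the real `TupleNode` / `ProjNode` classes carry types and are simplified by IGVN rather than by a free function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy graph node. Hypothetical stand-in, NOT HotSpot's Node class.
struct ToyNode {
  enum Kind { Plain, Tuple, Proj };
  Kind kind = Plain;
  std::vector<ToyNode*> inputs;  // for Proj: inputs[0] is the node being projected
  std::size_t con = 0;           // for Proj: index of the selected output
};

// Identity-style simplification: Proj(i, Tuple(in_0, ..., in_n)) becomes in_i.
// Any other node is returned unchanged.
ToyNode* proj_identity(ToyNode* n) {
  if (n->kind != ToyNode::Proj) {
    return n;
  }
  ToyNode* src = n->inputs.at(0);
  if (src->kind != ToyNode::Tuple) {
    return n;  // projection of a regular multi-output node: nothing to fold
  }
  assert(n->con < src->inputs.size());
  return src->inputs[n->con];  // jump over the tuple
}
```

In this toy model, replacing an unused call by a tuple of its input state lets every projection of the call fold to the matching input, so the call drops out of the graph without hand-written rewiring of each user.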
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25760#discussion_r2154375205 From epeter at openjdk.org Wed Jun 18 14:51:39 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 18 Jun 2025 14:51:39 GMT Subject: RFR: 8347901: C2 should remove unused leaf / pure runtime calls In-Reply-To: References: <2f8GCEXnY2mwaz9N9dQK_APi0A31z8Y7CM6YPms4Pp0=.a0651468-def0-4101-94bb-8fabd6d79b1b@github.com> Message-ID: On Wed, 18 Jun 2025 11:36:55 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/callnode.cpp line 1309: >> >>> 1307: return proj_out_or_null(TypeFunc::Parms) == nullptr; >>> 1308: } >>> 1309: // We make a tuple of the global input state + TOP for the output values. >> >> Suggestion: >> >> } >> >> // We make a tuple of the global input state + TOP for the output values. >> >> Nit: there should be a newline between the methods. > > Also: the comment says exactly what the method name says ... I'd suggest deleting the comment or writing something with additional information. For example: why do you put TOP for the return values? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25760#discussion_r2154376599 From mhaessig at openjdk.org Wed Jun 18 14:59:28 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Wed, 18 Jun 2025 14:59:28 GMT Subject: RFR: 8359922: Incorrect comment for variable NMethodSizeLimit In-Reply-To: References: Message-ID: On Wed, 18 Jun 2025 14:12:36 GMT, Tobias Hartmann wrote: > I think this flag will go away with [JDK-8358578](https://bugs.openjdk.org/browse/JDK-8358578), so it's not worth fixing. Indeed, our timing is impeccable. I just opened #25876 to remove this flag. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25873#issuecomment-2984559340 From mhaessig at openjdk.org Wed Jun 18 15:00:04 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Wed, 18 Jun 2025 15:00:04 GMT Subject: RFR: 8358578: Small -XX:NMethodSizeLimit triggers "not in CodeBuffer memory" assert in C1 Message-ID: Running `java -XX:NMethodSizeLimit=100 -version` triggers an assert because the lower limit of the debug flag `NMethodSizeLimit` is too low. `NMethodSizeLimit` corresponds more or less directly to the C1 code buffer size. It was added as a debug flag in 2005 to make it easier to stress the code paths related to the buffer size. Nowadays, it is not used for any stressing, but it has caused a bunch of bugs ([JDK-8316653](https://bugs.openjdk.org/browse/JDK-8316653), [JDK-8318817](https://bugs.openjdk.org/browse/JDK-8318817), [JDK-8320682](https://bugs.openjdk.org/browse/JDK-8320682)). Therefore, this PR removes the debug flag `NMethodSizeLimit` and converts it to a constant. Because this removes the code causing the error, this bug does not have a regression test (in fact, it removes regression tests). 
# Testing - [ ] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15735062162) - [ ] tier1 through tier3 plus Oracle internal testing on all Oracle supported platforms ------------- Commit messages: - Remove NMethodSizeLimit and make C1 code buffer size a constant Changes: https://git.openjdk.org/jdk/pull/25876/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25876&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8358578 Stats: 56 lines in 6 files changed: 0 ins; 48 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/25876.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25876/head:pull/25876 PR: https://git.openjdk.org/jdk/pull/25876 From dlunden at openjdk.org Wed Jun 18 16:19:31 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Wed, 18 Jun 2025 16:19:31 GMT Subject: RFR: 8325478: Restructure the macro expansion compiler phase to not include macro elimination [v4] In-Reply-To: References: Message-ID: On Tue, 17 Jun 2025 14:06:11 GMT, Saranya Natarajan wrote: >> This changeset restructures the macro expansion phase to not include macro elimination and also adds a flag StressMacroElimination which randomizes macro elimination ordering for stress testing purposes. >> >> Changes: >> - Implemented a method `eliminate_opaque_looplimit_macro_nodes` that removes the functionality for eliminating Opaque and LoopLimit nodes from the `expand_macro_nodes ` method. >> - Introduced compiler phases` PHASE_AFTER_MACRO_ELIMINATION` >> - Added a new Ideal phase for individual macro elimination steps. >> - Implemented the flag `StressMacroElimination`. Added functionality tests for `StressMacroElimination`, similar to previous stress flag `StressMacroExpansion` ([JDK-8317349](https://bugs.openjdk.org/browse/JDK-8317349)). >> >> Below is a sample screenshot (IGV print level 4 ) mainly showing the new phase . 
>> ![image](https://github.com/user-attachments/assets/16013cd4-6ec6-4939-ac66-33bb03d59cd6) >> >> Questions to reviewers: >> - Is the new macro elimination phase OK, or should we change anything? >> - In `compile.cpp `, `PHASE_ITER_GVN_AFTER_ELIMINATION` follows `PHASE_AFTER_MACRO_ELIMINATION` in the current fix. Should `PHASE_ITER_GVN_AFTER_ELIMINATION` be removed ? >> >> Testing: >> GitHub Actions >> tier1 to tier5 on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. >> Tested that thousands of graphs are correctly opened and visualized with IGV using the same test used in ([JDK-8317349](https://bugs.openjdk.org/browse/JDK-8317349)) > > Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: > > update to IGV README based on review comment Changes requested by dlunden (Committer). src/hotspot/share/opto/compile.cpp line 2535: > 2533: print_method(PHASE_BEFORE_MACRO_EXPANSION, 3); > 2534: PhaseMacroExpand mex(igvn); > 2535: // Do not allow new macro nodes once we started to expand Suggestion: // Do not allow new macro nodes once we start to eliminate and expand I know you just moved this comment, but i think it's worth clarifying while we're at it. 
src/hotspot/share/opto/macro.cpp line 2467: > 2465: } > 2466: } > 2467: #ifndef PRODUCT Suggestion: #ifndef PRODUCT test/hotspot/jtreg/compiler/lib/ir_framework/CompilePhase.java line 106: > 104: AFTER_MACRO_EXPANSION( "After Macro Expansion"), > 105: AFTER_MACRO_ELIMINATION_STEP( "After Macro Elimination Step"), > 106: AFTER_MACRO_ELIMINATION( "After Macro Elimination"), Suggestion: Move the two elimination phases before `BEFORE_MACRO_EXPANSION` for consistency with `phasetype.hpp` ------------- PR Review: https://git.openjdk.org/jdk/pull/25682#pullrequestreview-2939868711 PR Review Comment: https://git.openjdk.org/jdk/pull/25682#discussion_r2155000680 PR Review Comment: https://git.openjdk.org/jdk/pull/25682#discussion_r2155001766 PR Review Comment: https://git.openjdk.org/jdk/pull/25682#discussion_r2155017333 From dlunden at openjdk.org Wed Jun 18 16:19:32 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Wed, 18 Jun 2025 16:19:32 GMT Subject: RFR: 8325478: Restructure the macro expansion compiler phase to not include macro elimination [v2] In-Reply-To: References: <_jMbdLzfV1MOFrUJH7J6-zXWLabQTOTsfb2hWvEL3Kc=.fede59e2-dd81-4128-9b7a-b8c47334a062@github.com> Message-ID: On Tue, 17 Jun 2025 14:10:26 GMT, Saranya Natarajan wrote: >> src/hotspot/share/opto/macro.cpp line 2554: >> >>> 2552: bool PhaseMacroExpand::expand_macro_nodes() { >>> 2553: // Do not allow new macro nodes once we started to expand >>> 2554: C->reset_allow_macro_nodes(); >> >> Same here, this call is later compared to before (it is at the top of the old `expand_macro_nodes`, before `eliminate_macro_nodes`). Is this a safe move? > > I believe, this is also fine. However, I have moved it to `compile.cpp` to mimic the scenario as done previously before this PR. Good! Better safe than sorry, especially if there is no strong reason to move it. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25682#discussion_r2155022384 From syan at openjdk.org Wed Jun 18 16:21:32 2025 From: syan at openjdk.org (SendaoYan) Date: Wed, 18 Jun 2025 16:21:32 GMT Subject: RFR: 8359922: Incorrect comment for variable NMethodSizeLimit In-Reply-To: References: Message-ID: On Wed, 18 Jun 2025 14:57:07 GMT, Manuel Hässig wrote: > impeccable Okay, I will close this PR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25873#issuecomment-2984897835 From syan at openjdk.org Wed Jun 18 16:21:33 2025 From: syan at openjdk.org (SendaoYan) Date: Wed, 18 Jun 2025 16:21:33 GMT Subject: Withdrawn: 8359922: Incorrect comment for variable NMethodSizeLimit In-Reply-To: References: Message-ID: On Wed, 18 Jun 2025 13:26:36 GMT, SendaoYan wrote: > Hi all, > > This PR fixes an incorrect comment for the variable `NMethodSizeLimit`. The [default value](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/c1/c1_globals.hpp#L279) of `NMethodSizeLimit` is `64K * wordSize`. > > Trivial fix, no risk. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/25873 From dlunden at openjdk.org Wed Jun 18 16:25:29 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Wed, 18 Jun 2025 16:25:29 GMT Subject: RFR: 8325478: Restructure the macro expansion compiler phase to not include macro elimination [v2] In-Reply-To: References: <_jMbdLzfV1MOFrUJH7J6-zXWLabQTOTsfb2hWvEL3Kc=.fede59e2-dd81-4128-9b7a-b8c47334a062@github.com> Message-ID: On Tue, 17 Jun 2025 14:09:34 GMT, Saranya Natarajan wrote: >> src/hotspot/share/opto/macro.cpp line 2482: >> >>> 2480: return; >>> 2481: } >>> 2482: refine_strip_mined_loop_macro_nodes(); >> >> This call now happens later than before, right? In the previous version of `expand_macro_nodes`, it ran before the call to `eliminate_macro_nodes`. Is it safe to move it in this way?
> > This placement of `refine_strip_mined_loop_macro_nodes()` is ok as it only affects the functionality in the loop of the `eliminate_opaque_looplimit_macro_nodes` method. Could you elaborate a bit on why this is the case? Just looking briefly at `OuterStripMinedLoopNode::adjust_strip_mined_loop` (called from `refine_strip_mined_loop_macro_nodes`), I'm not convinced there are no other interactions. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25682#discussion_r2155032375 From kvn at openjdk.org Wed Jun 18 16:34:28 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 18 Jun 2025 16:34:28 GMT Subject: RFR: 8358578: Small -XX:NMethodSizeLimit triggers "not in CodeBuffer memory" assert in C1 In-Reply-To: References: Message-ID: On Wed, 18 Jun 2025 14:07:29 GMT, Manuel Hässig wrote: > Running `java -XX:NMethodSizeLimit=100 -version` triggers an assert because the lower limit of the debug flag `NMethodSizeLimit` is too low. > > `NMethodSizeLimit` corresponds more or less directly to the C1 code buffer size. It was added as a debug flag in 2005 to make it easier to stress the code paths related to the buffer size. Nowadays, it is not used for any stressing, but it has caused a bunch of bugs ([JDK-8316653](https://bugs.openjdk.org/browse/JDK-8316653), [JDK-8318817](https://bugs.openjdk.org/browse/JDK-8318817), [JDK-8320682](https://bugs.openjdk.org/browse/JDK-8320682)). Therefore, this PR removes the debug flag `NMethodSizeLimit` and converts it to a constant. > > Because this removes the code causing the error, this bug does not have a regression test (in fact, it removes regression tests). > > # Testing > > - [ ] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15735062162) > - [ ] tier1 through tier3 plus Oracle internal testing on all Oracle supported platforms Good. ------------- Marked as reviewed by kvn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/25876#pullrequestreview-2939957315 From cslucas at openjdk.org Wed Jun 18 16:46:34 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 18 Jun 2025 16:46:34 GMT Subject: RFR: 8359064: Expose reason for marking nmethod non-entrant to JVMCI client [v6] In-Reply-To: References: Message-ID: On Fri, 13 Jun 2025 20:05:20 GMT, Tom Rodriguez wrote: >> Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix some remaining renames. > > Sorry for the confusion on my part. The lack of a PR that's consuming these changes makes it harder to know which parts are the important ones. @tkrodriguez , @dougxc - are you OK with the latest changes I pushed? ------------- PR Comment: https://git.openjdk.org/jdk/pull/25706#issuecomment-2984979703 From dnsimon at openjdk.org Wed Jun 18 16:56:36 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 18 Jun 2025 16:56:36 GMT Subject: RFR: 8359064: Expose reason for marking nmethod non-entrant to JVMCI client [v7] In-Reply-To: References: Message-ID: On Mon, 16 Jun 2025 22:22:48 GMT, Cesar Soares Lucas wrote: >> We recently introduced a way to set the reason why a nmethod was being marked as `not entrant`, see [here](https://github.com/openjdk/jdk/pull/23980) and [here](https://github.com/openjdk/jdk/pull/25338). >> >> This PR is to expose in the JVMCI interface the reason why the nmethod was flagged as `not entrant`. This will allow JVMCI-based compilers to implement heuristics to handle re-compilations differently based on what happened to earlier versions of a method, for instance, this will likely be used to address this [RFE in Truffle](https://github.com/oracle/graal/issues/11045 >> ). >> >> Tested on Linux x86_64, ARM with JTREG tier 1-3. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Prevent overriding invalidation reason. 
src/hotspot/share/jvmci/jvmciCompilerToVM.cpp line 1408: > 1406: > 1407: C2V_VMENTRY(void, invalidateHotSpotNmethod, (JNIEnv* env, jobject, jobject hs_nmethod, jboolean deoptimize, jint invalidation_reason)) > 1408: #ifdef ASSERT We prefer runtime checks that throw Java exceptions over assertions in this JVMCI code: int first = static_cast<int>(nmethod::InvalidationReason::UNKNOWN); int last = static_cast<int>(nmethod::InvalidationReason::LAST_REASON); if (invalidation_reason < first || invalidation_reason >= last) { JVMCI_THROW_MSG(IllegalArgumentException, err_msg("Invalid invalidation_reason: %d", invalidation_reason)); } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25706#discussion_r2155092025 From kvn at openjdk.org Wed Jun 18 16:56:39 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 18 Jun 2025 16:56:39 GMT Subject: RFR: 8020282: Generated code quality: redundant LEAs in the chained dereferences [v8] In-Reply-To: References: Message-ID: On Wed, 18 Jun 2025 09:19:55 GMT, Manuel Hässig wrote: >> ## Summary >> >> On x86, chained dereferences of narrow oops at a constant offset from the base oop can use a `lea` instruction to perform the address computation in one go using the `leaP8Narrow`, `leaP32Narrow`, and `leaPCompressedOopOffset` matching rules. However, the generated code contains an additional `lea` with an unused result: >> >> ; OptoAssembly >> 03d decode_heap_oop_not_null R8,R10 >> 041 leaq R10, [R12 + R10 << 3 + #12] (compressed oop addressing) ; ptr compressedoopoff32 >> >> ; x86 >> 0x00007f1f210625bd: lea (%r12,%r10,8),%r8 ; result is unused >> 0x00007f1f210625c1: lea 0xc(%r12,%r10,8),%r10 ; the same computation as decode, but with offset >> >> >> This PR adds a peephole optimization to remove such redundant `lea`s.
>> >> ## The Issue in Detail >> >> The ideal subgraph producing redundant `lea`s, or rather redundant `decodeHeapOop_not_null`s, is `LoadN -> DecodeN -> AddP`, where both the address and base edge of the `AddP` originate from the `DecodeN`. After matching, this becomes >> >> LoadN -> decodeHeapOop_not_null -> leaP* >> ______________________________? >> >> where `leaP*` is either of `leaP8Narrow`, `leaP32Narrow`, or `leaPCompressedOopOffset` (depending on the heap location and size). Here, the base input of `leaP*` comes from the decode. Looking at the matching code path, we find that the `leaP*` rules match both the `AddP` and the `DecodeN`, since x86 can fold this, but the following code adds the decode back as the base input to `leaP*`: >> >> https://github.com/openjdk/jdk/blob/c29537740efb04e061732a700582d43b1956cff4/src/hotspot/share/opto/matcher.cpp#L1894-L1897 >> >> On its face, this is completely unnecessary if we matched a `leaP*`, since it already computes the result of the decode, so adding the `LoadN` node as base seems like the logical choice. However, if the derived oop computed by the `leaP*` gets added to an oop map, this `DecodeN` is needed as the base for the derived oop. Because as of now, derived oops in oop maps cannot have narrow base pointers. >> >> This leaves us with a handful of possible solutions: >> 1. implement narrow bases for derived oops in oop maps, >> 2. perform some dead code elimination after we know which oops are part of oop maps, >> 3. add a peephole optimization to simply remove unused `lea`s. >> >> Option 1 would have been ideal in the sense, that it is the earliest possible point to remove the decode, which would simplify the graph and reduce pressure on the regi... 
> > Manuel Hässig has updated the pull request incrementally with three additional commits since the last revision: > > - Simplify and explain scenarios > - Simplify tests further > - Remove empty line src/hotspot/cpu/x86/peephole_x86_64.cpp line 359: > 357: > 358: // Remove spill for the decode if the spill node does not have any other uses. > 359: if (is_spill && decode_spill->outcnt() == 1 && block->contains(decode_spill)) { You can rearrange this to avoid forward declaration of `decode_spill`: if (is_spill) { MachNode* decode_spill = decode->in(1)->as_Mach(); if (decode_spill->outcnt() == 1 && block->contains(decode_spill)) { ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25471#discussion_r2155092759 From kvn at openjdk.org Wed Jun 18 16:59:37 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 18 Jun 2025 16:59:37 GMT Subject: RFR: 8020282: Generated code quality: redundant LEAs in the chained dereferences [v8] In-Reply-To: References: Message-ID: On Wed, 18 Jun 2025 16:54:15 GMT, Vladimir Kozlov wrote: >> Manuel Hässig has updated the pull request incrementally with three additional commits since the last revision: >> >> - Simplify and explain scenarios >> - Simplify tests further >> - Remove empty line > > src/hotspot/cpu/x86/peephole_x86_64.cpp line 359: > >> 357: >> 358: // Remove spill for the decode if the spill node does not have any other uses. >> 359: if (is_spill && decode_spill->outcnt() == 1 && block->contains(decode_spill)) { > > You can rearrange this to avoid forward declaration of `decode_spill`: > > > if (is_spill) { > MachNode* decode_spill = decode->in(1)->as_Mach(); > if (decode_spill->outcnt() == 1 && block->contains(decode_spill)) { Or are you concerned that the inputs of `decode` could be modified, which is why you cached it?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25471#discussion_r2155096769 From dnsimon at openjdk.org Wed Jun 18 16:59:33 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 18 Jun 2025 16:59:33 GMT Subject: RFR: 8359064: Expose reason for marking nmethod non-entrant to JVMCI client [v8] In-Reply-To: References: Message-ID: On Tue, 17 Jun 2025 00:39:54 GMT, Cesar Soares Lucas wrote: >> We recently introduced a way to set the reason why a nmethod was being marked as `not entrant`, see [here](https://github.com/openjdk/jdk/pull/23980) and [here](https://github.com/openjdk/jdk/pull/25338). >> >> This PR is to expose in the JVMCI interface the reason why the nmethod was flagged as `not entrant`. This will allow JVMCI-based compilers to implement heuristics to handle re-compilations differently based on what happened to earlier versions of a method, for instance, this will likely be used to address this [RFE in Truffle](https://github.com/oracle/graal/issues/11045 >> ). >> >> Tested on Linux x86_64, ARM with JTREG tier 1-3. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Remove extra space Marked as reviewed by dnsimon (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25706#pullrequestreview-2940038030 From dnsimon at openjdk.org Wed Jun 18 16:59:34 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 18 Jun 2025 16:59:34 GMT Subject: RFR: 8359064: Expose reason for marking nmethod non-entrant to JVMCI client [v6] In-Reply-To: References: Message-ID: On Fri, 13 Jun 2025 20:05:20 GMT, Tom Rodriguez wrote: >> Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix some remaining renames. > > Sorry for the confusion on my part. The lack of a PR that's consuming these changes makes it harder to know which parts are the important ones. 
> @tkrodriguez , @dougxc - are you OK with the latest changes I pushed? Modulo the comment I just made, these changes now look good. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25706#issuecomment-2985042984 From cslucas at openjdk.org Wed Jun 18 17:00:56 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 18 Jun 2025 17:00:56 GMT Subject: RFR: 8341293: Split field loads through Nested Phis [v9] In-Reply-To: References: Message-ID: On Tue, 8 Apr 2025 20:21:34 GMT, Dhamoder Nalla wrote: >> This enhances the changes introduced in [JDK PR 12897](https://github.com/openjdk/jdk/pull/12897) by handling nested Phi nodes (phi -> phi -> AddP -> Load*) during scalar replacement. The primary goal is to split field loads (AddP -> Load*) involving nested Phi parent nodes, thereby increasing opportunities for scalar replacement and reducing memory allocations. >> >> >> **Here is an illustration of the sequence of Ideal Graph Transformations applied to split through nested `Phi` nodes.** >> >> **1. Initial State (Before Transformation)** >> The graph contains a nested Phi structure where two Allocate nodes merge via a Phi node. >> >> ![image](https://github.com/user-attachments/assets/c18e5ca0-c554-475c-814a-7cb288d96569) >> >> **2. After Splitting Through Child Phi** >> The transformation separates field loads by introducing additional AddP and Load nodes for each Allocate input. >> >> ![image](https://github.com/user-attachments/assets/b279b5f2-9ec6-4d9b-a627-506451f1cf81) >> >> **3. After Splitting Load Field Through Parent Phi** >> The field load operation (Load) is pushed even further up in the graph. >> >> Instead of merging AddP pointers in a Phi node and then performing a Load, the transformation ensures that each path has its AddP -> Load sequence before merging. >> >> This further eliminates the need to perform field loads on a Phi node, making the graph more conducive to scalar replacement. 
>> >> ![image](https://github.com/user-attachments/assets/f506b918-2dd0-4dbe-a440-ff253afa3961) >> >> ### JMH Benchmark Results: >> >> #### With Disabled RAM >> >> | Benchmark | Mode | Count | Score | Error | Units | >> |-----------|------|-------|-------|-------|-------| >> | testBailOut_runner | avgt | 15 | 13.969 | ± 0.248 | ms/op | >> | testFieldEscapeWithMerge_runner | avgt | 15 | 80.300 | ± 4.306 | ms/op | >> | testMerge_TryCatchFinally_runner | avgt | 15 | 72.182 | ± 1.781 | ms/op | >> | testMultiParentPhi_runner | avgt | 15 | 2.983 | ± 0.001 | ms/op | >> | testNestedPhiPolymorphic_runner | avgt | 15 | 18.342 | ± 0.731 | ms/op | >> | testNestedPhiProcessOrder_runner | avgt | 15 | 14.315 | ± 0.443 | ms/op | >> | testNestedPhiWithLambda_runner | avgt | 15 | 18.511 | ± 1.212 | ms/op | >> | testNestedPhiWithTrap_runner | avgt | 15 | 66.277 | ± 1.478 | ms/op | >> | testNestedPhi_FieldLoad_runner | avgt | 15 | 17.968 | ± 0.306 | ms/op | >> | testNestedPhi_TryCatch_runner | avgt | 15 | 14.186 | ± 0.247 | ms/op | >> | testRematerialize_MultiObj_runner | avgt | 15 | 88.435 | ± 4.869 ... > > Dhamoder Nalla has updated the pull request incrementally with one additional commit since the last revision: > > address CR comments test/hotspot/jtreg/compiler/c2/irTests/scalarReplacement/AllocationMergesNestedPhiTests.java line 557: > 555: //-------------------------------------------------------------------------------------------------------------------------------------------- > 556: > 557: @DontCompile NIT: space.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21270#discussion_r2155097300 From cslucas at openjdk.org Wed Jun 18 17:00:56 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 18 Jun 2025 17:00:56 GMT Subject: RFR: 8341293: Split field loads through Nested Phis [v9] In-Reply-To: <15yeg5mhhgl_0k-ZvjRcrqUGtNaILSsgv2zmJ8L3MI4=.0fe56528-1cf6-4d3f-99a2-925399790ced@github.com> References: <15yeg5mhhgl_0k-ZvjRcrqUGtNaILSsgv2zmJ8L3MI4=.0fe56528-1cf6-4d3f-99a2-925399790ced@github.com> Message-ID: On Wed, 18 Jun 2025 06:10:29 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/c2/irTests/scalarReplacement/AllocationMergesNestedPhiTests.java line 563: >> >>> 561: @IR(counts = { IRNode.ALLOC, ">=3" }, >>> 562: phase = CompilePhase.PHASEIDEAL_BEFORE_EA, >>> 563: applyIfPlatform = {"64-bit", "true"}, >> >> Looks like all tests should be run on 64bit platform? If so perhaps you can just add the requirement at the top of this file, in the annotations section. > > @JohnTortugo I'd generally advise against restricting files to limited platforms. The tests can still run on other platforms, and possibly find bugs there. Limiting IR rules allows us to make strong assertions on limited platforms (e.g. asserting that some optimizations are done), while at least testing weaker properties (correctness) on all platforms. I thought the test would not be run if no annotation was applicable to the current platform. Anyway, if you think this is the best way, that's fine for me too. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21270#discussion_r2155097006 From never at openjdk.org Wed Jun 18 17:11:31 2025 From: never at openjdk.org (Tom Rodriguez) Date: Wed, 18 Jun 2025 17:11:31 GMT Subject: RFR: 8359064: Expose reason for marking nmethod non-entrant to JVMCI client [v8] In-Reply-To: References: Message-ID: On Tue, 17 Jun 2025 00:39:54 GMT, Cesar Soares Lucas wrote: >> We recently introduced a way to set the reason why a nmethod was being marked as `not entrant`, see [here](https://github.com/openjdk/jdk/pull/23980) and [here](https://github.com/openjdk/jdk/pull/25338). >> >> This PR is to expose in the JVMCI interface the reason why the nmethod was flagged as `not entrant`. This will allow JVMCI-based compilers to implement heuristics to handle re-compilations differently based on what happened to earlier versions of a method, for instance, this will likely be used to address this [RFE in Truffle](https://github.com/oracle/graal/issues/11045 >> ). >> >> Tested on Linux x86_64, ARM with JTREG tier 1-3. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Remove extra space src/hotspot/share/code/nmethod.hpp line 500: > 498: WHITEBOX_DEOPTIMIZATION, > 499: ZOMBIE, > 500: LAST_REASON This isn't really the last reason, since it's not actually a reason. So either `LAST_REASON = ZOMBIE` with adjustments to the range check or maybe `REASON_COUNT`? Why do we need `UNKNOWN` since it seems unused? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25706#discussion_r2155117063 From mhaessig at openjdk.org Wed Jun 18 17:17:50 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Wed, 18 Jun 2025 17:17:50 GMT Subject: RFR: 8020282: Generated code quality: redundant LEAs in the chained dereferences [v9] In-Reply-To: References: Message-ID: > ## Summary > > On x86, chained dereferences of narrow oops at a constant offset from the base oop can use a `lea` instruction to perform the address computation in one go using the `leaP8Narrow`, `leaP32Narrow`, and `leaPCompressedOopOffset` matching rules. However, the generated code contains an additional `lea` with an unused result: > > ; OptoAssembly > 03d decode_heap_oop_not_null R8,R10 > 041 leaq R10, [R12 + R10 << 3 + #12] (compressed oop addressing) ; ptr compressedoopoff32 > > ; x86 > 0x00007f1f210625bd: lea (%r12,%r10,8),%r8 ; result is unused > 0x00007f1f210625c1: lea 0xc(%r12,%r10,8),%r10 ; the same computation as decode, but with offset > > > This PR adds a peephole optimization to remove such redundant `lea`s. > > ## The Issue in Detail > > The ideal subgraph producing redundant `lea`s, or rather redundant `decodeHeapOop_not_null`s, is `LoadN -> DecodeN -> AddP`, where both the address and base edge of the `AddP` originate from the `DecodeN`. After matching, this becomes > > LoadN -> decodeHeapOop_not_null -> leaP* > ______________________________? > > where `leaP*` is either of `leaP8Narrow`, `leaP32Narrow`, or `leaPCompressedOopOffset` (depending on the heap location and size). Here, the base input of `leaP*` comes from the decode. 
Looking at the matching code path, we find that the `leaP*` rules match both the `AddP` and the `DecodeN`, since x86 can fold this, but the following code adds the decode back as the base input to `leaP*`: > > https://github.com/openjdk/jdk/blob/c29537740efb04e061732a700582d43b1956cff4/src/hotspot/share/opto/matcher.cpp#L1894-L1897 > > On its face, this is completely unnecessary if we matched a `leaP*`, since it already computes the result of the decode, so adding the `LoadN` node as base seems like the logical choice. However, if the derived oop computed by the `leaP*` gets added to an oop map, this `DecodeN` is needed as the base for the derived oop. Because as of now, derived oops in oop maps cannot have narrow base pointers. > > This leaves us with a handful of possible solutions: > 1. implement narrow bases for derived oops in oop maps, > 2. perform some dead code elimination after we know which oops are part of oop maps, > 3. add a peephole optimization to simply remove unused `lea`s. > > Option 1 would have been ideal in the sense, that it is the earliest possible point to remove the decode, which would simplify the graph and reduce pressure on the register allocator. However, rewriting the oop map machinery to remove a... 
Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: Fix forward declaration ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25471/files - new: https://git.openjdk.org/jdk/pull/25471/files/f0e64687..da3f007c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25471&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25471&range=07-08 Stats: 10 lines in 1 file changed: 3 ins; 2 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/25471.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25471/head:pull/25471 PR: https://git.openjdk.org/jdk/pull/25471 From mhaessig at openjdk.org Wed Jun 18 17:17:50 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Wed, 18 Jun 2025 17:17:50 GMT Subject: RFR: 8020282: Generated code quality: redundant LEAs in the chained dereferences [v8] In-Reply-To: References: Message-ID: <-rEdw-HG1Ltxh7Y3a5t8vJpfGsFBIvImZoOo6jS2RF0=.668667b2-a707-4edd-9eab-30f9d816d613@github.com> On Wed, 18 Jun 2025 16:56:37 GMT, Vladimir Kozlov wrote: >> src/hotspot/cpu/x86/peephole_x86_64.cpp line 359: >> >>> 357: >>> 358: // Remove spill for the decode if the spill node does not have any other uses. >>> 359: if (is_spill && decode_spill->outcnt() == 1 && block->contains(decode_spill)) { >> >> You can rearrange this to avoid forward declaration of `decode_spill`: >> >> >> if (is_spill) { >> MachNode* decode_spill = decode->in(1)->as_Mach(); >> if (decode_spill->outcnt() == 1 && block->contains(decode_spill)) { > > Or are you concerned that the inputs of `decode` could be modified, so you cached it? Thank you for pointing this out. I was not trying to cache it.
Reorganized in [da3f007](https://github.com/openjdk/jdk/pull/25471/commits/da3f007cb786b5a3efcdb155abcf73d8dbcee8af) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25471#discussion_r2155122423 From cslucas at openjdk.org Wed Jun 18 17:32:31 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 18 Jun 2025 17:32:31 GMT Subject: RFR: 8359064: Expose reason for marking nmethod non-entrant to JVMCI client [v8] In-Reply-To: References: Message-ID: <0A26Svhs-4CaN398WfEGErGcuMuABx1u3p1fpN7sEM8=.48068df8-1da3-4d9c-93e5-3b5197f7407b@github.com> On Wed, 18 Jun 2025 17:09:12 GMT, Tom Rodriguez wrote: >> Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove extra space > > src/hotspot/share/code/nmethod.hpp line 500: > >> 498: WHITEBOX_DEOPTIMIZATION, >> 499: ZOMBIE, >> 500: LAST_REASON > > This isn't really the last reason, since it's not actually a reason. So either `LAST_REASON = ZOMBIE` with adjustments to the range check or maybe `REASON_COUNT`? Why do we need `UNKNOWN` since it seems unused? I looked at other enums in HotSpot and there was a mix of `*_COUNT`, `LAST_*`, etc. I opted to use `LAST_*` for no particular 'reason'. I'll change it to `REASON_COUNT` as you suggest. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25706#discussion_r2155150431 From cslucas at openjdk.org Wed Jun 18 17:32:32 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 18 Jun 2025 17:32:32 GMT Subject: RFR: 8359064: Expose reason for marking nmethod non-entrant to JVMCI client [v7] In-Reply-To: References: Message-ID: <_qRG1S3LrM0U6mN9wNbHBPPiQTcy2i7BWwaxerbeOx8=.40b2352c-5844-41f9-a03c-3cb095cedc1b@github.com> On Wed, 18 Jun 2025 16:53:51 GMT, Doug Simon wrote: >> Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: >> >> Prevent overriding invalidation reason.
> > src/hotspot/share/jvmci/jvmciCompilerToVM.cpp line 1408: > >> 1406: >> 1407: C2V_VMENTRY(void, invalidateHotSpotNmethod, (JNIEnv* env, jobject, jobject hs_nmethod, jboolean deoptimize, jint invalidation_reason)) >> 1408: #ifdef ASSERT > > We prefer runtime checks and throwing Java exceptions to assertions in this JVMCI code: > > int first = static_cast<int>(nmethod::InvalidationReason::UNKNOWN); > int last = static_cast<int>(nmethod::InvalidationReason::LAST_REASON); > if (invalidation_reason < first || invalidation_reason >= last) { > JVMCI_THROW_MSG(IllegalArgumentException, err_msg("Invalid invalidation_reason: %d", invalidation_reason)); > } Got it. I'll patch the code. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25706#discussion_r2155147173 From never at openjdk.org Wed Jun 18 18:03:37 2025 From: never at openjdk.org (Tom Rodriguez) Date: Wed, 18 Jun 2025 18:03:37 GMT Subject: RFR: 8359064: Expose reason for marking nmethod non-entrant to JVMCI client [v8] In-Reply-To: <0A26Svhs-4CaN398WfEGErGcuMuABx1u3p1fpN7sEM8=.48068df8-1da3-4d9c-93e5-3b5197f7407b@github.com> References: <0A26Svhs-4CaN398WfEGErGcuMuABx1u3p1fpN7sEM8=.48068df8-1da3-4d9c-93e5-3b5197f7407b@github.com> Message-ID: On Wed, 18 Jun 2025 17:29:45 GMT, Cesar Soares Lucas wrote: >> src/hotspot/share/code/nmethod.hpp line 500: >> >>> 498: WHITEBOX_DEOPTIMIZATION, >>> 499: ZOMBIE, >>> 500: LAST_REASON >> >> This isn't really the last reason, since it's not actually a reason. So either `LAST_REASON = ZOMBIE` with adjustments to the range check or maybe `REASON_COUNT`? Why do we need `UNKNOWN` since it seems unused? > > I looked at other enums in HotSpot and there was a mix of `*_COUNT`, `LAST_*`, etc. I opted to use `LAST_*` for no particular 'reason'. I'll change it to `REASON_COUNT` as you suggest. Yes, it's kind of a messy mix of idioms. The first/last pattern is usually for aliases for other values.
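To make the two idioms discussed above concrete, here is a minimal, standalone sketch of the `REASON_COUNT` variant combined with the kind of runtime range check suggested for the JVMCI entry point. It is only an illustration under stated assumptions: the enum below is a trimmed-down, hypothetical stand-in for `nmethod::InvalidationReason` (only the enumerator names quoted in the review come from the thread; their values and the omitted entries are made up), and `is_valid_invalidation_reason` is a hypothetical helper, not a HotSpot function.

```cpp
// Hypothetical, trimmed-down stand-in for nmethod::InvalidationReason.
// The numeric values and the short list of enumerators are assumptions.
enum class InvalidationReason : int {
  UNKNOWN = 0,
  WHITEBOX_DEOPTIMIZATION,
  ZOMBIE,
  REASON_COUNT  // sentinel: the number of real reasons, not a reason itself
};

// Hypothetical helper: returns true if `raw` maps to one of the real
// enumerators, i.e. lies in the half-open range [UNKNOWN, REASON_COUNT).
inline bool is_valid_invalidation_reason(int raw) {
  const int first = static_cast<int>(InvalidationReason::UNKNOWN);
  const int count = static_cast<int>(InvalidationReason::REASON_COUNT);
  return raw >= first && raw < count;
}
```

With a count sentinel the upper bound of the check is exclusive (`raw < count`). With the alternative `LAST_REASON = ZOMBIE` alias it would have to be inclusive (`raw <= last`), which is the "adjustment to the range check" mentioned in the review.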
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25706#discussion_r2155199944 From mhaessig at openjdk.org Wed Jun 18 18:05:17 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Wed, 18 Jun 2025 18:05:17 GMT Subject: RFR: 8020282: Generated code quality: redundant LEAs in the chained dereferences [v10] In-Reply-To: References: Message-ID: > ## Summary > > On x86, chained dereferences of narrow oops at a constant offset from the base oop can use a `lea` instruction to perform the address computation in one go using the `leaP8Narrow`, `leaP32Narrow`, and `leaPCompressedOopOffset` matching rules. However, the generated code contains an additional `lea` with an unused result: > > ; OptoAssembly > 03d decode_heap_oop_not_null R8,R10 > 041 leaq R10, [R12 + R10 << 3 + #12] (compressed oop addressing) ; ptr compressedoopoff32 > > ; x86 > 0x00007f1f210625bd: lea (%r12,%r10,8),%r8 ; result is unused > 0x00007f1f210625c1: lea 0xc(%r12,%r10,8),%r10 ; the same computation as decode, but with offset > > > This PR adds a peephole optimization to remove such redundant `lea`s. > > ## The Issue in Detail > > The ideal subgraph producing redundant `lea`s, or rather redundant `decodeHeapOop_not_null`s, is `LoadN -> DecodeN -> AddP`, where both the address and base edge of the `AddP` originate from the `DecodeN`. After matching, this becomes > > LoadN -> decodeHeapOop_not_null -> leaP* > ______________________________? > > where `leaP*` is either of `leaP8Narrow`, `leaP32Narrow`, or `leaPCompressedOopOffset` (depending on the heap location and size). Here, the base input of `leaP*` comes from the decode. 
Looking at the matching code path, we find that the `leaP*` rules match both the `AddP` and the `DecodeN`, since x86 can fold this, but the following code adds the decode back as the base input to `leaP*`: > > https://github.com/openjdk/jdk/blob/c29537740efb04e061732a700582d43b1956cff4/src/hotspot/share/opto/matcher.cpp#L1894-L1897 > > On its face, this is completely unnecessary if we matched a `leaP*`, since it already computes the result of the decode, so adding the `LoadN` node as base seems like the logical choice. However, if the derived oop computed by the `leaP*` gets added to an oop map, this `DecodeN` is needed as the base for the derived oop. Because as of now, derived oops in oop maps cannot have narrow base pointers. > > This leaves us with a handful of possible solutions: > 1. implement narrow bases for derived oops in oop maps, > 2. perform some dead code elimination after we know which oops are part of oop maps, > 3. add a peephole optimization to simply remove unused `lea`s. > > Option 1 would have been ideal in the sense, that it is the earliest possible point to remove the decode, which would simplify the graph and reduce pressure on the register allocator. However, rewriting the oop map machinery to remove a... 
Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: Fix embarassing blunder ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25471/files - new: https://git.openjdk.org/jdk/pull/25471/files/da3f007c..3578277d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25471&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25471&range=08-09 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25471.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25471/head:pull/25471 PR: https://git.openjdk.org/jdk/pull/25471 From kvn at openjdk.org Wed Jun 18 18:08:40 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 18 Jun 2025 18:08:40 GMT Subject: [jdk25] RFR: 8359646: C1 crash in AOTCodeAddressTable::add_C_string Message-ID: Hi all, This pull request contains a backport of commit [96070212](https://github.com/openjdk/jdk/commit/96070212adfd15acd99edf6e180db6228ee7b4ff) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. The commit being backported was authored by Vladimir Kozlov on 17 Jun 2025 and was reviewed by Andrew Dinn and Ioi Lam. Thanks!
------------- Commit messages: - Backport 96070212adfd15acd99edf6e180db6228ee7b4ff Changes: https://git.openjdk.org/jdk/pull/25882/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25882&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8359646 Stats: 15 lines in 1 file changed: 9 ins; 4 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25882.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25882/head:pull/25882 PR: https://git.openjdk.org/jdk/pull/25882 From kvn at openjdk.org Wed Jun 18 18:10:35 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 18 Jun 2025 18:10:35 GMT Subject: RFR: 8020282: Generated code quality: redundant LEAs in the chained dereferences [v10] In-Reply-To: References: Message-ID: On Wed, 18 Jun 2025 18:05:17 GMT, Manuel H?ssig wrote: >> ## Summary >> >> On x86, chained dereferences of narrow oops at a constant offset from the base oop can use a `lea` instruction to perform the address computation in one go using the `leaP8Narrow`, `leaP32Narrow`, and `leaPCompressedOopOffset` matching rules. However, the generated code contains an additional `lea` with an unused result: >> >> ; OptoAssembly >> 03d decode_heap_oop_not_null R8,R10 >> 041 leaq R10, [R12 + R10 << 3 + #12] (compressed oop addressing) ; ptr compressedoopoff32 >> >> ; x86 >> 0x00007f1f210625bd: lea (%r12,%r10,8),%r8 ; result is unused >> 0x00007f1f210625c1: lea 0xc(%r12,%r10,8),%r10 ; the same computation as decode, but with offset >> >> >> This PR adds a peephole optimization to remove such redundant `lea`s. >> >> ## The Issue in Detail >> >> The ideal subgraph producing redundant `lea`s, or rather redundant `decodeHeapOop_not_null`s, is `LoadN -> DecodeN -> AddP`, where both the address and base edge of the `AddP` originate from the `DecodeN`. After matching, this becomes >> >> LoadN -> decodeHeapOop_not_null -> leaP* >> ______________________________? 
>> >> where `leaP*` is either of `leaP8Narrow`, `leaP32Narrow`, or `leaPCompressedOopOffset` (depending on the heap location and size). Here, the base input of `leaP*` comes from the decode. Looking at the matching code path, we find that the `leaP*` rules match both the `AddP` and the `DecodeN`, since x86 can fold this, but the following code adds the decode back as the base input to `leaP*`: >> >> https://github.com/openjdk/jdk/blob/c29537740efb04e061732a700582d43b1956cff4/src/hotspot/share/opto/matcher.cpp#L1894-L1897 >> >> On its face, this is completely unnecessary if we matched a `leaP*`, since it already computes the result of the decode, so adding the `LoadN` node as base seems like the logical choice. However, if the derived oop computed by the `leaP*` gets added to an oop map, this `DecodeN` is needed as the base for the derived oop. Because as of now, derived oops in oop maps cannot have narrow base pointers. >> >> This leaves us with a handful of possible solutions: >> 1. implement narrow bases for derived oops in oop maps, >> 2. perform some dead code elimination after we know which oops are part of oop maps, >> 3. add a peephole optimization to simply remove unused `lea`s. >> >> Option 1 would have been ideal in the sense, that it is the earliest possible point to remove the decode, which would simplify the graph and reduce pressure on the regi... > > Manuel H?ssig has updated the pull request incrementally with one additional commit since the last revision: > > Fix embarassing blunder Good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25471#pullrequestreview-2940225440 From mhaessig at openjdk.org Wed Jun 18 18:16:05 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Wed, 18 Jun 2025 18:16:05 GMT Subject: RFR: 8354727: CompilationPolicy creates too many compiler threads when code cache space is scarce [v2] In-Reply-To: <29uhzjC6hIy_YrycL72lqIott9dPDIYrWibtvFjvVsg=.ef3b2c29-e848-4acd-a264-d18cec07bd44@github.com> References: <29uhzjC6hIy_YrycL72lqIott9dPDIYrWibtvFjvVsg=.ef3b2c29-e848-4acd-a264-d18cec07bd44@github.com> Message-ID: > Running > > > java -XX:+SegmentedCodeCache -XX:ReservedCodeCacheSize=10M -XX:NonNMethodCodeHeapSize=6M \ > -XX:ProfiledCodeHeapSize=5M -XX:NonProfiledCodeHeapSize=5M -version > > > on a machine with more than 285 cores, this would fail with the message that the specified `NonNMethodCodeHeapSize` is too small to fit all compiler buffers (instead of failing because the sum of the heaps is larger than the `ReservedCodeCacheSize`). Hence, the calculated compiler count is too high. This is due to CompilationPolicy::initialize() checking how many compiler buffers fit into the `ReservedCodeCacheSize`. However, in the case above, this is significantly larger than `NonNMethodCodeHeapSize` (especially on a debug build) and causes a check changed in #17244 to fail. That check was changed to check that all compiler buffers fit into the `NonNMethodCodeHeap` instead of the `NonNMethodCodeHeap` having at least a size of `CodeCacheMinimumUseSpace`. > > # Changes > > This PR fixes the calculation of the `CICompilerCount` ergonomic. Firstly, @shipilev kindly provided a fix for the compiler buffer size used in the calculation is also correct if we only have C2. Secondly, `NonNMethodHeapSize` is used as the maximum buffer size available for compilers buffers in the calculation of the maximum number of compiler threads instead of `ReservedCodeCacheSize`. 
Therefore, the check failing in the explanation above can never fail because we set the number of compiler threads only so high that they will always fit into the `NonNMethodCodeHeap`. > > This change changes how many compiler threads are created by the `CICompilerCount` ergonomic. For the default value `NonNMethodCodeHeapSize=5M`this limit is 24 compiler threads on a system 285 cores or more for product builds and 20 threads for debug builds on a system with 145 cores or more. > > # Testing > > - [x] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15733154809) > - [ ] tier1 through tier3 plus Oracle internal testing on our supported platforms Manuel H?ssig has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25872/files - new: https://git.openjdk.org/jdk/pull/25872/files/2cb40428..2cb40428 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25872&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25872&range=00-01 Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25872.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25872/head:pull/25872 PR: https://git.openjdk.org/jdk/pull/25872 From mhaessig at openjdk.org Wed Jun 18 18:27:17 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Wed, 18 Jun 2025 18:27:17 GMT Subject: RFR: 8354727: CompilationPolicy creates too many compiler threads when code cache space is scarce [v3] In-Reply-To: <29uhzjC6hIy_YrycL72lqIott9dPDIYrWibtvFjvVsg=.ef3b2c29-e848-4acd-a264-d18cec07bd44@github.com> References: <29uhzjC6hIy_YrycL72lqIott9dPDIYrWibtvFjvVsg=.ef3b2c29-e848-4acd-a264-d18cec07bd44@github.com> Message-ID: > Running > > > java -XX:+SegmentedCodeCache -XX:ReservedCodeCacheSize=10M -XX:NonNMethodCodeHeapSize=6M \ > -XX:ProfiledCodeHeapSize=5M 
-XX:NonProfiledCodeHeapSize=5M -version > > > on a machine with more than 285 cores, this would fail with the message that the specified `NonNMethodCodeHeapSize` is too small to fit all compiler buffers (instead of failing because the sum of the heaps is larger than the `ReservedCodeCacheSize`). Hence, the calculated compiler count is too high. This is due to CompilationPolicy::initialize() checking how many compiler buffers fit into the `ReservedCodeCacheSize`. However, in the case above, this is significantly larger than `NonNMethodCodeHeapSize` (especially on a debug build) and causes a check changed in #17244 to fail. That check was changed to check that all compiler buffers fit into the `NonNMethodCodeHeap` instead of the `NonNMethodCodeHeap` having at least a size of `CodeCacheMinimumUseSpace`. > > # Changes > > This PR fixes the calculation of the `CICompilerCount` ergonomic. Firstly, @shipilev kindly provided a fix for the compiler buffer size used in the calculation is also correct if we only have C2. Secondly, `NonNMethodHeapSize` is used as the maximum buffer size available for compilers buffers in the calculation of the maximum number of compiler threads instead of `ReservedCodeCacheSize`. Therefore, the check failing in the explanation above can never fail because we set the number of compiler threads only so high that they will always fit into the `NonNMethodCodeHeap`. > > This change changes how many compiler threads are created by the `CICompilerCount` ergonomic. For the default value `NonNMethodCodeHeapSize=5M`this limit is 24 compiler threads on a system 285 cores or more for product builds and 20 threads for debug builds on a system with 145 cores or more. > > # Testing > > - [x] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15733154809) > - [x] tier1 through tier3 plus Oracle internal testing on our supported platforms Manuel H?ssig has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 14 commits: - Merge branch 'master' into JDK-8354727-policy - Calculate buffer size correctly for c2_only Co-authored-by: Aleksey Shipilev - Caclulate how many compiler buffers fit into NonNMethodCodeHeap - Clarify endif - update copyrights - remove leftover include - fix whitebox access to code cache size configs - VMPageSizeConstraintFunc - CodeCacheMinBlockLength - CodeCacheExpansionSize - ... and 4 more: https://git.openjdk.org/jdk/compare/7bc0d824...a05f9bd3 ------------- Changes: https://git.openjdk.org/jdk/pull/25872/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25872&range=02 Stats: 23 lines in 3 files changed: 20 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/25872.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25872/head:pull/25872 PR: https://git.openjdk.org/jdk/pull/25872 From sparasa at openjdk.org Wed Jun 18 18:32:36 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Wed, 18 Jun 2025 18:32:36 GMT Subject: Integrated: 8359386: Fix incorrect value for max_size of C2CodeStub when APX is used In-Reply-To: <5ou3DR0HW2MlRv79ufOutW0dpq29fGAgnFzWYDo6rhk=.33789d8d-4174-4a7e-aa0d-26603911f760@github.com> References: <5ou3DR0HW2MlRv79ufOutW0dpq29fGAgnFzWYDo6rhk=.33789d8d-4174-4a7e-aa0d-26603911f760@github.com> Message-ID: On Thu, 12 Jun 2025 19:41:01 GMT, Srinivas Vamsi Parasa wrote: > The goal of this PR is to fix the value of max_size of the C2CodeStub hardcoded in the C2_MacroAssembler::convertF2I() function when Intel APX instructions are used. Currently, max_size is hardcoded to 23 (introduced in [JDK-8306706](https://bugs.openjdk.org/browse/JDK-8306706)). However, this value is incorrect when Intel APX instructions with extended general-purpose registers (EGPRs) are used in the code stub, because using EGPRs with APX instructions increases the instruction encoding size by 1 additional byte.
> > Without this fix, we see the following error for the C2 compiler tests below: > > compiler/vectorization/runner/ArrayTypeConvertTest.java > compiler/intrinsics/zip/TestFpRegsABI.java > > > > > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (/src/hotspot/share/opto/c2_CodeStubs.cpp:50), pid=3961123, tid=3961332 > # assert(max_size >= actual_size) failed: Expected stub size (23) must be larger than or equal to actual stub size (24) > # > # JRE version: OpenJDK Runtime Environment (26.0) (fastdebug build 26-internal-adhoc.parasa.jdkdemotion) > # Java VM: OpenJDK 64-Bit Server VM (fastdebug 26-internal-adhoc.parasa.jdkdemotion, mixed mode, sharing, compressed oops, compressed class ptrs, g1 gc, linux-amd64) > # Problematic frame: > # V [libjvm.so+0x955a77] C2CodeStubList::emit(C2_MacroAssembler&)+0x227 > # > > > This PR fixes the errors in the above-mentioned tests. > > Currently, the ConvertF2I macro works as follows: > > > vcvttss2si %xmm1,%eax > cmp $0x80000000,%eax > je STUB > CONTINUE: > > STUB: > sub $0x8,%rsp > vmovss %xmm1,(%rsp) > call Stub::f2i_fixup ; {runtime_call StubRoutines (initial stubs)} > pop %rax > jmp CONTINUE > > > The maximum size (max_size) of the stub is precomputed as 23. However, as seen in the convertF2I_slowpath implementation (below), the usage of pop(dst) instruction increases the instruction encoding size by 1 byte if dst is an extended general-purpose register (R16-R31) . > > For example, `pop (r15)` is encoded as `41 5f`, whereas `pop(r21)` is encoded as `d5 10 5d`. > > > > > static void convertF2I_slowpath(C2_MacroAssembler& masm, C2GeneralStub& stub) { > #define __ masm. > Register dst = stub.data<0>(); > XMMRegister src = stub.data<1>(); > address target = stub.data<2>(); > __ bind(stub.entry()); > __ subptr(rsp, 8); > __ movdbl(Address(rsp), src); > __ call(RuntimeAddress(target)); > __ pop(dst); // <-------- APX REX2 encoding for ... This pull request has now been integrated. 
Changeset: b52af182 Author: Srinivas Vamsi Parasa Committer: Sandhya Viswanathan URL: https://git.openjdk.org/jdk/commit/b52af182c43380186decd7e35625e42c7cafb8c2 Stats: 4 lines in 1 file changed: 3 ins; 0 del; 1 mod 8359386: Fix incorrect value for max_size of C2CodeStub when APX is used Reviewed-by: thartmann, shade, jbhateja, sviswanathan ------------- PR: https://git.openjdk.org/jdk/pull/25787 From eastigeevich at openjdk.org Wed Jun 18 19:44:27 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Wed, 18 Jun 2025 19:44:27 GMT Subject: RFR: 8356865: C2: Unreasonable values for debug flag FastAllocateSizeLimit can lead to left-shift-overflow, which is UB [v2] In-Reply-To: <8nXpApdLxXidwKfFpcVbKjpYgOn5EfhUvKNQRKvv2o0=.252bc291-3219-4d77-9a4d-8fe75952c2f6@github.com> References: <8nXpApdLxXidwKfFpcVbKjpYgOn5EfhUvKNQRKvv2o0=.252bc291-3219-4d77-9a4d-8fe75952c2f6@github.com> Message-ID: On Wed, 18 Jun 2025 07:35:47 GMT, Benoît Maillard wrote: >> Yes we use -std=c++14, but creating a negative value in this way still feels like a kind of overflow to me. > > Thanks for the comments! > > I added the assert because the issue in the JBS mentioned a specific case where we ended up with negative values. > > Should I leave it like this, or rather convert it to a more specific check (i.e. making sure that the `LogBytesPerLong - log2_esize` most significant bits are not used **before** shifting)? IMO your assert is obfuscating the overflow problem. I think the assert should come before the shift.
It could look like this: assert((fast_size_limit == 0) || (count_leading_zeros(fast_size_limit) > (LogBytesPerLong - log2_esize)), "fast_size_limit (%d) overflow when shifted left by %d", fast_size_limit, (LogBytesPerLong - log2_esize)); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25834#discussion_r2155369775 From never at openjdk.org Wed Jun 18 20:20:34 2025 From: never at openjdk.org (Tom Rodriguez) Date: Wed, 18 Jun 2025 20:20:34 GMT Subject: RFR: 8359064: Expose reason for marking nmethod non-entrant to JVMCI client [v8] In-Reply-To: References: Message-ID: On Tue, 17 Jun 2025 00:39:54 GMT, Cesar Soares Lucas wrote: >> We recently introduced a way to set the reason why a nmethod was being marked as `not entrant`, see [here](https://github.com/openjdk/jdk/pull/23980) and [here](https://github.com/openjdk/jdk/pull/25338). >> >> This PR is to expose in the JVMCI interface the reason why the nmethod was flagged as `not entrant`. This will allow JVMCI-based compilers to implement heuristics to handle re-compilations differently based on what happened to earlier versions of a method, for instance, this will likely be used to address this [RFE in Truffle](https://github.com/oracle/graal/issues/11045). >> >> Tested on Linux x86_64, ARM with JTREG tier 1-3. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Remove extra space Modulo my comments about the enum, it looks good to me. ------------- Marked as reviewed by never (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/25706#pullrequestreview-2940549880 From cslucas at openjdk.org Wed Jun 18 21:14:22 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 18 Jun 2025 21:14:22 GMT Subject: RFR: 8359064: Expose reason for marking nmethod non-entrant to JVMCI client [v9] In-Reply-To: References: Message-ID: > We recently introduced a way to set the reason why a nmethod was being marked as `not entrant`, see [here](https://github.com/openjdk/jdk/pull/23980) and [here](https://github.com/openjdk/jdk/pull/25338). > > This PR is to expose in the JVMCI interface the reason why the nmethod was flagged as `not entrant`. This will allow JVMCI-based compilers to implement heuristics to handle re-compilations differently based on what happened to earlier versions of a method, for instance, this will likely be used to address this [RFE in Truffle](https://github.com/oracle/graal/issues/11045 > ). > > Tested on Linux x86_64, ARM with JTREG tier 1-3. Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: Rename last invalidation placeholder. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/25706/files - new: https://git.openjdk.org/jdk/pull/25706/files/68271c7a..fcfa0730 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25706&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25706&range=07-08 Stats: 10 lines in 3 files changed: 0 ins; 4 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/25706.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25706/head:pull/25706 PR: https://git.openjdk.org/jdk/pull/25706 From xgong at openjdk.org Thu Jun 19 01:34:42 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 19 Jun 2025 01:34:42 GMT Subject: RFR: 8357726: Improve C2 to recognize counted loops with multiple casts in trip counter [v4] In-Reply-To: References: <-SKyhptjFPhuOPflySOZXJloR_Vgr4sC-xB5dSQXxZU=.fd6922bc-2498-4f4e-873a-999f82cd0a1a@github.com> Message-ID: On Wed, 18 Jun 2025 07:46:21 GMT, Xiaohong Gong wrote: >> C2 compiler fails to recognize counted loops when the induction variable is constrained by multiple consecutive `CastII` nodes. >> This prevents optimizations like range check elimination, loop unrolling and auto-vectorization for these loops. Please refer >> to the detailed discussion for a related performance issue from [1]. >> >> The ideal graph of such a loop typically looks like: >> >> >> /-----------| >> | | >> | ConI | >> loop | / / >> | | / / >> \ AddI / >> RangeCheck \ / | >> | \ / | >> IfTrue Phi | >> \ | | >> RangeCheck \ | | >> \ CastII / <- Range check #1 >> | | / >> IfTrue | | >> \ | | >> CastII | <- Range check #2 >> | / >> |-------/ >> >> >> >> For a counted loop, the loop induction variable (i.e `Phi`) should be the input of `AddI` ideally. However, in above case, it is used >> by two consecutive `CastII` nodes generated by two different range check operations. Compiler should skip all such kind of `CastII` when recognizing a counted loop. 
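A toy sketch of the idea described above, written in Java rather than HotSpot's C++. The `Node`, `Phi` and `CastII` classes and the `uncastOnce`/`uncastAll` helpers are hypothetical stand-ins for illustration, not C2's actual node hierarchy or API. It only shows why stripping a single cast is not enough once two range checks have each wrapped the induction variable:

```java
// Toy model only: these classes are hypothetical and do not mirror C2's real Node classes.
class UncastSketch {
    static class Node {
        final String name;
        final Node in; // single data input, null for leaf nodes
        Node(String name, Node in) { this.name = name; this.in = in; }
    }
    static class Phi extends Node { Phi() { super("Phi", null); } }
    static class CastII extends Node { CastII(Node in) { super("CastII", in); } }

    // Strips at most one cast: fails to reach the Phi behind two stacked casts.
    static Node uncastOnce(Node n) {
        return (n instanceof CastII) ? n.in : n;
    }

    // Keeps stripping until no CastII remains, as the description proposes.
    static Node uncastAll(Node n) {
        while (n instanceof CastII) {
            n = n.in;
        }
        return n;
    }

    public static void main(String[] args) {
        Node phi = new Phi();
        // Phi -> CastII (range check #1) -> CastII (range check #2)
        Node iv = new CastII(new CastII(phi));
        System.out.println(uncastOnce(iv) == phi); // false
        System.out.println(uncastAll(iv) == phi);  // true
    }
}
```

With a single uncast step the matcher would still be looking at the inner `CastII`, so a pattern that expects the `Phi` directly would not match; the loop form reaches the `Phi` regardless of how many casts are stacked.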
>> >> This patch modifies the counted loop recognition code to iteratively uncast the loop `iv` until no `CastII` nodes remain, enabling proper counted loop recognition even when the induction variable undergoes multiple range constraint operations. >> >> Test: >> - Tested tier1, tier2, tier3, and no regressions are found. >> - An additional test case is added to verify the fix. >> >> Performance: >> Here is the performance gain on a NVIDIA Grace machine which is an AArch64 architecture: >> >> >> Benchmark Mode Cnt Unit Before After Gain >> CountedLoopCastIV.loop_iv_int thrpt 30 ops/s 941482.597 4389292.439 4.66 >> CountedLoopCastIV.loop_iv_long thrpt 30 ops/s 884563.232 1441485.455 1.62 >> >> >> We can also observe the similar uplift on a x86_64 machine. >> >> [1] https://github.com/openjdk/jdk/pull/25138#issuecomment-2892720654 > > Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'jdk:master' into JDK-8357726 > > Change-Id: I0c10a563a3873b2220ce4d4c9b999c52159f578f > - Address reivew comments on IR test > - Address review comments on jtreg and jmh tests > - 8357726: C2 fails to recognize the counted loop when induction variable range is changed multiple times Ping again! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25539#issuecomment-2986243787 From syan at openjdk.org Thu Jun 19 02:07:51 2025 From: syan at openjdk.org (SendaoYan) Date: Thu, 19 Jun 2025 02:07:51 GMT Subject: RFR: 8358578: Small -XX:NMethodSizeLimit triggers "not in CodeBuffer memory" assert in C1 In-Reply-To: References: Message-ID: On Wed, 18 Jun 2025 14:07:29 GMT, Manuel H?ssig wrote: > Running `java -XX:NMethodSizeLimit=100 -version` triggers an assert because the lower limit of the debug flag `NMethodSizeLimit` is too low. 
> `NMethodSizeLimit` corresponds more or less directly to the C1 code buffer size. It was added as a debug flag in 2005 to make it easier to stress the code paths related to the buffer size. Nowadays, it is not used for any stressing, but it has caused a bunch of bugs ([JDK-8316653](https://bugs.openjdk.org/browse/JDK-8316653), [JDK-8318817](https://bugs.openjdk.org/browse/JDK-8318817), [JDK-8320682](https://bugs.openjdk.org/browse/JDK-8320682)). Therefore, this PR removes the debug flag `NMethodSizeLimit` and converts it to a constant. > > Because this removes the code causing the error, this bug does not have a regression test (in fact, it removes regression tests). > > # Testing > > - [x] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15735062162) > - [ ] tier1 through tier3 plus Oracle internal testing on all Oracle supported platforms Marked as reviewed by syan (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25876#pullrequestreview-2941272558 From syan at openjdk.org Thu Jun 19 03:24:26 2025 From: syan at openjdk.org (SendaoYan) Date: Thu, 19 Jun 2025 03:24:26 GMT Subject: RFR: 8359685: Test stress/compiler/deoptimize/Test.java fails Out of space in CodeCache Message-ID: Hi all, The test test/hotspot/jtreg/vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java fails with "Out of space in CodeCache" on machines with a very large number of CPU cores. The test creates many compile threads to stress compiler deoptimization, and the thread count depends on the number of CPU cores, so on such machines it reports the "Out of space in CodeCache" failure. The "java.lang.OutOfMemoryError: Out of space in CodeCache" does not seem to be the expected error, and increasing the maximum code cache size makes the test pass reliably. Could we change ReservedCodeCacheSize from 100m to 200m? Test-fix only; the change has been verified on a 256-core linux-x86 machine and on a 64-core linux-x86 machine.
------------- Commit messages: - 8359685: Test stress/compiler/deoptimize/Test.java fails Out of space in CodeCache Changes: https://git.openjdk.org/jdk/pull/25888/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25888&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8359685 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/25888.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25888/head:pull/25888 PR: https://git.openjdk.org/jdk/pull/25888 From jkarthikeyan at openjdk.org Thu Jun 19 04:08:11 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Thu, 19 Jun 2025 04:08:11 GMT Subject: RFR: 8350177: C2 SuperWord: Integer.numberOfLeadingZeros, numberOfTrailingZeros, reverse and bitCount have input types wrongly turncated for byte and short [v5] In-Reply-To: References: Message-ID: > Hi all, > This patch fixes cases in SuperWord when compiling subword types where vectorized code would be given a narrower type than expected, leading to miscompilation due to truncation. This fix is a generalization of the same fix applied for `Integer.reverseBytes` in [JDK-8305324](https://bugs.openjdk.org/browse/JDK-8305324). The patch introduces a check for nodes that are known to tolerate truncation, so that any future cases of subword truncation will avoid creating miscompiled code. > > The patch reuses the existing logic to set the type of the vectors to int, which currently disables vectorization for the affected patterns entirely. Once [JDK-8342095](https://bugs.openjdk.org/browse/JDK-8342095) is merged and automatic casting support is added the autovectorizer should automatically insert casts to and from int, maintaining correctness. > > I've added an IR test that checks for correctly compiled outputs. Thoughts and reviews would be appreciated! 
Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: Fix typo, remove TypeInt conversion ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25440/files - new: https://git.openjdk.org/jdk/pull/25440/files/ce16e2de..edce2a12 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25440&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25440&range=03-04 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25440.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25440/head:pull/25440 PR: https://git.openjdk.org/jdk/pull/25440 From jkarthikeyan at openjdk.org Thu Jun 19 04:08:15 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Thu, 19 Jun 2025 04:08:15 GMT Subject: RFR: 8350177: C2 SuperWord: Integer.numberOfLeadingZeros, numberOfTrailingZeros, reverse and bitCount have input types wrongly turncated for byte and short [v4] In-Reply-To: References: <1PL_T4eEjPTpI9lHg3nHmNzgxsXJpesG4Rxva4iZi60=.d73a5da9-704b-4bc7-8677-47e654a5c7c4@github.com> Message-ID: On Wed, 18 Jun 2025 07:42:08 GMT, Emanuel Peter wrote: >> Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: >> >> - Merge, address code review >> - Merge branch 'master' into vector-truncation >> - Add assert for unexpected node in truncation >> - Reformat, add comments and char tests >> - Fix vector truncation with subword types > > src/hotspot/share/opto/superword.cpp line 2549: > >> 2547: >> 2548: // For shorts and chars, check an additional set of nodes. >> 2549: if (type->isa_int() == TypeInt::SHORT || type->isa_int() == TypeInt::CHAR) { > > What is this change for? Hmm, this is an interesting case. 
After I merged from master, I saw an error in my clangd linter that `const Type*` can't be compared with `const TypeInt*`, since `TypeInt::SHORT/CHAR` were changed from `Type*` to `TypeInt*` by JDK-8315066. Though, when compiling it looks like it still compiles fine, which makes sense since the types are related. I think this might just be a mistake with my linter, so I've pushed a revert of the change. > src/hotspot/share/opto/superword.cpp line 2594: > >> 2592: // and then add it to the list of truncating nodes or the list of non-truncating ones just above. >> 2593: // In product, we just return false, which is always correct. >> 2594: assert(false, "Unexpected node in SuperWord truncation: %s", NodeClassNames[in->Opcode()]); > > Suggestion: > > // If this assert is hit, that means that we need to determine if the node can be safely truncated, > // and then add it to the list of truncating nodes or the list of non-truncating ones just above. > // In product, we just return false, which is always correct. > assert(false, "Unexpected node in SuperWord truncation: %s", NodeClassNames[in->Opcode()]); > > Suggestion: > > // If this assert it hit, that means that we need to determine if the node can be safely truncated, > // and then add it to the list of truncating nodes or the list of non-truncating ones just above. > // In product, we just return false, which is always correct. > assert(false, "Unexpected node in SuperWord truncation: %s", NodeClassNames[in->Opcode()]); > > Probably my bad previous suggestion is the culprit here ? Ah whoops, I didn't catch that! I've pushed a typo fix. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25440#discussion_r2156051650 PR Review Comment: https://git.openjdk.org/jdk/pull/25440#discussion_r2156049042 From jbhateja at openjdk.org Thu Jun 19 05:53:44 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 19 Jun 2025 05:53:44 GMT Subject: RFR: 8352635: Improve inferencing of Float16 operations with constant inputs [v8] In-Reply-To: <2F9hnA72JKqq9hJchevuSQ8XHveZ51F6tJnb7IcNw30=.1da69bab-23be-473b-92c1-1786916364b9@github.com> References: <44nVQBYgzCOB2mAB9xtAPvkUcOMJOITA2VjMdDFgm1g=.48266693-48bf-41db-8871-a7dcafe93509@github.com> <60kkIRL2XznEXyYukVXVOoeixm2iGhoOxAbKJi5X0cY=.0268090e-a0d3-45fb-93f4-94caaf9b8497@github.com> <2F9hnA72JKqq9hJchevuSQ8XHveZ51F6tJnb7IcNw30=.1da69bab-23be-473b-92c1-1786916364b9@github.com> Message-ID: On Wed, 18 Jun 2025 07:23:34 GMT, Jatin Bhateja wrote: >> Thanks for the improvements @jatin-bhateja nice progress :) > > Hi @eme64, your comments have been addressed. Let us know if it's OK to land now. > @jatin-bhateja The patch now looks good to me, nice work! I'll run some internal testing. > > However: I do not have hardware to test Float16 on. So I'll rely on you to do thorough testing on relevant hardware, or alternatively SDE. Hi @eme64, tests on Float16 targets are clean, let us know the results of internal testing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24179#issuecomment-2986710956 From epeter at openjdk.org Thu Jun 19 06:12:31 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 19 Jun 2025 06:12:31 GMT Subject: RFR: 8341293: Split field loads through Nested Phis [v9] In-Reply-To: References: <15yeg5mhhgl_0k-ZvjRcrqUGtNaILSsgv2zmJ8L3MI4=.0fe56528-1cf6-4d3f-99a2-925399790ced@github.com> Message-ID: On Wed, 18 Jun 2025 16:56:47 GMT, Cesar Soares Lucas wrote: >> @JohnTortugo I'd generally advise against restricting files to limited platforms. The tests can still run on other platforms, and possibly find bugs there.
Limiting IR rules allows us to make strong assertions on limited platforms (e.g. asserting that some optimizations are done), while at least testing weaker properties (correctness) on all platforms. > > I thought the test would not be run if no annotation was applicable to the current platform. > Anyway, if you think this is the best way, that's fine for me too. @JohnTortugo This is a very important distinction: the IR `applyIf` only restricts the IR rule, not the running of the test itself ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21270#discussion_r2156198216 From chagedorn at openjdk.org Thu Jun 19 06:19:36 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 19 Jun 2025 06:19:36 GMT Subject: RFR: 8357726: Improve C2 to recognize counted loops with multiple casts in trip counter [v4] In-Reply-To: References: <-SKyhptjFPhuOPflySOZXJloR_Vgr4sC-xB5dSQXxZU=.fd6922bc-2498-4f4e-873a-999f82cd0a1a@github.com> Message-ID: On Wed, 18 Jun 2025 07:46:21 GMT, Xiaohong Gong wrote: >> C2 compiler fails to recognize counted loops when the induction variable is constrained by multiple consecutive `CastII` nodes. >> This prevents optimizations like range check elimination, loop unrolling and auto-vectorization for these loops. Please refer >> to the detailed discussion for a related performance issue from [1]. >> >> The ideal graph of such a loop typically looks like: >> >> >> /-----------| >> | | >> | ConI | >> loop | / / >> | | / / >> \ AddI / >> RangeCheck \ / | >> | \ / | >> IfTrue Phi | >> \ | | >> RangeCheck \ | | >> \ CastII / <- Range check #1 >> | | / >> IfTrue | | >> \ | | >> CastII | <- Range check #2 >> | / >> |-------/ >> >> >> >> For a counted loop, the loop induction variable (i.e `Phi`) should be the input of `AddI` ideally. However, in above case, it is used >> by two consecutive `CastII` nodes generated by two different range check operations. Compiler should skip all such kind of `CastII` when recognizing a counted loop. 
>> >> This patch modifies the counted loop recognition code to iteratively uncast the loop `iv` until no `CastII` nodes remain, enabling proper counted loop recognition even when the induction variable undergoes multiple range constraint operations. >> >> Test: >> - Tested tier1, tier2, tier3, and no regressions are found. >> - An additional test case is added to verify the fix. >> >> Performance: >> Here is the performance gain on a NVIDIA Grace machine which is an AArch64 architecture: >> >> >> Benchmark Mode Cnt Unit Before After Gain >> CountedLoopCastIV.loop_iv_int thrpt 30 ops/s 941482.597 4389292.439 4.66 >> CountedLoopCastIV.loop_iv_long thrpt 30 ops/s 884563.232 1441485.455 1.62 >> >> >> We can also observe the similar uplift on a x86_64 machine. >> >> [1] https://github.com/openjdk/jdk/pull/25138#issuecomment-2892720654 > > Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'jdk:master' into JDK-8357726 > > Change-Id: I0c10a563a3873b2220ce4d4c9b999c52159f578f > - Address reivew comments on IR test > - Address review comments on jtreg and jmh tests > - 8357726: C2 fails to recognize the counted loop when induction variable range is changed multiple times Still good! ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25539#pullrequestreview-2941712750 From epeter at openjdk.org Thu Jun 19 06:22:46 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 19 Jun 2025 06:22:46 GMT Subject: RFR: 8352635: Improve inferencing of Float16 operations with constant inputs [v8] In-Reply-To: References: <44nVQBYgzCOB2mAB9xtAPvkUcOMJOITA2VjMdDFgm1g=.48266693-48bf-41db-8871-a7dcafe93509@github.com> <60kkIRL2XznEXyYukVXVOoeixm2iGhoOxAbKJi5X0cY=.0268090e-a0d3-45fb-93f4-94caaf9b8497@github.com> <2F9hnA72JKqq9hJchevuSQ8XHveZ51F6tJnb7IcNw30=.1da69bab-23be-473b-92c1-1786916364b9@github.com> Message-ID: On Thu, 19 Jun 2025 05:50:52 GMT, Jatin Bhateja wrote: >> Hi @eme64 , your comments have been addressed. Lets us know if its ok to land now. > >> @jatin-bhateja The patch now looks good to me, nice work! ? I'll run some internal testing. >> >> However: I do not have hardware to thest Float16 on. So I'll rely on you to do thorough testing on relevant hardware, or alternatively SDE. > > Hi @eme64 , Tests on Float16 targets are clean, let us know the results of internal testing. @jatin-bhateja I see one test failing that looks related: `compiler/vectorization/TestFloat16VectorOperations.java` The run was done with lots of stress flags, probably not all are relevant, and it may be intermittent. `-XX:+UnlockDiagnosticVMOptions -XX:-TieredCompilation -XX:+StressArrayCopyMacroNode -XX:+StressLCM -XX:+StressGCM -XX:+StressIGVN -XX:+StressCCP -XX:+StressMacroExpansion -XX:+StressMethodHandleLinkerInlining -XX:+StressCompiledExceptionHandlers -XX:VerifyConstraintCasts=1` # Internal Error (.../src/hotspot/share/opto/type.hpp:2234), pid=2401514, tid=2401533 # assert(_base == FloatCon) failed: Not a Float ... 
Current CompileTask: C2:1885 336 % b compiler.vectorization.TestFloat16VectorOperations::vectorAddFloat16 @ 2 (46 bytes)

Stack: [0x00007ff300480000,0x00007ff300580000], sp=0x00007ff30057aed0, free space=1003k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0xc0358e] ConvF2HFNode::Ideal(PhaseGVN*, bool)+0x88e (type.hpp:2234)
V [libjvm.so+0x181809d] PhaseIterGVN::transform_old(Node*)+0xbd (phaseX.cpp:668)
V [libjvm.so+0x181c705] PhaseIterGVN::optimize()+0xc5 (phaseX.cpp:1054)
V [libjvm.so+0xb498f2] PhaseIdealLoop::optimize(PhaseIterGVN&, LoopOptsMode)+0x722 (loopnode.hpp:1268)
V [libjvm.so+0xb43630] Compile::Optimize()+0xb00 (compile.cpp:2468)
V [libjvm.so+0xb46943] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x1f33 (compile.cpp:868)
V [libjvm.so+0x96bdd7] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x467 (c2compiler.cpp:141)
V [libjvm.so+0xb55d78] CompileBroker::invoke_compiler_on_method(CompileTask*)+0xb58 (compileBroker.cpp:2323)
V [libjvm.so+0xb56f48] CompileBroker::compiler_thread_loop()+0x578 (compileBroker.cpp:1967)
V [libjvm.so+0x10aa00b] JavaThread::thread_main_inner()+0x13b (javaThread.cpp:773)
V [libjvm.so+0x1b0e096] Thread::call_run()+0xb6 (thread.cpp:243)
V [libjvm.so+0x17893f8] thread_native_entry(Thread*)+0x128 (os_linux.cpp:868)

------------- PR Comment: https://git.openjdk.org/jdk/pull/24179#issuecomment-2986774921 From chagedorn at openjdk.org Thu Jun 19 06:39:58 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 19 Jun 2025 06:39:58 GMT Subject: RFR: 8357726: Improve C2 to recognize counted loops with multiple casts in trip counter [v4] In-Reply-To: References: <-SKyhptjFPhuOPflySOZXJloR_Vgr4sC-xB5dSQXxZU=.fd6922bc-2498-4f4e-873a-999f82cd0a1a@github.com> Message-ID: On Wed, 18 Jun 2025 07:46:21 GMT, Xiaohong Gong wrote: >> C2 compiler fails to recognize counted loops when the induction variable is constrained by multiple consecutive
`CastII` nodes. >> This prevents optimizations like range check elimination, loop unrolling and auto-vectorization for these loops. Please refer >> to the detailed discussion for a related performance issue from [1]. >> >> The ideal graph of such a loop typically looks like: >> >> >> /-----------| >> | | >> | ConI | >> loop | / / >> | | / / >> \ AddI / >> RangeCheck \ / | >> | \ / | >> IfTrue Phi | >> \ | | >> RangeCheck \ | | >> \ CastII / <- Range check #1 >> | | / >> IfTrue | | >> \ | | >> CastII | <- Range check #2 >> | / >> |-------/ >> >> >> >> For a counted loop, the loop induction variable (i.e `Phi`) should be the input of `AddI` ideally. However, in above case, it is used >> by two consecutive `CastII` nodes generated by two different range check operations. Compiler should skip all such kind of `CastII` when recognizing a counted loop. >> >> This patch modifies the counted loop recognition code to iteratively uncast the loop `iv` until no `CastII` nodes remain, enabling proper counted loop recognition even when the induction variable undergoes multiple range constraint operations. >> >> Test: >> - Tested tier1, tier2, tier3, and no regressions are found. >> - An additional test case is added to verify the fix. >> >> Performance: >> Here is the performance gain on a NVIDIA Grace machine which is an AArch64 architecture: >> >> >> Benchmark Mode Cnt Unit Before After Gain >> CountedLoopCastIV.loop_iv_int thrpt 30 ops/s 941482.597 4389292.439 4.66 >> CountedLoopCastIV.loop_iv_long thrpt 30 ops/s 884563.232 1441485.455 1.62 >> >> >> We can also observe the similar uplift on a x86_64 machine. >> >> [1] https://github.com/openjdk/jdk/pull/25138#issuecomment-2892720654 > > Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: > > - Merge branch 'jdk:master' into JDK-8357726 > > Change-Id: I0c10a563a3873b2220ce4d4c9b999c52159f578f > - Address reivew comments on IR test > - Address review comments on jtreg and jmh tests > - 8357726: C2 fails to recognize the counted loop when induction variable range is changed multiple times Let me submit some testing for it before integration. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25539#issuecomment-2986809051 From jbhateja at openjdk.org Thu Jun 19 06:57:51 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 19 Jun 2025 06:57:51 GMT Subject: RFR: 8352635: Improve inferencing of Float16 operations with constant inputs [v11] In-Reply-To: <44nVQBYgzCOB2mAB9xtAPvkUcOMJOITA2VjMdDFgm1g=.48266693-48bf-41db-8871-a7dcafe93509@github.com> References: <44nVQBYgzCOB2mAB9xtAPvkUcOMJOITA2VjMdDFgm1g=.48266693-48bf-41db-8871-a7dcafe93509@github.com> Message-ID: > This is a follow-up PR#22755 to improve Float16 operations inferencing. > > The existing scheme to detect Float16 operations for some operations is based on pattern matching which expects to receive inputs through ConvHF2F IR, this patch extends matching to accept constant floating point inputs within the Float16 value range. 
> > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Adding a stricter constant type check ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24179/files - new: https://git.openjdk.org/jdk/pull/24179/files/c8ced549..2eaf28e2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24179&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24179&range=09-10 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24179.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24179/head:pull/24179 PR: https://git.openjdk.org/jdk/pull/24179 From jbhateja at openjdk.org Thu Jun 19 06:57:51 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 19 Jun 2025 06:57:51 GMT Subject: RFR: 8352635: Improve inferencing of Float16 operations with constant inputs [v8] In-Reply-To: References: <44nVQBYgzCOB2mAB9xtAPvkUcOMJOITA2VjMdDFgm1g=.48266693-48bf-41db-8871-a7dcafe93509@github.com> <60kkIRL2XznEXyYukVXVOoeixm2iGhoOxAbKJi5X0cY=.0268090e-a0d3-45fb-93f4-94caaf9b8497@github.com> <2F9hnA72JKqq9hJchevuSQ8XHveZ51F6tJnb7IcNw30=.1da69bab-23be-473b-92c1-1786916364b9@github.com> Message-ID: On Thu, 19 Jun 2025 06:19:49 GMT, Emanuel Peter wrote: > V [libjvm.so+0xc0358e] ConvF2HFNode::Ideal(PhaseGVN*, bool)+0x88e (type.hpp:2234) Hi @eme64 , Thanks for reporting, I am still not able to reproduce, but I have added a stricter check. Also it looks like your stack trace is corrupted, as ConvF2HFNode::Ideal is part of convertnode.cpp , kindly re-check with the latest version. 
Best Regards ------------- PR Comment: https://git.openjdk.org/jdk/pull/24179#issuecomment-2986850272 From epeter at openjdk.org Thu Jun 19 07:11:30 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 19 Jun 2025 07:11:30 GMT Subject: RFR: 8352635: Improve inferencing of Float16 operations with constant inputs [v11] In-Reply-To: References: <44nVQBYgzCOB2mAB9xtAPvkUcOMJOITA2VjMdDFgm1g=.48266693-48bf-41db-8871-a7dcafe93509@github.com> Message-ID: On Thu, 19 Jun 2025 06:57:51 GMT, Jatin Bhateja wrote: >> This is a follow-up PR#22755 to improve Float16 operations inferencing. >> >> The existing scheme to detect Float16 operations for some operations is based on pattern matching which expects to receive inputs through ConvHF2F IR, this patch extends matching to accept constant floating point inputs within the Float16 value range. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Adding a stricter constant type check src/hotspot/share/opto/convertnode.cpp line 285: > 283: if (conF != nullptr && > 284: varS != nullptr && > 285: conF->bottom_type()->isa_float_constant() && Yes, it might well have come from here, because `is_float_constant` asserts if it is not a float constant. But now you have an implicit null check, right? You should make it explicit, as per the guidelines. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24179#discussion_r2156298450 From fyang at openjdk.org Thu Jun 19 07:18:49 2025 From: fyang at openjdk.org (Fei Yang) Date: Thu, 19 Jun 2025 07:18:49 GMT Subject: RFR: 8359270: C2: alignment check should consider base offset when emitting arraycopy runtime call [v3] In-Reply-To: References: Message-ID: <6_8ZfgnZJ58KTywbCHZoJbPhOkgRWq8VhWqnnEGLRMw=.3f3b1e62-2ebb-49a7-a2f0-754ecd3635c2@github.com> > Hi, please consider this change fixing alignment check when emitting arraycopy runtime call. 
> > There are currently four callsites of `StubRoutines::select_arraycopy_function` in hotspot C2 shared code where we emit arraycopy runtime calls [1-4]. Three of them [2-4] missed base offset when calculation alignment for both source and destination array addresses. Seems they assume a base offset of 8 bytes, which is not always true. Base offset becomes 4 bytes under either `-XX:+UseCompactObjectHeaders` or `-XX:-UseCompressedClassPointers`. > > And `StubRoutines::select_arraycopy_function` selects the right arraycopy runtime call based on this alignment. As a result, it could see an incorrect `aligned` param about the array addresses and thus a wrong arraycopy runtime call is selected. This is causing performance regressions (like Dacapo Spring) on some linux-riscv64 platforms like Sifive Unmatched or Premier P550 SBCs where misaligned memory access is very slow. > > Proposed change fixes this issue by taking base offset into account when checking the alignment, which is very similar to [1]. > > Testing: > - [x] Tier1-3 on linux-x64 (release & fastdebug) > - [x] Tier1-3 on linux-aarch64 (release & fastdebug) > - [x] Tier1-3 on linux-riscv64 (release) > - [x] Dacapo spring performance test on linux-riscv64 (w/wo `-XX:+UseCompactObjectHeaders` / `-XX:-UseCompressedClassPointers`) > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/macroArrayCopy.cpp#L341 > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/library_call.cpp#L1584 > [3] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/library_call.cpp#L1666 > [4] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/stringopts.cpp#L1481 Fei Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: - improve test - Merge remote-tracking branch 'upstream/master' into JDK-8359270 - add test - Merge remote-tracking branch 'upstream/master' into JDK-8359270 - 8359270: C2: alignment check should consider base offset when emitting arraycopy runtime call ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25765/files - new: https://git.openjdk.org/jdk/pull/25765/files/581f587a..9dbc4ae9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25765&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25765&range=01-02 Stats: 8631 lines in 223 files changed: 2815 ins; 4533 del; 1283 mod Patch: https://git.openjdk.org/jdk/pull/25765.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25765/head:pull/25765 PR: https://git.openjdk.org/jdk/pull/25765 From thartmann at openjdk.org Thu Jun 19 07:18:50 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 19 Jun 2025 07:18:50 GMT Subject: RFR: 8359270: C2: alignment check should consider base offset when emitting arraycopy runtime call [v2] In-Reply-To: References: Message-ID: On Mon, 16 Jun 2025 03:36:33 GMT, Fei Yang wrote: >> Hi, please consider this change fixing alignment check when emitting arraycopy runtime call. >> >> There are currently four callsites of `StubRoutines::select_arraycopy_function` in hotspot C2 shared code where we emit arraycopy runtime calls [1-4]. Three of them [2-4] missed base offset when calculation alignment for both source and destination array addresses. Seems they assume a base offset of 8 bytes, which is not always true. Base offset becomes 4 bytes under either `-XX:+UseCompactObjectHeaders` or `-XX:-UseCompressedClassPointers`. >> >> And `StubRoutines::select_arraycopy_function` selects the right arraycopy runtime call based on this alignment. As a result, it could see an incorrect `aligned` param about the array addresses and thus a wrong arraycopy runtime call is selected. 
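A small Java sketch of the alignment point made above. The class and method names are hypothetical, the 8-byte word size is an assumption for illustration, and the base offsets 8 and 4 are simply the numbers quoted in the description; none of this is the actual C2 code. It shows how a check that looks only at the element offset can report "aligned" while the real address, which includes the array base offset, is not:

```java
// Illustrative only: names and constants are assumptions, not HotSpot code.
class AlignSketch {
    static final int WORD_BYTES = 8; // assumed alignment unit for the "aligned" stub variants

    // Looks only at the element offset, silently assuming the array base is word aligned.
    static boolean alignedIgnoringBase(long index, int elemSize) {
        return (index * elemSize) % WORD_BYTES == 0;
    }

    // Takes the array base offset into account, as the proposed change describes.
    static boolean alignedWithBase(int baseOffset, long index, int elemSize) {
        return (baseOffset + index * elemSize) % WORD_BYTES == 0;
    }

    public static void main(String[] args) {
        int charSize = 2; // copying chars starting at index 0
        System.out.println(alignedIgnoringBase(0, charSize));  // true
        System.out.println(alignedWithBase(8, 0, charSize));   // true
        System.out.println(alignedWithBase(4, 0, charSize));   // false
        System.out.println(alignedWithBase(4, 2, charSize));   // true
    }
}
```

With a base offset of 4, index 0 is misaligned while index 2 is aligned, which is the opposite of what the base-unaware check reports; a stub selected from that wrong answer would then be used on addresses it was not meant for.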
This is causing performance regressions (like Dacapo Spring) on some linux-riscv64 platforms like Sifive Unmatched or Premier P550 SBCs where misaligned memory access is very slow. >> >> Proposed change fixes this issue by taking base offset into account when checking the alignment, which is very similar to [1]. >> >> Testing: >> - [x] Tier1-3 on linux-x64 (release & fastdebug) >> - [x] Tier1-3 on linux-aarch64 (release & fastdebug) >> - [x] Tier1-3 on linux-riscv64 (release) >> - [x] Dacapo spring performance test on linux-riscv64 (w/wo `-XX:+UseCompactObjectHeaders` / `-XX:-UseCompressedClassPointers`) >> >> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/macroArrayCopy.cpp#L341 >> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/library_call.cpp#L1584 >> [3] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/library_call.cpp#L1666 >> [4] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/stringopts.cpp#L1481 > > Fei Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: > > - add test > - Merge remote-tracking branch 'upstream/master' into JDK-8359270 > - 8359270: C2: alignment check should consider base offset when emitting arraycopy runtime call `compiler/c2/irTests/stringopts/TestArrayCopySelect.java` fails on various platforms with `-ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation`: Scenario #0 Compilations (2) of Failed Methods (2) -------------------------------------- 1) Compilation of "static char[] compiler.c2.irTests.stringopts.TestArrayCopySelect.testStrUGetCharsAligned(java.lang.String)": > Phase "PrintIdeal": AFTER: print_ideal 0 Root === 0 35 65 66 [[ 0 1 3 22 30 ]] inner 1 Con === 0 [[ ]] #top 3 Start === 3 0 [[ 3 5 6 7 8 9 10 ]] #{0:control, 1:abIO, 2:memory, 3:rawptr:BotPTR, 4:return_address, 5:java/lang/String (java/io/Serializable,java/lang/Comparable,java/lang/CharSequence,java/lang/constant/Constable,java/lang/constant/ConstantDesc):exact *} 5 Parm === 3 [[ 27 ]] Control !jvms: TestArrayCopySelect::testStrUGetCharsAligned @ bci:-1 (line 71) 6 Parm === 3 [[ 31 47 ]] I_O !jvms: TestArrayCopySelect::testStrUGetCharsAligned @ bci:-1 (line 71) 7 Parm === 3 [[ 47 31 ]] Memory Memory: @BotPTR *+bot, idx=Bot; !jvms: TestArrayCopySelect::testStrUGetCharsAligned @ bci:-1 (line 71) 8 Parm === 3 [[ 66 65 47 31 35 ]] FramePtr !jvms: TestArrayCopySelect::testStrUGetCharsAligned @ bci:-1 (line 71) 9 Parm === 3 [[ 66 65 31 ]] ReturnAdr !jvms: TestArrayCopySelect::testStrUGetCharsAligned @ bci:-1 (line 71) 10 Parm === 3 [[ 23 47 ]] Parm0: java/lang/String (java/io/Serializable,java/lang/Comparable,java/lang/CharSequence,java/lang/constant/Constable,java/lang/constant/ConstantDesc):exact * Oop:java/lang/String (java/io/Serializable,java/lang/Comparable,java/lang/CharSequence,java/lang/constant/Constable,java/lang/constant/ConstantDesc):exact * !jvms: TestArrayCopySelect::testStrUGetCharsAligned @ bci:-1 
(line 71) 22 ConP === 0 [[ 23 31 ]] #null 23 CmpP === _ 10 22 [[ 24 ]] !jvms: TestArrayCopySelect::testStrUGetCharsAligned @ bci:1 (line 71) 24 Bool === _ 23 [[ 27 ]] [ne] !jvms: TestArrayCopySelect::testStrUGetCharsAligned @ bci:1 (line 71) 27 If === 5 24 [[ 28 29 ]] P=0.999999, C=-1.000000 !jvms: TestArrayCopySelect::testStrUGetCharsAligned @ bci:1 (line 71) 28 IfTrue === 27 [[ 47 ]] #1 !jvms: TestArrayCopySelect::testStrUGetCharsAligned @ bci:1 (line 71) 29 IfFalse === 27 [[ 31 ]] #0 !jvms: TestArrayCopySelect::testStrUGetCharsAligned @ bci:1 (line 71) 30 ConI === 0 [[ 31 ]] #int:-10 31 CallStaticJava === 29 6 7 8 9 (30 1 22 ) [[ 32 ]] # Static uncommon_trap(reason='null_check' action='maybe_recompile' debug_id='0') void ( int ) C=0.000100 TestArrayCopySelect::testStrUGetCharsAligned @ bci:1 (line 71) !jvms: TestArrayCopySelect::testStrUGetCharsAligned @ bci:1 (line 71) 32 Proj === 31 [[ 35 ]] #0 !jvms: TestArrayCopySelect::testStrUGetCharsAligned @ bci:1 (line 71) 35 Halt === 32 1 1 8 1 [[ 0 ]] !jvms: TestArrayCopySelect::testStrUGetCharsAligned @ bci:1 (line 71) 47 CallStaticJava === 28 6 7 8 1 (10 1 ) [[ 48 49 50 52 ]] # Static java.lang.String::toCharArray char[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact * ( java/lang/String (java/io/Serializable,java/lang/Comparable,java/lang/CharSequence,java/lang/constant/Constable,java/lang/constant/ConstantDesc):NotNull:exact * ) TestArrayCopySelect::testStrUGetCharsAligned @ bci:1 (line 71) !jvms: TestArrayCopySelect::testStrUGetCharsAligned @ bci:1 (line 71) 48 Proj === 47 [[ 54 ]] #0 !jvms: TestArrayCopySelect::testStrUGetCharsAligned @ bci:1 (line 71) 49 Proj === 47 [[ 66 54 65 59 ]] #1 !jvms: TestArrayCopySelect::testStrUGetCharsAligned @ bci:1 (line 71) 50 Proj === 47 [[ 66 65 ]] #2 Memory: @BotPTR *+bot, idx=Bot; !jvms: TestArrayCopySelect::testStrUGetCharsAligned @ bci:1 (line 71) 52 Proj === 47 [[ 65 ]] #5 !jvms: TestArrayCopySelect::testStrUGetCharsAligned @ bci:1 (line 71) 54 Catch === 48 49 [[ 
55 56 ]] !jvms: TestArrayCopySelect::testStrUGetCharsAligned @ bci:1 (line 71) 55 CatchProj === 54 [[ 65 ]] #0 at bci -1 !jvms: TestArrayCopySelect::testStrUGetCharsAligned @ bci:1 (line 71) 56 CatchProj === 54 [[ 66 59 ]] #1 at bci -1 !jvms: TestArrayCopySelect::testStrUGetCharsAligned @ bci:1 (line 71) 59 CreateEx === 56 49 [[ 66 ]] #java/lang/Throwable (java/io/Serializable):NotNull * Oop:java/lang/Throwable (java/io/Serializable):NotNull * !jvms: TestArrayCopySelect::testStrUGetCharsAligned @ bci:1 (line 71) 65 Return === 55 49 50 8 9 returns 52 [[ 0 ]] 66 Rethrow === 56 49 50 8 9 exception 59 [[ 0 ]] 2) Compilation of "static java.lang.String compiler.c2.irTests.stringopts.TestArrayCopySelect.testStrUtoBytesAligned(char[])": > Phase "PrintIdeal": AFTER: print_ideal 0 Root === 0 37 38 [[ 0 1 3 ]] inner 1 Con === 0 [[ ]] #top 3 Start === 3 0 [[ 3 5 6 7 8 9 10 ]] #{0:control, 1:abIO, 2:memory, 3:rawptr:BotPTR, 4:return_address, 5:char[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact *} 5 Parm === 3 [[ 22 ]] Control !jvms: TestArrayCopySelect::testStrUtoBytesAligned @ bci:-1 (line 87) 6 Parm === 3 [[ 22 ]] I_O !jvms: TestArrayCopySelect::testStrUtoBytesAligned @ bci:-1 (line 87) 7 Parm === 3 [[ 22 ]] Memory Memory: @BotPTR *+bot, idx=Bot; !jvms: TestArrayCopySelect::testStrUtoBytesAligned @ bci:-1 (line 87) 8 Parm === 3 [[ 38 37 22 ]] FramePtr !jvms: TestArrayCopySelect::testStrUtoBytesAligned @ bci:-1 (line 87) 9 Parm === 3 [[ 38 37 ]] ReturnAdr !jvms: TestArrayCopySelect::testStrUtoBytesAligned @ bci:-1 (line 87) 10 Parm === 3 [[ 22 ]] Parm0: char[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact * !jvms: TestArrayCopySelect::testStrUtoBytesAligned @ bci:-1 (line 87) 22 CallStaticJava === 5 6 7 8 1 (10 1 ) [[ 23 24 25 27 ]] # Static java.lang.String::valueOf java/lang/String (java/io/Serializable,java/lang/Comparable,java/lang/CharSequence,java/lang/constant/Constable,java/lang/constant/ConstantDesc):exact * ( char[int:>=0] 
(java/lang/Cloneable,java/io/Serializable):exact * ) TestArrayCopySelect::testStrUtoBytesAligned @ bci:1 (line 87) !jvms: TestArrayCopySelect::testStrUtoBytesAligned @ bci:1 (line 87) 23 Proj === 22 [[ 29 ]] #0 !jvms: TestArrayCopySelect::testStrUtoBytesAligned @ bci:1 (line 87) 24 Proj === 22 [[ 38 29 37 34 ]] #1 !jvms: TestArrayCopySelect::testStrUtoBytesAligned @ bci:1 (line 87) 25 Proj === 22 [[ 38 37 ]] #2 Memory: @BotPTR *+bot, idx=Bot; !jvms: TestArrayCopySelect::testStrUtoBytesAligned @ bci:1 (line 87) 27 Proj === 22 [[ 37 ]] #5 Oop:java/lang/String (java/io/Serializable,java/lang/Comparable,java/lang/CharSequence,java/lang/constant/Constable,java/lang/constant/ConstantDesc):exact * !jvms: TestArrayCopySelect::testStrUtoBytesAligned @ bci:1 (line 87) 29 Catch === 23 24 [[ 30 31 ]] !jvms: TestArrayCopySelect::testStrUtoBytesAligned @ bci:1 (line 87) 30 CatchProj === 29 [[ 37 ]] #0 at bci -1 !jvms: TestArrayCopySelect::testStrUtoBytesAligned @ bci:1 (line 87) 31 CatchProj === 29 [[ 38 34 ]] #1 at bci -1 !jvms: TestArrayCopySelect::testStrUtoBytesAligned @ bci:1 (line 87) 34 CreateEx === 31 24 [[ 38 ]] #java/lang/Throwable (java/io/Serializable):NotNull * Oop:java/lang/Throwable (java/io/Serializable):NotNull * !jvms: TestArrayCopySelect::testStrUtoBytesAligned @ bci:1 (line 87) 37 Return === 30 24 25 8 9 returns 27 [[ 0 ]] 38 Rethrow === 31 24 25 8 9 exception 34 [[ 0 ]] Failed IR Rules (2) of Methods (2) ---------------------------------- 1) Method "static char[] compiler.c2.irTests.stringopts.TestArrayCopySelect.testStrUGetCharsAligned(java.lang.String)" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#C#CALL_OF#_", "arrayof_jshort_disjoint_arraycopy", ">0"}, applyIfPlatform={}, applyIfPlatformOr={}, failOn={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={"UseCompactObjectHeaders", "false"}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase 
"PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(Call.*.*)+(\\s){2}===.*arrayof_jshort_disjoint_arraycopy )" - Failed comparison: [found] 0 > 0 [given] - No nodes matched! 2) Method "static java.lang.String compiler.c2.irTests.stringopts.TestArrayCopySelect.testStrUtoBytesAligned(char[])" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#C#CALL_OF#_", "arrayof_jshort_disjoint_arraycopy", ">0"}, applyIfPlatform={}, applyIfPlatformOr={}, failOn={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={"UseCompactObjectHeaders", "false"}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(Call.*.*)+(\\s){2}===.*arrayof_jshort_disjoint_arraycopy )" - Failed comparison: [found] 0 > 0 [given] - No nodes matched! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25765#issuecomment-2983742441 From fyang at openjdk.org Thu Jun 19 07:19:42 2025 From: fyang at openjdk.org (Fei Yang) Date: Thu, 19 Jun 2025 07:19:42 GMT Subject: RFR: 8359270: C2: alignment check should consider base offset when emitting arraycopy runtime call [v2] In-Reply-To: References: Message-ID: On Wed, 18 Jun 2025 11:04:41 GMT, Tobias Hartmann wrote: > `compiler/c2/irTests/stringopts/TestArrayCopySelect.java` fails on various platforms with `-ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation`: Hi Tobias, I improved the test a bit and added some warmup to it. Now I see it is passing on three different platforms with these extra VM options. Could you please give it another try? Thanks. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25765#issuecomment-2986975409 From jbhateja at openjdk.org Thu Jun 19 07:20:53 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 19 Jun 2025 07:20:53 GMT Subject: RFR: 8352635: Improve inferencing of Float16 operations with constant inputs [v11] In-Reply-To: References: <44nVQBYgzCOB2mAB9xtAPvkUcOMJOITA2VjMdDFgm1g=.48266693-48bf-41db-8871-a7dcafe93509@github.com> Message-ID: On Thu, 19 Jun 2025 07:08:52 GMT, Emanuel Peter wrote: > But now you have an implicit null check, right? You should make it explicit, as per the guidelines. I am not sure about the above comment: which null check are you asking to be made explicit? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24179#discussion_r2156313922 From mhaessig at openjdk.org Thu Jun 19 07:34:46 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Thu, 19 Jun 2025 07:34:46 GMT Subject: RFR: 8020282: Generated code quality: redundant LEAs in the chained dereferences [v4] In-Reply-To: <_0u23WPM6B2JmmIaS2VQe7iV5UtXglH-BWPhjvjX02A=.140ac707-df93-410b-b0a3-ad572ca3248a@github.com> References: <8af6oDH21I6Hfvdplgxq6EeH41bihWdwGwy7V8mtokE=.4a20d5f8-84a4-4533-bf5c-932051d1b2a1@github.com> <_0u23WPM6B2JmmIaS2VQe7iV5UtXglH-BWPhjvjX02A=.140ac707-df93-410b-b0a3-ad572ca3248a@github.com> Message-ID: On Tue, 17 Jun 2025 17:46:59 GMT, Roberto Castañeda Lozano wrote: >>> > Is this scenario exercised by any of the new tests? If not, would it be possible to construct an additional test where we verify that the peephole is not applied in this case? >>> >>> It is not. I was only able to find such a case once with all VM intrinsics disabled some time ago, but was not able to reproduce one since. I'll have another try to find one. >> >> OK, that would be great! If you do not find one, I think the PR is still OK because it is easy to see how the peephole would handle the scenario.
But it would be of course better to confirm it with an actual test case. > >> @robcasloz, I integrated all your suggestions and simplified the IR-tests. > > Thanks! > >> But I was unfortunately not able to create a test that reliably safepoints. > > Fair enough, thanks for trying. Thank you again for your reviews @robcasloz and @vnkozlov ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25471#issuecomment-2987031482 From duke at openjdk.org Thu Jun 19 07:34:47 2025 From: duke at openjdk.org (duke) Date: Thu, 19 Jun 2025 07:34:47 GMT Subject: RFR: 8020282: Generated code quality: redundant LEAs in the chained dereferences [v10] In-Reply-To: References: Message-ID: On Wed, 18 Jun 2025 18:05:17 GMT, Manuel Hässig wrote: >> ## Summary >> >> On x86, chained dereferences of narrow oops at a constant offset from the base oop can use a `lea` instruction to perform the address computation in one go using the `leaP8Narrow`, `leaP32Narrow`, and `leaPCompressedOopOffset` matching rules. However, the generated code contains an additional `lea` with an unused result: >> >> ; OptoAssembly >> 03d decode_heap_oop_not_null R8,R10 >> 041 leaq R10, [R12 + R10 << 3 + #12] (compressed oop addressing) ; ptr compressedoopoff32 >> >> ; x86 >> 0x00007f1f210625bd: lea (%r12,%r10,8),%r8 ; result is unused >> 0x00007f1f210625c1: lea 0xc(%r12,%r10,8),%r10 ; the same computation as decode, but with offset >> >> >> This PR adds a peephole optimization to remove such redundant `lea`s. >> >> ## The Issue in Detail >> >> The ideal subgraph producing redundant `lea`s, or rather redundant `decodeHeapOop_not_null`s, is `LoadN -> DecodeN -> AddP`, where both the address and base edge of the `AddP` originate from the `DecodeN`. After matching, this becomes >> >> LoadN -> decodeHeapOop_not_null -> leaP* >> ______________________________↑ >> >> where `leaP*` is either of `leaP8Narrow`, `leaP32Narrow`, or `leaPCompressedOopOffset` (depending on the heap location and size).
Here, the base input of `leaP*` comes from the decode. Looking at the matching code path, we find that the `leaP*` rules match both the `AddP` and the `DecodeN`, since x86 can fold this, but the following code adds the decode back as the base input to `leaP*`: >> >> https://github.com/openjdk/jdk/blob/c29537740efb04e061732a700582d43b1956cff4/src/hotspot/share/opto/matcher.cpp#L1894-L1897 >> >> On its face, this is completely unnecessary if we matched a `leaP*`, since it already computes the result of the decode, so adding the `LoadN` node as base seems like the logical choice. However, if the derived oop computed by the `leaP*` gets added to an oop map, this `DecodeN` is needed as the base for the derived oop, because as of now derived oops in oop maps cannot have narrow base pointers. >> >> This leaves us with a handful of possible solutions: >> 1. implement narrow bases for derived oops in oop maps, >> 2. perform some dead code elimination after we know which oops are part of oop maps, >> 3. add a peephole optimization to simply remove unused `lea`s. >> >> Option 1 would have been ideal in the sense that it is the earliest possible point to remove the decode, which would simplify the graph and reduce pressure on the regi... > > Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: > > Fix embarassing blunder @mhaessig Your change (at version 3578277d75e70bbdea7ee186b2afa624147ca1f8) is now ready to be sponsored by a Committer.
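For readers following this thread, the address arithmetic behind the two `lea` instructions in the quoted listing can be sketched in plain C++. This is only an illustration and not HotSpot code: the helper names `decode_narrow_oop` and `field_address` and their parameters are hypothetical, and the shift of 3 and offset of 12 are simply the values visible in the quoted OptoAssembly.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: decode a compressed (narrow) oop into a full address.
// Mirrors the shape of `lea (%r12,%r10,8),%r8` in the quoted listing.
inline std::uint64_t decode_narrow_oop(std::uint64_t heap_base,
                                       std::uint32_t narrow,
                                       unsigned shift) {
  assert(shift < 32);  // illustrative sanity check only
  return heap_base + (static_cast<std::uint64_t>(narrow) << shift);
}

// Hypothetical helper: compute the field address in one step.
// Mirrors the shape of `lea 0xc(%r12,%r10,8),%r10`.
inline std::uint64_t field_address(std::uint64_t heap_base,
                                   std::uint32_t narrow,
                                   unsigned shift,
                                   std::uint64_t offset) {
  assert(shift < 32);  // illustrative sanity check only
  return heap_base + (static_cast<std::uint64_t>(narrow) << shift) + offset;
}
```

Under these assumptions `field_address(b, n, s, off)` always equals `decode_narrow_oop(b, n, s) + off`, so when nothing else consumes the decoded value, the first computation is dead, which is the situation the peephole described in the quoted PR text targets.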
------------- PR Comment: https://git.openjdk.org/jdk/pull/25471#issuecomment-2987034967 From epeter at openjdk.org Thu Jun 19 07:35:11 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 19 Jun 2025 07:35:11 GMT Subject: RFR: 8352635: Improve inferencing of Float16 operations with constant inputs [v11] In-Reply-To: References: <44nVQBYgzCOB2mAB9xtAPvkUcOMJOITA2VjMdDFgm1g=.48266693-48bf-41db-8871-a7dcafe93509@github.com> Message-ID: On Thu, 19 Jun 2025 07:08:52 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Adding a stricter constant type check > > src/hotspot/share/opto/convertnode.cpp line 285: > >> 283: if (conF != nullptr && >> 284: varS != nullptr && >> 285: conF->bottom_type()->isa_float_constant() && > > Yes, it might well have come from here, because `is_float_constant` asserts if it is not a float constant. > > But now you have an implicit null check, right? You should make it explicit, as per the guidelines. `src/hotspot/share/opto/type.hpp:301:21: const TypeF *isa_float_constant() const; // Returns null if not a FloatCon` It returns a pointer, not a boolean. So you should write `conF->bottom_type()->isa_float_constant() != nullptr` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24179#discussion_r2156340988 From jbhateja at openjdk.org Thu Jun 19 07:44:39 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 19 Jun 2025 07:44:39 GMT Subject: RFR: 8352635: Improve inferencing of Float16 operations with constant inputs [v12] In-Reply-To: <44nVQBYgzCOB2mAB9xtAPvkUcOMJOITA2VjMdDFgm1g=.48266693-48bf-41db-8871-a7dcafe93509@github.com> References: <44nVQBYgzCOB2mAB9xtAPvkUcOMJOITA2VjMdDFgm1g=.48266693-48bf-41db-8871-a7dcafe93509@github.com> Message-ID: <3Jz9cQCv4ic_z7EWVDlwebVJtglHmzdGNgLd2mqO8aU=.7c89de56-0eac-4680-9a45-21de67382d5f@github.com> > This is a follow-up PR#22755 to improve Float16 operations inferencing. 
> > The existing scheme to detect Float16 operations for some operations is based on pattern matching which expects to receive inputs through ConvHF2F IR, this patch extends matching to accept constant floating point inputs within the Float16 value range. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review resolution ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24179/files - new: https://git.openjdk.org/jdk/pull/24179/files/2eaf28e2..d0b63c5a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24179&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24179&range=10-11 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24179.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24179/head:pull/24179 PR: https://git.openjdk.org/jdk/pull/24179 From jbhateja at openjdk.org Thu Jun 19 07:44:40 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 19 Jun 2025 07:44:40 GMT Subject: RFR: 8352635: Improve inferencing of Float16 operations with constant inputs [v11] In-Reply-To: References: <44nVQBYgzCOB2mAB9xtAPvkUcOMJOITA2VjMdDFgm1g=.48266693-48bf-41db-8871-a7dcafe93509@github.com> Message-ID: On Thu, 19 Jun 2025 07:31:39 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/convertnode.cpp line 285: >> >>> 283: if (conF != nullptr && >>> 284: varS != nullptr && >>> 285: conF->bottom_type()->isa_float_constant() && >> >> Yes, it might well have come from here, because `is_float_constant` asserts if it is not a float constant. >> >> But now you have an implicit null check, right? You should make it explicit, as per the guidelines. > > `src/hotspot/share/opto/type.hpp:301:21: const TypeF *isa_float_constant() const; // Returns null if not a FloatCon` > > It returns a pointer, not a boolean. So you should write `conF->bottom_type()->isa_float_constant() != nullptr` Kindly re-verify. 
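As an aside for readers of this thread, the review point about implicit versus explicit null checks can be illustrated with a small self-contained sketch. The `Type` and `TypeF` structs below are hypothetical stand-ins, not the real C2 classes; the only property taken from the thread is that `isa_float_constant()` returns a pointer that is null when the type is not a float constant.

```cpp
#include <cassert>

// Hypothetical stand-ins for C2's Type / TypeF; only the API shape matters.
struct TypeF;

struct Type {
  virtual ~Type() = default;
  // Returns the float-constant view of this type, or nullptr if it is not one.
  virtual const TypeF* isa_float_constant() const { return nullptr; }
};

struct TypeF : Type {
  explicit TypeF(float v) : value(v) {}
  const TypeF* isa_float_constant() const override { return this; }
  float value;
};

// The pointer results are compared against nullptr explicitly instead of
// relying on the implicit pointer-to-bool conversion inside `&&`.
inline bool both_float_constants(const Type* t1, const Type* t2) {
  assert(t1 != nullptr && t2 != nullptr);
  return t1->isa_float_constant() != nullptr &&
         t2->isa_float_constant() != nullptr;
}
```

Both spellings compile to the same check; the explicit form just makes it visible at the call site that the function returns a pointer rather than a boolean.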
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24179#discussion_r2156353660 From jbhateja at openjdk.org Thu Jun 19 07:50:22 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 19 Jun 2025 07:50:22 GMT Subject: RFR: 8352635: Improve inferencing of Float16 operations with constant inputs [v13] In-Reply-To: <44nVQBYgzCOB2mAB9xtAPvkUcOMJOITA2VjMdDFgm1g=.48266693-48bf-41db-8871-a7dcafe93509@github.com> References: <44nVQBYgzCOB2mAB9xtAPvkUcOMJOITA2VjMdDFgm1g=.48266693-48bf-41db-8871-a7dcafe93509@github.com> Message-ID: <7riWgCQ_m74kYs-gbmKz15oaQ6qKM1ftjoiCdJSLYlo=.158be671-75b9-44d8-8992-2f1c9ff22c89@github.com> > This is a follow-up PR#22755 to improve Float16 operations inferencing. > > The existing scheme to detect Float16 operations for some operations is based on pattern matching which expects to receive inputs through ConvHF2F IR, this patch extends matching to accept constant floating point inputs within the Float16 value range. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/opto/subnode.cpp Co-authored-by: Emanuel Peter ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24179/files - new: https://git.openjdk.org/jdk/pull/24179/files/d0b63c5a..428a50a6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24179&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24179&range=11-12 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24179.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24179/head:pull/24179 PR: https://git.openjdk.org/jdk/pull/24179 From epeter at openjdk.org Thu Jun 19 07:50:24 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 19 Jun 2025 07:50:24 GMT Subject: RFR: 8352635: Improve inferencing of Float16 operations with constant inputs [v12] In-Reply-To: 
<3Jz9cQCv4ic_z7EWVDlwebVJtglHmzdGNgLd2mqO8aU=.7c89de56-0eac-4680-9a45-21de67382d5f@github.com> References: <44nVQBYgzCOB2mAB9xtAPvkUcOMJOITA2VjMdDFgm1g=.48266693-48bf-41db-8871-a7dcafe93509@github.com> <3Jz9cQCv4ic_z7EWVDlwebVJtglHmzdGNgLd2mqO8aU=.7c89de56-0eac-4680-9a45-21de67382d5f@github.com> Message-ID: On Thu, 19 Jun 2025 07:44:39 GMT, Jatin Bhateja wrote: >> This is a follow-up PR#22755 to improve Float16 operations inferencing. >> >> The existing scheme to detect Float16 operations for some operations is based on pattern matching which expects to receive inputs through ConvHF2F IR, this patch extends matching to accept constant floating point inputs within the Float16 value range. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review resolution src/hotspot/share/opto/subnode.cpp line 566: > 564: // applicable to other floating point types. > 565: if (t1->isa_half_float_constant() && > 566: t2->isa_half_float_constant()) { Suggestion: if (t1->isa_half_float_constant() != nullptr && t2->isa_half_float_constant() != nullptr) { `src/hotspot/share/opto/type.hpp:298:21: const TypeH *isa_half_float_constant() const; // Returns null if not a FloatCon` Same issue here ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24179#discussion_r2156368153 From xgong at openjdk.org Thu Jun 19 07:53:41 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 19 Jun 2025 07:53:41 GMT Subject: RFR: 8357726: Improve C2 to recognize counted loops with multiple casts in trip counter [v4] In-Reply-To: References: <-SKyhptjFPhuOPflySOZXJloR_Vgr4sC-xB5dSQXxZU=.fd6922bc-2498-4f4e-873a-999f82cd0a1a@github.com> Message-ID: <527i6h9T52tQyrYRB3B5P5gVFMW2YbBRyZdMH_W4m0g=.0956d51c-798c-489e-96c1-5edccd0b34ef@github.com> On Thu, 19 Jun 2025 06:37:10 GMT, Christian Hagedorn wrote: > Let me submit some testing for it before integration. That's great! Thanks for your help! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25539#issuecomment-2987096513 From dnsimon at openjdk.org Thu Jun 19 07:54:38 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 19 Jun 2025 07:54:38 GMT Subject: RFR: 8359064: Expose reason for marking nmethod non-entrant to JVMCI client [v9] In-Reply-To: References: Message-ID: On Wed, 18 Jun 2025 21:14:22 GMT, Cesar Soares Lucas wrote: >> We recently introduced a way to set the reason why a nmethod was being marked as `not entrant`, see [here](https://github.com/openjdk/jdk/pull/23980) and [here](https://github.com/openjdk/jdk/pull/25338). >> >> This PR is to expose in the JVMCI interface the reason why the nmethod was flagged as `not entrant`. This will allow JVMCI-based compilers to implement heuristics to handle re-compilations differently based on what happened to earlier versions of a method, for instance, this will likely be used to address this [RFE in Truffle](https://github.com/oracle/graal/issues/11045 >> ). >> >> Tested on Linux x86_64, ARM with JTREG tier 1-3. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Rename last invalidation placeholder. Marked as reviewed by dnsimon (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/25706#pullrequestreview-2941973799 From kwei at openjdk.org Thu Jun 19 08:16:33 2025 From: kwei at openjdk.org (Kuai Wei) Date: Thu, 19 Jun 2025 08:16:33 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v4] In-Reply-To: References: <96Ny_BPjRCbNlD14DNDUOuQ0IX-F8hx21gxQKVfim9M=.d502019a-27ed-4a35-81ef-bc2aec5e7557@github.com> <_IhK2U23lIUOtBKOt-WMxQ3L7b2t26RzclJRdqbIgms=.3ef9a630-f99c-4de7-994a-bcabf912230b@github.com> <9ABhENoZtR76wmsgRmzeEceDvCvoflfCcbDbK8H2rso=.e351f63f-1331-4e2e-8a02-763a8c0c4f70@github.com> Message-ID: On Tue, 17 Jun 2025 11:33:36 GMT, Emanuel Peter wrote: >> Hi @eme64, as you guessed, I think `collect_merge_info_list()` will be invoked multiple times for the same node. If a cache can be used, it can save a lot of work. >> >> My understanding of your idea is to check the successor `OrNode` to find the load address and shift offset, and see whether it is adjacent to the current combine operator. >> >> I have difficulty with the following case. For example, there are 8 `LoadByte` nodes that can be merged into a `LoadLong`, so there are 8 groups of merge info (load, combine, shift). Suppose the current combine operator is in group 4 and the successor combine operator is in group 6: they are not adjacent. The pattern may be rare, but it is still a valid graph. From the viewpoint of group 4, I do not know whether they can be merged or not. I need to keep checking the outputs of the successor to see if we can find the missing part, and it may be located in the input chain of the current combine operator. So I need to check both directions (input and output) of the current combine operator. >> >> So my design is to start from the memory node, collect all mergeable groups, and take the largest merged group. > > @kuaiwei I see. If there are multiple groups, then things look more difficult.
> > @merykitty once proposed the idea of not doing MergeStores / MergeLoads as IGVN optimizations, but rather to just have a separate and dedicated phase. At the time, I was against it, because I had already implemented `MergeStores` quite far. But now I'm starting to see it as a possibly better alternative. > > That would allow you to take a global view, collect all loads (and stores), put them in a big list, and then make groups that belong together. And then see which groups could be legally replaced with a single load / store. In a way, that is a global vectorizer. And we could handle other patterns than just merging loads and stores: we could also merge copy patterns, for example. That could be much more powerful than the current approach. And it would avoid the issue with having to determine if the current node in IGVN is the best "candidate", or if we should look for another node further down. > > I don't know what you think about this complete "rethink" of the approach. But I do think it would be more powerful, and also avoid having to cache results during IGVN. All the "cached" results are local to that dedicated "MergeMemopsPhase" or whatever we would call it. > > What do you think? @eme64, it sounds like a good idea. I think one benefit is that we can put the 'merge memory' optimization before auto-vectorization, so it can expose more opportunities for it. I'm not clear about the copy pattern you mentioned. Can you give an example for reference?
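For context on the basic pattern this thread is about (adjacent narrow loads combined with shifts and ORs, versus one wider load), here is a minimal C++ sketch. It is an illustration only, not C2 code and not an example of the "copy pattern" asked about above; the function names are hypothetical, and the byte order is assumed to be little-endian for the combined value.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Four adjacent byte loads combined with shifts and ORs (little-endian order).
// This is the shape of code a load-merging optimization would look for.
inline std::uint32_t load_le32_bytewise(const std::uint8_t* p) {
  assert(p != nullptr);
  return  static_cast<std::uint32_t>(p[0])        |
         (static_cast<std::uint32_t>(p[1]) << 8)  |
         (static_cast<std::uint32_t>(p[2]) << 16) |
         (static_cast<std::uint32_t>(p[3]) << 24);
}

// The same value obtained with a single 4-byte load. On a big-endian host the
// loaded bytes are reversed so that both functions agree.
inline std::uint32_t load_le32_merged(const std::uint8_t* p) {
  assert(p != nullptr);
  std::uint32_t v;
  std::memcpy(&v, p, sizeof v);
  const std::uint16_t probe = 1;
  std::uint8_t first;
  std::memcpy(&first, &probe, 1);
  if (first != 1) {  // big-endian host: swap to little-endian interpretation
    v = (v >> 24) | ((v >> 8) & 0x0000FF00u) |
        ((v << 8) & 0x00FF0000u) | (v << 24);
  }
  return v;
}
```

The two functions return the same value for any four readable bytes, which is the equivalence a merge of this kind relies on; the difficulty discussed in the thread lies in recognizing such groups in the graph, not in the arithmetic itself.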
------------- PR Comment: https://git.openjdk.org/jdk/pull/24023#issuecomment-2987150398 From mhaessig at openjdk.org Thu Jun 19 09:16:40 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Thu, 19 Jun 2025 09:16:40 GMT Subject: RFR: 8354727: CompilationPolicy creates too many compiler threads when code cache space is scarce [v3] In-Reply-To: References: <29uhzjC6hIy_YrycL72lqIott9dPDIYrWibtvFjvVsg=.ef3b2c29-e848-4acd-a264-d18cec07bd44@github.com> Message-ID: On Wed, 18 Jun 2025 18:27:17 GMT, Manuel Hässig wrote: >> Running >> >> >> java -XX:+SegmentedCodeCache -XX:ReservedCodeCacheSize=10M -XX:NonNMethodCodeHeapSize=6M \ >> -XX:ProfiledCodeHeapSize=5M -XX:NonProfiledCodeHeapSize=5M -version >> >> >> on a machine with more than 285 cores fails with the message that the specified `NonNMethodCodeHeapSize` is too small to fit all compiler buffers (instead of failing because the sum of the heaps is larger than the `ReservedCodeCacheSize`). Hence, the calculated compiler count is too high. This is due to `CompilationPolicy::initialize()` checking how many compiler buffers fit into the `ReservedCodeCacheSize`. However, in the case above, this is significantly larger than `NonNMethodCodeHeapSize` (especially on a debug build) and causes a check changed in #17244 to fail. That check was changed to check that all compiler buffers fit into the `NonNMethodCodeHeap` instead of the `NonNMethodCodeHeap` having at least a size of `CodeCacheMinimumUseSpace`. >> >> # Changes >> >> This PR fixes the calculation of the `CICompilerCount` ergonomic. Firstly, @shipilev kindly provided a fix so that the compiler buffer size used in the calculation is also correct if we only have C2. Secondly, `NonNMethodHeapSize` is used as the maximum buffer size available for compiler buffers in the calculation of the maximum number of compiler threads instead of `ReservedCodeCacheSize`.
Therefore, the check failing in the explanation above can never fail because we set the number of compiler threads only so high that they will always fit into the `NonNMethodCodeHeap`. >> >> This change changes how many compiler threads are created by the `CICompilerCount` ergonomic. For the default value `NonNMethodCodeHeapSize=5M` this limit is 24 compiler threads on a system with 285 cores or more for product builds and 20 threads for debug builds on a system with 145 cores or more. >> >> # Testing >> >> - [x] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15733154809) >> - [x] tier1 through tier3 plus Oracle internal testing on our supported platforms > > Manuel Hässig has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits: > > - Merge branch 'master' into JDK-8354727-policy > - Calculate buffer size correctly for c2_only > > Co-authored-by: Aleksey Shipilev > - Caclulate how many compiler buffers fit into NonNMethodCodeHeap > - Clarify endif > - update copyrights > - remove leftover include > - fix whitebox access to code cache size configs > - VMPageSizeConstraintFunc > - CodeCacheMinBlockLength > - CodeCacheExpansionSize > - ... and 4 more: https://git.openjdk.org/jdk/compare/7bc0d824...a05f9bd3 The changes to the files `src/hotspot/share/runtime/flags/jvmFlagConstraintsRuntime.(c|h)pp` are artifacts from merge conflicts. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25872#issuecomment-2987028518 From mhaessig at openjdk.org Thu Jun 19 09:34:22 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Thu, 19 Jun 2025 09:34:22 GMT Subject: RFR: 8359120: Improve warning message when fail to load hsdis library In-Reply-To: References: Message-ID: On Tue, 10 Jun 2025 13:38:03 GMT, Taizo Kurashige wrote: > This PR improves the warning message printed when loading the hsdis library fails.
> > [JDK-8287001](https://bugs.openjdk.org/browse/JDK-8287001) introduced a warning on hsdis library load failure. This is useful when the user executes -XX:+PrintAssembly, etc. > > However, I think that when hs_err occurs, users might be confused by this warning printed by Xlog. Because users are not likely to know that hsdis is loaded for the [MachCode] section of the hs_err report, they may wonder, for example, "Why do I get warnings about hsdis load errors when -XX:+PrintAssembly is not specified?" > > To clear up this confusion, I suggest printing a warning just before [MachCode]. > >
> > sample output > > If hs_err occurs and hsdis load fails without the option to specify where the hs_err report should be output, the following is output to the hs_err_pid log file: > > . > . > native method entry point (kind = native) [0x000001ae8753cec0, 0x000001ae8753dac0] 3072 bytes > > Loading hsdis library failed, so undisassembled code is printed in the below [MachCode] section > [MachCode] > 0x000001ae8753cec0: 488b 4b08 | 0fb7 492e | 584c 8d74 | ccf8 6800 | 0000 0068 | 0000 0000 | 5055 488b | ec41 5548 > 0x000001ae8753cee0: 8b43 084c | 8d68 3848 | 8b40 0868 | 0000 0000 | 5348 8b50 | 18 > . > . > > > If -XX:+PrintAssembly is specified and hsdis load fails, the following is output to stdout: > > $ java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly -version > OpenJDK 64-Bit Server VM warning: PrintAssembly is enabled; turning on DebugNonSafepoints to gain additional output > > ============================= C1-compiled nmethod ============================== > ----------------------------------- Assembly ----------------------------------- > > Compiled method (c1) 57 2 3 java.lang.Object::<init> (1 bytes) > total in heap [0x0000024a08a00008,0x0000024a08a00208] = 512 > . > . > > [Constant Pool (empty)] > > > Loading hsdis library failed, so undisassembled code is printed in the below [MachCode] section > [MachCode] > [Instructions begin] > 0x0000024a08a00100: 6666 660f | 1f84 0000 | 0000 0066 | 6666 9066 | 6690 448b | 5208 443b > . > . > [Constant Pool (empty)] > > > Loading hsdis library failed, so undisassembled code is printed in the below [MachCode] section > [MachCode] > [Verified Entry Point] > # {method} {0x00000000251a1898} 'toUnsignedInt' '(B)I' in 'java/lang/Byte > . > . > > >
> > Since the warning added in this fix cover the role of warning introduced in [JDK-8287001](https://bugs.openjdk.org/browse/JDK-828... Thank you for working on this. I have been confused by this myself and think this is a great improvement. I do have a few comments and questions, though. Currently, I do not understand exactly how your new message is only printed when hsdis is not loaded. Do we only emit a MachCode section if hsdis is not loaded? src/hotspot/share/code/nmethod.cpp line 3493: > 3491: st->bol(); > 3492: st->cr(); > 3493: st->print_cr("Loading hsdis library failed, so undisassembled code is printed in the below [MachCode] section"); Some comments on the message: - I don't think we should use `[MachCode]` inside the square brackets apart from marking the start of a section. Otherwise, tools parsing a hs_err file incorrectly identify the start of the MachCode section. - Personally, I find the language of the message a bit clunky. Here are a few suggestions I would personally prefer: `Loading hsdis library failed, unable to show disassembled code in MachCode section` or `Note: Unable to display disassembled code because loading of hsdis library failed.`. src/hotspot/share/compiler/disassembler.cpp line 841: > 839: os::dll_lookup(_library, decode_instructions_virtual_name)); > 840: } else { > 841: log_warning(os)("Loading hsdis library failed"); Personally, I would leave this warning. It does not hurt, and perhaps someone is depending on it to detect if hsdis is installed correctly. ------------- Changes requested by mhaessig (Author). 
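To illustrate the first comment on the message text, here is a minimal sketch of a line-based hs_err scanner. The class `MachCodeScanner` and its method are hypothetical and made up for this illustration; they are not an existing tool. The sketch assumes a scanner that treats any line mentioning `[MachCode]` as the start of a section, which is exactly the kind of tool that the bracketed wording would confuse.

```java
class MachCodeScanner {
    // Hypothetical naive rule: every line that mentions the marker
    // is counted as the start of a MachCode section.
    static int naiveSectionStarts(String[] lines) {
        int count = 0;
        for (String line : lines) {
            if (line.contains("[MachCode]")) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Message as proposed in the PR: contains the bracketed marker itself.
        String[] withBrackets = {
            "Loading hsdis library failed, so undisassembled code is printed in the below [MachCode] section",
            "[MachCode]",
            "0x000001ae8753cec0: 488b 4b08 | 0fb7 492e"
        };
        // One of the rewordings suggested in the review: no brackets.
        String[] reworded = {
            "Loading hsdis library failed, unable to show disassembled code in MachCode section",
            "[MachCode]",
            "0x000001ae8753cec0: 488b 4b08 | 0fb7 492e"
        };
        System.out.println(naiveSectionStarts(withBrackets)); // 2: the warning is miscounted
        System.out.println(naiveSectionStarts(reworded));     // 1: only the real section start
    }
}
```

A stricter scanner that requires the whole line to equal `[MachCode]` would not be affected, but the reworded message avoids relying on that.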
PR Review: https://git.openjdk.org/jdk/pull/25726#pullrequestreview-2942190988 PR Review Comment: https://git.openjdk.org/jdk/pull/25726#discussion_r2156530756 PR Review Comment: https://git.openjdk.org/jdk/pull/25726#discussion_r2156518028 From mhaessig at openjdk.org Thu Jun 19 09:48:20 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Thu, 19 Jun 2025 09:48:20 GMT Subject: Integrated: 8020282: Generated code quality: redundant LEAs in the chained dereferences In-Reply-To: References: Message-ID: On Tue, 27 May 2025 17:26:59 GMT, Manuel Hässig wrote: > ## Summary > > On x86, chained dereferences of narrow oops at a constant offset from the base oop can use a `lea` instruction to perform the address computation in one go using the `leaP8Narrow`, `leaP32Narrow`, and `leaPCompressedOopOffset` matching rules. However, the generated code contains an additional `lea` with an unused result: > > ; OptoAssembly > 03d decode_heap_oop_not_null R8,R10 > 041 leaq R10, [R12 + R10 << 3 + #12] (compressed oop addressing) ; ptr compressedoopoff32 > > ; x86 > 0x00007f1f210625bd: lea (%r12,%r10,8),%r8 ; result is unused > 0x00007f1f210625c1: lea 0xc(%r12,%r10,8),%r10 ; the same computation as decode, but with offset > > > This PR adds a peephole optimization to remove such redundant `lea`s. > > ## The Issue in Detail > > The ideal subgraph producing redundant `lea`s, or rather redundant `decodeHeapOop_not_null`s, is `LoadN -> DecodeN -> AddP`, where both the address and base edge of the `AddP` originate from the `DecodeN`. After matching, this becomes > > LoadN -> decodeHeapOop_not_null -> leaP* > ______________________________? > > where `leaP*` is either of `leaP8Narrow`, `leaP32Narrow`, or `leaPCompressedOopOffset` (depending on the heap location and size). Here, the base input of `leaP*` comes from the decode.
Looking at the matching code path, we find that the `leaP*` rules match both the `AddP` and the `DecodeN`, since x86 can fold this, but the following code adds the decode back as the base input to `leaP*`: > > https://github.com/openjdk/jdk/blob/c29537740efb04e061732a700582d43b1956cff4/src/hotspot/share/opto/matcher.cpp#L1894-L1897 > > On its face, this is completely unnecessary if we matched a `leaP*`, since it already computes the result of the decode, so adding the `LoadN` node as base seems like the logical choice. However, if the derived oop computed by the `leaP*` gets added to an oop map, this `DecodeN` is needed as the base for the derived oop. Because as of now, derived oops in oop maps cannot have narrow base pointers. > > This leaves us with a handful of possible solutions: > 1. implement narrow bases for derived oops in oop maps, > 2. perform some dead code elimination after we know which oops are part of oop maps, > 3. add a peephole optimization to simply remove unused `lea`s. > > Option 1 would have been ideal in the sense, that it is the earliest possible point to remove the decode, which would simplify the graph and reduce pressure on the register allocator. However, rewriting the oop map machinery to remove a... This pull request has now been integrated. 
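The redundancy described in the summary can be checked with plain arithmetic. The sketch below models compressed-oop decoding as `heapBase + (narrowOop << 3)`, mirroring the `[R12 + R10 << 3 + #12]` addressing shown in the OptoAssembly. The class name, the heap base, and the narrow-oop value are assumptions chosen for illustration; this is not HotSpot code.

```java
class LeaFolding {
    static final int SHIFT = 3; // scale of 8, as in (%r12,%r10,8)

    // Models decode_heap_oop_not_null: lea (%r12,%r10,8),%r8
    static long decode(long heapBase, long narrowOop) {
        return heapBase + (narrowOop << SHIFT);
    }

    // Models the folded form: lea 0xc(%r12,%r10,8),%r10
    static long foldedLea(long heapBase, long narrowOop, long offset) {
        return heapBase + (narrowOop << SHIFT) + offset;
    }

    public static void main(String[] args) {
        long heapBase = 0x0000_0008_0000_0000L; // hypothetical heap base (R12)
        long narrowOop = 0x1234_5678L;          // hypothetical narrow oop (R10)
        long offset = 12;                       // the #12 from the example

        long viaDecode = decode(heapBase, narrowOop) + offset;
        long viaLea = foldedLea(heapBase, narrowOop, offset);

        // The folded lea recomputes the address from the narrow oop,
        // so it never reads the result of the separate decode.
        System.out.println(viaDecode == viaLea); // prints true
    }
}
```

Because the folded `lea` starts from the narrow oop, the decode result is dead unless something else needs it, such as the base of a derived oop in an oop map as discussed above.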
Changeset: c7125aa2 Author: Manuel Hässig Committer: Roberto Castañeda Lozano URL: https://git.openjdk.org/jdk/commit/c7125aa2af43a339d401f8416a2251574f6de840 Stats: 668 lines in 6 files changed: 668 ins; 0 del; 0 mod 8020282: Generated code quality: redundant LEAs in the chained dereferences Co-authored-by: Roberto Castañeda Lozano Reviewed-by: kvn, rcastanedalo ------------- PR: https://git.openjdk.org/jdk/pull/25471 From rcastanedalo at openjdk.org Thu Jun 19 09:54:37 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 19 Jun 2025 09:54:37 GMT Subject: RFR: 8356708: C2: loop strip mining expansion doesn't take sunk stores into account [v8] In-Reply-To: References: Message-ID: On Wed, 18 Jun 2025 13:11:46 GMT, Roland Westrelin wrote: >> `test1()` has a counted loop with a `Store` to `field`. That `Store` >> is sunk out of loop. When the `OuterStripMinedLoop` is expanded, only >> `Phi`s that exist at the inner loop are added to the outer >> loop. There's no `Phi` for the slice of the sunk `Store` (because >> there's no `Store` left in the inner loop) so no `Phi` is added for >> that slice to the outer loop. As a result, there's a missing anti >> dependency for `Load` of `field` that's before the loop and it can be >> scheduled inside the outer strip mined loop which is incorrect. >> >> `test2()` is the same as `test1()` but with a chain of 2 `Store`s. >> >> `test3()` is another variant where a `Store` is left in the inner loop >> after one is sunk out of it so the inner loop still has a `Phi`. As a >> result, the outer loop also gets a `Phi` but it's incorrectly wired as >> the sunk `Store` should be the input along the backedge but is >> not. That one doesn't cause any failure AFAICT. >> >> The fix I propose is some extra logic at expansion of the >> `OuterStripMinedLoop` to handle these corner cases.
> > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review Marked as reviewed by rcastanedalo (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25717#pullrequestreview-2942332101 From rcastanedalo at openjdk.org Thu Jun 19 10:01:59 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 19 Jun 2025 10:01:59 GMT Subject: RFR: 8356708: C2: loop strip mining expansion doesn't take sunk stores into account [v7] In-Reply-To: <5Ux_6rUt9dtmICBrFavCIGGQp8shno1zYG7mbVtO5zs=.de8b7a55-a452-42df-a9e7-553addfb51a7@github.com> References: <5Ux_6rUt9dtmICBrFavCIGGQp8shno1zYG7mbVtO5zs=.de8b7a55-a452-42df-a9e7-553addfb51a7@github.com> Message-ID: On Wed, 18 Jun 2025 13:09:14 GMT, Roland Westrelin wrote: > > ``` > > * Is this really something we want to put into JDK 25? Feels high-risk and it's an old issue after all. Maybe we can push this to JDK 26 first, and backport a little later? > > ``` > > Either way is fine with me. It does feel like a nasty issue that wouldn't be easy to diagnose if someone runs into it in the wild. In my opinion, the fix is quite local and contained, so the risk of causing regressions does not seem too high. There is also still quite some time left to observe and react to issues before the RC phase. I would vote for JDK 25.
------------- PR Comment: https://git.openjdk.org/jdk/pull/25717#issuecomment-2987491267 From qamai at openjdk.org Thu Jun 19 10:32:44 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 19 Jun 2025 10:32:44 GMT Subject: RFR: 8306706: Support out-of-line code generation for MachNodes [v5] In-Reply-To: References: <3haQdXHxlUHKAqi4MNWaVz3gVcFB9M8A20tGPQIok3c=.940d6d13-9764-449a-a9e1-36247f08b68e@github.com> Message-ID: On Mon, 16 Jun 2025 22:03:04 GMT, Srinivas Vamsi Parasa wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> comments describe C2GeneralStub > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4339: > >> 4337: } >> 4338: >> 4339: auto stub = C2CodeStub::make(dst, src, slowpath_target, 23, convertF2I_slowpath); > > Hi @merykitty, could you please explain how the size 23 was computed? This value does not work with APX and I created a PR (https://github.com/openjdk/jdk/pull/25787) for that. @vamsi-parasa Hi, I just manually assembled the snippet and see its size, for such a small snippet it is easy to see that the size is indeed the largest possible. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13602#discussion_r2156679194 From thartmann at openjdk.org Thu Jun 19 10:53:00 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 19 Jun 2025 10:53:00 GMT Subject: RFR: 8359270: C2: alignment check should consider base offset when emitting arraycopy runtime call [v3] In-Reply-To: <6_8ZfgnZJ58KTywbCHZoJbPhOkgRWq8VhWqnnEGLRMw=.3f3b1e62-2ebb-49a7-a2f0-754ecd3635c2@github.com> References: <6_8ZfgnZJ58KTywbCHZoJbPhOkgRWq8VhWqnnEGLRMw=.3f3b1e62-2ebb-49a7-a2f0-754ecd3635c2@github.com> Message-ID: On Thu, 19 Jun 2025 07:18:49 GMT, Fei Yang wrote: >> Hi, please consider this change fixing alignment check when emitting arraycopy runtime call. 
>> >> There are currently four callsites of `StubRoutines::select_arraycopy_function` in hotspot C2 shared code where we emit arraycopy runtime calls [1-4]. Three of them [2-4] missed base offset when calculation alignment for both source and destination array addresses. Seems they assume a base offset of 8 bytes, which is not always true. Base offset becomes 4 bytes under either `-XX:+UseCompactObjectHeaders` or `-XX:-UseCompressedClassPointers`. >> >> And `StubRoutines::select_arraycopy_function` selects the right arraycopy runtime call based on this alignment. As a result, it could see an incorrect `aligned` param about the array addresses and thus a wrong arraycopy runtime call is selected. This is causing performance regressions (like Dacapo Spring) on some linux-riscv64 platforms like Sifive Unmatched or Premier P550 SBCs where misaligned memory access is very slow. >> >> Proposed change fixes this issue by taking base offset into account when checking the alignment, which is very similar to [1]. >> >> Testing: >> - [x] Tier1-3 on linux-x64 (release & fastdebug) >> - [x] Tier1-3 on linux-aarch64 (release & fastdebug) >> - [x] Tier1-3 on linux-riscv64 (release) >> - [x] Dacapo spring performance test on linux-riscv64 (w/wo `-XX:+UseCompactObjectHeaders` / `-XX:-UseCompressedClassPointers`) >> >> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/macroArrayCopy.cpp#L341 >> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/library_call.cpp#L1584 >> [3] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/library_call.cpp#L1666 >> [4] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/stringopts.cpp#L1481 > > Fei Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: > > - improve test > - Merge remote-tracking branch 'upstream/master' into JDK-8359270 > - add test > - Merge remote-tracking branch 'upstream/master' into JDK-8359270 > - 8359270: C2: alignment check should consider base offset when emitting arraycopy runtime call test/hotspot/jtreg/compiler/c2/irTests/stringopts/TestArrayCopySelect.java line 50: > 48: public static void main(String[] args) { > 49: TestFramework.runWithFlags("-XX:+UseCompactObjectHeaders", "-XX:-CompactStrings", "-XX:MaxInlineSize=70", "-XX:MinInlineFrequencyRatio=0"); > 50: TestFramework.runWithFlags("-XX:-UseCompactObjectHeaders", "-XX:-CompactStrings", "-XX:MaxInlineSize=70", "-XX:MinInlineFrequencyRatio=0"); Wouldn't it make more sense to simply enforce inlining via a compile command? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25765#discussion_r2156701312 From thartmann at openjdk.org Thu Jun 19 11:20:27 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 19 Jun 2025 11:20:27 GMT Subject: RFR: 8358578: Small -XX:NMethodSizeLimit triggers "not in CodeBuffer memory" assert in C1 In-Reply-To: References: Message-ID: On Wed, 18 Jun 2025 14:07:29 GMT, Manuel Hässig wrote: > Running `java -XX:NMethodSizeLimit=100 -version` triggers an assert because the lower limit of the debug flag `NMethodSizeLimit` is too low. > > `NMethodSizeLimit` corresponds more or less directly to the C1 code buffer size. It was added as a debug flag in 2005 to make it easier to stress the code paths related to the buffer size. Nowadays, it is not used for any stressing, but it has caused a bunch of bugs ([JDK-8316653](https://bugs.openjdk.org/browse/JDK-8316653), [JDK-8318817](https://bugs.openjdk.org/browse/JDK-8318817), [JDK-8320682](https://bugs.openjdk.org/browse/JDK-8320682)). Therefore, this PR removes the debug flag `NMethodSizeLimit` and converts it to a constant.
> > Because this removes the code causing the error, this bug does not have a regression test (in fact, it removes regression tests). > > # Testing > > - [x] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15735062162) > - [ ] tier1 through tier3 plus Oracle internal testing on all Oracle supported platforms Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25876#pullrequestreview-2942573116 From shade at openjdk.org Thu Jun 19 11:31:30 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 19 Jun 2025 11:31:30 GMT Subject: [jdk25] RFR: 8359646: C1 crash in AOTCodeAddressTable::add_C_string In-Reply-To: References: Message-ID: On Wed, 18 Jun 2025 18:03:55 GMT, Vladimir Kozlov wrote: > Hi all, > > This pull request contains a backport of commit [96070212](https://github.com/openjdk/jdk/commit/96070212adfd15acd99edf6e180db6228ee7b4ff) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Vladimir Kozlov on 17 Jun 2025 and was reviewed by Andrew Dinn and Ioi Lam. > > Thanks! Marked as reviewed by shade (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25882#pullrequestreview-2942587406 From thartmann at openjdk.org Thu Jun 19 11:31:31 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 19 Jun 2025 11:31:31 GMT Subject: [jdk25] RFR: 8359646: C1 crash in AOTCodeAddressTable::add_C_string In-Reply-To: References: Message-ID: On Wed, 18 Jun 2025 18:03:55 GMT, Vladimir Kozlov wrote: > Hi all, > > This pull request contains a backport of commit [96070212](https://github.com/openjdk/jdk/commit/96070212adfd15acd99edf6e180db6228ee7b4ff) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Vladimir Kozlov on 17 Jun 2025 and was reviewed by Andrew Dinn and Ioi Lam. > > Thanks! Looks good. 
------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25882#pullrequestreview-2942588254 From fyang at openjdk.org Thu Jun 19 11:36:27 2025 From: fyang at openjdk.org (Fei Yang) Date: Thu, 19 Jun 2025 11:36:27 GMT Subject: RFR: 8359270: C2: alignment check should consider base offset when emitting arraycopy runtime call [v4] In-Reply-To: References: Message-ID: > Hi, please consider this change fixing alignment check when emitting arraycopy runtime call. > > There are currently four callsites of `StubRoutines::select_arraycopy_function` in hotspot C2 shared code where we emit arraycopy runtime calls [1-4]. Three of them [2-4] missed base offset when calculation alignment for both source and destination array addresses. Seems they assume a base offset of 8 bytes, which is not always true. Base offset becomes 4 bytes under either `-XX:+UseCompactObjectHeaders` or `-XX:-UseCompressedClassPointers`. > > And `StubRoutines::select_arraycopy_function` selects the right arraycopy runtime call based on this alignment. As a result, it could see an incorrect `aligned` param about the array addresses and thus a wrong arraycopy runtime call is selected. This is causing performance regressions (like Dacapo Spring) on some linux-riscv64 platforms like Sifive Unmatched or Premier P550 SBCs where misaligned memory access is very slow. > > Proposed change fixes this issue by taking base offset into account when checking the alignment, which is very similar to [1]. 
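A minimal sketch of why the base offset matters for the `aligned` decision. The class and method names are hypothetical, and the base offsets 16 and 12 are illustrative values picked only because one is a multiple of 8 and the other is a multiple of 4 but not of 8; they are not taken from the HotSpot sources. The sketch also assumes that the array object itself starts at an 8-byte aligned address.

```java
class ArrayCopyAlignment {
    static final int HEAP_WORD_SIZE = 8;

    // Check that ignores the base offset, i.e. silently assumes the first
    // element already sits on an 8-byte boundary.
    static boolean alignedIgnoringBase(int index, int elemSize) {
        return (index * elemSize) % HEAP_WORD_SIZE == 0;
    }

    // Check that includes the base offset of the array payload.
    static boolean alignedWithBase(int baseOffset, int index, int elemSize) {
        return (baseOffset + index * elemSize) % HEAP_WORD_SIZE == 0;
    }

    public static void main(String[] args) {
        int elemSize = 1; // byte[] elements

        // Base offset that is a multiple of 8: both checks agree.
        System.out.println(alignedIgnoringBase(0, elemSize));   // true
        System.out.println(alignedWithBase(16, 0, elemSize));   // true

        // Base offset that is only a multiple of 4: the checks disagree.
        System.out.println(alignedWithBase(12, 0, elemSize));   // false, element 0 is misaligned
        System.out.println(alignedWithBase(12, 4, elemSize));   // true, element 4 is aligned
        System.out.println(alignedIgnoringBase(4, elemSize));   // false, wrong answer for base 12
    }
}
```

With the second kind of base offset, a check that ignores the base gives the wrong answer in both directions, which is how an unsuitable arraycopy stub could end up being selected.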
> > Testing: > - [x] Tier1-3 on linux-x64 (release & fastdebug) > - [x] Tier1-3 on linux-aarch64 (release & fastdebug) > - [x] Tier1-3 on linux-riscv64 (release) > - [x] Dacapo spring performance test on linux-riscv64 (w/wo `-XX:+UseCompactObjectHeaders` / `-XX:-UseCompressedClassPointers`) > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/macroArrayCopy.cpp#L341 > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/library_call.cpp#L1584 > [3] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/library_call.cpp#L1666 > [4] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/stringopts.cpp#L1481 Fei Yang has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25765/files - new: https://git.openjdk.org/jdk/pull/25765/files/9dbc4ae9..29ef3b2b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25765&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25765&range=02-03 Stats: 13 lines in 1 file changed: 9 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/25765.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25765/head:pull/25765 PR: https://git.openjdk.org/jdk/pull/25765 From fyang at openjdk.org Thu Jun 19 11:36:29 2025 From: fyang at openjdk.org (Fei Yang) Date: Thu, 19 Jun 2025 11:36:29 GMT Subject: RFR: 8359270: C2: alignment check should consider base offset when emitting arraycopy runtime call [v3] In-Reply-To: References: <6_8ZfgnZJ58KTywbCHZoJbPhOkgRWq8VhWqnnEGLRMw=.3f3b1e62-2ebb-49a7-a2f0-754ecd3635c2@github.com> Message-ID: On Thu, 19 Jun 2025 10:42:57 GMT, Tobias Hartmann wrote: >> Fei Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: >> >> - improve test >> - Merge remote-tracking branch 'upstream/master' into JDK-8359270 >> - add test >> - Merge remote-tracking branch 'upstream/master' into JDK-8359270 >> - 8359270: C2: alignment check should consider base offset when emitting arraycopy runtime call > > test/hotspot/jtreg/compiler/c2/irTests/stringopts/TestArrayCopySelect.java line 50: > >> 48: public static void main(String[] args) { >> 49: TestFramework.runWithFlags("-XX:+UseCompactObjectHeaders", "-XX:-CompactStrings", "-XX:MaxInlineSize=70", "-XX:MinInlineFrequencyRatio=0"); >> 50: TestFramework.runWithFlags("-XX:-UseCompactObjectHeaders", "-XX:-CompactStrings", "-XX:MaxInlineSize=70", "-XX:MinInlineFrequencyRatio=0"); > > Wouldn't it make more sense to simply enforce inlining via a compile command? Yes, that would be more explict about inlining. I updated the test accordingly. Still test good on these platforms. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25765#discussion_r2156781132 From mhaessig at openjdk.org Thu Jun 19 12:25:31 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Thu, 19 Jun 2025 12:25:31 GMT Subject: RFR: 8359685: Test stress/compiler/deoptimize/Test.java fails Out of space in CodeCache In-Reply-To: References: Message-ID: On Thu, 19 Jun 2025 03:17:01 GMT, SendaoYan wrote: > Hi all, > > Test test/hotspot/jtreg/vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java fails "Out of space in CodeCache" on machine which has huge CPU core number. This test will create lots of compile threads to stress test the compiler deoptimize, the thread number depends on the CPU core number, so on huge CPU core number machine this test will report "Out of space in CodeCache" failure. > The "java.lang.OutOfMemoryError: Out of space in CodeCache" seems not the expected error, and increase the max code cache memory will make this test run pass steady. 
Could we change ReservedCodeCacheSize from 100m to 200m? > > Test-fix only, change has been verified on 256 core number linux-x86 machine and on 64 core number linux-x86 machine. This looks similar to #25872 which limits the number of compiler threads for high core counts. Would that perhaps already fix this issue or help reduce the amount of code cache needed? ------------- PR Comment: https://git.openjdk.org/jdk/pull/25888#issuecomment-2987367102 From syan at openjdk.org Thu Jun 19 12:25:32 2025 From: syan at openjdk.org (SendaoYan) Date: Thu, 19 Jun 2025 12:25:32 GMT Subject: RFR: 8359685: Test stress/compiler/deoptimize/Test.java fails Out of space in CodeCache In-Reply-To: References: Message-ID: On Thu, 19 Jun 2025 09:19:23 GMT, Manuel Hässig wrote: > This looks similar to #25872 which limits the number of compiler threads for high core counts. Would that perhaps already fix this issue or help reduce the amount of code cache needed? Sorry, I didn't notice that there is already a JBS issue for the same problem. I think I can close this PR and the related JBS issue. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25888#issuecomment-2987897993 From mhaessig at openjdk.org Thu Jun 19 12:25:32 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Thu, 19 Jun 2025 12:25:32 GMT Subject: RFR: 8359685: Test stress/compiler/deoptimize/Test.java fails Out of space in CodeCache In-Reply-To: References: Message-ID: On Thu, 19 Jun 2025 12:19:49 GMT, SendaoYan wrote: > Sorry, I didn't notice that there is already a JBS issue for the same problem. I think I can close this PR and the related JBS issue I don't think it's the exact same issue, but the solution I implemented in the other PR might have an effect on this one.
------------- PR Comment: https://git.openjdk.org/jdk/pull/25888#issuecomment-2987906300 From syan at openjdk.org Thu Jun 19 12:25:32 2025 From: syan at openjdk.org (SendaoYan) Date: Thu, 19 Jun 2025 12:25:32 GMT Subject: Withdrawn: 8359685: Test stress/compiler/deoptimize/Test.java fails Out of space in CodeCache In-Reply-To: References: Message-ID: On Thu, 19 Jun 2025 03:17:01 GMT, SendaoYan wrote: > Hi all, > > Test test/hotspot/jtreg/vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java fails "Out of space in CodeCache" on machine which has huge CPU core number. This test will create lots of compile threads to stress test the compiler deoptimize, the thread number depends on the CPU core number, so on huge CPU core number machine this test will report "Out of space in CodeCache" failure. > The "java.lang.OutOfMemoryError: Out of space in CodeCache" seems not the expected error, and increase the max code cache memory will make this test run pass steady. Could we change ReservedCodeCacheSize from 100m to 200m? > > Test-fix only, change has been verified on 256 core number linux-x86 machine and on 64 core number linux-x86 machine. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/25888 From syan at openjdk.org Thu Jun 19 12:34:33 2025 From: syan at openjdk.org (SendaoYan) Date: Thu, 19 Jun 2025 12:34:33 GMT Subject: RFR: 8359685: Test stress/compiler/deoptimize/Test.java fails Out of space in CodeCache In-Reply-To: References: Message-ID: On Thu, 19 Jun 2025 12:22:04 GMT, Manuel Hässig wrote: > I don't think it's the exact same issue, but the solution I implemented in the other PR might have an effect on this one. I have verified your implementation manually; it did fix this issue.
------------- PR Comment: https://git.openjdk.org/jdk/pull/25888#issuecomment-2987936276 From epeter at openjdk.org Thu Jun 19 12:37:59 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 19 Jun 2025 12:37:59 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v4] In-Reply-To: References: <96Ny_BPjRCbNlD14DNDUOuQ0IX-F8hx21gxQKVfim9M=.d502019a-27ed-4a35-81ef-bc2aec5e7557@github.com> <_IhK2U23lIUOtBKOt-WMxQ3L7b2t26RzclJRdqbIgms=.3ef9a630-f99c-4de7-994a-bcabf912230b@github.com> <9ABhENoZtR76wmsgRmzeEceDvCvoflfCcbDbK8H2rso=.e351f63f-1331-4e2e-8a02-763a8c0c4f70@github.com> Message-ID: <1fggHGu5WfMRbBp6sUv_22QX7WW6EIizAYHmdNsRblA=.5b17e5ea-b4e8-4ae9-9129-24f0052e537f@github.com> On Thu, 19 Jun 2025 08:10:47 GMT, Kuai Wei wrote: >> @kuaiwei I see. If there are multiple groups, then things look more difficult. >> >> @merykitty Once proposed the idea of not doing MergeStores / MergeLoads as IGVN optimizations, but rather to just have a separate and dedicated phase. At the time, I was against it, because I had already implemented `MergeStores` quite far. But now I'm starting to see it as a possibly better alternative. >> >> That would allow you to take a global view, collect all loads (and stores), put them in a big list, and then make groups that belong together. And then see which groups could be legally replaced with a single load / store. In a way, that is a global vectorizer. And we could handle other patterns than just merging loads and stores: we could also merge copy patterns, for example. That could be much more powerful than the current approach. And it would avoid the issue with having to determine if the current node in IGVN is the best "candidate", or if we should look for another node further down. >> >> I don't know what you think about this complete "rethink" of the approach. But I do think it would be more powerful, and also avoid having to cache results during IGVN. 
All the "cached" results are local to that dedicated "MergeMemopsPhase" or whatever we would call it. >> >> What do you think? > > @eme64 , It sounds like a good idea. I think a benefit is we can put 'merge memory' optimization before auto vectorization, so it can expose more opportunities for it. I'm not clear about the copy pattern you mentioned. Can you give an example as reference? @kuaiwei I'm glad to hear you are for it too :) > I think a benefit is we can put 'merge memory' optimization before auto vectorization, so it can expose more opportunities for it. I'm not sure that is the best idea. I've tried that with MergeStores, and it had some bad effects for some "fill" loops: we would use MergeStores, and then SuperWord would work differently and fail some tests. You could investigate, but I'm not sure it is worth it. I would leave SuperWord to deal with loops, and use MergeMemory / MergeLoads / MergeStores to deal with the remaining code, be it in loops or straight line code. > I'm not clear about the copy pattern you mentioned. Can you give an example as reference? `a[0 + i] = b[0 + i]` `a[1 + i] = b[1 + i]` Though that may require aliasing analysis in some cases, not sure if the complexity is worth it in general. Probably not. It is probably also not profitable to merge the copy pattern, and then to add a very expensive aliasing check - you would lose more than you gain. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24023#issuecomment-2987946996 From mhaessig at openjdk.org Thu Jun 19 12:51:30 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Thu, 19 Jun 2025 12:51:30 GMT Subject: RFR: 8359685: Test stress/compiler/deoptimize/Test.java fails Out of space in CodeCache In-Reply-To: References: Message-ID: On Thu, 19 Jun 2025 03:17:01 GMT, SendaoYan wrote: > Hi all, > > Test test/hotspot/jtreg/vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java fails "Out of space in CodeCache" on machine which has huge CPU core number.
This test will create lots of compile threads to stress test the compiler deoptimize, the thread number depends on the CPU core number, so on huge CPU core number machine this test will report "Out of space in CodeCache" failure. > The "java.lang.OutOfMemoryError: Out of space in CodeCache" seems not the expected error, and increase the max code cache memory will make this test run pass steady. Could we change ReservedCodeCacheSize from 100m to 200m? > > Test-fix only, change has been verified on 256 core number linux-x86 machine and on 64 core number linux-x86 machine. Oh sorry, I missed that. Good to hear, though. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25888#issuecomment-2987996408 From syan at openjdk.org Thu Jun 19 12:58:31 2025 From: syan at openjdk.org (SendaoYan) Date: Thu, 19 Jun 2025 12:58:31 GMT Subject: RFR: 8354727: CompilationPolicy creates too many compiler threads when code cache space is scarce [v3] In-Reply-To: References: <29uhzjC6hIy_YrycL72lqIott9dPDIYrWibtvFjvVsg=.ef3b2c29-e848-4acd-a264-d18cec07bd44@github.com> Message-ID: On Wed, 18 Jun 2025 18:27:17 GMT, Manuel H?ssig wrote: >> Running >> >> >> java -XX:+SegmentedCodeCache -XX:ReservedCodeCacheSize=10M -XX:NonNMethodCodeHeapSize=6M \ >> -XX:ProfiledCodeHeapSize=5M -XX:NonProfiledCodeHeapSize=5M -version >> >> >> on a machine with more than 285 cores, this would fail with the message that the specified `NonNMethodCodeHeapSize` is too small to fit all compiler buffers (instead of failing because the sum of the heaps is larger than the `ReservedCodeCacheSize`). Hence, the calculated compiler count is too high. This is due to CompilationPolicy::initialize() checking how many compiler buffers fit into the `ReservedCodeCacheSize`. However, in the case above, this is significantly larger than `NonNMethodCodeHeapSize` (especially on a debug build) and causes a check changed in #17244 to fail. 
That check was changed to check that all compiler buffers fit into the `NonNMethodCodeHeap` instead of the `NonNMethodCodeHeap` having at least a size of `CodeCacheMinimumUseSpace`. >> >> # Changes >> >> This PR fixes the calculation of the `CICompilerCount` ergonomic. Firstly, @shipilev kindly provided a fix for the compiler buffer size used in the calculation is also correct if we only have C2. Secondly, `NonNMethodHeapSize` is used as the maximum buffer size available for compilers buffers in the calculation of the maximum number of compiler threads instead of `ReservedCodeCacheSize`. Therefore, the check failing in the explanation above can never fail because we set the number of compiler threads only so high that they will always fit into the `NonNMethodCodeHeap`. >> >> This change changes how many compiler threads are created by the `CICompilerCount` ergonomic. For the default value `NonNMethodCodeHeapSize=5M`this limit is 24 compiler threads on a system 285 cores or more for product builds and 20 threads for debug builds on a system with 145 cores or more. >> >> # Testing >> >> - [x] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15733154809) >> - [x] tier1 through tier3 plus Oracle internal testing on our supported platforms > > Manuel H?ssig has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits: > > - Merge branch 'master' into JDK-8354727-policy > - Calculate buffer size correctly for c2_only > > Co-authored-by: Aleksey Shipilev > - Caclulate how many compiler buffers fit into NonNMethodCodeHeap > - Clarify endif > - update copyrights > - remove leftover include > - fix whitebox access to code cache size configs > - VMPageSizeConstraintFunc > - CodeCacheMinBlockLength > - CodeCacheExpansionSize > - ... 
and 4 more: https://git.openjdk.org/jdk/compare/7bc0d824...a05f9bd3 This PR fixes the failure https://bugs.openjdk.org/browse/JDK-8359685 on linux-x64 with 256 cores. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25872#issuecomment-2988019394 From kvn at openjdk.org Thu Jun 19 13:43:32 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 19 Jun 2025 13:43:32 GMT Subject: [jdk25] RFR: 8359646: C1 crash in AOTCodeAddressTable::add_C_string In-Reply-To: References: Message-ID: On Thu, 19 Jun 2025 11:24:06 GMT, Aleksey Shipilev wrote: >> Hi all, >> >> This pull request contains a backport of commit [96070212](https://github.com/openjdk/jdk/commit/96070212adfd15acd99edf6e180db6228ee7b4ff) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. >> >> The commit being backported was authored by Vladimir Kozlov on 17 Jun 2025 and was reviewed by Andrew Dinn and Ioi Lam. >> >> Thanks! > > Marked as reviewed by shade (Reviewer). Thank you @shipilev and @TobiHartmann. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25882#issuecomment-2988146702 From kvn at openjdk.org Thu Jun 19 13:43:33 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 19 Jun 2025 13:43:33 GMT Subject: [jdk25] Integrated: 8359646: C1 crash in AOTCodeAddressTable::add_C_string In-Reply-To: References: Message-ID: On Wed, 18 Jun 2025 18:03:55 GMT, Vladimir Kozlov wrote: > Hi all, > > This pull request contains a backport of commit [96070212](https://github.com/openjdk/jdk/commit/96070212adfd15acd99edf6e180db6228ee7b4ff) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Vladimir Kozlov on 17 Jun 2025 and was reviewed by Andrew Dinn and Ioi Lam. > > Thanks! This pull request has now been integrated.
Changeset: e5ac75a3 Author: Vladimir Kozlov URL: https://git.openjdk.org/jdk/commit/e5ac75a35b20d952c2054525184c0d203592c156 Stats: 15 lines in 1 file changed: 9 ins; 4 del; 2 mod 8359646: C1 crash in AOTCodeAddressTable::add_C_string Reviewed-by: shade, thartmann Backport-of: 96070212adfd15acd99edf6e180db6228ee7b4ff ------------- PR: https://git.openjdk.org/jdk/pull/25882 From kvn at openjdk.org Thu Jun 19 14:13:40 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 19 Jun 2025 14:13:40 GMT Subject: RFR: 8354727: CompilationPolicy creates too many compiler threads when code cache space is scarce [v3] In-Reply-To: References: <29uhzjC6hIy_YrycL72lqIott9dPDIYrWibtvFjvVsg=.ef3b2c29-e848-4acd-a264-d18cec07bd44@github.com> Message-ID: On Thu, 19 Jun 2025 07:24:01 GMT, Manuel Hässig wrote: > The changes to the files `src/hotspot/share/runtime/flags/jvmFlagConstraintsRuntime.(c|h)pp` are artifacts from merge conflicts. This undid the changes from https://github.com/openjdk/jdk/commit/cf78925859dd2640b3c2500fc6be8b5bb308d96e. Something went wrong with the merge.
------------- PR Comment: https://git.openjdk.org/jdk/pull/25872#issuecomment-2988245089 From kvn at openjdk.org Thu Jun 19 14:26:41 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 19 Jun 2025 14:26:41 GMT Subject: RFR: 8354727: CompilationPolicy creates too many compiler threads when code cache space is scarce [v3] In-Reply-To: References: <29uhzjC6hIy_YrycL72lqIott9dPDIYrWibtvFjvVsg=.ef3b2c29-e848-4acd-a264-d18cec07bd44@github.com> Message-ID: <9AscB7NkLLnT-U_PI820MdM_UVqa5YfQMtWFnoahQ_Q=.b407e72b-6bc2-4b29-a7ba-3c2d1f71d71e@github.com> On Wed, 18 Jun 2025 18:27:17 GMT, Manuel H?ssig wrote: >> Running >> >> >> java -XX:+SegmentedCodeCache -XX:ReservedCodeCacheSize=10M -XX:NonNMethodCodeHeapSize=6M \ >> -XX:ProfiledCodeHeapSize=5M -XX:NonProfiledCodeHeapSize=5M -version >> >> >> on a machine with more than 285 cores, this would fail with the message that the specified `NonNMethodCodeHeapSize` is too small to fit all compiler buffers (instead of failing because the sum of the heaps is larger than the `ReservedCodeCacheSize`). Hence, the calculated compiler count is too high. This is due to CompilationPolicy::initialize() checking how many compiler buffers fit into the `ReservedCodeCacheSize`. However, in the case above, this is significantly larger than `NonNMethodCodeHeapSize` (especially on a debug build) and causes a check changed in #17244 to fail. That check was changed to check that all compiler buffers fit into the `NonNMethodCodeHeap` instead of the `NonNMethodCodeHeap` having at least a size of `CodeCacheMinimumUseSpace`. >> >> # Changes >> >> This PR fixes the calculation of the `CICompilerCount` ergonomic. Firstly, @shipilev kindly provided a fix for the compiler buffer size used in the calculation is also correct if we only have C2. Secondly, `NonNMethodHeapSize` is used as the maximum buffer size available for compilers buffers in the calculation of the maximum number of compiler threads instead of `ReservedCodeCacheSize`. 
Therefore, the check failing in the explanation above can never fail because we set the number of compiler threads only so high that they will always fit into the `NonNMethodCodeHeap`. >> >> This change changes how many compiler threads are created by the `CICompilerCount` ergonomic. For the default value `NonNMethodCodeHeapSize=5M` this limit is 24 compiler threads on a system 285 cores or more for product builds and 20 threads for debug builds on a system with 145 cores or more. >> >> # Testing >> >> - [x] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15733154809) >> - [x] tier1 through tier3 plus Oracle internal testing on our supported platforms > > Manuel Hässig has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits: > > - Merge branch 'master' into JDK-8354727-policy > - Calculate buffer size correctly for c2_only > > Co-authored-by: Aleksey Shipilev > - Caclulate how many compiler buffers fit into NonNMethodCodeHeap > - Clarify endif > - update copyrights > - remove leftover include > - fix whitebox access to code cache size configs > - VMPageSizeConstraintFunc > - CodeCacheMinBlockLength > - CodeCacheExpansionSize > - ... and 4 more: https://git.openjdk.org/jdk/compare/7bc0d824...a05f9bd3 The changes look fine.
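For reference, the capping described in the quoted PR text can be sketched roughly as follows. This is an illustration under stated assumptions, not the HotSpot code: the helper `max_compiler_threads`, its parameters, and the "keep at least one thread" rule are hypothetical, and the real calculation uses per-compiler buffer sizes that are not reproduced here.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper (not a HotSpot function): cap an ergonomically chosen
// compiler thread count so that all per-thread compiler buffers fit into the
// non-nmethod code heap rather than into the whole reserved code cache.
int max_compiler_threads(std::size_t non_nmethod_heap_bytes,
                         std::size_t buffer_bytes_per_thread,
                         int ergonomic_count) {
  if (buffer_bytes_per_thread == 0 || ergonomic_count <= 0) {
    return ergonomic_count;  // nothing to cap; keep the input unchanged
  }
  // How many whole buffers fit into the heap.
  std::size_t fit = non_nmethod_heap_bytes / buffer_bytes_per_thread;
  std::size_t capped =
      std::min<std::size_t>(fit, static_cast<std::size_t>(ergonomic_count));
  // Assumption for this sketch: always keep at least one compiler thread.
  return static_cast<int>(std::max<std::size_t>(1, capped));
}
```

With a made-up buffer size of 512 KiB per thread, a 5 MiB heap would cap 32 requested threads at 10, while a request for 8 threads would pass through unchanged.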
------------- PR Comment: https://git.openjdk.org/jdk/pull/25872#issuecomment-2988282569 From mhaessig at openjdk.org Thu Jun 19 14:41:27 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Thu, 19 Jun 2025 14:41:27 GMT Subject: RFR: 8354727: CompilationPolicy creates too many compiler threads when code cache space is scarce [v4] In-Reply-To: <29uhzjC6hIy_YrycL72lqIott9dPDIYrWibtvFjvVsg=.ef3b2c29-e848-4acd-a264-d18cec07bd44@github.com> References: <29uhzjC6hIy_YrycL72lqIott9dPDIYrWibtvFjvVsg=.ef3b2c29-e848-4acd-a264-d18cec07bd44@github.com> Message-ID: > Running > > > java -XX:+SegmentedCodeCache -XX:ReservedCodeCacheSize=10M -XX:NonNMethodCodeHeapSize=6M \ > -XX:ProfiledCodeHeapSize=5M -XX:NonProfiledCodeHeapSize=5M -version > > > on a machine with more than 285 cores, this would fail with the message that the specified `NonNMethodCodeHeapSize` is too small to fit all compiler buffers (instead of failing because the sum of the heaps is larger than the `ReservedCodeCacheSize`). Hence, the calculated compiler count is too high. This is due to CompilationPolicy::initialize() checking how many compiler buffers fit into the `ReservedCodeCacheSize`. However, in the case above, this is significantly larger than `NonNMethodCodeHeapSize` (especially on a debug build) and causes a check changed in #17244 to fail. That check was changed to check that all compiler buffers fit into the `NonNMethodCodeHeap` instead of the `NonNMethodCodeHeap` having at least a size of `CodeCacheMinimumUseSpace`. > > # Changes > > This PR fixes the calculation of the `CICompilerCount` ergonomic. Firstly, @shipilev kindly provided a fix for the compiler buffer size used in the calculation is also correct if we only have C2. Secondly, `NonNMethodHeapSize` is used as the maximum buffer size available for compilers buffers in the calculation of the maximum number of compiler threads instead of `ReservedCodeCacheSize`. 
Therefore, the check failing in the explanation above can never fail because we set the number of compiler threads only so high that they will always fit into the `NonNMethodCodeHeap`. > > This change changes how many compiler threads are created by the `CICompilerCount` ergonomic. For the default value `NonNMethodCodeHeapSize=5M`this limit is 24 compiler threads on a system 285 cores or more for product builds and 20 threads for debug builds on a system with 145 cores or more. > > # Testing > > - [x] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15733154809) > - [x] tier1 through tier3 plus Oracle internal testing on our supported platforms Manuel H?ssig has updated the pull request incrementally with one additional commit since the last revision: Fix merge conflict resolution ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25872/files - new: https://git.openjdk.org/jdk/pull/25872/files/a05f9bd3..c6ab7dee Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25872&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25872&range=02-03 Stats: 13 lines in 2 files changed: 0 ins; 13 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25872.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25872/head:pull/25872 PR: https://git.openjdk.org/jdk/pull/25872 From mhaessig at openjdk.org Thu Jun 19 14:41:43 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Thu, 19 Jun 2025 14:41:43 GMT Subject: RFR: 8354727: CompilationPolicy creates too many compiler threads when code cache space is scarce [v3] In-Reply-To: References: <29uhzjC6hIy_YrycL72lqIott9dPDIYrWibtvFjvVsg=.ef3b2c29-e848-4acd-a264-d18cec07bd44@github.com> Message-ID: On Wed, 18 Jun 2025 18:27:17 GMT, Manuel H?ssig wrote: >> Running >> >> >> java -XX:+SegmentedCodeCache -XX:ReservedCodeCacheSize=10M -XX:NonNMethodCodeHeapSize=6M \ >> -XX:ProfiledCodeHeapSize=5M -XX:NonProfiledCodeHeapSize=5M -version >> >> >> on a machine with more than 285 
cores, this would fail with the message that the specified `NonNMethodCodeHeapSize` is too small to fit all compiler buffers (instead of failing because the sum of the heaps is larger than the `ReservedCodeCacheSize`). Hence, the calculated compiler count is too high. This is due to CompilationPolicy::initialize() checking how many compiler buffers fit into the `ReservedCodeCacheSize`. However, in the case above, this is significantly larger than `NonNMethodCodeHeapSize` (especially on a debug build) and causes a check changed in #17244 to fail. That check was changed to check that all compiler buffers fit into the `NonNMethodCodeHeap` instead of the `NonNMethodCodeHeap` having at least a size of `CodeCacheMinimumUseSpace`. >> >> # Changes >> >> This PR fixes the calculation of the `CICompilerCount` ergonomic. Firstly, @shipilev kindly provided a fix for the compiler buffer size used in the calculation is also correct if we only have C2. Secondly, `NonNMethodHeapSize` is used as the maximum buffer size available for compilers buffers in the calculation of the maximum number of compiler threads instead of `ReservedCodeCacheSize`. Therefore, the check failing in the explanation above can never fail because we set the number of compiler threads only so high that they will always fit into the `NonNMethodCodeHeap`. >> >> This change changes how many compiler threads are created by the `CICompilerCount` ergonomic. For the default value `NonNMethodCodeHeapSize=5M`this limit is 24 compiler threads on a system 285 cores or more for product builds and 20 threads for debug builds on a system with 145 cores or more. >> >> # Testing >> >> - [x] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15733154809) >> - [x] tier1 through tier3 plus Oracle internal testing on our supported platforms > > Manuel H?ssig has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 14 commits: > > - Merge branch 'master' into JDK-8354727-policy > - Calculate buffer size correctly for c2_only > > Co-authored-by: Aleksey Shipilev > - Caclulate how many compiler buffers fit into NonNMethodCodeHeap > - Clarify endif > - update copyrights > - remove leftover include > - fix whitebox access to code cache size configs > - VMPageSizeConstraintFunc > - CodeCacheMinBlockLength > - CodeCacheExpansionSize > - ... and 4 more: https://git.openjdk.org/jdk/compare/7bc0d824...a05f9bd3 > This undid cf78925 changes. Something wrong was with merge. I fixed that in the latest commit. Thank you for pointing it out. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25872#issuecomment-2988331468 From mhaessig at openjdk.org Thu Jun 19 15:04:06 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Thu, 19 Jun 2025 15:04:06 GMT Subject: RFR: 8358572: C1 hits "need debug information" assert with -XX:-DeoptC1 Message-ID: An assert requires the debug flag `DeoptC1` to be true for dependency recording, but not all uses of dependency recording in C1 are guarded with `if (DeoptC1)`. Hence, running `java -XX:-DeoptC1 -version` fails at that assert. This error has been present unconditionally in debug builds since dependency recording was enabled outside of JVMTI in [JDK-8324241](https://bugs.openjdk.org/browse/JDK-8324241), and at least since JDK7 with JVMTI. The issue was discovered by searching for crashes of `java -version` plus some other flag, which indicates that this flag has not been used in at least one year, since every invocation with `-XX:-DeoptC1` crashes. Further, `DeoptC1` is only used for guarding dependency recording in three places. Thus, this PR removes the `DeoptC1` flag.
This was tested with: - [ ] [GitHub Actions](https://github.com/mhaessig/jdk/actions/runs/15760179189) - [x] tier1, tier2 plus Oracle internal testing on Oracle supported platforms ------------- Commit messages: - Remove DeoptC1 debug flag Changes: https://git.openjdk.org/jdk/pull/25900/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25900&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8358572 Stats: 7 lines in 3 files changed: 0 ins; 4 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/25900.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25900/head:pull/25900 PR: https://git.openjdk.org/jdk/pull/25900 From mhaessig at openjdk.org Thu Jun 19 15:19:40 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Thu, 19 Jun 2025 15:19:40 GMT Subject: RFR: 8353815: [ubsan] compilationPolicy.cpp: division by zero related to tiered compilation flags Message-ID: A run of `runtime/CommandLine/OptionsValidation/TestOptionsWithRanges.java` with a ubsan-enabled binary revealed that passing the value 0 to `Tier(3|4)LoadFeedback` and `TieredRateUpdateMinTime` leads to division by zero. Since a value of 0 for `Tier(3|4)LoadFeedback` should disable the scaling of the compilation thresholds, 8bf37ee special-cases 0 to disable scaling and documents it accordingly.
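A minimal sketch of the kind of guard described above, not the actual HotSpot change: the function name `threshold_scale`, its parameters, and the formula are assumptions made for illustration. The point is only that a feedback value of 0 returns a neutral scale of 1.0 instead of being used as a divisor.

```cpp
// Hypothetical stand-in for a load-feedback based threshold scale.
//   queue_size     - number of queued compile tasks
//   feedback       - value of a Tier(3|4)LoadFeedback-like flag
//   compiler_count - number of active compiler threads
double threshold_scale(int queue_size, int feedback, int compiler_count) {
  // Special case: feedback == 0 (or no compiler threads) disables scaling,
  // which also avoids the division by zero in the expression below.
  if (feedback == 0 || compiler_count <= 0) {
    return 1.0;
  }
  return static_cast<double>(queue_size) /
         (static_cast<double>(feedback) * compiler_count) + 1.0;
}
```

With this shape, `threshold_scale(10, 0, 2)` yields the neutral scale, while `threshold_scale(10, 5, 2)` scales by 10 / (5 * 2) + 1.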
4893b28 sets the lower limit for `TieredRateUpdate(Min|Max)Time` to 1 since the code assumes that at least 1ms passes between each event: https://github.com/openjdk/jdk/blob/c4fb00a7be51c7a05a29d3d57d787feb5c698ddf/src/hotspot/share/compiler/compilationPolicy.cpp#L968-L974 This PR was tested with: - [ ] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15760915006) - [x] tier1 and 2 plus Oracle internal testing on Oracle supported platforms ------------- Commit messages: - Increase lower bound of TieredRateUpdate(Min|Max)Time to 1ms - Disable compilation threshold scaling for Tier(3|4)LoadFeedback=0 Changes: https://git.openjdk.org/jdk/pull/25902/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25902&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8353815 Stats: 7 lines in 2 files changed: 2 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/25902.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25902/head:pull/25902 PR: https://git.openjdk.org/jdk/pull/25902 From mhaessig at openjdk.org Thu Jun 19 15:20:31 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Thu, 19 Jun 2025 15:20:31 GMT Subject: RFR: 8358578: Small -XX:NMethodSizeLimit triggers "not in CodeBuffer memory" assert in C1 In-Reply-To: References: Message-ID: On Wed, 18 Jun 2025 14:07:29 GMT, Manuel H?ssig wrote: > Running `java -XX:NMethodSizeLimit=100 -version` triggers an assert because the lower limit of the debug flag `NMethodSizeLimit` is too low. > > `NMethodSizeLimit` corresponds more or less directly to the C1 code buffer size. It was added as a debug flag in 2005 to make it easier to stress the code paths related to the buffer size. Nowadays, it is not used for any stressing, but it has caused a bunch of bugs ([JDK-8316653](https://bugs.openjdk.org/browse/JDK-8316653), [JDK-8318817](https://bugs.openjdk.org/browse/JDK-8318817), [JDK-8320682](https://bugs.openjdk.org/browse/JDK-8320682)). 
Therefore, this PR removes the debug flag `NMethodSizeLimit` and converts it to a constant. > > Because this removes the code causing the error, this bug does not have a regression test (in fact, it removes regression tests). > > # Testing > > - [x] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15735062162) > - [ ] tier1 through tier3 plus Oracle internal testing on all Oracle supported platforms Thank you for your reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25876#issuecomment-2988462423 From duke at openjdk.org Thu Jun 19 15:20:31 2025 From: duke at openjdk.org (duke) Date: Thu, 19 Jun 2025 15:20:31 GMT Subject: RFR: 8358578: Small -XX:NMethodSizeLimit triggers "not in CodeBuffer memory" assert in C1 In-Reply-To: References: Message-ID: On Wed, 18 Jun 2025 14:07:29 GMT, Manuel H?ssig wrote: > Running `java -XX:NMethodSizeLimit=100 -version` triggers an assert because the lower limit of the debug flag `NMethodSizeLimit` is too low. > > `NMethodSizeLimit` corresponds more or less directly to the C1 code buffer size. It was added as a debug flag in 2005 to make it easier to stress the code paths related to the buffer size. Nowadays, it is not used for any stressing, but it has caused a bunch of bugs ([JDK-8316653](https://bugs.openjdk.org/browse/JDK-8316653), [JDK-8318817](https://bugs.openjdk.org/browse/JDK-8318817), [JDK-8320682](https://bugs.openjdk.org/browse/JDK-8320682)). Therefore, this PR removes the debug flag `NMethodSizeLimit` and converts it to a constant. > > Because this removes the code causing the error, this bug does not have a regression test (in fact, it removes regression tests). > > # Testing > > - [x] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15735062162) > - [ ] tier1 through tier3 plus Oracle internal testing on all Oracle supported platforms @mhaessig Your change (at version 66b7ba3c003d8e3fcd1796c2e2766becf655bb82) is now ready to be sponsored by a Committer. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25876#issuecomment-2988467131 From never at openjdk.org Thu Jun 19 15:48:31 2025 From: never at openjdk.org (Tom Rodriguez) Date: Thu, 19 Jun 2025 15:48:31 GMT Subject: RFR: 8359064: Expose reason for marking nmethod non-entrant to JVMCI client [v9] In-Reply-To: References: Message-ID: On Wed, 18 Jun 2025 21:14:22 GMT, Cesar Soares Lucas wrote: >> We recently introduced a way to set the reason why a nmethod was being marked as `not entrant`, see [here](https://github.com/openjdk/jdk/pull/23980) and [here](https://github.com/openjdk/jdk/pull/25338). >> >> This PR is to expose in the JVMCI interface the reason why the nmethod was flagged as `not entrant`. This will allow JVMCI-based compilers to implement heuristics to handle re-compilations differently based on what happened to earlier versions of a method, for instance, this will likely be used to address this [RFE in Truffle](https://github.com/oracle/graal/issues/11045 >> ). >> >> Tested on Linux x86_64, ARM with JTREG tier 1-3. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Rename last invalidation placeholder. Marked as reviewed by never (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/25706#pullrequestreview-2943496440 From kvn at openjdk.org Thu Jun 19 15:49:30 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 19 Jun 2025 15:49:30 GMT Subject: RFR: 8354727: CompilationPolicy creates too many compiler threads when code cache space is scarce [v4] In-Reply-To: References: <29uhzjC6hIy_YrycL72lqIott9dPDIYrWibtvFjvVsg=.ef3b2c29-e848-4acd-a264-d18cec07bd44@github.com> Message-ID: <2xlrHaDzp-sAGkW5tP7rlBELvxEFxpv8v8O9_XRznNk=.1db2902f-c220-43c3-8958-bcf5b6fc1021@github.com> On Thu, 19 Jun 2025 14:41:27 GMT, Manuel H?ssig wrote: >> Running >> >> >> java -XX:+SegmentedCodeCache -XX:ReservedCodeCacheSize=10M -XX:NonNMethodCodeHeapSize=6M \ >> -XX:ProfiledCodeHeapSize=5M -XX:NonProfiledCodeHeapSize=5M -version >> >> >> on a machine with more than 285 cores, this would fail with the message that the specified `NonNMethodCodeHeapSize` is too small to fit all compiler buffers (instead of failing because the sum of the heaps is larger than the `ReservedCodeCacheSize`). Hence, the calculated compiler count is too high. This is due to CompilationPolicy::initialize() checking how many compiler buffers fit into the `ReservedCodeCacheSize`. However, in the case above, this is significantly larger than `NonNMethodCodeHeapSize` (especially on a debug build) and causes a check changed in #17244 to fail. That check was changed to check that all compiler buffers fit into the `NonNMethodCodeHeap` instead of the `NonNMethodCodeHeap` having at least a size of `CodeCacheMinimumUseSpace`. >> >> # Changes >> >> This PR fixes the calculation of the `CICompilerCount` ergonomic. Firstly, @shipilev kindly provided a fix for the compiler buffer size used in the calculation is also correct if we only have C2. Secondly, `NonNMethodHeapSize` is used as the maximum buffer size available for compilers buffers in the calculation of the maximum number of compiler threads instead of `ReservedCodeCacheSize`. 
Therefore, the check failing in the explanation above can never fail because we set the number of compiler threads only so high that they will always fit into the `NonNMethodCodeHeap`. >> >> This change changes how many compiler threads are created by the `CICompilerCount` ergonomic. For the default value `NonNMethodCodeHeapSize=5M`this limit is 24 compiler threads on a system 285 cores or more for product builds and 20 threads for debug builds on a system with 145 cores or more. >> >> # Testing >> >> - [x] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15733154809) >> - [x] tier1 through tier3 plus Oracle internal testing on our supported platforms > > Manuel H?ssig has updated the pull request incrementally with one additional commit since the last revision: > > Fix merge conflict resolution Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25872#pullrequestreview-2943499388 From chagedorn at openjdk.org Thu Jun 19 16:31:34 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 19 Jun 2025 16:31:34 GMT Subject: RFR: 8357726: Improve C2 to recognize counted loops with multiple casts in trip counter [v4] In-Reply-To: References: <-SKyhptjFPhuOPflySOZXJloR_Vgr4sC-xB5dSQXxZU=.fd6922bc-2498-4f4e-873a-999f82cd0a1a@github.com> Message-ID: On Wed, 18 Jun 2025 07:46:21 GMT, Xiaohong Gong wrote: >> C2 compiler fails to recognize counted loops when the induction variable is constrained by multiple consecutive `CastII` nodes. >> This prevents optimizations like range check elimination, loop unrolling and auto-vectorization for these loops. Please refer >> to the detailed discussion for a related performance issue from [1]. 
>> >> The ideal graph of such a loop typically looks like: >> >> >> /-----------| >> | | >> | ConI | >> loop | / / >> | | / / >> \ AddI / >> RangeCheck \ / | >> | \ / | >> IfTrue Phi | >> \ | | >> RangeCheck \ | | >> \ CastII / <- Range check #1 >> | | / >> IfTrue | | >> \ | | >> CastII | <- Range check #2 >> | / >> |-------/ >> >> >> >> For a counted loop, the loop induction variable (i.e `Phi`) should be the input of `AddI` ideally. However, in above case, it is used >> by two consecutive `CastII` nodes generated by two different range check operations. Compiler should skip all such kind of `CastII` when recognizing a counted loop. >> >> This patch modifies the counted loop recognition code to iteratively uncast the loop `iv` until no `CastII` nodes remain, enabling proper counted loop recognition even when the induction variable undergoes multiple range constraint operations. >> >> Test: >> - Tested tier1, tier2, tier3, and no regressions are found. >> - An additional test case is added to verify the fix. >> >> Performance: >> Here is the performance gain on a NVIDIA Grace machine which is an AArch64 architecture: >> >> >> Benchmark Mode Cnt Unit Before After Gain >> CountedLoopCastIV.loop_iv_int thrpt 30 ops/s 941482.597 4389292.439 4.66 >> CountedLoopCastIV.loop_iv_long thrpt 30 ops/s 884563.232 1441485.455 1.62 >> >> >> We can also observe the similar uplift on a x86_64 machine. >> >> [1] https://github.com/openjdk/jdk/pull/25138#issuecomment-2892720654 > > Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: > > - Merge branch 'jdk:master' into JDK-8357726 > > Change-Id: I0c10a563a3873b2220ce4d4c9b999c52159f578f > - Address reivew comments on IR test > - Address review comments on jtreg and jmh tests > - 8357726: C2 fails to recognize the counted loop when induction variable range is changed multiple times Testing looked good! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25539#issuecomment-2988643837 From chagedorn at openjdk.org Thu Jun 19 16:33:29 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 19 Jun 2025 16:33:29 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v3] In-Reply-To: References: Message-ID: On Mon, 16 Jun 2025 17:33:55 GMT, Kangcheng Xu wrote: >> Was out last week but I'm seeing your last commit mentions WIP. Let me know when it's ready to have another look again :-) > > Resolved conflict with [JDK-8357951](https://bugs.openjdk.org/browse/JDK-8357951). @chhagedorn I'd appreciate a re-review. Thank you so much! Thanks @tabjy for coming back with an update and pinging me again! Sorry, I completely missed it the first time. 
I will be on vacation starting tomorrow for two weeks but I'm happy to take another look when I'm back :-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24458#issuecomment-2988648724 From cslucas at openjdk.org Thu Jun 19 18:05:36 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Thu, 19 Jun 2025 18:05:36 GMT Subject: Integrated: 8359064: Expose reason for marking nmethod non-entrant to JVMCI client In-Reply-To: References: Message-ID: <-FhUFx2PeaAm7XTgKRj_GXQ82kzfhV2e4FvtRT3mP-s=.220619a8-7730-41b6-b156-242f81243720@github.com> On Mon, 9 Jun 2025 23:57:46 GMT, Cesar Soares Lucas wrote: > We recently introduced a way to set the reason why a nmethod was being marked as `not entrant`, see [here](https://github.com/openjdk/jdk/pull/23980) and [here](https://github.com/openjdk/jdk/pull/25338). > > This PR is to expose in the JVMCI interface the reason why the nmethod was flagged as `not entrant`. This will allow JVMCI-based compilers to implement heuristics to handle re-compilations differently based on what happened to earlier versions of a method, for instance, this will likely be used to address this [RFE in Truffle](https://github.com/oracle/graal/issues/11045 > ). > > Tested on Linux x86_64, ARM with JTREG tier 1-3. This pull request has now been integrated. 
Changeset: 2fe12984 Author: Cesar Soares Lucas URL: https://git.openjdk.org/jdk/commit/2fe12984474656a08c4525c04a351d85be73f658 Stats: 278 lines in 25 files changed: 177 ins; 7 del; 94 mod 8359064: Expose reason for marking nmethod non-entrant to JVMCI client Reviewed-by: dnsimon, never ------------- PR: https://git.openjdk.org/jdk/pull/25706 From cslucas at openjdk.org Thu Jun 19 18:15:33 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Thu, 19 Jun 2025 18:15:33 GMT Subject: RFR: 8341293: Split field loads through Nested Phis [v9] In-Reply-To: References: <15yeg5mhhgl_0k-ZvjRcrqUGtNaILSsgv2zmJ8L3MI4=.0fe56528-1cf6-4d3f-99a2-925399790ced@github.com> Message-ID: On Thu, 19 Jun 2025 06:09:47 GMT, Emanuel Peter wrote: >> I thought the test would not be run if no annotation was applicable to the current platform. >> Anyway, if you think this is the best way, that's fine for me too. > > @JohnTortugo This is a very important distinction: the IR `applyIf` only restricts the IR rule, not the running of the test itself ;) As I said, I see your point: you want to run the test not to verify the IR, but to compare the output of the method when running in interpreted and in compiled mode. But this is still arguably confusing. That being said, my confusion was that, AFAIU, the "IR" tests are for testing IR patterns. If there is no IR validation on an "IR test" method, you would imagine it is not really an "IR test" but a "regular JTREG" test.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21270#discussion_r2157481723 From duke at openjdk.org Thu Jun 19 21:41:21 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Thu, 19 Jun 2025 21:41:21 GMT Subject: RFR: 8358655: AArch64: Simplify Interpreter::profile_taken_branch Message-ID: [JDK-8358655](https://bugs.openjdk.org/browse/JDK-8358655) The aarch64 version of [JDK-8357434](https://bugs.openjdk.org/browse/JDK-8357434) The counter is 64-bit, never practically overflows, and no other code cares about it so it is safe to remove ------------- Commit messages: - 8358655: AArch64: Simplify Interpreter::profile_taken_branch Changes: https://git.openjdk.org/jdk/pull/25906/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25906&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8358655 Stats: 22 lines in 3 files changed: 0 ins; 16 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/25906.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25906/head:pull/25906 PR: https://git.openjdk.org/jdk/pull/25906 From duke at openjdk.org Thu Jun 19 21:46:56 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Thu, 19 Jun 2025 21:46:56 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v30] In-Reply-To: References: Message-ID: > This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). > > When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. > > This change does not modify existing code paths and therefore does not benefit much from existing tests. 
New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. Chad Rakoczy has updated the pull request incrementally with two additional commits since the last revision: - Fix pointer printing - Use set_destination_mt_safe ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23573/files - new: https://git.openjdk.org/jdk/pull/23573/files/50f0edb3..292ab74f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=29 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=28-29 Stats: 35 lines in 6 files changed: 10 ins; 16 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/23573.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23573/head:pull/23573 PR: https://git.openjdk.org/jdk/pull/23573 From duke at openjdk.org Thu Jun 19 22:06:31 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Thu, 19 Jun 2025 22:06:31 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v31] In-Reply-To: References: Message-ID: > This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). > > When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. > > This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. Chad Rakoczy has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 88 commits: - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final - Fix pointer printing - Use set_destination_mt_safe - Print address as pointer - Use new _metadata_size instead of _jvmci_data_size - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final - Only check branch distance for aarch64 and riscv - Move far branch fix to fix_relocation_after_move - Move far branch fix to fix_relocation_after_move - Add test to verify JVMTI events during nmethod relocation - ... and 78 more: https://git.openjdk.org/jdk/compare/0dd50dbb...e51a1a09 ------------- Changes: https://git.openjdk.org/jdk/pull/23573/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=30 Stats: 1604 lines in 25 files changed: 1563 ins; 2 del; 39 mod Patch: https://git.openjdk.org/jdk/pull/23573.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23573/head:pull/23573 PR: https://git.openjdk.org/jdk/pull/23573 From duke at openjdk.org Thu Jun 19 22:09:37 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Thu, 19 Jun 2025 22:09:37 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v28] In-Reply-To: <9nobEwalgtW3at_rR9OV1Y-rZdHxL0DJ0-UWCt0wA10=.eb7bb18c-7d1f-458c-b377-f9d348a38155@github.com> References: <9nobEwalgtW3at_rR9OV1Y-rZdHxL0DJ0-UWCt0wA10=.eb7bb18c-7d1f-458c-b377-f9d348a38155@github.com> Message-ID: On Wed, 18 Jun 2025 00:52:08 GMT, Fei Yang wrote: >> Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: >> >> Use new _metadata_size instead of _jvmci_data_size > > src/hotspot/share/code/relocInfo.cpp line 415: > >> 413: // We must check that the new offset can still fit in the instruction >> 414: // for architectures that have small branch ranges >> 415: #if defined(AARCH64) || defined(RISV) > > Should be `RISCV64` instead of `RISV`. 
This was removed in favor of `NativeCall::set_destination_mt_safe` > src/hotspot/share/code/relocInfo.cpp line 419: > >> 417: if (NativeCall::is_call_at(addr())) { >> 418: NativeCall* call = nativeCall_at(addr()); >> 419: address trampoline = call->get_trampoline(); > > We don't have this `call->get_trampoline()` method for RISC-V now. Trampoline call for RISC-V was deprecated by https://bugs.openjdk.org/browse/JDK-8332689 and later removed by https://bugs.openjdk.org/browse/JDK-8343430. > So I guess RISC-V is not affected here in this case? CC: @robehn Same as above. This was removed in favor of `NativeCall::set_destination_mt_safe` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2157708178 PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2157708522 From duke at openjdk.org Thu Jun 19 22:27:43 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Thu, 19 Jun 2025 22:27:43 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v21] In-Reply-To: References: Message-ID: On Sat, 31 May 2025 10:07:58 GMT, Evgeny Astigeevich wrote: >> Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: >> >> Change to ImmutableDataReferences > > src/hotspot/share/code/nmethod.cpp line 1572: > >> 1570: >> 1571: // Verify the nm we copied from is still valid >> 1572: if (method() != nullptr && method()->code() == this && !is_marked_for_deoptimization() && is_in_use()) { > > We can turn `method() != nullptr && method()->code() == this` into an assert. If `is_in_use()` returns true they should be true as well. 
I updated this based on your suggestion ([reference](https://github.com/chadrako/jdk/blob/e51a1a09bcee01149fcb4da3e5659e0cad1b2c8b/src/hotspot/share/code/nmethod.cpp#L1575-L1596)) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2157721757 From dholmes at openjdk.org Thu Jun 19 22:39:35 2025 From: dholmes at openjdk.org (David Holmes) Date: Thu, 19 Jun 2025 22:39:35 GMT Subject: RFR: 8359064: Expose reason for marking nmethod non-entrant to JVMCI client [v6] In-Reply-To: References: Message-ID: <59wcJYQHvtQ3VI6fJWRjzxNirtxQuGgEACFVr2lJrYw=.1066c5fd-8c62-4bf3-ab0c-48b962994df0@github.com> On Wed, 18 Jun 2025 16:44:13 GMT, Cesar Soares Lucas wrote: >> Sorry for the confusion on my part. The lack of a PR that's consuming these changes makes it harder to know which parts are the important ones. > > @tkrodriguez , @dougxc - are you OK with the latest changes I pushed? @JohnTortugo the new test is failing in our CI - please see https://bugs.openjdk.org/browse/JDK-8360049 ------------- PR Comment: https://git.openjdk.org/jdk/pull/25706#issuecomment-2989277745 From duke at openjdk.org Thu Jun 19 22:44:39 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Thu, 19 Jun 2025 22:44:39 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v19] In-Reply-To: References: <17al0aeFhm0iZHoHHGiqB03RfPeSrIHIoZuapOHPuy4=.a2ff2d67-392b-40f0-b6d9-6e3a7f396e8a@github.com> Message-ID: On Tue, 3 Jun 2025 15:49:30 GMT, Tom Rodriguez wrote: >>> So this copying keeps the same compile_id, which sort of makes sense but it's also potentially confusing. What's the plan for how this interacts with flags like PrintNMethods and JVMTI code installation notification? This is done in nmethod::post_compiled_method which doesn't seem to be used on the new nmethod. If the reclamation of the old nmethod is performed in the normal fashion, we now have 2 nmethods alive with the same compile_id which could be confusing. 
But allocating a new compile_id breaks the connection to the original compile which seems bad too. >> >> As we are not compiling, `compile_id` should stay the same. Yes, we need to add some logging: `log_info(codecache)` and `PrintNMethods`. >> >> According to https://docs.oracle.com/en/java/javase/24/docs/specs/jvmti.html#CompiledMethodLoad, compile methods can be moved. We need to generate events if it happens: >>>If it is moved, the [CompiledMethodUnload](https://docs.oracle.com/en/java/javase/24/docs/specs/jvmti.html#CompiledMethodUnload) event is sent, followed by a new CompiledMethodLoad event. >> >> > we now have 2 nmethods alive with the same compile_id which could be confusing. >> >> If `compile_id` is interpreted as id of nmethod, it is confusing. Comment to `nmethod::_compile_id`: https://github.com/openjdk/jdk/blob/aea2837143289800cfbb7044de4f105e87e233ff/src/hotspot/share/code/nmethod.hpp#L259 >> >> According to it, it is id of a compilation task. In such case there should be no confusion. > >> If it is moved, the [CompiledMethodUnload](https://docs.oracle.com/en/java/javase/24/docs/specs/jvmti.html#CompiledMethodUnload) event is sent, followed by a new CompiledMethodLoad event. > >> we now have 2 nmethods alive with the same compile_id which could be confusing. > > It's nice that the JVMTI docs considered this problem but the notifications will be sent in the reverse order given our current implementation. We will create a new nmethod while the old nmethod might still be alive, at least for the purposes of deopt. Since this PR doesn't actually perform any relocation, I'm not sure what the plan is here. The most aggressive thing that could be done is to invalidate all frames which have the old nmethod on stack, but that still leaves the nmethod live for the purposes of deopt. 
It would probably be ok to synthesize an unload after the deopt since there should be no actual execution in those nmethods, but you will then have to suppress the one that's normally done during nmethod::unlink. > > I agree that the docs are fairly clear that all of this is ok, but that doesn't mean that assumptions haven't been made about the current implementation. We just need to make sure we do something rational and that it's possible to understand from our output what was done. @tkrodriguez > It's nice that the JVMTI docs considered this problem but the notifications will be sent in the reverse order given our current implementation. We will create a new nmethod while the old nmethod might still be alive, at least for the purposes of deopt. Since this PR doesn't actually perform any relocation, I'm not sure what the plan is here. I actually think it is better if we do not send unload events during relocation and allow the current code path (`nmethod::unlink()`) to send the unload event. Like you mentioned, "relocation" isn't actually moving the nmethod as it just copies and invalidates the original. According to the JVMTI spec: >> Note that a single method may have multiple compiled forms, and that this [CompiledMethodLoad] event will be sent for each form So it should be safe for us to do this. > The most aggressive thing that could be done is to invalidate all frames which have the old nmethod on stack, but that still leaves the nmethod live for the purposes of deopt. It would probably be ok to synthesize an unload after the deopt since there should be no actual execution in those nmethods, but you will then have to suppress the one that's normally done during nmethod::unlink. With the updated approach I don't think this is necessary. Since the original nmethod is only being marked as not entrant, it is still safe for it to be "live". This allows us to avoid special cases for relocated nmethods, and it should behave as if it were recompiled.
I also added a test to verify that we correctly publish the load and unload events during relocation ([source](https://github.com/openjdk/jdk/pull/23573/commits/b3358bda645c68f1ffccdfdcb98c44ee0ec69ce0)) ------------- PR Comment: https://git.openjdk.org/jdk/pull/23573#issuecomment-2989283417 From kxu at openjdk.org Thu Jun 19 23:44:29 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Thu, 19 Jun 2025 23:44:29 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v6] In-Reply-To: References: Message-ID: On Mon, 16 Jun 2025 15:25:53 GMT, Kangcheng Xu wrote: >> This PR refactors `PhaseIdealLoop::is_counted_loop()` into (mostly) `CountedLoopConverter::is_counted_loop()` and `CountedLoopConverter::convert()` to decouple the detection and conversion code. This enables us to try different loop configurations easily and finally convert once a counted loop is found. >> >> A nested `PhaseIdealLoop::CountedLoopConverter` class is created to handle the context, but I'm not sure if this is the best name or place for it. Please let me know what you think. >> >> Blocks [JDK-8336759](https://bugs.openjdk.org/browse/JDK-8336759). > > Kangcheng Xu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 22 commits: > > - Merge branch 'master' into counted-loop-refactor > > # Conflicts: > # src/hotspot/share/opto/loopnode.cpp > # src/hotspot/share/opto/loopnode.hpp > # src/hotspot/share/opto/loopopts.cpp > - Merge remote-tracking branch 'origin/master' into counted-loop-refactor > - further refactor is_counted_loop() by extracting functions > - WIP: refactor is_counted_loop() > - WIP: refactor is_counted_loop() > - WIP: review followups > - reviewer suggested changes > - line break > - remove TODOs > - Revert "improve formatting, naming, comments" > > This reverts commit fd6071761bdc47ab5695559dffd1e1dd6038d9f7. > - ... and 12 more: https://git.openjdk.org/jdk/compare/9d060574...fd93998b No worries!
Enjoy your time off! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24458#issuecomment-2989397512 From duke at openjdk.org Thu Jun 19 23:56:36 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Thu, 19 Jun 2025 23:56:36 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v24] In-Reply-To: References: <8eMagllT-Sxnvp6tnIkYNyUe7PetzaHXqhiqHnAiApU=.3b4c422a-e9ad-492f-a82d-98ee16f053dd@github.com> Message-ID: On Tue, 17 Jun 2025 21:20:15 GMT, Dean Long wrote: >>> What do you mean by this? I don't see how recursive calls would behave differently. There is a check for intra-nmethod calls which I think would cover this case >> >> OK, I didn't see that check when I took a quick look. If it already works as intended, great. > >> We still need this check in the event that there is a direct call that no longer reaches. > > OK, I didn't realize that was what Relocation::pd_set_call_destination() was doing. I think it would be better for the CPU-specific code to take care of that, rather than the shared code. We already have functions like NativeCall::set_destination_mt_safe() that do the right thing regarding trampolines. I think this could be refactored into a common function that Relocation::pd_set_call_destination() could also use. Sorry for the churn, but hopefully we are converging on a solution. I thought I had done the refactoring for 8321509, but it looks like I went with the simple fix at the time of adding a parameter to set_destination_mt_safe() to make it lock-free. Thanks for the suggestion.
I updated to use `set_destination_mt_safe()` instead ([reference](https://github.com/openjdk/jdk/pull/23573/commits/b02e8bdb63db8042418b92ade4a26647e4e2dd8b)) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2157797060 From mhaessig at openjdk.org Fri Jun 20 01:36:41 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Fri, 20 Jun 2025 01:36:41 GMT Subject: Integrated: 8358578: Small -XX:NMethodSizeLimit triggers "not in CodeBuffer memory" assert in C1 In-Reply-To: References: Message-ID: On Wed, 18 Jun 2025 14:07:29 GMT, Manuel Hässig wrote: > Running `java -XX:NMethodSizeLimit=100 -version` triggers an assert because the lower limit of the debug flag `NMethodSizeLimit` is too low. > > `NMethodSizeLimit` corresponds more or less directly to the C1 code buffer size. It was added as a debug flag in 2005 to make it easier to stress the code paths related to the buffer size. Nowadays, it is not used for any stressing, but it has caused a bunch of bugs ([JDK-8316653](https://bugs.openjdk.org/browse/JDK-8316653), [JDK-8318817](https://bugs.openjdk.org/browse/JDK-8318817), [JDK-8320682](https://bugs.openjdk.org/browse/JDK-8320682)). Therefore, this PR removes the debug flag `NMethodSizeLimit` and converts it to a constant. > > Because this removes the code causing the error, this bug does not have a regression test (in fact, it removes regression tests). > > # Testing > > - [x] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15735062162) > - [ ] tier1 through tier3 plus Oracle internal testing on all Oracle supported platforms This pull request has now been integrated.
Changeset: a6464b74 Author: Manuel Hässig Committer: SendaoYan URL: https://git.openjdk.org/jdk/commit/a6464b74a8c9b97653b292c18f5604d4d030a9cb Stats: 55 lines in 6 files changed: 0 ins; 48 del; 7 mod 8358578: Small -XX:NMethodSizeLimit triggers "not in CodeBuffer memory" assert in C1 Reviewed-by: kvn, syan, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/25876 From qxing at openjdk.org Fri Jun 20 01:37:43 2025 From: qxing at openjdk.org (Qizheng Xing) Date: Fri, 20 Jun 2025 01:37:43 GMT Subject: RFR: 8347499: C2: Make `PhaseIdealLoop` eliminate more redundant safepoints in loops [v3] In-Reply-To: References: Message-ID: <7VuodXKuUbzP7QJ50Qv28-BJaFcd2sQsvGSw8KUEuxQ=.426d570b-8153-452d-8efe-3d2489e87a70@github.com> On Thu, 22 May 2025 07:53:39 GMT, Qizheng Xing wrote: >> In `PhaseIdealLoop`, `IdealLoopTree::check_safepts` method checks if any call that is guaranteed to have a safepoint dominates the tail of the loop. In the previous implementation, `check_safepts` would stop if it found a local non-call safepoint. At this time, if there was a call before the safepoint in the dom-path, this safepoint would not be eliminated. >> >> This patch changes the behavior of `check_safepts` to not stop when it finds a non-local safepoint. This makes simple loops with one method call ~3.8% faster (on aarch64). >> >> >> Benchmark Mode Cnt Score Error Units >> LoopSafepoint.loopVar avgt 15 208296.259 ± 1350.409 ns/op # baseline >> LoopSafepoint.loopVar avgt 15 200692.874 ± 616.770 ns/op # this patch >> >> >> Testing: tier1-2 on x86_64 and aarch64. > > Qizheng Xing has updated the pull request incrementally with one additional commit since the last revision: > > Improve documentation comments Hi all, This patch has now passed all GHA tests and is ready for further reviews. If there are any other suggestions for this PR, please let me know. Thanks!
------------- PR Comment: https://git.openjdk.org/jdk/pull/23057#issuecomment-2989561073 From fyang at openjdk.org Fri Jun 20 02:12:34 2025 From: fyang at openjdk.org (Fei Yang) Date: Fri, 20 Jun 2025 02:12:34 GMT Subject: RFR: 8359801: RISC-V: Simplify Interpreter::profile_taken_branch In-Reply-To: References: Message-ID: On Tue, 17 Jun 2025 08:41:41 GMT, Anjian Wen wrote: > Do the same thing as [JDK-8357434](https://bugs.openjdk.org/browse/JDK-8357434) [JDK-8358655](https://bugs.openjdk.org/browse/JDK-8358655) but for riscv > > The Interpreter::profile_taken_branch has the same sbbptr pattern as in [JDK-8358105](https://bugs.openjdk.org/browse/JDK-8358105). The counter is 64-bit, never practically overflows, and no other code cares about it, so we can remove the overflow check Looks fine modulo one minor comment. Thanks. src/hotspot/cpu/riscv/interp_masm_riscv.cpp line 1059: > 1057: // If no method data exists, go to profile_continue. > 1058: // Otherwise, assign to mdp > 1059: test_method_data_pointer(mdp, profile_continue); Nit: Can you remove the preceding code comment like the other platforms?
`// Otherwise, assign to mdp` ------------- PR Review: https://git.openjdk.org/jdk/pull/25848#pullrequestreview-2944364120 PR Review Comment: https://git.openjdk.org/jdk/pull/25848#discussion_r2157903344 From wenanjian at openjdk.org Fri Jun 20 02:41:14 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Fri, 20 Jun 2025 02:41:14 GMT Subject: RFR: 8359801: RISC-V: Simplify Interpreter::profile_taken_branch [v2] In-Reply-To: References: Message-ID: <2uPGtbksG_vY7RMMfmRXjVTfl-5Y9TJGk9h-Y-y51lA=.751caf81-fe62-4deb-a9fe-a8c9b0752bed@github.com> > Do the same thing as [JDK-8357434](https://bugs.openjdk.org/browse/JDK-8357434) [JDK-8358655](https://bugs.openjdk.org/browse/JDK-8358655) but for riscv > > The Interpreter::profile_taken_branch has the same sbbptr pattern as in [JDK-8358105](https://bugs.openjdk.org/browse/JDK-8358105). The counter is 64-bit, never practically overflows, and no other code cares about it, so we can remove the overflow check Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: remove the preceding code comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25848/files - new: https://git.openjdk.org/jdk/pull/25848/files/85993dad..b07869e4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25848&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25848&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25848.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25848/head:pull/25848 PR: https://git.openjdk.org/jdk/pull/25848 From wenanjian at openjdk.org Fri Jun 20 02:41:14 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Fri, 20 Jun 2025 02:41:14 GMT Subject: RFR: 8359801: RISC-V: Simplify Interpreter::profile_taken_branch [v2] In-Reply-To: References: Message-ID: On Fri, 20 Jun 2025 02:09:10 GMT, Fei Yang wrote: >> Anjian Wen has updated the pull request incrementally with one additional commit
since the last revision: >> >> remove the preceding code comment > > src/hotspot/cpu/riscv/interp_masm_riscv.cpp line 1059: > >> 1057: // If no method data exists, go to profile_continue. >> 1058: // Otherwise, assign to mdp >> 1059: test_method_data_pointer(mdp, profile_continue); > > Nit: Can you remove the preceding code comment like the other platforms? > `// Otherwise, assign to mdp` Thanks for the review, I have removed it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25848#discussion_r2157930633 From fjiang at openjdk.org Fri Jun 20 03:00:45 2025 From: fjiang at openjdk.org (Feilong Jiang) Date: Fri, 20 Jun 2025 03:00:45 GMT Subject: RFR: 8359801: RISC-V: Simplify Interpreter::profile_taken_branch [v2] In-Reply-To: <2uPGtbksG_vY7RMMfmRXjVTfl-5Y9TJGk9h-Y-y51lA=.751caf81-fe62-4deb-a9fe-a8c9b0752bed@github.com> References: <2uPGtbksG_vY7RMMfmRXjVTfl-5Y9TJGk9h-Y-y51lA=.751caf81-fe62-4deb-a9fe-a8c9b0752bed@github.com> Message-ID: On Fri, 20 Jun 2025 02:41:14 GMT, Anjian Wen wrote: >> Do the same thing as [JDK-8357434](https://bugs.openjdk.org/browse/JDK-8357434) [JDK-8358655](https://bugs.openjdk.org/browse/JDK-8358655) but for riscv >> >> The Interpreter::profile_taken_branch has the same sbbptr pattern as in [JDK-8358105](https://bugs.openjdk.org/browse/JDK-8358105). The counter is 64-bit, never practically overflows, and no other code cares about it, so we can remove the overflow check > > Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: > > remove the preceding code comment Looks good! ------------- Marked as reviewed by fjiang (Committer).
PR Review: https://git.openjdk.org/jdk/pull/25848#pullrequestreview-2944419495 From mbaesken at openjdk.org Fri Jun 20 05:20:28 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Fri, 20 Jun 2025 05:20:28 GMT Subject: RFR: 8353815: [ubsan] compilationPolicy.cpp: division by zero related to tiered compilation flags In-Reply-To: References: Message-ID: On Thu, 19 Jun 2025 15:15:16 GMT, Manuel Hässig wrote: > A run of `runtime/CommandLine/OptionsValidation/TestOptionsWithRanges.java` with an ubsan enabled binary revealed that passing the value 0 to `Tier(3|4)LoadFeedback` and `TieredRateUpdateMinTime` leads to division by zero. > > Since `Tier(3|4)LoadFeedback` should disable the scaling of the compilation thresholds, 8bf37ee special cases the 0 case to disable scaling and documents it accordingly. > > 4893b28 sets the lower limit for `TieredRateUpdate(Min|Max)Time` to 1 since the code assumes that at least 1ms passes between each event: > > https://github.com/openjdk/jdk/blob/c4fb00a7be51c7a05a29d3d57d787feb5c698ddf/src/hotspot/share/compiler/compilationPolicy.cpp#L968-L974 > > This PR was tested with: > - [ ] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15760915006) > - [x] tier1 and 2 plus Oracle internal testing on Oracle supported platforms Marked as reviewed by mbaesken (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25902#pullrequestreview-2944590017 From thartmann at openjdk.org Fri Jun 20 05:38:08 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 20 Jun 2025 05:38:08 GMT Subject: Integrated: 8360069: Problem list CodeInvalidationReasonTest.java until JDK-8360049 is fixed Message-ID: Problem listing with ZGC until [JDK-8360049](https://bugs.openjdk.org/browse/JDK-8360049) is fixed because it fails in our CI.
Thanks, Tobias ------------- Commit messages: - 8360069: Problem list CodeInvalidationReasonTest.java until JDK-8360049 is fixed Changes: https://git.openjdk.org/jdk/pull/25908/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25908&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8360069 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25908.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25908/head:pull/25908 PR: https://git.openjdk.org/jdk/pull/25908 From dholmes at openjdk.org Fri Jun 20 05:38:08 2025 From: dholmes at openjdk.org (David Holmes) Date: Fri, 20 Jun 2025 05:38:08 GMT Subject: Integrated: 8360069: Problem list CodeInvalidationReasonTest.java until JDK-8360049 is fixed In-Reply-To: References: Message-ID: On Fri, 20 Jun 2025 05:30:43 GMT, Tobias Hartmann wrote: > Problem listing with ZGC until [JDK-8360049](https://bugs.openjdk.org/browse/JDK-8360049) is fixed because it fails in our CI. > > Thanks, > Tobias LGTM! I'd missed the fact it only failed with ZGC. Thanks for the PL. ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25908#pullrequestreview-2944611550 From thartmann at openjdk.org Fri Jun 20 05:38:08 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 20 Jun 2025 05:38:08 GMT Subject: Integrated: 8360069: Problem list CodeInvalidationReasonTest.java until JDK-8360049 is fixed In-Reply-To: References: Message-ID: <5eam4GvZfJk6pHIaEIHkmzQhlH80rdVxxktxz-obGSM=.76916063-c00c-4f8a-ad79-4a5f45eeb0bf@github.com> On Fri, 20 Jun 2025 05:30:43 GMT, Tobias Hartmann wrote: > Problem listing with ZGC until [JDK-8360049](https://bugs.openjdk.org/browse/JDK-8360049) is fixed because it fails in our CI. > > Thanks, > Tobias Thanks for the quick review David! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25908#issuecomment-2989848079 From thartmann at openjdk.org Fri Jun 20 05:38:08 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 20 Jun 2025 05:38:08 GMT Subject: Integrated: 8360069: Problem list CodeInvalidationReasonTest.java until JDK-8360049 is fixed In-Reply-To: References: Message-ID: On Fri, 20 Jun 2025 05:30:43 GMT, Tobias Hartmann wrote: > Problem listing with ZGC until [JDK-8360049](https://bugs.openjdk.org/browse/JDK-8360049) is fixed because it fails in our CI. > > Thanks, > Tobias This pull request has now been integrated. Changeset: 33970629 Author: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/33970629ac63eea6009fca7a34c8f333f1a60a37 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod 8360069: Problem list CodeInvalidationReasonTest.java until JDK-8360049 is fixed Reviewed-by: dholmes ------------- PR: https://git.openjdk.org/jdk/pull/25908 From thartmann at openjdk.org Fri Jun 20 06:06:07 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 20 Jun 2025 06:06:07 GMT Subject: [jdk25] RFR: 8359386: Fix incorrect value for max_size of C2CodeStub when APX is used Message-ID: Hi all, This pull request contains a backport of commit [b52af182](https://github.com/openjdk/jdk/commit/b52af182c43380186decd7e35625e42c7cafb8c2) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. The commit being backported was authored by Srinivas Vamsi Parasa on 18 Jun 2025 and was reviewed by Tobias Hartmann, Aleksey Shipilev, Jatin Bhateja and Sandhya Viswanathan. Thanks! 
------------- Commit messages: - Backport b52af182c43380186decd7e35625e42c7cafb8c2 Changes: https://git.openjdk.org/jdk/pull/25909/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25909&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8359386 Stats: 4 lines in 1 file changed: 3 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25909.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25909/head:pull/25909 PR: https://git.openjdk.org/jdk/pull/25909 From mhaessig at openjdk.org Fri Jun 20 06:34:28 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Fri, 20 Jun 2025 06:34:28 GMT Subject: [jdk25] RFR: 8359386: Fix incorrect value for max_size of C2CodeStub when APX is used In-Reply-To: References: Message-ID: On Fri, 20 Jun 2025 06:01:11 GMT, Tobias Hartmann wrote: > Hi all, > > This pull request contains a backport of commit [b52af182](https://github.com/openjdk/jdk/commit/b52af182c43380186decd7e35625e42c7cafb8c2) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Srinivas Vamsi Parasa on 18 Jun 2025 and was reviewed by Tobias Hartmann, Aleksey Shipilev, Jatin Bhateja and Sandhya Viswanathan. > > Thanks! Looks good to me! ------------- Marked as reviewed by mhaessig (Author). PR Review: https://git.openjdk.org/jdk/pull/25909#pullrequestreview-2944716046 From mchevalier at openjdk.org Fri Jun 20 07:40:28 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Fri, 20 Jun 2025 07:40:28 GMT Subject: RFR: 8347901: C2 should remove unused leaf / pure runtime calls In-Reply-To: <2f8GCEXnY2mwaz9N9dQK_APi0A31z8Y7CM6YPms4Pp0=.a0651468-def0-4101-94bb-8fabd6d79b1b@github.com> References: <2f8GCEXnY2mwaz9N9dQK_APi0A31z8Y7CM6YPms4Pp0=.a0651468-def0-4101-94bb-8fabd6d79b1b@github.com> Message-ID: On Wed, 18 Jun 2025 11:42:05 GMT, Emanuel Peter wrote: >> A first part toward a better support of pure functions, but this time, with guidance from @iwanowww.
>> >> ## Pure Functions >> >> Pure functions (considered here) are functions that have no side effects, no effect on the control flow (no exception or such), cannot deopt etc.. It's really a function that you can execute anywhere, with whichever arguments without effect other than wasting time. Integer division is not pure as dividing by zero is throwing. But many floating point functions will just return `NaN` or `+/-infinity` in problematic cases. >> >> ## Scope >> >> We are not going all powerful for now! It's mostly about identifying some pure functions and being able to remove them if the result is unused. Some other things are not part of this PR, on purpose. Especially, this PR doesn't propose a way to move pure calls around. The reason is that pure calls are later expanded into regular calls, which require a control input. To be able to do the expansion, we just keep the control in the pure call as well. >> >> ## Implementation Overview >> >> We created here some new node kind for pure calls, inheriting leaf calls, that are expanded into regular leaf calls during final graph reshaping. The possibility to support pure call directly in AD file is left open. >> >> This PR also introduces `TupleNode` (largely based on an original idea/implem of @iwanowww), that just tie multiple input together and play well with `ProjNode`: the n-th projection of a `TupleNode` is the n-th input of the tuple. This is a convenient way to skip and remove nodes from the graph while delegating the difficulty of the surgery to the trusted IGVN's implementation. >> >> Thanks, >> Marc > > src/hotspot/share/opto/callnode.cpp line 1335: > >> 1333: if (can_reshape && is_unused()) { >> 1334: return make_tuple_of_input_state_and_top_return_values(phase->C); >> 1335: } > > Can you add a code comment what this does? It's better commented at the definition of `make_tuple_of_input_state_and_top_return_values` now. Let's not bloat each call site with the same wall of text again.
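As a rough illustration of the `TupleNode` idea quoted above (the n-th projection of a tuple is its n-th input), here is a toy, self-contained sketch. It is not HotSpot code and does not reflect the PR's implementation: `ToyNode`, `ToyTuple`, and `resolve_projection` are hypothetical names invented for this example, and the real `TupleNode`/`ProjNode` classes carry far more state.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an IR node; NOT HotSpot's Node class.
struct ToyNode {
    std::string name;
};

// Hypothetical tuple: it only ties several inputs together.
struct ToyTuple {
    std::vector<const ToyNode*> inputs;
};

// Assumed rule being illustrated: the n-th projection of a tuple
// is simply the tuple's n-th input, so a projection can be replaced
// by that input without any further graph surgery.
const ToyNode* resolve_projection(const ToyTuple& tuple, std::size_t index) {
    assert(index < tuple.inputs.size());
    return tuple.inputs[index];
}
```

In this toy model, replacing an unused call by a tuple of its own inputs means every projection hanging off the call collapses to the corresponding input, which is the "skip and remove" effect the description attributes to IGVN.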
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25760#discussion_r2158257045 From mchevalier at openjdk.org Fri Jun 20 07:48:27 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Fri, 20 Jun 2025 07:48:27 GMT Subject: RFR: 8347901: C2 should remove unused leaf / pure runtime calls In-Reply-To: <2f8GCEXnY2mwaz9N9dQK_APi0A31z8Y7CM6YPms4Pp0=.a0651468-def0-4101-94bb-8fabd6d79b1b@github.com> References: <2f8GCEXnY2mwaz9N9dQK_APi0A31z8Y7CM6YPms4Pp0=.a0651468-def0-4101-94bb-8fabd6d79b1b@github.com> Message-ID: On Wed, 18 Jun 2025 14:38:07 GMT, Emanuel Peter wrote: >> A first part toward a better support of pure functions, but this time, with guidance from @iwanowww. >> >> ## Pure Functions >> >> Pure functions (considered here) are functions that have no side effects, no effect on the control flow (no exception or such), cannot deopt etc.. It's really a function that you can execute anywhere, with whichever arguments without effect other than wasting time. Integer division is not pure as dividing by zero is throwing. But many floating point functions will just return `NaN` or `+/-infinity` in problematic cases. >> >> ## Scope >> >> We are not going all powerful for now! It's mostly about identifying some pure functions and being able to remove them if the result is unused. Some other things are not part of this PR, on purpose. Especially, this PR doesn't propose a way to move pure calls around. The reason is that pure calls are later expanded into regular calls, which require a control input. To be able to do the expansion, we just keep the control in the pure call as well. >> >> ## Implementation Overview >> >> We created here some new node kind for pure calls, inheriting leaf calls, that are expanded into regular leaf calls during final graph reshaping. The possibility to support pure call directly in AD file is left open. 
>> >> This PR also introduces `TupleNode` (largely based on an original idea/implem of @iwanowww), that just tie multiple input together and play well with `ProjNode`: the n-th projection of a `TupleNode` is the n-th input of the tuple. This is a convenient way to skip and remove nodes from the graph while delegating the difficulty of the surgery to the trusted IGVN's implementation. >> >> Thanks, >> Marc > > src/hotspot/share/opto/callnode.cpp line 1306: > >> 1304: >> 1305: //============================================================================= >> 1306: bool CallLeafPureNode::is_unused() const { > > Can you add a quick comment why this check implies that the node is not used, i.e. what that means? I think i'll need you to explain to me what is unclear at the moment. When I read the function, I see: "A CallLeafPure is unused iff there is no output result projection." I don't see what else to add that is not covered by "if we don't use the result, the pure call is unused", which is exactly the code. Is there any untold hypothesis lurking somewhere that I don't see? It seems to me it uses just very common concepts of C2. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25760#discussion_r2158269317 From mchevalier at openjdk.org Fri Jun 20 07:53:30 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Fri, 20 Jun 2025 07:53:30 GMT Subject: RFR: 8347901: C2 should remove unused leaf / pure runtime calls In-Reply-To: <2f8GCEXnY2mwaz9N9dQK_APi0A31z8Y7CM6YPms4Pp0=.a0651468-def0-4101-94bb-8fabd6d79b1b@github.com> References: <2f8GCEXnY2mwaz9N9dQK_APi0A31z8Y7CM6YPms4Pp0=.a0651468-def0-4101-94bb-8fabd6d79b1b@github.com> Message-ID: On Wed, 18 Jun 2025 14:40:07 GMT, Emanuel Peter wrote: >> A first part toward a better support of pure functions, but this time, with guidance from @iwanowww. 
>> >> ## Pure Functions >> >> Pure functions (considered here) are functions that have no side effects, no effect on the control flow (no exception or such), cannot deopt etc.. It's really a function that you can execute anywhere, with whichever arguments without effect other than wasting time. Integer division is not pure as dividing by zero is throwing. But many floating point functions will just return `NaN` or `+/-infinity` in problematic cases. >> >> ## Scope >> >> We are not going all powerful for now! It's mostly about identifying some pure functions and being able to remove them if the result is unused. Some other things are not part of this PR, on purpose. Especially, this PR doesn't propose a way to move pure calls around. The reason is that pure calls are later expanded into regular calls, which require a control input. To be able to do the expansion, we just keep the control in the pure call as well. >> >> ## Implementation Overview >> >> We created here some new node kind for pure calls, inheriting leaf calls, that are expanded into regular leaf calls during final graph reshaping. The possibility to support pure call directly in AD file is left open. >> >> This PR also introduces `TupleNode` (largely based on an original idea/implem of @iwanowww), that just tie multiple input together and play well with `ProjNode`: the n-th projection of a `TupleNode` is the n-th input of the tuple. This is a convenient way to skip and remove nodes from the graph while delegating the difficulty of the surgery to the trusted IGVN's implementation. >> >> Thanks, >> Marc > > src/hotspot/share/opto/divnode.cpp line 1642: > >> 1640: } >> 1641: assert(projs.catchall_ioproj == nullptr, "no exceptions from floating mod"); >> 1642: assert(projs.catchall_catchproj == nullptr, "no exceptions from floating mod"); > > Why were you able to remove this? 
Pure functions have only a control input, data inputs (1 or 2 in practice, so far), a control output and a data output: they can't alter the memory, they can't throw, etc., since they are pure. All these outputs were already unnecessary (which I noticed when working on removing modulo a few months ago), and are now simply not present, so the rewiring is useless. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25760#discussion_r2158278198 From mchevalier at openjdk.org Fri Jun 20 07:57:34 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Fri, 20 Jun 2025 07:57:34 GMT Subject: RFR: 8347901: C2 should remove unused leaf / pure runtime calls In-Reply-To: <2f8GCEXnY2mwaz9N9dQK_APi0A31z8Y7CM6YPms4Pp0=.a0651468-def0-4101-94bb-8fabd6d79b1b@github.com> References: <2f8GCEXnY2mwaz9N9dQK_APi0A31z8Y7CM6YPms4Pp0=.a0651468-def0-4101-94bb-8fabd6d79b1b@github.com> Message-ID: <5ic5YfZ-YLE0Xl0RaB2ON64eY8JE8uZ0Qqfach6DHII=.d0d86f8c-7aa7-435d-866a-f7b1c8efbbd0@github.com> On Wed, 18 Jun 2025 14:45:14 GMT, Emanuel Peter wrote: >> A first part toward a better support of pure functions, but this time, with guidance from @iwanowww. >> >> ## Pure Functions >> >> Pure functions (considered here) are functions that have no side effects, no effect on the control flow (no exception or such), cannot deopt etc.. It's really a function that you can execute anywhere, with whichever arguments without effect other than wasting time. Integer division is not pure as dividing by zero is throwing. But many floating point functions will just return `NaN` or `+/-infinity` in problematic cases. >> >> ## Scope >> >> We are not going all powerful for now! It's mostly about identifying some pure functions and being able to remove them if the result is unused. Some other things are not part of this PR, on purpose. Especially, this PR doesn't propose a way to move pure calls around. The reason is that pure calls are later expanded into regular calls, which require a control input.
To be able to do the expansion, we just keep the control in the pure call as well. >> >> ## Implementation Overview >> >> We created here some new node kind for pure calls, inheriting leaf calls, that are expanded into regular leaf calls during final graph reshaping. The possibility to support pure call directly in AD file is left open. >> >> This PR also introduces `TupleNode` (largely based on an original idea/implem of @iwanowww), that just tie multiple input together and play well with `ProjNode`: the n-th projection of a `TupleNode` is the n-th input of the tuple. This is a convenient way to skip and remove nodes from the graph while delegating the difficulty of the surgery to the trusted IGVN's implementation. >> >> Thanks, >> Marc > > src/hotspot/share/opto/multnode.cpp line 177: > >> 175: } >> 176: return this; >> 177: } > > What would happen if we miss to do this optimization? I suppose we would have a Tuple left in the graph and get a bad AD file assert / deopt in product? Exactly. Bad AD file. I don't think it's possible at the moment because when we return a tuple node during IGVN instead of a call, the user (all projections) are added to the worklist and they will all skip the tuple. As used for now, a Tuple should appear and disappear in the same IGVN. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25760#discussion_r2158284547 From thartmann at openjdk.org Fri Jun 20 08:07:28 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 20 Jun 2025 08:07:28 GMT Subject: RFR: 8359270: C2: alignment check should consider base offset when emitting arraycopy runtime call [v4] In-Reply-To: References: Message-ID: On Thu, 19 Jun 2025 11:36:27 GMT, Fei Yang wrote: >> Hi, please consider this change fixing alignment check when emitting arraycopy runtime call. >> >> There are currently four callsites of `StubRoutines::select_arraycopy_function` in hotspot C2 shared code where we emit arraycopy runtime calls [1-4]. 
Three of them [2-4] missed base offset when calculation alignment for both source and destination array addresses. Seems they assume a base offset of 8 bytes, which is not always true. Base offset becomes 4 bytes under either `-XX:+UseCompactObjectHeaders` or `-XX:-UseCompressedClassPointers`. >> >> And `StubRoutines::select_arraycopy_function` selects the right arraycopy runtime call based on this alignment. As a result, it could see an incorrect `aligned` param about the array addresses and thus a wrong arraycopy runtime call is selected. This is causing performance regressions (like Dacapo Spring) on some linux-riscv64 platforms like Sifive Unmatched or Premier P550 SBCs where misaligned memory access is very slow. >> >> Proposed change fixes this issue by taking base offset into account when checking the alignment, which is very similar to [1]. >> >> Testing: >> - [x] Tier1-3 on linux-x64 (release & fastdebug) >> - [x] Tier1-3 on linux-aarch64 (release & fastdebug) >> - [x] Tier1-3 on linux-riscv64 (release) >> - [x] Dacapo spring performance test on linux-riscv64 (w/wo `-XX:+UseCompactObjectHeaders` / `-XX:-UseCompressedClassPointers`) >> >> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/macroArrayCopy.cpp#L341 >> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/library_call.cpp#L1584 >> [3] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/library_call.cpp#L1666 >> [4] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/stringopts.cpp#L1481 > > Fei Yang has updated the pull request incrementally with one additional commit since the last revision: > > review Testing passed. This looks good to me but needs a second review. ------------- Marked as reviewed by thartmann (Reviewer). 
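The alignment rule under discussion in 8359270 can be sketched as follows. This is a standalone illustration with hypothetical helper names (`starts_word_aligned`, `copy_is_aligned`), not the HotSpot code; it assumes the array object itself is word-aligned and an 8-byte word, and the offsets 16 and 12 are illustrative values only.

```cpp
#include <cstdint>

// Hypothetical helper, not a HotSpot function: true when the element at
// `element_index` starts on a word boundary, assuming the array object
// itself is word-aligned.
constexpr bool starts_word_aligned(std::uint32_t base_offset_bytes,
                                   std::uint32_t element_index,
                                   std::uint32_t element_size_bytes,
                                   std::uint32_t word_size_bytes = 8) {
  return (base_offset_bytes + element_index * element_size_bytes) % word_size_bytes == 0;
}

// Both the source and the destination start must be aligned before an
// "aligned" copy routine may be selected.
constexpr bool copy_is_aligned(std::uint32_t base_offset_bytes,
                               std::uint32_t src_index,
                               std::uint32_t dst_index,
                               std::uint32_t element_size_bytes) {
  return starts_word_aligned(base_offset_bytes, src_index, element_size_bytes) &&
         starts_word_aligned(base_offset_bytes, dst_index, element_size_bytes);
}

// With an illustrative base offset that is a multiple of the word size,
// index 0 is aligned; with one that is 4 modulo 8, index 0 is not, although
// index 4 of a byte array is.
static_assert(copy_is_aligned(16, 0, 8, 1), "multiple-of-8 base: index 0 is aligned");
static_assert(!copy_is_aligned(12, 0, 8, 1), "base 4 mod 8: index 0 is misaligned");
static_assert(copy_is_aligned(12, 4, 12, 1), "base 4 mod 8: index 4 is aligned");
```

A check that looks only at the element index silently assumes the first case, which is the kind of mismatch the PR description reports.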
PR Review: https://git.openjdk.org/jdk/pull/25765#pullrequestreview-2944943014 From epeter at openjdk.org Fri Jun 20 08:14:28 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 20 Jun 2025 08:14:28 GMT Subject: [jdk25] RFR: 8359386: Fix incorrect value for max_size of C2CodeStub when APX is used In-Reply-To: References: Message-ID: On Fri, 20 Jun 2025 06:01:11 GMT, Tobias Hartmann wrote: > Hi all, > > This pull request contains a backport of commit [b52af182](https://github.com/openjdk/jdk/commit/b52af182c43380186decd7e35625e42c7cafb8c2) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Srinivas Vamsi Parasa on 18 Jun 2025 and was reviewed by Tobias Hartmann, Aleksey Shipilev, Jatin Bhateja and Sandhya Viswanathan. > > Thanks! LGTM ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25909#pullrequestreview-2944963081 From mchevalier at openjdk.org Fri Jun 20 08:16:11 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Fri, 20 Jun 2025 08:16:11 GMT Subject: RFR: 8347901: C2 should remove unused leaf / pure runtime calls [v2] In-Reply-To: References: Message-ID: > A first part toward a better support of pure functions, but this time, with guidance from @iwanowww. > > ## Pure Functions > > Pure functions (considered here) are functions that have no side effects, no effect on the control flow (no exception or such), cannot deopt etc.. It's really a function that you can execute anywhere, with whichever arguments without effect other than wasting time. Integer division is not pure as dividing by zero is throwing. But many floating point functions will just return `NaN` or `+/-infinity` in problematic cases. > > ## Scope > > We are not going all powerful for now! It's mostly about identifying some pure functions and being able to remove them if the result is unused. Some other things are not part of this PR, on purpose. 
Especially, this PR doesn't propose a way to move pure calls around. The reason is that pure calls are later expanded into regular calls, which require a control input. To be able to do the expansion, we just keep the control in the pure call as well. > > ## Implementation Overview > > We created here some new node kind for pure calls, inheriting leaf calls, that are expanded into regular leaf calls during final graph reshaping. The possibility to support pure call directly in AD file is left open. > > This PR also introduces `TupleNode` (largely based on an original idea/implem of @iwanowww), that just tie multiple input together and play well with `ProjNode`: the n-th projection of a `TupleNode` is the n-th input of the tuple. This is a convenient way to skip and remove nodes from the graph while delegating the difficulty of the surgery to the trusted IGVN's implementation. > > Thanks, > Marc Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: Mostly comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25760/files - new: https://git.openjdk.org/jdk/pull/25760/files/ff0a6b6a..34fd5e9a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25760&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25760&range=00-01 Stats: 43 lines in 6 files changed: 36 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/25760.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25760/head:pull/25760 PR: https://git.openjdk.org/jdk/pull/25760 From mchevalier at openjdk.org Fri Jun 20 08:16:11 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Fri, 20 Jun 2025 08:16:11 GMT Subject: RFR: 8347901: C2 should remove unused leaf / pure runtime calls [v2] In-Reply-To: <2f8GCEXnY2mwaz9N9dQK_APi0A31z8Y7CM6YPms4Pp0=.a0651468-def0-4101-94bb-8fabd6d79b1b@github.com> References: <2f8GCEXnY2mwaz9N9dQK_APi0A31z8Y7CM6YPms4Pp0=.a0651468-def0-4101-94bb-8fabd6d79b1b@github.com> Message-ID: 
<2BVVfe4S925G10VAJBu7KV_Y87aPxByXyfl5AxVJU8o=.5df2a49c-d9dd-4010-b0bc-283698e9e1ec@github.com> On Wed, 18 Jun 2025 14:30:53 GMT, Emanuel Peter wrote: >> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: >> >> Mostly comments > > src/hotspot/share/opto/compile.cpp line 3313: > >> 3311: for (unsigned int i = 0; i < call->tf()->domain()->cnt() - TypeFunc::Parms; i++) { >> 3312: new_call->init_req(TypeFunc::Parms + i, call->in(TypeFunc::Parms + i)); >> 3313: } > > The `TypeFunc::Parms` offsets are a bit confusing / unnecessary. Why not: > Suggestion: > > // Copy all . > for (unsigned int i = TypeFunc::Parms; i < call->tf()->domain()->cnt(); i++) { > new_call->init_req(i, call->in(i)); > } Fine with me! Looks perfectly identical to me. Comment seems superfluous to me: the code is very simple, let's not distract the reader. > src/hotspot/share/opto/divnode.cpp line 1530: > >> 1528: >> 1529: if (is_unused()) { >> 1530: return make_tuple_of_input_state_and_top_return_values(igvn->C); > > What does this do? Add a short comment :) Same as above: better commented at the definition than bloating every single callsite of a function. If I wonder, I go see the definition; if I don't, I don't have useless stuff on my screen. > src/hotspot/share/opto/multnode.hpp line 113: > >> 111: const TypeTuple* _tf; >> 112: >> 113: template > > Does this need to be a template? Or would a type like `Node*` or `Node` suffice? The point is that it's a variadic template. Of course, ideally, I'd like it to be a pack of `Node*` but there isn't a simple way to write that. The upside is that one can write `TupleNode::make(some_type, input1, input2, input3, input4)` with as many inputs as you want in a single construction.
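A variadic factory of the kind described can be sketched in standalone C++ as below. `SketchNode`, `SketchTuple` and this `make` signature are hypothetical and simplified (no type argument, no arena allocation); this is not the declaration from `multnode.hpp`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a graph node; not the HotSpot Node class.
struct SketchNode {};

class SketchTuple : public SketchNode {
 public:
  // Accepts any number of inputs in one call. Each argument must be
  // castable to SketchNode* with static_cast, so passing e.g. an int
  // fails to compile.
  template <typename... Inputs>
  static SketchTuple make(Inputs... inputs) {
    SketchTuple tuple;
    tuple._inputs = std::vector<SketchNode*>{static_cast<SketchNode*>(inputs)...};
    return tuple;
  }

  std::size_t size() const { return _inputs.size(); }

  SketchNode* in(std::size_t i) const {
    assert(i < _inputs.size());
    return _inputs[i];
  }

 private:
  std::vector<SketchNode*> _inputs;
};
```

With this shape a call site stays a single expression, for example `SketchTuple::make(&a, &b, &c)`, whatever the number of inputs.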
I prefer hiding the ugliness here, to have compact and readable usages of TupleNode, rather than having every usage look like: TupleNode* tuple = new TupleNode(some_type); tuple->set_req(0, input1); tuple->set_req(1, input2); tuple->set_req(2, input3); tuple->set_req(3, input4); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25760#discussion_r2158312752 PR Review Comment: https://git.openjdk.org/jdk/pull/25760#discussion_r2158313274 PR Review Comment: https://git.openjdk.org/jdk/pull/25760#discussion_r2158310606 From mhaessig at openjdk.org Fri Jun 20 08:22:03 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Fri, 20 Jun 2025 08:22:03 GMT Subject: RFR: 8355276: Sort C2 includes Message-ID: <_ab90h2O6v67rorpT12dUxMO2mv7GscLEMo37dLJkhA=.c645e9e0-8b99-49f8-8aa5-80a623b541d6@github.com> This PR sorts the includes in `hotspot/share/opto` using `test/hotspot/jtreg/sources/SortIncludes.java` and enforces sorted includes for C2 sources in `sources/TestIncludesAreSorted.java`.
Testing: - [ ] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15773777177) - [ ] tier1,tier2 plus Oracle internal testing on Oracle supported platforms ------------- Commit messages: - Enforce sorted includes in hotspot/share/opto - Sort includes in C2 sources Changes: https://git.openjdk.org/jdk/pull/25910/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25910&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8355276 Stats: 119 lines in 40 files changed: 57 ins; 55 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/25910.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25910/head:pull/25910 PR: https://git.openjdk.org/jdk/pull/25910 From aph at openjdk.org Fri Jun 20 08:27:27 2025 From: aph at openjdk.org (Andrew Haley) Date: Fri, 20 Jun 2025 08:27:27 GMT Subject: RFR: 8358655: AArch64: Simplify Interpreter::profile_taken_branch In-Reply-To: References: Message-ID: <8Ssuhff6me6Y7uzpJ1bHEAHrBVd-HQuI5sjSwIQQZNg=.90666142-d2b4-4ba0-8ad5-9e38a1fcd3ee@github.com> On Thu, 19 Jun 2025 21:17:30 GMT, Chad Rakoczy wrote: > [JDK-8358655](https://bugs.openjdk.org/browse/JDK-8358655) > > The aarch64 version of [JDK-8357434](https://bugs.openjdk.org/browse/JDK-8357434) > > The counter is 64-bit, never practically overflows, and no other code cares about it so it is safe to remove > > Additional Testing: > - [x] Linux aarch64 fastdebug tier 1 > - [x] Linux aarch64 fastdebug tier 2 > - [x] Linux aarch64 fastdebug tier 3 > - [x] Linux aarch64 fastdebug tier 4 Marked as reviewed by aph (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/25906#pullrequestreview-2944997725 From thartmann at openjdk.org Fri Jun 20 08:31:35 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 20 Jun 2025 08:31:35 GMT Subject: [jdk25] RFR: 8359386: Fix incorrect value for max_size of C2CodeStub when APX is used In-Reply-To: References: Message-ID: On Fri, 20 Jun 2025 06:01:11 GMT, Tobias Hartmann wrote: > Hi all, > > This pull request contains a backport of commit [b52af182](https://github.com/openjdk/jdk/commit/b52af182c43380186decd7e35625e42c7cafb8c2) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Srinivas Vamsi Parasa on 18 Jun 2025 and was reviewed by Tobias Hartmann, Aleksey Shipilev, Jatin Bhateja and Sandhya Viswanathan. > > Thanks! Thanks Manuel and Emanuel! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25909#issuecomment-2990257888 From thartmann at openjdk.org Fri Jun 20 08:31:36 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 20 Jun 2025 08:31:36 GMT Subject: [jdk25] Integrated: 8359386: Fix incorrect value for max_size of C2CodeStub when APX is used In-Reply-To: References: Message-ID: On Fri, 20 Jun 2025 06:01:11 GMT, Tobias Hartmann wrote: > Hi all, > > This pull request contains a backport of commit [b52af182](https://github.com/openjdk/jdk/commit/b52af182c43380186decd7e35625e42c7cafb8c2) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Srinivas Vamsi Parasa on 18 Jun 2025 and was reviewed by Tobias Hartmann, Aleksey Shipilev, Jatin Bhateja and Sandhya Viswanathan. > > Thanks! This pull request has now been integrated. 
Changeset: 3f6b0c69 Author: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/3f6b0c69c3f49d28e76f0f9f0286988f1830c49a Stats: 4 lines in 1 file changed: 3 ins; 0 del; 1 mod 8359386: Fix incorrect value for max_size of C2CodeStub when APX is used Reviewed-by: mhaessig, epeter Backport-of: b52af182c43380186decd7e35625e42c7cafb8c2 ------------- PR: https://git.openjdk.org/jdk/pull/25909 From dnsimon at openjdk.org Fri Jun 20 08:37:40 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 20 Jun 2025 08:37:40 GMT Subject: RFR: 8360049: CodeInvalidationReasonTest.java fails with ZGC on AArch64 Message-ID: <72rTsZwLWfLVCjUXFLIjleCopoc0wHHvii0kStbz-AU=.adffe754-f82f-4606-b152-90710d402164@github.com> [JDK-8359064](https://bugs.openjdk.org/browse/JDK-8359064) introduced a new test (CodeInvalidationReasonTest) which triggers a code path in TestHotSpotVMConfig that had apparently never been run on ZGC+AArch64. This PR fixes an omission in that code to handle this configuration. ------------- Commit messages: - support NMethodPatchingType for ZGC on AArch64 Changes: https://git.openjdk.org/jdk/pull/25911/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25911&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8360049 Stats: 5 lines in 2 files changed: 1 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25911.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25911/head:pull/25911 PR: https://git.openjdk.org/jdk/pull/25911 From aph at openjdk.org Fri Jun 20 08:41:27 2025 From: aph at openjdk.org (Andrew Haley) Date: Fri, 20 Jun 2025 08:41:27 GMT Subject: RFR: 8360049: CodeInvalidationReasonTest.java fails with ZGC on AArch64 In-Reply-To: <72rTsZwLWfLVCjUXFLIjleCopoc0wHHvii0kStbz-AU=.adffe754-f82f-4606-b152-90710d402164@github.com> References: <72rTsZwLWfLVCjUXFLIjleCopoc0wHHvii0kStbz-AU=.adffe754-f82f-4606-b152-90710d402164@github.com> Message-ID: On Fri, 20 Jun 2025 08:32:45 GMT, Doug Simon wrote: > 
[JDK-8359064](https://bugs.openjdk.org/browse/JDK-8359064) introduced a new test (CodeInvalidationReasonTest) which triggers a code path in TestHotSpotVMConfig that had apparently never been run on ZGC+AArch64. > This PR fixes an omission in that code to handle this configuration. Marked as reviewed by aph (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25911#pullrequestreview-2945047810 From epeter at openjdk.org Fri Jun 20 08:46:34 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 20 Jun 2025 08:46:34 GMT Subject: RFR: 8357726: Improve C2 to recognize counted loops with multiple casts in trip counter [v4] In-Reply-To: References: <-SKyhptjFPhuOPflySOZXJloR_Vgr4sC-xB5dSQXxZU=.fd6922bc-2498-4f4e-873a-999f82cd0a1a@github.com> Message-ID: On Wed, 18 Jun 2025 07:46:21 GMT, Xiaohong Gong wrote: >> C2 compiler fails to recognize counted loops when the induction variable is constrained by multiple consecutive `CastII` nodes. >> This prevents optimizations like range check elimination, loop unrolling and auto-vectorization for these loops. Please refer >> to the detailed discussion for a related performance issue from [1]. >> >> The ideal graph of such a loop typically looks like: >> >> >> /-----------| >> | | >> | ConI | >> loop | / / >> | | / / >> \ AddI / >> RangeCheck \ / | >> | \ / | >> IfTrue Phi | >> \ | | >> RangeCheck \ | | >> \ CastII / <- Range check #1 >> | | / >> IfTrue | | >> \ | | >> CastII | <- Range check #2 >> | / >> |-------/ >> >> >> >> For a counted loop, the loop induction variable (i.e `Phi`) should be the input of `AddI` ideally. However, in above case, it is used >> by two consecutive `CastII` nodes generated by two different range check operations. Compiler should skip all such kind of `CastII` when recognizing a counted loop. 
>> >> This patch modifies the counted loop recognition code to iteratively uncast the loop `iv` until no `CastII` nodes remain, enabling proper counted loop recognition even when the induction variable undergoes multiple range constraint operations. >> >> Test: >> - Tested tier1, tier2, tier3, and no regressions are found. >> - An additional test case is added to verify the fix. >> >> Performance: >> Here is the performance gain on a NVIDIA Grace machine which is an AArch64 architecture: >> >> >> Benchmark Mode Cnt Unit Before After Gain >> CountedLoopCastIV.loop_iv_int thrpt 30 ops/s 941482.597 4389292.439 4.66 >> CountedLoopCastIV.loop_iv_long thrpt 30 ops/s 884563.232 1441485.455 1.62 >> >> >> We can also observe the similar uplift on a x86_64 machine. >> >> [1] https://github.com/openjdk/jdk/pull/25138#issuecomment-2892720654 > > Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'jdk:master' into JDK-8357726 > > Change-Id: I0c10a563a3873b2220ce4d4c9b999c52159f578f > - Address reivew comments on IR test > - Address review comments on jtreg and jmh tests > - 8357726: C2 fails to recognize the counted loop when induction variable range is changed multiple times LGTM, thanks for the work you put in :) ------------- Marked as reviewed by epeter (Reviewer). 
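The difference between stripping one cast and iteratively stripping all consecutive casts, as described for 8357726, can be sketched as follows. `IrNode`, `Kind`, `skip_one_cast` and `skip_all_casts` are hypothetical names for a toy model, not the C2 loop recognition code.

```cpp
#include <cassert>

// Toy node kinds; only what the example needs.
enum class Kind { Phi, CastII, Other };

// Hypothetical stand-in for an IR node, not the HotSpot Node class.
struct IrNode {
  Kind kind;
  IrNode* input;  // The value being cast; unused for non-cast nodes.
};

// Single-step uncast: removes at most one cast, so a chain of two casts
// still hides the Phi.
inline IrNode* skip_one_cast(IrNode* n) {
  return (n != nullptr && n->kind == Kind::CastII) ? n->input : n;
}

// Iterative uncast: walks through every consecutive cast until a
// non-cast node is reached.
inline IrNode* skip_all_casts(IrNode* n) {
  while (n != nullptr && n->kind == Kind::CastII) {
    n = n->input;
  }
  return n;
}
```

For a chain `CastII -> CastII -> Phi`, only the iterative version reaches the `Phi`, which matches the two-range-check shape drawn in the PR description.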
PR Review: https://git.openjdk.org/jdk/pull/25539#pullrequestreview-2945061713 From shade at openjdk.org Fri Jun 20 08:48:30 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 20 Jun 2025 08:48:30 GMT Subject: RFR: 8358655: AArch64: Simplify Interpreter::profile_taken_branch In-Reply-To: References: Message-ID: On Thu, 19 Jun 2025 21:17:30 GMT, Chad Rakoczy wrote: > [JDK-8358655](https://bugs.openjdk.org/browse/JDK-8358655) > > The aarch64 version of [JDK-8357434](https://bugs.openjdk.org/browse/JDK-8357434) > > The counter is 64-bit, never practically overflows, and no other code cares about it so it is safe to remove > > Additional Testing: > - [x] Linux aarch64 fastdebug tier 1 > - [x] Linux aarch64 fastdebug tier 2 > - [x] Linux aarch64 fastdebug tier 3 > - [x] Linux aarch64 fastdebug tier 4 Looks fine. I also think `r1` saves are now unnecessary, right, @theRealAph? // ECN: FIXME: This code smells // check if MethodCounters exists Label has_counters; __ ldr(rscratch1, Address(rmethod, Method::method_counters_offset())); __ cbnz(rscratch1, has_counters); __ push(r0); __ push(r1); // <--- here __ push(r2); __ call_VM(noreg, CAST_FROM_FN_PTR(address, InterpreterRuntime::build_method_counters), rmethod); __ pop(r2); __ pop(r1); // <--- here __ pop(r0); ------------- Marked as reviewed by shade (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25906#pullrequestreview-2945067009 From shade at openjdk.org Fri Jun 20 08:54:31 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 20 Jun 2025 08:54:31 GMT Subject: RFR: 8360049: CodeInvalidationReasonTest.java fails with ZGC on AArch64 In-Reply-To: <72rTsZwLWfLVCjUXFLIjleCopoc0wHHvii0kStbz-AU=.adffe754-f82f-4606-b152-90710d402164@github.com> References: <72rTsZwLWfLVCjUXFLIjleCopoc0wHHvii0kStbz-AU=.adffe754-f82f-4606-b152-90710d402164@github.com> Message-ID: On Fri, 20 Jun 2025 08:32:45 GMT, Doug Simon wrote: > [JDK-8359064](https://bugs.openjdk.org/browse/JDK-8359064) introduced a new test (CodeInvalidationReasonTest) which triggers a code path in TestHotSpotVMConfig that had apparently never been run on ZGC+AArch64. > This PR fixes an omission in that code to handle this configuration. Marked as reviewed by shade (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25911#pullrequestreview-2945084944 From dnsimon at openjdk.org Fri Jun 20 08:58:37 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 20 Jun 2025 08:58:37 GMT Subject: RFR: 8359064: Expose reason for marking nmethod non-entrant to JVMCI client [v6] In-Reply-To: References: Message-ID: On Wed, 18 Jun 2025 16:44:13 GMT, Cesar Soares Lucas wrote: >> Sorry for the confusion on my part. The lack of a PR that's consuming these changes makes it harder to know which parts are the important ones. > > @tkrodriguez , @dougxc - are you OK with the latest changes I pushed? > @JohnTortugo the new test is failing in our CI - please see https://bugs.openjdk.org/browse/JDK-8360049 It is being fixed by https://github.com/openjdk/jdk/pull/25911. This is my fault for not having run the internal CI on this change before approving. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25706#issuecomment-2990336864 From shade at openjdk.org Fri Jun 20 09:02:35 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 20 Jun 2025 09:02:35 GMT Subject: RFR: 8355276: Sort C2 includes In-Reply-To: <_ab90h2O6v67rorpT12dUxMO2mv7GscLEMo37dLJkhA=.c645e9e0-8b99-49f8-8aa5-80a623b541d6@github.com> References: <_ab90h2O6v67rorpT12dUxMO2mv7GscLEMo37dLJkhA=.c645e9e0-8b99-49f8-8aa5-80a623b541d6@github.com> Message-ID: On Fri, 20 Jun 2025 08:17:24 GMT, Manuel Hässig wrote: > This PR sorts the includes in `hotspot/share/opto` using `test/hotspot/jtreg/sources/SortIncludes.java` and enforces sorted includes for C2 sources in `sources/TestIncludesAreSorted.java`. > > Testing: > - [ ] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15773777177) > - [ ] tier1,tier2 plus Oracle internal testing on Oracle supported platforms Oh, great, Windows GHA is apparently failing with MSVC `14.43` and the new runner image. I guess there is no choice but to switch to `14.44` now :) https://github.com/openjdk/jdk/pull/25912. Wait until GHA is stable again, so as to check that MSVC accepts the include order correctly?
------------- PR Comment: https://git.openjdk.org/jdk/pull/25910#issuecomment-2990352921 From aph at openjdk.org Fri Jun 20 09:05:27 2025 From: aph at openjdk.org (Andrew Haley) Date: Fri, 20 Jun 2025 09:05:27 GMT Subject: RFR: 8358655: AArch64: Simplify Interpreter::profile_taken_branch In-Reply-To: <8Ssuhff6me6Y7uzpJ1bHEAHrBVd-HQuI5sjSwIQQZNg=.90666142-d2b4-4ba0-8ad5-9e38a1fcd3ee@github.com> References: <8Ssuhff6me6Y7uzpJ1bHEAHrBVd-HQuI5sjSwIQQZNg=.90666142-d2b4-4ba0-8ad5-9e38a1fcd3ee@github.com> Message-ID: <4b3BVjc98m_NEJqX8pZZ71yVTWuWhhG0FP3x5Wm4Ncs=.70715f1c-7dcb-4c73-8e4c-7d2cae40b39e@github.com> On Fri, 20 Jun 2025 08:24:40 GMT, Andrew Haley wrote: >> [JDK-8358655](https://bugs.openjdk.org/browse/JDK-8358655) >> >> The aarch64 version of [JDK-8357434](https://bugs.openjdk.org/browse/JDK-8357434) >> >> The counter is 64-bit, never practically overflows, and no other code cares about it so it is safe to remove >> >> Additional Testing: >> - [x] Linux aarch64 fastdebug tier 1 >> - [x] Linux aarch64 fastdebug tier 2 >> - [x] Linux aarch64 fastdebug tier 3 >> - [x] Linux aarch64 fastdebug tier 4 > > Marked as reviewed by aph (Reviewer). > Looks fine. I also think `r1` saves are now unnecessary, right, @theRealAph? 
> > ``` > // ECN: FIXME: This code smells > // check if MethodCounters exists > Label has_counters; > __ ldr(rscratch1, Address(rmethod, Method::method_counters_offset())); > __ cbnz(rscratch1, has_counters); > __ push(r0); > __ push(r1); // <--- here > __ push(r2); > __ call_VM(noreg, CAST_FROM_FN_PTR(address, > InterpreterRuntime::build_method_counters), rmethod); > __ pop(r2); > __ pop(r1); // <--- here > __ pop(r0); > ``` I think so. At first glance, `r1` seems to be clobbered on all code paths immediately after this, so I don't think it's needed anyway. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25906#issuecomment-2990366304 From mhaessig at openjdk.org Fri Jun 20 09:24:27 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Fri, 20 Jun 2025 09:24:27 GMT Subject: RFR: 8355276: Sort C2 includes In-Reply-To: <_ab90h2O6v67rorpT12dUxMO2mv7GscLEMo37dLJkhA=.c645e9e0-8b99-49f8-8aa5-80a623b541d6@github.com> References: <_ab90h2O6v67rorpT12dUxMO2mv7GscLEMo37dLJkhA=.c645e9e0-8b99-49f8-8aa5-80a623b541d6@github.com> Message-ID: <6H_d8i7V8XogoQzHuA-8V_K2loXUR7YRfYLkNu2jsuA=.0c2f7df8-ff01-4ac8-9137-e4fd025400a9@github.com> On Fri, 20 Jun 2025 08:17:24 GMT, Manuel Hässig wrote: > This PR sorts the includes in `hotspot/share/opto` using `test/hotspot/jtreg/sources/SortIncludes.java` and enforces sorted includes for C2 sources in `sources/TestIncludesAreSorted.java`. > > Testing: > - [ ] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15773777177) > - [x] tier1,tier2 plus Oracle internal testing on Oracle supported platforms In our internal testing, Windows built and tested fine. I just kicked GHA to rerun the failed Windows builds, which fixed the problem for other runs yesterday.
------------- PR Comment: https://git.openjdk.org/jdk/pull/25910#issuecomment-2990444473 From epeter at openjdk.org Fri Jun 20 09:41:29 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 20 Jun 2025 09:41:29 GMT Subject: RFR: 8356708: C2: loop strip mining expansion doesn't take sunk stores into account [v8] In-Reply-To: References: Message-ID: <206vnNQbOnru8WdbW7bzI1yWvCr7Vl9EcbxyIAlUiVM=.b89a5392-517f-46f8-9552-472abbfa74a7@github.com> On Wed, 18 Jun 2025 13:11:46 GMT, Roland Westrelin wrote: >> `test1()` has a counted loop with a `Store` to `field`. That `Store` >> is sunk out of loop. When the `OuterStripMinedLoop` is expanded, only >> `Phi`s that exist at the inner loop are added to the outer >> loop. There's no `Phi` for the slice of the sunk `Store` (because >> there's no `Store` left in the inner loop) so no `Phi` is added for >> that slice to the outer loop. As a result, there's a missing anti >> dependency for `Load` of `field` that's before the loop and it can be >> scheduled inside the outer strip mined loop which is incorrect. >> >> `test2()` is the same as `test1()` but with a chain of 2 `Store`s. >> >> `test3()` is another variant where a `Store` is left in the inner loop >> after one is sunk out of it so the inner loop still has a `Phi`. As a >> result, the outer loop also gets a `Phi` but it's incorrectly wired as >> the sunk `Store` should be the input along the backedge but is >> not. That one doesn't cause any failure AFAICT. >> >> The fix I propose is some extra logic at expansion of the >> `OuterStripMinedLoop` to handle these corner cases. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review @rwestrel The patch looks good, thanks for the work @rwestrel ! That said, and as mentioned above: we should probably investigate if we can add the Phi's from the beginning, so that we do not violate the C2 IR assumptions. 
------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25717#pullrequestreview-2945290515 From epeter at openjdk.org Fri Jun 20 09:41:30 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 20 Jun 2025 09:41:30 GMT Subject: RFR: 8356708: C2: loop strip mining expansion doesn't take sunk stores into account [v7] In-Reply-To: References: <5Ux_6rUt9dtmICBrFavCIGGQp8shno1zYG7mbVtO5zs=.de8b7a55-a452-42df-a9e7-553addfb51a7@github.com> Message-ID: On Thu, 19 Jun 2025 09:59:49 GMT, Roberto Castañeda Lozano wrote: > > ``` > > * Is this really something we want to put into JDK 25? Feels high-risk and it's an old issue after all. Maybe we can push this to JDK26 first, and backport a little later? > > ``` > > Either way is fine with me. It does feel like a nasty issue that wouldn't be easy to diagnose if someone runs into it in the wild. I'm ok with that too. The fix looks reasonable, and not too risky. It is probably better to have this fix in than keeping the bug in the wild, as you say. > > ``` > > * Is this really a regression from [JDK-8281322](https://bugs.openjdk.org/browse/JDK-8281322)? If not, the affects version in JBS should be updated such that we'll consider this for backporting. > > ``` > > I don't think it is. It's an issue that has existed for as long as loop strip mining has, AFAICT. I haven't tried how far back the test reproduces it though. It would be good if you could find out what versions are really affected, and set the affected numbers accordingly.
------------- PR Comment: https://git.openjdk.org/jdk/pull/25717#issuecomment-2990513890 From roland at openjdk.org Fri Jun 20 09:59:28 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 20 Jun 2025 09:59:28 GMT Subject: RFR: 8356708: C2: loop strip mining expansion doesn't take sunk stores into account [v8] In-Reply-To: <206vnNQbOnru8WdbW7bzI1yWvCr7Vl9EcbxyIAlUiVM=.b89a5392-517f-46f8-9552-472abbfa74a7@github.com> References: <206vnNQbOnru8WdbW7bzI1yWvCr7Vl9EcbxyIAlUiVM=.b89a5392-517f-46f8-9552-472abbfa74a7@github.com> Message-ID: On Fri, 20 Jun 2025 09:38:37 GMT, Emanuel Peter wrote: > That said, and as mentioned above: we should probably investigate if we can add the Phi's from the beginning, so that we do not violate the C2 IR assumptions. I filed: https://bugs.openjdk.org/browse/JDK-8360096 ------------- PR Comment: https://git.openjdk.org/jdk/pull/25717#issuecomment-2990607638 From roland at openjdk.org Fri Jun 20 09:59:29 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 20 Jun 2025 09:59:29 GMT Subject: RFR: 8356708: C2: loop strip mining expansion doesn't take sunk stores into account [v7] In-Reply-To: References: <5Ux_6rUt9dtmICBrFavCIGGQp8shno1zYG7mbVtO5zs=.de8b7a55-a452-42df-a9e7-553addfb51a7@github.com> Message-ID: On Fri, 20 Jun 2025 09:37:10 GMT, Emanuel Peter wrote: > It would be good if you could find out what versions are really affected, and set the affected numbers accordingly. I conservatively added every version since loop strip mining was integrated given they are affected even if this particular test case doesn't fail. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25717#issuecomment-2990615621 From epeter at openjdk.org Fri Jun 20 10:02:41 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 20 Jun 2025 10:02:41 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v17] In-Reply-To: <_oBt35lcOuEjkT2Q1z_gDxG739kPVKJhXztAJAn3Ysw=.87a04c5b-5a1f-4042-a0cd-1447f31827ce@github.com> References: <4YDlcsxJy-4NmTAijvFrfzEbyE7oNKDYqG2cuXP2xmI=.92aca6da-3b5d-4692-8aa6-9cfbbfcc7a41@github.com> <_oBt35lcOuEjkT2Q1z_gDxG739kPVKJhXztAJAn3Ysw=.87a04c5b-5a1f-4042-a0cd-1447f31827ce@github.com> Message-ID: On Fri, 25 Apr 2025 17:58:10 GMT, Daniel Lundén wrote: >> Or why do we need to set all these flags? If we need some of them, then a comment could be helpful. > > Note that I only added the `-XX:MaxNodeLimit=20000`, all the other flags are from before (I just added line breaks). It could very well make sense to have a run with fewer flags, but I'm not sure if that's compatible with what the original author intended. I'd prefer leaving it as it is. Is it the use of the JSR292 methods that increases the limit? Or do you need to increase the limit to make sure we don't hit the limit? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2158546633 From epeter at openjdk.org Fri Jun 20 10:08:41 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 20 Jun 2025 10:08:41 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v17] In-Reply-To: References: Message-ID: On Fri, 25 Apr 2025 17:58:08 GMT, Daniel Lundén wrote: >> test/jdk/java/lang/invoke/TestCatchExceptionWithVarargs.java line 32: >> >>> 30: * timeouts due to compilation of a large number of methods with a >>> 31: * large number of parameters. >>> 32: * @run main/othervm -XX:MaxNodeLimit=15000 TestCatchExceptionWithVarargs >> >> Why not have two runs here. One that requires that there is no `Xcomp`, where we have the normal node limit.
And one where you lower the node limit, so that `Xcomp` is ok? > > Same here, I just added the `MaxNodeLimit`. I'd prefer leaving any other changes to a separate RFE (if needed). I still don't quite understand the comment. Maybe it is just the phrasing that I'm struggling with. Did you mean this: We would get timeouts in this test, especially with -Xcomp. It is because of compilations with a large number of methods with a large number of parameters, and that means larger C2 graphs, which take longer. Rather than increasing the test timeout, we just lower the MaxNodeLimit, so that those compilations bail out fast, and do not take up so much compile time. But then: why not just increase the test timeout? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2158557020 From epeter at openjdk.org Fri Jun 20 10:11:41 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 20 Jun 2025 10:11:41 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v17] In-Reply-To: References: Message-ID: On Fri, 25 Apr 2025 17:58:03 GMT, Daniel Lundén wrote: >> test/jdk/java/lang/invoke/VarargsArrayTest.java line 46: >> >>> 44: * -DVarargsArrayTest.MAX_ARITY=255 >>> 45: * -DVarargsArrayTest.START_ARITY=250 >>> 46: * VarargsArrayTest >> >> Same here. >> >> But I mean are those timeouts ok with Xcomp? Are we sure that these timeouts are only a test issue and not a product issue? > > There is a thread in the PR about this already (difficult to find!), so I'm pasting it below for convenience. > >> @[robcasloz](https://github.com/robcasloz) robcasloz [3 weeks ago](https://github.com/openjdk/jdk/pull/20404#discussion_r2023195229) > Just checking: these methods that cause C2 to consume an excessive amount of memory were not C2-compilable before the changeset, right?
>> >> @[robcasloz](https://github.com/robcasloz) robcasloz [3 weeks ago](https://github.com/openjdk/jdk/pull/20404#discussion_r2023204041) > Same question for the other java/lang/invoke test changes. >> >> @[dlunde](https://github.com/dlunde) dlunde [3 weeks ago](https://github.com/openjdk/jdk/pull/20404#discussion_r2028661113) > Yes, correct. No longer bailing out on too many arguments results in a lot more compilations (with -Xcomp) compared to before in these specific tests, which is why I've had to limit the tests with MaxNodeLimits. >> >> That said, I did look into these tests a bit more now after your comment, and there are some peculiar (but artificial) compilations that we no longer bail out on and that we may want to investigate in a future RFE. These compilations each take around 40 seconds (in a release build), are very close to the MaxNodeLimit (80 000 nodes), and spend 99% of the time in the register allocator (in the first round of conservative coalescing, specifically). I analyzed these register allocator runs and it looks like we run into the quadratic time complexity of graph-coloring register allocation, because we have a very large number of nodes to begin with and then the interference graph is additionally very dense (contains a very large number of interferences/edges). We already have bailouts related to node count in the register allocator, but no bailouts for the interference graph size. Perhaps we should consider adding this as part of a separate RFE. >> >> @[robcasloz](https://github.com/robcasloz) robcasloz [3 weeks ago](https://github.com/openjdk/jdk/pull/20404#discussion_r2028671476) >> > Perhaps we should consider adding this as part of a separate RFE. >> >> This sounds like a good idea, I agree to postpone it to a separate RFE. Oh I see, you have some comments about why you modify `MaxNodeLimit` in these tests here. It would be nice if you mentioned this information in the PR description! 
To me, this looks like a possible cause for (compile time) regressions. Imagine, someone has such a method in the startup/warmup of their application. And now this change all of a sudden delays compilation by 40 seconds. That would be quite bad! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2158563316 From epeter at openjdk.org Fri Jun 20 10:16:43 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 20 Jun 2025 10:16:43 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v20] In-Reply-To: References: Message-ID: On Mon, 26 May 2025 10:08:36 GMT, Daniel Lundén wrote: >> If a method has a large number of parameters, we currently bail out from C2 compilation. >> >> ### Changeset >> >> Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes. >> >> Changes: >> - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed. >> - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable. >> - Remove all `can_represent` checks and bailouts.
>> - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`. >> - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450) >> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. >> - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement. >> - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, no... > > Daniel Lundén has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 29 commits: > > - Merge remote-tracking branch 'upstream/master' into many-arguments-8325467+pr-updates > - Fix typo > - Updates after Emanuel's comments > - Refactor and improve TestNestedSynchronize.java > - Update comments > - Revise TestNestedSynchronize to make use of CompileFramework > - Revise overlap comments for frequency of cases > - Update test comment to also mention timeouts > - Fix suboptimal max limit in _grow > - Updates after comments > - ... and 19 more: https://git.openjdk.org/jdk/compare/ed4cd2ac...9cefa15f > There is a thread in the PR about this already (difficult to find!), so I'm pasting it below for convenience.
> > > @[robcasloz](https://github.com/robcasloz) robcasloz [3 weeks ago](https://github.com/openjdk/jdk/pull/20404#discussion_r2023195229) > > Just checking: these methods that cause C2 to consume an excessive amount of memory were not C2-compilable before the changeset, right? > > @[robcasloz](https://github.com/robcasloz) robcasloz [3 weeks ago](https://github.com/openjdk/jdk/pull/20404#discussion_r2023204041) > > Same question for the other java/lang/invoke test changes. > > @[dlunde](https://github.com/dlunde) dlunde [3 weeks ago](https://github.com/openjdk/jdk/pull/20404#discussion_r2028661113) > > Yes, correct. No longer bailing out on too many arguments results in a lot more compilations (with -Xcomp) compared to before in these specific tests, which is why I've had to limit the tests with MaxNodeLimits. > > That said, I did look into these tests a bit more now after your comment, and there are some peculiar (but artificial) compilations that we no longer bail out on and that we may want to investigate in a future RFE. These compilations each take around 40 seconds (in a release build), are very close to the MaxNodeLimit (80 000 nodes), and spend 99% of the time in the register allocator (in the first round of conservative coalescing, specifically). I analyzed these register allocator runs and it looks like we run into the quadratic time complexity of graph-coloring register allocation, because we have a very large number of nodes to begin with and then the interference graph is additionally very dense (contains a very large number of interferences/edges). We already have bailouts related to node count in the register allocator, but no bailouts for the interference graph size. Perhaps we should consider adding this as part of a separate RFE. > > @[robcasloz](https://github.com/robcasloz) robcasloz [3 weeks ago](https://github.com/openjdk/jdk/pull/20404#discussion_r2028671476) > > > Perhaps we should consider adding this as part of a separate RFE. 
> > > > > > This sounds like a good idea, I agree to postpone it to a separate RFE. @dlunde I see, thanks for bringing this back up. I already commented above, but want to highlight this here too. It would be nice if you mentioned this information in the PR description! To me, this looks like a possible cause for (compile time) regressions. Imagine, someone has such a method in the startup/warmup of their application. And now this change all of a sudden delays compilation by 40 seconds. That would be quite bad! Hence, I wonder if we should not already investigate this now, so we are a bit more sure we do not see 40-second compilations in the wild. Is @vnkozlov aware of this issue? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20404#issuecomment-2990729544 From epeter at openjdk.org Fri Jun 20 10:16:44 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 20 Jun 2025 10:16:44 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v17] In-Reply-To: References: Message-ID: On Fri, 25 Apr 2025 18:00:35 GMT, Daniel Lundén wrote: >> src/hotspot/share/opto/compile.hpp line 524: >> >>> 522: PhaseRegAlloc* _regalloc; // Results of register allocation. >>> 523: RegMask _FIRST_STACK_mask; // All stack slots usable for spills (depends on frame layout) >>> 524: ResourceArea _regmask_arena; // Holds dynamically allocated extensions of short-lived register masks >> >> If they are short-lived ... then why not just resource allocate them? Is there a conflict? Would it be good to describe that somewhere? > > I assume you mean resource allocate in the default `Thread::current()->resource_area()`? The existing resource marks are often far away, leading to unnecessary memory consumption. Trying to narrow the existing resource marks does lead to conflicts. I added a motivation to the description in `compile.hpp`! And these conflicts cannot be resolved? Can you give an example where resolving them is too much effort?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2158574971 From epeter at openjdk.org Fri Jun 20 10:26:44 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 20 Jun 2025 10:26:44 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v17] In-Reply-To: References: Message-ID: On Fri, 25 Apr 2025 18:07:54 GMT, Daniel Lundén wrote: >> src/hotspot/share/opto/regmask.hpp line 429: >> >>> 427: } >>> 428: >>> 429: RegMask(const RegMask& rm) : RegMask(rm, nullptr) {} >> >> This is the copy constructor, right? Can you add a comment what kind of implementation you chose here, and why? > > Yes, it is the copy constructor. Can you elaborate a bit on what type of comment you expect? There are already some comments in the `copy` method. I think I was wondering if it was shallow or deep copying, but it has been a while since I reviewed. >> src/hotspot/share/opto/regmask.hpp line 452: >> >>> 450: assert(valid_watermarks(), "sanity"); >>> 451: for (unsigned i = _lwm; i <= _hwm; i++) { >>> 452: if (_rm_up(i)) { >> >> Is this an implicit `nullptr` check? If so, you need to make it explicit, the style guide forbids implicit null checks. > > This is a bit tricky. It is indeed a pointer type (`uintptr_t`), but it is not used as a pointer. It is used to store the register mask bits. So I guess this is then not an implicit null check? It is an implicit zero check :slightly_smiling_face: > Do not use ints or pointers as (implicit) booleans with &&, ||, if, while. Instead, compare explicitly, i.e. if (x != 0) or if (ptr != nullptr), etc. Ok, well no matter what, it would not conform to the style. Maybe you already fixed it though, not sure.
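For reference, a minimal sketch of the explicit form that the quoted style rule asks for. `WordMask`, `kWords` and `is_not_empty` are hypothetical names used only for illustration and are not taken from `regmask.hpp`; the only point is the `!= 0` comparison on a `uintptr_t` word that holds mask bits rather than a pointer.

```cpp
#include <cstdint>

// Hypothetical stand-in for a register mask's word storage; not HotSpot's RegMask.
struct WordMask {
  static const unsigned kWords = 4;  // assumed size, for illustration only
  uintptr_t words[kWords];           // each word stores mask bits, not an address
  unsigned lwm;                      // lowest word index that may be non-zero
  unsigned hwm;                      // highest word index that may be non-zero
};

// Returns true if any bit is set between the two watermarks (inclusive).
inline bool is_not_empty(const WordMask& m) {
  for (unsigned i = m.lwm; i <= m.hwm; i++) {
    // Explicit comparison against zero instead of `if (m.words[i])`.
    if (m.words[i] != 0) {
      return true;
    }
  }
  return false;
}
```

The behavior is the same as with the implicit test; the explicit comparison only makes it clear that the word is treated as an integer value.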
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2158596817 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2158606438 From epeter at openjdk.org Fri Jun 20 10:31:39 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 20 Jun 2025 10:31:39 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v16] In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 08:56:53 GMT, Daniel Lundén wrote: >> @dlunde Again: thanks for working on this! It looks like a lot of work, and the existing code was not exactly in the best style already. So don't get discouraged by my many comments, a lot of them are small things anyway, and many are just nits. > > @eme64 Thanks for the comments, I'll start addressing them soon! I'm certainly not discouraged (rather the opposite), keep the comments coming :slightly_smiling_face: @dlunde I responded to a few more issues above in my previous comments. I have not yet looked at the code itself again. I can do that once we have discussed the current topics :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/20404#issuecomment-2990843396 From epeter at openjdk.org Fri Jun 20 10:31:39 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 20 Jun 2025 10:31:39 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v17] In-Reply-To: References: Message-ID: <4JpU1sh_7wBfEZG3sJ8z-dWz-Wpk7osUjYZByvqetgc=.acf93145-620b-42e0-af57-ddf20875fd96@github.com> On Fri, 25 Apr 2025 18:22:47 GMT, Daniel Lundén wrote: >> The name divergence between `basic_rm_size` and `_RM_SIZE` generally makes me a little suspicious if we chose the names right? Why do we even need the `RM` / `rm` prefix everywhere? Is that not already clear from the context, we are after all in a register mask? Not sure if it is worth changing everything now, or ever. But we should at least look for consistency ;) > > Yes, it is confusing but consistent.
Your intuition is correct: there is a difference between `_rm_size` (the current total size, including extension) and `_RM_SIZE` (the base static size). @robcasloz introduced the "basic" terminology when working on tests in `test_regmask.cpp` and needed some way to expose `_RM_SIZE` publicly in non-product code. Therefore, we have the method `basic_rm_size`. I don't really have a better suggestion. Perhaps `base_rm_size`, or `static_rm_size`? As in "the base/static part of rm_size". We cannot call the method `_RM_SIZE()` as that is prohibited by the style guide. We cannot call the method `RM_SIZE()` as `RM_SIZE` is a macro (and also not the same thing as `_RM_SIZE` on 64-bit machines). > >> Why do we even need the RM / rm prefix everywhere? > > We really don't, but that's how it is :slightly_smiling_face: Could be worth refactoring, but not in this changeset! Alright. Well sure, we don't have to do a full renaming now. Though I do need to understand what is what to be able to review. Is there a good definition somewhere of what is what? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2158617210 From epeter at openjdk.org Fri Jun 20 10:41:30 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 20 Jun 2025 10:41:30 GMT Subject: RFR: 8350177: C2 SuperWord: Integer.numberOfLeadingZeros, numberOfTrailingZeros, reverse and bitCount have input types wrongly turncated for byte and short In-Reply-To: References: Message-ID: <2rsMoMA6FghVPQ1vWzv_lotSijqa7pH90lBJ2aEiXZ8=.ddb4c4ab-e5d2-4d64-9250-f0a63920caa0@github.com> On Wed, 18 Jun 2025 04:03:59 GMT, Jasmine Karthikeyan wrote: >> And just for good measure: should we also add tests for `char`? > > @eme64 I've updated the patch to address the comments, let me know what you think! > > @mur47x111 Thanks for the comment! I've merged from master. @jaskarth The code now looks great :) I'll run some testing over the weekend!
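To make the base-size versus total-size distinction from the `RegMask` thread concrete, here is a rough sketch of a mask with a fixed inline base and an optional extension. It is not the PR's implementation: `GrowableMask`, `kBaseWords`, `base_size()` and `size()` are hypothetical names, `kBaseWords` only plays the role that `_RM_SIZE` has in the discussion, `size()` plays the role of `_rm_size`, and a `std::vector` stands in for whatever arena allocation the real code uses.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical sketch, not HotSpot's RegMask: a bit set with a fixed inline
// base part and a dynamically allocated extension that is used only on demand.
class GrowableMask {
 public:
  static const unsigned kBaseWords = 2;    // fixed base size (assumed value)
  static const unsigned kBitsPerWord = 64;

  // Size of the statically allocated base part, in words; never changes.
  static unsigned base_size() { return kBaseWords; }

  // Current total size in words: base part plus extension.
  unsigned size() const {
    return kBaseWords + static_cast<unsigned>(_ext.size());
  }

  void insert(unsigned bit) {
    unsigned word = bit / kBitsPerWord;
    if (word >= size()) {
      // Grow only when a bit beyond the current size is requested.
      _ext.resize(word - kBaseWords + 1, 0);
    }
    word_at(word) |= (uint64_t(1) << (bit % kBitsPerWord));
  }

  bool member(unsigned bit) const {
    unsigned word = bit / kBitsPerWord;
    if (word >= size()) {
      return false;  // bits beyond the current size are treated as unset
    }
    uint64_t w = (word < kBaseWords) ? _base[word] : _ext[word - kBaseWords];
    return (w & (uint64_t(1) << (bit % kBitsPerWord))) != 0;
  }

 private:
  uint64_t& word_at(unsigned word) {
    return (word < kBaseWords) ? _base[word] : _ext[word - kBaseWords];
  }

  uint64_t _base[kBaseWords] = {0, 0};  // common case: no heap allocation
  std::vector<uint64_t> _ext;           // rare case: extra words
};
```

In this sketch a mask that never sees a bit beyond the base part never allocates, which mirrors the stated goal of keeping the common case quick.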
------------- PR Comment: https://git.openjdk.org/jdk/pull/25440#issuecomment-2990893740 From fyang at openjdk.org Fri Jun 20 11:02:33 2025 From: fyang at openjdk.org (Fei Yang) Date: Fri, 20 Jun 2025 11:02:33 GMT Subject: RFR: 8359270: C2: alignment check should consider base offset when emitting arraycopy runtime call [v4] In-Reply-To: References: Message-ID: On Fri, 20 Jun 2025 08:04:27 GMT, Tobias Hartmann wrote: > Testing passed. This looks good to me but needs a second review. Thanks for the review and suggestions. Sure, I'll wait for another review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25765#issuecomment-2990990447 From fyang at openjdk.org Fri Jun 20 11:05:29 2025 From: fyang at openjdk.org (Fei Yang) Date: Fri, 20 Jun 2025 11:05:29 GMT Subject: RFR: 8359801: RISC-V: Simplify Interpreter::profile_taken_branch [v2] In-Reply-To: <2uPGtbksG_vY7RMMfmRXjVTfl-5Y9TJGk9h-Y-y51lA=.751caf81-fe62-4deb-a9fe-a8c9b0752bed@github.com> References: <2uPGtbksG_vY7RMMfmRXjVTfl-5Y9TJGk9h-Y-y51lA=.751caf81-fe62-4deb-a9fe-a8c9b0752bed@github.com> Message-ID: On Fri, 20 Jun 2025 02:41:14 GMT, Anjian Wen wrote: >> Do the same thing as [JDK-8357434](https://bugs.openjdk.org/browse/JDK-8357434) [JDK-8358655](https://bugs.openjdk.org/browse/JDK-8358655) but for riscv >> >> The Interpreter::profile_taken_branch has the same sbbptr pattern as [JDK-8358105](https://bugs.openjdk.org/browse/JDK-8358105). The counter is 64-bit, never practically overflows, and no other code cares about it, so we can remove the overflow check > > Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: > > remove the preceding code comment Marked as reviewed by fyang (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/25848#pullrequestreview-2945643005 From jbhateja at openjdk.org Fri Jun 20 11:21:30 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 20 Jun 2025 11:21:30 GMT Subject: RFR: 8352635: Improve inferencing of Float16 operations with constant inputs [v8] In-Reply-To: References: <44nVQBYgzCOB2mAB9xtAPvkUcOMJOITA2VjMdDFgm1g=.48266693-48bf-41db-8871-a7dcafe93509@github.com> <60kkIRL2XznEXyYukVXVOoeixm2iGhoOxAbKJi5X0cY=.0268090e-a0d3-45fb-93f4-94caaf9b8497@github.com> <2F9hnA72JKqq9hJchevuSQ8XHveZ51F6tJnb7IcNw30=.1da69bab-23be-473b-92c1-1786916364b9@github.com> Message-ID: On Thu, 19 Jun 2025 06:19:49 GMT, Emanuel Peter wrote: >>> @jatin-bhateja The patch now looks good to me, nice work! I'll run some internal testing. >>> >>> However: I do not have hardware to test Float16 on. So I'll rely on you to do thorough testing on relevant hardware, or alternatively SDE. >> >> Hi @eme64 , Tests on Float16 targets are clean, let us know the results of internal testing. > > @jatin-bhateja > I see one test failing that looks related: > `compiler/vectorization/TestFloat16VectorOperations.java` > The run was done with lots of stress flags, probably not all are relevant, and it may be intermittent. > `-XX:+UnlockDiagnosticVMOptions -XX:-TieredCompilation -XX:+StressArrayCopyMacroNode -XX:+StressLCM -XX:+StressGCM -XX:+StressIGVN -XX:+StressCCP -XX:+StressMacroExpansion -XX:+StressMethodHandleLinkerInlining -XX:+StressCompiledExceptionHandlers -XX:VerifyConstraintCasts=1` > > > # Internal Error (.../src/hotspot/share/opto/type.hpp:2234), pid=2401514, tid=2401533 > # assert(_base == FloatCon) failed: Not a Float > > ...
> > Current CompileTask: > C2:1885 336 % b compiler.vectorization.TestFloat16VectorOperations::vectorAddFloat16 @ 2 (46 bytes) > > Stack: [0x00007ff300480000,0x00007ff300580000], sp=0x00007ff30057aed0, free space=1003k > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0xc0358e] ConvF2HFNode::Ideal(PhaseGVN*, bool)+0x88e (type.hpp:2234) > V [libjvm.so+0x181809d] PhaseIterGVN::transform_old(Node*)+0xbd (phaseX.cpp:668) > V [libjvm.so+0x181c705] PhaseIterGVN::optimize()+0xc5 (phaseX.cpp:1054) > V [libjvm.so+0xb498f2] PhaseIdealLoop::optimize(PhaseIterGVN&, LoopOptsMode)+0x722 (loopnode.hpp:1268) > V [libjvm.so+0xb43630] Compile::Optimize()+0xb00 (compile.cpp:2468) > V [libjvm.so+0xb46943] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x1f33 (compile.cpp:868) > V [libjvm.so+0x96bdd7] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x467 (c2compiler.cpp:141) > V [libjvm.so+0xb55d78] CompileBroker::invoke_compiler_on_method(CompileTask*)+0xb58 (compileBroker.cpp:2323) > V [libjvm.so+0xb56f48] CompileBroker::compiler_thread_loop()+0x578 (compileBroker.cpp:1967) > V [libjvm.so+0x10aa00b] JavaThread::thread_main_inner()+0x13b (javaThread.cpp:773) > V [libjvm.so+0x1b0e096] Thread::call_run()+0xb6 (thread.cpp:243) > V [libjvm.so+0x17893f8] thread_native_entry(Thread*)+0x128 (os_linux.cpp:868) Hi @eme64 , let me know if this looks good to land now. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24179#issuecomment-2991065799 From roland at openjdk.org Fri Jun 20 11:27:36 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 20 Jun 2025 11:27:36 GMT Subject: RFR: 8356708: C2: loop strip mining expansion doesn't take sunk stores into account [v7] In-Reply-To: References: <5Ux_6rUt9dtmICBrFavCIGGQp8shno1zYG7mbVtO5zs=.de8b7a55-a452-42df-a9e7-553addfb51a7@github.com> Message-ID: <47nBn-YI6HwVww1Eb0jjwR71ita31XVHGsqSlQYHGjc=.7b74a08b-e10c-4714-85f4-d24b0db7528a@github.com> On Thu, 19 Jun 2025 09:59:49 GMT, Roberto Castañeda Lozano wrote: >>> * Is this really something we want to put into JDK 25? Feels high-risk and it's an old issue after all. Maybe we can push this to JDK26 first, and backport a little later? >> >> Either way is fine with me. It does feel like a nasty issue that wouldn't be easy to diagnose if someone runs into it in the wild. >> >>> * Is this really a regression from [JDK-8281322](https://bugs.openjdk.org/browse/JDK-8281322)? If not, the affects version in JBS should be updated such that we'll consider this for backporting. >> >> I don't think it is. It's an issue that has existed for as long as loop strip mining has, AFAICT. I haven't tried how far back the test reproduces it though. > >> > ``` >> > * Is this really something we want to put into JDK 25? Feels high-risk and it's an old issue after all. Maybe we can push this to JDK26 first, and backport a little later? >> > ``` >> >> Either way is fine with me. It does feel like a nasty issue that wouldn't be easy to diagnose if someone runs into it in the wild. > > In my opinion, the fix is quite local and contained, so the risk of causing regressions does not seem too high. There is also still quite some time left to observe and react to issues before the RC phase. I would vote for JDK 25.
Thanks for the reviews @robcasloz @eme64 ------------- PR Comment: https://git.openjdk.org/jdk/pull/25717#issuecomment-2991084448 From roland at openjdk.org Fri Jun 20 11:27:37 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 20 Jun 2025 11:27:37 GMT Subject: Integrated: 8356708: C2: loop strip mining expansion doesn't take sunk stores into account In-Reply-To: References: Message-ID: On Tue, 10 Jun 2025 10:17:11 GMT, Roland Westrelin wrote: > `test1()` has a counted loop with a `Store` to `field`. That `Store` > is sunk out of loop. When the `OuterStripMinedLoop` is expanded, only > `Phi`s that exist at the inner loop are added to the outer > loop. There's no `Phi` for the slice of the sunk `Store` (because > there's no `Store` left in the inner loop) so no `Phi` is added for > that slice to the outer loop. As a result, there's a missing anti > dependency for `Load` of `field` that's before the loop and it can be > scheduled inside the outer strip mined loop which is incorrect. > > `test2()` is the same as `test1()` but with a chain of 2 `Store`s. > > `test3()` is another variant where a `Store` is left in the inner loop > after one is sunk out of it so the inner loop still has a `Phi`. As a > result, the outer loop also gets a `Phi` but it's incorrectly wired as > the sunk `Store` should be the input along the backedge but is > not. That one doesn't cause any failure AFAICT. > > The fix I propose is some extra logic at expansion of the > `OuterStripMinedLoop` to handle these corner cases. This pull request has now been integrated. 
Changeset: c11f36e6 Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/c11f36e6200b6c39fd59530f28e9318c4153db49 Stats: 282 lines in 3 files changed: 255 ins; 1 del; 26 mod 8356708: C2: loop strip mining expansion doesn't take sunk stores into account Reviewed-by: rcastanedalo, epeter ------------- PR: https://git.openjdk.org/jdk/pull/25717 From mchevalier at openjdk.org Fri Jun 20 12:03:32 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Fri, 20 Jun 2025 12:03:32 GMT Subject: RFR: 8347901: C2 should remove unused leaf / pure runtime calls [v2] In-Reply-To: <2f8GCEXnY2mwaz9N9dQK_APi0A31z8Y7CM6YPms4Pp0=.a0651468-def0-4101-94bb-8fabd6d79b1b@github.com> References: <2f8GCEXnY2mwaz9N9dQK_APi0A31z8Y7CM6YPms4Pp0=.a0651468-def0-4101-94bb-8fabd6d79b1b@github.com> Message-ID: On Wed, 18 Jun 2025 14:44:26 GMT, Emanuel Peter wrote: >> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: >> >> Mostly comments > > src/hotspot/share/opto/multnode.cpp line 126: > >> 124: // Jumping over Tuples: the i-th projection of a Tuple is the i-th input of the Tuple. >> 125: ctrl = ctrl->in(_con); >> 126: } > > Do you need to special-case this here? Why does the `ProjNode::Identity` not suffice? Are there potentially other locations where we now would need this special logic? That is a good question. That is something I picked from Vladimir's implementation and it seemed legitimate. But now you say it, is it needed? Not sure. I'm trying to find that out. Would `::Identity` be enough? It's tempting to say so, right! I'd say it can be not enough if we need `adr_type` before idealizing the `ProjNode` (no idea if that happens). Is there any other places to adapt? One could think so, but actually, I can't find such an example. Other methods of `ProjNode` for instance rely on the type of the input (which is correctly handled in `TupleNode`), and so should already work fine. 
I'm trying to understand what happens if we don't have that. But maybe @iwanowww would have some helpful insight? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25760#discussion_r2158797335 From mhaessig at openjdk.org Fri Jun 20 12:06:27 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Fri, 20 Jun 2025 12:06:27 GMT Subject: RFR: 8355276: Sort C2 includes In-Reply-To: <_ab90h2O6v67rorpT12dUxMO2mv7GscLEMo37dLJkhA=.c645e9e0-8b99-49f8-8aa5-80a623b541d6@github.com> References: <_ab90h2O6v67rorpT12dUxMO2mv7GscLEMo37dLJkhA=.c645e9e0-8b99-49f8-8aa5-80a623b541d6@github.com> Message-ID: On Fri, 20 Jun 2025 08:17:24 GMT, Manuel Hässig wrote: > This PR sorts the includes in `hotspot/share/opto` using `test/hotspot/jtreg/sources/SortIncludes.java` and enforces sorted includes for C2 sources in `sources/TestIncludesAreSorted.java`. > > Testing: > - [x] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15773777177) > - [x] tier1,tier2 plus Oracle internal testing on Oracle supported platforms Everything passed now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25910#issuecomment-2991271080 From dnsimon at openjdk.org Fri Jun 20 13:28:32 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 20 Jun 2025 13:28:32 GMT Subject: RFR: 8360049: CodeInvalidationReasonTest.java fails with ZGC on AArch64 In-Reply-To: References: <72rTsZwLWfLVCjUXFLIjleCopoc0wHHvii0kStbz-AU=.adffe754-f82f-4606-b152-90710d402164@github.com> Message-ID: On Fri, 20 Jun 2025 08:51:45 GMT, Aleksey Shipilev wrote: >> [JDK-8359064](https://bugs.openjdk.org/browse/JDK-8359064) introduced a new test (CodeInvalidationReasonTest) which triggers a code path in TestHotSpotVMConfig that had apparently never been run on ZGC+AArch64. >> This PR fixes an omission in that code to handle this configuration. > > Marked as reviewed by shade (Reviewer). Thanks for the review @shipilev.
------------- PR Comment: https://git.openjdk.org/jdk/pull/25911#issuecomment-2991637149 From dnsimon at openjdk.org Fri Jun 20 13:28:33 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 20 Jun 2025 13:28:33 GMT Subject: Integrated: 8360049: CodeInvalidationReasonTest.java fails with ZGC on AArch64 In-Reply-To: <72rTsZwLWfLVCjUXFLIjleCopoc0wHHvii0kStbz-AU=.adffe754-f82f-4606-b152-90710d402164@github.com> References: <72rTsZwLWfLVCjUXFLIjleCopoc0wHHvii0kStbz-AU=.adffe754-f82f-4606-b152-90710d402164@github.com> Message-ID: On Fri, 20 Jun 2025 08:32:45 GMT, Doug Simon wrote: > [JDK-8359064](https://bugs.openjdk.org/browse/JDK-8359064) introduced a new test (CodeInvalidationReasonTest) which triggers a code path in TestHotSpotVMConfig that had apparently never been run on ZGC+AArch64. > This PR fixes an omission in that code to handle this configuration. This pull request has now been integrated. Changeset: ff54a649 Author: Doug Simon URL: https://git.openjdk.org/jdk/commit/ff54a6493a63cfbcaab7ec90c7db0135e98a7f0c Stats: 5 lines in 2 files changed: 1 ins; 2 del; 2 mod 8360049: CodeInvalidationReasonTest.java fails with ZGC on AArch64 Reviewed-by: aph, shade ------------- PR: https://git.openjdk.org/jdk/pull/25911 From epeter at openjdk.org Fri Jun 20 13:38:31 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 20 Jun 2025 13:38:31 GMT Subject: RFR: 8352635: Improve inferencing of Float16 operations with constant inputs [v8] In-Reply-To: References: <44nVQBYgzCOB2mAB9xtAPvkUcOMJOITA2VjMdDFgm1g=.48266693-48bf-41db-8871-a7dcafe93509@github.com> <60kkIRL2XznEXyYukVXVOoeixm2iGhoOxAbKJi5X0cY=.0268090e-a0d3-45fb-93f4-94caaf9b8497@github.com> <2F9hnA72JKqq9hJchevuSQ8XHveZ51F6tJnb7IcNw30=.1da69bab-23be-473b-92c1-1786916364b9@github.com> Message-ID: On Fri, 20 Jun 2025 11:18:35 GMT, Jatin Bhateja wrote: >> @jatin-bhateja >> I see one test failing that looks related: >> `compiler/vectorization/TestFloat16VectorOperations.java` >> The run was done with lots of 
stress flags, probably not all are relevant, and it may be intermittent. >> `-XX:+UnlockDiagnosticVMOptions -XX:-TieredCompilation -XX:+StressArrayCopyMacroNode -XX:+StressLCM -XX:+StressGCM -XX:+StressIGVN -XX:+StressCCP -XX:+StressMacroExpansion -XX:+StressMethodHandleLinkerInlining -XX:+StressCompiledExceptionHandlers -XX:VerifyConstraintCasts=1` >> >> >> # Internal Error (.../src/hotspot/share/opto/type.hpp:2234), pid=2401514, tid=2401533 >> # assert(_base == FloatCon) failed: Not a Float >> >> ... >> >> Current CompileTask: >> C2:1885 336 % b compiler.vectorization.TestFloat16VectorOperations::vectorAddFloat16 @ 2 (46 bytes) >> >> Stack: [0x00007ff300480000,0x00007ff300580000], sp=0x00007ff30057aed0, free space=1003k >> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) >> V [libjvm.so+0xc0358e] ConvF2HFNode::Ideal(PhaseGVN*, bool)+0x88e (type.hpp:2234) >> V [libjvm.so+0x181809d] PhaseIterGVN::transform_old(Node*)+0xbd (phaseX.cpp:668) >> V [libjvm.so+0x181c705] PhaseIterGVN::optimize()+0xc5 (phaseX.cpp:1054) >> V [libjvm.so+0xb498f2] PhaseIdealLoop::optimize(PhaseIterGVN&, LoopOptsMode)+0x722 (loopnode.hpp:1268) >> V [libjvm.so+0xb43630] Compile::Optimize()+0xb00 (compile.cpp:2468) >> V [libjvm.so+0xb46943] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x1f33 (compile.cpp:868) >> V [libjvm.so+0x96bdd7] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x467 (c2compiler.cpp:141) >> V [libjvm.so+0xb55d78] CompileBroker::invoke_compiler_on_method(CompileTask*)+0xb58 (compileBroker.cpp:2323) >> V [libjvm.so+0xb56f48] CompileBroker::compiler_thread_loop()+0x578 (compileBroker.cpp:1967) >> V [libjvm.so+0x10aa00b] JavaThread::thread_main_inner()+0x13b (javaThread.cpp:773) >> V [libjvm.so+0x1b0e096] Thread::call_run()+0xb6 (thread.cpp:243) >> V [libjvm.so+0x17893f8] thread_native_entry(Thread*)+0x128 (os_linux.cpp:868) > > Hi @eme64 , let me know if this looks good to land now. 
@jatin-bhateja The code looks good to me! I'll run some testing before approving. But someone else could already get started with a second review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24179#issuecomment-2991682060 From epeter at openjdk.org Fri Jun 20 13:45:32 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 20 Jun 2025 13:45:32 GMT Subject: RFR: 8347901: C2 should remove unused leaf / pure runtime calls [v2] In-Reply-To: References: <2f8GCEXnY2mwaz9N9dQK_APi0A31z8Y7CM6YPms4Pp0=.a0651468-def0-4101-94bb-8fabd6d79b1b@github.com> Message-ID: On Fri, 20 Jun 2025 07:50:54 GMT, Marc Chevalier wrote: >> src/hotspot/share/opto/divnode.cpp line 1642: >> >>> 1640: } >>> 1641: assert(projs.catchall_ioproj == nullptr, "no exceptions from floating mod"); >>> 1642: assert(projs.catchall_catchproj == nullptr, "no exceptions from floating mod"); >> >> Why were you able to remove this? > > Pure functions have only control input, data inputs (1 or 2 in practice, so far), control output and data output: they can't alter the memory, they can't throw etc since they are pure. All these output were already unnecessary (which I noticed when working on removing modulo a few months ago), and are now simply not present, so the rewiring is useless. Great! When I remove code like that, then I often just leave a "info" code comment, just to help the reviewers - you don't need to do that, it's just an idea ;) >> src/hotspot/share/opto/multnode.cpp line 177: >> >>> 175: } >>> 176: return this; >>> 177: } >> >> What would happen if we miss to do this optimization? I suppose we would have a Tuple left in the graph and get a bad AD file assert / deopt in product? > > Exactly. Bad AD file. I don't think it's possible at the moment because when we return a tuple node during IGVN instead of a call, the user (all projections) are added to the worklist and they will all skip the tuple. As used for now, a Tuple should appear and disappear in the same IGVN. 
Yes, I think it is safe. And if we did indeed forget to remove it at some point, we would deopt at the "bad AD file", and that's not horrible. And we would catch it with the corresponding assert. Reasonable :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25760#discussion_r2159031237 PR Review Comment: https://git.openjdk.org/jdk/pull/25760#discussion_r2159035323 From epeter at openjdk.org Fri Jun 20 13:48:31 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 20 Jun 2025 13:48:31 GMT Subject: RFR: 8347901: C2 should remove unused leaf / pure runtime calls [v2] In-Reply-To: <2BVVfe4S925G10VAJBu7KV_Y87aPxByXyfl5AxVJU8o=.5df2a49c-d9dd-4010-b0bc-283698e9e1ec@github.com> References: <2f8GCEXnY2mwaz9N9dQK_APi0A31z8Y7CM6YPms4Pp0=.a0651468-def0-4101-94bb-8fabd6d79b1b@github.com> <2BVVfe4S925G10VAJBu7KV_Y87aPxByXyfl5AxVJU8o=.5df2a49c-d9dd-4010-b0bc-283698e9e1ec@github.com> Message-ID: On Fri, 20 Jun 2025 08:10:54 GMT, Marc Chevalier wrote: >> src/hotspot/share/opto/multnode.hpp line 113: >> >>> 111: const TypeTuple* _tf; >>> 112: >>> 113: template >> >> Does this need to be a template? Or would a type like `Node*` or `Node` suffice? > > The point is that it's a variadic template. Of course, ideally, I'd like it to be a pack of `Node*` but there isn't a simple way to write that. The upside is that one can write `TupleNode::make(some_type, input1, input2, input3, input4)` with how many input you want in a single construction. 
I prefer hiding the not nice here, to have compact and readable usages of TupleNode, rather than having every usage look like > > TupleNode* tuple = new TupleNode(some_type); > tuple.set_req(0, input1); > tuple.set_req(1, input2); > tuple.set_req(2, input3); > tuple.set_req(3, input4); Yes, it looks good enough :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25760#discussion_r2159042194 From duke at openjdk.org Fri Jun 20 14:39:04 2025 From: duke at openjdk.org (David Beaumont) Date: Fri, 20 Jun 2025 14:39:04 GMT Subject: RFR: 8360131: Remove use of soon-to-be-removed APIs by CTW framework Message-ID: Migrate the CWT framework to use only supported JRT file system access for fetching class bytes. This avoids accessing APIs in ImageReader which are scheduled to be removed as part of preview mode class support in Valhalla (essentially these APIs are "too low level" and expose semantics that are incompatible with supporting preview classes in Valhalla). This will be a further change to this code when the preview mode work goes in, but this will be limited to how the file system is opened (with or without preview mode). ------------- Commit messages: - Copyright update. - Remove comment. - Move CWT test away from soon-to-be-deleted ImageReader APIs. 
Changes: https://git.openjdk.org/jdk/pull/25916/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25916&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8360131 Stats: 60 lines in 2 files changed: 32 ins; 4 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/25916.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25916/head:pull/25916 PR: https://git.openjdk.org/jdk/pull/25916 From kvn at openjdk.org Fri Jun 20 14:39:30 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 20 Jun 2025 14:39:30 GMT Subject: RFR: 8359270: C2: alignment check should consider base offset when emitting arraycopy runtime call [v4] In-Reply-To: References: Message-ID: On Thu, 19 Jun 2025 11:36:27 GMT, Fei Yang wrote: >> Hi, please consider this change fixing alignment check when emitting arraycopy runtime call. >> >> There are currently four callsites of `StubRoutines::select_arraycopy_function` in hotspot C2 shared code where we emit arraycopy runtime calls [1-4]. Three of them [2-4] missed base offset when calculation alignment for both source and destination array addresses. Seems they assume a base offset of 8 bytes, which is not always true. Base offset becomes 4 bytes under either `-XX:+UseCompactObjectHeaders` or `-XX:-UseCompressedClassPointers`. >> >> And `StubRoutines::select_arraycopy_function` selects the right arraycopy runtime call based on this alignment. As a result, it could see an incorrect `aligned` param about the array addresses and thus a wrong arraycopy runtime call is selected. This is causing performance regressions (like Dacapo Spring) on some linux-riscv64 platforms like Sifive Unmatched or Premier P550 SBCs where misaligned memory access is very slow. >> >> Proposed change fixes this issue by taking base offset into account when checking the alignment, which is very similar to [1]. 
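The alignment problem described in the quoted PR text can be illustrated with a small standalone sketch. This is not HotSpot code: `is_heapword_aligned`, `is_heapword_aligned_ignoring_base`, `kHeapWordSize` and the parameter names are hypothetical, and the sketch only models the arithmetic under the assumption of 8-byte heap words and an array object that itself starts at an aligned address.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical constant for illustration only: 8-byte heap words.
constexpr std::int64_t kHeapWordSize = 8;

// Returns true if the address of element `index` is heap-word aligned,
// assuming the array object starts at an aligned address.
// `base_offset` is the distance in bytes from the object start to element 0.
bool is_heapword_aligned(std::int64_t base_offset, std::int64_t index, std::int64_t elem_size) {
  assert(base_offset >= 0 && index >= 0 && elem_size > 0);
  return (base_offset + index * elem_size) % kHeapWordSize == 0;
}

// The flawed variant: it ignores the base offset, so it is only correct
// when the base offset happens to be a multiple of the heap word size.
bool is_heapword_aligned_ignoring_base(std::int64_t index, std::int64_t elem_size) {
  assert(index >= 0 && elem_size > 0);
  return (index * elem_size) % kHeapWordSize == 0;
}
```

With an illustrative base offset of 12 bytes (4 modulo 8), element 0 of a byte array is misaligned, yet the variant that ignores the base offset would still report it as aligned. That mismatch is the kind of wrong `aligned` answer the PR description attributes to the affected call sites.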
>> >> Testing: >> - [x] Tier1-3 on linux-x64 (release & fastdebug) >> - [x] Tier1-3 on linux-aarch64 (release & fastdebug) >> - [x] Tier1-3 on linux-riscv64 (release) >> - [x] Dacapo spring performance test on linux-riscv64 (w/wo `-XX:+UseCompactObjectHeaders` / `-XX:-UseCompressedClassPointers`) >> >> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/macroArrayCopy.cpp#L341 >> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/library_call.cpp#L1584 >> [3] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/library_call.cpp#L1666 >> [4] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/stringopts.cpp#L1481 > > Fei Yang has updated the pull request incrementally with one additional commit since the last revision: > > review Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25765#pullrequestreview-2946407018 From kvn at openjdk.org Fri Jun 20 14:43:27 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 20 Jun 2025 14:43:27 GMT Subject: RFR: 8353815: [ubsan] compilationPolicy.cpp: division by zero related to tiered compilation flags In-Reply-To: References: Message-ID: On Thu, 19 Jun 2025 15:15:16 GMT, Manuel Hässig wrote: > A run of `runtime/CommandLine/OptionsValidation/TestOptionsWithRanges.java` with an ubsan-enabled binary revealed that passing the value 0 to `Tier(3|4)LoadFeedback` and `TieredRateUpdateMinTime` leads to division by zero. > > Since a value of 0 for `Tier(3|4)LoadFeedback` should disable the scaling of the compilation thresholds, 8bf37ee special cases the 0 case to disable scaling and documents it accordingly.
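The special case for a load feedback of 0 could look roughly like the following standalone sketch. It is an assumption-laden model, not the code from `compilationPolicy.cpp`: `scale_threshold_sketch` and its parameters are hypothetical names, and the formula is only a plausible shape for a queue-length-based scale factor.

```cpp
#include <cassert>

// Hypothetical model of a threshold scale factor that grows with the
// compile queue length. A load feedback of 0 means "scaling disabled",
// so the function returns the neutral factor 1.0 before dividing.
double scale_threshold_sketch(int queue_size, int compiler_count, int load_feedback) {
  assert(queue_size >= 0 && compiler_count > 0 && load_feedback >= 0);
  if (load_feedback == 0) {
    return 1.0;  // guard: avoids dividing by zero below
  }
  return static_cast<double>(queue_size) /
         (static_cast<double>(load_feedback) * compiler_count) + 1.0;
}
```

Returning the neutral factor early keeps the disabled case explicit at the top of the function, instead of relying on whatever a division by zero would produce.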
> > 4893b28 sets the lower limit for `TieredRateUpdate(Min|Max)Time` to 1 since the code assumes that at least 1ms passes between each event: > > https://github.com/openjdk/jdk/blob/c4fb00a7be51c7a05a29d3d57d787feb5c698ddf/src/hotspot/share/compiler/compilationPolicy.cpp#L968-L974 > > This PR was tested with: > - [ ] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15760915006) > - [x] tier1 and 2 plus Oracle internal testing on Oracle supported platforms Good ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25902#pullrequestreview-2946418057 From kvn at openjdk.org Fri Jun 20 14:50:27 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 20 Jun 2025 14:50:27 GMT Subject: RFR: 8355276: Sort C2 includes In-Reply-To: <_ab90h2O6v67rorpT12dUxMO2mv7GscLEMo37dLJkhA=.c645e9e0-8b99-49f8-8aa5-80a623b541d6@github.com> References: <_ab90h2O6v67rorpT12dUxMO2mv7GscLEMo37dLJkhA=.c645e9e0-8b99-49f8-8aa5-80a623b541d6@github.com> Message-ID: On Fri, 20 Jun 2025 08:17:24 GMT, Manuel Hässig wrote: > This PR sorts the includes in `hotspot/share/opto` using `test/hotspot/jtreg/sources/SortIncludes.java` and enforces sorted includes for C2 sources in `sources/TestIncludesAreSorted.java`. > > Testing: > - [x] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15773777177) > - [x] tier1,tier2 plus Oracle internal testing on Oracle supported platforms Looks good. Marked as reviewed by kvn (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/25910#pullrequestreview-2946438361 PR Review: https://git.openjdk.org/jdk/pull/25910#pullrequestreview-2946438776 From enikitin at openjdk.org Fri Jun 20 15:05:29 2025 From: enikitin at openjdk.org (Evgeny Nikitin) Date: Fri, 20 Jun 2025 15:05:29 GMT Subject: RFR: 8360131: Remove use of soon-to-be-removed APIs by CTW framework In-Reply-To: References: Message-ID: On Fri, 20 Jun 2025 14:33:49 GMT, David Beaumont wrote: > Migrate the CWT framework to use only supported JRT file system access for fetching class bytes. > This avoids accessing APIs in ImageReader which are scheduled to be removed as part of preview mode class support in Valhalla (essentially these APIs are "too low level" and expose semantics that are incompatible with supporting preview classes in Valhalla). > > This will be a further change to this code when the preview mode work goes in, but this will be limited to how the file system is opened (with or without preview mode). LGTM ------------- PR Comment: https://git.openjdk.org/jdk/pull/25916#issuecomment-2991954336 From epeter at openjdk.org Fri Jun 20 15:37:31 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 20 Jun 2025 15:37:31 GMT Subject: RFR: 8347901: C2 should remove unused leaf / pure runtime calls [v2] In-Reply-To: References: <2f8GCEXnY2mwaz9N9dQK_APi0A31z8Y7CM6YPms4Pp0=.a0651468-def0-4101-94bb-8fabd6d79b1b@github.com> Message-ID: On Fri, 20 Jun 2025 07:45:30 GMT, Marc Chevalier wrote: >> src/hotspot/share/opto/callnode.cpp line 1306: >> >>> 1304: >>> 1305: //============================================================================= >>> 1306: bool CallLeafPureNode::is_unused() const { >> >> Can you add a quick comment why this check implies that the node is not used, i.e. what that means? > > I think i'll need you to explain to me what is unclear at the moment. When I read the function, I see: > "A CallLeafPure is unused iff there is no output result projection." 
> > I don't see what else to add that is not covered by "if we don't use the result, the pure call is unused", which is exactly the code. Is there any untold hypothesis lurking somewhere that I don't see? It seems to me it uses just very common concepts of C2. The call could have other uses for other projections. Why does this projection make it unused? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25760#discussion_r2159263096 From epeter at openjdk.org Fri Jun 20 15:37:31 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 20 Jun 2025 15:37:31 GMT Subject: RFR: 8347901: C2 should remove unused leaf / pure runtime calls [v2] In-Reply-To: References: <2f8GCEXnY2mwaz9N9dQK_APi0A31z8Y7CM6YPms4Pp0=.a0651468-def0-4101-94bb-8fabd6d79b1b@github.com> Message-ID: On Fri, 20 Jun 2025 15:34:03 GMT, Emanuel Peter wrote: >> I think i'll need you to explain to me what is unclear at the moment. When I read the function, I see: >> "A CallLeafPure is unused iff there is no output result projection." >> >> I don't see what else to add that is not covered by "if we don't use the result, the pure call is unused", which is exactly the code. Is there any untold hypothesis lurking somewhere that I don't see? It seems to me it uses just very common concepts of C2. > > The call could have other uses for other projections. Why does this projection make it unused? I suppose I was not aware that `TypeFunc::Parms` stands for result projection.... the name does not make it immediately apparent. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25760#discussion_r2159265635 From liach at openjdk.org Fri Jun 20 15:50:32 2025 From: liach at openjdk.org (Chen Liang) Date: Fri, 20 Jun 2025 15:50:32 GMT Subject: RFR: 8360131: Remove use of soon-to-be-removed APIs by CTW framework In-Reply-To: References: Message-ID: On Fri, 20 Jun 2025 14:33:49 GMT, David Beaumont wrote: > Migrate the CWT framework to use only supported JRT file system access for fetching class bytes. > This avoids accessing APIs in ImageReader which are scheduled to be removed as part of preview mode class support in Valhalla (essentially these APIs are "too low level" and expose semantics that are incompatible with supporting preview classes in Valhalla). > > This will be a further change to this code when the preview mode work goes in, but this will be limited to how the file system is opened (with or without preview mode). I think ctw is in hs-tier2 tests. You may run them on the CI for sanity checking. test/hotspot/jtreg/testlibrary/ctw/src/sun/hotspot/tools/ctw/ClassPathJimageEntry.java line 82: > 80: } > 81: > 82: //private final ImageReader reader; Suggestion: test/hotspot/jtreg/testlibrary/ctw/src/sun/hotspot/tools/ctw/ClassPathJimageEntry.java line 114: > 112: } > 113: } catch (IOException e) { > 114: throw new RuntimeException(e); Maybe throw an error for consistency with other methods in this class? 
test/hotspot/jtreg/testlibrary/ctw/src/sun/hotspot/tools/ctw/Utils.java line 209: > 207: * @return corresponding filename > 208: * @throws AssertionError if filename isn't valid filename for class file - > 209: * {@link #isClassFile(String)} Suggestion: ------------- PR Review: https://git.openjdk.org/jdk/pull/25916#pullrequestreview-2946659413 PR Review Comment: https://git.openjdk.org/jdk/pull/25916#discussion_r2159280982 PR Review Comment: https://git.openjdk.org/jdk/pull/25916#discussion_r2159283448 PR Review Comment: https://git.openjdk.org/jdk/pull/25916#discussion_r2159284146 From duke at openjdk.org Fri Jun 20 15:55:36 2025 From: duke at openjdk.org (David Beaumont) Date: Fri, 20 Jun 2025 15:55:36 GMT Subject: RFR: 8360131: Remove use of soon-to-be-removed APIs by CTW framework In-Reply-To: References: Message-ID: On Fri, 20 Jun 2025 15:45:18 GMT, Chen Liang wrote: >> Migrate the CWT framework to use only supported JRT file system access for fetching class bytes. >> This avoids accessing APIs in ImageReader which are scheduled to be removed as part of preview mode class support in Valhalla (essentially these APIs are "too low level" and expose semantics that are incompatible with supporting preview classes in Valhalla). >> >> This will be a further change to this code when the preview mode work goes in, but this will be limited to how the file system is opened (with or without preview mode). > > test/hotspot/jtreg/testlibrary/ctw/src/sun/hotspot/tools/ctw/ClassPathJimageEntry.java line 82: > >> 80: } >> 81: >> 82: //private final ImageReader reader; > > Suggestion: *doh* - thanks. > test/hotspot/jtreg/testlibrary/ctw/src/sun/hotspot/tools/ctw/ClassPathJimageEntry.java line 114: > >> 112: } >> 113: } catch (IOException e) { >> 114: throw new RuntimeException(e); > > Maybe throw an error for consistency with other methods in this class? Personally I don't like `Error` as a response to runtime issues like this, but consistency is probably good, so done. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25916#discussion_r2159291998 PR Review Comment: https://git.openjdk.org/jdk/pull/25916#discussion_r2159293103 From epeter at openjdk.org Fri Jun 20 15:57:34 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 20 Jun 2025 15:57:34 GMT Subject: RFR: 8347901: C2 should remove unused leaf / pure runtime calls [v2] In-Reply-To: References: Message-ID: On Fri, 20 Jun 2025 08:16:11 GMT, Marc Chevalier wrote: >> A first part toward a better support of pure functions, but this time, with guidance from @iwanowww. >> >> ## Pure Functions >> >> Pure functions (considered here) are functions that have no side effects, no effect on the control flow (no exception or such), cannot deopt etc.. It's really a function that you can execute anywhere, with whichever arguments without effect other than wasting time. Integer division is not pure as dividing by zero is throwing. But many floating point functions will just return `NaN` or `+/-infinity` in problematic cases. >> >> ## Scope >> >> We are not going all powerful for now! It's mostly about identifying some pure functions and being able to remove them if the result is unused. Some other things are not part of this PR, on purpose. Especially, this PR doesn't propose a way to move pure calls around. The reason is that pure calls are later expanded into regular calls, which require a control input. To be able to do the expansion, we just keep the control in the pure call as well. >> >> ## Implementation Overview >> >> We created here some new node kind for pure calls, inheriting leaf calls, that are expanded into regular leaf calls during final graph reshaping. The possibility to support pure call directly in AD file is left open. 
>> >> This PR also introduces `TupleNode` (largely based on an original idea/implem of @iwanowww), that just tie multiple input together and play well with `ProjNode`: the n-th projection of a `TupleNode` is the n-th input of the tuple. This is a convenient way to skip and remove nodes from the graph while delegating the difficulty of the surgery to the trusted IGVN's implementation. >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Mostly comments @marc-chevalier Thanks for addressing my comments! I now have a few more for you :) src/hotspot/share/opto/callnode.cpp line 1318: > 1316: * such a tuple, we let output Proj's idealization pick the corresponding input of the > 1317: * pure call, so jumping over it, and effectively, removing the call from the graph. > 1318: * This avoids doing the graph surgery manually, but leave that to IGVN Suggestion: * This avoids doing the graph surgery manually, but leaves that to IGVN src/hotspot/share/opto/callnode.cpp line 1341: > 1339: } > 1340: > 1341: Node* CallLeafPureNode::Ideal(PhaseGVN* phase, bool can_reshape) { Did you make sure that this method is called from all its subclasses? It seems to me that you just copied the code to the subclasses, rather than calling this method, am I right? src/hotspot/share/opto/callnode.cpp line 1342: > 1340: > 1341: Node* CallLeafPureNode::Ideal(PhaseGVN* phase, bool can_reshape) { > 1342: if (is_dead()) { // dead node Suggestion: if (is_dead()) { The comment seemed redundant. You could say who else is responsible of cleaning up the dead node though. What would happen if the `CallLeafPureNode` loses its control projection, but not the other uses? I don't even know if that is possible. What do you think? src/hotspot/share/opto/callnode.hpp line 916: > 914: }; > 915: > 916: class CallLeafPureNode : public CallLeafNode { You need a short description about what this node is for. 
What are the assumptions about it? src/hotspot/share/opto/divnode.cpp line 1522: > 1520: if (!can_reshape) { > 1521: return nullptr; > 1522: } Would this prevent us from doing the `make_tuple_of_input_state_and_top_return_values` trick? Because it seems to me that we do not need to reshape the node for that, right? Maybe you should reorder things for that? ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25760#pullrequestreview-2946648813 PR Review Comment: https://git.openjdk.org/jdk/pull/25760#discussion_r2159274378 PR Review Comment: https://git.openjdk.org/jdk/pull/25760#discussion_r2159292773 PR Review Comment: https://git.openjdk.org/jdk/pull/25760#discussion_r2159277636 PR Review Comment: https://git.openjdk.org/jdk/pull/25760#discussion_r2159287237 PR Review Comment: https://git.openjdk.org/jdk/pull/25760#discussion_r2159289092 From mchevalier at openjdk.org Fri Jun 20 15:57:36 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Fri, 20 Jun 2025 15:57:36 GMT Subject: RFR: 8347901: C2 should remove unused leaf / pure runtime calls [v2] In-Reply-To: References: <2f8GCEXnY2mwaz9N9dQK_APi0A31z8Y7CM6YPms4Pp0=.a0651468-def0-4101-94bb-8fabd6d79b1b@github.com> Message-ID: On Fri, 20 Jun 2025 15:35:04 GMT, Emanuel Peter wrote: >> The call could have other uses for other projections. Why does this projection make it unused? > > I suppose I was not aware that `TypeFunc::Parms` stands for result projection.... the name does not make it immediately apparent. I see. I think I'd be better to comment on the declaration of the class then. Something saying that CallLeafPureNode represents calls that are pure: they only have data input and output data (and control for practical reasons for now), no exceptions, no memory, no safepoint... They can be freely be moved around, duplicated or, if the result isn't used, removed. Then that explains... a lot of what we are doing, not just `is_unused`. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25760#discussion_r2159294140 From epeter at openjdk.org Fri Jun 20 15:57:37 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 20 Jun 2025 15:57:37 GMT Subject: RFR: 8347901: C2 should remove unused leaf / pure runtime calls [v2] In-Reply-To: References: <2f8GCEXnY2mwaz9N9dQK_APi0A31z8Y7CM6YPms4Pp0=.a0651468-def0-4101-94bb-8fabd6d79b1b@github.com> Message-ID: <7m83zrjdB-g0fkfOUN_8MtvbUV4XEAl_d0ErDFXo9To=.a5aff34c-192c-41ee-9a41-ca23bc10e73c@github.com> On Fri, 20 Jun 2025 07:37:31 GMT, Marc Chevalier wrote: >> src/hotspot/share/opto/callnode.cpp line 1335: >> >>> 1333: if (can_reshape && is_unused()) { >>> 1334: return make_tuple_of_input_state_and_top_return_values(phase->C); >>> 1335: } >> >> Can you add a code comment what this does? > > It's better commented at the definition of `make_tuple_of_input_state_and_top_return_values` now. Let's not bloat the each call site with the same wall of text again. Suggestion: if (can_reshape && is_unused()) { // The result is not used. We remove the call by replacing it with a tuple, that // is later desintegrated by the projections. return make_tuple_of_input_state_and_top_return_values(phase->C); } I suggest this. It is short, and allows you to quickly understand what is happening without having to go to the other side. Alternatively, you could also rename `make_tuple_of_input_state_and_top_return_values`. Maybe: `make_tuple_for_delayed_desintegration` ? Maybe someone has an even better idea. 
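The tuple trick discussed in this thread (replace the unused call with a tuple, then let each projection pick its matching input) can be modelled with a toy graph. This is a sketch under stated assumptions: `ToyNode` and `resolve_projection` are hypothetical stand-ins and not the real `TupleNode` / `ProjNode` classes from HotSpot.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a graph node; not a real C2 class.
struct ToyNode {
  bool is_tuple = false;
  std::vector<const ToyNode*> inputs;
};

// Models the rule quoted in the thread: the i-th projection of a tuple is
// the i-th input of the tuple. For any other producer, the projection `proj`
// stays in place and is returned unchanged.
const ToyNode* resolve_projection(const ToyNode* producer, std::size_t con, const ToyNode* proj) {
  if (producer->is_tuple) {
    assert(con < producer->inputs.size());
    return producer->inputs[con];  // jump over the tuple
  }
  return proj;
}
```

In this model, once every projection has been resolved, nothing refers to the tuple any more, which mirrors the expectation stated in the thread that a tuple appears and disappears within the same IGVN round.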
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25760#discussion_r2159286184 From epeter at openjdk.org Fri Jun 20 15:57:38 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 20 Jun 2025 15:57:38 GMT Subject: RFR: 8347901: C2 should remove unused leaf / pure runtime calls [v2] In-Reply-To: References: Message-ID: On Fri, 20 Jun 2025 15:50:46 GMT, Emanuel Peter wrote: >> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: >> >> Mostly comments > > src/hotspot/share/opto/divnode.cpp line 1522: > >> 1520: if (!can_reshape) { >> 1521: return nullptr; >> 1522: } > > Would this prevent us from doing the `make_tuple_of_input_state_and_top_return_values` trick? Because it seems to me that we do not need to reshape the node for that, right? Maybe you should reorder things for that? Also, you should probably call `CallLeafPureNode::Ideal` instead of duplicating its logic here and in other subclasses. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25760#discussion_r2159293669 From duke at openjdk.org Fri Jun 20 16:02:17 2025 From: duke at openjdk.org (David Beaumont) Date: Fri, 20 Jun 2025 16:02:17 GMT Subject: RFR: 8360131: Remove use of soon-to-be-removed APIs by CTW framework [v2] In-Reply-To: References: Message-ID: > Migrate the CWT framework to use only supported JRT file system access for fetching class bytes. > This avoids accessing APIs in ImageReader which are scheduled to be removed as part of preview mode class support in Valhalla (essentially these APIs are "too low level" and expose semantics that are incompatible with supporting preview classes in Valhalla). > > This will be a further change to this code when the preview mode work goes in, but this will be limited to how the file system is opened (with or without preview mode). David Beaumont has updated the pull request incrementally with one additional commit since the last revision: Feedback tweaks. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/25916/files - new: https://git.openjdk.org/jdk/pull/25916/files/c1a49fe6..ae2523d0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25916&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25916&range=00-01 Stats: 4 lines in 2 files changed: 0 ins; 3 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25916.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25916/head:pull/25916 PR: https://git.openjdk.org/jdk/pull/25916 From epeter at openjdk.org Fri Jun 20 16:01:51 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 20 Jun 2025 16:01:51 GMT Subject: RFR: 8347901: C2 should remove unused leaf / pure runtime calls [v2] In-Reply-To: References: <2f8GCEXnY2mwaz9N9dQK_APi0A31z8Y7CM6YPms4Pp0=.a0651468-def0-4101-94bb-8fabd6d79b1b@github.com> Message-ID: On Fri, 20 Jun 2025 15:54:20 GMT, Marc Chevalier wrote: >> I suppose I was not aware that `TypeFunc::Parms` stands for result projection.... the name does not make it immediately apparent. > > I see. I think I'd be better to comment on the declaration of the class then. Something saying that CallLeafPureNode represents calls that are pure: they only have data input and output data (and control for practical reasons for now), no exceptions, no memory, no safepoint... They can be freely be moved around, duplicated or, if the result isn't used, removed. Then that explains... a lot of what we are doing, not just `is_unused`. I think I was really just confused about the `Parms`. I thought that means parameters .. 
and not results :rofl: ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25760#discussion_r2159299985 From duke at openjdk.org Fri Jun 20 16:02:18 2025 From: duke at openjdk.org (David Beaumont) Date: Fri, 20 Jun 2025 16:02:18 GMT Subject: RFR: 8360131: Remove use of soon-to-be-removed APIs by CTW framework [v2] In-Reply-To: References: Message-ID: <-spZURlJkECYv4Jtw531ut1EXeCRrJd9pqY5JTxFghs=.3981f6aa-528c-4b92-817b-838e9a2ef22e@github.com> On Fri, 20 Jun 2025 15:47:07 GMT, Chen Liang wrote: >> David Beaumont has updated the pull request incrementally with one additional commit since the last revision: >> >> Feedback tweaks. > > test/hotspot/jtreg/testlibrary/ctw/src/sun/hotspot/tools/ctw/Utils.java line 209: > >> 207: * @return corresponding filename >> 208: * @throws AssertionError if filename isn't valid filename for class file - >> 209: * {@link #isClassFile(String)} > > Suggestion: What's the suggestion here? I see no change. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25916#discussion_r2159296411 From duke at openjdk.org Fri Jun 20 16:02:18 2025 From: duke at openjdk.org (David Beaumont) Date: Fri, 20 Jun 2025 16:02:18 GMT Subject: RFR: 8360131: Remove use of soon-to-be-removed APIs by CTW framework [v2] In-Reply-To: <-spZURlJkECYv4Jtw531ut1EXeCRrJd9pqY5JTxFghs=.3981f6aa-528c-4b92-817b-838e9a2ef22e@github.com> References: <-spZURlJkECYv4Jtw531ut1EXeCRrJd9pqY5JTxFghs=.3981f6aa-528c-4b92-817b-838e9a2ef22e@github.com> Message-ID: On Fri, 20 Jun 2025 15:55:52 GMT, David Beaumont wrote: >> test/hotspot/jtreg/testlibrary/ctw/src/sun/hotspot/tools/ctw/Utils.java line 209: >> >>> 207: * @return corresponding filename >>> 208: * @throws AssertionError if filename isn't valid filename for class file - >>> 209: * {@link #isClassFile(String)} >> >> Suggestion: > > What's the suggestion here? I see no change. *doh* you meant just to delete it. Done. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25916#discussion_r2159298567 From duke at openjdk.org Fri Jun 20 16:04:35 2025 From: duke at openjdk.org (David Beaumont) Date: Fri, 20 Jun 2025 16:04:35 GMT Subject: RFR: 8360131: Remove use of soon-to-be-removed APIs by CTW framework [v2] In-Reply-To: References: Message-ID: <53SD3XdAur8XAPyfmPsvhp1rmekinYGYqN94AZVbKqw=.a4df79c8-198e-4f06-8963-cc67a08a1563@github.com> On Fri, 20 Jun 2025 15:47:59 GMT, Chen Liang wrote: > I think ctw is in hs-tier2 tests. You may run them on the CI for sanity checking. The test group `applications/ctw` passes locally, and I've set off a CI test run with that target. Do you think it also needs `tier2` in that case? ------------- PR Comment: https://git.openjdk.org/jdk/pull/25916#issuecomment-2992143761 From mchevalier at openjdk.org Fri Jun 20 16:06:32 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Fri, 20 Jun 2025 16:06:32 GMT Subject: RFR: 8347901: C2 should remove unused leaf / pure runtime calls [v2] In-Reply-To: References: <2f8GCEXnY2mwaz9N9dQK_APi0A31z8Y7CM6YPms4Pp0=.a0651468-def0-4101-94bb-8fabd6d79b1b@github.com> Message-ID: On Fri, 20 Jun 2025 15:58:26 GMT, Emanuel Peter wrote: >> I see. I think I'd be better to comment on the declaration of the class then. Something saying that CallLeafPureNode represents calls that are pure: they only have data input and output data (and control for practical reasons for now), no exceptions, no memory, no safepoint... They can be freely be moved around, duplicated or, if the result isn't used, removed. Then that explains... a lot of what we are doing, not just `is_unused`. > > I think I was really just confused about the `Parms`. I thought that means parameters .. and not results :rofl: It's actually both. For functions, the parameters are starting at `Parms`, and the results too. Before that, it's all the special stuff: control, memory, io... 
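As a rough illustration of the slot layout described here, the following standalone sketch redeclares the slot numbering so that it compiles on its own. The numeric values are my recollection of HotSpot's `TypeFunc` enum and should be treated as an assumption; `param_index`, `result_index` and `result_is_unused` are hypothetical helpers, not HotSpot functions.

```cpp
#include <cassert>
#include <set>

// Assumed slot numbering, modelled on HotSpot's TypeFunc enum.
// Everything before Parms is the "special stuff": control, io, memory, ...
enum Slot { Control = 0, I_O = 1, Memory = 2, FramePtr = 3, ReturnAdr = 4, Parms = 5 };

// Index of the k-th parameter among a call's inputs (k starts at 0).
int param_index(int k) { return Parms + k; }

// Index of the k-th result among a call's projections (k starts at 0).
// Parameters and results both start at Parms.
int result_index(int k) { return Parms + k; }

// Hypothetical model of an "is the result unused?" test for a pure call:
// the result is unused when no projection exists for the Parms slot.
bool result_is_unused(const std::set<int>& projected_slots) {
  return projected_slots.count(Parms) == 0;
}
```

A call that only has a control projection would report its result as unused under this model, which is the situation where the call can be removed.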
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25760#discussion_r2159307056 From bmaillard at openjdk.org Fri Jun 20 16:39:51 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Fri, 20 Jun 2025 16:39:51 GMT Subject: RFR: 8356865: C2: Unreasonable values for debug flag FastAllocateSizeLimit can lead to left-shift-overflow, which is UB [v3] In-Reply-To: References: Message-ID: <38Q5LqxUlyIBHyusziq461sTdVCBm0ZO3kcVB_u7I18=.11da0a73-7a4f-4c83-abeb-f12791ad7741@github.com> > This PR adds a range constraint for the `-XX:FastAllocateSizeLimit` debug flag. This prevents undefined behavior caused by left-shift overflow of the flag value in `GraphKit::new_array`. > > ### Testing > - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8356865) > - [x] tier1-3, plus some internal testing > - [x] Manual testing with values known to previously cause undefined behavior > > Thanks! Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: 8356865: Change assert ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25834/files - new: https://git.openjdk.org/jdk/pull/25834/files/c8904a29..97f52b45 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25834&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25834&range=01-02 Stats: 3 lines in 1 file changed: 2 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25834.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25834/head:pull/25834 PR: https://git.openjdk.org/jdk/pull/25834 From bmaillard at openjdk.org Fri Jun 20 16:43:27 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Fri, 20 Jun 2025 16:43:27 GMT Subject: RFR: 8356865: C2: Unreasonable values for debug flag FastAllocateSizeLimit can lead to left-shift-overflow, which is UB [v3] In-Reply-To: References:
<8nXpApdLxXidwKfFpcVbKjpYgOn5EfhUvKNQRKvv2o0=.252bc291-3219-4d77-9a4d-8fe75952c2f6@github.com> Message-ID: On Wed, 18 Jun 2025 19:42:07 GMT, Evgeny Astigeevich wrote: >> Thanks for the comments! >> >> I added the assert because the issue in the JBS mentioned a specific case where we ended up with negative values. >> >> Should I leave it like this, or rather convert it to a more specific check (ie. making sure that the `LogBytesPerLong - log2_esize` most significant bits are not used **before** shifting)? > > IMO your assert is obfuscating the overflow problem. > I think the assert should be before doing the shift. > It can be like: > > assert((fast_size_limit == 0) || (count_leading_zeros(fast_size_limit) > (LogBytesPerLong - log2_esize), "fast_size_limit (%d) overflow when shifted left by %d", fast_size_limit, (LogBytesPerLong - log2_esize)); Thanks for the tip, I made the requested changes! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25834#discussion_r2159359020 From liach at openjdk.org Fri Jun 20 16:53:29 2025 From: liach at openjdk.org (Chen Liang) Date: Fri, 20 Jun 2025 16:53:29 GMT Subject: RFR: 8360131: Remove use of soon-to-be-removed APIs by CTW framework [v2] In-Reply-To: References: Message-ID: <3gyYQnyq_JJjPb9cZj_bin1lBhtGsfl-6dzum4Yc4CA=.cd1ac0bd-fbfa-4744-a593-ac90addea666@github.com> On Fri, 20 Jun 2025 16:02:17 GMT, David Beaumont wrote: >> Migrate the CWT framework to use only supported JRT file system access for fetching class bytes. >> This avoids accessing APIs in ImageReader which are scheduled to be removed as part of preview mode class support in Valhalla (essentially these APIs are "too low level" and expose semantics that are incompatible with supporting preview classes in Valhalla). >> >> This will be a further change to this code when the preview mode work goes in, but this will be limited to how the file system is opened (with or without preview mode). 
> > David Beaumont has updated the pull request incrementally with one additional commit since the last revision: > > Feedback tweaks. The Java code looks good. Let's wait for hotspot reviewers. I think the tests you've run are sufficient. ------------- Marked as reviewed by liach (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25916#pullrequestreview-2946804433 From vlivanov at openjdk.org Fri Jun 20 19:16:29 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 20 Jun 2025 19:16:29 GMT Subject: RFR: 8347901: C2 should remove unused leaf / pure runtime calls [v2] In-Reply-To: References: <2f8GCEXnY2mwaz9N9dQK_APi0A31z8Y7CM6YPms4Pp0=.a0651468-def0-4101-94bb-8fabd6d79b1b@github.com> Message-ID: <7I4F9dJkYX6b66vnJfhJLvvjnC-aw0xA3X2eimh8kBg=.0d11e624-4c82-4d7e-8392-e3274a76478a@github.com> On Fri, 20 Jun 2025 12:00:52 GMT, Marc Chevalier wrote: >> src/hotspot/share/opto/multnode.cpp line 126: >> >>> 124: // Jumping over Tuples: the i-th projection of a Tuple is the i-th input of the Tuple. >>> 125: ctrl = ctrl->in(_con); >>> 126: } >> >> Do you need to special-case this here? Why does the `ProjNode::Identity` not suffice? Are there potentially other locations where we now would need this special logic? > > That is a good question. That is something I picked from Vladimir's implementation and it seemed legitimate. But now you say it, is it needed? Not sure. I'm trying to find that out. Would `::Identity` be enough? It's tempting to say so, right! I'd say it can be not enough if we need `adr_type` before idealizing the `ProjNode` (no idea if that happens). Is there any other places to adapt? One could think so, but actually, I can't find such an example. Other methods of `ProjNode` for instance rely on the type of the input (which is correctly handled in `TupleNode`), and so should already work fine. > > I'm trying to understand what happens if we don't have that. But maybe @iwanowww would have some helpful insight? 
I don't remember all the details now, but there were some problems when `ProjNode::adr_type()` encounters `TupleNode`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25760#discussion_r2159543440 From duke at openjdk.org Fri Jun 20 21:54:45 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Fri, 20 Jun 2025 21:54:45 GMT Subject: RFR: 8358655: AArch64: Simplify Interpreter::profile_taken_branch [v2] In-Reply-To: References: Message-ID: > [JDK-8358655](https://bugs.openjdk.org/browse/JDK-8358655) > > The aarch64 version of [JDK-8357434](https://bugs.openjdk.org/browse/JDK-8357434) > > The counter is 64-bit, never practically overflows, and no other code cares about it so it is safe to remove > > Additional Testing: > - [x] Linux aarch64 fastdebug tier 1 > - [x] Linux aarch64 fastdebug tier 2 > - [x] Linux aarch64 fastdebug tier 3 > - [x] Linux aarch64 fastdebug tier 4 Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: Remove r1 save ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25906/files - new: https://git.openjdk.org/jdk/pull/25906/files/ae31d0fb..c2d5ef37 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25906&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25906&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25906.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25906/head:pull/25906 PR: https://git.openjdk.org/jdk/pull/25906 From wenanjian at openjdk.org Sat Jun 21 12:01:25 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Sat, 21 Jun 2025 12:01:25 GMT Subject: RFR: 8359801: RISC-V: Simplify Interpreter::profile_taken_branch [v3] In-Reply-To: References: Message-ID: <7mGNHMgjdO5qbkzhz2Ivjhkgl9TkWYNoj76SEQc83MQ=.65909171-5ef8-4913-ae23-260058599b24@github.com> > Do the same thing as [JDK-8357434](https://bugs.openjdk.org/browse/JDK-8357434) 
[JDK-8358655](https://bugs.openjdk.org/browse/JDK-8358655) but for riscv > > The Interpreter::profile_taken_branch has the same sbbptr pattern as [JDK-8358105](https://bugs.openjdk.org/browse/JDK-8358105). The counter is 64-bit, never practically overflows, and no other code cares about it, so we can remove the overflow check Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: remove unuse x11 Register ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25848/files - new: https://git.openjdk.org/jdk/pull/25848/files/b07869e4..741fb797 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25848&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25848&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25848.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25848/head:pull/25848 PR: https://git.openjdk.org/jdk/pull/25848 From dzhang at openjdk.org Sun Jun 22 07:01:48 2025 From: dzhang at openjdk.org (Dingli Zhang) Date: Sun, 22 Jun 2025 07:01:48 GMT Subject: RFR: 8360169: Problem list CodeInvalidationReasonTest.java on linux-riscv64 until JDK-8360168 is fixed Message-ID: Hi all, As described in the [JDK-8360168](https://bugs.openjdk.org/browse/JDK-8360168) issue, the case failed on linux-riscv64. We put the failed test case into the Problem list until [JDK-8360168](https://bugs.openjdk.org/browse/JDK-8360168) is fixed. Please take a look and have some reviews. Thanks a lot.
------------- Commit messages: - 8360169: Problem list CodeInvalidationReasonTest.java on linux-riscv64 until JDK-8360168 is fixed Changes: https://git.openjdk.org/jdk/pull/25924/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25924&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8360169 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25924.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25924/head:pull/25924 PR: https://git.openjdk.org/jdk/pull/25924 From fyang at openjdk.org Mon Jun 23 00:15:33 2025 From: fyang at openjdk.org (Fei Yang) Date: Mon, 23 Jun 2025 00:15:33 GMT Subject: RFR: 8359801: RISC-V: Simplify Interpreter::profile_taken_branch [v3] In-Reply-To: <7mGNHMgjdO5qbkzhz2Ivjhkgl9TkWYNoj76SEQc83MQ=.65909171-5ef8-4913-ae23-260058599b24@github.com> References: <7mGNHMgjdO5qbkzhz2Ivjhkgl9TkWYNoj76SEQc83MQ=.65909171-5ef8-4913-ae23-260058599b24@github.com> Message-ID: On Sat, 21 Jun 2025 12:01:25 GMT, Anjian Wen wrote: >> Do the same thing as [JDK-8357434](https://bugs.openjdk.org/browse/JDK-8357434) [JDK-8358655](https://bugs.openjdk.org/browse/JDK-8358655) but for riscv >> >> The Interpreter::profile_taken_branch has the same sbbptr pattern with [JDK-8358105](https://bugs.openjdk.org/browse/JDK-8358105)?The counter is 64-bit, never practically overflows , and no other code cares about it. so we can remove the overflows check > > Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: > > remove unuse x11 Register Still good. ------------- Marked as reviewed by fyang (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25848#pullrequestreview-2948394724 From fyang at openjdk.org Mon Jun 23 00:28:28 2025 From: fyang at openjdk.org (Fei Yang) Date: Mon, 23 Jun 2025 00:28:28 GMT Subject: RFR: 8359270: C2: alignment check should consider base offset when emitting arraycopy runtime call [v4] In-Reply-To: References: Message-ID: On Fri, 20 Jun 2025 14:37:01 GMT, Vladimir Kozlov wrote: > Good. Thanks for the review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25765#issuecomment-2994569258 From fyang at openjdk.org Mon Jun 23 00:36:36 2025 From: fyang at openjdk.org (Fei Yang) Date: Mon, 23 Jun 2025 00:36:36 GMT Subject: Integrated: 8359270: C2: alignment check should consider base offset when emitting arraycopy runtime call In-Reply-To: References: Message-ID: On Thu, 12 Jun 2025 01:54:34 GMT, Fei Yang wrote: > Hi, please consider this change fixing alignment check when emitting arraycopy runtime call. > > There are currently four callsites of `StubRoutines::select_arraycopy_function` in hotspot C2 shared code where we emit arraycopy runtime calls [1-4]. Three of them [2-4] missed base offset when calculation alignment for both source and destination array addresses. Seems they assume a base offset of 8 bytes, which is not always true. Base offset becomes 4 bytes under either `-XX:+UseCompactObjectHeaders` or `-XX:-UseCompressedClassPointers`. > > And `StubRoutines::select_arraycopy_function` selects the right arraycopy runtime call based on this alignment. As a result, it could see an incorrect `aligned` param about the array addresses and thus a wrong arraycopy runtime call is selected. This is causing performance regressions (like Dacapo Spring) on some linux-riscv64 platforms like Sifive Unmatched or Premier P550 SBCs where misaligned memory access is very slow. > > Proposed change fixes this issue by taking base offset into account when checking the alignment, which is very similar to [1]. 
> > Testing: > - [x] Tier1-3 on linux-x64 (release & fastdebug) > - [x] Tier1-3 on linux-aarch64 (release & fastdebug) > - [x] Tier1-3 on linux-riscv64 (release) > - [x] Dacapo spring performance test on linux-riscv64 (w/wo `-XX:+UseCompactObjectHeaders` / `-XX:-UseCompressedClassPointers`) > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/macroArrayCopy.cpp#L341 > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/library_call.cpp#L1584 > [3] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/library_call.cpp#L1666 > [4] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/stringopts.cpp#L1481 This pull request has now been integrated. Changeset: 6b439391 Author: Fei Yang URL: https://git.openjdk.org/jdk/commit/6b4393917ae689818d67fcaf9cc61ca16ea6d426 Stats: 136 lines in 3 files changed: 128 ins; 0 del; 8 mod 8359270: C2: alignment check should consider base offset when emitting arraycopy runtime call Reviewed-by: thartmann, kvn ------------- PR: https://git.openjdk.org/jdk/pull/25765 From fyang at openjdk.org Mon Jun 23 00:47:35 2025 From: fyang at openjdk.org (Fei Yang) Date: Mon, 23 Jun 2025 00:47:35 GMT Subject: RFR: 8360169: Problem list CodeInvalidationReasonTest.java on linux-riscv64 until JDK-8360168 is fixed In-Reply-To: References: Message-ID: On Sat, 21 Jun 2025 14:45:13 GMT, Dingli Zhang wrote: > Hi all, > As described in the [JDK-8360168](https://bugs.openjdk.org/browse/JDK-8360168) issue, the case failed on linux-riscv64. We put the failed test > case into the Problem list until [JDK-8360168](https://bugs.openjdk.org/browse/JDK-8360168) is fixed. > > Please take a look and have some reviews. Thanks a lot. Marked as reviewed by fyang (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/25924#pullrequestreview-2948413524 From fjiang at openjdk.org Mon Jun 23 01:12:42 2025 From: fjiang at openjdk.org (Feilong Jiang) Date: Mon, 23 Jun 2025 01:12:42 GMT Subject: RFR: 8360169: Problem list CodeInvalidationReasonTest.java on linux-riscv64 until JDK-8360168 is fixed In-Reply-To: References: Message-ID: On Sat, 21 Jun 2025 14:45:13 GMT, Dingli Zhang wrote: > Hi all, > As described in the [JDK-8360168](https://bugs.openjdk.org/browse/JDK-8360168) issue, the case failed on linux-riscv64. We put the failed test > case into the Problem list until [JDK-8360168](https://bugs.openjdk.org/browse/JDK-8360168) is fixed. > > Please take a look and have some reviews. Thanks a lot. Marked as reviewed by fjiang (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25924#pullrequestreview-2948430770 From duke at openjdk.org Mon Jun 23 02:29:33 2025 From: duke at openjdk.org (duke) Date: Mon, 23 Jun 2025 02:29:33 GMT Subject: RFR: 8359801: RISC-V: Simplify Interpreter::profile_taken_branch [v3] In-Reply-To: <7mGNHMgjdO5qbkzhz2Ivjhkgl9TkWYNoj76SEQc83MQ=.65909171-5ef8-4913-ae23-260058599b24@github.com> References: <7mGNHMgjdO5qbkzhz2Ivjhkgl9TkWYNoj76SEQc83MQ=.65909171-5ef8-4913-ae23-260058599b24@github.com> Message-ID: On Sat, 21 Jun 2025 12:01:25 GMT, Anjian Wen wrote: >> Do the same thing as [JDK-8357434](https://bugs.openjdk.org/browse/JDK-8357434) [JDK-8358655](https://bugs.openjdk.org/browse/JDK-8358655) but for riscv >> >> The Interpreter::profile_taken_branch has the same sbbptr pattern with [JDK-8358105](https://bugs.openjdk.org/browse/JDK-8358105)?The counter is 64-bit, never practically overflows , and no other code cares about it. 
so we can remove the overflows check > > Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: > > remove unuse x11 Register @Anjian-Wen Your change (at version 741fb7971c94b9d4c5306bf6409e8deb4f5951ee) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25848#issuecomment-2994712024 From wenanjian at openjdk.org Mon Jun 23 02:34:37 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Mon, 23 Jun 2025 02:34:37 GMT Subject: Integrated: 8359801: RISC-V: Simplify Interpreter::profile_taken_branch In-Reply-To: References: Message-ID: On Tue, 17 Jun 2025 08:41:41 GMT, Anjian Wen wrote: > Do the same thing as [JDK-8357434](https://bugs.openjdk.org/browse/JDK-8357434) [JDK-8358655](https://bugs.openjdk.org/browse/JDK-8358655) but for riscv > > The Interpreter::profile_taken_branch has the same sbbptr pattern with [JDK-8358105](https://bugs.openjdk.org/browse/JDK-8358105)?The counter is 64-bit, never practically overflows , and no other code cares about it. so we can remove the overflows check This pull request has now been integrated. Changeset: 620df7ec Author: Anjian Wen Committer: Feilong Jiang URL: https://git.openjdk.org/jdk/commit/620df7ec348598580884e3b9d45066495f0c40e5 Stats: 19 lines in 3 files changed: 0 ins; 13 del; 6 mod 8359801: RISC-V: Simplify Interpreter::profile_taken_branch Reviewed-by: fyang, fjiang ------------- PR: https://git.openjdk.org/jdk/pull/25848 From qxing at openjdk.org Mon Jun 23 03:36:13 2025 From: qxing at openjdk.org (Qizheng Xing) Date: Mon, 23 Jun 2025 03:36:13 GMT Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise Message-ID: The result of count leading/trailing zeros is always non-negative, and the maximum value is integer type's size in bits. 
In previous versions, when C2 can not know the operand value of a CLZ/CTZ node at compile time, it will generate a full-width integer type for its result. This can significantly affect the efficiency of code in some cases. This patch makes the type of CLZ/CTZ nodes more precise, to make C2 generate better code. For example, the following implementation runs ~115% faster on x86-64 with this patch: public static int numberOfNibbles(int i) { int mag = Integer.SIZE - Integer.numberOfLeadingZeros(i); return Math.max((mag + 3) / 4, 1); } Testing: tier1, IR test ------------- Commit messages: - Make the type of count leading/trailing zero nodes more precise Changes: https://git.openjdk.org/jdk/pull/25928/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25928&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8360192 Stats: 127 lines in 3 files changed: 123 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/25928.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25928/head:pull/25928 PR: https://git.openjdk.org/jdk/pull/25928 From thartmann at openjdk.org Mon Jun 23 05:07:28 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 23 Jun 2025 05:07:28 GMT Subject: RFR: 8360131: Remove use of soon-to-be-removed APIs by CTW framework [v2] In-Reply-To: References: Message-ID: On Fri, 20 Jun 2025 16:02:17 GMT, David Beaumont wrote: >> Migrate the CWT framework to use only supported JRT file system access for fetching class bytes. >> This avoids accessing APIs in ImageReader which are scheduled to be removed as part of preview mode class support in Valhalla (essentially these APIs are "too low level" and expose semantics that are incompatible with supporting preview classes in Valhalla). >> >> This will be a further change to this code when the preview mode work goes in, but this will be limited to how the file system is opened (with or without preview mode). 
> > David Beaumont has updated the pull request incrementally with one additional commit since the last revision: > > Feedback tweaks. Looks good to me but since variants of CTW are executed in tier3 as well, I would recommend running tier1-3 before integrating this. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25916#pullrequestreview-2948671168 From syan at openjdk.org Mon Jun 23 05:45:32 2025 From: syan at openjdk.org (SendaoYan) Date: Mon, 23 Jun 2025 05:45:32 GMT Subject: RFR: 8360169: Problem list CodeInvalidationReasonTest.java on linux-riscv64 until JDK-8360168 is fixed In-Reply-To: References: Message-ID: On Sat, 21 Jun 2025 14:45:13 GMT, Dingli Zhang wrote: > Hi all, > As described in the [JDK-8360168](https://bugs.openjdk.org/browse/JDK-8360168) issue, the case failed on linux-riscv64. We put the failed test > case into the Problem list until [JDK-8360168](https://bugs.openjdk.org/browse/JDK-8360168) is fixed. > > Please take a look and have some reviews. Thanks a lot. Marked as reviewed by syan (Committer). GHA report configure fails 'Could not find a C compiler' on windows-x64 and windows-aarch64, I think it's unrelated to this PR. ------------- PR Review: https://git.openjdk.org/jdk/pull/25924#pullrequestreview-2948725218 PR Comment: https://git.openjdk.org/jdk/pull/25924#issuecomment-2995004052 From qxing at openjdk.org Mon Jun 23 06:03:52 2025 From: qxing at openjdk.org (Qizheng Xing) Date: Mon, 23 Jun 2025 06:03:52 GMT Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise [v2] In-Reply-To: References: Message-ID: > The result of count leading/trailing zeros is always non-negative, and the maximum value is integer type's size in bits. In previous versions, when C2 can not know the operand value of a CLZ/CTZ node at compile time, it will generate a full-width integer type for its result. This can significantly affect the efficiency of code in some cases. 
> > This patch makes the type of CLZ/CTZ nodes more precise, to make C2 generate better code. For example, the following implementation runs ~115% faster on x86-64 with this patch: > > > public static int numberOfNibbles(int i) { > int mag = Integer.SIZE - Integer.numberOfLeadingZeros(i); > return Math.max((mag + 3) / 4, 1); > } > > > Testing: tier1, IR test Qizheng Xing has updated the pull request incrementally with one additional commit since the last revision: Use `BitsPerX` constant instead of `sizeof` ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25928/files - new: https://git.openjdk.org/jdk/pull/25928/files/0faa2099..0d0bb579 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25928&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25928&range=00-01 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/25928.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25928/head:pull/25928 PR: https://git.openjdk.org/jdk/pull/25928 From qamai at openjdk.org Mon Jun 23 06:03:52 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 23 Jun 2025 06:03:52 GMT Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise In-Reply-To: References: Message-ID: On Mon, 23 Jun 2025 03:31:36 GMT, Qizheng Xing wrote: > The result of count leading/trailing zeros is always non-negative, and the maximum value is integer type's size in bits. In previous versions, when C2 can not know the operand value of a CLZ/CTZ node at compile time, it will generate a full-width integer type for its result. This can significantly affect the efficiency of code in some cases. > > This patch makes the type of CLZ/CTZ nodes more precise, to make C2 generate better code. 
For example, the following implementation runs ~115% faster on x86-64 with this patch: > > > public static int numberOfNibbles(int i) { > int mag = Integer.SIZE - Integer.numberOfLeadingZeros(i); > return Math.max((mag + 3) / 4, 1); > } > > > Testing: tier1, IR test A stricter bound would be `TypeInt::make(~t._bits._zeros, t._bits._ones, t._widen)` ------------- PR Comment: https://git.openjdk.org/jdk/pull/25928#issuecomment-2995036343 From epeter at openjdk.org Mon Jun 23 06:19:29 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 23 Jun 2025 06:19:29 GMT Subject: RFR: 8356865: C2: Unreasonable values for debug flag FastAllocateSizeLimit can lead to left-shift-overflow, which is UB [v3] In-Reply-To: <38Q5LqxUlyIBHyusziq461sTdVCBm0ZO3kcVB_u7I18=.11da0a73-7a4f-4c83-abeb-f12791ad7741@github.com> References: <38Q5LqxUlyIBHyusziq461sTdVCBm0ZO3kcVB_u7I18=.11da0a73-7a4f-4c83-abeb-f12791ad7741@github.com> Message-ID: On Fri, 20 Jun 2025 16:39:51 GMT, Benoît Maillard wrote: >> This PR adds a range constraint for the `-XX:FastAllocateSizeLimit` debug flag. This prevents undefined behavior caused by left-shift overflow of the flag value in `GraphKit::new_array`. >> >> ### Testing >> - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8356865) >> - [x] tier1-3, plus some internal testing >> - [x] Manual testing with values known to previously cause undefined behavior >> >> Thanks! > > Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: > > 8356865: Change assert test/hotspot/jtreg/compiler/arguments/TestFastAllocateSizeLimit.java line 48: > 46: public static void main(String[] args) throws IOException { > 47: if (args.length == 0) { > 48: int sizeLimit = RANDOM.nextInt(1 << 28); Can you please add a quick comment why you chose `1 << 28`?
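For readers following the overflow discussion, here is a minimal standalone sketch of the leading-zeros condition that was suggested earlier in this thread for the assert. `leading_zeros` and `shift_is_safe` are hypothetical helpers, not HotSpot functions, and the sketch assumes a 32-bit signed value shifted by at most 3 bits. Whether this is the reason for choosing `1 << 28` in the test is not stated in the thread.

```cpp
#include <cassert>
#include <cstdint>

// Number of leading zero bits in a 32-bit value; yields 32 for zero.
int leading_zeros(uint32_t x) {
  int n = 0;
  for (uint32_t mask = 0x80000000u; mask != 0 && (x & mask) == 0; mask >>= 1) {
    ++n;
  }
  return n;
}

// True if the non-negative `limit` can be shifted left by `shift` bits
// without reaching the sign bit of a 32-bit signed integer.
bool shift_is_safe(int32_t limit, int shift) {
  if (limit < 0) return false;   // negative values are rejected outright
  if (limit == 0) return true;   // zero never overflows
  return leading_zeros(static_cast<uint32_t>(limit)) > shift;
}
```

Under these assumptions every value below `1 << 28` survives a shift by 3, while `1 << 28` itself does not, because the shifted result would be `1 << 31`.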
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25834#discussion_r2160796211 From qxing at openjdk.org Mon Jun 23 06:22:27 2025 From: qxing at openjdk.org (Qizheng Xing) Date: Mon, 23 Jun 2025 06:22:27 GMT Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise In-Reply-To: References: Message-ID: On Mon, 23 Jun 2025 06:00:37 GMT, Quan Anh Mai wrote: > A stricter bound would be `TypeInt::make(~t._bits._zeros, t._bits._ones, t._widen)` @merykitty Thanks for your review, did you mean `TypeInt::make(clz(~t._bits._zeros), clz(t._bits._ones), t._widen)`? ------------- PR Comment: https://git.openjdk.org/jdk/pull/25928#issuecomment-2995077221 From mchevalier at openjdk.org Mon Jun 23 06:43:31 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Mon, 23 Jun 2025 06:43:31 GMT Subject: RFR: 8347901: C2 should remove unused leaf / pure runtime calls [v2] In-Reply-To: <7I4F9dJkYX6b66vnJfhJLvvjnC-aw0xA3X2eimh8kBg=.0d11e624-4c82-4d7e-8392-e3274a76478a@github.com> References: <2f8GCEXnY2mwaz9N9dQK_APi0A31z8Y7CM6YPms4Pp0=.a0651468-def0-4101-94bb-8fabd6d79b1b@github.com> <7I4F9dJkYX6b66vnJfhJLvvjnC-aw0xA3X2eimh8kBg=.0d11e624-4c82-4d7e-8392-e3274a76478a@github.com> Message-ID: On Fri, 20 Jun 2025 19:14:05 GMT, Vladimir Ivanov wrote: >> That is a good question. That is something I picked from Vladimir's implementation and it seemed legitimate. But now you say it, is it needed? Not sure. I'm trying to find that out. Would `::Identity` be enough? It's tempting to say so, right! I'd say it can be not enough if we need `adr_type` before idealizing the `ProjNode` (no idea if that happens). Is there any other places to adapt? One could think so, but actually, I can't find such an example. Other methods of `ProjNode` for instance rely on the type of the input (which is correctly handled in `TupleNode`), and so should already work fine. >> >> I'm trying to understand what happens if we don't have that. 
But maybe @iwanowww would have some helpful insight? > > I don't remember all the details now, but there were some problems when `ProjNode::adr_type()` encounters `TupleNode`. I've run some tests without this special handling and I didn't see any additional failures. Maybe it was more necessary in Vladimir's original use case (reachability). In my case, maybe I don't need `adr_type` to handle `TupleNode`s nicely, but let's not set a trap for the future. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25760#discussion_r2160830634 From mhaessig at openjdk.org Mon Jun 23 06:50:32 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Mon, 23 Jun 2025 06:50:32 GMT Subject: RFR: 8353815: [ubsan] compilationPolicy.cpp: division by zero related to tiered compilation flags In-Reply-To: References: Message-ID: On Thu, 19 Jun 2025 15:15:16 GMT, Manuel Hässig wrote: > A run of `runtime/CommandLine/OptionsValidation/TestOptionsWithRanges.java` with an ubsan enabled binary revealed that passing the value 0 to `Tier(3|4)LoadFeedback`, and `TieredRateUpdateMinTime` lead to division by zero. > > Since `Tier(3|4)LoadFeedback` should disable the scaling of the compilation thresholds, 8bf37ee special cases the 0 case to disable scaling and documents it accordingly. > > 4893b28 sets the lower limit for `TieredRateUpdate(Min|Max)Time` to 1 since the code assumes that at least 1ms passes between each event: > > https://github.com/openjdk/jdk/blob/c4fb00a7be51c7a05a29d3d57d787feb5c698ddf/src/hotspot/share/compiler/compilationPolicy.cpp#L968-L974 > > This PR was tested with: > - [ ] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15760915006) > - [x] tier1 and 2 plus Oracle internal testing on Oracle supported platforms Thank you for your reviews!
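Note on the 8353815 description above: the following is a minimal Java sketch of the kind of guard it mentions, where a feedback value of 0 disables scaling instead of ending up as a divisor. It is not the HotSpot code (that lives in C++ in `compilationPolicy.cpp`); the class `LoadFeedbackSketch`, the method `thresholdScale`, its parameters, and the formula are hypothetical and chosen only for illustration. Java floating-point division by zero yields infinity rather than undefined behavior, so the sketch mirrors the shape of the guard, not the ubsan failure itself.

```java
// Hypothetical illustration only; names and formula are assumptions, not HotSpot code.
final class LoadFeedbackSketch {
    private LoadFeedbackSketch() {}

    /**
     * Returns a scale factor for a compilation threshold.
     * A feedback value of 0 (or no compiler threads) means "do not scale".
     * Inputs are assumed to be non-negative.
     */
    static double thresholdScale(int queueSize, int feedback, int compilerCount) {
        if (feedback == 0 || compilerCount <= 0) {
            return 1.0; // scaling disabled, and the divisor below can never be zero
        }
        return (double) queueSize / ((double) feedback * compilerCount) + 1.0;
    }

    public static void main(String[] args) {
        System.out.println(thresholdScale(100, 0, 4)); // feedback 0: scaling disabled
        System.out.println(thresholdScale(100, 5, 4)); // regular scaling path
    }
}
```

Treating 0 as "off" keeps the flag's full range usable, whereas raising the lower bound to 1 (as described for `TieredRateUpdate(Min|Max)Time`) is the simpler choice when 0 has no sensible meaning.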
------------- PR Comment: https://git.openjdk.org/jdk/pull/25902#issuecomment-2995145410 From duke at openjdk.org Mon Jun 23 06:50:32 2025 From: duke at openjdk.org (duke) Date: Mon, 23 Jun 2025 06:50:32 GMT Subject: RFR: 8353815: [ubsan] compilationPolicy.cpp: division by zero related to tiered compilation flags In-Reply-To: References: Message-ID: On Thu, 19 Jun 2025 15:15:16 GMT, Manuel H?ssig wrote: > A run of `runtime/CommandLine/OptionsValidation/TestOptionsWithRanges.java` with an ubsan enabled binary revealed that passing the value 0 to `Tier(3|4)LoadFeedback`, and `TieredRateUpdateMinTime` lead to division by zero. > > Since `Tier(3|4)LoadFeedback` should disable the scaling of the compilation thresholds, 8bf37ee special cases the 0 case to disable scaling and documents it accordingly. > > 4893b28 sets the lower limit for `TieredRateUpdate(Min|Max)Time` to 1 since the code assumes that at least 1ms passes between each event: > > https://github.com/openjdk/jdk/blob/c4fb00a7be51c7a05a29d3d57d787feb5c698ddf/src/hotspot/share/compiler/compilationPolicy.cpp#L968-L974 > > This PR was tested with: > - [ ] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15760915006) > - [x] tier1 and 2 plus Oracle internal testing on Oracle supported platforms @mhaessig Your change (at version 4893b288e4f69e03fdc52281603b4718b3668cb2) is now ready to be sponsored by a Committer. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25902#issuecomment-2995151043 From thartmann at openjdk.org Mon Jun 23 06:52:37 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 23 Jun 2025 06:52:37 GMT Subject: RFR: 8355276: Sort C2 includes In-Reply-To: <_ab90h2O6v67rorpT12dUxMO2mv7GscLEMo37dLJkhA=.c645e9e0-8b99-49f8-8aa5-80a623b541d6@github.com> References: <_ab90h2O6v67rorpT12dUxMO2mv7GscLEMo37dLJkhA=.c645e9e0-8b99-49f8-8aa5-80a623b541d6@github.com> Message-ID: <_r9O6tLaOT6j05Pz-EqQZrkdCsYasmRtAzHLPfS8iAs=.eefcfa03-4c05-4319-ae38-f12530394ef0@github.com> On Fri, 20 Jun 2025 08:17:24 GMT, Manuel H?ssig wrote: > This PR sorts the includes in `hotspot/share/opto` using `test/hotspot/jtreg/sources/SortIncludes.java` and enforces sorted includes for C2 sources in `sources/TestIncludesAreSorted.java`. > > Testing: > - [x] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15773777177) > - [x] tier1,tier2 plus Oracle internal testing on Oracle supported platforms Looks good to me. Thanks for fixing this! ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25910#pullrequestreview-2948856227 From mhaessig at openjdk.org Mon Jun 23 06:52:37 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Mon, 23 Jun 2025 06:52:37 GMT Subject: RFR: 8355276: Sort C2 includes In-Reply-To: <_ab90h2O6v67rorpT12dUxMO2mv7GscLEMo37dLJkhA=.c645e9e0-8b99-49f8-8aa5-80a623b541d6@github.com> References: <_ab90h2O6v67rorpT12dUxMO2mv7GscLEMo37dLJkhA=.c645e9e0-8b99-49f8-8aa5-80a623b541d6@github.com> Message-ID: On Fri, 20 Jun 2025 08:17:24 GMT, Manuel H?ssig wrote: > This PR sorts the includes in `hotspot/share/opto` using `test/hotspot/jtreg/sources/SortIncludes.java` and enforces sorted includes for C2 sources in `sources/TestIncludesAreSorted.java`. 
> > Testing: > - [x] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15773777177) > - [x] tier1,tier2 plus Oracle internal testing on Oracle supported platforms Thank you for your reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25910#issuecomment-2995152988 From duke at openjdk.org Mon Jun 23 06:52:38 2025 From: duke at openjdk.org (duke) Date: Mon, 23 Jun 2025 06:52:38 GMT Subject: RFR: 8355276: Sort C2 includes In-Reply-To: <_ab90h2O6v67rorpT12dUxMO2mv7GscLEMo37dLJkhA=.c645e9e0-8b99-49f8-8aa5-80a623b541d6@github.com> References: <_ab90h2O6v67rorpT12dUxMO2mv7GscLEMo37dLJkhA=.c645e9e0-8b99-49f8-8aa5-80a623b541d6@github.com> Message-ID: On Fri, 20 Jun 2025 08:17:24 GMT, Manuel H?ssig wrote: > This PR sorts the includes in `hotspot/share/opto` using `test/hotspot/jtreg/sources/SortIncludes.java` and enforces sorted includes for C2 sources in `sources/TestIncludesAreSorted.java`. > > Testing: > - [x] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15773777177) > - [x] tier1,tier2 plus Oracle internal testing on Oracle supported platforms @mhaessig Your change (at version 83e2a186a0f925cdce26f4610ddadbd41b679c89) is now ready to be sponsored by a Committer. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25910#issuecomment-2995154173 From mhaessig at openjdk.org Mon Jun 23 06:58:08 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Mon, 23 Jun 2025 06:58:08 GMT Subject: RFR: 8354727: CompilationPolicy creates too many compiler threads when code cache space is scarce [v5] In-Reply-To: <29uhzjC6hIy_YrycL72lqIott9dPDIYrWibtvFjvVsg=.ef3b2c29-e848-4acd-a264-d18cec07bd44@github.com> References: <29uhzjC6hIy_YrycL72lqIott9dPDIYrWibtvFjvVsg=.ef3b2c29-e848-4acd-a264-d18cec07bd44@github.com> Message-ID: > Running > > > java -XX:+SegmentedCodeCache -XX:ReservedCodeCacheSize=10M -XX:NonNMethodCodeHeapSize=6M \ > -XX:ProfiledCodeHeapSize=5M -XX:NonProfiledCodeHeapSize=5M -version > > > on a machine with more than 285 cores, this would fail with the message that the specified `NonNMethodCodeHeapSize` is too small to fit all compiler buffers (instead of failing because the sum of the heaps is larger than the `ReservedCodeCacheSize`). Hence, the calculated compiler count is too high. This is due to CompilationPolicy::initialize() checking how many compiler buffers fit into the `ReservedCodeCacheSize`. However, in the case above, this is significantly larger than `NonNMethodCodeHeapSize` (especially on a debug build) and causes a check changed in #17244 to fail. That check was changed to check that all compiler buffers fit into the `NonNMethodCodeHeap` instead of the `NonNMethodCodeHeap` having at least a size of `CodeCacheMinimumUseSpace`. > > # Changes > > This PR fixes the calculation of the `CICompilerCount` ergonomic. Firstly, @shipilev kindly provided a fix for the compiler buffer size used in the calculation is also correct if we only have C2. Secondly, `NonNMethodHeapSize` is used as the maximum buffer size available for compilers buffers in the calculation of the maximum number of compiler threads instead of `ReservedCodeCacheSize`. 
Therefore, the check failing in the explanation above can never fail because we set the number of compiler threads only so high that they will always fit into the `NonNMethodCodeHeap`. > > This change changes how many compiler threads are created by the `CICompilerCount` ergonomic. For the default value `NonNMethodCodeHeapSize=5M`this limit is 24 compiler threads on a system 285 cores or more for product builds and 20 threads for debug builds on a system with 145 cores or more. > > # Testing > > - [x] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15733154809) > - [x] tier1 through tier3 plus Oracle internal testing on our supported platforms Manuel H?ssig has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits: - Merge branch 'master' into JDK-8354727-policy - Fix merge conflict resolution - Merge branch 'master' into JDK-8354727-policy - Calculate buffer size correctly for c2_only Co-authored-by: Aleksey Shipilev - Caclulate how many compiler buffers fit into NonNMethodCodeHeap - Clarify endif - update copyrights - remove leftover include - fix whitebox access to code cache size configs - VMPageSizeConstraintFunc - ... 
and 6 more: https://git.openjdk.org/jdk/compare/de34bb8e...689b4ea8 ------------- Changes: https://git.openjdk.org/jdk/pull/25872/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25872&range=04 Stats: 10 lines in 1 file changed: 7 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/25872.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25872/head:pull/25872 PR: https://git.openjdk.org/jdk/pull/25872 From bmaillard at openjdk.org Mon Jun 23 07:09:15 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Mon, 23 Jun 2025 07:09:15 GMT Subject: RFR: 8356865: C2: Unreasonable values for debug flag FastAllocateSizeLimit can lead to left-shift-overflow, which is UB [v4] In-Reply-To: References: Message-ID: > This PR adds a range constraint for the `-XX:FastAllocateSizeLimit` debug flag. This prevents undefined behavior caused by left-shift overflow of the flag value in `GraphKit::new_array`. > > ### Testing > - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8356865) > - [x] tier1-3, plus some internal testing > - [x] Manual testing with values known to previously cause undefined behavior > > Thanks! 
Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: 8356865: Add comment for range in test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25834/files - new: https://git.openjdk.org/jdk/pull/25834/files/97f52b45..8241b218 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25834&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25834&range=02-03 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25834.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25834/head:pull/25834 PR: https://git.openjdk.org/jdk/pull/25834 From bmaillard at openjdk.org Mon Jun 23 07:09:15 2025 From: bmaillard at openjdk.org (Benoît Maillard) Date: Mon, 23 Jun 2025 07:09:15 GMT Subject: RFR: 8356865: C2: Unreasonable values for debug flag FastAllocateSizeLimit can lead to left-shift-overflow, which is UB [v3] In-Reply-To: References: <38Q5LqxUlyIBHyusziq461sTdVCBm0ZO3kcVB_u7I18=.11da0a73-7a4f-4c83-abeb-f12791ad7741@github.com> Message-ID: <5SI82szo_LQNH0Uhl-1-8tN1rISzF9zTGDs4PM7yU9Y=.542ef0d0-1b62-4b3e-9714-b2a61cbf358c@github.com> On Mon, 23 Jun 2025 06:16:54 GMT, Emanuel Peter wrote: >> Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: >> >> 8356865: Change assert > > test/hotspot/jtreg/compiler/arguments/TestFastAllocateSizeLimit.java line 48: > >> 46: public static void main(String[] args) throws IOException { >> 47: if (args.length == 0) { >> 48: int sizeLimit = RANDOM.nextInt(1 << 28); > > Can you please add a quick comment why you chose `1 << 28`? Done, thanks!
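Note on the left-shift overflow discussed in this thread: the sketch below shows, in plain Java, how to check whether shifting a non-negative 32-bit value left still fits in a signed `int`. It is not taken from the patch or from `GraphKit::new_array`; the class `ShiftOverflowSketch`, the method `shiftFitsInInt`, and the shift amount of 3 used in `main` are assumptions picked purely for illustration, and the sketch makes no claim about why the test uses `1 << 28`.

```java
// Hypothetical illustration only; not the range check added by the PR.
final class ShiftOverflowSketch {
    private ShiftOverflowSketch() {}

    /**
     * Returns true if (value << shift) is representable as a non-negative int.
     * The shift is performed in 64-bit arithmetic, so the check itself cannot overflow.
     */
    static boolean shiftFitsInInt(int value, int shift) {
        if (value < 0 || shift < 0 || shift > 31) {
            return false; // outside the range this sketch handles
        }
        long widened = (long) value << shift;
        return widened <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        int shift = 3; // assumed shift amount, for illustration
        System.out.println(shiftFitsInInt((1 << 28) - 1, shift)); // still fits in an int
        System.out.println(shiftFitsInInt(1 << 28, shift));       // 2^31 does not fit
    }
}
```

In C++ the overflowing signed shift is undefined behavior, which is why a range constraint on the flag is the safer fix; in Java the same shift would silently wrap, so widening to `long` is used here only to make the overflow observable.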
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25834#discussion_r2160871327 From roland at openjdk.org Mon Jun 23 07:24:43 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 23 Jun 2025 07:24:43 GMT Subject: [jdk25] RFR: 8356708: C2: loop strip mining expansion doesn't take sunk stores into account Message-ID: Hi all, This pull request contains a backport of commit [c11f36e6](https://github.com/openjdk/jdk/commit/c11f36e6200b6c39fd59530f28e9318c4153db49) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. The commit being backported was authored by Roland Westrelin on 20 Jun 2025 and was reviewed by Roberto Castañeda Lozano and Emanuel Peter. Thanks! ------------- Commit messages: - Backport c11f36e6200b6c39fd59530f28e9318c4153db49 Changes: https://git.openjdk.org/jdk/pull/25929/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25929&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8356708 Stats: 282 lines in 3 files changed: 255 ins; 1 del; 26 mod Patch: https://git.openjdk.org/jdk/pull/25929.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25929/head:pull/25929 PR: https://git.openjdk.org/jdk/pull/25929 From epeter at openjdk.org Mon Jun 23 07:36:30 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 23 Jun 2025 07:36:30 GMT Subject: RFR: 8356865: C2: Unreasonable values for debug flag FastAllocateSizeLimit can lead to left-shift-overflow, which is UB [v4] In-Reply-To: References: Message-ID: On Mon, 23 Jun 2025 07:09:15 GMT, Benoît Maillard wrote: >> This PR adds a range constraint for the `-XX:FastAllocateSizeLimit` debug flag. This prevents undefined behavior caused by left-shift overflow of the flag value in `GraphKit::new_array`. >> >> ### Testing >> - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8356865) >> - [x] tier1-3, plus some internal testing >> - [x] Manual testing with values known to previously cause undefined behavior >> >> Thanks!
> > Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: > > 8356865: Add comment for range in test Thanks for the updates! Nice work :) ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25834#pullrequestreview-2948981657 From mchevalier at openjdk.org Mon Jun 23 07:39:32 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Mon, 23 Jun 2025 07:39:32 GMT Subject: RFR: 8347901: C2 should remove unused leaf / pure runtime calls [v2] In-Reply-To: References: Message-ID: <7MJVRaO_nQSfwpl874EU71b63Du-sNvU4X034Jml6IY=.3bfdbe0a-4e8c-4e6a-b5b3-8ee02939a878@github.com> On Fri, 20 Jun 2025 15:42:53 GMT, Emanuel Peter wrote: >> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: >> >> Mostly comments > > src/hotspot/share/opto/callnode.cpp line 1342: > >> 1340: >> 1341: Node* CallLeafPureNode::Ideal(PhaseGVN* phase, bool can_reshape) { >> 1342: if (is_dead()) { // dead node > > Suggestion: > > if (is_dead()) { > > The comment seemed redundant. You could say who else is responsible of cleaning up the dead node though. > > What would happen if the `CallLeafPureNode` loses its control projection, but not the other uses? I don't even know if that is possible. What do you think? IGVN takes care of removing a node that has no outputs. This was motivated by https://bugs.openjdk.org/browse/JDK-8353341. I think @TobiHartmann told me it's common not to touch dead nodes during idealization. I think it is possible to lose the control projection but not the data projection: control was already found dead, but data wasn't yet (though that should happen shortly after). As soon as the data projection disappears, the node should be removed. `remove_dead_node` (used in IGVN) aggressively removes a dead node and all the uses that recursively become dead. It's the classic constraint of having to find that both data and control are top when one is.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25760#discussion_r2160923801 From duke at openjdk.org Mon Jun 23 07:40:32 2025 From: duke at openjdk.org (duke) Date: Mon, 23 Jun 2025 07:40:32 GMT Subject: RFR: 8356865: C2: Unreasonable values for debug flag FastAllocateSizeLimit can lead to left-shift-overflow, which is UB [v4] In-Reply-To: References: Message-ID: <96LNXhcElufKdZY863SlC9Mb3u9tDjjr_-JCRNJjUrw=.47f3f82d-b1a4-4a87-873b-78dbeaa5b1f9@github.com> On Mon, 23 Jun 2025 07:09:15 GMT, Beno?t Maillard wrote: >> This PR adds a range constraint for the `-XX:FastAllocateSizeLimit` debug flag. This prevents undefined behavior caused by left-shift overflow of the flag value in `GraphKit::new_array`. >> >> ### Testing >> - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8356865) >> - [x] tier1-3, plus some internal testing >> - [x] Manual testing with values known to previously cause undefined behavior >> >> Thanks! > > Beno?t Maillard has updated the pull request incrementally with one additional commit since the last revision: > > 8356865: Add comment for range in test @benoitmaillard Your change (at version 8241b2188b2f8334439f3824fb535ce29091eb37) is now ready to be sponsored by a Committer. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25834#issuecomment-2995283731 From qamai at openjdk.org Mon Jun 23 07:43:28 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 23 Jun 2025 07:43:28 GMT Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise In-Reply-To: References: Message-ID: <1yjEG7xjZcmvAECD2ovS0pW8IwA30p9BzCr0Krgy4ks=.3224b13e-a32f-468d-a6e3-3fb5a1c35c04@github.com> On Mon, 23 Jun 2025 06:20:11 GMT, Qizheng Xing wrote: >> A stricter bound would be `TypeInt::make(~t._bits._zeros, t._bits._ones, t._widen)` > >> A stricter bound would be `TypeInt::make(~t._bits._zeros, t._bits._ones, t._widen)` > > @merykitty Thanks for your review, did you mean `TypeInt::make(clz(~t._bits._zeros), clz(t._bits._ones), t._widen)`? @MaxXSoft Yes you are right, my mistake ------------- PR Comment: https://git.openjdk.org/jdk/pull/25928#issuecomment-2995295061 From dnsimon at openjdk.org Mon Jun 23 07:47:33 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 23 Jun 2025 07:47:33 GMT Subject: RFR: 8360169: Problem list CodeInvalidationReasonTest.java on linux-riscv64 until JDK-8360168 is fixed In-Reply-To: References: Message-ID: On Sat, 21 Jun 2025 14:45:13 GMT, Dingli Zhang wrote: > Hi all, > As described in the [JDK-8360168](https://bugs.openjdk.org/browse/JDK-8360168) issue, the case failed on linux-riscv64. We put the failed test > case into the Problem list until [JDK-8360168](https://bugs.openjdk.org/browse/JDK-8360168) is fixed. > > Please take a look and have some reviews. Thanks a lot. Marked as reviewed by dnsimon (Reviewer). 
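Note on the 8360192 exchange above (the corrected bound `TypeInt::make(clz(~t._bits._zeros), clz(t._bits._ones), t._widen)`): a small Java sketch can show why those two values bracket the result of count-leading-zeros. It assumes `_zeros` is a mask of bits known to be 0 and `_ones` a mask of bits known to be 1, which is an interpretation of the field names and not something stated in the thread. The class `ClzBoundsSketch` and its methods are hypothetical; this is not the C2 implementation.

```java
// Hypothetical illustration only; not the C2 code from the PR.
final class ClzBoundsSketch {
    private ClzBoundsSketch() {}

    /** Smallest possible clz: reached when every bit not known to be 0 is set. */
    static int minLeadingZeros(int knownZeros) {
        return Integer.numberOfLeadingZeros(~knownZeros);
    }

    /** Largest possible clz: reached when only the bits known to be 1 are set. */
    static int maxLeadingZeros(int knownOnes) {
        return Integer.numberOfLeadingZeros(knownOnes);
    }

    /** True if x contradicts the masks, or if clz(x) lies within the two bounds. */
    static boolean withinBounds(int x, int knownZeros, int knownOnes) {
        boolean consistent = (x & knownZeros) == 0 && (x & knownOnes) == knownOnes;
        if (!consistent) {
            return true; // x is not a possible value, so nothing to check
        }
        int clz = Integer.numberOfLeadingZeros(x);
        return minLeadingZeros(knownZeros) <= clz && clz <= maxLeadingZeros(knownOnes);
    }

    public static void main(String[] args) {
        int knownZeros = 0xFFFFFF00; // upper 24 bits known to be 0
        int knownOnes = 0x10;        // bit 4 known to be 1
        System.out.println(minLeadingZeros(knownZeros) + " .. " + maxLeadingZeros(knownOnes));
    }
}
```

With no known bits at all (both masks 0) the bounds fall back to the full range of 0 to 32 for an `int`, which matches the weaker statement in the PR description that the result is non-negative and at most the type's size in bits.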
------------- PR Review: https://git.openjdk.org/jdk/pull/25924#pullrequestreview-2949010465 From mhaessig at openjdk.org Mon Jun 23 07:50:35 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Mon, 23 Jun 2025 07:50:35 GMT Subject: Integrated: 8355276: Sort C2 includes In-Reply-To: <_ab90h2O6v67rorpT12dUxMO2mv7GscLEMo37dLJkhA=.c645e9e0-8b99-49f8-8aa5-80a623b541d6@github.com> References: <_ab90h2O6v67rorpT12dUxMO2mv7GscLEMo37dLJkhA=.c645e9e0-8b99-49f8-8aa5-80a623b541d6@github.com> Message-ID: On Fri, 20 Jun 2025 08:17:24 GMT, Manuel H?ssig wrote: > This PR sorts the includes in `hotspot/share/opto` using `test/hotspot/jtreg/sources/SortIncludes.java` and enforces sorted includes for C2 sources in `sources/TestIncludesAreSorted.java`. > > Testing: > - [x] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15773777177) > - [x] tier1,tier2 plus Oracle internal testing on Oracle supported platforms This pull request has now been integrated. Changeset: 9ae39b62 Author: Manuel H?ssig Committer: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/9ae39b62b91ffacc6473534d96679f3282c612cc Stats: 119 lines in 40 files changed: 57 ins; 55 del; 7 mod 8355276: Sort C2 includes Reviewed-by: kvn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/25910 From roland at openjdk.org Mon Jun 23 07:51:46 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 23 Jun 2025 07:51:46 GMT Subject: RFR: 8275202: C2: optimize out more redundant conditions [v5] In-Reply-To: <978cgwy3Nb_x7yU6jZz0f6zhTBZfphstisAkBf1Vktc=.283d06eb-4f79-40cf-b8dd-a9c230e59902@github.com> References: <978cgwy3Nb_x7yU6jZz0f6zhTBZfphstisAkBf1Vktc=.283d06eb-4f79-40cf-b8dd-a9c230e59902@github.com> Message-ID: > This change adds a new loop opts pass to optimize redundant conditions > such as the second one in: > > > if (i < 10) { > if (i < 42) { > > > In the branch of the first if, the type of i can be narrowed down to > [min_jint, 9] which can then be used to 
constant fold the second > condition. > > The compiler already keeps track of type[n] for every node in the > current compilation unit. That's not sufficient to optimize the > snippet above though because the type of i can only be narrowed in > some sections of the control flow (that is a subset of all > controls). The solution is to build a new table that tracks the type > of n at every control c > > > type'[n, root] = type[n] // initialized from igvn's type table > type'[n, c] = type[n, idom(c)] > > > This pass iterates over the CFG looking for conditions such as: > > > if (i < 10) { > > > that allows narrowing the type of i and updates the type' table > accordingly. > > At a region r: > > > type'[n, r] = meet(type'[n, r->in(1)], type'[n, r->in(2)]...) > > > For a Phi phi at a region r: > > > type'[phi, r] = meet(type'[phi->in(1), r->in(1)], type'[phi->in(2), r->in(2)]...) > > > Once a type is narrowed, uses are enqueued and their types are > computed by calling the Value() methods. If a use's type is narrowed, > it's recorded at c in the type' table. Value() methods retrieve types > from the type table, not the type' table. To address that issue while > leaving Value() methods unchanged, before calling Value() at c, the > type table is updated so: > > > type[n] = type'[n, c] > > > An exception is for Phi::Value which needs to retrieve the type of > nodes are various controls: there, a new type(Node* n, Node* c) > method is used. > > For most n and c, type'[n, c] is likely the same as type[n], the type > recorded in the global igvn table (that is there shouldn't be many > nodes at only a few control for which we can narrow the type down). As > a consequence, the types'[n, c] table is implemented with: > > - At c, narrowed down types are stored in a GrowableArray. Each entry > records the previous type at idom(c) and the narrowed down type at > c. > > - The GrowableArray of type updates is recorded in a hash table > indexed by c. 
If there's no update at c, there's no entry in the > hash table. > > This pass operates in 2 steps: > > - it first iterates over the graph looking for conditions that narrow > the types of some nodes and propagate type updates to uses until a > fix point. > > - it transforms the graph so newly found constant nodes are folded. > > > The new pass is run on every loop opts. There are a couple rea... Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains nine commits: - more - more - Merge branch 'master' into JDK-8275202 - review - Update src/hotspot/share/opto/loopConditionalPropagation.cpp Co-authored-by: Roberto Casta?eda Lozano - more - updated conditional propagation - Merge branch 'master' into JDK-8275202 - conditional propagation ------------- Changes: https://git.openjdk.org/jdk/pull/14586/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14586&range=04 Stats: 4593 lines in 34 files changed: 4482 ins; 40 del; 71 mod Patch: https://git.openjdk.org/jdk/pull/14586.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14586/head:pull/14586 PR: https://git.openjdk.org/jdk/pull/14586 From bmaillard at openjdk.org Mon Jun 23 07:54:37 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Mon, 23 Jun 2025 07:54:37 GMT Subject: Integrated: 8356865: C2: Unreasonable values for debug flag FastAllocateSizeLimit can lead to left-shift-overflow, which is UB In-Reply-To: References: Message-ID: On Mon, 16 Jun 2025 14:50:46 GMT, Beno?t Maillard wrote: > This PR adds a range constraint for the `-XX:FastAllocateSizeLimit` debug flag. This prevents undefined behavior caused by left-shift overflow of the flag value in `GraphKit::new_array`. 
> > ### Testing > - [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8356865) > - [x] tier1-3, plus some internal testing > - [x] Manual testing with values known to previously cause undefined behavior > > Thanks! This pull request has now been integrated. Changeset: c220b135 Author: Beno?t Maillard Committer: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/c220b1358c91bce2eb7515e9f600004c7b975ee6 Stats: 64 lines in 3 files changed: 64 ins; 0 del; 0 mod 8356865: C2: Unreasonable values for debug flag FastAllocateSizeLimit can lead to left-shift-overflow, which is UB Reviewed-by: epeter, mhaessig ------------- PR: https://git.openjdk.org/jdk/pull/25834 From qxing at openjdk.org Mon Jun 23 07:56:48 2025 From: qxing at openjdk.org (Qizheng Xing) Date: Mon, 23 Jun 2025 07:56:48 GMT Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise [v3] In-Reply-To: References: Message-ID: > The result of count leading/trailing zeros is always non-negative, and the maximum value is integer type's size in bits. In previous versions, when C2 can not know the operand value of a CLZ/CTZ node at compile time, it will generate a full-width integer type for its result. This can significantly affect the efficiency of code in some cases. > > This patch makes the type of CLZ/CTZ nodes more precise, to make C2 generate better code. 
For example, the following implementation runs ~115% faster on x86-64 with this patch: > > > public static int numberOfNibbles(int i) { > int mag = Integer.SIZE - Integer.numberOfLeadingZeros(i); > return Math.max((mag + 3) / 4, 1); > } > > > Testing: tier1, IR test Qizheng Xing has updated the pull request incrementally with one additional commit since the last revision: Narrow type bound ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25928/files - new: https://git.openjdk.org/jdk/pull/25928/files/0d0bb579..1cb931b1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25928&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25928&range=01-02 Stats: 76 lines in 1 file changed: 18 ins; 26 del; 32 mod Patch: https://git.openjdk.org/jdk/pull/25928.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25928/head:pull/25928 PR: https://git.openjdk.org/jdk/pull/25928 From mchevalier at openjdk.org Mon Jun 23 07:59:33 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Mon, 23 Jun 2025 07:59:33 GMT Subject: RFR: 8347901: C2 should remove unused leaf / pure runtime calls [v2] In-Reply-To: References: Message-ID: <99-WvIy2CRk9L3nYq_zIenWde1_cxvu3F-9ftYIrdh8=.d9ce6642-276a-40dc-a9ca-044d24160952@github.com> On Fri, 20 Jun 2025 15:53:59 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/divnode.cpp line 1522: >> >>> 1520: if (!can_reshape) { >>> 1521: return nullptr; >>> 1522: } >> >> Would this prevent us from doing the `make_tuple_of_input_state_and_top_return_values` trick? Because it seems to me that we do not need to reshape the node for that, right? Maybe you should reorder things for that? > > Also, you should probably call `CallLeafPureNode::Ideal` instead of duplicating its logic here and in other subclasses. In an earlier issue ([JDK-8349523](https://bugs.openjdk.org/browse/JDK-8349523)), I tried to remove these nodes during parsing. It didn't work well. 
The problem is that the node is transformed by GVN before any output projection is set, so of course the node removes itself before it has the opportunity to be used. So we wait until the uses have been set before we try to remove the node (or replace it with a tuple). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25760#discussion_r2160959373 From dzhang at openjdk.org Mon Jun 23 08:02:28 2025 From: dzhang at openjdk.org (Dingli Zhang) Date: Mon, 23 Jun 2025 08:02:28 GMT Subject: RFR: 8360169: Problem list CodeInvalidationReasonTest.java on linux-riscv64 until JDK-8360168 is fixed In-Reply-To: References: Message-ID: On Sat, 21 Jun 2025 14:45:13 GMT, Dingli Zhang wrote: > Hi all, > As described in the [JDK-8360168](https://bugs.openjdk.org/browse/JDK-8360168) issue, the case failed on linux-riscv64. We put the failed test > case into the Problem list until [JDK-8360168](https://bugs.openjdk.org/browse/JDK-8360168) is fixed. > > Please take a look and have some reviews. Thanks a lot. Thanks all for the review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25924#issuecomment-2995354320 From duke at openjdk.org Mon Jun 23 08:02:29 2025 From: duke at openjdk.org (duke) Date: Mon, 23 Jun 2025 08:02:29 GMT Subject: RFR: 8360169: Problem list CodeInvalidationReasonTest.java on linux-riscv64 until JDK-8360168 is fixed In-Reply-To: References: Message-ID: On Sat, 21 Jun 2025 14:45:13 GMT, Dingli Zhang wrote: > Hi all, > As described in the [JDK-8360168](https://bugs.openjdk.org/browse/JDK-8360168) issue, the case failed on linux-riscv64. We put the failed test > case into the Problem list until [JDK-8360168](https://bugs.openjdk.org/browse/JDK-8360168) is fixed. > > Please take a look and have some reviews. Thanks a lot. @DingliZhang Your change (at version f2d474b558b559e5cb53b0fc79086a87a9dd2003) is now ready to be sponsored by a Committer.
------------- PR Comment: https://git.openjdk.org/jdk/pull/25924#issuecomment-2995359778 From dzhang at openjdk.org Mon Jun 23 08:05:35 2025 From: dzhang at openjdk.org (Dingli Zhang) Date: Mon, 23 Jun 2025 08:05:35 GMT Subject: Integrated: 8360169: Problem list CodeInvalidationReasonTest.java on linux-riscv64 until JDK-8360168 is fixed In-Reply-To: References: Message-ID: On Sat, 21 Jun 2025 14:45:13 GMT, Dingli Zhang wrote: > Hi all, > As described in the [JDK-8360168](https://bugs.openjdk.org/browse/JDK-8360168) issue, the case failed on linux-riscv64. We put the failed test > case into the Problem list until [JDK-8360168](https://bugs.openjdk.org/browse/JDK-8360168) is fixed. > > Please take a look and have some reviews. Thanks a lot. This pull request has now been integrated. Changeset: ad1033d6 Author: Dingli Zhang Committer: Feilong Jiang URL: https://git.openjdk.org/jdk/commit/ad1033d68f4dd030cad27f9868d4fa83b5080bcd Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod 8360169: Problem list CodeInvalidationReasonTest.java on linux-riscv64 until JDK-8360168 is fixed Reviewed-by: fyang, fjiang, syan, dnsimon ------------- PR: https://git.openjdk.org/jdk/pull/25924 From thartmann at openjdk.org Mon Jun 23 08:24:28 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 23 Jun 2025 08:24:28 GMT Subject: [jdk25] RFR: 8356708: C2: loop strip mining expansion doesn't take sunk stores into account In-Reply-To: References: Message-ID: On Mon, 23 Jun 2025 07:19:27 GMT, Roland Westrelin wrote: > Hi all, > > This pull request contains a backport of commit [c11f36e6](https://github.com/openjdk/jdk/commit/c11f36e6200b6c39fd59530f28e9318c4153db49) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Roland Westrelin on 20 Jun 2025 and was reviewed by Roberto Castañeda Lozano and Emanuel Peter. > > Thanks! Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/25929#pullrequestreview-2949127405 From shade at openjdk.org Mon Jun 23 08:32:32 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 23 Jun 2025 08:32:32 GMT Subject: RFR: 8358655: AArch64: Simplify Interpreter::profile_taken_branch [v2] In-Reply-To: References: Message-ID: On Fri, 20 Jun 2025 21:54:45 GMT, Chad Rakoczy wrote: >> [JDK-8358655](https://bugs.openjdk.org/browse/JDK-8358655) >> >> The aarch64 version of [JDK-8357434](https://bugs.openjdk.org/browse/JDK-8357434) >> >> The counter is 64-bit, never practically overflows, and no other code cares about it so it is safe to remove >> >> Additional Testing: >> - [ ] Linux aarch64 fastdebug tier 1 >> - [ ] Linux aarch64 fastdebug tier 2 >> - [ ] Linux aarch64 fastdebug tier 3 >> - [ ] Linux aarch64 fastdebug tier 4 > > Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: > > Remove r1 save Looks okay, as long as it still passes testing. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25906#pullrequestreview-2949153122 From tkurashige at openjdk.org Mon Jun 23 08:56:12 2025 From: tkurashige at openjdk.org (Taizo Kurashige) Date: Mon, 23 Jun 2025 08:56:12 GMT Subject: RFR: 8359120: Improve warning message when fail to load hsdis library [v2] In-Reply-To: References: Message-ID: <-i6UPk-bhy9RnnCus_JbJ1nQ63nMX9djubON9WBbHQ8=.a2305566-563e-4171-b526-bcd645de51a3@github.com> > This PR is improvement of warning message when fail to load hsdis library. > > [JDK-8287001](https://bugs.openjdk.org/browse/JDK-8287001) introduced a warning on hsdis library load failure. This is useful when the user executes -XX:+PrintAssembly, etc. > > However, I think that when hs_err occurs, users might be confused by this warning printed by Xlog. 
Because users are not likely to know that hsdis is loaded for the [MachCode] section of the hs_err report, they may wonder, for example, "Why do I get warnings about hsdis load errors when -XX:+PrintAssembly is not specified?" > > To clear up this confusion, I suggest printing a warning just before [MachCode]. > >
> > sample output > > If hs_err occurs and hsdis load fails without the option to specify where the hs_err report should be output, the following is output to the hs_err_pir log file: > > . > . > native method entry point (kind = native) [0x000001ae8753cec0, 0x000001ae8753dac0] 3072 bytes > > Loading hsdis library failed, so undisassembled code is printed in the below [MachCode] section > [MachCode] > 0x000001ae8753cec0: 488b 4b08 | 0fb7 492e | 584c 8d74 | ccf8 6800 | 0000 0068 | 0000 0000 | 5055 488b | ec41 5548 > 0x000001ae8753cee0: 8b43 084c | 8d68 3848 | 8b40 0868 | 0000 0000 | 5348 8b50 | 18 > . > . > > > If -XX:+PrintAssembly is specified and hsdis load fails, the following is output to the stdout > > $ java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly -version > OpenJDK 64-Bit Server VM warning: PrintAssembly is enabled; turning on DebugNonSafepoints to gain additional output > > ============================= C1-compiled nmethod ============================== > ----------------------------------- Assembly ----------------------------------- > > Compiled method (c1) 57 2 3 java.lang.Object:: (1 bytes) > total in heap [0x0000024a08a00008,0x0000024a08a00208] = 512 > . > . > > [Constant Pool (empty)] > > > Loading hsdis library failed, so undisassembled code is printed in the below [MachCode] section > [MachCode] > [Instructions begin] > 0x0000024a08a00100: 6666 660f | 1f84 0000 | 0000 0066 | 6666 9066 | 6690 448b | 5208 443b > . > . > [Constant Pool (empty)] > > > Loading hsdis library failed, so undisassembled code is printed in the below [MachCode] section > [MachCode] > [Verified Entry Point] > # {method} {0x00000000251a1898} 'toUnsignedInt' '(B)I' in 'java/lang/Byte > . > . > > >
> > Since the warning added in this fix covers the role of the warning introduced in [JDK-8287001](https://bugs.openjdk.org/browse/JDK-828... Taizo Kurashige has updated the pull request incrementally with one additional commit since the last revision: Fix message and revert lines for Xlog ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25726/files - new: https://git.openjdk.org/jdk/pull/25726/files/45584ba7..6ff4f9b5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25726&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25726&range=00-01 Stats: 6 lines in 3 files changed: 3 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/25726.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25726/head:pull/25726 PR: https://git.openjdk.org/jdk/pull/25726 From shade at openjdk.org Mon Jun 23 09:09:30 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 23 Jun 2025 09:09:30 GMT Subject: RFR: 8358572: C1 hits "need debug information" assert with -XX:-DeoptC1 In-Reply-To: References: Message-ID: On Thu, 19 Jun 2025 14:53:50 GMT, Manuel Hässig wrote: > The debug flag `DeoptC1` is required to be true for dependency recording by an assert, but not all uses of dependency recording in C1 are guarded with `if (DeoptC1)`. Hence, running `java -XX:-DeoptC1 -version` fails at the aforementioned assert. > > This error has been present unconditionally in debug builds since dependency recording was enabled outside of JVMTI in [JDK-8324241](https://bugs.openjdk.org/browse/JDK-8324241) and at least since JDK7 with JVMTI. This issue was discovered by searching for crashes of `java -version` plus some other flag, which indicates that this flag has not been used for at least a year, since every invocation with `-XX:-DeoptC1` crashes. Further, `DeoptC1` is only used for guarding dependency recording in three places. Thus, this PR removes the `DeoptC1` flag.
> > This was tested with: > - [ ] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15760179189) > - [x] tier1, tier2 plus Oracle internal testing on Oracle supported platforms Looks reasonable. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25900#pullrequestreview-2949279092 From epeter at openjdk.org Mon Jun 23 09:09:29 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 23 Jun 2025 09:09:29 GMT Subject: [jdk25] RFR: 8356708: C2: loop strip mining expansion doesn't take sunk stores into account In-Reply-To: References: Message-ID: On Mon, 23 Jun 2025 07:19:27 GMT, Roland Westrelin wrote: > Hi all, > > This pull request contains a backport of commit [c11f36e6](https://github.com/openjdk/jdk/commit/c11f36e6200b6c39fd59530f28e9318c4153db49) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Roland Westrelin on 20 Jun 2025 and was reviewed by Roberto Castañeda Lozano and Emanuel Peter. > > Thanks! Marked as reviewed by epeter (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25929#pullrequestreview-2949277459 From epeter at openjdk.org Mon Jun 23 09:22:31 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 23 Jun 2025 09:22:31 GMT Subject: RFR: 8347901: C2 should remove unused leaf / pure runtime calls [v2] In-Reply-To: <99-WvIy2CRk9L3nYq_zIenWde1_cxvu3F-9ftYIrdh8=.d9ce6642-276a-40dc-a9ca-044d24160952@github.com> References: <99-WvIy2CRk9L3nYq_zIenWde1_cxvu3F-9ftYIrdh8=.d9ce6642-276a-40dc-a9ca-044d24160952@github.com> Message-ID: <9Oc_2z4n0h8cU825_70Vr6eMLLk6qz6hELGSeujGPsU=.a82e717f-e566-43fa-8da2-a609d4dc37a8@github.com> On Mon, 23 Jun 2025 07:56:25 GMT, Marc Chevalier wrote: >> Also, you should probably call `CallLeafPureNode::Ideal` instead of duplicating its logic here and in other subclasses.
> > In an earlier issue ([JDK-8349523](https://bugs.openjdk.org/browse/JDK-8349523)), I tried to remove these nodes during parsing. It didn't work well. The problem is that it's transformed by GVN before setting any output projection, so of course, the node is removing itself before having the opportunity of being used. We wait until usages have been set before we try to remove the node (or replace it with a tuple). That sounds like a bug at the use-site... don't you think? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25760#discussion_r2161129339 From aph at openjdk.org Mon Jun 23 09:46:28 2025 From: aph at openjdk.org (Andrew Haley) Date: Mon, 23 Jun 2025 09:46:28 GMT Subject: RFR: 8358572: C1 hits "need debug information" assert with -XX:-DeoptC1 In-Reply-To: References: Message-ID: On Thu, 19 Jun 2025 14:53:50 GMT, Manuel Hässig wrote: > The debug flag `DeoptC1` is required to be true for dependency recording by an assert, but not all uses of dependency recording in C1 are guarded with `if (DeoptC1)`. Hence, running `java -XX:-DeoptC1 -version` fails at the aforementioned assert. > > This error has been present unconditionally in debug builds since dependency recording was enabled outside of JVMTI in [JDK-8324241](https://bugs.openjdk.org/browse/JDK-8324241) and at least since JDK7 with JVMTI. This issue was discovered by searching for crashes of `java -version` plus some other flag, which indicates that this flag has not been used for at least a year, since every invocation with `-XX:-DeoptC1` crashes. Further, `DeoptC1` is only used for guarding dependency recording in three places. Thus, this PR removes the `DeoptC1` flag. > > This was tested with: > - [ ] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15760179189) > - [x] tier1, tier2 plus Oracle internal testing on Oracle supported platforms Marked as reviewed by aph (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/25900#pullrequestreview-2949399226 From epeter at openjdk.org Mon Jun 23 10:43:29 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 23 Jun 2025 10:43:29 GMT Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise [v3] In-Reply-To: References: Message-ID: On Mon, 23 Jun 2025 07:56:48 GMT, Qizheng Xing wrote: >> The result of count leading/trailing zeros is always non-negative, and the maximum value is integer type's size in bits. In previous versions, when C2 can not know the operand value of a CLZ/CTZ node at compile time, it will generate a full-width integer type for its result. This can significantly affect the efficiency of code in some cases. >> >> This patch makes the type of CLZ/CTZ nodes more precise, to make C2 generate better code. For example, the following implementation runs ~115% faster on x86-64 with this patch: >> >> >> public static int numberOfNibbles(int i) { >> int mag = Integer.SIZE - Integer.numberOfLeadingZeros(i); >> return Math.max((mag + 3) / 4, 1); >> } >> >> >> Testing: tier1, IR test > > Qizheng Xing has updated the pull request incrementally with one additional commit since the last revision: > > Narrow type bound src/hotspot/share/opto/countbitsnode.cpp line 46: > 44: static int count_trailing_zeros_long(jlong l) { > 45: return l == 0 ? BitsPerLong : count_trailing_zeros(l); > 46: } Can you explain why you need this? Why is `count_trailing_zeros` and `count_leading_zeros` not enough, when you cast at the use-site? src/hotspot/share/opto/countbitsnode.cpp line 53: > 51: if (t == Type::TOP) return Type::TOP; > 52: const TypeInt* ti = t->isa_int(); > 53: if (ti) { Implicit null check not allowed by style guide :) https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md > Do not use ints or pointers as (implicit) booleans with &&, ||, if, while. Instead, compare explicitly, i.e. if (x != 0) or if (ptr != nullptr), etc. 
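As a rough, Java-level illustration of the range claim in the quoted PR description (the count is never negative and never exceeds the operand's size in bits), the sketch below checks that property for a few inputs. `numberOfNibbles` is copied from the description; the class name `ClzRangeDemo`, the helper `countsInRange`, and the sample values are invented for this note and are not part of the PR.

```java
final class ClzRangeDemo {
    // Copied from the PR description.
    static int numberOfNibbles(int i) {
        int mag = Integer.SIZE - Integer.numberOfLeadingZeros(i);
        return Math.max((mag + 3) / 4, 1);
    }

    // Hypothetical helper: true if both counts for v lie in [0, Integer.SIZE].
    static boolean countsInRange(int v) {
        int lz = Integer.numberOfLeadingZeros(v);
        int tz = Integer.numberOfTrailingZeros(v);
        return 0 <= lz && lz <= Integer.SIZE
            && 0 <= tz && tz <= Integer.SIZE;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 0x10, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int v : samples) {
            System.out.println(v + ": inRange=" + countsInRange(v)
                    + ", nibbles=" + numberOfNibbles(v));
        }
    }
}
```

With the count known to lie in [0, 32], `mag` in `numberOfNibbles` lies in the same interval, so `(mag + 3) / 4` can never be negative. That is presumably the kind of fact a narrower node type lets the compiler use, though this sketch says nothing about what code C2 actually generates.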
test/hotspot/jtreg/compiler/c2/irTests/TestCountBitsRange.java line 2: > 1: /* > 2: * Copyright (c) 2025 Alibaba Group Holding Limited. All Rights Reserved. Can you please not use the `irTests` directory, and instead put it in a more thematically relevant one? Maybe `gvn`? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25928#discussion_r2161282004 PR Review Comment: https://git.openjdk.org/jdk/pull/25928#discussion_r2161283097 PR Review Comment: https://git.openjdk.org/jdk/pull/25928#discussion_r2161285920 From epeter at openjdk.org Mon Jun 23 10:43:30 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 23 Jun 2025 10:43:30 GMT Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise [v3] In-Reply-To: References: Message-ID: On Mon, 23 Jun 2025 10:39:22 GMT, Emanuel Peter wrote: >> Qizheng Xing has updated the pull request incrementally with one additional commit since the last revision: >> >> Narrow type bound > > src/hotspot/share/opto/countbitsnode.cpp line 53: > >> 51: if (t == Type::TOP) return Type::TOP; >> 52: const TypeInt* ti = t->isa_int(); >> 53: if (ti) { > > Implicit null check not allowed by style guide :) > > https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md > >> Do not use ints or pointers as (implicit) booleans with &&, ||, if, while. Instead, compare explicitly, i.e. if (x != 0) or if (ptr != nullptr), etc. 
I know it was like that before, but when we touch code we should fix it :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25928#discussion_r2161284566 From qamai at openjdk.org Mon Jun 23 10:57:27 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 23 Jun 2025 10:57:27 GMT Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise [v3] In-Reply-To: References: Message-ID: <2vGPKe7ESZqYjemMvDjFxb4QTk3VjybE0lk59Vqj1Ts=.e6a555a5-407b-4389-8db5-aa02a7de9960@github.com> On Mon, 23 Jun 2025 10:38:37 GMT, Emanuel Peter wrote: >> Qizheng Xing has updated the pull request incrementally with one additional commit since the last revision: >> >> Narrow type bound > > src/hotspot/share/opto/countbitsnode.cpp line 46: > >> 44: static int count_trailing_zeros_long(jlong l) { >> 45: return l == 0 ? BitsPerLong : count_trailing_zeros(l); >> 46: } > > Can you explain why you need this? > Why is `count_trailing_zeros` and `count_leading_zeros` not enough, when you cast at the use-site? This is because our implementation does not accept 0 as an input. I suggest doing this at `count_leading_zeros`; it makes more sense and also aligns our behaviour with the well-known [`countr_zero`](https://en.cppreference.com/w/cpp/numeric/countr_zero.html) and [`countl_zero`](https://en.cppreference.com/w/cpp/numeric/countl_zero.html) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25928#discussion_r2161319626 From tkurashige at openjdk.org Mon Jun 23 12:01:32 2025 From: tkurashige at openjdk.org (Taizo Kurashige) Date: Mon, 23 Jun 2025 12:01:32 GMT Subject: RFR: 8359120: Improve warning message when fail to load hsdis library [v2] In-Reply-To: References: Message-ID: On Thu, 19 Jun 2025 09:30:42 GMT, Manuel Hässig wrote: > Thank you for working on this. I have been confused by this myself and think this is a great improvement. I do have a few comments and questions, though.
> > Currently, I do not understand exactly how your new message is only printed when hsdis is not loaded. Do we only emit a MachCode section if hsdis is not loaded? Thank you for your comment! Yes, my research shows that we only emit a MachCode section if hsdis is not loaded. `is_abstract()` in [src/hotspot/share/compiler/disassembler.hpp#L83](https://github.com/openjdk/jdk/blob/fe7ec312590ed9f70e6caad4ef454123138bbbcf/src/hotspot/share/compiler/disassembler.hpp#L83C1-L88C4) determines if hsdis is not loaded. `is_abstract()` returns true if loading hsdis fails. static bool is_abstract() { if (!_tried_to_load_library) { load_library(); } return ! _library_usable; } There are three direct processes to output the MachCode section: 1. [src/hotspot/share/compiler/abstractDisassembler.cpp#L355](https://github.com/openjdk/jdk/blob/fe7ec312590ed9f70e6caad4ef454123138bbbcf/src/hotspot/share/compiler/abstractDisassembler.cpp#L355) 2. [src/hotspot/share/code/nmethod.cpp#L3494](https://github.com/openjdk/jdk/blob/fe7ec312590ed9f70e6caad4ef454123138bbbcf/src/hotspot/share/code/nmethod.cpp#L3494) 3. [src/hotspot/share/code/nmethod.cpp#L3528](https://github.com/openjdk/jdk/blob/fe7ec312590ed9f70e6caad4ef454123138bbbcf/src/hotspot/share/code/nmethod.cpp#L3528) 1. is called when `is_abstract()` at [src/hotspot/share/compiler/disassembler.cpp#L889](https://github.com/openjdk/jdk/blob/fe7ec312590ed9f70e6caad4ef454123138bbbcf/src/hotspot/share/compiler/disassembler.cpp#L889) returns true. For example, if loading hsdis fails to load and hs_err occurs 2. is called when `is_abstract()` at [src/hotspot/share/code/nmethod.cpp#L3444](https://github.com/openjdk/jdk/blob/fe7ec312590ed9f70e6caad4ef454123138bbbcf/src/hotspot/share/code/nmethod.cpp#L3444) returns true and `compressed_with_comments` at [src/hotspot/share/code/nmethod.cpp#L3445](https://github.com/openjdk/jdk/blob/fe7ec312590ed9f70e6caad4ef454123138bbbcf/src/hotspot/share/code/nmethod.cpp#L3445C14-L3445C38) is false. 
For example, if only `-XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly` is specified 3. is called when `is_abstract()` at [src/hotspot/share/code/nmethod.cpp#L3444](https://github.com/openjdk/jdk/blob/fe7ec312590ed9f70e6caad4ef454123138bbbcf/src/hotspot/share/code/nmethod.cpp#L3444) returns true and `compressed_with_comments` at [src/hotspot/share/code/nmethod.cpp#L3445](https://github.com/openjdk/jdk/blob/fe7ec312590ed9f70e6caad4ef454123138bbbcf/src/hotspot/share/code/nmethod.cpp#L3445C14-L3445C38) is true. For example, `-XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly -XX:PrintAssemblyOptions=show-comment:off,show-block-comment:off` is specified > src/hotspot/share/compiler/disassembler.cpp line 841: > >> 839: os::dll_lookup(_library, decode_instructions_virtual_name)); >> 840: } >> 841: _tried_to_load_library = true; > > Personally, I would leave this warning. It does not hurt, and perhaps someone is depending on it to detect if hsdis is installed correctly. I thought your comment made sense. I fixed it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25726#issuecomment-2996199553 PR Review Comment: https://git.openjdk.org/jdk/pull/25726#discussion_r2161446105 From tkurashige at openjdk.org Mon Jun 23 12:04:29 2025 From: tkurashige at openjdk.org (Taizo Kurashige) Date: Mon, 23 Jun 2025 12:04:29 GMT Subject: RFR: 8359120: Improve warning message when fail to load hsdis library [v2] In-Reply-To: References: Message-ID: On Thu, 19 Jun 2025 09:10:53 GMT, Manuel H?ssig wrote: > I don't think we should use [MachCode] You're right. I will stop using `[MachCode]`. > Here are a few suggestions I would personally prefer: Thank you for your suggestion. I prefer the former expression. However, if hsdis is loaded successfully, the disassembled code will appear in `[Disassembly]` instead of `[MachCode]`, so I don't think "unable to show disassembled code in MachCode section" is appropriate. Do you think "undisassembled" is clunky? 
"undisassembled" is also used in [src/hotspot/share/code/nmethod.cpp#L3428](https://github.com/openjdk/jdk/blob/c220b1358c91bce2eb7515e9f600004c7b975ee6/src/hotspot/share/code/nmethod.cpp#L3428), so I don't think it's too clunky. So, for example, how about `Loading hsdis library failed, undisassembled code is shown in MachCode section`? If "undisassembled" is clunky, I think it could be "not disassembled" for example. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25726#discussion_r2161451247 From mhaessig at openjdk.org Mon Jun 23 12:16:36 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Mon, 23 Jun 2025 12:16:36 GMT Subject: RFR: 8359120: Improve warning message when fail to load hsdis library [v2] In-Reply-To: <-i6UPk-bhy9RnnCus_JbJ1nQ63nMX9djubON9WBbHQ8=.a2305566-563e-4171-b526-bcd645de51a3@github.com> References: <-i6UPk-bhy9RnnCus_JbJ1nQ63nMX9djubON9WBbHQ8=.a2305566-563e-4171-b526-bcd645de51a3@github.com> Message-ID: On Mon, 23 Jun 2025 08:56:12 GMT, Taizo Kurashige wrote: >> This PR is improvement of warning message when fail to load hsdis library. >> >> [JDK-8287001](https://bugs.openjdk.org/browse/JDK-8287001) introduced a warning on hsdis library load failure. This is useful when the user executes -XX:+PrintAssembly, etc. >> >> However, I think that when hs_err occurs, users might be confused by this warning printed by Xlog. Because users are not likely to know that hsdis is loaded for the [MachCode] section of the hs_err report, they may wonder, for example, "Why do I get warnings about hsdis load errors when -XX:+PrintAssembly is not specified?." >> >> To clear up this confusion, I suggest printing a warning just before [MachCode]. >> >>
>> >> sample output >> >> If hs_err occurs and hsdis load fails without the option to specify where the hs_err report should be output, the following is output to the hs_err_pir log file: >> >> . >> . >> native method entry point (kind = native) [0x000001ae8753cec0, 0x000001ae8753dac0] 3072 bytes >> >> Loading hsdis library failed, so undisassembled code is printed in the below [MachCode] section >> [MachCode] >> 0x000001ae8753cec0: 488b 4b08 | 0fb7 492e | 584c 8d74 | ccf8 6800 | 0000 0068 | 0000 0000 | 5055 488b | ec41 5548 >> 0x000001ae8753cee0: 8b43 084c | 8d68 3848 | 8b40 0868 | 0000 0000 | 5348 8b50 | 18 >> . >> . >> >> >> If -XX:+PrintAssembly is specified and hsdis load fails, the following is output to the stdout >> >> $ java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly -version >> OpenJDK 64-Bit Server VM warning: PrintAssembly is enabled; turning on DebugNonSafepoints to gain additional output >> >> ============================= C1-compiled nmethod ============================== >> ----------------------------------- Assembly ----------------------------------- >> >> Compiled method (c1) 57 2 3 java.lang.Object:: (1 bytes) >> total in heap [0x0000024a08a00008,0x0000024a08a00208] = 512 >> . >> . >> >> [Constant Pool (empty)] >> >> >> Loading hsdis library failed, so undisassembled code is printed in the below [MachCode] section >> [MachCode] >> [Instructions begin] >> 0x0000024a08a00100: 6666 660f | 1f84 0000 | 0000 0066 | 6666 9066 | 6690 448b | 5208 443b >> . >> . >> [Constant Pool (empty)] >> >> >> Loading hsdis library failed, so undisassembled code is printed in the below [MachCode] section >> [MachCode] >> [Verified Entry Point] >> # {method} {0x00000000251a1898} 'toUnsignedInt' '(B)I' in 'java/lang/Byte >> . >> . >> >> >>
>> >> Since... > > Taizo Kurashige has updated the pull request incrementally with one additional commit since the last revision: > > Fix message and revert lines for Xlog Thank you for addressing my comments and for your explanation! The changes look good to me, but nonetheless I kicked off some testing on our side and I'll get back to you with the results. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25726#issuecomment-2996260052 From mhaessig at openjdk.org Mon Jun 23 12:16:37 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Mon, 23 Jun 2025 12:16:37 GMT Subject: RFR: 8359120: Improve warning message when fail to load hsdis library [v2] In-Reply-To: References: Message-ID: On Mon, 23 Jun 2025 12:02:09 GMT, Taizo Kurashige wrote: >> src/hotspot/share/code/nmethod.cpp line 3493: >> >>> 3491: st->bol(); >>> 3492: st->cr(); >>> 3493: st->print_cr("Loading hsdis library failed, so undisassembled code is printed in the below [MachCode] section"); >> >> Some comments on the message: >> - I don't think we should use `[MachCode]` inside the square brackets apart from marking the start of a section. Otherwise, tools parsing a hs_err file incorrectly identify the start of the MachCode section. >> - Personally, I find the language of the message a bit clunky. Here are a few suggestions I would personally prefer: >> `Loading hsdis library failed, unable to show disassembled code in MachCode section` or `Note: Unable to display disassembled code because loading of hsdis library failed.`. > >> I don't think we should use [MachCode] > > You're right. I will stop using `[MachCode]`. > >> Here are a few suggestions I would personally prefer: > > Thank you for your suggestion. > I prefer the former expression. > However, if hsdis is loaded successfully, the disassembled code will appear in `[Disassembly]` instead of `[MachCode]`, so I don't think "unable to show disassembled code in MachCode section" is appropriate. 
> Do you think "undisassembled" is clunky? "undisassembled" is also used in [src/hotspot/share/code/nmethod.cpp#L3428](https://github.com/openjdk/jdk/blob/c220b1358c91bce2eb7515e9f600004c7b975ee6/src/hotspot/share/code/nmethod.cpp#L3428), so I don't think it's too clunky. > So, for example, how about `Loading hsdis library failed, undisassembled code is shown in MachCode section`? If "undisassembled" is clunky, I think it could be "not disassembled" for example. I do happen to dislike "undisassembled", but I was not aware of its usage in other places in the codebase. I'm fine with the message now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25726#discussion_r2161473437 From mchevalier at openjdk.org Mon Jun 23 12:39:23 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Mon, 23 Jun 2025 12:39:23 GMT Subject: RFR: 8347901: C2 should remove unused leaf / pure runtime calls [v3] In-Reply-To: References: Message-ID: > A first part toward a better support of pure functions, but this time, with guidance from @iwanowww. > > ## Pure Functions > > Pure functions (considered here) are functions that have no side effects, no effect on the control flow (no exception or such), cannot deopt etc.. It's really a function that you can execute anywhere, with whichever arguments without effect other than wasting time. Integer division is not pure as dividing by zero is throwing. But many floating point functions will just return `NaN` or `+/-infinity` in problematic cases. > > ## Scope > > We are not going all powerful for now! It's mostly about identifying some pure functions and being able to remove them if the result is unused. Some other things are not part of this PR, on purpose. Especially, this PR doesn't propose a way to move pure calls around. The reason is that pure calls are later expanded into regular calls, which require a control input. To be able to do the expansion, we just keep the control in the pure call as well. 
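To make the distinction drawn in the "Pure Functions" paragraph concrete, here is a small Java-level sketch. Integer division can throw, whereas floating-point functions such as `Math.sin` or `Math.log` only yield `NaN` or an infinity on problematic inputs, so dropping a call whose result is unused changes nothing observable. The class and method names are invented for this note; whether a particular `Math` method ends up as a pure leaf call inside C2 is an assumption this sketch does not verify.

```java
final class PurityDemo {
    // Integer division is not pure in the sense above: a zero divisor throws.
    static boolean intDivisionThrows(int divisor) {
        try {
            int unused = 1 / divisor;
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    // The floating-point call cannot throw, and its result is never used,
    // so removing it would not change what this method returns or does.
    static int callAndIgnore(double x) {
        double unused = Math.sin(x);
        return 42;
    }

    public static void main(String[] args) {
        System.out.println(intDivisionThrows(0));
        System.out.println(Math.sin(Double.POSITIVE_INFINITY));
        System.out.println(Math.log(0.0));
        System.out.println(callAndIgnore(Double.NaN));
    }
}
```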
> > ## Implementation Overview > > We created here some new node kind for pure calls, inheriting leaf calls, that are expanded into regular leaf calls during final graph reshaping. The possibility to support pure call directly in AD file is left open. > > This PR also introduces `TupleNode` (largely based on an original idea/implem of @iwanowww), that just tie multiple input together and play well with `ProjNode`: the n-th projection of a `TupleNode` is the n-th input of the tuple. This is a convenient way to skip and remove nodes from the graph while delegating the difficulty of the surgery to the trusted IGVN's implementation. > > Thanks, > Marc Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: mostly comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25760/files - new: https://git.openjdk.org/jdk/pull/25760/files/34fd5e9a..7028b561 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25760&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25760&range=01-02 Stats: 35 lines in 3 files changed: 19 ins; 8 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/25760.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25760/head:pull/25760 PR: https://git.openjdk.org/jdk/pull/25760 From mchevalier at openjdk.org Mon Jun 23 12:41:30 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Mon, 23 Jun 2025 12:41:30 GMT Subject: RFR: 8347901: C2 should remove unused leaf / pure runtime calls [v2] In-Reply-To: <9Oc_2z4n0h8cU825_70Vr6eMLLk6qz6hELGSeujGPsU=.a82e717f-e566-43fa-8da2-a609d4dc37a8@github.com> References: <99-WvIy2CRk9L3nYq_zIenWde1_cxvu3F-9ftYIrdh8=.d9ce6642-276a-40dc-a9ca-044d24160952@github.com> <9Oc_2z4n0h8cU825_70Vr6eMLLk6qz6hELGSeujGPsU=.a82e717f-e566-43fa-8da2-a609d4dc37a8@github.com> Message-ID: On Mon, 23 Jun 2025 09:20:13 GMT, Emanuel Peter wrote: >> In an earlier issue ([JDK-8349523](https://bugs.openjdk.org/browse/JDK-8349523)), I tried to remove these 
nodes during parsing. It didn't work well. The problem is that it's transformed by GVN before setting any output projection, so of course, the node is removing itself before having the opportunity of being used. We wait until usages have been set before we try to remove the node (or replace it with a tuple). > > That sounds like a bug at the use-site... don't you think? As discussed offline, it's relatively normal. Adding a comment to explain. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25760#discussion_r2161524761 From dlunden at openjdk.org Mon Jun 23 13:28:49 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Mon, 23 Jun 2025 13:28:49 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v17] In-Reply-To: References: Message-ID: <1jSWqZYtxeIKxxTi161rvg46AGTvUusIb2FVnYtZa-U=.d9aadfa7-8869-428e-aebe-e708d157883c@github.com> On Fri, 20 Jun 2025 10:14:09 GMT, Emanuel Peter wrote: >> I assume you mean resource allocate in the default `Thread::current()->resource_area()`? The existing resource marks are often far away, leading to unnecessary memory consumption. Trying to narrow the existing resource marks does lead to conflicts. I added a motivation to the description in `compile.hpp`! > > And these conflicts cannot be resolved? Can you bring an example that is too much effort? One example is the `ResourceMark` at the top of `PhaseCFG::sched_call` (added in this changeset) which conflicts with the `ResourceMark` at the top of `PhaseCFG::global_code_motion`. Specifically, the conflict is at the `VectorSet` named `visited` in `PhaseCFG::global_code_motion` and named `next_call` in `PhaseCFG::sched_call`. I don't see a trivial resolution, but I'm open to suggestions. There are loops in between the resource marks, so memory consumption will potentially increase significantly if we simply remove the inner resource mark. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2161635554 From dlunden at openjdk.org Mon Jun 23 13:37:27 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Mon, 23 Jun 2025 13:37:27 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v21] In-Reply-To: References: Message-ID: > If a method has a large number of parameters, we currently bail out from C2 compilation. > > ### Changeset > > Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes. > > Changes: > - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed. > - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable. > - Remove all `can_represent` checks and bailouts. > - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`. > - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`. 
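The first bullet in the list above (a statically sized base that stays unchanged, plus dynamically allocated memory only in the rare cases that need it) and the early-exit idea for `overlap` can be sketched roughly as follows. This is a minimal illustration and not HotSpot's `RegMask`: the class `GrowableMask`, the constant `kBaseWords`, and all method names are hypothetical, and the early exit shown here (return on the first intersecting word, base words first) is only one plausible form of such an optimization.

```cpp
#include <stddef.h>
#include <stdint.h>
#include <algorithm>
#include <array>
#include <cassert>
#include <vector>

// Hypothetical bit mask: a fixed-size base part plus an optional extension.
class GrowableMask {
 public:
  static constexpr size_t kBaseWords = 2;  // assumed static size, in words
  static constexpr size_t kBitsPerWord = sizeof(uintptr_t) * 8;

  // Set one bit; the extension only grows when the bit is beyond the base.
  void insert(size_t bit) {
    size_t word = bit / kBitsPerWord;
    uintptr_t flag = uintptr_t{1} << (bit % kBitsPerWord);
    if (word < kBaseWords) {
      _base[word] |= flag;
    } else {
      size_t ext = word - kBaseWords;
      if (ext >= _ext.size()) {
        _ext.resize(ext + 1, 0);
      }
      _ext[ext] |= flag;
    }
  }

  bool member(size_t bit) const {
    size_t word = bit / kBitsPerWord;
    uintptr_t flag = uintptr_t{1} << (bit % kBitsPerWord);
    if (word < kBaseWords) {
      return (_base[word] & flag) != 0;
    }
    size_t ext = word - kBaseWords;
    return ext < _ext.size() && (_ext[ext] & flag) != 0;
  }

  // Early exit: stop at the first word where both masks have a common bit.
  // The base words are checked first, since that is the common case.
  bool overlap(const GrowableMask& other) const {
    for (size_t i = 0; i < kBaseWords; i++) {
      if ((_base[i] & other._base[i]) != 0) {
        return true;
      }
    }
    size_t n = std::min(_ext.size(), other._ext.size());
    for (size_t i = 0; i < n; i++) {
      if ((_ext[i] & other._ext[i]) != 0) {
        return true;
      }
    }
    return false;
  }

  bool uses_extension() const { return !_ext.empty(); }

 private:
  std::array<uintptr_t, kBaseWords> _base{};  // always present, zeroed
  std::vector<uintptr_t> _ext;                // empty in the common case
};
```

In this sketch the compiler-generated copy constructor and copy assignment are deep copies, because `std::vector` copies its elements; a hand-rolled extension buffer would need that written out explicitly.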
> > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450) > - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. > - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement. > - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, not worth it). > > ![c2-regression](https:/... Daniel Lund?n has updated the pull request incrementally with one additional commit since the last revision: Add deep copy comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20404/files - new: https://git.openjdk.org/jdk/pull/20404/files/9cefa15f..7966937f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20404&range=20 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20404&range=19-20 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20404.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20404/head:pull/20404 PR: https://git.openjdk.org/jdk/pull/20404 From dlunden at openjdk.org Mon Jun 23 13:37:28 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Mon, 23 Jun 2025 13:37:28 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v17] In-Reply-To: References: Message-ID: On Fri, 20 Jun 2025 10:21:28 GMT, Emanuel Peter wrote: >> Yes, it is the copy constructor. Can you elaborate a bit on what type of comment you expect? 
There are already some comments in the `copy` method. > > I think I was wondering if it was shallow or deep copying, but it has been a while since I reviewed. Right, it's a deep copy (both for copy construction and copy assignment). I added comments now, thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2161654006 From dlunden at openjdk.org Mon Jun 23 14:00:25 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Mon, 23 Jun 2025 14:00:25 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v22] In-Reply-To: References: Message-ID: > If a method has a large number of parameters, we currently bail out from C2 compilation. > > ### Changeset > > Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes. > > Changes: > - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed. > - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable. > - Remove all `can_represent` checks and bailouts. > - Performance tuning. 
A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`. > - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`. > > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450) > - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. > - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement. > - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, not worth it). > > ![c2-regression](https:/... 
Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision:

  Fix implicit zero and nullptr checks

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20404/files
  - new: https://git.openjdk.org/jdk/pull/20404/files/7966937f..de5c82db

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20404&range=21
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20404&range=20-21

  Stats: 11 lines in 3 files changed: 0 ins; 0 del; 11 mod
  Patch: https://git.openjdk.org/jdk/pull/20404.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20404/head:pull/20404

PR: https://git.openjdk.org/jdk/pull/20404

From dlunden at openjdk.org Mon Jun 23 14:00:26 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Mon, 23 Jun 2025 14:00:26 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v17]
In-Reply-To:
References:
Message-ID: <2Y_oe67rJ_p2a2zz-53UH4WV-OyVT581gX9tfIx3jkk=.8b93712d-ab83-479a-8175-04922ffbdf9f@github.com>

On Fri, 20 Jun 2025 10:23:57 GMT, Emanuel Peter wrote:

>> This is a bit tricky. It is indeed a pointer type (`uintptr_t`), but it is not used as a pointer. It is used to store the register mask bits. So I guess this is then not an implicit null check? It is an implicit zero check :slightly_smiling_face:
>
>> Do not use ints or pointers as (implicit) booleans with &&, ||, if, while. Instead, compare explicitly, i.e. if (x != 0) or if (ptr != nullptr), etc.
>
> Ok, well no matter what, it would not conform to the style. Maybe you already fixed it though, not sure.

Ah, good to know. I fixed all the occurrences I could find (and some bonus ones). I guess there is no static checker available to automate this?
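For readers unfamiliar with the style rule quoted above, here is a small sketch of what "compare explicitly" looks like for both cases discussed in this exchange: a real pointer, and a `uintptr_t` that only holds mask bits. The function `mask_is_nonempty` is a hypothetical helper written for this illustration; it is not taken from the changeset.

```cpp
#include <stddef.h>
#include <stdint.h>
#include <cassert>

// Hypothetical helper: true if any word of a bit mask has a bit set.
bool mask_is_nonempty(const uintptr_t* words, size_t n) {
  // Explicit null check, rather than the implicit `if (!words)`.
  if (words == nullptr) {
    return false;
  }
  for (size_t i = 0; i < n; i++) {
    // Explicit zero check, rather than the implicit `if (words[i])`.
    // `words[i]` has type uintptr_t but holds bits, not an address.
    if (words[i] != 0) {
      return true;
    }
  }
  return false;
}
```

Both forms compile to the same code; the explicit comparison only makes the intent (null check versus zero check) visible at the use site.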
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2161704150 From epeter at openjdk.org Mon Jun 23 14:17:44 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 23 Jun 2025 14:17:44 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v17] In-Reply-To: <2Y_oe67rJ_p2a2zz-53UH4WV-OyVT581gX9tfIx3jkk=.8b93712d-ab83-479a-8175-04922ffbdf9f@github.com> References: <2Y_oe67rJ_p2a2zz-53UH4WV-OyVT581gX9tfIx3jkk=.8b93712d-ab83-479a-8175-04922ffbdf9f@github.com> Message-ID: On Mon, 23 Jun 2025 13:56:54 GMT, Daniel Lund?n wrote: >>> Do not use ints or pointers as (implicit) booleans with &&, ||, if, while. Instead, compare explicitly, i.e. if (x != 0) or if (ptr != nullptr), etc. >> >> Ok, well no matter what, it would not conform to the style. Maybe you already fixed it though, not sure. > > Ah, good to know. I fixed all the occurrences I could find (and some bonus ones). I guess there is no static checker available to automate this? Not that I am aware of, no :/ ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2161745044 From dlunden at openjdk.org Mon Jun 23 14:31:24 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Mon, 23 Jun 2025 14:31:24 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v23] In-Reply-To: References: Message-ID: > If a method has a large number of parameters, we currently bail out from C2 compilation. > > ### Changeset > > Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes. 
> > Changes: > - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed. > - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable. > - Remove all `can_represent` checks and bailouts. > - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`. > - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`. > > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450) > - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. > - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement. > - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, not worth it). > > ![c2-regression](https:/... 
Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision:

  Add clarifying comments at definitions of register mask sizes

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20404/files
  - new: https://git.openjdk.org/jdk/pull/20404/files/de5c82db..78259023

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20404&range=22
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20404&range=21-22

  Stats: 18 lines in 1 file changed: 16 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/20404.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20404/head:pull/20404

PR: https://git.openjdk.org/jdk/pull/20404

From dlunden at openjdk.org Mon Jun 23 14:31:25 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Mon, 23 Jun 2025 14:31:25 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v17]
In-Reply-To: <4JpU1sh_7wBfEZG3sJ8z-dWz-Wpk7osUjYZByvqetgc=.acf93145-620b-42e0-af57-ddf20875fd96@github.com>
References: <4JpU1sh_7wBfEZG3sJ8z-dWz-Wpk7osUjYZByvqetgc=.acf93145-620b-42e0-af57-ddf20875fd96@github.com>
Message-ID:

On Fri, 20 Jun 2025 10:27:41 GMT, Emanuel Peter wrote:

>> Yes, it is confusing but consistent. Your intuition is correct: there is a difference between `_rm_size` (the current total size, including extension) and `_RM_SIZE` (the base static size). @robcasloz introduced the "basic" terminology when working on tests in `test_regmask.cpp` and needed some way to expose `_RM_SIZE` publicly in non-product code. Therefore, we have the method `basic_rm_size`. I don't really have a better suggestion. Perhaps `base_rm_size`, or `static_rm_size`? As in "the base/static part of rm_size". We cannot call the method `_RM_SIZE()` as that is prohibited by the style guide. We cannot call the method `RM_SIZE()` as `RM_SIZE` is a macro (and also not the same thing as `_RM_SIZE` on 64-bit machines).
>>
>>> Why do we even need the RM / rm prefix everywhere?
>> >> We really don't, but that's how it is :slightly_smiling_face: Could be worth refactoring, but not in this changeset! > > Alright. Well sure, we don't have to do a full renaming now. Though I do need to understand what is what to be able to review. Is there a good definition somewhere of what is what? I added comments at definition points of the various sizes. Let me know if something is still confusing. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2161773220 From dlunden at openjdk.org Mon Jun 23 14:35:45 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Mon, 23 Jun 2025 14:35:45 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v17] In-Reply-To: References: <4YDlcsxJy-4NmTAijvFrfzEbyE7oNKDYqG2cuXP2xmI=.92aca6da-3b5d-4692-8aa6-9cfbbfcc7a41@github.com> <_oBt35lcOuEjkT2Q1z_gDxG739kPVKJhXztAJAn3Ysw=.87a04c5b-5a1f-4042-a0cd-1447f31827ce@github.com> Message-ID: On Fri, 20 Jun 2025 10:00:14 GMT, Emanuel Peter wrote: >> Note that I only added the `-XX:MaxNodeLimit=20000`, all the other flags are from before (I just added line breaks). It could very well make sense to have a run with fewer flags, but I'm not sure if that's compatible with what the original author intended. I'd prefer leaving it as it is. > > Is it the use of the JSR292 methods that increases the limit? Or do you need to increase the limit to make sure we don't hit the limit? 
Yes, the use of JSR292 methods increases the `MaxNodeLimit`:

    // Bump max node limit for JSR292 users
    if (bc() == Bytecodes::_invokedynamic || orig_callee->is_method_handle_intrinsic()) {
      C->set_max_node_limit(3*MaxNodeLimit);
    }

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2161783711

From mhaessig at openjdk.org Mon Jun 23 15:19:29 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Mon, 23 Jun 2025 15:19:29 GMT
Subject: RFR: 8359120: Improve warning message when fail to load hsdis library [v2]
In-Reply-To: <-i6UPk-bhy9RnnCus_JbJ1nQ63nMX9djubON9WBbHQ8=.a2305566-563e-4171-b526-bcd645de51a3@github.com>
References: <-i6UPk-bhy9RnnCus_JbJ1nQ63nMX9djubON9WBbHQ8=.a2305566-563e-4171-b526-bcd645de51a3@github.com>
Message-ID:

On Mon, 23 Jun 2025 08:56:12 GMT, Taizo Kurashige wrote:

>> This PR is improvement of warning message when fail to load hsdis library.
>>
>> [JDK-8287001](https://bugs.openjdk.org/browse/JDK-8287001) introduced a warning on hsdis library load failure. This is useful when the user executes -XX:+PrintAssembly, etc.
>>
>> However, I think that when hs_err occurs, users might be confused by this warning printed by Xlog. Because users are not likely to know that hsdis is loaded for the [MachCode] section of the hs_err report, they may wonder, for example, "Why do I get warnings about hsdis load errors when -XX:+PrintAssembly is not specified?."
>>
>> To clear up this confusion, I suggest printing a warning just before [MachCode].
>>
>>
>> >> sample output >> >> If hs_err occurs and hsdis load fails without the option to specify where the hs_err report should be output, the following is output to the hs_err_pir log file: >> >> . >> . >> native method entry point (kind = native) [0x000001ae8753cec0, 0x000001ae8753dac0] 3072 bytes >> >> Loading hsdis library failed, so undisassembled code is printed in the below [MachCode] section >> [MachCode] >> 0x000001ae8753cec0: 488b 4b08 | 0fb7 492e | 584c 8d74 | ccf8 6800 | 0000 0068 | 0000 0000 | 5055 488b | ec41 5548 >> 0x000001ae8753cee0: 8b43 084c | 8d68 3848 | 8b40 0868 | 0000 0000 | 5348 8b50 | 18 >> . >> . >> >> >> If -XX:+PrintAssembly is specified and hsdis load fails, the following is output to the stdout >> >> $ java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly -version >> OpenJDK 64-Bit Server VM warning: PrintAssembly is enabled; turning on DebugNonSafepoints to gain additional output >> >> ============================= C1-compiled nmethod ============================== >> ----------------------------------- Assembly ----------------------------------- >> >> Compiled method (c1) 57 2 3 java.lang.Object:: (1 bytes) >> total in heap [0x0000024a08a00008,0x0000024a08a00208] = 512 >> . >> . >> >> [Constant Pool (empty)] >> >> >> Loading hsdis library failed, so undisassembled code is printed in the below [MachCode] section >> [MachCode] >> [Instructions begin] >> 0x0000024a08a00100: 6666 660f | 1f84 0000 | 0000 0066 | 6666 9066 | 6690 448b | 5208 443b >> . >> . >> [Constant Pool (empty)] >> >> >> Loading hsdis library failed, so undisassembled code is printed in the below [MachCode] section >> [MachCode] >> [Verified Entry Point] >> # {method} {0x00000000251a1898} 'toUnsignedInt' '(B)I' in 'java/lang/Byte >> . >> . >> >> >>
>> >> Since... > > Taizo Kurashige has updated the pull request incrementally with one additional commit since the last revision: > > Fix message and revert lines for Xlog Testing passed. Looks good to me! ------------- Marked as reviewed by mhaessig (Author). PR Review: https://git.openjdk.org/jdk/pull/25726#pullrequestreview-2950519193 From mcimadamore at openjdk.org Mon Jun 23 16:25:40 2025 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Mon, 23 Jun 2025 16:25:40 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v34] In-Reply-To: References: Message-ID: On Thu, 5 Jun 2025 08:27:47 GMT, Roland Westrelin wrote: >> To optimize a long counted loop and long range checks in a long or int >> counted loop, the loop is turned into a loop nest. When the loop has >> few iterations, the overhead of having an outer loop whose backedge is >> never taken, has a measurable cost. Furthermore, creating the loop >> nest usually causes one iteration of the loop to be peeled so >> predicates can be set up. If the loop is short running, then it's an >> extra iteration that's run with range checks (compared to an int >> counted loop with int range checks). >> >> This change doesn't create a loop nest when: >> >> 1- it can be determined statically at loop nest creation time that the >> loop runs for a short enough number of iterations >> >> 2- profiling reports that the loop runs for no more than ShortLoopIter >> iterations (1000 by default). >> >> For 2-, a guard is added which is implemented as yet another predicate. >> >> While this change is in principle simple, I ran into a few >> implementation issues: >> >> - while c2 has a way to compute the number of iterations of an int >> counted loop, it doesn't have that for long counted loop. The >> existing logic for int counted loops promotes values to long to >> avoid overflows. I reworked it so it now works for both long and int >> counted loops. 
>> >> - I added a new deoptimization reason (Reason_short_running_loop) for >> the new predicate. Given the number of iterations is narrowed down >> by the predicate, the limit of the loop after transformation is a >> cast node that's control dependent on the short running loop >> predicate. Because once the counted loop is transformed, it is >> likely that range check predicates will be inserted and they will >> depend on the limit, the short running loop predicate has to be the >> one that's further away from the loop entry. Now it is also possible >> that the limit before transformation depends on a predicate >> (TestShortRunningLongCountedLoopPredicatesClone is an example), we >> can have: new predicates inserted after the transformation that >> depend on the casted limit that itself depend on old predicates >> added before the transformation. To solve this cicular dependency, >> parse and assert predicates are cloned between the old predicates >> and the loop head. The cloned short running loop parse predicate is >> the one that's used to insert the short running loop predicate. >> >> - In the case of a long counted loop, the loop is transformed into a >> regular loop with a ... > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 94 commits:
>
>  - small fix
>  - Merge branch 'master' into JDK-8342692
>  - review
>  - review
>  - Update test/micro/org/openjdk/bench/java/lang/foreign/HeapMismatchManualLoopTest.java
>
>    Co-authored-by: Christian Hagedorn
>  - Update test/hotspot/jtreg/compiler/longcountedloops/TestShortRunningLongCountedLoopScaleOverflow.java
>
>    Co-authored-by: Christian Hagedorn
>  - Update test/hotspot/jtreg/compiler/longcountedloops/TestShortRunningLongCountedLoopPredicatesClone.java
>
>    Co-authored-by: Christian Hagedorn
>  - Update test/hotspot/jtreg/compiler/longcountedloops/TestShortRunningLongCountedLoop.java
>
>    Co-authored-by: Christian Hagedorn
>  - Update test/hotspot/jtreg/compiler/longcountedloops/TestShortRunningIntLoopWithLongChecksPredicates.java
>
>    Co-authored-by: Christian Hagedorn
>  - Update src/hotspot/share/opto/loopnode.cpp
>
>    Co-authored-by: Christian Hagedorn
>  - ... and 84 more: https://git.openjdk.org/jdk/compare/faf19abd...fd19ee84

I did some more tests with and without this patch on some off-heap memory segment loops. The benchmark I used can be found here:

https://github.com/mcimadamore/jdk/blob/fmaUnsafeBench/test/micro/org/openjdk/bench/java/lang/foreign/OffHeapAccessLoop.java

These are the results with vanilla JDK:

Benchmark                           (elems)  Mode  Cnt    Score   Error  Units
OffHeapAccessLoop.segmentReadLoop         1  avgt   30    2.167 ± 0.078  ns/op
OffHeapAccessLoop.segmentReadLoop        10  avgt   30    6.860 ± 0.098  ns/op
OffHeapAccessLoop.segmentReadLoop       100  avgt   30   17.223 ± 0.287  ns/op
OffHeapAccessLoop.segmentReadLoop      1000  avgt   30  118.933 ± 1.828  ns/op
OffHeapAccessLoop.segmentWriteLoop        1  avgt   30    2.089 ± 0.030  ns/op
OffHeapAccessLoop.segmentWriteLoop       10  avgt   30    8.125 ± 0.035  ns/op
OffHeapAccessLoop.segmentWriteLoop      100  avgt   30   11.494 ± 0.781  ns/op
OffHeapAccessLoop.segmentWriteLoop     1000  avgt   30   33.904 ± 0.327  ns/op
OffHeapAccessLoop.unsafeReadLoop          1  avgt   30    1.401 ± 0.030  ns/op
OffHeapAccessLoop.unsafeReadLoop         10  avgt   30    3.051 ± 0.048  ns/op
OffHeapAccessLoop.unsafeReadLoop        100  avgt   30   12.972 ± 0.213  ns/op
OffHeapAccessLoop.unsafeReadLoop       1000  avgt   30  114.150 ± 1.868  ns/op
OffHeapAccessLoop.unsafeWriteLoop         1  avgt   30    1.400 ± 0.017  ns/op
OffHeapAccessLoop.unsafeWriteLoop        10  avgt   30    2.849 ± 0.038  ns/op
OffHeapAccessLoop.unsafeWriteLoop       100  avgt   30   14.591 ± 0.179  ns/op
OffHeapAccessLoop.unsafeWriteLoop      1000  avgt   30  147.612 ± 2.418  ns/op

(Note: segment/write is significantly faster, as it uses vectorization -- in all other cases vectorization fails, so the numbers are not always comparable with each other).

This is what I get with this PR:

Benchmark                           (elems)  Mode  Cnt    Score   Error  Units
OffHeapAccessLoop.segmentReadLoop         1  avgt   30    2.129 ± 0.048  ns/op
OffHeapAccessLoop.segmentReadLoop        10  avgt   30    5.051 ± 0.078  ns/op
OffHeapAccessLoop.segmentReadLoop       100  avgt   30   15.119 ± 0.110  ns/op
OffHeapAccessLoop.segmentReadLoop      1000  avgt   30  115.040 ± 0.685  ns/op
OffHeapAccessLoop.segmentWriteLoop        1  avgt   30    2.143 ± 0.013  ns/op
OffHeapAccessLoop.segmentWriteLoop       10  avgt   30    6.290 ± 0.017  ns/op
OffHeapAccessLoop.segmentWriteLoop      100  avgt   30    9.766 ± 0.072  ns/op
OffHeapAccessLoop.segmentWriteLoop     1000  avgt   30   32.294 ± 0.078  ns/op
OffHeapAccessLoop.unsafeReadLoop          1  avgt   30    1.403 ± 0.031  ns/op
OffHeapAccessLoop.unsafeReadLoop         10  avgt   30    3.058 ± 0.067  ns/op
OffHeapAccessLoop.unsafeReadLoop        100  avgt   30   12.617 ± 0.343  ns/op
OffHeapAccessLoop.unsafeReadLoop       1000  avgt   30  112.505 ± 0.581  ns/op
OffHeapAccessLoop.unsafeWriteLoop         1  avgt   30    1.392 ± 0.017  ns/op
OffHeapAccessLoop.unsafeWriteLoop        10  avgt   30    2.835 ± 0.053  ns/op
OffHeapAccessLoop.unsafeWriteLoop       100  avgt   30   14.738 ± 0.278  ns/op
OffHeapAccessLoop.unsafeWriteLoop      1000  avgt   30  145.214 ± 2.302  ns/op

The results are very positive. Note how the overhead for small iteration counts (10/100) is lower than before, which is very good, as this is an area where memory segment access was struggling a bit.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21630#issuecomment-2997112769 From duke at openjdk.org Mon Jun 23 16:47:28 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Mon, 23 Jun 2025 16:47:28 GMT Subject: RFR: 8358655: AArch64: Simplify Interpreter::profile_taken_branch [v2] In-Reply-To: References: Message-ID: <3-xpN0rH7hwF8i0vJSNv6TB8y0ZQyJWfEBIhIB-HM7s=.8cd15375-2127-4e25-90cd-47d4d4644c43@github.com> On Mon, 23 Jun 2025 08:29:58 GMT, Aleksey Shipilev wrote: > Looks okay, as long as it still passes testing. Linux jtreg tests I ran passed. Windows GHA actions is failing but seems unrelated ------------- PR Comment: https://git.openjdk.org/jdk/pull/25906#issuecomment-2997169686 From shade at openjdk.org Mon Jun 23 16:52:29 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 23 Jun 2025 16:52:29 GMT Subject: RFR: 8358655: AArch64: Simplify Interpreter::profile_taken_branch [v2] In-Reply-To: References: Message-ID: On Fri, 20 Jun 2025 21:54:45 GMT, Chad Rakoczy wrote: >> [JDK-8358655](https://bugs.openjdk.org/browse/JDK-8358655) >> >> The aarch64 version of [JDK-8357434](https://bugs.openjdk.org/browse/JDK-8357434) >> >> The counter is 64-bit, never practically overflows, and no other code cares about it so it is safe to remove >> >> Additional Testing: >> - [x] Linux aarch64 fastdebug tier 1 >> - [x] Linux aarch64 fastdebug tier 2 >> - [x] Linux aarch64 fastdebug tier 3 >> - [x] Linux aarch64 fastdebug tier 4 > > Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: > > Remove r1 save Windows GHA is fixed by https://github.com/openjdk/jdk/commit/72679c94ee00c87b9b51233938e5ffa97ef825b1, you can pull from master, and it should fix it. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25906#issuecomment-2997186652 From duke at openjdk.org Mon Jun 23 16:58:02 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Mon, 23 Jun 2025 16:58:02 GMT Subject: RFR: 8358655: AArch64: Simplify Interpreter::profile_taken_branch [v3] In-Reply-To: References: Message-ID: <-tSgNK6pgypChOHjH5bn0fpOZB3IaAdU2P_ISNqGcTo=.f635d89d-5df9-494d-91a7-23ed67cbdd7c@github.com> > [JDK-8358655](https://bugs.openjdk.org/browse/JDK-8358655) > > The aarch64 version of [JDK-8357434](https://bugs.openjdk.org/browse/JDK-8357434) > > The counter is 64-bit, never practically overflows, and no other code cares about it so it is safe to remove > > Additional Testing: > - [x] Linux aarch64 fastdebug tier 1 > - [x] Linux aarch64 fastdebug tier 2 > - [x] Linux aarch64 fastdebug tier 3 > - [x] Linux aarch64 fastdebug tier 4 Chad Rakoczy has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: - Merge remote-tracking branch 'origin/master' into JDK-8358655-profile_taken_branch - Remove r1 save - 8358655: AArch64: Simplify Interpreter::profile_taken_branch ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25906/files - new: https://git.openjdk.org/jdk/pull/25906/files/c2d5ef37..29cd0043 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25906&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25906&range=01-02 Stats: 7765 lines in 302 files changed: 2952 ins; 2951 del; 1862 mod Patch: https://git.openjdk.org/jdk/pull/25906.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25906/head:pull/25906 PR: https://git.openjdk.org/jdk/pull/25906 From duke at openjdk.org Mon Jun 23 19:42:35 2025 From: duke at openjdk.org (duke) Date: Mon, 23 Jun 2025 19:42:35 GMT Subject: RFR: 8358655: AArch64: Simplify Interpreter::profile_taken_branch [v3] In-Reply-To: <-tSgNK6pgypChOHjH5bn0fpOZB3IaAdU2P_ISNqGcTo=.f635d89d-5df9-494d-91a7-23ed67cbdd7c@github.com> References: <-tSgNK6pgypChOHjH5bn0fpOZB3IaAdU2P_ISNqGcTo=.f635d89d-5df9-494d-91a7-23ed67cbdd7c@github.com> Message-ID: On Mon, 23 Jun 2025 16:58:02 GMT, Chad Rakoczy wrote: >> [JDK-8358655](https://bugs.openjdk.org/browse/JDK-8358655) >> >> The aarch64 version of [JDK-8357434](https://bugs.openjdk.org/browse/JDK-8357434) >> >> The counter is 64-bit, never practically overflows, and no other code cares about it so it is safe to remove >> >> Additional Testing: >> - [x] Linux aarch64 fastdebug tier 1 >> - [x] Linux aarch64 fastdebug tier 2 >> - [x] Linux aarch64 fastdebug tier 3 >> - [x] Linux aarch64 fastdebug tier 4 > > Chad Rakoczy has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision:
>
>  - Merge remote-tracking branch 'origin/master' into JDK-8358655-profile_taken_branch
>  - Remove r1 save
>  - 8358655: AArch64: Simplify Interpreter::profile_taken_branch

@chadrako
Your change (at version 29cd00438e6ecb1b90c60f6b016281e1c29354ef) is now ready to be sponsored by a Committer.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25906#issuecomment-2997732228

From liach at openjdk.org Mon Jun 23 23:13:31 2025
From: liach at openjdk.org (Chen Liang)
Date: Mon, 23 Jun 2025 23:13:31 GMT
Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise [v3]
In-Reply-To:
References:
Message-ID:

On Mon, 23 Jun 2025 07:56:48 GMT, Qizheng Xing wrote:

>> The result of count leading/trailing zeros is always non-negative, and the maximum value is integer type's size in bits. In previous versions, when C2 can not know the operand value of a CLZ/CTZ node at compile time, it will generate a full-width integer type for its result. This can significantly affect the efficiency of code in some cases.
>>
>> This patch makes the type of CLZ/CTZ nodes more precise, to make C2 generate better code. For example, the following implementation runs ~115% faster on x86-64 with this patch:
>>
>>
>> public static int numberOfNibbles(int i) {
>>   int mag = Integer.SIZE - Integer.numberOfLeadingZeros(i);
>>   return Math.max((mag + 3) / 4, 1);
>> }
>>
>>
>> Testing: tier1, IR test
>
> Qizheng Xing has updated the pull request incrementally with one additional commit since the last revision:
>
>   Narrow type bound

src/hotspot/share/opto/countbitsnode.cpp line 61:

> 59:                          ti->_widen);
> 60:   }
> 61:   return TypeInt::INT;

Just curious, when would this fallback path be used?
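To make the range argument in the quoted description concrete, here is a small stand-alone sketch. It is not code from the PR: `clz32`, `number_of_nibbles`, `ClzRange` and `clz_range` are hypothetical names, and the known-bits bound is my reading of the two expressions quoted in the `TestCountBitsRange.java` review comment in this thread (`~zeros` is the largest value the input can take, `ones` the smallest, when both are read as unsigned bit patterns).

```cpp
#include <stdint.h>
#include <algorithm>
#include <cassert>

// Portable count-leading-zeros for 32-bit values. Returns 32 for x == 0,
// like Java's Integer.numberOfLeadingZeros, so the result is in [0, 32].
int clz32(uint32_t x) {
  int n = 0;
  for (uint32_t probe = UINT32_C(1) << 31; probe != 0 && (x & probe) == 0; probe >>= 1) {
    n++;
  }
  return n;
}

// C++ port of the numberOfNibbles example from the PR description.
// Since clz32 is in [0, 32], mag is in [0, 32] and the result is in [1, 8].
int number_of_nibbles(int32_t i) {
  int mag = 32 - clz32(static_cast<uint32_t>(i));
  return std::max((mag + 3) / 4, 1);
}

struct ClzRange {
  int lo;
  int hi;
};

// Assumed semantics: known_zeros marks bits that are certainly 0 and
// known_ones marks bits that are certainly 1. Every possible value v then
// satisfies known_ones <= v <= ~known_zeros bitwise, so
// clz32(~known_zeros) <= clz32(v) <= clz32(known_ones).
ClzRange clz_range(uint32_t known_zeros, uint32_t known_ones) {
  return ClzRange{clz32(~known_zeros), clz32(known_ones)};
}
```

With no known bits at all, `clz_range(0, 0)` gives the full `[0, 32]` range, which matches the bound stated in the description.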
test/hotspot/jtreg/compiler/c2/irTests/TestCountBitsRange.java line 47: > 45: @IR(failOn = IRNode.COUNT_LEADING_ZEROS_I) > 46: public boolean clzCompareInt() { > 47: return Integer.numberOfLeadingZeros(i) < 0 || Integer.numberOfLeadingZeros(i) > 32; You limited the type to check the ones/zeroes of the input value, `count_leading_zeros_int(~ti->_bits._zeros), count_leading_zeros_int(ti->_bits._ones),`. I think we need test cases to cover those too, given they are more complex and thus more error prone. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25928#discussion_r2162669197 PR Review Comment: https://git.openjdk.org/jdk/pull/25928#discussion_r2162666011 From dlong at openjdk.org Mon Jun 23 23:24:45 2025 From: dlong at openjdk.org (Dean Long) Date: Mon, 23 Jun 2025 23:24:45 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v31] In-Reply-To: References: Message-ID: On Thu, 19 Jun 2025 22:06:31 GMT, Chad Rakoczy wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). >> >> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. >> >> This change only slightly modifies existing code paths and therefore does not benefit much from existing tests. New tests were created to test the new functionality >> >> Additional Testing: >> - [ ] Linux x64 fastdebug all >> - [ ] Linux aarch64 fastdebug all >> - [ ] ... > > Chad Rakoczy has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 88 commits: > > - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final > - Fix pointer printing > - Use set_destination_mt_safe > - Print address as pointer > - Use new _metadata_size instead of _jvmci_data_size > - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final > - Only check branch distance for aarch64 and riscv > - Move far branch fix to fix_relocation_after_move > - Move far branch fix to fix_relocation_after_move > - Add test to verify JVMTI events during nmethod relocation > - ... and 78 more: https://git.openjdk.org/jdk/compare/0dd50dbb...e51a1a09 The extra work that CallRelocation::fix_relocation_after_move and Relocation::pd_set_call_destination are now doing is only needed for nmethod relocation, right? We could avoid doing extra work for the old CodeBuffer::relocate_code_to() path by adding a flag to CallRelocation::fix_relocation_after_move() (or creating a new function) and then pass that flag on to pd_set_call_destination. In particular, it would be nice to avoid unnecessary calls to CodeCache::find_blob(). ------------- PR Comment: https://git.openjdk.org/jdk/pull/23573#issuecomment-2998261601 From duke at openjdk.org Tue Jun 24 02:37:43 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Tue, 24 Jun 2025 02:37:43 GMT Subject: Integrated: 8358655: AArch64: Simplify Interpreter::profile_taken_branch In-Reply-To: References: Message-ID: On Thu, 19 Jun 2025 21:17:30 GMT, Chad Rakoczy wrote: > [JDK-8358655](https://bugs.openjdk.org/browse/JDK-8358655) > > The aarch64 version of [JDK-8357434](https://bugs.openjdk.org/browse/JDK-8357434) > > The counter is 64-bit, never practically overflows, and no other code cares about it so it is safe to remove > > Additional Testing: > - [x] Linux aarch64 fastdebug tier 1 > - [x] Linux aarch64 fastdebug tier 2 > - [x] Linux aarch64 fastdebug tier 3 > - [x] Linux aarch64 fastdebug tier 4 This pull request has now been integrated. 
Changeset: a350a111 Author: Chad Rakoczy Committer: SendaoYan URL: https://git.openjdk.org/jdk/commit/a350a1115a32ae1aa013a22c05a009051a674793 Stats: 24 lines in 3 files changed: 0 ins; 18 del; 6 mod 8358655: AArch64: Simplify Interpreter::profile_taken_branch Reviewed-by: shade, aph ------------- PR: https://git.openjdk.org/jdk/pull/25906 From amitkumar at openjdk.org Tue Jun 24 04:01:27 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 24 Jun 2025 04:01:27 GMT Subject: RFR: 8358694: VM asserts if CodeCacheSegmentSize is not a power of 2 [v8] In-Reply-To: <_SfCjE8oyyD-grL6AAih23j1Qx7fbYefTVzl-BU5N2k=.7bd5d419-4127-4e78-926e-fece25a7d914@github.com> References: <_SfCjE8oyyD-grL6AAih23j1Qx7fbYefTVzl-BU5N2k=.7bd5d419-4127-4e78-926e-fece25a7d914@github.com> Message-ID: > Makes sure that JVM exits gracefully when `CodeCacheSegmentSize` is not a power of 2. Amit Kumar has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains nine commits: - Merge branch 'master' into codecache - remove string check & update copyright header - updates testcase - remove whitespace - fix comment - add test case - fix - move the changes in flag constraints specific file - make jvm exit gracefully ------------- Changes: https://git.openjdk.org/jdk/pull/25708/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25708&range=07 Stats: 85 lines in 2 files changed: 84 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25708.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25708/head:pull/25708 PR: https://git.openjdk.org/jdk/pull/25708 From amitkumar at openjdk.org Tue Jun 24 04:03:42 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 24 Jun 2025 04:03:42 GMT Subject: RFR: 8358694: VM asserts if CodeCacheSegmentSize is not a power of 2 [v9] In-Reply-To: <_SfCjE8oyyD-grL6AAih23j1Qx7fbYefTVzl-BU5N2k=.7bd5d419-4127-4e78-926e-fece25a7d914@github.com> References: 
<_SfCjE8oyyD-grL6AAih23j1Qx7fbYefTVzl-BU5N2k=.7bd5d419-4127-4e78-926e-fece25a7d914@github.com> Message-ID: > Makes sure that JVM exits gracefully when `CodeCacheSegmentSize` is not a power of 2. Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: uintx to size_t ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25708/files - new: https://git.openjdk.org/jdk/pull/25708/files/41cc96ed..a8437a4f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25708&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25708&range=07-08 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25708.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25708/head:pull/25708 PR: https://git.openjdk.org/jdk/pull/25708 From mhaessig at openjdk.org Tue Jun 24 06:57:32 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Tue, 24 Jun 2025 06:57:32 GMT Subject: Integrated: 8353815: [ubsan] compilationPolicy.cpp: division by zero related to tiered compilation flags In-Reply-To: References: Message-ID: On Thu, 19 Jun 2025 15:15:16 GMT, Manuel Hässig wrote: > A run of `runtime/CommandLine/OptionsValidation/TestOptionsWithRanges.java` with a ubsan-enabled binary revealed that passing the value 0 to `Tier(3|4)LoadFeedback` or `TieredRateUpdateMinTime` leads to division by zero. > > Since a value of 0 for `Tier(3|4)LoadFeedback` should disable the scaling of the compilation thresholds, 8bf37ee special-cases 0 to disable scaling and documents it accordingly.
> > 4893b28 sets the lower limit for `TieredRateUpdate(Min|Max)Time` to 1 since the code assumes that at least 1ms passes between each event: > > https://github.com/openjdk/jdk/blob/c4fb00a7be51c7a05a29d3d57d787feb5c698ddf/src/hotspot/share/compiler/compilationPolicy.cpp#L968-L974 > > This PR was tested with: > - [ ] [GitHub Actions](https://github.com/mhaessig/jdk/actions/runs/15760915006) > - [x] tier1 and 2 plus Oracle internal testing on Oracle supported platforms This pull request has now been integrated. Changeset: f6ff38ab Author: Manuel Hässig URL: https://git.openjdk.org/jdk/commit/f6ff38ab4292762a35fb151b6886e58df60824d5 Stats: 7 lines in 2 files changed: 2 ins; 0 del; 5 mod 8353815: [ubsan] compilationPolicy.cpp: division by zero related to tiered compilation flags Reviewed-by: mbaesken, kvn ------------- PR: https://git.openjdk.org/jdk/pull/25902 From mhaessig at openjdk.org Tue Jun 24 07:03:32 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Tue, 24 Jun 2025 07:03:32 GMT Subject: RFR: 8358572: C1 hits "need debug information" assert with -XX:-DeoptC1 In-Reply-To: References: Message-ID: <7apV6i0H5OSpib_-laMaBToarnQtaJv2egG7DitOX8U=.5e60985f-21f1-471b-b8e5-6af1486b7381@github.com> On Thu, 19 Jun 2025 14:53:50 GMT, Manuel Hässig wrote: > The debug flag `DeoptC1` is required to be true for dependency recording by an assert, but not all uses of dependency recording in C1 are guarded with `if (DeoptC1)`. Hence, running `java -XX:-DeoptC1 -version` fails at the aforementioned assert. > > This error has been present unconditionally in debug builds since dependency recording was enabled outside of JVMTI in [JDK-8324241](https://bugs.openjdk.org/browse/JDK-8324241) and at least since JDK7 with JVMTI. This issue was discovered by searching for crashes of `java -version` plus some other flag, which indicates that this flag has not been used in at least one year, since every invocation with `-XX:-DeoptC1` crashes.
Further, `DeoptC1` is only used for guarding dependency recording in three places. Thus, this PR removes the `DeoptC1` flag. > > This was tested with: > - [ ] [GitHub Actions](https://github.com/mhaessig/jdk/actions/runs/15760179189) > - [x] tier1, tier2 plus Oracle internal testing on Oracle supported platforms Thank you for your reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25900#issuecomment-2999067741 From mhaessig at openjdk.org Tue Jun 24 07:03:33 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Tue, 24 Jun 2025 07:03:33 GMT Subject: Integrated: 8358572: C1 hits "need debug information" assert with -XX:-DeoptC1 In-Reply-To: References: Message-ID: <55dUcmHmHGxxWgvkv9n9dkRHoaLyf77T12WXB72c37o=.55e7e5d0-493a-4ff7-8b71-b484cbd6d1eb@github.com> On Thu, 19 Jun 2025 14:53:50 GMT, Manuel Hässig wrote: > The debug flag `DeoptC1` is required to be true for dependency recording by an assert, but not all uses of dependency recording in C1 are guarded with `if (DeoptC1)`. Hence, running `java -XX:-DeoptC1 -version` fails at the aforementioned assert. > > This error has been present unconditionally in debug builds since dependency recording was enabled outside of JVMTI in [JDK-8324241](https://bugs.openjdk.org/browse/JDK-8324241) and at least since JDK7 with JVMTI. This issue was discovered by searching for crashes of `java -version` plus some other flag, which indicates that this flag has not been used in at least one year, since every invocation with `-XX:-DeoptC1` crashes. Further, `DeoptC1` is only used for guarding dependency recording in three places. Thus, this PR removes the `DeoptC1` flag. > > This was tested with: > - [ ] [GitHub Actions](https://github.com/mhaessig/jdk/actions/runs/15760179189) > - [x] tier1, tier2 plus Oracle internal testing on Oracle supported platforms This pull request has now been integrated.
Changeset: 03d66d9e Author: Manuel Hässig URL: https://git.openjdk.org/jdk/commit/03d66d9ee239d77d54912f4fa3074560ac2a8101 Stats: 7 lines in 3 files changed: 0 ins; 4 del; 3 mod 8358572: C1 hits "need debug information" assert with -XX:-DeoptC1 Reviewed-by: shade, aph ------------- PR: https://git.openjdk.org/jdk/pull/25900 From qxing at openjdk.org Tue Jun 24 07:37:30 2025 From: qxing at openjdk.org (Qizheng Xing) Date: Tue, 24 Jun 2025 07:37:30 GMT Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise [v3] In-Reply-To: <2vGPKe7ESZqYjemMvDjFxb4QTk3VjybE0lk59Vqj1Ts=.e6a555a5-407b-4389-8db5-aa02a7de9960@github.com> References: <2vGPKe7ESZqYjemMvDjFxb4QTk3VjybE0lk59Vqj1Ts=.e6a555a5-407b-4389-8db5-aa02a7de9960@github.com> Message-ID: On Mon, 23 Jun 2025 10:54:57 GMT, Quan Anh Mai wrote: >> src/hotspot/share/opto/countbitsnode.cpp line 46: >> >>> 44: static int count_trailing_zeros_long(jlong l) { >>> 45: return l == 0 ? BitsPerLong : count_trailing_zeros(l); >>> 46: } >> >> Can you explain why you need this? >> Why is `count_trailing_zeros` and `count_leading_zeros` not enough, when you cast at the use-site? > > This is because our implementation does not accept 0 as an input. I suggest doing this at `count_leading_zeros`, it makes more sense and also aligns our behaviour with the well-known [`countr_zero`](https://en.cppreference.com/w/cpp/numeric/countr_zero.html) and [`countl_zero`](https://en.cppreference.com/w/cpp/numeric/countl_zero.html) > Can you explain why you need this? Why is `count_trailing_zeros` and `count_leading_zeros` not enough, when you cast at the use-site? @eme64 The explanation of @merykitty is right: the implementations of `count_leading_zeros` and `count_trailing_zeros` reject zero as input. Perhaps we could open another PR to add zero support for these functions, since it's less relevant to this node type change and might require other changes to the code that calls them.
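The zero-handling discussed above can be sketched as follows. This is only an illustration under stated assumptions, not the HotSpot code: `clz32_nonzero` and `ctz64_nonzero` are hypothetical stand-ins for primitives that, like `count_leading_zeros` and `count_trailing_zeros`, must not be called with 0, and `clz32_or_width` / `ctz64_or_width` are hypothetical wrappers in the style of the quoted `count_trailing_zeros_long` hunk, returning the bit width for a zero input (the same convention as `countl_zero` / `countr_zero`).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical primitive: number of trailing zero bits; input must be non-zero.
inline int ctz64_nonzero(std::uint64_t x) {
  assert(x != 0);
  int n = 0;
  while ((x & 1u) == 0) {  // shift right until the lowest set bit reaches bit 0
    x >>= 1;
    ++n;
  }
  return n;
}

// Hypothetical primitive: number of leading zero bits; input must be non-zero.
inline int clz32_nonzero(std::uint32_t x) {
  assert(x != 0);
  int n = 0;
  while ((x & 0x80000000u) == 0) {  // shift left until the highest set bit reaches bit 31
    x <<= 1;
    ++n;
  }
  return n;
}

// Zero-safe wrappers: a zero input yields the full bit width.
inline int ctz64_or_width(std::uint64_t x) {
  return x == 0 ? 64 : ctz64_nonzero(x);
}

inline int clz32_or_width(std::uint32_t x) {
  return x == 0 ? 32 : clz32_nonzero(x);
}
```

Keeping the zero check in a wrapper leaves the primitive's precondition untouched, which is the trade-off raised in the thread: moving the check into the primitive itself would affect every existing caller.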
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25928#discussion_r2163180590 From mchevalier at openjdk.org Tue Jun 24 07:38:21 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Tue, 24 Jun 2025 07:38:21 GMT Subject: RFR: 8359344: C2: Malformed control flow after intrinsic bailout Message-ID: When an intrinsic bails out, we assume that the control in the `LibraryCallKit` did not change: https://github.com/openjdk/jdk/blob/c4fb00a7be51c7a05a29d3d57d787feb5c698ddf/src/hotspot/share/opto/library_call.cpp#L137 This is enforced by restoring the old state, like in https://github.com/openjdk/jdk/blob/c4fb00a7be51c7a05a29d3d57d787feb5c698ddf/src/hotspot/share/opto/library_call.cpp#L1722-L1732 That is good, but not sufficient. First, and most obviously, one could have already built some structure without moving the control. For instance, we can obtain something such as: ![1 after-intrinsic-bailout-during-late-inlining](https://github.com/user-attachments/assets/2fd255cc-0bfc-4841-8dd1-f64d502e0ee1) Here, during late inlining, the call `323` is a candidate for inlining, but the intrinsic bails out. Yet a call to `make_unsafe_address` was made, which built the node `354 If` and everything under it. This is needed because tests are made on the resulting nodes (especially `366 AddP`) to know whether we should bail out or not. In the end, we get two control successors of `346 IfFalse`: the call, which is not removed, and the leftover of the intrinsic, which will be cleaned up much later, but not by RemoveUseless. Another situation, which occurs during parsing, is somewhat worse. It can lead to cases such as: ![2 after-intrinsic-bailout-during-parsing](https://github.com/user-attachments/assets/4524c615-6521-4f0d-8f61-c426f9179035) The nodes `31 OpaqueNotNull`, `31 If`, `36 IfTrue`, `33 IfFalse`, `35 Halt`, `44 If`, `45 IfTrue`, `46 IfFalse` are leftovers from an intrinsic that bailed out.
The replacement call `49 CallStaticJava` should come just under `5 Parm`, but the control was updated and the call is actually built under `36 If`. Then why doesn't the previous assert complain? This is because there is more than one control, or one map. In intrinsics that need to restore their state, the initial `SafePoint` map is cloned, the clone is kept aside, and if needed (when bailing out), we set the current map to this saved clone. But there is another map, from which the one of the `LibraryCallKit` comes and which survives longer: the one contained in the `JVMState`: https://github.com/openjdk/jdk/blob/c4fb00a7be51c7a05a29d3d57d787feb5c698ddf/src/hotspot/share/opto/library_call.cpp#L101-L102 And here is the challenge: - the `JVMState jvms` contains a `SafePoint` map, and this map must have `jvms` as its `jvms` (pointer comparison) - we can't really change the pointer, just the content - after bailing out, we need the map of the `jvms` to be where it was, so that the graph construction can continue where it was. - if the intrinsic tried building some control flow, but we don't need it, we should remove it. So... let's do that! When an intrinsic bails out and regrets its choice, we need to have remembered the old `JVMState`, set its map correctly, set the `jvms` of that map correctly, and restore the map and sp of the `LibraryCallKit` as was done before. On top of that, we remember the control nodes that existed under our `control()` before trying to intrinsify: new control nodes that are not the (new) current map (= the clone of the map before) are disconnected to leave a nice CFG. This has two interesting consequences: - in the case of `compiler/intrinsics/VectorIntoArrayInvalidControlFlow.java`, compilation used to bail out because of a malformed CFG on AArch64. I'm adding a test that reproduces on x64 and AArch64 and turns the compilation bailout into a crash. The graph is now correctly shaped on intrinsic bailout.
- in the case of `compiler/unsafe/OpaqueAccesses.java`, the whole (useless) structure introduced by the bailing out intrinsic is now removed. The call is now connected to where the intrinsic started, not where it ended (as shown below). I've adapted this test to check that we don't have both a call and an intrinsic leftover. Some of these cases are intrinsified, some are left as a call, and I don't want to be too strict about which must be which, as long as they are not both at the same time. ![3 after-intrinsic-bailout-during-parsing](https://github.com/user-attachments/assets/26652b49-4875-4953-aea5-6fa549f06822) Thanks, Marc ------------- Commit messages: - whoops Forgot to remove a bit, and restore sp - Urgh - Adapt test - Re-try - Fix test - Trying something Changes: https://git.openjdk.org/jdk/pull/25936/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25936&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8359344 Stats: 373 lines in 7 files changed: 277 ins; 50 del; 46 mod Patch: https://git.openjdk.org/jdk/pull/25936.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25936/head:pull/25936 PR: https://git.openjdk.org/jdk/pull/25936 From duke at openjdk.org Tue Jun 24 08:12:34 2025 From: duke at openjdk.org (David Beaumont) Date: Tue, 24 Jun 2025 08:12:34 GMT Subject: Integrated: 8360131: Remove use of soon-to-be-removed APIs by CTW framework In-Reply-To: References: Message-ID: On Fri, 20 Jun 2025 14:33:49 GMT, David Beaumont wrote: > Migrate the CTW framework to use only supported JRT file system access for fetching class bytes. > This avoids accessing APIs in ImageReader which are scheduled to be removed as part of preview mode class support in Valhalla (essentially these APIs are "too low level" and expose semantics that are incompatible with supporting preview classes in Valhalla).
> > This will be a further change to this code when the preview mode work goes in, but this will be limited to how the file system is opened (with or without preview mode). This pull request has now been integrated. Changeset: fdfc5578 Author: David Beaumont Committer: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/fdfc557878a7a2ec984002f38b871da5eec71217 Stats: 57 lines in 2 files changed: 29 ins; 4 del; 24 mod 8360131: Remove use of soon-to-be-removed APIs by CTW framework Reviewed-by: liach, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/25916 From epeter at openjdk.org Tue Jun 24 08:43:38 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 24 Jun 2025 08:43:38 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v28] In-Reply-To: References: <7r3C8BAViyHKVVJjv4w0YxfIUkfk9PmY0OEt73V_aRI=.baf51fc4-d996-44d0-a1f5-10cf6dc4de8d@github.com> Message-ID: On Thu, 12 Jun 2025 15:40:49 GMT, Roland Westrelin wrote: >> @rwestrel Let me know if you want us to run some extra testing. Christian said that you might be planning to wait until the JDK26 fork, and merge then, and then we can run testing. Up to you :) > > @eme64 in case you forgot about that one, it's ready for another round of reviews. @rwestrel I think I missed your message. This will be one of the next things I'll review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21630#issuecomment-2999372053 From tschatzl at openjdk.org Tue Jun 24 08:57:31 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 24 Jun 2025 08:57:31 GMT Subject: RFR: 8350621: Code cache stops scheduling GC In-Reply-To: References: Message-ID: On Sun, 16 Feb 2025 18:39:29 GMT, Alexandre Jacob wrote: > The purpose of this PR is to fix a bug where we can end up in a situation where the GC is not scheduled anymore by `CodeCache`. 
> > This situation is possible because the `_unloading_threshold_gc_requested` flag is set to `true` when triggering the GC, and we expect the GC to call `CodeCache::on_gc_marking_cycle_finish`, which in turn will call `CodeCache::update_cold_gc_count`, which will reset the flag `_unloading_threshold_gc_requested`, allowing further GC scheduling. > > Unfortunately, this can't work properly under certain circumstances. > For example, when using G1GC, calling `G1CollectedHeap::collect` does not guarantee that the GC will actually run, as it may already be running (see [here](https://github.com/openjdk/jdk/blob/7d11418c820b46926a25907766d16083a4b349de/src/hotspot/share/gc/g1/g1CollectedHeap.cpp#L1763)). > > I have observed this behavior on JVMs running version 21 that were recently migrated from Java 17. > Those JVMs have some pressure on the code cache and quite a large heap in comparison to the allocation rate, which means that objects are mostly GC'd by young collections and full GCs take a long time to happen. > > I have been able to reproduce this issue with ParallelGC and G1GC, and I imagine that other GCs can be impacted as well.
> > In order to reproduce this issue, I found a very simple and convenient way: > > > public class CodeCacheMain { > public static void main(String[] args) throws InterruptedException { > while (true) { > Thread.sleep(100); > } > } > } > > > Run this simple app with the following JVM flags: > > > -Xlog:gc*=info,codecache=info -Xmx512m -XX:ReservedCodeCacheSize=2496k -XX:StartAggressiveSweepingAt=15 > > > - 512m for the heap just to clarify the intent that we don't want to be bothered by a full GC > - low `ReservedCodeCacheSize` to put pressure on code cache quickly > - `StartAggressiveSweepingAt` can be set to 20 or 15 for faster bug reproduction > > Itself, the program will hardly get pressure on code cache, but the good news is that it is sufficient to attach a jconsole on it which will: > - allows us to monitor code cache > - indirectly generate activity on the code cache, just what we need to reproduce the bug > > Some logs related to code cache will show up at some point with GC activity: > > > [648.733s][info][codecache ] Triggering aggressive GC due to having only 14.970% free memory > > > And then it will stop and we'll end up with the following message: > > > [672.714s][info][codecache ] Code cache is full - disabling compilation > > > L... > Why making sure only one thread calls `collect(...)`? I believe this API can be invoked concurrently. > I have a question regarding the existing code/logic. > > ``` > // In case the GC is concurrent, we make sure only one thread requests the GC. > if (Atomic::cmpxchg(&_unloading_threshold_gc_requested, false, true) == false) { > log_info(codecache)("Triggering aggressive GC due to having only %.3f%% free memory", free_ratio * 100.0); > Universe::heap()->collect(GCCause::_codecache_GC_aggressive); > } > ``` > > Why making sure only one thread calls `collect(...)`? I believe this API can be invoked concurrently. > > Would removing `_unloading_threshold_gc_requested` resolve this problem? 
It does, at the cost of many log messages: [0.047s][info][gc ] GC(0) Pause Young (Concurrent Start) (CodeCache GC Threshold) 2M->1M(512M) 4.087ms [0.047s][info][gc,cpu ] GC(0) User=0.01s Sys=0.00s Real=0.00s [0.047s][info][gc ] GC(1) Concurrent Mark Cycle [0.047s][info][gc,marking ] GC(1) Concurrent Scan Root Regions [0.048s][info][codecache ] Triggering threshold (7.654%) GC due to allocating 48.973% since last unloading (0.000% used -> 48.973% used) [0.048s][info][gc,marking ] GC(1) Concurrent Scan Root Regions 0.147ms [0.048s][info][gc,marking ] GC(1) Concurrent Mark [0.048s][info][gc,marking ] GC(1) Concurrent Mark From Roots [0.048s][info][codecache ] Triggering threshold (7.646%) GC due to allocating 49.028% since last unloading (0.000% used -> 49.028% used) [0.048s][info][codecache ] Triggering threshold (7.646%) GC due to allocating 49.028% since last unloading (0.000% used -> 49.028% used) [0.048s][info][codecache ] Triggering threshold (7.633%) GC due to allocating 49.114% since last unloading (0.000% used -> 49.114% used) [0.049s][info][gc,task ] GC(1) Using 6 workers of 6 for marking [0.049s][info][codecache ] Triggering threshold (7.625%) GC due to allocating 49.169% since last unloading (0.000% used -> 49.169% used) [0.049s][info][codecache ] Triggering threshold (7.616%) GC due to allocating 49.224% since last unloading (0.000% used -> 49.224% used) [...repeated 15 times...] 
[0.063s][info][codecache ] Triggering threshold (7.527%) GC due to allocating 49.820% since last unloading (0.000% used -> 49.820% used) [0.065s][info][codecache ] Triggering threshold (7.519%) GC due to allocating 49.875% since last unloading (0.000% used -> 49.875% used) [0.067s][info][codecache ] Triggering threshold (7.511%) GC due to allocating 49.930% since last unloading (0.000% used -> 49.930% used) [0.068s][info][gc,marking ] GC(1) Concurrent Mark From Roots 20.256ms [0.068s][info][gc,marking ] GC(1) Concurrent Preclean [0.068s][info][gc,marking ] GC(1) Concurrent Preclean 0.016ms [0.068s][info][gc,start ] GC(1) Pause Remark As you can see this is very annoying, particularly if the marking takes seconds all the while compiling is in progress. > > > I have been able to reproduce this issue with ParallelGC and G1GC, and I imagine that other GC can be impacted as well. > > For ParallelGC, `ParallelScavengeHeap::collect` contains the following to ensure `System.gc` gccause and similar ones guarantee a full-gc. > > ``` > if (!GCCause::is_explicit_full_gc(cause)) { > return; > } > ``` > > However, the current logic that a young-gc can cancel a full-gc (`_codecache_GC_aggressive` in this case) also seems surprising. That's a different issue. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23656#issuecomment-2999414442 From shade at openjdk.org Tue Jun 24 09:07:35 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 24 Jun 2025 09:07:35 GMT Subject: RFR: 8358694: VM asserts if CodeCacheSegmentSize is not a power of 2 [v9] In-Reply-To: References: <_SfCjE8oyyD-grL6AAih23j1Qx7fbYefTVzl-BU5N2k=.7bd5d419-4127-4e78-926e-fece25a7d914@github.com> Message-ID: On Tue, 24 Jun 2025 04:03:42 GMT, Amit Kumar wrote: >> Makes sure that JVM exits gracefully when `CodeCacheSegmentSize` is not a power of 2. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > uintx to size_t Looks okay. 
------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25708#pullrequestreview-2952856203 From qxing at openjdk.org Tue Jun 24 09:28:19 2025 From: qxing at openjdk.org (Qizheng Xing) Date: Tue, 24 Jun 2025 09:28:19 GMT Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise [v4] In-Reply-To: References: Message-ID: > The result of count leading/trailing zeros is always non-negative, and the maximum value is integer type's size in bits. In previous versions, when C2 can not know the operand value of a CLZ/CTZ node at compile time, it will generate a full-width integer type for its result. This can significantly affect the efficiency of code in some cases. > > This patch makes the type of CLZ/CTZ nodes more precise, to make C2 generate better code. For example, the following implementation runs ~115% faster on x86-64 with this patch: > > > public static int numberOfNibbles(int i) { > int mag = Integer.SIZE - Integer.numberOfLeadingZeros(i); > return Math.max((mag + 3) / 4, 1); > } > > > Testing: tier1, IR test Qizheng Xing has updated the pull request incrementally with two additional commits since the last revision: - Move `TestCountBitsRange` to `compiler.c2.gvn` - Fix null checks ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25928/files - new: https://git.openjdk.org/jdk/pull/25928/files/1cb931b1..c965311b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25928&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25928&range=02-03 Stats: 6 lines in 2 files changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/25928.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25928/head:pull/25928 PR: https://git.openjdk.org/jdk/pull/25928 From epeter at openjdk.org Tue Jun 24 09:30:32 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 24 Jun 2025 09:30:32 GMT Subject: RFR: 8351889: C2 crash: assertion failed: Base pointers must match 
(addp 344) In-Reply-To: References: Message-ID: On Wed, 4 Jun 2025 12:30:18 GMT, Roland Westrelin wrote: >>> In the example above, the CastPPs are the bases. >> >> Aaaah, ok now it makes a little more sense to me :) >> >>> > Maybe some more full IR snippets could be helpful, maybe even IGV drawings. But that may be more work for you. >>> >>> I rarely use the IGV so, yeah, that would be more work. >> >> Then what about just the dump of the relevant IR nodes in text form? That is what I meant by `full IR snippets` ;) >> >> Is there any (reasonable) way to push the `CastPP` through the `AddP` here? I guess that may mean duplicating some `AddP` in some cases... But it could also give an opportunity for the `CastPP` to common further up that way. What do you think? It is hard for me to see through it without looking at some examples of the IR. > >> Then what about just the dump of the relevant IR nodes in text form? That is what I meant by `full IR snippets` ;) > > Are the omitted inputs to `AddP`s that you'd like to see? Anything else? Do you want to see them added to: > > /-> CastPP#110 > Store#195 -> Phi#360 -> AddP#133 -> AddP#134 -> CastPP#110 > -> AddP#277 -> AddP#278 -> CastPP#283 > -> CastPP#283 > > ? > >> Is there any (reasonable) way to push the `CastPP` through the `AddP` here? I guess that may mean duplicating some `AddP` in some cases... But it could also give an opportunity for the `CastPP` to common further up that way. What do you think? It is hard for me to see through it without looking at some examples of the IR. > > That's not where C2 expects the `CastPP`s to be so I suppose it could be quite disruptive but hard for me to tell how much. Beyond that, wouldn't we need to know if one `CastPP` dominates the other `CastPP` before we can common them and would have the same issue we have here? @rwestrel > Are the omitted inputs to AddPs that you'd like to see? Anything else? 
I'm looking for the dump of all relevant nodes :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/25386#issuecomment-2995603140 From roland at openjdk.org Tue Jun 24 09:30:32 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 24 Jun 2025 09:30:32 GMT Subject: RFR: 8351889: C2 crash: assertion failed: Base pointers must match (addp 344) In-Reply-To: References: Message-ID: On Mon, 23 Jun 2025 09:09:04 GMT, Emanuel Peter wrote: > I'm looking for the dump of all relevant nodes :) https://bugs.openjdk.org/secure/attachment/115096/custom_debug.xml is an igv file with one graph per step above. Does that help, @eme64 ? ------------- PR Comment: https://git.openjdk.org/jdk/pull/25386#issuecomment-2999538478 From qxing at openjdk.org Tue Jun 24 09:31:30 2025 From: qxing at openjdk.org (Qizheng Xing) Date: Tue, 24 Jun 2025 09:31:30 GMT Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise [v3] In-Reply-To: References: Message-ID: <3PANfHJvyKO0XbT8eQ9tUin1kma8tQJt3eXtC-vnTaI=.624a186c-82cd-4292-9a8e-79b89e68d199@github.com> On Mon, 23 Jun 2025 10:40:13 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/countbitsnode.cpp line 53: >> >>> 51: if (t == Type::TOP) return Type::TOP; >>> 52: const TypeInt* ti = t->isa_int(); >>> 53: if (ti) { >> >> Implicit null check not allowed by style guide :) >> >> https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md >> >>> Do not use ints or pointers as (implicit) booleans with &&, ||, if, while. Instead, compare explicitly, i.e. if (x != 0) or if (ptr != nullptr), etc. > > I know it was like that before, but when we touch code we should fix it :) Updated. Thanks for pointing them out. 
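For readers following the 8360192 review, the bound quoted earlier in the thread, `count_leading_zeros_int(~ti->_bits._zeros), count_leading_zeros_int(ti->_bits._ones)`, can be illustrated with a small standalone sketch. It assumes that `zeros` is a mask of bits known to be 0 and `ones` a mask of bits known to be 1; that reading, along with the names `ClzRange`, `clz32` and `clz32_range`, is an assumption made for illustration and is not taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: leading zeros of a 32-bit value, 32 for input 0.
inline int clz32(std::uint32_t x) {
  int n = 0;
  while (n < 32 && (x & 0x80000000u) == 0) {
    x <<= 1;
    ++n;
  }
  return n;
}

// Inclusive range of possible leading-zero counts.
struct ClzRange {
  int lo;
  int hi;
};

// Assumption: `zeros` marks bits known to be 0, `ones` marks bits known to be 1.
// The largest value consistent with the masks is ~zeros (every unknown bit set),
// which gives the fewest leading zeros; the smallest is `ones` (every unknown
// bit clear), which gives the most.
inline ClzRange clz32_range(std::uint32_t zeros, std::uint32_t ones) {
  return ClzRange{clz32(~zeros), clz32(ones)};
}
```

With no known bits the range is the full [0, 32]; knowing that the top 24 bits are zero narrows it to [24, 32], and additionally knowing that bit 4 is set narrows it to [24, 27].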
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25928#discussion_r2163444606 From qxing at openjdk.org Tue Jun 24 09:31:32 2025 From: qxing at openjdk.org (Qizheng Xing) Date: Tue, 24 Jun 2025 09:31:32 GMT Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise [v3] In-Reply-To: References: Message-ID: On Mon, 23 Jun 2025 10:40:59 GMT, Emanuel Peter wrote: >> Qizheng Xing has updated the pull request incrementally with one additional commit since the last revision: >> >> Narrow type bound > > test/hotspot/jtreg/compiler/c2/irTests/TestCountBitsRange.java line 2: > >> 1: /* >> 2: * Copyright (c) 2025 Alibaba Group Holding Limited. All Rights Reserved. > > Can you please not use the `irTests` directory, and instead put it in a more thematically relevant one? Maybe `gvn`? Okay, moved this test to `compiler.c2.gvn`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25928#discussion_r2163446317 From snatarajan at openjdk.org Tue Jun 24 09:40:05 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Tue, 24 Jun 2025 09:40:05 GMT Subject: RFR: 8325478: Restructure the macro expansion compiler phase to not include macro elimination [v6] In-Reply-To: References: Message-ID: <-FjFhV2JHPldwZ76vO2ztYYLzyTn90vRR_dTeMrZa3I=.c26f3330-6992-46af-8f29-05af8affedc2@github.com> > This changeset restructures the macro expansion phase to not include macro elimination and also adds a flag StressMacroElimination which randomizes macro elimination ordering for stress testing purposes. > > Changes: > - Implemented a method `eliminate_opaque_looplimit_macro_nodes` that removes the functionality for eliminating Opaque and LoopLimit nodes from the `expand_macro_nodes ` method. > - Introduced compiler phases` PHASE_AFTER_MACRO_ELIMINATION` > - Added a new Ideal phase for individual macro elimination steps. > - Implemented the flag `StressMacroElimination`. 
Added functionality tests for `StressMacroElimination`, similar to previous stress flag `StressMacroExpansion` ([JDK-8317349](https://bugs.openjdk.org/browse/JDK-8317349)). > > Below is a sample screenshot (IGV print level 4 ) mainly showing the new phase . > ![image](https://github.com/user-attachments/assets/16013cd4-6ec6-4939-ac66-33bb03d59cd6) > > Questions to reviewers: > - Is the new macro elimination phase OK, or should we change anything? > - In `compile.cpp `, `PHASE_ITER_GVN_AFTER_ELIMINATION` follows `PHASE_AFTER_MACRO_ELIMINATION` in the current fix. Should `PHASE_ITER_GVN_AFTER_ELIMINATION` be removed ? > > Testing: > GitHub Actions > tier1 to tier5 on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. > Tested that thousands of graphs are correctly opened and visualized with IGV using the same test used in ([JDK-8317349](https://bugs.openjdk.org/browse/JDK-8317349)) Saranya Natarajan has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains seven commits: - merge with master #2 Merge branch 'master' of https://git.openjdk.org/jdk into JDK-8325478 - addressing review comments - update to IGV README based on review comment - addressing review comments - Merge master - addressing review on code style and adding failing - Initial Fix ------------- Changes: https://git.openjdk.org/jdk/pull/25682/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25682&range=05 Stats: 77 lines in 11 files changed: 54 ins; 8 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/25682.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25682/head:pull/25682 PR: https://git.openjdk.org/jdk/pull/25682 From qxing at openjdk.org Tue Jun 24 09:47:28 2025 From: qxing at openjdk.org (Qizheng Xing) Date: Tue, 24 Jun 2025 09:47:28 GMT Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise [v3] In-Reply-To: References: Message-ID: On Mon, 23 Jun 2025 23:07:29 GMT, Chen Liang wrote: >> Qizheng Xing has updated the pull request incrementally with one additional commit since the last revision: >> >> Narrow type bound > > test/hotspot/jtreg/compiler/c2/irTests/TestCountBitsRange.java line 47: > >> 45: @IR(failOn = IRNode.COUNT_LEADING_ZEROS_I) >> 46: public boolean clzCompareInt() { >> 47: return Integer.numberOfLeadingZeros(i) < 0 || Integer.numberOfLeadingZeros(i) > 32; > > You limited the type to check the ones/zeroes of the input value, `count_leading_zeros_int(~ti->_bits._zeros), count_leading_zeros_int(ti->_bits._ones),`. I think we need test cases to cover those too, given they are more complex and thus more error prone. I have no idea how the `KnownBits` works. I tried masking `i` with `i & 0x00ffffff | 0x0000ffff`, and evaluating CLZ on it. But C2 still seems to return an integer type with a trivial `KnownBits`, instead of `KnownBits` with `_zeros=0xff000000` and `_ones=0x0000ffff`. 
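As a side illustration of the bound under discussion (a minimal sketch, not code from the patch; the class and method names are made up here): the range formula quoted above uses `count_leading_zeros_int(~_zeros)` as the lower bound and `count_leading_zeros_int(_ones)` as the upper bound. Assuming the masked input really did carry `_zeros=0xff000000` and `_ones=0x0000ffff`, the CLZ result would have to lie in [8, 16]. Plain Java can check that arithmetic, although it says nothing about what C2 actually infers:

```java
import java.util.Random;

public class ClzKnownBitsSketch {
    // Bits assumed known-zero / known-one after the masking below (values from the message above).
    static final int ZEROS = 0xff000000;
    static final int ONES  = 0x0000ffff;

    // The masking expression from the message; in Java '&' binds tighter than '|'.
    static int mask(int i) {
        return i & 0x00ffffff | 0x0000ffff;
    }

    // Fewest leading zeros: every bit that is not known-zero is set.
    static int minClz() {
        return Integer.numberOfLeadingZeros(~ZEROS);
    }

    // Most leading zeros: only the known-one bits are set.
    static int maxClz() {
        return Integer.numberOfLeadingZeros(ONES);
    }

    // True if CLZ of the masked value falls inside [minClz(), maxClz()].
    static boolean inRange(int i) {
        int clz = Integer.numberOfLeadingZeros(mask(i));
        return clz >= minClz() && clz <= maxClz();
    }

    public static void main(String[] args) {
        Random random = new Random(42);
        for (int n = 0; n < 1_000_000; n++) {
            int i = random.nextInt();
            if (!inRange(i)) {
                throw new AssertionError("CLZ out of range for input " + i);
            }
        }
        System.out.println("CLZ range: [" + minClz() + ", " + maxClz() + "]");
    }
}
```

The two extremes are reached by `mask(-1)` (0x00ffffff, 8 leading zeros) and `mask(0)` (0x0000ffff, 16 leading zeros).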
I also checked the PR which introduced `KnownBits` (https://github.com/openjdk/jdk/pull/17508). @merykitty said "there is no node taking advantage of the additional information yet", and there's no IR tests about it. So it's probably impossible to write a test for this at the moment? @merykitty Could you provide more help on this please? Thanks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25928#discussion_r2163478028 From qxing at openjdk.org Tue Jun 24 09:54:32 2025 From: qxing at openjdk.org (Qizheng Xing) Date: Tue, 24 Jun 2025 09:54:32 GMT Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise [v3] In-Reply-To: References: Message-ID: On Mon, 23 Jun 2025 23:10:58 GMT, Chen Liang wrote: >> Qizheng Xing has updated the pull request incrementally with one additional commit since the last revision: >> >> Narrow type bound > > src/hotspot/share/opto/countbitsnode.cpp line 61: > >> 59: ti->_widen); >> 60: } >> 61: return TypeInt::INT; > > Just curious, when would this fallback path be used? When someone passes a non-integer to `CountLeadingZerosINode`, I think. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25928#discussion_r2163491482 From snatarajan at openjdk.org Tue Jun 24 10:02:18 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Tue, 24 Jun 2025 10:02:18 GMT Subject: RFR: 8358641: C1 option -XX:+TimeEachLinearScan is broken Message-ID: **Issue** Using the command` java -Xcomp -XX:TieredStopAtLevel=1 -XX:+TimeEachLinearScan` results in an assert failure in line `assert(_cached_blocks.length() == ir()->linear_scan_order()->length()) failed: invalid cached block list`. **Suggestion** Removal of flag as this is a very old issue **Fix** Removed the flag by removing relevant methods and code while ensuring the removal does not affect other flags. 
-------------

Commit messages:
 - Initial Fix

Changes: https://git.openjdk.org/jdk/pull/25933/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25933&range=00
   Issue: https://bugs.openjdk.org/browse/JDK-8358641
   Stats: 51 lines in 3 files changed: 1 ins; 49 del; 1 mod
   Patch: https://git.openjdk.org/jdk/pull/25933.diff
   Fetch: git fetch https://git.openjdk.org/jdk.git pull/25933/head:pull/25933

PR: https://git.openjdk.org/jdk/pull/25933

From epeter at openjdk.org Tue Jun 24 10:16:27 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 24 Jun 2025 10:16:27 GMT
Subject: RFR: 8351889: C2 crash: assertion failed: Base pointers must match (addp 344)
In-Reply-To:
References:
Message-ID: <30ySCTGBD7-5H6jOUKdew_CJbUXsspXbE4RFv9825Xc=.f6327a59-4ce9-410a-b3e2-a27eedd09a2a@github.com>

On Tue, 24 Jun 2025 09:28:13 GMT, Roland Westrelin wrote:

>> @rwestrel
>>> Are the omitted inputs to AddPs that you'd like to see? Anything else?
>>
>> I'm looking for the dump of all relevant nodes :)
>
>> I'm looking for the dump of all relevant nodes :)
>
> https://bugs.openjdk.org/secure/attachment/115096/custom_debug.xml
>
> is an igv file with one graph per step above. Does that help, @eme64 ?

@rwestrel Thanks for the file! I fear that would also require me to dig through it myself, and extract the relevant information. I was hoping for a regular `node->dump` or `node->dump_bfs` of a limited graph in text form.

I can also do the digging myself, and look at the regression test in the debugger. That will take more time, which I probably won't have until mid August. I don't want to block this here though, maybe someone else has more time in these summer months :)

--------------------

On the high level, this was your proposal:
>The fix I propose is to delay the call to PhiNode::unique_input() with uncast = true if the Phi's inputs are cast nodes and have yet to be processed by igvn.
This causes identical CastPPs to common, and only then is the Phi, which now has 2 identical inputs, transformed to that input (rather than having a new CastPP created at a different control).

Does this delaying always work? Or could there be cases where we might not know how long to delay the Phi optimization, and then we are back to the same issue, just in a more complicated example? Or is there some sort of guarantee that the "non matching base pointers" can only occur during an IGVN phase, and MUST disappear at the end of it?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25386#issuecomment-2999677592

From tkurashige at openjdk.org Tue Jun 24 12:21:32 2025
From: tkurashige at openjdk.org (Taizo Kurashige)
Date: Tue, 24 Jun 2025 12:21:32 GMT
Subject: RFR: 8359120: Improve warning message when fail to load hsdis library [v2]
In-Reply-To:
References: <-i6UPk-bhy9RnnCus_JbJ1nQ63nMX9djubON9WBbHQ8=.a2305566-563e-4171-b526-bcd645de51a3@github.com>
Message-ID: <567JBE7ycfnYTAbDbM7puD2K-t9Y7KNEGsp-fCHmjJE=.1a148045-ee2d-49fe-be18-fadcf04d4de3@github.com>

On Mon, 23 Jun 2025 12:13:58 GMT, Manuel Hässig wrote:

>> Taizo Kurashige has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Fix message and revert lines for Xlog
>
> Thank you for addressing my comments and for your explanation!
>
> The changes look good to me, but nonetheless I kicked off some testing on our side and I'll get back to you with the results.

@mhaessig
Thank you for your review.
I think I need approval from 1 reviewer, could you introduce someone? If not, I will wait for someone to check this PR.

Thanks.
------------- PR Comment: https://git.openjdk.org/jdk/pull/25726#issuecomment-3000128358 From bkilambi at openjdk.org Tue Jun 24 13:19:17 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Tue, 24 Jun 2025 13:19:17 GMT Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v4] In-Reply-To: References: Message-ID: > This patch adds aarch64 backend support for SelectFromTwoVector operation which was recently introduced in VectorAPI. > > It implements this operation using a two table vector lookup instruction - "tbl" which is available only in Neon and SVE2. > > For 128-bit vector length : Neon tbl instruction is generated if UseSVE < 2 and SVE2 "tbl" instruction is generated if UseSVE == 2. > > For > 128-bit vector length : Currently there are no machines which have vector length > 128-bit and support SVE2. For all those machines with vector length > 128-bit and UseSVE < 2, this operation is not supported. The inline expander for this operation would fail and lowered IR will be generated which is a mix of two rearrange and one blend operation. > > This patch also adds a boolean "need_load_shuffle" in the inline expander for this operation to test if the platform requires VectorLoadShuffle operation to be generated. Without this, the lowering IR was not being generated on aarch64 and the performance was quite poor. 
>
> Performance numbers with this patch on a 128-bit, SVE2 supporting machine are shown below -
>
>
> Benchmark                                   (size)   Mode  Cnt   Gain
> SelectFromBenchmark.selectFromByteVector      1024  thrpt    9   1.43
> SelectFromBenchmark.selectFromByteVector      2048  thrpt    9   1.48
> SelectFromBenchmark.selectFromDoubleVector    1024  thrpt    9  68.55
> SelectFromBenchmark.selectFromDoubleVector    2048  thrpt    9  72.07
> SelectFromBenchmark.selectFromFloatVector     1024  thrpt    9   1.69
> SelectFromBenchmark.selectFromFloatVector     2048  thrpt    9   1.52
> SelectFromBenchmark.selectFromIntVector       1024  thrpt    9   1.50
> SelectFromBenchmark.selectFromIntVector       2048  thrpt    9   1.52
> SelectFromBenchmark.selectFromLongVector      1024  thrpt    9  85.38
> SelectFromBenchmark.selectFromLongVector      2048  thrpt    9  80.93
> SelectFromBenchmark.selectFromShortVector     1024  thrpt    9   1.48
> SelectFromBenchmark.selectFromShortVector     2048  thrpt    9   1.49
>
>
> Gain column refers to the ratio of thrpt between this patch and the master branch after applying changes in the inline expander.
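To make the operation in the description above concrete, here is a scalar model (a sketch under stated assumptions, not HotSpot or Vector API code; all class and method names are hypothetical). It models the two-table lookup and the fallback shape named in the description, two rearranges followed by one blend, and shows that they pick the same lanes. It assumes that indexes wrap into `[0, 2 * VLEN)`:

```java
import java.util.Arrays;

public class SelectFromTwoVectorSketch {
    // Scalar model of a two-table lookup: an index in [0, n) reads from a,
    // an index in [n, 2n) reads from b. Assumption: other indexes wrap into [0, 2n).
    static int[] selectFromTwo(int[] idx, int[] a, int[] b) {
        int n = a.length;
        int[] result = new int[n];
        for (int i = 0; i < n; i++) {
            int wrapped = Math.floorMod(idx[i], 2 * n);
            result[i] = (wrapped < n) ? a[wrapped] : b[wrapped - n];
        }
        return result;
    }

    // Scalar model of a single-vector rearrange with indexes wrapped into [0, n).
    static int[] rearrange(int[] idx, int[] v) {
        int n = v.length;
        int[] result = new int[n];
        for (int i = 0; i < n; i++) {
            result[i] = v[Math.floorMod(idx[i], n)];
        }
        return result;
    }

    // Fallback shape: rearrange each source separately, then blend per lane
    // depending on whether the wrapped index points into the second source.
    static int[] twoRearrangesAndBlend(int[] idx, int[] a, int[] b) {
        int n = a.length;
        int[] fromA = rearrange(idx, a);
        int[] fromB = rearrange(idx, b);
        int[] result = new int[n];
        for (int i = 0; i < n; i++) {
            boolean pickB = Math.floorMod(idx[i], 2 * n) >= n;
            result[i] = pickB ? fromB[i] : fromA[i];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] a = {10, 11, 12, 13};
        int[] b = {20, 21, 22, 23};
        int[] idx = {0, 5, 2, 7};
        System.out.println(Arrays.toString(selectFromTwo(idx, a, b)));
        System.out.println(Arrays.toString(twoRearrangesAndBlend(idx, a, b)));
    }
}
```

Both calls in `main` select `[10, 21, 12, 23]`. The single-lookup form does the work in one pass over the indexes, while the fallback needs three passes.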
Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: Addressed review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23570/files - new: https://git.openjdk.org/jdk/pull/23570/files/aa9e53e1..31973045 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23570&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23570&range=02-03 Stats: 388 lines in 7 files changed: 303 ins; 12 del; 73 mod Patch: https://git.openjdk.org/jdk/pull/23570.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23570/head:pull/23570 PR: https://git.openjdk.org/jdk/pull/23570 From bkilambi at openjdk.org Tue Jun 24 13:36:14 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Tue, 24 Jun 2025 13:36:14 GMT Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v5] In-Reply-To: References: Message-ID: > This patch adds aarch64 backend support for SelectFromTwoVector operation which was recently introduced in VectorAPI. > > It implements this operation using a two table vector lookup instruction - "tbl" which is available only in Neon and SVE2. > > For 128-bit vector length : Neon tbl instruction is generated if UseSVE < 2 and SVE2 "tbl" instruction is generated if UseSVE == 2. > > For > 128-bit vector length : Currently there are no machines which have vector length > 128-bit and support SVE2. For all those machines with vector length > 128-bit and UseSVE < 2, this operation is not supported. The inline expander for this operation would fail and lowered IR will be generated which is a mix of two rearrange and one blend operation. > > This patch also adds a boolean "need_load_shuffle" in the inline expander for this operation to test if the platform requires VectorLoadShuffle operation to be generated. Without this, the lowering IR was not being generated on aarch64 and the performance was quite poor. 
>
> Performance numbers with this patch on a 128-bit, SVE2 supporting machine are shown below -
>
>
> Benchmark                                   (size)   Mode  Cnt   Gain
> SelectFromBenchmark.selectFromByteVector      1024  thrpt    9   1.43
> SelectFromBenchmark.selectFromByteVector      2048  thrpt    9   1.48
> SelectFromBenchmark.selectFromDoubleVector    1024  thrpt    9  68.55
> SelectFromBenchmark.selectFromDoubleVector    2048  thrpt    9  72.07
> SelectFromBenchmark.selectFromFloatVector     1024  thrpt    9   1.69
> SelectFromBenchmark.selectFromFloatVector     2048  thrpt    9   1.52
> SelectFromBenchmark.selectFromIntVector       1024  thrpt    9   1.50
> SelectFromBenchmark.selectFromIntVector       2048  thrpt    9   1.52
> SelectFromBenchmark.selectFromLongVector      1024  thrpt    9  85.38
> SelectFromBenchmark.selectFromLongVector      2048  thrpt    9  80.93
> SelectFromBenchmark.selectFromShortVector     1024  thrpt    9   1.48
> SelectFromBenchmark.selectFromShortVector     2048  thrpt    9   1.49
>
>
> Gain column refers to the ratio of thrpt between this patch and the master branch after applying changes in the inline expander.
Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: Revert a small change in c2_MacroAssembler.hpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23570/files - new: https://git.openjdk.org/jdk/pull/23570/files/31973045..956518ec Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23570&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23570&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23570.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23570/head:pull/23570 PR: https://git.openjdk.org/jdk/pull/23570 From bkilambi at openjdk.org Tue Jun 24 13:36:14 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Tue, 24 Jun 2025 13:36:14 GMT Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v3] In-Reply-To: References: Message-ID: On Tue, 17 Jun 2025 08:54:47 GMT, Andrew Haley wrote: >> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: >> >> Addressed review comments and added a JTREG test > > src/hotspot/cpu/aarch64/aarch64_vector_ad.m4 line 5155: > >> 5153: // values across function calls and usually used for long-lived values), we can use any two volatile >> 5154: // registers between V16-V31. >> 5155: instruct vselect_from_two_vectors_HS_Neon(vReg dst, vReg_V17 src1, vReg_V18 src2, > > I think it's worth replicating this pattern a couple of times with more vector registers. The allocator might prefer some in the set v8-v15. I tried to add 4 different variants for each match rule and chose the registers at random (two from the volatile and two from the non-volatile set). Please review if this is ok? Thanks! 
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2164023985

From eastigeevich at openjdk.org Tue Jun 24 14:53:45 2025
From: eastigeevich at openjdk.org (Evgeny Astigeevich)
Date: Tue, 24 Jun 2025 14:53:45 GMT
Subject: RFR: 8359435: AArch64: add support for SB instruction to MacroAssembler::spin_wait [v2]
In-Reply-To:
References:
Message-ID:

> There is data showing that SB-based spin pauses are less disruptive than ISB-based ones, so performance is better:
> - https://github.com/mysql/mysql-server/pull/611
> - https://github.com/facebook/folly/pull/2390
>
> There are discussions regarding using it for spin pauses:
> - https://github.com/gperftools/gperftools/pull/1594
> - https://github.com/haproxy/haproxy/pull/2974
>
> Instruction support: https://developer.arm.com/documentation/109697/2025_03/Feature-descriptions/The-Armv8-5-architecture-extension
>
> CPUs supporting it:
> - Apple M2+
> - Neoverse-N2
> - Neoverse-V2
>
> Tests:
> - Gtests passed.
> - `test/hotspot/jtreg/compiler/onSpinWait/TestOnSpinWaitAArch64.java` passed.
> - `test/hotspot/jtreg/compiler/onSpinWait/TestOnSpinWaitNoneAArch64.java` passed.
>
> Micro-benchmarks (Graviton 4, c8g.16xlarge (64 CPU), Neoverse-V2):
>
>
> Benchmark             Mode  Cnt   Score   Error  Units  Diff
> ThreadOnSpinWait.ISB  avgt   15  11.875 ± 0.129  ns/op
> ThreadOnSpinWait.SB   avgt   15   6.930 ± 0.054  ns/op  -42%
>
> Benchmark                          (maxNum)  (threadCount)  Mode  Cnt    Score    Error  Units  Diff
> ThreadOnSpinWaitSharedCounter.ISB   1000000              4  avgt   15   49.874 ± 10.160  ms/op
> ThreadOnSpinWaitSharedCounter.SB    1000000              4  avgt   15   26.948 ±  4.036  ms/op  -46%
> ThreadOnSpinWaitSharedCounter.ISB   1000000              8  avgt   15   65.173 ±  7.228  ms/op
> ThreadOnSpinWaitSharedCounter.SB    1000000              8  avgt   15   44.476 ±  1.292  ms/op  -31%
> ThreadOnSpinWaitSharedCounter.ISB   1000000             16  avgt   15  177.805 ± 44.925  ms/op
> ThreadOnSpinWaitSharedCounter.SB    1000000             16  avgt   15   67.267 ± 13.814  ms/op  -62%
> ThreadOnSpinWaitSharedCounter.ISB   1000000             32  avgt   15  265.149 ±  5.353  ms/op
> ThreadOnSpinWaitSharedCounter.SB    1000000             32  avgt   15   42.297 ±  3.436  ms/op  -84%
> ThreadOnSpinWaitSharedCounter.ISB   1000000             48  avgt   15  125.231 ±  9.272  ms/op
> ThreadOnSpinWaitSharedCounter.SB    1000000             48  avgt   15   83.504 ±  8.561  ms/op  -33%
> ThreadOnSpinWaitSharedCounter.ISB   1000000             64  avgt   15  124.505 ±  7.543  ms/op
> ThreadOnSpinWaitSharedCounter.SB    1000000             64  avgt   15   86.588 ±  9.519  ms/op  -30%

Evgeny Astigeevich has updated the pull request incrementally with two additional commits since the last revision:

 - Add SB detection
 - Add support for SB to MacroAssembler::spin_wait

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/25801/files
  - new: https://git.openjdk.org/jdk/pull/25801/files/74c37f19..a045194c

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=25801&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25801&range=00-01

  Stats: 32 lines in 9 files changed: 27 ins; 1 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/25801.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/25801/head:pull/25801

PR: https://git.openjdk.org/jdk/pull/25801

From epeter at openjdk.org Tue Jun 24 16:30:28 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 24 Jun 2025 16:30:28 GMT
Subject: RFR: 8350177: C2 SuperWord: Integer.numberOfLeadingZeros, numberOfTrailingZeros, reverse and bitCount have input types wrongly turncated for byte and short
In-Reply-To:
References:
Message-ID:

On Wed, 18 Jun 2025 04:03:59 GMT, Jasmine Karthikeyan wrote:

>> And just for good measure: should we also add tests for `char`?
>
> @eme64 I've updated the patch to address the comments, let me know what you think!
>
> @mur47x111 Thanks for the comment! I've merged from master.

@jaskarth I just checked the results, and there is a series of failing tests.

------------------------

`compiler/c2/Test6958485.java`
Flags: `-server -Xcomp` or `-XX:+UnlockExperimentalVMOptions -XX:PerMethodSpecTrapLimit=0 -XX:PerMethodTrapLimit=0`.
`# assert(false) failed: Unexpected node in SuperWord truncation: Conv2B` -------------------------- `compiler/intrinsics/TestDoubleIsInfinite.java` -> D `compiler/intrinsics/TestFloatIsInfinite.java` -> F Flags: `-XX:UseAVX=3` or `-XX:-TieredCompilation -XX:+StressReflectiveCode -XX:-ReduceInitialCardMarks -XX:-ReduceBulkZeroing -XX:-ReduceFieldZeroing` ... not sure if any are necessary actually. `# assert(false) failed: Unexpected node in SuperWord truncation: IsInfiniteD` and `# assert(false) failed: Unexpected node in SuperWord truncation: IsInfiniteF` ----------------------------- `jdk/incubator/vector/Byte128VectorTests.java` (same issue with all related vector tests, just reporting one here) Flag: `-XX:UseAVX=2` `# assert(false) failed: Unexpected node in SuperWord truncation: AddReductionVI` ------------- PR Comment: https://git.openjdk.org/jdk/pull/25440#issuecomment-3001137954 From snatarajan at openjdk.org Tue Jun 24 16:32:11 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Tue, 24 Jun 2025 16:32:11 GMT Subject: RFR: 8325478: Restructure the macro expansion compiler phase to not include macro elimination [v7] In-Reply-To: References: Message-ID: > This changeset restructures the macro expansion phase to not include macro elimination and also adds a flag StressMacroElimination which randomizes macro elimination ordering for stress testing purposes. > > Changes: > - Implemented a method `eliminate_opaque_looplimit_macro_nodes` that removes the functionality for eliminating Opaque and LoopLimit nodes from the `expand_macro_nodes ` method. > - Introduced compiler phases` PHASE_AFTER_MACRO_ELIMINATION` > - Added a new Ideal phase for individual macro elimination steps. > - Implemented the flag `StressMacroElimination`. Added functionality tests for `StressMacroElimination`, similar to previous stress flag `StressMacroExpansion` ([JDK-8317349](https://bugs.openjdk.org/browse/JDK-8317349)). 
> > Below is a sample screenshot (IGV print level 4 ) mainly showing the new phase . > ![image](https://github.com/user-attachments/assets/16013cd4-6ec6-4939-ac66-33bb03d59cd6) > > Questions to reviewers: > - Is the new macro elimination phase OK, or should we change anything? > - In `compile.cpp `, `PHASE_ITER_GVN_AFTER_ELIMINATION` follows `PHASE_AFTER_MACRO_ELIMINATION` in the current fix. Should `PHASE_ITER_GVN_AFTER_ELIMINATION` be removed ? > > Testing: > GitHub Actions > tier1 to tier5 on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. > Tested that thousands of graphs are correctly opened and visualized with IGV using the same test used in ([JDK-8317349](https://bugs.openjdk.org/browse/JDK-8317349)) Saranya Natarajan has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: merge with master Merge branch 'master' of https://git.openjdk.org/jdk into JDK-8325478 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25682/files - new: https://git.openjdk.org/jdk/pull/25682/files/fc434b6a..939be78b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25682&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25682&range=05-06 Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25682.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25682/head:pull/25682 PR: https://git.openjdk.org/jdk/pull/25682 From snatarajan at openjdk.org Tue Jun 24 16:39:28 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Tue, 24 Jun 2025 16:39:28 GMT Subject: RFR: 8325478: Restructure the macro expansion compiler phase to not include macro elimination [v7] In-Reply-To: References: Message-ID: On Tue, 24 Jun 2025 16:32:11 GMT, Saranya Natarajan wrote: >> This changeset restructures the macro expansion 
phase to not include macro elimination and also adds a flag StressMacroElimination which randomizes macro elimination ordering for stress testing purposes. >> >> Changes: >> - Implemented a method `eliminate_opaque_looplimit_macro_nodes` that removes the functionality for eliminating Opaque and LoopLimit nodes from the `expand_macro_nodes ` method. >> - Introduced compiler phases` PHASE_AFTER_MACRO_ELIMINATION` >> - Added a new Ideal phase for individual macro elimination steps. >> - Implemented the flag `StressMacroElimination`. Added functionality tests for `StressMacroElimination`, similar to previous stress flag `StressMacroExpansion` ([JDK-8317349](https://bugs.openjdk.org/browse/JDK-8317349)). >> >> Below is a sample screenshot (IGV print level 4 ) mainly showing the new phase . >> ![image](https://github.com/user-attachments/assets/16013cd4-6ec6-4939-ac66-33bb03d59cd6) >> >> Questions to reviewers: >> - Is the new macro elimination phase OK, or should we change anything? >> - In `compile.cpp `, `PHASE_ITER_GVN_AFTER_ELIMINATION` follows `PHASE_AFTER_MACRO_ELIMINATION` in the current fix. Should `PHASE_ITER_GVN_AFTER_ELIMINATION` be removed ? >> >> Testing: >> GitHub Actions >> tier1 to tier5 on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. >> Tested that thousands of graphs are correctly opened and visualized with IGV using the same test used in ([JDK-8317349](https://bugs.openjdk.org/browse/JDK-8317349)) > > Saranya Natarajan has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > merge with master > Merge branch 'master' of https://git.openjdk.org/jdk into JDK-8325478 I did a force push without having read the guide properly. 
However, please note that the commit [fc434b6](https://github.com/openjdk/jdk/commit/fc434b6a2d3d1affa8599450e25ec983b73b6fee) was a merge with master with no changes from me, and the force-pushed commit [939be78](https://github.com/openjdk/jdk/commit/939be78b0aaa24b00b8d63b36460bcd210774def) only changed the commit message. I am really sorry for this.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25682#issuecomment-3001161136

From shade at openjdk.org Tue Jun 24 16:43:33 2025
From: shade at openjdk.org (Aleksey Shipilev)
Date: Tue, 24 Jun 2025 16:43:33 GMT
Subject: RFR: 8359435: AArch64: add support for SB instruction to MacroAssembler::spin_wait [v2]
In-Reply-To:
References:
Message-ID:

On Tue, 24 Jun 2025 14:53:45 GMT, Evgeny Astigeevich wrote:

>> There is data showing that SB-based spin pauses are less disruptive than ISB-based ones, so performance is better:
>> - https://github.com/mysql/mysql-server/pull/611
>> - https://github.com/facebook/folly/pull/2390
>>
>> There are discussions regarding using it for spin pauses:
>> - https://github.com/gperftools/gperftools/pull/1594
>> - https://github.com/haproxy/haproxy/pull/2974
>>
>> Instruction support: https://developer.arm.com/documentation/109697/2025_03/Feature-descriptions/The-Armv8-5-architecture-extension
>>
>> CPUs supporting it:
>> - Apple M2+
>> - Neoverse-N2
>> - Neoverse-V2
>>
>> Tests:
>> - Gtests passed.
>> - `test/hotspot/jtreg/compiler/onSpinWait/TestOnSpinWaitAArch64.java` passed.
>> - `test/hotspot/jtreg/compiler/onSpinWait/TestOnSpinWaitNoneAArch64.java` passed.
>>
>> Micro-benchmarks (Graviton 4, c8g.16xlarge (64 CPU), Neoverse-V2):
>>
>>
>> Benchmark             Mode  Cnt   Score   Error  Units  Diff
>> ThreadOnSpinWait.ISB  avgt   15  11.875 ± 0.129  ns/op
>> ThreadOnSpinWait.SB   avgt   15   6.930 ± 0.054  ns/op  -42%
>>
>> Benchmark                          (maxNum)  (threadCount)  Mode  Cnt    Score    Error  Units  Diff
>> ThreadOnSpinWaitSharedCounter.ISB   1000000              4  avgt   15   49.874 ± 10.160  ms/op
>> ThreadOnSpinWaitSharedCounter.SB    1000000              4  avgt   15   26.948 ±  4.036  ms/op  -46%
>> ThreadOnSpinWaitSharedCounter.ISB   1000000              8  avgt   15   65.173 ±  7.228  ms/op
>> ThreadOnSpinWaitSharedCounter.SB    1000000              8  avgt   15   44.476 ±  1.292  ms/op  -31%
>> ThreadOnSpinWaitSharedCounter.ISB   1000000             16  avgt   15  177.805 ± 44.925  ms/op
>> ThreadOnSpinWaitSharedCounter.SB    1000000             16  avgt   15   67.267 ± 13.814  ms/op  -62%
>> ThreadOnSpinWaitSharedCounter.ISB   1000000             32  avgt   15  265.149 ±  5.353  ms/op
>> ThreadOnSpinWaitSharedCounter.SB    1000000             32  avgt   15   42.297 ±  3.436  ms/op  -84%
>> ThreadOnSpinWaitSharedCounter.ISB   1000000             48  avgt   15  125.231 ±  9.272  ms/op
>> ThreadOnSpinWaitSharedCounter.SB    1000000             48  avgt   15   83.504 ±  8.561  ms/op  -33%
>> ThreadOnSpinWaitSharedCounter.ISB   1000000             64  avgt   15  124.505 ±  7.543  ms/op
>> ThreadOnSpinWaitSharedCounter.SB    1000000             64  avgt   15   86.588 ±  9.519  ms/op  -30%
>
> Evgeny Astigeevich has updated the pull request incrementally with two additional commits since the last revision:
>
>  - Add SB detection
>  - Add support for SB to MacroAssembler::spin_wait

Looks reasonable, but the test needs more work. Also, merge from mainline to get the windows-aarch64 build fix, so that we test things there too.

test/hotspot/jtreg/compiler/onSpinWait/TestOnSpinWaitAArch64.java line 36:

> 34:  * @run driver compiler.onSpinWait.TestOnSpinWaitAArch64 c2 isb 3
> 35:  * @run driver compiler.onSpinWait.TestOnSpinWaitAArch64 c2 yield 1
> 36:  * @run driver compiler.onSpinWait.TestOnSpinWaitAArch64 c2 sb

Since we are touching up the test: maybe just say `sb 1` explicitly, and then read `spinWaitInstCount` from `args[2]` unconditionally?

test/hotspot/jtreg/compiler/onSpinWait/TestOnSpinWaitAArch64.java line 80:

> 78:         OutputAnalyzer analyzer = new OutputAnalyzer(pb.start());
> 79:
> 80:         if (analyzer.getExitValue() != 0 && "sb".equals(spinWaitInst) && analyzer.contains("CPU does not support SB")) {

The logic here is a bit off. Suppose we _do_ have non-zero exit code for, say, `isb`.
This would not fail the test now. Do something like this instead?

    if ("sb".equals(spinWaitInst) && analyzer.contains("CPU does not support SB")) {
        System.out.println("Skipping the test. The current CPU does not support SB instruction.");
        return;
    }

    analyzer.shouldHaveExitValue(0);

-------------

PR Review: https://git.openjdk.org/jdk/pull/25801#pullrequestreview-2954582448
PR Comment: https://git.openjdk.org/jdk/pull/25801#issuecomment-3001173366
PR Review Comment: https://git.openjdk.org/jdk/pull/25801#discussion_r2164468471
PR Review Comment: https://git.openjdk.org/jdk/pull/25801#discussion_r2164461092

From cslucas at openjdk.org Tue Jun 24 17:11:33 2025
From: cslucas at openjdk.org (Cesar Soares Lucas)
Date: Tue, 24 Jun 2025 17:11:33 GMT
Subject: RFR: 8360049: CodeInvalidationReasonTest.java fails with ZGC on AArch64
In-Reply-To:
References: <72rTsZwLWfLVCjUXFLIjleCopoc0wHHvii0kStbz-AU=.adffe754-f82f-4606-b152-90710d402164@github.com>
Message-ID:

On Fri, 20 Jun 2025 13:24:34 GMT, Doug Simon wrote:

>> Marked as reviewed by shade (Reviewer).
>
> Thanks for the review @shipilev .
Thank you for fixing @dougxc ------------- PR Comment: https://git.openjdk.org/jdk/pull/25911#issuecomment-3001243313 From jbhateja at openjdk.org Tue Jun 24 19:02:44 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 24 Jun 2025 19:02:44 GMT Subject: RFR: 8352635: Improve inferencing of Float16 operations with constant inputs [v8] In-Reply-To: References: <44nVQBYgzCOB2mAB9xtAPvkUcOMJOITA2VjMdDFgm1g=.48266693-48bf-41db-8871-a7dcafe93509@github.com> <60kkIRL2XznEXyYukVXVOoeixm2iGhoOxAbKJi5X0cY=.0268090e-a0d3-45fb-93f4-94caaf9b8497@github.com> <2F9hnA72JKqq9hJchevuSQ8XHveZ51F6tJnb7IcNw30=.1da69bab-23be-473b-92c1-1786916364b9@github.com> Message-ID: On Fri, 20 Jun 2025 11:18:35 GMT, Jatin Bhateja wrote: >> @jatin-bhateja >> I see one test failing that looks related: >> `compiler/vectorization/TestFloat16VectorOperations.java` >> The run was done with lots of stress flags, probably not all are relevant, and it may be intermittent. >> `-XX:+UnlockDiagnosticVMOptions -XX:-TieredCompilation -XX:+StressArrayCopyMacroNode -XX:+StressLCM -XX:+StressGCM -XX:+StressIGVN -XX:+StressCCP -XX:+StressMacroExpansion -XX:+StressMethodHandleLinkerInlining -XX:+StressCompiledExceptionHandlers -XX:VerifyConstraintCasts=1` >> >> >> # Internal Error (.../src/hotspot/share/opto/type.hpp:2234), pid=2401514, tid=2401533 >> # assert(_base == FloatCon) failed: Not a Float >> >> ... 
>> >> Current CompileTask: >> C2:1885 336 % b compiler.vectorization.TestFloat16VectorOperations::vectorAddFloat16 @ 2 (46 bytes) >> >> Stack: [0x00007ff300480000,0x00007ff300580000], sp=0x00007ff30057aed0, free space=1003k >> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) >> V [libjvm.so+0xc0358e] ConvF2HFNode::Ideal(PhaseGVN*, bool)+0x88e (type.hpp:2234) >> V [libjvm.so+0x181809d] PhaseIterGVN::transform_old(Node*)+0xbd (phaseX.cpp:668) >> V [libjvm.so+0x181c705] PhaseIterGVN::optimize()+0xc5 (phaseX.cpp:1054) >> V [libjvm.so+0xb498f2] PhaseIdealLoop::optimize(PhaseIterGVN&, LoopOptsMode)+0x722 (loopnode.hpp:1268) >> V [libjvm.so+0xb43630] Compile::Optimize()+0xb00 (compile.cpp:2468) >> V [libjvm.so+0xb46943] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x1f33 (compile.cpp:868) >> V [libjvm.so+0x96bdd7] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x467 (c2compiler.cpp:141) >> V [libjvm.so+0xb55d78] CompileBroker::invoke_compiler_on_method(CompileTask*)+0xb58 (compileBroker.cpp:2323) >> V [libjvm.so+0xb56f48] CompileBroker::compiler_thread_loop()+0x578 (compileBroker.cpp:1967) >> V [libjvm.so+0x10aa00b] JavaThread::thread_main_inner()+0x13b (javaThread.cpp:773) >> V [libjvm.so+0x1b0e096] Thread::call_run()+0xb6 (thread.cpp:243) >> V [libjvm.so+0x17893f8] thread_native_entry(Thread*)+0x128 (os_linux.cpp:868) > > Hi @eme64 , let me know if this looks good to land now. > @jatin-bhateja The code looks good to me! I'll run some testing before approving. > > But someone else could already get started with a second review. Thanks @eme64 , let us know once you are through with testing. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24179#issuecomment-3001555853 From sviswanathan at openjdk.org Tue Jun 24 21:47:33 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 24 Jun 2025 21:47:33 GMT Subject: RFR: 8352635: Improve inferencing of Float16 operations with constant inputs [v13] In-Reply-To: <7riWgCQ_m74kYs-gbmKz15oaQ6qKM1ftjoiCdJSLYlo=.158be671-75b9-44d8-8992-2f1c9ff22c89@github.com> References: <44nVQBYgzCOB2mAB9xtAPvkUcOMJOITA2VjMdDFgm1g=.48266693-48bf-41db-8871-a7dcafe93509@github.com> <7riWgCQ_m74kYs-gbmKz15oaQ6qKM1ftjoiCdJSLYlo=.158be671-75b9-44d8-8992-2f1c9ff22c89@github.com> Message-ID: On Thu, 19 Jun 2025 07:50:22 GMT, Jatin Bhateja wrote: >> This is a follow-up PR#22755 to improve Float16 operations inferencing. >> >> The existing scheme to detect Float16 operations for some operations is based on pattern matching which expects to receive inputs through ConvHF2F IR, this patch extends matching to accept constant floating point inputs within the Float16 value range. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/opto/subnode.cpp > > Co-authored-by: Emanuel Peter test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java line 81: > 79: private static final float EXACT_FP16 = 2052.0f; > 80: private static final float SNAN_FP16 = Float.intBitsToFloat(0x7F8000F0); > 81: private static final float QNAN_FP16 = Float.intBitsToFloat(0x7FC00000); A nitpick, the values assigned here are float SNaN and QNaN. The FP16 suffix could be removed. test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java line 381: > 379: > 380: @Check(test="testSNaNFP16ConstantPatterns") > 381: public void checkSNaNFP16ConstantPatterns(short actual) throws Exception { Why throw exception here? This is short comparison in Verify.checkEQ. Also Java doesn't inherently throw exception for SNaN. 
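For readers following the SNaN/QNaN remark above, here is a minimal Java sketch of how the two bit patterns quoted from the test differ. It assumes the common IEEE 754 convention that the most significant fraction bit is the "quiet" bit; the class and method names are hypothetical and are not part of the test or the patch. It works purely on the integer bit patterns, so it does not rely on how a given platform handles signaling NaNs when they are materialized as `float`.

```java
// Illustrative only: classifies IEEE 754 binary32 NaN bit patterns with integer
// arithmetic. Names are hypothetical, not taken from TestFloat16ScalarOperations.
final class NaNBits {
    static final int EXP_MASK  = 0x7F800000; // 8 exponent bits, all ones for NaN/Infinity
    static final int FRAC_MASK = 0x007FFFFF; // 23 fraction bits
    static final int QUIET_BIT = 0x00400000; // most significant fraction bit (assumed quiet bit)

    // A NaN has an all-ones exponent and a non-zero fraction.
    static boolean isNaN(int bits) {
        return (bits & EXP_MASK) == EXP_MASK && (bits & FRAC_MASK) != 0;
    }

    // Under the assumed convention, a signaling NaN has the quiet bit cleared.
    static boolean isSignaling(int bits) {
        return isNaN(bits) && (bits & QUIET_BIT) == 0;
    }

    public static void main(String[] args) {
        int snan = 0x7F8000F0; // bit pattern used for SNAN_FP16 in the quoted test
        int qnan = 0x7FC00000; // bit pattern used for QNAN_FP16 in the quoted test
        System.out.println("0x7F8000F0 NaN=" + isNaN(snan) + " signaling=" + isSignaling(snan));
        System.out.println("0x7FC00000 NaN=" + isNaN(qnan) + " signaling=" + isSignaling(qnan));
    }
}
```

Both constants are single-precision `float` NaN encodings, which is the point of the nitpick about the `FP16` suffix.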
test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java line 649: > 647: // FIXME : C2 compiler limitaition to identify sign of ZERO value. > 648: // assertResult(divide(valueOf(2.0f), NEGATIVE_ZERO).floatValue(), Float.NEGATIVE_INFINITY, "testDivConstantFolding"); > 649: // assertResult(divide(valueOf(2.0f), POSITIVE_ZERO).floatValue(), Float.POSITIVE_INFINITY, "testDivConstantFolding"); Would be good to file a JBS for this compiler limitation. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24179#discussion_r2164970718 PR Review Comment: https://git.openjdk.org/jdk/pull/24179#discussion_r2164968367 PR Review Comment: https://git.openjdk.org/jdk/pull/24179#discussion_r2164961346 From xgong at openjdk.org Wed Jun 25 01:19:47 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 25 Jun 2025 01:19:47 GMT Subject: RFR: 8357726: Improve C2 to recognize counted loops with multiple casts in trip counter In-Reply-To: References: <-SKyhptjFPhuOPflySOZXJloR_Vgr4sC-xB5dSQXxZU=.fd6922bc-2498-4f4e-873a-999f82cd0a1a@github.com> <698Q9LoBFMdDFBnBVAB8FYiI0U-abyXms26RLoMv5Xc=.f21b9a25-8f64-412c-b37a-553f0a13192e@github.com> Message-ID: On Wed, 4 Jun 2025 09:03:21 GMT, Emanuel Peter wrote: >> Hi @eme64 , I'v updated the IR test and JMH based on your comments. Could you please help review whether it's fine to you. Thanks for all your suggestion! 
>> >> Following shows the performance data of the new JMH test on Grace (the performance gain is almost the same on my x64 machine): >> >> Benchmark Mode Cnt limit Unit Before Error (99.9%) After Error (99.9%) Gain >> CountedLoopCastIV.loop_iv_int thrpt 30 1024 ops/s 1225620.536 39505.158362 5778120.132 4781.602088 4.71 >> CountedLoopCastIV.loop_iv_int thrpt 30 1536 ops/s 830600.832 14758.561182 3839404.338 3362.727083 4.62 >> CountedLoopCastIV.loop_iv_int thrpt 30 2048 ops/s 618114.174 36999.511727 2890853.495 416.969862 4.67 >> CountedLoopCastIV.loop_iv_long thrpt 30 1024 ops/s 1063902.078 4616.608855 1314828.963 1267.470199 1.23 >> CountedLoopCastIV.loop_iv_long thrpt 30 1536 ops/s 714538.178 630.085477 870801.472 753.347684 1.21 >> CountedLoopCastIV.loop_iv_long thrpt 30 2048 ops/s 536724.086 131.313178 652775.363 539.107806 1.21 >> >> >> The error term is larger as before. But I don't think this is caused by the large variance of loop iterations. Does the new benchmark look fine to you? Thanks! > > @XiaohongGong Let's please delay this until after Thursday, so that this does not go into JDK25 yet, and we have more time to fix it if something goes wrong down the line. Thanks so much for your review @eme64 @chhagedorn @galderz ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25539#issuecomment-3002291747 From xgong at openjdk.org Wed Jun 25 01:19:48 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 25 Jun 2025 01:19:48 GMT Subject: Integrated: 8357726: Improve C2 to recognize counted loops with multiple casts in trip counter In-Reply-To: <-SKyhptjFPhuOPflySOZXJloR_Vgr4sC-xB5dSQXxZU=.fd6922bc-2498-4f4e-873a-999f82cd0a1a@github.com> References: <-SKyhptjFPhuOPflySOZXJloR_Vgr4sC-xB5dSQXxZU=.fd6922bc-2498-4f4e-873a-999f82cd0a1a@github.com> Message-ID: On Fri, 30 May 2025 07:43:29 GMT, Xiaohong Gong wrote: > C2 compiler fails to recognize counted loops when the induction variable is constrained by multiple consecutive `CastII` nodes. 
> This prevents optimizations like range check elimination, loop unrolling and auto-vectorization for these loops. Please refer > to the detailed discussion for a related performance issue from [1]. > > The ideal graph of such a loop typically looks like: > > > /-----------| > | | > | ConI | > loop | / / > | | / / > \ AddI / > RangeCheck \ / | > | \ / | > IfTrue Phi | > \ | | > RangeCheck \ | | > \ CastII / <- Range check #1 > | | / > IfTrue | | > \ | | > CastII | <- Range check #2 > | / > |-------/ > > > > For a counted loop, the loop induction variable (i.e `Phi`) should be the input of `AddI` ideally. However, in above case, it is used > by two consecutive `CastII` nodes generated by two different range check operations. Compiler should skip all such kind of `CastII` when recognizing a counted loop. > > This patch modifies the counted loop recognition code to iteratively uncast the loop `iv` until no `CastII` nodes remain, enabling proper counted loop recognition even when the induction variable undergoes multiple range constraint operations. > > Test: > - Tested tier1, tier2, tier3, and no regressions are found. > - An additional test case is added to verify the fix. > > Performance: > Here is the performance gain on a NVIDIA Grace machine which is an AArch64 architecture: > > > Benchmark Mode Cnt Unit Before After Gain > CountedLoopCastIV.loop_iv_int thrpt 30 ops/s 941482.597 4389292.439 4.66 > CountedLoopCastIV.loop_iv_long thrpt 30 ops/s 884563.232 1441485.455 1.62 > > > We can also observe the similar uplift on a x86_64 machine. > > [1] https://github.com/openjdk/jdk/pull/25138#issuecomment-2892720654 This pull request has now been integrated. 
Changeset: 7d6c902c Author: Xiaohong Gong URL: https://git.openjdk.org/jdk/commit/7d6c902ce8ffb9b42c264ecff56d4b54206e101b Stats: 275 lines in 3 files changed: 274 ins; 0 del; 1 mod 8357726: Improve C2 to recognize counted loops with multiple casts in trip counter Reviewed-by: chagedorn, epeter, galder ------------- PR: https://git.openjdk.org/jdk/pull/25539 From iveresov at openjdk.org Wed Jun 25 01:20:12 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Wed, 25 Jun 2025 01:20:12 GMT Subject: RFR: 8359788: Internal Error: assert(get_instanceKlass()->is_loaded()) failed: must be at least loaded Message-ID: For object arrays we forgot to check if the element type if loaded and always returned `true` from `is_klass_loaded()`. Testing is a bit noisy but seems to be clean. ------------- Commit messages: - Check if element type klass is loaded for object arrays Changes: https://git.openjdk.org/jdk/pull/25965/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25965&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8359788 Stats: 7 lines in 1 file changed: 6 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25965.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25965/head:pull/25965 PR: https://git.openjdk.org/jdk/pull/25965 From amitkumar at openjdk.org Wed Jun 25 04:41:42 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 25 Jun 2025 04:41:42 GMT Subject: RFR: 8358694: VM asserts if CodeCacheSegmentSize is not a power of 2 [v9] In-Reply-To: References: <_SfCjE8oyyD-grL6AAih23j1Qx7fbYefTVzl-BU5N2k=.7bd5d419-4127-4e78-926e-fece25a7d914@github.com> Message-ID: On Tue, 24 Jun 2025 04:03:42 GMT, Amit Kumar wrote: >> Makes sure that JVM exits gracefully when `CodeCacheSegmentSize` is not a power of 2. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > uintx to size_t Thanks again for the reviews. 
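As background for the `CodeCacheSegmentSize` change above, the usual way to test whether a value is a power of 2 can be sketched as follows. This is a Java illustration only: HotSpot is written in C++ and the actual patch is not reproduced here, so the class and method names below are hypothetical.

```java
// Illustrative only: the standard bit trick for a power-of-two check.
// Names are hypothetical and not taken from the HotSpot sources.
final class SegmentSizeCheck {
    // A positive value is a power of two exactly when it has a single bit set;
    // clearing the lowest set bit with (value & (value - 1)) then yields zero.
    static boolean isPowerOfTwo(long value) {
        return value > 0 && (value & (value - 1)) == 0;
    }

    public static void main(String[] args) {
        long[] candidates = {64, 128, 96, 0};
        for (long size : candidates) {
            System.out.println(size + " -> " + (isPowerOfTwo(size) ? "accepted" : "rejected"));
        }
    }
}
```

A flag validator built on such a check can report the bad value and exit cleanly instead of tripping an assert later.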
------------- PR Comment: https://git.openjdk.org/jdk/pull/25708#issuecomment-3003172821 From amitkumar at openjdk.org Wed Jun 25 04:41:43 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 25 Jun 2025 04:41:43 GMT Subject: Integrated: 8358694: VM asserts if CodeCacheSegmentSize is not a power of 2 In-Reply-To: <_SfCjE8oyyD-grL6AAih23j1Qx7fbYefTVzl-BU5N2k=.7bd5d419-4127-4e78-926e-fece25a7d914@github.com> References: <_SfCjE8oyyD-grL6AAih23j1Qx7fbYefTVzl-BU5N2k=.7bd5d419-4127-4e78-926e-fece25a7d914@github.com> Message-ID: On Tue, 10 Jun 2025 04:46:43 GMT, Amit Kumar wrote: > Makes sure that JVM exits gracefully when `CodeCacheSegmentSize` is not a power of 2. This pull request has now been integrated. Changeset: 263e32bb Author: Amit Kumar URL: https://git.openjdk.org/jdk/commit/263e32bb8507310dd4c9a4eca7f6e428303d3a53 Stats: 84 lines in 2 files changed: 84 ins; 0 del; 0 mod 8358694: VM asserts if CodeCacheSegmentSize is not a power of 2 Reviewed-by: shade, dfenacci ------------- PR: https://git.openjdk.org/jdk/pull/25708 From jbhateja at openjdk.org Wed Jun 25 04:44:11 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 25 Jun 2025 04:44:11 GMT Subject: RFR: 8352635: Improve inferencing of Float16 operations with constant inputs [v14] In-Reply-To: <44nVQBYgzCOB2mAB9xtAPvkUcOMJOITA2VjMdDFgm1g=.48266693-48bf-41db-8871-a7dcafe93509@github.com> References: <44nVQBYgzCOB2mAB9xtAPvkUcOMJOITA2VjMdDFgm1g=.48266693-48bf-41db-8871-a7dcafe93509@github.com> Message-ID: > This is a follow-up PR#22755 to improve Float16 operations inferencing. > > The existing scheme to detect Float16 operations for some operations is based on pattern matching which expects to receive inputs through ConvHF2F IR, this patch extends matching to accept constant floating point inputs within the Float16 value range. 
> > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review resolutions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24179/files - new: https://git.openjdk.org/jdk/pull/24179/files/428a50a6..2a8ed60e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24179&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24179&range=12-13 Stats: 33 lines in 2 files changed: 0 ins; 29 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/24179.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24179/head:pull/24179 PR: https://git.openjdk.org/jdk/pull/24179 From jbhateja at openjdk.org Wed Jun 25 04:44:12 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 25 Jun 2025 04:44:12 GMT Subject: RFR: 8352635: Improve inferencing of Float16 operations with constant inputs [v13] In-Reply-To: References: <44nVQBYgzCOB2mAB9xtAPvkUcOMJOITA2VjMdDFgm1g=.48266693-48bf-41db-8871-a7dcafe93509@github.com> <7riWgCQ_m74kYs-gbmKz15oaQ6qKM1ftjoiCdJSLYlo=.158be671-75b9-44d8-8992-2f1c9ff22c89@github.com> Message-ID: On Tue, 24 Jun 2025 21:42:47 GMT, Sandhya Viswanathan wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Update src/hotspot/share/opto/subnode.cpp >> >> Co-authored-by: Emanuel Peter > > test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java line 81: > >> 79: private static final float EXACT_FP16 = 2052.0f; >> 80: private static final float SNAN_FP16 = Float.intBitsToFloat(0x7F8000F0); >> 81: private static final float QNAN_FP16 = Float.intBitsToFloat(0x7FC00000); > > A nitpick, the values assigned here are float SNaN and QNaN. The FP16 suffix could be removed. Nomenclature is in the context of its usage. 
> test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java line 381: > >> 379: >> 380: @Check(test="testSNaNFP16ConstantPatterns") >> 381: public void checkSNaNFP16ConstantPatterns(short actual) throws Exception { > > Why throw exception here? This is short comparison in Verify.checkEQ. Also Java doesn't inherently throw exception for SNaN. I restructured the tests to use Verify.checkEQ, this was an artifact of an earlier changes. Removing it. > test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java line 649: > >> 647: // FIXME : C2 compiler limitaition to identify sign of ZERO value. >> 648: // assertResult(divide(valueOf(2.0f), NEGATIVE_ZERO).floatValue(), Float.NEGATIVE_INFINITY, "testDivConstantFolding"); >> 649: // assertResult(divide(valueOf(2.0f), POSITIVE_ZERO).floatValue(), Float.POSITIVE_INFINITY, "testDivConstantFolding"); > > Would be good to file a JBS for this compiler limitation. Good catch, it's a wrong comment, lifted it. I also filed another JBS for missing value transforms for DivF/DivD https://bugs.openjdk.org/browse/JDK-8360460 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24179#discussion_r2165724360 PR Review Comment: https://git.openjdk.org/jdk/pull/24179#discussion_r2165724495 PR Review Comment: https://git.openjdk.org/jdk/pull/24179#discussion_r2165724618 From kvn at openjdk.org Wed Jun 25 05:05:26 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 25 Jun 2025 05:05:26 GMT Subject: RFR: 8359788: Internal Error: assert(get_instanceKlass()->is_loaded()) failed: must be at least loaded In-Reply-To: References: Message-ID: On Wed, 25 Jun 2025 01:13:46 GMT, Igor Veresov wrote: > For object arrays we forgot to check if the element type if loaded and always returned `true` from `is_klass_loaded()`. > Testing is a bit noisy but seems to be clean. Good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25965#pullrequestreview-2956575134 From jkarthikeyan at openjdk.org Wed Jun 25 05:56:22 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Wed, 25 Jun 2025 05:56:22 GMT Subject: RFR: 8350177: C2 SuperWord: Integer.numberOfLeadingZeros, numberOfTrailingZeros, reverse and bitCount have input types wrongly turncated for byte and short [v6] In-Reply-To: References: Message-ID: > Hi all, > This patch fixes cases in SuperWord when compiling subword types where vectorized code would be given a narrower type than expected, leading to miscompilation due to truncation. This fix is a generalization of the same fix applied for `Integer.reverseBytes` in [JDK-8305324](https://bugs.openjdk.org/browse/JDK-8305324). The patch introduces a check for nodes that are known to tolerate truncation, so that any future cases of subword truncation will avoid creating miscompiled code. > > The patch reuses the existing logic to set the type of the vectors to int, which currently disables vectorization for the affected patterns entirely. Once [JDK-8342095](https://bugs.openjdk.org/browse/JDK-8342095) is merged and automatic casting support is added the autovectorizer should automatically insert casts to and from int, maintaining correctness. > > I've added an IR test that checks for correctly compiled outputs. Thoughts and reviews would be appreciated! 
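To make the truncation problem described above concrete, here is a small Java sketch. It contrasts the scalar Java semantics (the `byte` is widened to `int` before the call) with a hypothetical model of what an 8-bit vector lane would compute if the input type were narrowed. The `narrowLane*` methods are an assumption made purely for illustration; they are not C2 code and do not claim to match any particular vector instruction.

```java
// Illustrative only: why narrowing the input of these operations changes results.
// The narrowLane* methods are a hypothetical model of an 8-bit lane, not C2 behavior.
final class SubwordTruncation {
    // Java semantics: b is sign-extended to int, the int operation runs, the result is narrowed.
    static byte scalarNlz(byte b) {
        return (byte) Integer.numberOfLeadingZeros(b);
    }

    // Hypothetical 8-bit lane: leading zeros are counted within 8 bits only.
    static byte narrowLaneNlz(byte b) {
        int v = b & 0xFF;
        return (byte) (v == 0 ? 8 : Integer.numberOfLeadingZeros(v) - 24);
    }

    static byte scalarBitCount(byte b) {
        return (byte) Integer.bitCount(b);
    }

    // Hypothetical 8-bit lane: only the low 8 bits are counted.
    static byte narrowLaneBitCount(byte b) {
        return (byte) Integer.bitCount(b & 0xFF);
    }

    public static void main(String[] args) {
        byte one = 1;
        byte minusOne = -1;
        System.out.println("nlz(1):       scalar=" + scalarNlz(one) + " narrow=" + narrowLaneNlz(one));
        System.out.println("bitCount(-1): scalar=" + scalarBitCount(minusOne) + " narrow=" + narrowLaneBitCount(minusOne));
    }
}
```

For an input of 1 the scalar path yields 31 while the narrowed model yields 7, and for -1 the bit counts are 32 versus 8, which is the kind of mismatch the patch guards against by keeping such operations at `int` width.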
Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: Add more nodes to the non-truncating list ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25440/files - new: https://git.openjdk.org/jdk/pull/25440/files/edce2a12..c219e38e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25440&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25440&range=04-05 Stats: 12 lines in 1 file changed: 12 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25440.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25440/head:pull/25440 PR: https://git.openjdk.org/jdk/pull/25440 From jkarthikeyan at openjdk.org Wed Jun 25 06:03:29 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Wed, 25 Jun 2025 06:03:29 GMT Subject: RFR: 8350177: C2 SuperWord: Integer.numberOfLeadingZeros, numberOfTrailingZeros, reverse and bitCount have input types wrongly turncated for byte and short In-Reply-To: References: Message-ID: On Tue, 24 Jun 2025 16:27:29 GMT, Emanuel Peter wrote: >> @eme64 I've updated the patch to address the comments, let me know what you think! >> >> @mur47x111 Thanks for the comment! I've merged from master. > > @jaskarth I just checked the results, and there is a series of failing tests. > > ------------------------ > `compiler/c2/Test6958485.java` > > Flags: `-server -Xcomp` or `-XX:+UnlockExperimentalVMOptions -XX:PerMethodSpecTrapLimit=0 -XX:PerMethodTrapLimit=0`. > > `# assert(false) failed: Unexpected node in SuperWord truncation: Conv2B` > > -------------------------- > > `compiler/intrinsics/TestDoubleIsInfinite.java` -> D > `compiler/intrinsics/TestFloatIsInfinite.java` -> F > > Flags: `-XX:UseAVX=3` or `-XX:-TieredCompilation -XX:+StressReflectiveCode -XX:-ReduceInitialCardMarks -XX:-ReduceBulkZeroing -XX:-ReduceFieldZeroing` ... not sure if any are necessary actually. 
> > `# assert(false) failed: Unexpected node in SuperWord truncation: IsInfiniteD` > and > `# assert(false) failed: Unexpected node in SuperWord truncation: IsInfiniteF` > > ----------------------------- > > `jdk/incubator/vector/Byte128VectorTests.java` (same issue with all related vector tests, just reporting one here) > > Flag: `-XX:UseAVX=2` > > `# assert(false) failed: Unexpected node in SuperWord truncation: AddReductionVI` @eme64 Thanks for the test results! I've added these nodes to the non-truncating list, as well as the other reduction nodes that showed up when running the vector tests. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25440#issuecomment-3003445470 From thartmann at openjdk.org Wed Jun 25 06:32:27 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 25 Jun 2025 06:32:27 GMT Subject: RFR: 8359788: Internal Error: assert(get_instanceKlass()->is_loaded()) failed: must be at least loaded In-Reply-To: References: Message-ID: On Wed, 25 Jun 2025 01:13:46 GMT, Igor Veresov wrote: > For object arrays we forgot to check if the element type if loaded and always returned `true` from `is_klass_loaded()`. > Testing is a bit noisy but seems to be clean. Looks reasonable to me too. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25965#pullrequestreview-2956806215

From xgong at openjdk.org Wed Jun 25 06:43:32 2025
From: xgong at openjdk.org (Xiaohong Gong)
Date: Wed, 25 Jun 2025 06:43:32 GMT
Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v4]
In-Reply-To:
References:
Message-ID:

On Wed, 25 Jun 2025 06:06:53 GMT, Xiaohong Gong wrote:

>> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Addressed review comments
>
> src/hotspot/cpu/aarch64/aarch64_vector.ad line 262:
>
>> 260:           (UseSVE == 2 && length_in_bytes > 8 && length_in_bytes < MaxVectorSize )) {
>> 261:         return false;
>> 262:       }
>
> How about:
>
>     case Op_SelectFromTwoVector:
>       // The "tbl" instruction for two vector table is supported only in Neon and SVE2. Return
>       // false if vector length > 16B but supported SVE version < 2.
>       //
>       // Additionally, this operation is disabled for doubles and longs on machines with SVE < 2.
>       // Instead, the default VectorRearrange + VectorBlend is generated as the performance of
>       // the default pattern is slightly better.
>       if (UseSVE < 2 && (type2aelembytes(bt) == 8 || length_in_bytes > 16)) {
>         return false;
>       }
>
>       // As the SVE2 "tbl" instruction is unpredicated and partial operations cannot be generated
>       // using masks, we currently disable this operation on machines where length_in_bytes <
>       // MaxVectorSize with the only exception of 8B vector length.
>       if (UseSVE == 2 && length_in_bytes > 8 && length_in_bytes < MaxVectorSize) {
>         return false;
>       }
>
>       break;

Maybe the NEON `tbl` can also be generated for SVE2 when `length_in_bytes == 16 && length_in_bytes < MaxVectorSize`. This is a special partial version for SVE2.
In summary, the match rule's predicate would be:

1) NEON: UseSVE < 2 || (length_in_bytes < 16 || length_in_bytes < MaxVectorSize)
2) SVE: UseSVE == 2 && (length_in_bytes >= 16 && length_in_bytes == MaxVectorSize)

It seems this will make the predicate or the code here more complex. The advantage is that this op with a 128-bit vector shape will also be intrinsified on an SVE2 machine with 256-bit or larger vectors. This is not a blocker; whether to change it is up to you. We can also revisit this part once 256-bit SVE2 machines exist in the future.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2165895697

From xgong at openjdk.org Wed Jun 25 06:43:31 2025
From: xgong at openjdk.org (Xiaohong Gong)
Date: Wed, 25 Jun 2025 06:43:31 GMT
Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v4]
In-Reply-To:
References:
Message-ID:

On Tue, 24 Jun 2025 13:19:17 GMT, Bhavana Kilambi wrote:

>> This patch adds aarch64 backend support for SelectFromTwoVector operation which was recently introduced in VectorAPI.
>>
>> It implements this operation using a two table vector lookup instruction - "tbl" which is available only in Neon and SVE2.
>>
>> For 128-bit vector length : Neon tbl instruction is generated if UseSVE < 2 and SVE2 "tbl" instruction is generated if UseSVE == 2.
>>
>> For > 128-bit vector length : Currently there are no machines which have vector length > 128-bit and support SVE2. For all those machines with vector length > 128-bit and UseSVE < 2, this operation is not supported. The inline expander for this operation would fail and lowered IR will be generated which is a mix of two rearrange and one blend operation.
>>
>> This patch also adds a boolean "need_load_shuffle" in the inline expander for this operation to test if the platform requires VectorLoadShuffle operation to be generated. Without this, the lowering IR was not being generated on aarch64 and the performance was quite poor.
>> >> Performance numbers with this patch on a 128-bit, SVE2 supporting machine is shown below - >> >> >> Benchmark (size) Mode Cnt Gain >> SelectFromBenchmark.selectFromByteVector 1024 thrpt 9 1.43 >> SelectFromBenchmark.selectFromByteVector 2048 thrpt 9 1.48 >> SelectFromBenchmark.selectFromDoubleVector 1024 thrpt 9 68.55 >> SelectFromBenchmark.selectFromDoubleVector 2048 thrpt 9 72.07 >> SelectFromBenchmark.selectFromFloatVector 1024 thrpt 9 1.69 >> SelectFromBenchmark.selectFromFloatVector 2048 thrpt 9 1.52 >> SelectFromBenchmark.selectFromIntVector 1024 thrpt 9 1.50 >> SelectFromBenchmark.selectFromIntVector 2048 thrpt 9 1.52 >> SelectFromBenchmark.selectFromLongVector 1024 thrpt 9 85.38 >> SelectFromBenchmark.selectFromLongVector 2048 thrpt 9 80.93 >> SelectFromBenchmark.selectFromShortVector 1024 thrpt 9 1.48 >> SelectFromBenchmark.selectFromShortVector 2048 thrpt 9 1.49 >> >> >> Gain column refers to the ratio of thrpt between this patch and the master branch after applying changes in the inline expander. > > Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: > > Addressed review comments src/hotspot/cpu/aarch64/aarch64_vector.ad line 255: > 253: // the default VectorRearrange + VectorBlend is generated as the performance of the default > 254: // implementation was slightly better/similar than the implementaion for SelectFromTwoVector. 
> 255:     // As the SVE2 "tbl" instruction in unpredicated and partial operations cannot be generated

`in` -> `is`

src/hotspot/cpu/aarch64/aarch64_vector.ad line 260:

> 258:       case Op_SelectFromTwoVector:
> 259:         if ((UseSVE < 2 && (type2aelembytes(bt) == 8 || length_in_bytes > 16)) ||
> 260:           (UseSVE == 2 && length_in_bytes > 8 && length_in_bytes < MaxVectorSize )) {

style: `length_in_bytes < MaxVectorSize ))` -> `length_in_bytes < MaxVectorSize))`

src/hotspot/cpu/aarch64/aarch64_vector.ad line 262:

> 260:           (UseSVE == 2 && length_in_bytes > 8 && length_in_bytes < MaxVectorSize )) {
> 261:         return false;
> 262:       }

How about:

    case Op_SelectFromTwoVector:
      // The "tbl" instruction for two vector table is supported only in Neon and SVE2. Return
      // false if vector length > 16B but supported SVE version < 2.
      //
      // Additionally, this operation is disabled for doubles and longs on machines with SVE < 2.
      // Instead, the default VectorRearrange + VectorBlend is generated as the performance of
      // the default pattern is slightly better.
      if (UseSVE < 2 && (type2aelembytes(bt) == 8 || length_in_bytes > 16)) {
        return false;
      }

      // As the SVE2 "tbl" instruction is unpredicated and partial operations cannot be generated
      // using masks, we currently disable this operation on machines where length_in_bytes <
      // MaxVectorSize with the only exception of 8B vector length.
      if (UseSVE == 2 && length_in_bytes > 8 && length_in_bytes < MaxVectorSize) {
        return false;
      }

      break;

src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2887:

> 2885:   // Generate Neon tbl when UseSVE == 0 or UseSVE == 1 with vector length of 16B
> 2886:
> 2887:   bool useNeon = (UseSVE == 0) || (UseSVE == 1 && isQ);

The function name is `select_from_two_vectors_HS_Neon`, but we still have to check whether to use NEON inside it. It looks confusing. Would it be better to split out the special `!isQ && UseSVE >= 1` cases and combine them into the `select_from_two_vectors` method below?
Combining `!isQ` to the sve rule may also make the rule's predicate simpler? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2165863550 PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2165864681 PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2165862863 PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2165916979 From epeter at openjdk.org Wed Jun 25 06:47:32 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 25 Jun 2025 06:47:32 GMT Subject: RFR: 8350177: C2 SuperWord: Integer.numberOfLeadingZeros, numberOfTrailingZeros, reverse and bitCount have input types wrongly turncated for byte and short [v6] In-Reply-To: References: Message-ID: On Wed, 25 Jun 2025 05:56:22 GMT, Jasmine Karthikeyan wrote: >> Hi all, >> This patch fixes cases in SuperWord when compiling subword types where vectorized code would be given a narrower type than expected, leading to miscompilation due to truncation. This fix is a generalization of the same fix applied for `Integer.reverseBytes` in [JDK-8305324](https://bugs.openjdk.org/browse/JDK-8305324). The patch introduces a check for nodes that are known to tolerate truncation, so that any future cases of subword truncation will avoid creating miscompiled code. >> >> The patch reuses the existing logic to set the type of the vectors to int, which currently disables vectorization for the affected patterns entirely. Once [JDK-8342095](https://bugs.openjdk.org/browse/JDK-8342095) is merged and automatic casting support is added the autovectorizer should automatically insert casts to and from int, maintaining correctness. >> >> I've added an IR test that checks for correctly compiled outputs. Thoughts and reviews would be appreciated! 
> > Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: > > Add more nodes to the non-truncating list src/hotspot/share/opto/superword.cpp line 2552: > 2550: switch (opc) { > 2551: case Op_ExtractS: > 2552: case Op_ExtractC: Are there any tests for these somewhere? src/hotspot/share/opto/superword.cpp line 2600: > 2598: case Op_XorReductionV: > 2599: case Op_MaxReductionV: > 2600: case Op_MinReductionV: Why do these vector nodes even end up here? Is that expected? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25440#discussion_r2165925142 PR Review Comment: https://git.openjdk.org/jdk/pull/25440#discussion_r2165924451 From iveresov at openjdk.org Wed Jun 25 06:48:31 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Wed, 25 Jun 2025 06:48:31 GMT Subject: RFR: 8359788: Internal Error: assert(get_instanceKlass()->is_loaded()) failed: must be at least loaded In-Reply-To: References: Message-ID: On Wed, 25 Jun 2025 01:13:46 GMT, Igor Veresov wrote: > For object arrays we forgot to check if the element type if loaded and always returned `true` from `is_klass_loaded()`. > Testing is a bit noisy but seems to be clean. Thanks Vladimir and Tobias! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25965#issuecomment-3003558563 From iveresov at openjdk.org Wed Jun 25 06:48:32 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Wed, 25 Jun 2025 06:48:32 GMT Subject: Integrated: 8359788: Internal Error: assert(get_instanceKlass()->is_loaded()) failed: must be at least loaded In-Reply-To: References: Message-ID: On Wed, 25 Jun 2025 01:13:46 GMT, Igor Veresov wrote: > For object arrays we forgot to check if the element type if loaded and always returned `true` from `is_klass_loaded()`. > Testing is a bit noisy but seems to be clean. This pull request has now been integrated. 
Changeset: 5c4f92ba Author: Igor Veresov URL: https://git.openjdk.org/jdk/commit/5c4f92ba9a2b820fa12920400c9037b5d3c37aa4 Stats: 7 lines in 1 file changed: 6 ins; 0 del; 1 mod 8359788: Internal Error: assert(get_instanceKlass()->is_loaded()) failed: must be at least loaded Reviewed-by: kvn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/25965 From epeter at openjdk.org Wed Jun 25 06:53:29 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 25 Jun 2025 06:53:29 GMT Subject: RFR: 8350177: C2 SuperWord: Integer.numberOfLeadingZeros, numberOfTrailingZeros, reverse and bitCount have input types wrongly turncated for byte and short In-Reply-To: References: Message-ID: <_BPaZr5a3KNidvx3XzOm-0BtTUdnnLl5qfrYZEtGkzg=.46c30e02-75dc-4b72-8aa4-74c4fcab2fe2@github.com> On Wed, 25 Jun 2025 06:00:40 GMT, Jasmine Karthikeyan wrote: >> @jaskarth I just checked the results, and there is a series of failing tests. >> >> ------------------------ >> `compiler/c2/Test6958485.java` >> >> Flags: `-server -Xcomp` or `-XX:+UnlockExperimentalVMOptions -XX:PerMethodSpecTrapLimit=0 -XX:PerMethodTrapLimit=0`. >> >> `# assert(false) failed: Unexpected node in SuperWord truncation: Conv2B` >> >> -------------------------- >> >> `compiler/intrinsics/TestDoubleIsInfinite.java` -> D >> `compiler/intrinsics/TestFloatIsInfinite.java` -> F >> >> Flags: `-XX:UseAVX=3` or `-XX:-TieredCompilation -XX:+StressReflectiveCode -XX:-ReduceInitialCardMarks -XX:-ReduceBulkZeroing -XX:-ReduceFieldZeroing` ... not sure if any are necessary actually. 
>> >> `# assert(false) failed: Unexpected node in SuperWord truncation: IsInfiniteD` >> and >> `# assert(false) failed: Unexpected node in SuperWord truncation: IsInfiniteF` >> >> ----------------------------- >> >> `jdk/incubator/vector/Byte128VectorTests.java` (same issue with all related vector tests, just reporting one here) >> >> Flag: `-XX:UseAVX=2` >> >> `# assert(false) failed: Unexpected node in SuperWord truncation: AddReductionVI` > > @eme64 Thanks for the test results! I've added these nodes to the non-truncating list, as well as the other reduction nodes that showed up when running the vector tests. @jaskarth Thanks for the updates! I'm running another round of testing to see if we have now handled all cases :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/25440#issuecomment-3003580231 From epeter at openjdk.org Wed Jun 25 06:53:31 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 25 Jun 2025 06:53:31 GMT Subject: RFR: 8350177: C2 SuperWord: Integer.numberOfLeadingZeros, numberOfTrailingZeros, reverse and bitCount have input types wrongly turncated for byte and short [v6] In-Reply-To: References: Message-ID: On Wed, 25 Jun 2025 06:44:24 GMT, Emanuel Peter wrote: >> Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: >> >> Add more nodes to the non-truncating list > > src/hotspot/share/opto/superword.cpp line 2600: > >> 2598: case Op_XorReductionV: >> 2599: case Op_MaxReductionV: >> 2600: case Op_MinReductionV: > > Why do these vector nodes even end up here? Is that expected? Additionally, it may be good to say why each operation here is not truncatable. We can also file a follow-up RFE for further investigation. After all, this is a bug fix and we don't want to work on it for too long now.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25440#discussion_r2165931037 From epeter at openjdk.org Wed Jun 25 06:59:31 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 25 Jun 2025 06:59:31 GMT Subject: RFR: 8352635: Improve inferencing of Float16 operations with constant inputs [v8] In-Reply-To: References: <44nVQBYgzCOB2mAB9xtAPvkUcOMJOITA2VjMdDFgm1g=.48266693-48bf-41db-8871-a7dcafe93509@github.com> <60kkIRL2XznEXyYukVXVOoeixm2iGhoOxAbKJi5X0cY=.0268090e-a0d3-45fb-93f4-94caaf9b8497@github.com> <2F9hnA72JKqq9hJchevuSQ8XHveZ51F6tJnb7IcNw30=.1da69bab-23be-473b-92c1-1786916364b9@github.com> Message-ID: On Tue, 24 Jun 2025 18:59:42 GMT, Jatin Bhateja wrote: >> Hi @eme64 , let me know if this looks good to land now. > >> @jatin-bhateja The code looks good to me! I'll run some testing before approving. >> >> But someone else could already get started with a second review. > > Thanks @eme64 , let us know once you are through with testing. @jatin-bhateja The last testing I did passed. I'm now waiting for @sviswa7 to give the approval, then run testing again on our side. I don't have any machine that supports float16, so we are relying on you to run sufficient testing. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24179#issuecomment-3003595299 From iveresov at openjdk.org Wed Jun 25 07:02:05 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Wed, 25 Jun 2025 07:02:05 GMT Subject: [jdk25] RFR: 8359788: Internal Error: assert(get_instanceKlass()->is_loaded()) failed: must be at least loaded Message-ID: 8359788: Internal Error: assert(get_instanceKlass()->is_loaded()) failed: must be at least loaded ------------- Commit messages: - Backport 5c4f92ba9a2b820fa12920400c9037b5d3c37aa4 Changes: https://git.openjdk.org/jdk/pull/25969/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25969&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8359788 Stats: 7 lines in 1 file changed: 6 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25969.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25969/head:pull/25969 PR: https://git.openjdk.org/jdk/pull/25969 From bkilambi at openjdk.org Wed Jun 25 08:09:35 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Wed, 25 Jun 2025 08:09:35 GMT Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v4] In-Reply-To: References: Message-ID: On Wed, 25 Jun 2025 06:28:55 GMT, Xiaohong Gong wrote: >> src/hotspot/cpu/aarch64/aarch64_vector.ad line 262: >> >>> 260: (UseSVE == 2 && length_in_bytes > 8 && length_in_bytes < MaxVectorSize )) { >>> 261: return false; >>> 262: } >> >> How about: >> >> case Op_SelectFromTwoVector: >> // The "tbl" instruction for two vector table is supported only in Neon and SVE2. Return >> // false if vector length > 16B but supported SVE version < 2. >> // >> // Additionally, this operation is disabled for doubles and longs on machines with SVE < 2, >> // Instead, the default VectorRearrange + VectorBlend is generated as the performance of >> // the default pattern is slightly better. 
>> if (UseSVE < 2 && (type2aelembytes(bt) == 8 || length_in_bytes > 16)) { >> return false; >> } >> >> // As the SVE2 "tbl" instruction is unpredicated and partial operations cannot be generated >> // using masks, we currently disable this operation on machines where length_in_bytes < >> // MaxVectorSize with the only exception of 8B vector length. >> if (UseSVE == 2 && length_in_bytes > 8 && length_in_bytes < MaxVectorSize)) { >> return false; >> } >> >> break; > > Maybe the NEON `tbl` can also be generated for SVE2 when `length_in_bytes == 16 && length_in_bytes < MaxVectorSize`. This is a special partial version for SVE2. As a summary, The match rule's predicate will be: > 1) NEON: UseSVE < 2 || (length_in_bytes < 16 || length_in_bytes < MaxVectorSize) > 2) SVE: UseSVE ==2 && (length_in_bytes >= 16 && length_in_bytes == MaxVectorSize) > > Seems this will make predicate or code here more complex. Advantage is this op with 128 vector shape on a SVE2 256 or larger size machine will also be intrinsified. It's not a block and change or not is up to you. We can also revisit this part once the 256-bit SVE2 machine exist in future. Thanks @XiaohongGong . The case you mention will need an SVE2 machine with MaxVectorSize >= 32B which is currently not available. I think it's better if we revisit these cases once a functioning hardware is available. Shall I add a comment here as a reminder that we need to revisit when such hardware is available? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2166075939 From xgong at openjdk.org Wed Jun 25 08:17:34 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 25 Jun 2025 08:17:34 GMT Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v4] In-Reply-To: References: Message-ID: On Wed, 25 Jun 2025 08:06:40 GMT, Bhavana Kilambi wrote: >> Maybe the NEON `tbl` can also be generated for SVE2 when `length_in_bytes == 16 && length_in_bytes < MaxVectorSize`. 
This is a special partial version for SVE2. As a summary, The match rule's predicate will be: >> 1) NEON: UseSVE < 2 || (length_in_bytes < 16 || length_in_bytes < MaxVectorSize) >> 2) SVE: UseSVE ==2 && (length_in_bytes >= 16 && length_in_bytes == MaxVectorSize) >> >> Seems this will make predicate or code here more complex. Advantage is this op with 128 vector shape on a SVE2 256 or larger size machine will also be intrinsified. It's not a block and change or not is up to you. We can also revisit this part once the 256-bit SVE2 machine exist in future. > > Thanks @XiaohongGong . The case you mention will need an SVE2 machine with MaxVectorSize >= 32B which is currently not available. I think it's better if we revisit these cases once a functioning hardware is available. Shall I add a comment here as a reminder that we need to revisit when such hardware is available? I think it's fine without more comments. We can get the information from rules or from performance results. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2166090846 From snatarajan at openjdk.org Wed Jun 25 08:48:31 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Wed, 25 Jun 2025 08:48:31 GMT Subject: RFR: 8325478: Restructure the macro expansion compiler phase to not include macro elimination [v2] In-Reply-To: References: <_jMbdLzfV1MOFrUJH7J6-zXWLabQTOTsfb2hWvEL3Kc=.fede59e2-dd81-4128-9b7a-b8c47334a062@github.com> Message-ID: On Wed, 18 Jun 2025 16:22:24 GMT, Daniel Lundén wrote: >> This placement of `refine_strip_mined_loop_macro_nodes()` is ok as it only affects the functionality in the loop of the `eliminate_opaque_looplimit_macro_nodes` method. > > Could you elaborate a bit on why this is the case? Just looking briefly at `OuterStripMinedLoopNode::adjust_strip_mined_loop` (called from `refine_strip_mined_loop_macro_nodes`), I'm not convinced there are no other interactions.
Before https://github.com/openjdk/jdk/pull/24890, `OuterStripMinedLoopNode::adjust_strip_mined_loop` was called just before `C->remove_macro_node(n)` inside the condition `n->Opcode() == Op_OuterStripMinedLoop` [(line 2523)](https://github.com/openjdk/jdk/pull/25682/commits/f42ec1a27ad1767580ef4d480ded846a5ea9fc6a#diff-2faebd05d08f9115f8d9ef771644cf05087a6986c2f9013d7163c6aa720169c3R2523). This led to an assert failure, as `OuterStripMinedLoopNode::adjust_strip_mined_loop` added a new node to the macro list. The conclusion from https://github.com/openjdk/jdk/pull/24890 was that `OuterStripMinedLoopNode::adjust_strip_mined_loop` should be called before we start the macro elimination and remove macro nodes with Opcode `Op_OuterStripMinedLoop`. This is why I consider its current placement to be okay. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25682#discussion_r2166166002 From bkilambi at openjdk.org Wed Jun 25 08:52:35 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Wed, 25 Jun 2025 08:52:35 GMT Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v4] In-Reply-To: References: Message-ID: On Wed, 25 Jun 2025 06:39:39 GMT, Xiaohong Gong wrote: >> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: >> >> Addressed review comments > > src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2887: > >> 2885: // Generate Neon tbl when UseSVE == 0 or UseSVE == 1 with vector length of 16B >> 2886: >> 2887: bool useNeon = (UseSVE == 0) || (UseSVE == 1 && isQ); > > The function name is `select_from_two_vectors_HS_Neon`, but we still have to check whether to use NEON inside it. It looks confusing. Is it better to split the special `!isQ && UseSVE >=1` cases and combine it to below `select_from_two_vectors` method? > > Combining `!isQ` to the sve rule may also make the rule's predicate simpler? Thanks, I can do that.
Earlier I was trying to keep in line with VectorRearrange which had similar name and if I remember correctly had both Neon and SVE implementation in the one named as "Neon" (before you optimized it recently) and I continued keeping the same format. I agree, this can be simplified further. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2166178111 From xgong at openjdk.org Wed Jun 25 09:16:48 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 25 Jun 2025 09:16:48 GMT Subject: RFR: 8355563: VectorAPI: Refactor current implementation of subword gather load API [v2] In-Reply-To: References: Message-ID: > JDK-8318650 introduced hotspot intrinsification of subword gather load APIs for X86 platforms [1]. However, the current implementation is not optimal for AArch64 SVE platform, which natively supports vector instructions for subword gather load operations using an int vector for indices (see [2][3]). > > Two key areas require improvement: > 1. At the Java level, vector indices generated for range validation could be reused for the subsequent gather load operation on architectures with native vector instructions like AArch64 SVE. However, the current implementation prevents compiler reuse of these index vectors due to divergent control flow, potentially impacting performance. > 2. At the compiler IR level, the additional `offset` input for `LoadVectorGather`/`LoadVectorGatherMasked` with subword types increases IR complexity and complicates backend implementation. Furthermore, generating `add` instructions before each memory access negatively impacts performance. > > This patch refactors the implementation at both the Java level and compiler mid-end to improve efficiency and maintainability across different architectures. > > Main changes: > 1. Java-side API refactoring: > - Explicitly passes generated index vectors to hotspot, eliminating duplicate index vectors for gather load instructions on > architectures like AArch64. 
> 2. C2 compiler IR refactoring: > - Refactors `LoadVectorGather`/`LoadVectorGatherMasked` IR for subword types by removing the memory offset input and incorporating it into the memory base `addr` at the IR level. This simplifies backend implementation, reduces add operations, and unifies the IR across all types. > 3. Backend changes: > - Streamlines X86 implementation of subword gather operations following the removal of the offset input from the IR level. > > Performance: > The performance of the relative JMH improves up to 27% on a X86 AVX512 system. Please see the data below: > > Benchmark Mode Cnt Unit SIZE Before After Gain > GatherOperationsBenchmark.microByteGather128 thrpt 30 ops/ms 64 53682.012 52650.325 0.98 > GatherOperationsBenchmark.microByteGather128 thrpt 30 ops/ms 256 14484.252 14255.156 0.98 > GatherOperationsBenchmark.microByteGather128 thrpt 30 ops/ms 1024 3664.900 3595.615 0.98 > GatherOperationsBenchmark.microByteGather128 thrpt 30 ops/ms 4096 908.312 935.269 1.02 > GatherOperationsBenchmark.micr... Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains three commits: - Address review comments - Merge 'jdk:master' into JDK-8355563 - 8355563: VectorAPI: Refactor current implementation of subword gather load API ------------- Changes: https://git.openjdk.org/jdk/pull/25138/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25138&range=01 Stats: 450 lines in 15 files changed: 109 ins; 176 del; 165 mod Patch: https://git.openjdk.org/jdk/pull/25138.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25138/head:pull/25138 PR: https://git.openjdk.org/jdk/pull/25138 From xgong at openjdk.org Wed Jun 25 09:16:48 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 25 Jun 2025 09:16:48 GMT Subject: RFR: 8355563: VectorAPI: Refactor current implementation of subword gather load API In-Reply-To: References: Message-ID: On Fri, 9 May 2025 07:35:41 GMT, Xiaohong Gong wrote: > JDK-8318650 introduced hotspot intrinsification of subword gather load APIs for X86 platforms [1]. However, the current implementation is not optimal for AArch64 SVE platform, which natively supports vector instructions for subword gather load operations using an int vector for indices (see [2][3]). > > Two key areas require improvement: > 1. At the Java level, vector indices generated for range validation could be reused for the subsequent gather load operation on architectures with native vector instructions like AArch64 SVE. However, the current implementation prevents compiler reuse of these index vectors due to divergent control flow, potentially impacting performance. > 2. At the compiler IR level, the additional `offset` input for `LoadVectorGather`/`LoadVectorGatherMasked` with subword types increases IR complexity and complicates backend implementation. Furthermore, generating `add` instructions before each memory access negatively impacts performance. 
> > This patch refactors the implementation at both the Java level and compiler mid-end to improve efficiency and maintainability across different architectures. > > Main changes: > 1. Java-side API refactoring: > - Explicitly passes generated index vectors to hotspot, eliminating duplicate index vectors for gather load instructions on > architectures like AArch64. > 2. C2 compiler IR refactoring: > - Refactors `LoadVectorGather`/`LoadVectorGatherMasked` IR for subword types by removing the memory offset input and incorporating it into the memory base `addr` at the IR level. This simplifies backend implementation, reduces add operations, and unifies the IR across all types. > 3. Backend changes: > - Streamlines X86 implementation of subword gather operations following the removal of the offset input from the IR level. > > Performance: > The performance of the relative JMH improves up to 27% on a X86 AVX512 system. Please see the data below: > > Benchmark Mode Cnt Unit SIZE Before After Gain > GatherOperationsBenchmark.microByteGather128 thrpt 30 ops/ms 64 53682.012 52650.325 0.98 > GatherOperationsBenchmark.microByteGather128 thrpt 30 ops/ms 256 14484.252 14255.156 0.98 > GatherOperationsBenchmark.microByteGather128 thrpt 30 ops/ms 1024 3664.900 3595.615 0.98 > GatherOperationsBenchmark.microByteGather128 thrpt 30 ops/ms 4096 908.312 935.269 1.02 > GatherOperationsBenchmark.micr... Hi the above counted loop recognizer patch is merged. Hence I'v rebased this PR to latest jdk master. 
Following is the new performance data of the subword gather JMHs on X86:

Benchmark SIZE Mode Cnt Unit Before After Gain
GatherOperationsBenchmark.microByteGather128 64 thrpt 30 ops/ms 44221.691 46837.124 1.05
GatherOperationsBenchmark.microByteGather128 256 thrpt 30 ops/ms 11245.455 12243.045 1.08
GatherOperationsBenchmark.microByteGather128 1024 thrpt 30 ops/ms 2825.246 3096.460 1.09
GatherOperationsBenchmark.microByteGather128 4096 thrpt 30 ops/ms 705.927 775.039 1.09
GatherOperationsBenchmark.microByteGather128_MASK 64 thrpt 30 ops/ms 46783.479 46357.684 0.99
GatherOperationsBenchmark.microByteGather128_MASK 256 thrpt 30 ops/ms 12810.405 12880.347 1.00
GatherOperationsBenchmark.microByteGather128_MASK 1024 thrpt 30 ops/ms 3150.320 3239.281 1.02
GatherOperationsBenchmark.microByteGather128_MASK 4096 thrpt 30 ops/ms 794.151 830.464 1.04
GatherOperationsBenchmark.microByteGather128_MASK_NZ_OFF 64 thrpt 30 ops/ms 43189.395 47127.449 1.09
GatherOperationsBenchmark.microByteGather128_MASK_NZ_OFF 256 thrpt 30 ops/ms 11543.128 13196.158 1.14
GatherOperationsBenchmark.microByteGather128_MASK_NZ_OFF 1024 thrpt 30 ops/ms 2835.053 3300.357 1.16
GatherOperationsBenchmark.microByteGather128_MASK_NZ_OFF 4096 thrpt 30 ops/ms 719.470 843.290 1.17
GatherOperationsBenchmark.microByteGather128_NZ_OFF 64 thrpt 30 ops/ms 44143.887 46836.788 1.06
GatherOperationsBenchmark.microByteGather128_NZ_OFF 256 thrpt 30 ops/ms 12206.908 12255.677 1.00
GatherOperationsBenchmark.microByteGather128_NZ_OFF 1024 thrpt 30 ops/ms 3094.232 3095.931 1.00
GatherOperationsBenchmark.microByteGather128_NZ_OFF 4096 thrpt 30 ops/ms 776.293 774.336 0.99
GatherOperationsBenchmark.microByteGather256 64 thrpt 30 ops/ms 46247.977 46803.899 1.01
GatherOperationsBenchmark.microByteGather256 256 thrpt 30 ops/ms 12198.878 12250.315 1.00
GatherOperationsBenchmark.microByteGather256 1024 thrpt 30 ops/ms 3093.356 3100.107 1.00
GatherOperationsBenchmark.microByteGather256 4096 thrpt 30 ops/ms 774.611 774.890 1.00
GatherOperationsBenchmark.microByteGather256_MASK 64 thrpt 30 ops/ms 46873.725 47967.422 1.02
GatherOperationsBenchmark.microByteGather256_MASK 256 thrpt 30 ops/ms 13025.578 13481.477 1.03
GatherOperationsBenchmark.microByteGather256_MASK 1024 thrpt 30 ops/ms 3317.651 3396.208 1.02
GatherOperationsBenchmark.microByteGather256_MASK 4096 thrpt 30 ops/ms 846.0888 864.8407 1.02
GatherOperationsBenchmark.microByteGather256_MASK_NZ_OFF 64 thrpt 30 ops/ms 44488.365 48769.036 1.09
GatherOperationsBenchmark.microByteGather256_MASK_NZ_OFF 256 thrpt 30 ops/ms 11988.552 13326.306 1.11
GatherOperationsBenchmark.microByteGather256_MASK_NZ_OFF 1024 thrpt 30 ops/ms 2851.132 3377.599 1.18
GatherOperationsBenchmark.microByteGather256_MASK_NZ_OFF 4096 thrpt 30 ops/ms 734.368 872.331 1.18
GatherOperationsBenchmark.microByteGather256_NZ_OFF 64 thrpt 30 ops/ms 44716.846 46816.743 1.04
GatherOperationsBenchmark.microByteGather256_NZ_OFF 256 thrpt 30 ops/ms 11885.251 12255.916 1.03
GatherOperationsBenchmark.microByteGather256_NZ_OFF 1024 thrpt 30 ops/ms 3016.645 3096.172 1.02
GatherOperationsBenchmark.microByteGather256_NZ_OFF 4096 thrpt 30 ops/ms 756.903 776.363 1.02
GatherOperationsBenchmark.microByteGather512 64 thrpt 30 ops/ms 44742.221 46848.590 1.04
GatherOperationsBenchmark.microByteGather512 256 thrpt 30 ops/ms 12081.443 12236.973 1.01
GatherOperationsBenchmark.microByteGather512 1024 thrpt 30 ops/ms 3086.873 3088.040 1.00
GatherOperationsBenchmark.microByteGather512 4096 thrpt 30 ops/ms 774.243 770.209 0.99
GatherOperationsBenchmark.microByteGather512_MASK 64 thrpt 30 ops/ms 50588.210 48220.741 0.95
GatherOperationsBenchmark.microByteGather512_MASK 256 thrpt 30 ops/ms 13535.785 13675.499 1.01
GatherOperationsBenchmark.microByteGather512_MASK 1024 thrpt 30 ops/ms 3355.724 3421.323 1.01
GatherOperationsBenchmark.microByteGather512_MASK 4096 thrpt 30 ops/ms 859.103 872.009 1.01
GatherOperationsBenchmark.microByteGather512_MASK_NZ_OFF 64 thrpt 30 ops/ms 44139.269 48320.364 1.09
GatherOperationsBenchmark.microByteGather512_MASK_NZ_OFF 256 thrpt 30 ops/ms 12500.697 13801.124 1.10
GatherOperationsBenchmark.microByteGather512_MASK_NZ_OFF 1024 thrpt 30 ops/ms 3135.082 3492.312 1.11
GatherOperationsBenchmark.microByteGather512_MASK_NZ_OFF 4096 thrpt 30 ops/ms 794.338 897.249 1.12
GatherOperationsBenchmark.microByteGather512_NZ_OFF 64 thrpt 30 ops/ms 45754.147 46421.300 1.01
GatherOperationsBenchmark.microByteGather512_NZ_OFF 256 thrpt 30 ops/ms 12133.467 12253.848 1.00
GatherOperationsBenchmark.microByteGather512_NZ_OFF 1024 thrpt 30 ops/ms 3074.637 3091.207 1.00
GatherOperationsBenchmark.microByteGather512_NZ_OFF 4096 thrpt 30 ops/ms 755.250 774.367 1.02
GatherOperationsBenchmark.microByteGather64 64 thrpt 30 ops/ms 58625.196 59263.141 1.01
GatherOperationsBenchmark.microByteGather64 256 thrpt 30 ops/ms 15745.329 17377.889 1.10
GatherOperationsBenchmark.microByteGather64 1024 thrpt 30 ops/ms 4121.997 4471.261 1.08
GatherOperationsBenchmark.microByteGather64 4096 thrpt 30 ops/ms 1044.419 1125.721 1.07
GatherOperationsBenchmark.microByteGather64_MASK 64 thrpt 30 ops/ms 48754.131 49028.183 1.00
GatherOperationsBenchmark.microByteGather64_MASK 256 thrpt 30 ops/ms 13248.349 13537.811 1.02
GatherOperationsBenchmark.microByteGather64_MASK 1024 thrpt 30 ops/ms 3308.839 3356.109 1.01
GatherOperationsBenchmark.microByteGather64_MASK 4096 thrpt 30 ops/ms 843.688 859.161 1.01
GatherOperationsBenchmark.microByteGather64_MASK_NZ_OFF 64 thrpt 30 ops/ms 43523.662 48868.373 1.12
GatherOperationsBenchmark.microByteGather64_MASK_NZ_OFF 256 thrpt 30 ops/ms 12242.984 13519.719 1.10
GatherOperationsBenchmark.microByteGather64_MASK_NZ_OFF 1024 thrpt 30 ops/ms 3055.772 3394.342 1.11
GatherOperationsBenchmark.microByteGather64_MASK_NZ_OFF 4096 thrpt 30 ops/ms 754.532 870.302 1.15
GatherOperationsBenchmark.microByteGather64_NZ_OFF 64 thrpt 30 ops/ms 51858.935 58869.325 1.13
GatherOperationsBenchmark.microByteGather64_NZ_OFF 256 thrpt 30 ops/ms 14235.928 17381.117 1.22
GatherOperationsBenchmark.microByteGather64_NZ_OFF 1024 thrpt 30 ops/ms 3684.506 4483.270 1.21
GatherOperationsBenchmark.microByteGather64_NZ_OFF 4096 thrpt 30 ops/ms 922.368 1127.66 1.22
GatherOperationsBenchmark.microShortGather128 64 thrpt 30 ops/ms 44399.870 45016.972 1.01
GatherOperationsBenchmark.microShortGather128 256 thrpt 30 ops/ms 11679.775 12629.207 1.08
GatherOperationsBenchmark.microShortGather128 1024 thrpt 30 ops/ms 1277.328 3206.762 2.51
GatherOperationsBenchmark.microShortGather128 4096 thrpt 30 ops/ms 761.846 817.159 1.07
GatherOperationsBenchmark.microShortGather128_MASK 64 thrpt 30 ops/ms 37165.399 36484.534 0.98
GatherOperationsBenchmark.microShortGather128_MASK 256 thrpt 30 ops/ms 9875.757 9958.754 1.00
GatherOperationsBenchmark.microShortGather128_MASK 1024 thrpt 30 ops/ms 2519.580 2554.210 1.01
GatherOperationsBenchmark.microShortGather128_MASK 4096 thrpt 30 ops/ms 615.867 652.092 1.05
GatherOperationsBenchmark.microShortGather128_MASK_NZ_OFF 64 thrpt 30 ops/ms 34049.203 33669.772 0.98
GatherOperationsBenchmark.microShortGather128_MASK_NZ_OFF 256 thrpt 30 ops/ms 9010.587 8779.455 0.97
GatherOperationsBenchmark.microShortGather128_MASK_NZ_OFF 1024 thrpt 30 ops/ms 2253.432 2415.560 1.07
GatherOperationsBenchmark.microShortGather128_MASK_NZ_OFF 4096 thrpt 30 ops/ms 559.163 577.659 1.03
GatherOperationsBenchmark.microShortGather128_NZ_OFF 64 thrpt 30 ops/ms 39892.023 43978.899 1.10
GatherOperationsBenchmark.microShortGather128_NZ_OFF 256 thrpt 30 ops/ms 10697.817 12424.189 1.16
GatherOperationsBenchmark.microShortGather128_NZ_OFF 1024 thrpt 30 ops/ms 2681.286 3145.941 1.17
GatherOperationsBenchmark.microShortGather128_NZ_OFF 4096 thrpt 30 ops/ms 682.330 803.364 1.17
GatherOperationsBenchmark.microShortGather256 64 thrpt 30 ops/ms 42335.033 43194.212 1.02
GatherOperationsBenchmark.microShortGather256 256 thrpt 30 ops/ms 10760.015 11149.020 1.03
GatherOperationsBenchmark.microShortGather256 1024 thrpt 30 ops/ms 2688.410 2806.389 1.04
GatherOperationsBenchmark.microShortGather256 4096 thrpt 30 ops/ms 675.401 703.849 1.04
GatherOperationsBenchmark.microShortGather256_MASK 64 thrpt 30 ops/ms 38760.990 41844.197 1.07
GatherOperationsBenchmark.microShortGather256_MASK 256 thrpt 30 ops/ms 11339.217 10951.141 0.96
GatherOperationsBenchmark.microShortGather256_MASK 1024 thrpt 30 ops/ms 2840.081 2718.823 0.95
GatherOperationsBenchmark.microShortGather256_MASK 4096 thrpt 30 ops/ms 725.334 696.343 0.96
GatherOperationsBenchmark.microShortGather256_MASK_NZ_OFF 64 thrpt 30 ops/ms 39059.271 42199.055 1.08
GatherOperationsBenchmark.microShortGather256_MASK_NZ_OFF 256 thrpt 30 ops/ms 10440.036 11467.941 1.09
GatherOperationsBenchmark.microShortGather256_MASK_NZ_OFF 1024 thrpt 30 ops/ms 2563.378 2790.541 1.08
GatherOperationsBenchmark.microShortGather256_MASK_NZ_OFF 4096 thrpt 30 ops/ms 642.642 751.287 1.16
GatherOperationsBenchmark.microShortGather256_NZ_OFF 64 thrpt 30 ops/ms 38963.881 42675.099 1.09
GatherOperationsBenchmark.microShortGather256_NZ_OFF 256 thrpt 30 ops/ms 10628.469 11168.949 1.05
GatherOperationsBenchmark.microShortGather256_NZ_OFF 1024 thrpt 30 ops/ms 2702.591 2806.074 1.03
GatherOperationsBenchmark.microShortGather256_NZ_OFF 4096 thrpt 30 ops/ms 683.690 704.498 1.03
GatherOperationsBenchmark.microShortGather512 64 thrpt 30 ops/ms 41117.094 41269.397 1.00
GatherOperationsBenchmark.microShortGather512 256 thrpt 30 ops/ms 10565.519 10652.618 1.00
GatherOperationsBenchmark.microShortGather512 1024 thrpt 30 ops/ms 2681.894 2705.963 1.00
GatherOperationsBenchmark.microShortGather512 4096 thrpt 30 ops/ms 673.821 679.631 1.00
GatherOperationsBenchmark.microShortGather512_MASK 64 thrpt 30 ops/ms 41318.510 42372.271 1.02
GatherOperationsBenchmark.microShortGather512_MASK 256 thrpt 30 ops/ms 11587.465 10674.598 0.92
GatherOperationsBenchmark.microShortGather512_MASK 1024 thrpt 30 ops/ms 2902.731 2629.739 0.90
GatherOperationsBenchmark.microShortGather512_MASK 4096 thrpt 30 ops/ms 741.546 671.124 0.90
GatherOperationsBenchmark.microShortGather512_MASK_NZ_OFF 64 thrpt 30 ops/ms 39524.127 40623.622 1.02
GatherOperationsBenchmark.microShortGather512_MASK_NZ_OFF 256 thrpt 30 ops/ms 10642.152 11392.025 1.07
GatherOperationsBenchmark.microShortGather512_MASK_NZ_OFF 1024 thrpt 30 ops/ms 2650.143 2819.185 1.06
GatherOperationsBenchmark.microShortGather512_MASK_NZ_OFF 4096 thrpt 30 ops/ms 672.674 739.882 1.09
GatherOperationsBenchmark.microShortGather512_NZ_OFF 64 thrpt 30 ops/ms 39861.745 41600.729 1.04
GatherOperationsBenchmark.microShortGather512_NZ_OFF 256 thrpt 30 ops/ms 10531.312 10586.255 1.00
GatherOperationsBenchmark.microShortGather512_NZ_OFF 1024 thrpt 30 ops/ms 2667.839 2678.026 1.00
GatherOperationsBenchmark.microShortGather512_NZ_OFF 4096 thrpt 30 ops/ms 667.607 677.434 1.01
GatherOperationsBenchmark.microShortGather64 64 thrpt 30 ops/ms 45716.109 50726.590 1.10
GatherOperationsBenchmark.microShortGather64 256 thrpt 30 ops/ms 12383.842 13608.216 1.09
GatherOperationsBenchmark.microShortGather64 1024 thrpt 30 ops/ms 3025.989 3443.097 1.13
GatherOperationsBenchmark.microShortGather64 4096 thrpt 30 ops/ms 771.995 897.890 1.16
GatherOperationsBenchmark.microShortGather64_MASK 64 thrpt 30 ops/ms 39758.975 39155.984 0.98
GatherOperationsBenchmark.microShortGather64_MASK 256 thrpt 30 ops/ms 10594.260 10622.428 1.00
GatherOperationsBenchmark.microShortGather64_MASK 1024 thrpt 30 ops/ms 2654.849 2771.674 1.04
GatherOperationsBenchmark.microShortGather64_MASK 4096 thrpt 30 ops/ms 677.508 684.557 1.01
GatherOperationsBenchmark.microShortGather64_MASK_NZ_OFF 64 thrpt 30 ops/ms 37729.191 40552.172 1.07
GatherOperationsBenchmark.microShortGather64_MASK_NZ_OFF 256 thrpt 30 ops/ms 10087.184 11121.611 1.10
GatherOperationsBenchmark.microShortGather64_MASK_NZ_OFF 1024 thrpt 30 ops/ms 2510.133 2788.778 1.11
GatherOperationsBenchmark.microShortGather64_MASK_NZ_OFF 4096 thrpt 30 ops/ms 642.370 658.808 1.02
GatherOperationsBenchmark.microShortGather64_NZ_OFF 64 thrpt 30 ops/ms 40632.099 50718.706 1.24
GatherOperationsBenchmark.microShortGather64_NZ_OFF 256 thrpt 30 ops/ms 10984.671 14155.624 1.28
GatherOperationsBenchmark.microShortGather64_NZ_OFF 1024 thrpt 30 ops/ms 2733.285 3668.118 1.34
GatherOperationsBenchmark.microShortGather64_NZ_OFF 4096 thrpt 30 ops/ms 679.524 932.748 1.37

------------- PR Comment: https://git.openjdk.org/jdk/pull/25138#issuecomment-3004026787 From xgong at openjdk.org Wed Jun 25 09:16:48 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 25 Jun 2025 09:16:48 GMT Subject: RFR: 8355563: VectorAPI: Refactor current implementation of subword gather load API In-Reply-To: References: Message-ID: <-_jqrYt3RDwwrdFt12v0cv8yefopGBAKLjUg8B6lBTM=.e60b57b8-c867-43a1-a793-093730810b3d@github.com> On Mon, 2 Jun 2025 10:48:25 GMT, Emanuel Peter wrote: >>> > @XiaohongGong Thanks for splitting this one out, and for investigating the regressions here. >>> > Putting the permalink here, fixed to the current change (the link you pasted will always refer to the newest, which may later on point to the wrong line when lines above are inserted / deleted): >>> > https://github.com/openjdk/jdk/blob/7077535c0b0a6ea0a2a167f9135b1504a3d71fb3/src/hotspot/share/opto/loopnode.cpp#L1659-L1661 >>> > >>> > I wonder if we should just use `Node::uncast` there? But I'm quite unsure about that. >>> >>> Sounds good to me. I will have a deep investigation for it. Thanks! >> >> Hi @eme64 @jatin-bhateja, I'v created a PR https://github.com/openjdk/jdk/pull/25539 to fix this issue. With this change, the performance regression can be fixed as well. Could you please take a look at that change and help to run the test on different X86 machines? Thanks a lot! > > @XiaohongGong I reviewed https://github.com/openjdk/jdk/pull/25539. Since it is a relatively simple patch, I suggest that we integrate that one first, and come back to this here later. Is that ok for you? Hi @eme64, I've updated the patch to fix the comment issue you pointed out above.
Could you please take another look? Thanks a lot!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25138#issuecomment-3004029058

From shade at openjdk.org  Wed Jun 25 09:40:29 2025
From: shade at openjdk.org (Aleksey Shipilev)
Date: Wed, 25 Jun 2025 09:40:29 GMT
Subject: [jdk25] RFR: 8359788: Internal Error: assert(get_instanceKlass()->is_loaded()) failed: must be at least loaded
In-Reply-To:
References:
Message-ID:

On Wed, 25 Jun 2025 06:54:35 GMT, Igor Veresov wrote:

> 8359788: Internal Error: assert(get_instanceKlass()->is_loaded()) failed: must be at least loaded

Indeed, and this is what we do in other places as well.

-------------

Marked as reviewed by shade (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/25969#pullrequestreview-2957444926

From dlunden at openjdk.org  Wed Jun 25 09:51:30 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Wed, 25 Jun 2025 09:51:30 GMT
Subject: RFR: 8325478: Restructure the macro expansion compiler phase to not include macro elimination [v7]
In-Reply-To:
References:
Message-ID: <-kw-iaB6kpoJ5H8ukHUZ7eoVWAt7tLOlPAjn0mVPTkk=.61d7bc34-efed-4e5a-a816-f137d6cee19b@github.com>

On Tue, 24 Jun 2025 16:32:11 GMT, Saranya Natarajan wrote:

>> This changeset restructures the macro expansion phase to not include macro elimination and also adds a flag StressMacroElimination which randomizes macro elimination ordering for stress testing purposes.
>>
>> Changes:
>> - Implemented a method `eliminate_opaque_looplimit_macro_nodes` that removes the functionality for eliminating Opaque and LoopLimit nodes from the `expand_macro_nodes` method.
>> - Introduced the compiler phase `PHASE_AFTER_MACRO_ELIMINATION`
>> - Added a new Ideal phase for individual macro elimination steps.
>> - Implemented the flag `StressMacroElimination`. Added functionality tests for `StressMacroElimination`, similar to previous stress flag `StressMacroExpansion` ([JDK-8317349](https://bugs.openjdk.org/browse/JDK-8317349)).
>>
>> Below is a sample screenshot (IGV print level 4) mainly showing the new phase.
>> ![image](https://github.com/user-attachments/assets/16013cd4-6ec6-4939-ac66-33bb03d59cd6)
>>
>> Questions to reviewers:
>> - Is the new macro elimination phase OK, or should we change anything?
>> - In `compile.cpp`, `PHASE_ITER_GVN_AFTER_ELIMINATION` follows `PHASE_AFTER_MACRO_ELIMINATION` in the current fix. Should `PHASE_ITER_GVN_AFTER_ELIMINATION` be removed?
>>
>> Testing:
>> GitHub Actions
>> tier1 to tier5 on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64.
>> Tested that thousands of graphs are correctly opened and visualized with IGV using the same test used in ([JDK-8317349](https://bugs.openjdk.org/browse/JDK-8317349))
>
> Saranya Natarajan has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision:
>
>   merge with master
>   Merge branch 'master' of https://git.openjdk.org/jdk into JDK-8325478

Marked as reviewed by dlunden (Committer).
-------------

PR Review: https://git.openjdk.org/jdk/pull/25682#pullrequestreview-2957477670

From syan at openjdk.org  Wed Jun 25 09:51:31 2025
From: syan at openjdk.org (SendaoYan)
Date: Wed, 25 Jun 2025 09:51:31 GMT
Subject: RFR: 8354727: CompilationPolicy creates too many compiler threads when code cache space is scarce [v5]
In-Reply-To:
References: <29uhzjC6hIy_YrycL72lqIott9dPDIYrWibtvFjvVsg=.ef3b2c29-e848-4acd-a264-d18cec07bd44@github.com>
Message-ID:

On Mon, 23 Jun 2025 06:58:08 GMT, Manuel Hässig wrote:

>> Running
>>
>> java -XX:+SegmentedCodeCache -XX:ReservedCodeCacheSize=10M -XX:NonNMethodCodeHeapSize=6M \
>>     -XX:ProfiledCodeHeapSize=5M -XX:NonProfiledCodeHeapSize=5M -version
>>
>> on a machine with more than 285 cores, this would fail with the message that the specified `NonNMethodCodeHeapSize` is too small to fit all compiler buffers (instead of failing because the sum of the heaps is larger than the `ReservedCodeCacheSize`). Hence, the calculated compiler count is too high. This is due to CompilationPolicy::initialize() checking how many compiler buffers fit into the `ReservedCodeCacheSize`. However, in the case above, this is significantly larger than `NonNMethodCodeHeapSize` (especially on a debug build) and causes a check changed in #17244 to fail. That check was changed to check that all compiler buffers fit into the `NonNMethodCodeHeap` instead of the `NonNMethodCodeHeap` having at least a size of `CodeCacheMinimumUseSpace`.
>>
>> # Changes
>>
>> This PR fixes the calculation of the `CICompilerCount` ergonomic. Firstly, @shipilev kindly provided a fix so that the compiler buffer size used in the calculation is also correct if we only have C2. Secondly, `NonNMethodCodeHeapSize` is used as the maximum buffer size available for compiler buffers in the calculation of the maximum number of compiler threads instead of `ReservedCodeCacheSize`.
>> Therefore, the check failing in the explanation above can never fail because we set the number of compiler threads only so high that they will always fit into the `NonNMethodCodeHeap`.
>>
>> This change changes how many compiler threads are created by the `CICompilerCount` ergonomic. For the default value `NonNMethodCodeHeapSize=5M` this limit is 24 compiler threads on a system with 285 cores or more for product builds and 20 threads for debug builds on a system with 145 cores or more.
>>
>> # Testing
>>
>> - [x] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15733154809)
>> - [x] tier1 through tier3 plus Oracle internal testing on our supported platforms
>
> Manuel Hässig has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits:
>
>  - Merge branch 'master' into JDK-8354727-policy
>  - Fix merge conflict resolution
>  - Merge branch 'master' into JDK-8354727-policy
>  - Calculate buffer size correctly for c2_only
>
>    Co-authored-by: Aleksey Shipilev
>  - Caclulate how many compiler buffers fit into NonNMethodCodeHeap
>  - Clarify endif
>  - update copyrights
>  - remove leftover include
>  - fix whitebox access to code cache size configs
>  - VMPageSizeConstraintFunc
>  - ... and 6 more: https://git.openjdk.org/jdk/compare/de34bb8e...689b4ea8

I think this PR is ready to integrate.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25872#issuecomment-3004142001

From dlunden at openjdk.org  Wed Jun 25 09:51:31 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Wed, 25 Jun 2025 09:51:31 GMT
Subject: RFR: 8325478: Restructure the macro expansion compiler phase to not include macro elimination [v2]
In-Reply-To:
References: <_jMbdLzfV1MOFrUJH7J6-zXWLabQTOTsfb2hWvEL3Kc=.fede59e2-dd81-4128-9b7a-b8c47334a062@github.com>
Message-ID:

On Wed, 25 Jun 2025 08:44:53 GMT, Saranya Natarajan wrote:

>> Could you elaborate a bit on why this is the case?
>> Just looking briefly at `OuterStripMinedLoopNode::adjust_strip_mined_loop` (called from `refine_strip_mined_loop_macro_nodes`), I'm not convinced there are no other interactions.
>
> Before https://github.com/openjdk/jdk/pull/24890, `OuterStripMinedLoopNode::adjust_strip_mined_loop` was called just before `C->remove_macro_node(n)` inside the condition `n->Opcode() == Op_OuterStripMinedLoop` [(line 2523)](https://github.com/openjdk/jdk/pull/25682/commits/f42ec1a27ad1767580ef4d480ded846a5ea9fc6a#diff-2faebd05d08f9115f8d9ef771644cf05087a6986c2f9013d7163c6aa720169c3R2523). This led to an assert failure, as `OuterStripMinedLoopNode::adjust_strip_mined_loop` added a new node to the macro list. The conclusion from https://github.com/openjdk/jdk/pull/24890 was that `OuterStripMinedLoopNode::adjust_strip_mined_loop` should be called before we start the macro elimination and remove macro nodes with Opcode `Op_OuterStripMinedLoop`. This is why I think its current placement is okay.

OK, good!

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25682#discussion_r2166304222

From shade at openjdk.org  Wed Jun 25 09:55:30 2025
From: shade at openjdk.org (Aleksey Shipilev)
Date: Wed, 25 Jun 2025 09:55:30 GMT
Subject: RFR: 8354727: CompilationPolicy creates too many compiler threads when code cache space is scarce [v5]
In-Reply-To:
References: <29uhzjC6hIy_YrycL72lqIott9dPDIYrWibtvFjvVsg=.ef3b2c29-e848-4acd-a264-d18cec07bd44@github.com>
Message-ID:

On Mon, 23 Jun 2025 06:58:08 GMT, Manuel Hässig wrote:

>> Running
>>
>> java -XX:+SegmentedCodeCache -XX:ReservedCodeCacheSize=10M -XX:NonNMethodCodeHeapSize=6M \
>>     -XX:ProfiledCodeHeapSize=5M -XX:NonProfiledCodeHeapSize=5M -version
>>
>> on a machine with more than 285 cores, this would fail with the message that the specified `NonNMethodCodeHeapSize` is too small to fit all compiler buffers (instead of failing because the sum of the heaps is larger than the `ReservedCodeCacheSize`).
>> Hence, the calculated compiler count is too high. This is due to CompilationPolicy::initialize() checking how many compiler buffers fit into the `ReservedCodeCacheSize`. However, in the case above, this is significantly larger than `NonNMethodCodeHeapSize` (especially on a debug build) and causes a check changed in #17244 to fail. That check was changed to check that all compiler buffers fit into the `NonNMethodCodeHeap` instead of the `NonNMethodCodeHeap` having at least a size of `CodeCacheMinimumUseSpace`.
>>
>> # Changes
>>
>> This PR fixes the calculation of the `CICompilerCount` ergonomic. Firstly, @shipilev kindly provided a fix so that the compiler buffer size used in the calculation is also correct if we only have C2. Secondly, `NonNMethodCodeHeapSize` is used as the maximum buffer size available for compiler buffers in the calculation of the maximum number of compiler threads instead of `ReservedCodeCacheSize`. Therefore, the check failing in the explanation above can never fail because we set the number of compiler threads only so high that they will always fit into the `NonNMethodCodeHeap`.
>>
>> This change changes how many compiler threads are created by the `CICompilerCount` ergonomic. For the default value `NonNMethodCodeHeapSize=5M` this limit is 24 compiler threads on a system with 285 cores or more for product builds and 20 threads for debug builds on a system with 145 cores or more.
>>
>> # Testing
>>
>> - [x] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15733154809)
>> - [x] tier1 through tier3 plus Oracle internal testing on our supported platforms
>
> Manuel Hässig has updated the pull request with a new target base due to a merge or a rebase.
> The pull request now contains 16 commits:
>
>  - Merge branch 'master' into JDK-8354727-policy
>  - Fix merge conflict resolution
>  - Merge branch 'master' into JDK-8354727-policy
>  - Calculate buffer size correctly for c2_only
>
>    Co-authored-by: Aleksey Shipilev
>  - Caclulate how many compiler buffers fit into NonNMethodCodeHeap
>  - Clarify endif
>  - update copyrights
>  - remove leftover include
>  - fix whitebox access to code cache size configs
>  - VMPageSizeConstraintFunc
>  - ... and 6 more: https://git.openjdk.org/jdk/compare/de34bb8e...689b4ea8

Marked as reviewed by shade (Reviewer).

If you merge from current master, Windows builds are going to be fixed as well.

-------------

PR Review: https://git.openjdk.org/jdk/pull/25872#pullrequestreview-2957489922
PR Comment: https://git.openjdk.org/jdk/pull/25872#issuecomment-3004154002

From aph at openjdk.org  Wed Jun 25 09:58:31 2025
From: aph at openjdk.org (Andrew Haley)
Date: Wed, 25 Jun 2025 09:58:31 GMT
Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v4]
In-Reply-To:
References:
Message-ID:

On Wed, 25 Jun 2025 06:07:31 GMT, Xiaohong Gong wrote:

>> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Addressed review comments
>
> src/hotspot/cpu/aarch64/aarch64_vector.ad line 255:
>
>> 253: // the default VectorRearrange + VectorBlend is generated as the performance of the default
>> 254: // implementation was slightly better/similar than the implementaion for SelectFromTwoVector.
>> 255: // As the SVE2 "tbl" instruction in unpredicated and partial operations cannot be generated
>
> `in` -> `is`

"as" -> "because".
Consider the difference between "as I was walking home, the Perseid meteor shower produced a spectacular display" and "because I was walking home, the Perseid meteor shower produced a spectacular display".
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2166307149

From aph at openjdk.org  Wed Jun 25 09:58:33 2025
From: aph at openjdk.org (Andrew Haley)
Date: Wed, 25 Jun 2025 09:58:33 GMT
Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v5]
In-Reply-To:
References:
Message-ID:

On Tue, 24 Jun 2025 13:36:14 GMT, Bhavana Kilambi wrote:

>> This patch adds aarch64 backend support for SelectFromTwoVector operation which was recently introduced in VectorAPI.
>>
>> It implements this operation using a two table vector lookup instruction - "tbl" which is available only in Neon and SVE2.
>>
>> For 128-bit vector length: Neon tbl instruction is generated if UseSVE < 2 and SVE2 "tbl" instruction is generated if UseSVE == 2.
>>
>> For > 128-bit vector length: Currently there are no machines which have vector length > 128-bit and support SVE2. For all those machines with vector length > 128-bit and UseSVE < 2, this operation is not supported. The inline expander for this operation would fail and lowered IR will be generated which is a mix of two rearrange and one blend operation.
>>
>> This patch also adds a boolean "need_load_shuffle" in the inline expander for this operation to test if the platform requires VectorLoadShuffle operation to be generated. Without this, the lowering IR was not being generated on aarch64 and the performance was quite poor.
>>
>> Performance numbers with this patch on a 128-bit, SVE2 supporting machine are shown below:
>>
>> Benchmark                                   (size)   Mode  Cnt   Gain
>> SelectFromBenchmark.selectFromByteVector      1024  thrpt    9   1.43
>> SelectFromBenchmark.selectFromByteVector      2048  thrpt    9   1.48
>> SelectFromBenchmark.selectFromDoubleVector    1024  thrpt    9  68.55
>> SelectFromBenchmark.selectFromDoubleVector    2048  thrpt    9  72.07
>> SelectFromBenchmark.selectFromFloatVector     1024  thrpt    9   1.69
>> SelectFromBenchmark.selectFromFloatVector     2048  thrpt    9   1.52
>> SelectFromBenchmark.selectFromIntVector       1024  thrpt    9   1.50
>> SelectFromBenchmark.selectFromIntVector       2048  thrpt    9   1.52
>> SelectFromBenchmark.selectFromLongVector      1024  thrpt    9  85.38
>> SelectFromBenchmark.selectFromLongVector      2048  thrpt    9  80.93
>> SelectFromBenchmark.selectFromShortVector     1024  thrpt    9   1.48
>> SelectFromBenchmark.selectFromShortVector     2048  thrpt    9   1.49
>>
>> Gain column refers to the ratio of thrpt between this patch and the master branch after applying changes in the inline expander.
>
> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision:
>
>   Revert a small change in c2_MacroAssembler.hpp

src/hotspot/cpu/aarch64/aarch64_vector_ad.m4 line 5288:

> 5286: %}
> 5287:   ins_pipe(pipe_slow);
> 5288: %}

You should use the macro processor here, not cut and paste.
src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 4231:

> 4229:
> 4230:   // SVE/SVE2 Programmable table lookup in one or two vector table (zeroing)
> 4231:   void sve_tbl(FloatRegister Zd, SIMD_RegVariant T, FloatRegister Zn, unsigned registers, FloatRegister Zm) {

Suggestion:

  void sve_tbl(FloatRegister Zd, SIMD_RegVariant T, FloatRegister Zn, unsigned reg_count, FloatRegister Zm) {

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2166310692
PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2166319212

From bkilambi at openjdk.org  Wed Jun 25 09:58:33 2025
From: bkilambi at openjdk.org (Bhavana Kilambi)
Date: Wed, 25 Jun 2025 09:58:33 GMT
Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v5]
In-Reply-To:
References:
Message-ID:

On Wed, 25 Jun 2025 09:51:19 GMT, Andrew Haley wrote:

>> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Revert a small change in c2_MacroAssembler.hpp
>
> src/hotspot/cpu/aarch64/aarch64_vector_ad.m4 line 5288:
>
>> 5286: %}
>> 5287:   ins_pipe(pipe_slow);
>> 5288: %}
>
> You should use the macro processor here, not cut and paste.

Oh right! Apologies, I should have thought of it. I'll update this. Thanks!
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2166315759

From bkilambi at openjdk.org  Wed Jun 25 10:05:40 2025
From: bkilambi at openjdk.org (Bhavana Kilambi)
Date: Wed, 25 Jun 2025 10:05:40 GMT
Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v4]
In-Reply-To:
References:
Message-ID: <8a0CrcYjL8j1IWWhnPjKzgZWrzhOsjOLL1NcOCSEoWE=.2b574e39-7783-45e6-b7c5-0c95aa3ec75c@github.com>

On Wed, 25 Jun 2025 09:49:34 GMT, Andrew Haley wrote:

>> src/hotspot/cpu/aarch64/aarch64_vector.ad line 255:
>>
>>> 253: // the default VectorRearrange + VectorBlend is generated as the performance of the default
>>> 254: // implementation was slightly better/similar than the implementaion for SelectFromTwoVector.
>>> 255: // As the SVE2 "tbl" instruction in unpredicated and partial operations cannot be generated
>>
>> `in` -> `is`
>
> "as" -> "because".
> Consider the difference between "as I was walking home, the Perseid meteor shower produced a spectacular display" and "because I was walking home, the Perseid meteor shower produced a spectacular display".

Got it. thanks!

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2166334091

From duke at openjdk.org  Wed Jun 25 10:08:23 2025
From: duke at openjdk.org (erifan)
Date: Wed, 25 Jun 2025 10:08:23 GMT
Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v9]
In-Reply-To:
References:
Message-ID:

> This patch optimizes the following patterns:
> For integer types:
>
> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1))
>     => (VectorMaskCmp src1 src2 ncond)
> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1))
>     => (VectorMaskCmp src1 src2 ncond)
>
> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the negative comparison of cond.
>
> For float and double types:
>
> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1))
>     => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1))
>     => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
>
> cond can be eq or ne.
>
> Benchmarks on Nvidia Grace machine with 128-bit SVE2:
> With option `-XX:UseSVE=2`:
>
> Benchmark                   Unit         Before   Score Error         After   Score Error  Uplift
> testCompareEQMaskNotByte    ops/s   7912127.225   2677.289518   10266136.26   8955.008548  1.29
> testCompareEQMaskNotDouble  ops/s   884737.6799    446.963779   1179760.772    448.031844  1.33
> testCompareEQMaskNotFloat   ops/s   1765045.787    682.332214   2359520.803    896.305743  1.33
> testCompareEQMaskNotInt     ops/s   1787221.411    977.743935   2353952.519    960.069976  1.31
> testCompareEQMaskNotLong    ops/s   895297.1974     673.44808    1178449.02    323.804205  1.31
> testCompareEQMaskNotShort   ops/s   3339987.002     3415.2226   4712761.965   2110.862053  1.41
> testCompareGEMaskNotByte    ops/s    7907615.16   4094.243652    10251646.9   9486.699831  1.29
> testCompareGEMaskNotInt     ops/s   1683738.958   4233.813092   2352855.205   1251.952546  1.39
> testCompareGEMaskNotLong    ops/s   854496.1561   8594.598885   1177811.493      521.1229  1.37
> testCompareGEMaskNotShort   ops/s   3341860.309   1578.975338   4714008.434    1681.10365  1.41
> testCompareGTMaskNotByte    ops/s   7910823.674   2993.367032   10245063.58    9774.75138  1.29
> testCompareGTMaskNotInt     ops/s   1673393.928   3153.099431   2353654.521   1190.848583  1.4
> testCompareGTMaskNotLong    ops/s   849405.9159   2432.858159   1177952.041     359.96413  1.38
> testCompareGTMaskNotShort   ops/s   3339509.141   3339.976585   4711442.496   2673.364893  1.41
> testCompareLEMaskNotByte    ops/s   7911340.004    3114.69191    10231626.5   27134.20035  1.29
> testCompareLEMaskNotInt     ops/s   1675812.113   1340.969885   2353255.341     1452.4522  1.4
> testCompareLEMaskNotLong    ops/s   848862.8036   6564.841731   1177763.623    539.290106  1.38
> testCompareLEMaskNotShort   ops/s    3324951.54    2380.29473   4712116.251   1544.559684  1.41
> testCompareLTMaskNotByte    ops/s   7910390.844   2630.861436   10239567.69   6487.441672  1.29
> testCompareLTMaskNotInt     ops/s    1672180.09    995.238142   2353757.863    853.774734  1.4
> testCompareLTMaskNotLong    ops/s   856502.26...

erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 15 additional commits since the last revision:

 - Address more comments ATT.
 - Merge branch 'master' into JDK-8354242
 - Support negating unsigned comparison for BoolTest::mask

   Added a static method `negate_mask(mask btm)` into BoolTest class to
   negate both signed and unsigned comparison.
 - Addressed some review comments
 - Merge branch 'master' into JDK-8354242
 - Refactor the JTReg tests for compare.xor(maskAll)

   Also made a bit change to support pattern `VectorMask.fromLong()`.
 - Merge branch 'master' into JDK-8354242
 - Refactor code

   Add a new function XorVNode::Ideal_XorV_VectorMaskCmp to do this
   optimization, making the code more modular.
 - Merge branch 'master' into JDK-8354242
 - Update the jtreg test
 - ... and 5 more: https://git.openjdk.org/jdk/compare/c8050dff...5ebdc572

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/24674/files
  - new: https://git.openjdk.org/jdk/pull/24674/files/f51bf722..5ebdc572

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=24674&range=08
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24674&range=07-08

  Stats: 36932 lines in 979 files changed: 21941 ins; 10041 del; 4950 mod
  Patch: https://git.openjdk.org/jdk/pull/24674.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24674/head:pull/24674

PR: https://git.openjdk.org/jdk/pull/24674

From duke at openjdk.org  Wed Jun 25 10:16:32 2025
From: duke at openjdk.org (erifan)
Date: Wed, 25 Jun 2025 10:16:32 GMT
Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v7]
In-Reply-To: <15TW6hiffz65NhHevPefL_6swSC07UD-GwiJ4tPDtFs=.b83081df-8abd-4756-b4e0-1d969678a0d2@github.com>
References: <15TW6hiffz65NhHevPefL_6swSC07UD-GwiJ4tPDtFs=.b83081df-8abd-4756-b4e0-1d969678a0d2@github.com>
Message-ID:

On Thu, 5 Jun 2025 11:05:48 GMT, Emanuel Peter wrote:

>>> > FYI: `BoolTest::negate` already does what you want: `mask negate( ) const { return mask(_test^4); }` I think you should use that instead :)
>>>
>>> Indeed, I hadn't noticed that, thank you.
>>
>> Oh I think we still cannot use `BoolTest::negate`, because we cannot instantiate a `BoolTest` object with **unsigned** comparison. `BoolTest::negate` is a non-static function.
>
>> Oh I think we still cannot use `BoolTest::negate`, because we cannot instantiate a `BoolTest` object with **unsigned** comparison. `BoolTest::negate` is a non-static function.
>
> I see. Ok. Hmm. I still think that the logic should be in `BoolTest`, because that is where the exact implementation of the enum values is. In that context it is easier to see why `^4` does the negation. And imagine we were ever to change the enum values, then it would be harder to find your code and fix it.
>
> Maybe it could be called `BoolTest::negate_mask(mask btm)` and explain in a comment that both signed and unsigned is supported.

@eme64 @XiaohongGong Your comment has been addressed, thanks for your review!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-3004213740

From duke at openjdk.org  Wed Jun 25 10:16:36 2025
From: duke at openjdk.org (erifan)
Date: Wed, 25 Jun 2025 10:16:36 GMT
Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v8]
In-Reply-To:
References:
Message-ID:

On Wed, 11 Jun 2025 04:56:36 GMT, Emanuel Peter wrote:

>> erifan has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Support negating unsigned comparison for BoolTest::mask
>>
>>   Added a static method `negate_mask(mask btm)` into BoolTest class to
>>   negate both signed and unsigned comparison.
>
> src/hotspot/share/opto/subnode.hpp line 333:
>
>> 331:   mask negate( ) const { return mask(_test^4); }
>> 332:   // Return the negative mask for the given mask, for both signed and unsigned comparison.
>> 333:   static mask negate_mask(mask btm) { return mask(btm^4); }
>
> Suggestion:
>
>   static mask negate_mask(mask btm) { return mask(btm ^ 4); }
>
>
> https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md
>
>> Use spaces around operators, especially comparisons and assignments. (Relaxable for boolean expressions and high-precedence operators in classic math-style formulas.)

Done, thanks!

> src/hotspot/share/opto/vectornode.cpp line 2226:
>
>> 2224:
>> 2225:   const TypeVect* vector_mask_cast_vt = nullptr;
>> 2226:   // in1 should be single used, otherwise the optimization may be unprofitable.
>
> Suggestion:
>
>   // in1 should only have a single use, otherwise the optimization may be unprofitable.

Done

> src/hotspot/share/opto/vectornode.cpp line 2237:
>
>> 2235:       !VectorNode::is_all_ones_vector(in2)) {
>> 2236:     return nullptr;
>> 2237:   }
>
> Similarly here: do you have tests for these conditions, that we do not optimize if any of these fail?

Added some negative tests for these conditions.

> src/hotspot/share/opto/vectornode.cpp line 2239:
>
>> 2237:   }
>> 2238:
>> 2239:   BoolTest::mask neg_cond = BoolTest::negate_mask(((VectorMaskCmpNode*) in1)->get_predicate());
>
> Suggestion:
>
>   BoolTest::mask neg_cond = BoolTest::negate_mask((in1->as_VectorMaskCmp())->get_predicate());
>
> Does that compile? It would be preferable.

Yes, done, thanks!

> src/hotspot/share/opto/vectornode.cpp line 2243:
>
>> 2241:   const TypeVect* vt = in1->as_Vector()->vect_type();
>> 2242:   Node* res = new VectorMaskCmpNode(neg_cond, in1->in(1), in1->in(2),
>> 2243:                                   predicate_node, vt);
>
> Suggestion:
>
>   Node* res = new VectorMaskCmpNode(neg_cond, in1->in(1), in1->in(2),
>                                     predicate_node, vt);
>
> Alignment

Done

> test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java line 244:
>
>> 242:         testCompareMaskNotByte(VectorOperators.EQ, m -> m.not());
>> 243:         testCompareMaskNotByte(VectorOperators.EQ, m -> m.xor(B_SPECIES.maskAll(true)));
>> 244:     }
>
> Could it happen that the verification is inlined in the test body?
>
> Currently, the verification is probably inlined, but the code there is not vectorized. But what if one day the auto-vectorizer is smart enough and vectorizes it, and creates vectors that we currently check `count ...= 0`?
>
> At least, you could ensure that the verification does not get inlined, with `@DontInline`.
>
> What do you think?

Makes sense, done.

> test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java line 692:
>
>> 690:         TestFramework testFramework = new TestFramework();
>> 691:         testFramework.addFlags("--add-modules=jdk.incubator.vector");
>> 692:         testFramework.setDefaultWarmup(10000);
>
> The default is `2000`, is that not enough?
>
> Increasing it means the test runs slower, here probably about 5x.

Yes, not enough, changed to 5000.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2166343933
PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2166344643
PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2166346170
PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2166346620
PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2166346885
PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2166347867
PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2166352227

From duke at openjdk.org  Wed Jun 25 10:16:38 2025
From: duke at openjdk.org (erifan)
Date: Wed, 25 Jun 2025 10:16:38 GMT
Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v8]
In-Reply-To:
References:
Message-ID:

On Wed, 11 Jun 2025 09:09:57 GMT, erifan wrote:

>> src/hotspot/share/opto/vectornode.cpp line 2221:
>>
>>> 2219:   // XorV/XorVMask is commutative, swap VectorMaskCmp/VectorMaskCast to in1.
>>> 2220:   if (in2->Opcode() == Op_VectorMaskCmp ||
>>> 2221:       (in2->Opcode() == Op_VectorMaskCast && in2->in(1)->Opcode() == Op_VectorMaskCmp)) {
>>
>> We may need to consider cases where a `VectorMaskCast` is generated between `compare + not`, such as `compare + cast + not`. For such cases, the element size may be different for input and output of a `cast`. Although this patch's intention is not for the latter pattern, the current change also covers it well. Could you please add more test/jmh for all kinds of `cast` pattern here? And I think the scope of this PR could be also extended to `compare + cast + not`. WDYT?
>
> Good catch, I'll add more tests and check the correctness. Thanks~

Added more tests for this, thanks!
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2166350143

From duke at openjdk.org  Wed Jun 25 10:16:39 2025
From: duke at openjdk.org (erifan)
Date: Wed, 25 Jun 2025 10:16:39 GMT
Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v8]
In-Reply-To:
References:
Message-ID:

On Wed, 11 Jun 2025 05:09:34 GMT, Emanuel Peter wrote:

>> src/hotspot/share/opto/vectornode.cpp line 2227:
>>
>>> 2225:   const TypeVect* vector_mask_cast_vt = nullptr;
>>> 2226:   // in1 should be single used, otherwise the optimization may be unprofitable.
>>> 2227:   if (in1->Opcode() == Op_VectorMaskCast && in1->outcnt() == 1 && in1->in(1)->Opcode() == Op_VectorMaskCmp) {
>>
>> `in1->in(1)->Opcode() == Op_VectorMaskCmp`
>> Is this check here even necessary? Because we check it below again, right?
>> `in1->Opcode() != Op_VectorMaskCmp`
>
> Btw: do you have a test where `in1->outcnt() > 1`, and you check that the optimization does not happen with an IR test?

Refactored the code, thanks!

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2166345311

From duke at openjdk.org  Wed Jun 25 10:16:40 2025
From: duke at openjdk.org (erifan)
Date: Wed, 25 Jun 2025 10:16:40 GMT
Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v8]
In-Reply-To:
References:
Message-ID:

On Wed, 11 Jun 2025 08:51:50 GMT, Xiaohong Gong wrote:

>> erifan has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Support negating unsigned comparison for BoolTest::mask
>>
>>   Added a static method `negate_mask(mask btm)` into BoolTest class to
>>   negate both signed and unsigned comparison.
>
> src/hotspot/share/opto/vectornode.cpp line 2237:
>
>> 2235:       !VectorNode::is_all_ones_vector(in2)) {
>> 2236:     return nullptr;
>> 2237:   }
>
> This part can be refined more clearly:
>
>   // Swap and put all_ones_vector to right
>   if (!VectorNode::is_all_ones_vector(in1)) {
>     swap(in1, in2);
>   }
>
>   // uncast mask
>   bool need_cast = false;
>   if (in1->Opcode() == Op_VectorMaskCast &&
>       in1->outcnt() == 1) {
>     assert(in1->bottom_type()->eq(bottom_type()), "");
>     need_cast = true;
>     in1 = in1->in(1);
>   }
>
>   // Check mask cmp pattern
>   if (in1->Opcode() != Op_VectorMaskCmp ||
>       in1->outcnt() > 1 ||
>       !in1->as_VectorMaskCmp()->predicate_can_be_negated()) {
>     return nullptr;
>   }
>
>   // Convert VectorMaskCmp + not
>
>
>   // Cast back
>   if (need_cast) {
>     res = new VectorMaskCastNode(phase->transform(res), vect_type());
>   }

Done

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2166351241

From duke at openjdk.org  Wed Jun 25 10:16:40 2025
From: duke at openjdk.org (erifan)
Date: Wed, 25 Jun 2025 10:16:40 GMT
Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v8]
In-Reply-To: <-Njau47iLgFUOIQEnZDrkzZKIjPEk3ErFIenrs6AelM=.2624ea9a-6b6c-4a07-85e4-2fa7334754dd@github.com>
References: <-Njau47iLgFUOIQEnZDrkzZKIjPEk3ErFIenrs6AelM=.2624ea9a-6b6c-4a07-85e4-2fa7334754dd@github.com>
Message-ID:

On Wed, 11 Jun 2025 09:07:12 GMT, erifan wrote:

>> Oh. Ok. Well at least add a `RuntimeException` to an `else` branch then, I would suggest :)
>
> Makes sense!

Done

>>> > You are checking IRNode.XOR_VL, "= 0". But you are comparing floats. Does that make sense?
>>
>>> The bottom types of float and double vector masks are casted to int and long. Seems this is by design? So this is correct.
>>
>> This is a `float` test. What is the bottom type for the mask here?
>
> Oh, this is a stupid copy-paste mistake. Good catch, thanks! I'll double check them all.
Done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2166347264 PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2166349306 From mhaessig at openjdk.org Wed Jun 25 10:33:12 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Wed, 25 Jun 2025 10:33:12 GMT Subject: RFR: 8354727: CompilationPolicy creates too many compiler threads when code cache space is scarce [v6] In-Reply-To: <29uhzjC6hIy_YrycL72lqIott9dPDIYrWibtvFjvVsg=.ef3b2c29-e848-4acd-a264-d18cec07bd44@github.com> References: <29uhzjC6hIy_YrycL72lqIott9dPDIYrWibtvFjvVsg=.ef3b2c29-e848-4acd-a264-d18cec07bd44@github.com> Message-ID: > Running > > > java -XX:+SegmentedCodeCache -XX:ReservedCodeCacheSize=10M -XX:NonNMethodCodeHeapSize=6M \ > -XX:ProfiledCodeHeapSize=5M -XX:NonProfiledCodeHeapSize=5M -version > > > on a machine with more than 285 cores, this would fail with the message that the specified `NonNMethodCodeHeapSize` is too small to fit all compiler buffers (instead of failing because the sum of the heaps is larger than the `ReservedCodeCacheSize`). Hence, the calculated compiler count is too high. This is due to CompilationPolicy::initialize() checking how many compiler buffers fit into the `ReservedCodeCacheSize`. However, in the case above, this is significantly larger than `NonNMethodCodeHeapSize` (especially on a debug build) and causes a check changed in #17244 to fail. That check was changed to check that all compiler buffers fit into the `NonNMethodCodeHeap` instead of the `NonNMethodCodeHeap` having at least a size of `CodeCacheMinimumUseSpace`. > > # Changes > > This PR fixes the calculation of the `CICompilerCount` ergonomic. Firstly, @shipilev kindly provided a fix for the compiler buffer size used in the calculation is also correct if we only have C2. 
Secondly, `NonNMethodHeapSize` is used as the maximum buffer size available for compilers buffers in the calculation of the maximum number of compiler threads instead of `ReservedCodeCacheSize`. Therefore, the check failing in the explanation above can never fail because we set the number of compiler threads only so high that they will always fit into the `NonNMethodCodeHeap`. > > This change changes how many compiler threads are created by the `CICompilerCount` ergonomic. For the default value `NonNMethodCodeHeapSize=5M`this limit is 24 compiler threads on a system 285 cores or more for product builds and 20 threads for debug builds on a system with 145 cores or more. > > # Testing > > - [x] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15733154809) > - [x] tier1 through tier3 plus Oracle internal testing on our supported platforms Manuel H?ssig has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits: - Merge branch 'master' into JDK-8354727-policy - Merge branch 'master' into JDK-8354727-policy - Fix merge conflict resolution - Merge branch 'master' into JDK-8354727-policy - Calculate buffer size correctly for c2_only Co-authored-by: Aleksey Shipilev - Caclulate how many compiler buffers fit into NonNMethodCodeHeap - Clarify endif - update copyrights - remove leftover include - fix whitebox access to code cache size configs - ... 
and 7 more: https://git.openjdk.org/jdk/compare/5c4f92ba...569229cb ------------- Changes: https://git.openjdk.org/jdk/pull/25872/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25872&range=05 Stats: 10 lines in 1 file changed: 7 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/25872.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25872/head:pull/25872 PR: https://git.openjdk.org/jdk/pull/25872 From jbhateja at openjdk.org Wed Jun 25 10:35:08 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 25 Jun 2025 10:35:08 GMT Subject: RFR: 8360116: Add support for AVX10 floating point minmax instruction Message-ID: Intel@ AVX10 ISA [1] extensions added new floating point MIN/MAX instructions which comply with definitions in IEEE-754-2019 standard section 9.6 and can directly emulate Math.min/max semantics without the need for any special handling for NaN, +0.0 or -0.0 detection. **The following pseudo-code describes the existing algorithm for min/max[FD]:** Move the non-negative value to the second operand; this will ensure that we correctly handle 0.0 and -0.0 values, if values being compared are both 0.0s (of either sign), the value in the second operand (source operand) is returned. Existing MINPS and MAXPS semantics only check for NaN as the second operand; hence, we need special handling to check for NaN at the first operand. btmp = (b < +0.0) ? a : b atmp = (b < +0.0) ? b : a Tmp = Max_Float(atmp , btmp) Res = (atmp == NaN) ? atmp : Tmp For min[FD] we need a small tweak in the above algorithm, i.e., move the non-negative value to the first operand, this will ensure that we correctly select -0.0 if both the operands being compared are 0.0 or -0.0. btmp = (b < +0.0) ? b : a atmp = (b < +0.0) ? a : b Tmp = Min_Float(atmp , btmp) Res = (atmp == NaN) ?
atmp : Tmp Thus, we need additional special handling for NaNs and +/-0.0 to compute floating-point min/max values to comply with the semantics of Math.max/min APIs using existing MINPS / MAXPS instructions. AVX10.2 added a new instruction, VPMINMAX[SH,SS,SD]/[PH,PS,PD], which comprehensively handles special cases, thereby eliminating the need for special handling. Patch emits new instructions for reduction and non-reduction operations for single, double, and Float16 type. Kindly review and share your feedback. Best Regards, Jatin [1] https://www.intel.com/content/www/us/en/content-details/856721/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html?wapkw=AVX10 ------------- Commit messages: - Extending the patch to cover reduction operations - 8360116: Add support for AVX10 floating point minmax instruction Changes: https://git.openjdk.org/jdk/pull/25914/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25914&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8360116 Stats: 420 lines in 7 files changed: 379 ins; 4 del; 37 mod Patch: https://git.openjdk.org/jdk/pull/25914.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25914/head:pull/25914 PR: https://git.openjdk.org/jdk/pull/25914 From jbhateja at openjdk.org Wed Jun 25 10:41:04 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 25 Jun 2025 10:41:04 GMT Subject: RFR: 8360116: Add support for AVX10 floating point minmax instruction [v2] In-Reply-To: References: Message-ID: > Intel@ AVX10 ISA [1] extensions added new floating point MIN/MAX instructions which comply with definitions in IEEE-754-2019 standard section 9.6 and can directly emulate Math.min/max semantics without the need for any special handling for NaN, +0.0 or -0.0 detection. 
> > **The following pseudo-code describes the existing algorithm for min/max[FD]:** > > Move the non-negative value to the second operand; this will ensure that we correctly handle 0.0 and -0.0 values, if values being compared are both 0.0s (of either sign), the value in the second operand (source operand) is returned. Existing MINPS and MAXPS semantics only check for NaN as the second operand; hence, we need special handling to check for NaN at the first operand. > > btmp = (b < +0.0) ? a : b > atmp = (b < +0.0) ? b : a > Tmp = Max_Float(atmp , btmp) > Res = (atmp == NaN) ? atmp : Tmp > > For min[FD] we need a small tweak in the above algorithm, i.e., move the non-negative value to the first operand, this will ensure that we correctly select -0.0 if both the operands being compared are 0.0 or -0.0. > > btmp = (b < +0.0) ? b : a > atmp = (b < +0.0) ? a : b > Tmp = Min_Float(atmp , btmp) > Res = (atmp == NaN) ? atmp : Tmp > > Thus, we need additional special handling for NaNs and +/-0.0 to compute floating-point min/max values to comply with the semantics of Math.max/min APIs using existing MINPS / MAXPS instructions. AVX10.2 added a new instruction, VPMINMAX[SH,SS,SD]/[PH,PS,PD], which comprehensively handles special cases, thereby eliminating the need for special handling. > > Patch emits new instructions for reduction and non-reduction operations for single, double, and Float16 type. > > Kindly review and share your feedback.
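As a reading aid for the quoted pseudo-code, here is a minimal Java sketch of the operand-swap algorithm. It is not the HotSpot implementation: `MinMaxSketch`, `x86Max`, `x86Min` and `signBitSet` are hypothetical names, the two `x86*` helpers only model the "return the second operand on NaN or equal zeros" behaviour the description attributes to MINPS/MAXPS, and the `(b < +0.0)` test is assumed to mean "sign bit of b is set" so that -0.0 counts as negative.

```java
final class MinMaxSketch {
    // Model of the legacy instructions: the second operand is returned unless
    // the strict comparison holds (this covers NaN inputs and equal zeros).
    static float x86Max(float src1, float src2) { return src1 > src2 ? src1 : src2; }
    static float x86Min(float src1, float src2) { return src1 < src2 ? src1 : src2; }

    // Assumption: "(b < +0.0)" is a sign-bit test, so -0.0 is treated as negative.
    static boolean signBitSet(float f) { return Float.floatToRawIntBits(f) < 0; }

    static float max(float a, float b) {
        boolean bNeg = signBitSet(b);
        float btmp = bNeg ? a : b;   // non-negative candidate goes second
        float atmp = bNeg ? b : a;
        float tmp = x86Max(atmp, btmp);
        return Float.isNaN(atmp) ? atmp : tmp;   // NaN in the first operand wins
    }

    static float min(float a, float b) {
        boolean bNeg = signBitSet(b);
        float btmp = bNeg ? b : a;   // negative candidate goes second
        float atmp = bNeg ? a : b;
        float tmp = x86Min(atmp, btmp);
        return Float.isNaN(atmp) ? atmp : tmp;
    }

    public static void main(String[] args) {
        System.out.println(max(-0.0f, 0.0f));
        System.out.println(min(0.0f, -0.0f));
        System.out.println(max(Float.NaN, 1.0f));
    }
}
```

Under these assumptions the sketch should agree with `Math.max` and `Math.min` for signed zeros and NaNs, which is the special handling the new AVX10.2 instruction is said to make unnecessary.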
> > Best Regards, > Jatin > > [1] https://www.intel.com/content/www/us/en/content-details/856721/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html?wapkw=AVX10 Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Update comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25914/files - new: https://git.openjdk.org/jdk/pull/25914/files/e7753571..b6e55157 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25914&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25914&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25914.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25914/head:pull/25914 PR: https://git.openjdk.org/jdk/pull/25914 From dlunden at openjdk.org Wed Jun 25 12:15:47 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Wed, 25 Jun 2025 12:15:47 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v16] In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 08:56:53 GMT, Daniel Lundén wrote: >> @dlunde Again: thanks for working on this! It looks like a lot of work, and the existing code was not exactly in the best style already. So don't get discouraged by my many comments, a lot of them are small things anyway, and many are just nits. > > @eme64 Thanks for the comments, I'll start addressing them soon! I'm certainly not discouraged (rather the opposite), keep the comments coming :slightly_smiling_face: >@dlunde I responded to a few more issues above in my previous comments. I have not yet looked at the code itself again. I can do that once we have discussed the current topics :) Thanks @eme64! Sure, sounds good. > But then: why not just increase the test timeout?
I have now reinvestigated (it's been a while) the `java/lang/invoke` tests, and increasing the timeout instead of limiting the number of nodes is indeed a viable (and, I agree, cleaner) option. To investigate, I reset `BigArityTest.java`, `TestCatchExceptionWithVarargs.java`, and `VarargsArrayTest.java` to their current mainline state, and reran testing with many different flag combinations. - The main issue I encountered earlier on with `BigArityTest.java` was memory consumption, but since I published this PR a `-XX:CompileCommand=memlimit,*.*,0` has been added to the `@run`. Therefore, the current mainline version of this test (with no reduced `MaxNodeLimit`) now passes after this changeset, although it still takes significantly longer to run with `-Xcomp` compared to mainline (due to the additional enabled compilations). `BigArityTest.java` also already has a (very) generous timeout, so it seems expected that this test will take a long time to run. - `TestCatchExceptionWithVarargs.java` no longer times out. I'm not really sure what changed, but it's now fine to remove the reduced `MaxNodeLimit`. The test still takes longer to run with `-Xcomp` compared to mainline, so we still do the additional compilations. - `VarargsArrayTest.java` still times out, but setting a more generous timeout works fine. Again, it takes significantly longer to run with `-Xcomp` compared to mainline. > To me, this looks like a possible cause for (compile time) regressions. Imagine, someone has such a method in the startup/warmup of their application. And now this change all of the sudden delays compilation by 40seconds. That would be quite bad! > > Hence, I wonder if we should not already investigate this now, so we are a bit more sure we do not see 40sec compilations in the wild. Yes, I agree that would be unfortunate. The motivation for investigating further in a follow-up issue is that these tests are artificial and do not really occur in practice. 
We have run extensive performance testing to justify this. While I would prefer to investigate in a follow-up issue, I don't strongly object to investigate as part of this issue. As I mentioned in my response to @robcasloz earlier, I think what we need is to add bailouts related to interference graph size during register allocation. Permitting compilation with many arguments exposes cases where just bailing out on node count (as we do now) is not sufficient. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20404#issuecomment-3004549609 From fjiang at openjdk.org Wed Jun 25 12:44:37 2025 From: fjiang at openjdk.org (Feilong Jiang) Date: Wed, 25 Jun 2025 12:44:37 GMT Subject: RFR: 8360520: RISC-V: C1: Fix primitive array clone intrinsic regression after JDK-8333154 Message-ID: Hi, please consider. [JDK-8333154](https://bugs.openjdk.org/browse/JDK-8333154) Implemented C1 clone intrinsic that reuses arraycopy code for primitive arrays for RISC-V. The new instruction flag `OmitChecksFlag` (introduced by [JDK-8302850](https://bugs.openjdk.org/browse/JDK-8302850)) is used to avoid instantiation of array copy stubs for primitive array clones. If `OmitChecksFlag` is set, all flags (including the `unaligned` flag) will be cleared before generating the `LIR_OpArrayCopy` node. This may lead to incorrect selection of the arraycopy function when `unaligned` flag for arraycopy is set. We observed performance regression on P550 SBC through the corresponding JMH tests when COH is enabled. This pr adds additional checks for unaligned case on RISC-V to ensure the arraycopy function is selected correctly. JMH data on P550 SBC for reference (w/o and w/ the patch):

Before:

Without COH:
Benchmark                 (size)  Mode  Cnt       Score       Error  Units
ArrayClone.byteArraycopy       0  avgt   15      50.854 ±     0.379  ns/op
ArrayClone.byteArraycopy      10  avgt   15      74.294 ±     0.449  ns/op
ArrayClone.byteArraycopy     100  avgt   15      81.847 ±     0.082  ns/op
ArrayClone.byteArraycopy    1000  avgt   15     480.106 ±     0.369  ns/op
ArrayClone.byteClone           0  avgt   15      90.146 ±     0.299  ns/op
ArrayClone.byteClone          10  avgt   15     130.525 ±     0.384  ns/op
ArrayClone.byteClone         100  avgt   15     251.942 ±     0.122  ns/op
ArrayClone.byteClone        1000  avgt   15     407.580 ±     0.318  ns/op
ArrayClone.intArraycopy        0  avgt   15      49.984 ±     0.436  ns/op
ArrayClone.intArraycopy       10  avgt   15      76.302 ±     1.388  ns/op
ArrayClone.intArraycopy      100  avgt   15     267.487 ±     0.329  ns/op
ArrayClone.intArraycopy     1000  avgt   15    1157.444 ±     1.588  ns/op
ArrayClone.intClone            0  avgt   15      90.130 ±     0.257  ns/op
ArrayClone.intClone           10  avgt   15     183.619 ±     0.588  ns/op
ArrayClone.intClone          100  avgt   15     296.491 ±     0.246  ns/op
ArrayClone.intClone         1000  avgt   15     828.695 ±     1.501  ns/op
-------------------------------------------------------------------------
With COH:
Benchmark                 (size)  Mode  Cnt       Score       Error  Units
ArrayClone.byteArraycopy       0  avgt   15      50.667 ±     0.622  ns/op
ArrayClone.byteArraycopy      10  avgt   15      76.917 ±     0.914  ns/op
ArrayClone.byteArraycopy     100  avgt   15      82.928 ±     0.056  ns/op
ArrayClone.byteArraycopy    1000  avgt   15     485.806 ±     0.653  ns/op
ArrayClone.byteClone           0  avgt   15      90.417 ±     1.059  ns/op
ArrayClone.byteClone          10  avgt   15    1634.691 ±     9.870  ns/op
ArrayClone.byteClone         100  avgt   15   18637.149 ±    30.985  ns/op
ArrayClone.byteClone        1000  avgt   15  193437.253 ±   435.771  ns/op
ArrayClone.intArraycopy        0  avgt   15      50.475 ±     0.545  ns/op
ArrayClone.intArraycopy       10  avgt   15      77.515 ±     0.958  ns/op
ArrayClone.intArraycopy      100  avgt   15     264.586 ±     0.237  ns/op
ArrayClone.intArraycopy     1000  avgt   15    1160.459 ±     1.394  ns/op
ArrayClone.intClone            0  avgt   15      90.776 ±     0.309  ns/op
ArrayClone.intClone           10  avgt   15    7794.589 ±    13.752  ns/op
ArrayClone.intClone          100  avgt   15   77303.097 ±   154.991  ns/op
ArrayClone.intClone         1000  avgt   15  773291.729 ±  1505.788  ns/op

After:

Without COH:
Benchmark                 (size)  Mode  Cnt       Score       Error  Units
ArrayClone.byteArraycopy       0  avgt   15      49.421 ±     0.588  ns/op
ArrayClone.byteArraycopy      10  avgt   15      71.687 ±     0.828  ns/op
ArrayClone.byteArraycopy     100  avgt   15      82.570 ±     0.068  ns/op
ArrayClone.byteArraycopy    1000  avgt   15     478.411 ±     0.505  ns/op
ArrayClone.byteClone           0  avgt   15      90.660 ±     0.314  ns/op
ArrayClone.byteClone          10  avgt   15     131.243 ±     0.407  ns/op
ArrayClone.byteClone         100  avgt   15     251.823 ±     0.192  ns/op
ArrayClone.byteClone        1000  avgt   15     404.857 ±     1.985  ns/op
ArrayClone.intArraycopy        0  avgt   15      49.672 ±     0.466  ns/op
ArrayClone.intArraycopy       10  avgt   15      78.996 ±     1.522  ns/op
ArrayClone.intArraycopy      100  avgt   15     263.690 ±     0.175  ns/op
ArrayClone.intArraycopy     1000  avgt   15    1155.155 ±     2.549  ns/op
ArrayClone.intClone            0  avgt   15      90.495 ±     0.296  ns/op
ArrayClone.intClone           10  avgt   15     184.500 ±     0.554  ns/op
ArrayClone.intClone          100  avgt   15     294.608 ±     0.139  ns/op
ArrayClone.intClone         1000  avgt   15     817.005 ±     0.551  ns/op
-------------------------------------------------------------------------
With COH:
Benchmark                 (size)  Mode  Cnt       Score       Error  Units
ArrayClone.byteArraycopy       0  avgt   15      51.322 ±     0.519  ns/op
ArrayClone.byteArraycopy      10  avgt   15      76.479 ±     0.679  ns/op
ArrayClone.byteArraycopy     100  avgt   15      82.936 ±     0.060  ns/op
ArrayClone.byteArraycopy    1000  avgt   15     487.030 ±     0.464  ns/op
ArrayClone.byteClone           0  avgt   15      89.688 ±     0.276  ns/op
ArrayClone.byteClone          10  avgt   15     109.446 ±     0.379  ns/op
ArrayClone.byteClone         100  avgt   15     221.747 ±     0.176  ns/op
ArrayClone.byteClone        1000  avgt   15     430.846 ±     0.370  ns/op
ArrayClone.intArraycopy        0  avgt   15      50.534 ±     0.524  ns/op
ArrayClone.intArraycopy       10  avgt   15      78.986 ±     1.341  ns/op
ArrayClone.intArraycopy      100  avgt   15     263.473 ±     0.168  ns/op
ArrayClone.intArraycopy     1000  avgt   15    1155.394 ±     1.396  ns/op
ArrayClone.intClone            0  avgt   15      89.698 ±     0.217  ns/op
ArrayClone.intClone           10  avgt   15     185.278 ±     0.673  ns/op
ArrayClone.intClone          100  avgt   15     375.374 ±     0.200  ns/op
ArrayClone.intClone         1000  avgt   15     872.398 ±     1.780  ns/op

------------- Commit messages: - riscv: fix c1 primitive array clone intrinsic regression Changes: https://git.openjdk.org/jdk/pull/25976/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25976&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8360520 Stats: 3 lines in 3 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/25976.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25976/head:pull/25976 PR: https://git.openjdk.org/jdk/pull/25976 From mhaessig at openjdk.org Wed Jun 25 13:02:45 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Wed, 25 Jun 2025 13:02:45 GMT Subject: RFR: 8354727: CompilationPolicy creates too many compiler threads when code cache space is scarce [v6] In-Reply-To: References: <29uhzjC6hIy_YrycL72lqIott9dPDIYrWibtvFjvVsg=.ef3b2c29-e848-4acd-a264-d18cec07bd44@github.com> Message-ID: <48C_9KCCZLE4iNvQqVdH9-4q9Eh7PlM9XKE06EtcoCI=.bb252706-55f9-40e1-8285-b7dac0a0e423@github.com> On Wed, 25 Jun 2025 10:33:12 GMT, Manuel Hässig wrote: >> Running >> >> >> java -XX:+SegmentedCodeCache -XX:ReservedCodeCacheSize=10M -XX:NonNMethodCodeHeapSize=6M \ >> -XX:ProfiledCodeHeapSize=5M -XX:NonProfiledCodeHeapSize=5M -version >> >> >> on a machine with more than 285 cores, this would fail with the message that the specified `NonNMethodCodeHeapSize` is too small to fit all compiler buffers (instead of failing because the sum of the heaps is larger than the `ReservedCodeCacheSize`). Hence, the calculated compiler count is too high. This is due to CompilationPolicy::initialize() checking how many compiler buffers fit into the `ReservedCodeCacheSize`. However, in the case above, this is significantly larger than `NonNMethodCodeHeapSize` (especially on a debug build) and causes a check changed in #17244 to fail.
That check was changed to check that all compiler buffers fit into the `NonNMethodCodeHeap` instead of the `NonNMethodCodeHeap` having at least a size of `CodeCacheMinimumUseSpace`. >> >> # Changes >> >> This PR fixes the calculation of the `CICompilerCount` ergonomic. Firstly, @shipilev kindly provided a fix for the compiler buffer size used in the calculation is also correct if we only have C2. Secondly, `NonNMethodHeapSize` is used as the maximum buffer size available for compilers buffers in the calculation of the maximum number of compiler threads instead of `ReservedCodeCacheSize`. Therefore, the check failing in the explanation above can never fail because we set the number of compiler threads only so high that they will always fit into the `NonNMethodCodeHeap`. >> >> This change changes how many compiler threads are created by the `CICompilerCount` ergonomic. For the default value `NonNMethodCodeHeapSize=5M`this limit is 24 compiler threads on a system 285 cores or more for product builds and 20 threads for debug builds on a system with 145 cores or more. >> >> # Testing >> >> - [x] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15733154809) >> - [x] tier1 through tier3 plus Oracle internal testing on our supported platforms > > Manuel H?ssig has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits: > > - Merge branch 'master' into JDK-8354727-policy > - Merge branch 'master' into JDK-8354727-policy > - Fix merge conflict resolution > - Merge branch 'master' into JDK-8354727-policy > - Calculate buffer size correctly for c2_only > > Co-authored-by: Aleksey Shipilev > - Caclulate how many compiler buffers fit into NonNMethodCodeHeap > - Clarify endif > - update copyrights > - remove leftover include > - fix whitebox access to code cache size configs > - ... and 7 more: https://git.openjdk.org/jdk/compare/5c4f92ba...569229cb Thank you for your reviews! 
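The capping idea in the quoted description (allow only as many compiler threads as have buffers that fit into the non-nmethod heap) can be sketched in Java as follows. This is an illustration under stated assumptions, not the code in `CompilationPolicy::initialize()`: the class, the method, all parameter names, the reserved-space term and the floor of one thread are hypothetical, and real buffer sizes are not given in the thread.

```java
final class CompilerCountSketch {
    /**
     * Caps an ergonomically chosen compiler thread count so that one code buffer
     * per thread fits into the space available in the non-nmethod code heap.
     * All parameters are hypothetical inputs, in bytes where applicable.
     */
    static int cappedCompilerCount(int ergonomicCount,
                                   long nonNMethodHeapBytes,
                                   long reservedBytes,
                                   long bufferBytesPerThread) {
        if (bufferBytesPerThread <= 0) {
            throw new IllegalArgumentException("buffer size must be positive");
        }
        // Space left for compiler buffers after the assumed reserved part.
        long available = Math.max(0L, nonNMethodHeapBytes - reservedBytes);
        long maxThatFit = available / bufferBytesPerThread;
        long capped = Math.min((long) ergonomicCount, maxThatFit);
        // Assumption: always keep at least one compiler thread.
        return (int) Math.max(1L, capped);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Hypothetical sizes, chosen only to exercise the cap.
        System.out.println(cappedCompilerCount(32, 5 * mb, 1 * mb, 512 * 1024L));
    }
}
```

Because the count is derived from the same heap that the later check inspects, that check cannot fail for counts produced this way, which matches the reasoning in the PR text.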
------------- PR Comment: https://git.openjdk.org/jdk/pull/25872#issuecomment-3004683301 From mhaessig at openjdk.org Wed Jun 25 13:02:46 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Wed, 25 Jun 2025 13:02:46 GMT Subject: Integrated: 8354727: CompilationPolicy creates too many compiler threads when code cache space is scarce In-Reply-To: <29uhzjC6hIy_YrycL72lqIott9dPDIYrWibtvFjvVsg=.ef3b2c29-e848-4acd-a264-d18cec07bd44@github.com> References: <29uhzjC6hIy_YrycL72lqIott9dPDIYrWibtvFjvVsg=.ef3b2c29-e848-4acd-a264-d18cec07bd44@github.com> Message-ID: On Wed, 18 Jun 2025 13:11:53 GMT, Manuel H?ssig wrote: > Running > > > java -XX:+SegmentedCodeCache -XX:ReservedCodeCacheSize=10M -XX:NonNMethodCodeHeapSize=6M \ > -XX:ProfiledCodeHeapSize=5M -XX:NonProfiledCodeHeapSize=5M -version > > > on a machine with more than 285 cores, this would fail with the message that the specified `NonNMethodCodeHeapSize` is too small to fit all compiler buffers (instead of failing because the sum of the heaps is larger than the `ReservedCodeCacheSize`). Hence, the calculated compiler count is too high. This is due to CompilationPolicy::initialize() checking how many compiler buffers fit into the `ReservedCodeCacheSize`. However, in the case above, this is significantly larger than `NonNMethodCodeHeapSize` (especially on a debug build) and causes a check changed in #17244 to fail. That check was changed to check that all compiler buffers fit into the `NonNMethodCodeHeap` instead of the `NonNMethodCodeHeap` having at least a size of `CodeCacheMinimumUseSpace`. > > # Changes > > This PR fixes the calculation of the `CICompilerCount` ergonomic. Firstly, @shipilev kindly provided a fix for the compiler buffer size used in the calculation is also correct if we only have C2. Secondly, `NonNMethodHeapSize` is used as the maximum buffer size available for compilers buffers in the calculation of the maximum number of compiler threads instead of `ReservedCodeCacheSize`. 
Therefore, the check failing in the explanation above can never fail because we set the number of compiler threads only so high that they will always fit into the `NonNMethodCodeHeap`. > > This change changes how many compiler threads are created by the `CICompilerCount` ergonomic. For the default value `NonNMethodCodeHeapSize=5M` this limit is 24 compiler threads on a system with 285 cores or more for product builds and 20 threads for debug builds on a system with 145 cores or more. > > # Testing > > - [x] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15733154809) > - [x] tier1 through tier3 plus Oracle internal testing on our supported platforms This pull request has now been integrated. Changeset: f2ef8097 Author: Manuel Hässig URL: https://git.openjdk.org/jdk/commit/f2ef809719cbb14f90a0a5f673e10e7c74fa0f45 Stats: 10 lines in 1 file changed: 7 ins; 0 del; 3 mod 8354727: CompilationPolicy creates too many compiler threads when code cache space is scarce Co-authored-by: Aleksey Shipilev Reviewed-by: kvn, shade ------------- PR: https://git.openjdk.org/jdk/pull/25872 From eastigeevich at openjdk.org Wed Jun 25 13:40:10 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Wed, 25 Jun 2025 13:40:10 GMT Subject: RFR: 8359435: AArch64: add support for SB instruction to MacroAssembler::spin_wait [v3] In-Reply-To: References: Message-ID: > There is data that SB-based spin pauses are less disruptive than ISB-based ones, so performance is better: > - https://github.com/mysql/mysql-server/pull/611 > - https://github.com/facebook/folly/pull/2390 > > There are discussions regarding using it for spin pauses: > - https://github.com/gperftools/gperftools/pull/1594 > - https://github.com/haproxy/haproxy/pull/2974 > > Instruction support: https://developer.arm.com/documentation/109697/2025_03/Feature-descriptions/The-Armv8-5-architecture-extension > > CPUs supporting it: > - Apple M2+ > - Neoverse-N2 > - Neoverse-V2 > > Tests: > - Gtests passed.
> - `test/hotspot/jtreg/compiler/onSpinWait/TestOnSpinWaitAArch64.java` passed.
> - `test/hotspot/jtreg/compiler/onSpinWait/TestOnSpinWaitNoneAArch64.java` passed.
>
> Micro-benchmarks (Graviton 4, c8g.16xlarge (64 CPU), Neoverse-V2):
>
> Benchmark             Mode  Cnt   Score   Error  Units  Diff
> ThreadOnSpinWait.ISB  avgt   15  11.875 ± 0.129  ns/op
> ThreadOnSpinWait.SB   avgt   15   6.930 ± 0.054  ns/op  -42%
>
> Benchmark                          (maxNum)  (threadCount)  Mode  Cnt    Score    Error  Units  Diff
> ThreadOnSpinWaitSharedCounter.ISB   1000000              4  avgt   15   49.874 ± 10.160  ms/op
> ThreadOnSpinWaitSharedCounter.SB    1000000              4  avgt   15   26.948 ±  4.036  ms/op  -46%
> ThreadOnSpinWaitSharedCounter.ISB   1000000              8  avgt   15   65.173 ±  7.228  ms/op
> ThreadOnSpinWaitSharedCounter.SB    1000000              8  avgt   15   44.476 ±  1.292  ms/op  -31%
> ThreadOnSpinWaitSharedCounter.ISB   1000000             16  avgt   15  177.805 ± 44.925  ms/op
> ThreadOnSpinWaitSharedCounter.SB    1000000             16  avgt   15   67.267 ± 13.814  ms/op  -62%
> ThreadOnSpinWaitSharedCounter.ISB   1000000             32  avgt   15  265.149 ±  5.353  ms/op
> ThreadOnSpinWaitSharedCounter.SB    1000000             32  avgt   15   42.297 ±  3.436  ms/op  -84%
> ThreadOnSpinWaitSharedCounter.ISB   1000000             48  avgt   15  125.231 ±  9.272  ms/op
> ThreadOnSpinWaitSharedCounter.SB    1000000             48  avgt   15   83.504 ±  8.561  ms/op  -33%
> ThreadOnSpinWaitSharedCounter.ISB   1000000             64  avgt   15  124.505 ±  7.543  ms/op
> ThreadOnSpinWaitSharedCounter.SB    1000000             64  avgt   15   86.588 ±  9.519  ms/op  -30%

Evgeny Astigeevich has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains four additional commits since the last revision: - Merge branch 'master' into JDK-8359435 - Add SB detection - Add support for SB to MacroAssembler::spin_wait - 8359435: AArch64: add support for 8.5 SB instruction ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25801/files - new: https://git.openjdk.org/jdk/pull/25801/files/a045194c..9b02a59c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25801&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25801&range=01-02 Stats: 14368 lines in 519 files changed: 6141 ins; 5231 del; 2996 mod Patch: https://git.openjdk.org/jdk/pull/25801.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25801/head:pull/25801 PR: https://git.openjdk.org/jdk/pull/25801 From eastigeevich at openjdk.org Wed Jun 25 13:47:33 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Wed, 25 Jun 2025 13:47:33 GMT Subject: RFR: 8359435: AArch64: add support for SB instruction to MacroAssembler::spin_wait [v2] In-Reply-To: References: Message-ID: On Tue, 24 Jun 2025 16:36:34 GMT, Aleksey Shipilev wrote: >> Evgeny Astigeevich has updated the pull request incrementally with two additional commits since the last revision: >> >> - Add SB detection >> - Add support for SB to MacroAssembler::spin_wait > > test/hotspot/jtreg/compiler/onSpinWait/TestOnSpinWaitAArch64.java line 80: > >> 78: OutputAnalyzer analyzer = new OutputAnalyzer(pb.start()); >> 79: >> 80: if (analyzer.getExitValue() != 0 && "sb".equals(spinWaitInst) && analyzer.contains("CPU does not support SB")) { > > The logic here is a bit off. Suppose we _do_ have non-zero exit code for, say, `isb`. This would not fail the test now. Do it something like this instead? > > > if ("sb".equals(spinWaitInst) && analyzer.contains("CPU does not support SB")) { > System.out.println("Skipping the test. 
The current CPU does not support SB instruction."); > return; > } > > analyzer.shouldHaveExitValue(0); Thank you, Aleksey for finding this. I accidentally removed `analyzer.shouldHaveExitValue(0)` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25801#discussion_r2166758608 From eastigeevich at openjdk.org Wed Jun 25 13:56:09 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Wed, 25 Jun 2025 13:56:09 GMT Subject: RFR: 8359435: AArch64: add support for SB instruction to MacroAssembler::spin_wait [v4] In-Reply-To: References: Message-ID: <1tCoafZVhWpgCaR-CRDRhRMuJlIAMbujEsUv7dbptQ8=.1b987ec6-942e-4992-8485-f1f23a0159ab@github.com> > There is data SB-based spin pauses are less disruptive then ISB-based one on them, so performance is better: > - https://github.com/mysql/mysql-server/pull/611 > - https://github.com/facebook/folly/pull/2390 > > There are discussions regarding using it for spin pauses: > - https://github.com/gperftools/gperftools/pull/1594 > - https://github.com/haproxy/haproxy/pull/2974 > > Instruction support: https://developer.arm.com/documentation/109697/2025_03/Feature-descriptions/The-Armv8-5-architecture-extension > > CPUs supporting it: > - Apple M2+ > - Neoverse-N2 > - Neoverse-V2 > > Tests: > - Gtests passed. > - `test/hotspot/jtreg/compiler/onSpinWait/TestOnSpinWaitAArch64.java` passed. > - `test/hotspot/jtreg/compiler/onSpinWait/TestOnSpinWaitNoneAArch64.java` passed. > > Micro-benchmarks (Graviton 4, c8g.16xlarge (64 CPU), Neoverse-V2): > > > Benchmark Mode Cnt Score Error Units Diff > ThreadOnSpinWait.ISB avgt 15 11.875 ? 0.129 ns/op > ThreadOnSpinWait.SB avgt 15 6.930 ? 0.054 ns/op -42% > > Benchmark (maxNum) (threadCount) Mode Cnt Score Error Units Diff > ThreadOnSpinWaitSharedCounter.ISB 1000000 4 avgt 15 49.874 ? 10.160 ms/op > ThreadOnSpinWaitSharedCounter.SB 1000000 4 avgt 15 26.948 ? 4.036 ms/op -46% > ThreadOnSpinWaitSharedCounter.ISB 1000000 8 avgt 15 65.173 ? 
7.228 ms/op > ThreadOnSpinWaitSharedCounter.SB 1000000 8 avgt 15 44.476 ? 1.292 ms/op -31% > ThreadOnSpinWaitSharedCounter.ISB 1000000 16 avgt 15 177.805 ? 44.925 ms/op > ThreadOnSpinWaitSharedCounter.SB 1000000 16 avgt 15 67.267 ? 13.814 ms/op -62% > ThreadOnSpinWaitSharedCounter.ISB 1000000 32 avgt 15 265.149 ? 5.353 ms/op > ThreadOnSpinWaitSharedCounter.SB 1000000 32 avgt 15 42.297 ? 3.436 ms/op -84% > ThreadOnSpinWaitSharedCounter.ISB 1000000 48 avgt 15 125.231 ? 9.272 ms/op > ThreadOnSpinWaitSharedCounter.SB 1000000 48 avgt 15 83.504 ? 8.561 ms/op -33% > ThreadOnSpinWaitSharedCounter.ISB 1000000 64 avgt 15 124.505 ? 7.543 ms/op > ThreadOnSpinWaitSharedCounter.SB 1000000 64 avgt 15 86.588 ? 9.519 ms/op -30% Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision: Restore check of non-zero exit value; make spinWaitInstCount always provided in test cmd options ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25801/files - new: https://git.openjdk.org/jdk/pull/25801/files/9b02a59c..ab8c7e6f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25801&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25801&range=02-03 Stats: 7 lines in 1 file changed: 2 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/25801.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25801/head:pull/25801 PR: https://git.openjdk.org/jdk/pull/25801 From eastigeevich at openjdk.org Wed Jun 25 13:56:10 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Wed, 25 Jun 2025 13:56:10 GMT Subject: RFR: 8359435: AArch64: add support for SB instruction to MacroAssembler::spin_wait [v2] In-Reply-To: References: Message-ID: On Tue, 24 Jun 2025 16:40:42 GMT, Aleksey Shipilev wrote: >> Evgeny Astigeevich has updated the pull request incrementally with two additional commits since the last revision: >> >> - Add SB detection >> - Add support for SB to MacroAssembler::spin_wait > > 
test/hotspot/jtreg/compiler/onSpinWait/TestOnSpinWaitAArch64.java line 36: > >> 34: * @run driver compiler.onSpinWait.TestOnSpinWaitAArch64 c2 isb 3 >> 35: * @run driver compiler.onSpinWait.TestOnSpinWaitAArch64 c2 yield 1 >> 36: * @run driver compiler.onSpinWait.TestOnSpinWaitAArch64 c2 sb > > Since we are touching up the test: maybe just say `sb 1` explicitly, and then read `spinWaitInstCount` from `args[2]` unconditionally? Done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25801#discussion_r2166778973 From eastigeevich at openjdk.org Wed Jun 25 13:56:10 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Wed, 25 Jun 2025 13:56:10 GMT Subject: RFR: 8359435: AArch64: add support for SB instruction to MacroAssembler::spin_wait [v2] In-Reply-To: References: Message-ID: <2zOz95mkgaKJJYYuVcghSFIiUPGXjby8N8OBH3mjGm0=.658bf900-ea9d-49c9-a2d0-6b825a2320d6@github.com> On Wed, 25 Jun 2025 13:43:56 GMT, Evgeny Astigeevich wrote: >> test/hotspot/jtreg/compiler/onSpinWait/TestOnSpinWaitAArch64.java line 80: >> >>> 78: OutputAnalyzer analyzer = new OutputAnalyzer(pb.start()); >>> 79: >>> 80: if (analyzer.getExitValue() != 0 && "sb".equals(spinWaitInst) && analyzer.contains("CPU does not support SB")) { >> >> The logic here is a bit off. Suppose we _do_ have non-zero exit code for, say, `isb`. This would not fail the test now. Do it something like this instead? >> >> >> if ("sb".equals(spinWaitInst) && analyzer.contains("CPU does not support SB")) { >> System.out.println("Skipping the test. The current CPU does not support SB instruction."); >> return; >> } >> >> analyzer.shouldHaveExitValue(0); > > Thank you, Aleksey for finding this. 
I accidentally removed `analyzer.shouldHaveExitValue(0)` Done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25801#discussion_r2166779245 From shade at openjdk.org Wed Jun 25 14:10:29 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 25 Jun 2025 14:10:29 GMT Subject: RFR: 8359435: AArch64: add support for SB instruction to MacroAssembler::spin_wait [v4] In-Reply-To: <1tCoafZVhWpgCaR-CRDRhRMuJlIAMbujEsUv7dbptQ8=.1b987ec6-942e-4992-8485-f1f23a0159ab@github.com> References: <1tCoafZVhWpgCaR-CRDRhRMuJlIAMbujEsUv7dbptQ8=.1b987ec6-942e-4992-8485-f1f23a0159ab@github.com> Message-ID: On Wed, 25 Jun 2025 13:56:09 GMT, Evgeny Astigeevich wrote: >> There is data SB-based spin pauses are less disruptive then ISB-based one on them, so performance is better: >> - https://github.com/mysql/mysql-server/pull/611 >> - https://github.com/facebook/folly/pull/2390 >> >> There are discussions regarding using it for spin pauses: >> - https://github.com/gperftools/gperftools/pull/1594 >> - https://github.com/haproxy/haproxy/pull/2974 >> >> Instruction support: https://developer.arm.com/documentation/109697/2025_03/Feature-descriptions/The-Armv8-5-architecture-extension >> >> CPUs supporting it: >> - Apple M2+ >> - Neoverse-N2 >> - Neoverse-V2 >> >> Tests: >> - Gtests passed. >> - `test/hotspot/jtreg/compiler/onSpinWait/TestOnSpinWaitAArch64.java` passed. >> - `test/hotspot/jtreg/compiler/onSpinWait/TestOnSpinWaitNoneAArch64.java` passed. >> >> Micro-benchmarks (Graviton 4, c8g.16xlarge (64 CPU), Neoverse-V2): >> >> >> Benchmark Mode Cnt Score Error Units Diff >> ThreadOnSpinWait.ISB avgt 15 11.875 ? 0.129 ns/op >> ThreadOnSpinWait.SB avgt 15 6.930 ? 0.054 ns/op -42% >> >> Benchmark (maxNum) (threadCount) Mode Cnt Score Error Units Diff >> ThreadOnSpinWaitSharedCounter.ISB 1000000 4 avgt 15 49.874 ? 10.160 ms/op >> ThreadOnSpinWaitSharedCounter.SB 1000000 4 avgt 15 26.948 ? 4.036 ms/op -46% >> ThreadOnSpinWaitSharedCounter.ISB 1000000 8 avgt 15 65.173 ? 
7.228 ms/op >> ThreadOnSpinWaitSharedCounter.SB 1000000 8 avgt 15 44.476 ? 1.292 ms/op -31% >> ThreadOnSpinWaitSharedCounter.ISB 1000000 16 avgt 15 177.805 ? 44.925 ms/op >> ThreadOnSpinWaitSharedCounter.SB 1000000 16 avgt 15 67.267 ? 13.814 ms/op -62% >> ThreadOnSpinWaitSharedCounter.ISB 1000000 32 avgt 15 265.149 ? 5.353 ms/op >> ThreadOnSpinWaitSharedCounter.SB 1000000 32 avgt 15 42.297 ? 3.436 ms/op -84% >> ThreadOnSpinWaitSharedCounter.ISB 1000000 48 avgt 15 125.231 ? 9.272 ms/op >> ThreadOnSpinWaitSharedCounter.SB 1000000 48 avgt 15 83.504 ? 8.561 ms/op -33% >> ThreadOnSpinWaitSharedCounter.ISB 1000000 64 avgt 15 124.505 ? 7.543 ms/op >> ThreadOnSpinWaitSharedCounter.SB 1000000 64 avgt 15 86.588 ? 9.519 ms/op -30% > > Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision: > > Restore check of non-zero exit value; make spinWaitInstCount always provided in test cmd options Looks okay to me. @theRealAph should also take a look. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25801#pullrequestreview-2958325428 From mhaessig at openjdk.org Wed Jun 25 15:45:33 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Wed, 25 Jun 2025 15:45:33 GMT Subject: RFR: 8360116: Add support for AVX10 floating point minmax instruction [v2] In-Reply-To: References: Message-ID: On Wed, 25 Jun 2025 10:41:04 GMT, Jatin Bhateja wrote: >> Intel@ AVX10 ISA [1] extensions added new floating point MIN/MAX instructions which comply with definitions in IEEE-754-2019 standard section 9.6 and can directly emulate Math.min/max semantics without the need for any special handling for NaN, +0.0 or -0.0 detection. 
>> >> **The following pseudo-code describes the existing algorithm for min/max[FD]:** >> >> Move the non-negative value to the second operand; this will ensure that we correctly handle 0.0 and -0.0 values: if the values being compared are both 0.0s (of either sign), the value in the second operand (source operand) is returned. Existing MINPS and MAXPS semantics only check for NaN as the second operand; hence, we need special handling to check for NaN at the first operand. >> >> btmp = (b < +0.0) ? a : b >> atmp = (b < +0.0) ? b : a >> Tmp = Max_Float(atmp , btmp) >> Res = (atmp == NaN) ? atmp : Tmp >> >> For min[FD] we need a small tweak in the above algorithm, i.e., move the non-negative value to the first operand; this will ensure that we correctly select -0.0 if both the operands being compared are 0.0 or -0.0. >> >> btmp = (b < +0.0) ? b : a >> atmp = (b < +0.0) ? a : b >> Tmp = Min_Float(atmp , btmp) >> Res = (atmp == NaN) ? atmp : Tmp >> >> Thus, we need additional special handling for NaNs and +/-0.0 to compute floating-point min/max values to comply with the semantics of Math.max/min APIs using existing MINPS / MAXPS instructions. AVX10.2 added a new instruction, VMINMAX[SH,SS,SD]/[PH,PS,PD], which comprehensively handles special cases, thereby eliminating the need for special handling. >> >> Patch emits new instructions for reduction and non-reduction operations for single, double, and Float16 type. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin >> >> [1] https://www.intel.com/content/www/us/en/content-details/856721/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html?wapkw=AVX10 > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Update comments Thank you for implementing these new instructions! I had a look at your changes and have a few minor suggestions and questions.
I am quite new to this part of the codebase, so feel free to disagree if I am way off base. How did you test these changes? Also, if you merge the current master branch, the Windows build failures in the Github Actions will be fixed. src/hotspot/cpu/x86/assembler_x86.cpp line 8693: > 8691: } > 8692: > 8693: Suggestion: Nit: superfluous empty line src/hotspot/cpu/x86/assembler_x86.cpp line 8785: > 8783: void Assembler::evminmaxps(XMMRegister dst, KRegister mask, XMMRegister nds, XMMRegister src, bool merge, int imm8, int vector_len) { > 8784: assert(VM_Version::supports_avx10_2(), ""); > 8785: InstructionAttr attributes(vector_len, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ false,/* uses_vl */ true); Suggestion: InstructionAttr attributes(vector_len, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ false, /* uses_vl */ true); Nit: missing space src/hotspot/cpu/x86/assembler_x86.hpp line 2752: > 2750: void eminmaxss(XMMRegister dst, XMMRegister nds, XMMRegister src, int imm8); > 2751: void eminmaxsd(XMMRegister dst, XMMRegister nds, XMMRegister src, int imm8); > 2752: void evminmaxph(XMMRegister dst, KRegister mask, XMMRegister nds, XMMRegister src, bool merge, int imm8, int vector_len); Is there a reason `evminmaxph` does not have a version where `src` has type `Address`? src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1241: > 1239: } > 1240: > 1241: void C2_MacroAssembler::vminmax_fp(int opc, BasicType elem_bt, XMMRegister dst, KRegister mask, Line 1122 mentions the differences between `vminps/vmaxps` and Java semantics. Perhaps a mention of the new instructions introduced in this PR might help people who are confused about the fact that `vminmax_fp` is overloaded. src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1246: > 1244: opc == Op_MaxV || opc == Op_MaxReductionV, "sanity"); > 1245: if (elem_bt == T_FLOAT) { > 1246: evminmaxps(dst, mask, src1, src2, true, opc == Op_MinV || opc == Op_MinReductionV ? 
0x4 : 0x5, vlen_enc); Perhaps `0x4` and `0x5` should be factored into named constants since they are used in multiple places and it would also help readability if one does not have the documentation handy when reading the code. ------------- Changes requested by mhaessig (Committer). PR Review: https://git.openjdk.org/jdk/pull/25914#pullrequestreview-2958407187 PR Review Comment: https://git.openjdk.org/jdk/pull/25914#discussion_r2166859511 PR Review Comment: https://git.openjdk.org/jdk/pull/25914#discussion_r2166896645 PR Review Comment: https://git.openjdk.org/jdk/pull/25914#discussion_r2167019420 PR Review Comment: https://git.openjdk.org/jdk/pull/25914#discussion_r2167008970 PR Review Comment: https://git.openjdk.org/jdk/pull/25914#discussion_r2166994321 From aph at openjdk.org Wed Jun 25 15:47:28 2025 From: aph at openjdk.org (Andrew Haley) Date: Wed, 25 Jun 2025 15:47:28 GMT Subject: RFR: 8359435: AArch64: add support for SB instruction to MacroAssembler::spin_wait In-Reply-To: References: Message-ID: On Sat, 14 Jun 2025 16:00:55 GMT, Andrew Haley wrote: >> FWIW, I don't mind the SB assembler support to go under this, separate PR. We sometimes do it to split the work in the series of atomic commits, where the commit like this should certainly be non-regressing. The actual use of SB (spin-pauses) can then come under separate RFE, and would require much more work (and have associated risk). >> >> So, it would be tad less confusing if we had a dependent RFE for using SB in spin pauses, so it was obvious why do we need it. > >> So, it would be tad less confusing if we had a dependent RFE for using SB in spin pauses, so it was obvious why do we need it. > > Huh? The least confusing is when the SB support goes in the PR where it is used. That really is obvious, without any dependency chain. > Looks okay to me. @theRealAph should also take a look. I'm still waiting for a use for this thing. Then we'll be able to see it in action. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25801#issuecomment-3005256762 From iveresov at openjdk.org Wed Jun 25 16:15:33 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Wed, 25 Jun 2025 16:15:33 GMT Subject: [jdk25] RFR: 8359788: Internal Error: assert(get_instanceKlass()->is_loaded()) failed: must be at least loaded In-Reply-To: References: Message-ID: <8JF1CEAXp6lliwGkTOXojtv3Rz3v1nYRL06MUj8OArk=.d8241342-6ca1-4d9b-82ab-8b1451bb76cd@github.com> On Wed, 25 Jun 2025 06:54:35 GMT, Igor Veresov wrote: > 8359788: Internal Error: assert(get_instanceKlass()->is_loaded()) failed: must be at least loaded Thanks, Alexey! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25969#issuecomment-3005356040 From iveresov at openjdk.org Wed Jun 25 16:15:34 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Wed, 25 Jun 2025 16:15:34 GMT Subject: [jdk25] Integrated: 8359788: Internal Error: assert(get_instanceKlass()->is_loaded()) failed: must be at least loaded In-Reply-To: References: Message-ID: On Wed, 25 Jun 2025 06:54:35 GMT, Igor Veresov wrote: > 8359788: Internal Error: assert(get_instanceKlass()->is_loaded()) failed: must be at least loaded This pull request has now been integrated. Changeset: fdb3e37c Author: Igor Veresov URL: https://git.openjdk.org/jdk/commit/fdb3e37c714a5fd5aa78f9a5528a182c6e961485 Stats: 7 lines in 1 file changed: 6 ins; 0 del; 1 mod 8359788: Internal Error: assert(get_instanceKlass()->is_loaded()) failed: must be at least loaded Reviewed-by: shade Backport-of: 5c4f92ba9a2b820fa12920400c9037b5d3c37aa4 ------------- PR: https://git.openjdk.org/jdk/pull/25969 From eastigeevich at openjdk.org Wed Jun 25 16:41:33 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Wed, 25 Jun 2025 16:41:33 GMT Subject: RFR: 8359435: AArch64: add support for SB instruction to MacroAssembler::spin_wait In-Reply-To: References: Message-ID: On Wed, 25 Jun 2025 15:44:59 GMT, Andrew Haley wrote: > > Looks okay to me. 
@theRealAph should also take a look. > > I'm still waiting for a use for this thing. Then we'll be able to see it in action. Do you mean we need real-life workloads relying on `j.l.Thread.onSpinWait` to show improvements? ------------- PR Comment: https://git.openjdk.org/jdk/pull/25801#issuecomment-3005433977 From shade at openjdk.org Wed Jun 25 16:55:28 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 25 Jun 2025 16:55:28 GMT Subject: RFR: 8359435: AArch64: add support for SB instruction to MacroAssembler::spin_wait In-Reply-To: References: Message-ID: On Wed, 25 Jun 2025 16:39:07 GMT, Evgeny Astigeevich wrote: > I'm still waiting for a use for this thing. The other project reports Evgeny linked in PR body look pretty convincing, as well as `ThreadOnSpinWait` microbenchmarks we have as well. This PR does not propose to _switch_ to `SB` for spin-waits, AFAICS. Just having `SB` as the spin-wait option does look fine to me. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25801#issuecomment-3005475260 From sviswanathan at openjdk.org Wed Jun 25 17:02:33 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 25 Jun 2025 17:02:33 GMT Subject: RFR: 8352635: Improve inferencing of Float16 operations with constant inputs [v14] In-Reply-To: References: <44nVQBYgzCOB2mAB9xtAPvkUcOMJOITA2VjMdDFgm1g=.48266693-48bf-41db-8871-a7dcafe93509@github.com> Message-ID: On Wed, 25 Jun 2025 04:44:11 GMT, Jatin Bhateja wrote: >> This is a follow-up PR#22755 to improve Float16 operations inferencing. >> >> The existing scheme to detect Float16 operations for some operations is based on pattern matching which expects to receive inputs through ConvHF2F IR, this patch extends matching to accept constant floating point inputs within the Float16 value range. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review resolutions Looks good to me. 
------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24179#pullrequestreview-2958970716

From missa at openjdk.org Wed Jun 25 18:41:41 2025
From: missa at openjdk.org (Mohamed Issa)
Date: Wed, 25 Jun 2025 18:41:41 GMT
Subject: RFR: 8358179: Performance regression in Math.cbrt
Message-ID: <45l5EvxoRINI1_Ep2_snJzKNMPo4-dPXADalLN1fq1Y=.9f697a35-ee7b-4e7a-9e5e-ff33911b3b21@github.com>

The changes described below are meant to resolve the performance regression introduced by the **x86_64 cbrt** double precision floating point scalar intrinsic in #24470.

1. Check for +0, -0, +INF, -INF, and NaN before any other input values.
2. If these special values are found, return immediately with minimal modifications to the result register.

The commands to run all relevant micro-benchmarks are posted below.

`make test TEST="micro:CbrtPerf.CbrtPerfRanges"`
`make test TEST="micro:CbrtPerf.CbrtPerfSpecialValues"`

The results of all tests posted below were captured with an [Intel® Xeon 8488C](https://www.intel.com/content/www/us/en/products/sku/231730/intel-xeon-platinum-8480c-processor-105m-cache-2-00-ghz/specifications.html) using [OpenJDK v26-b1](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B1) as the baseline version. The term _baseline1_ refers to runs with the intrinsic enabled and _baseline2_ refers to runs with the intrinsic disabled.

Each result is the mean of 8 individual runs, and the input ranges used match those from the original Java implementation. Overall, the changes provide a significant uplift over _baseline1_ except for a mild regression in the (**2^(-1022) <= |x| < INF**) input range, which is expected due to the extra checks. When comparing against _baseline2_, the modified intrinsic still significantly outperforms it for the inputs (**-INF < x < INF**) that require heavy compute. However, the special value inputs that trigger fast path returns still perform better with _baseline2_.
| Input range(s) | Baseline1 (ops/ms) | Change (ops/ms) | Change vs baseline1 (%) |
| :-------------------------------------: | :-------------------: | :------------------: | :--------------------------: |
| [-2^(-1022), 2^(-1022)] | 18470 | 20847 | +12.87 |
| (-INF, -2^(-1022)], [2^(-1022), INF) | 210538 | 198925 | -5.52 |
| [0] | 344990 | 627561 | +81.91 |
| [-0] | 291983 | 629941 | +115.75 |
| [INF] | 382685 | 542211 | +41.68 |
| [-INF] | 386174 | 542291 | +40.43 |
| [NaN] | 421700 | 615157 | +45.88 |

| Input range(s) | Baseline2 (ops/ms) | Change (ops/ms) | Change vs baseline2 (%) |
| :-------------------------------------: | :-------------------: | :------------------: | :--------------------------: |
| [-2^(-1022), 2^(-1022)] | 7072 | 20847 | +194.78 |
| (-INF, -2^(-1022)], [2^(-1022), INF) | 147884 | 198925 | +34.51 |
| [0] | 1890520 | 627561 | -66.80 |
| [-0] | 1890404 | 629941 | -66.68 |
| [INF] | 1247633 | 542211 | -56.54 |
| [-INF] | 1242287 | 542291 | -56.35 |
| [NaN] | 1253700 | 615157 | -50.93 |

Finally, the `jtreg:test/jdk/java/lang/Math/CubeRootTests.java` test passed with the changes.
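The ordering described in this message (special values first, heavy compute afterwards) can be sketched in Java. This is only an illustration under stated assumptions: the class and method names are hypothetical, `Math.cbrt` stands in for the stub's compute path, and the actual change lives in the x86_64 stub generator rather than in Java code. The `percentChange` helper shows how the "Change vs baseline" columns appear to be derived.

```java
final class CbrtSpecialValuesSketch {
    // Hypothetical helper: mirrors the order of the checks, not the actual stub code.
    static double cbrt(double x) {
        // Fast path: +0, -0, +INF, -INF and NaN are returned unchanged.
        if (x == 0.0 || Double.isInfinite(x) || Double.isNaN(x)) {
            return x;
        }
        // Slow path: stand-in for the heavy compute done by the intrinsic.
        return Math.cbrt(x);
    }

    // Relative change in percent, assumed to match the "Change vs baseline" columns.
    static double percentChange(double baseline, double changed) {
        return (changed - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        System.out.println(cbrt(-0.0));        // sign of zero is preserved
        System.out.println(cbrt(Double.NaN));  // NaN is propagated
        // Row [0] of the baseline1 comparison: 344990 ops/ms before, 627561 ops/ms after.
        System.out.println(percentChange(344990, 627561));
    }
}
```

The single combined check keeps the common path to one extra branch, which is consistent with the mild regression reported for the normal input range.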
------------- Commit messages: - Make absolute mask memory constant 16 byte aligned for compatibility with andpd instruction - Check for special values first in x86_64 cbrt intrinsic Changes: https://git.openjdk.org/jdk/pull/25962/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25962&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8358179 Stats: 49 lines in 1 file changed: 10 ins; 36 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/25962.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25962/head:pull/25962 PR: https://git.openjdk.org/jdk/pull/25962 From syan at openjdk.org Wed Jun 25 18:41:41 2025 From: syan at openjdk.org (SendaoYan) Date: Wed, 25 Jun 2025 18:41:41 GMT Subject: RFR: 8358179: Performance regression in Math.cbrt In-Reply-To: <45l5EvxoRINI1_Ep2_snJzKNMPo4-dPXADalLN1fq1Y=.9f697a35-ee7b-4e7a-9e5e-ff33911b3b21@github.com> References: <45l5EvxoRINI1_Ep2_snJzKNMPo4-dPXADalLN1fq1Y=.9f697a35-ee7b-4e7a-9e5e-ff33911b3b21@github.com> Message-ID: On Tue, 24 Jun 2025 22:33:56 GMT, Mohamed Issa wrote: > The changes described below are meant to resolve the performance regression introduced by the **x86_64 cbrt** double precision floating point scalar intrinsic in #24470. > > 1. Check for +0, -0, +INF, -INF, and NaN before any other input values. > 2. If these special values are found, return immediately with minimal modifications to the result register. > > The commands to run all relevant micro-benchmarks are posted below. > > `make test TEST="micro:CbrtPerf.CbrtPerfRanges"` > `make test TEST="micro:CbrtPerf.CbrtPerfSpecialValues"` > > The results of all tests posted below were captured with an [Intel? Xeon 8488C](https://www.intel.com/content/www/us/en/products/sku/231730/intel-xeon-platinum-8480c-processor-105m-cache-2-00-ghz/specifications.html) using [OpenJDK v26-b1](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B1) as the baseline version. 
The term _baseline1_ refers to runs with the intrinsic enabled and _baseline2_ refers to runs with the intrinsic disabled. > > Each result is the mean of 8 individual runs, and the input ranges used match those from the original Java implementation. Overall, the changes provide a significant uplift over _baseline1_ except for a mild regression in the (**2^(-1022) <= |x| < INF**) input range, which is expected due to the extra checks. When comparing against _baseline2_, the modified intrinsic significantly still outperforms for the inputs (**-INF < x < INF**) that require heavy compute. However, the special value inputs that trigger fast path returns still perform better with _baseline2_. > > | Input range(s) | Baseline1 (ops/ms) | Change (ops/ms) | Change vs baseline1 (%) | > | :-------------------------------------: | :-------------------: | :------------------: | :--------------------------: | > | [-2^(-1022), 2^(-1022)] | 18470 | 20847 | +12.87 | > | (-INF, -2^(-1022)], [2^(-1022), INF) | 210538 | 198925 | -5.52 | > | [0] | 344990 | 627561 | +81.91 | > | [-0] | 291983 | 629941 | +115.75 | > | [INF] | 382685 | 542211 | +41.68 | > | [-INF... src/hotspot/cpu/x86/stubGenerator_x86_64_cbrt.cpp line 51: > 49: ATTRIBUTE_ALIGNED(16) static const juint _ABS_MASK[] = > 50: { > 51: 4294967295, 2147483647 How about `0xffffffff, 0x7fffffff` instead. 
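For reference, a small Java sketch of why the decimal and hex spellings denote the same constant, and what the combined mask does to a double. The class and method names are hypothetical, and the word order (low word first) is an assumption based on x86_64 being little-endian; the real constant is the C++ `_ABS_MASK` table used with `andpd`.

```java
final class AbsMaskSketch {
    // The two 32-bit words from the _ABS_MASK table, written in hex.
    static final int LOW_WORD = 0xffffffff;   // 4294967295 when read as unsigned
    static final int HIGH_WORD = 0x7fffffff;  // 2147483647

    // Combine the words into one 64-bit mask (low word first, as assumed above).
    static long mask() {
        return ((long) HIGH_WORD << 32) | Integer.toUnsignedLong(LOW_WORD);
    }

    // Clearing the sign bit with the mask yields |x| for any double bit pattern.
    static double absViaMask(double x) {
        return Double.longBitsToDouble(Double.doubleToRawLongBits(x) & mask());
    }

    public static void main(String[] args) {
        System.out.println(Long.toHexString(mask()));
        System.out.println(absViaMask(-8.0));
    }
}
```

Written in hex, it is immediately visible that only the top bit (the sign bit) is cleared, which is the readability argument behind the suggestion.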
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25962#discussion_r2165834750 From missa at openjdk.org Wed Jun 25 18:41:41 2025 From: missa at openjdk.org (Mohamed Issa) Date: Wed, 25 Jun 2025 18:41:41 GMT Subject: RFR: 8358179: Performance regression in Math.cbrt In-Reply-To: References: <45l5EvxoRINI1_Ep2_snJzKNMPo4-dPXADalLN1fq1Y=.9f697a35-ee7b-4e7a-9e5e-ff33911b3b21@github.com> Message-ID: On Wed, 25 Jun 2025 05:44:30 GMT, SendaoYan wrote: >> The changes described below are meant to resolve the performance regression introduced by the **x86_64 cbrt** double precision floating point scalar intrinsic in #24470. >> >> 1. Check for +0, -0, +INF, -INF, and NaN before any other input values. >> 2. If these special values are found, return immediately with minimal modifications to the result register. >> >> The commands to run all relevant micro-benchmarks are posted below. >> >> `make test TEST="micro:CbrtPerf.CbrtPerfRanges"` >> `make test TEST="micro:CbrtPerf.CbrtPerfSpecialValues"` >> >> The results of all tests posted below were captured with an [Intel? Xeon 8488C](https://www.intel.com/content/www/us/en/products/sku/231730/intel-xeon-platinum-8480c-processor-105m-cache-2-00-ghz/specifications.html) using [OpenJDK v26-b1](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B1) as the baseline version. The term _baseline1_ refers to runs with the intrinsic enabled and _baseline2_ refers to runs with the intrinsic disabled. >> >> Each result is the mean of 8 individual runs, and the input ranges used match those from the original Java implementation. Overall, the changes provide a significant uplift over _baseline1_ except for a mild regression in the (**2^(-1022) <= |x| < INF**) input range, which is expected due to the extra checks. When comparing against _baseline2_, the modified intrinsic significantly still outperforms for the inputs (**-INF < x < INF**) that require heavy compute. 
However, the special value inputs that trigger fast path returns still perform better with _baseline2_. >> >> | Input range(s) | Baseline1 (ops/ms) | Change (ops/ms) | Change vs baseline1 (%) | >> | :-------------------------------------: | :-------------------: | :------------------: | :--------------------------: | >> | [-2^(-1022), 2^(-1022)] | 18470 | 20847 | +12.87 | >> | (-INF, -2^(-1022)], [2^(-1022), INF) | 210538 | 198925 | -5.52 | >> | [0] | 344990 | 627561 | +81.91 | >> | [-0] | 291983 | 629941 | +115.75 | >> | [INF] | 382685 | 542211 | +4... > > src/hotspot/cpu/x86/stubGenerator_x86_64_cbrt.cpp line 51: > >> 49: ATTRIBUTE_ALIGNED(16) static const juint _ABS_MASK[] = >> 50: { >> 51: 4294967295, 2147483647 > > How about `0xffffffff, 0x7fffffff` instead. I agree it's better to use hex format, but the other constants are in decimal. So, I'd like to have a separate PR that covers all of them rather than having a mismatch between this mask and all the others. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25962#discussion_r2167201627 From duke at openjdk.org Wed Jun 25 19:52:39 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Wed, 25 Jun 2025 19:52:39 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v21] In-Reply-To: References: Message-ID: On Sat, 31 May 2025 10:23:09 GMT, Evgeny Astigeevich wrote: >> Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: >> >> Change to ImmutableDataReferences > > src/hotspot/share/code/nmethod.hpp line 172: > >> 170: friend class DeoptimizationScope; >> 171: >> 172: using ImmutableDataReferences = int; > > Sorry, I might look too annoying. > Let's be more specific. This type represents not references themself. It is a counter of them: `ImmutableDataReferenceCounter` > Let's reflect this in all related names. 
I updated the name to `ImmutableDataReferencesCounterSize` ([reference](https://github.com/chadrako/jdk/blob/e51a1a09bcee01149fcb4da3e5659e0cad1b2c8b/src/hotspot/share/code/nmethod.hpp#L172)) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2167493591 From hgreule at openjdk.org Wed Jun 25 21:06:47 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Wed, 25 Jun 2025 21:06:47 GMT Subject: RFR: 8359678: C2: assert(static_cast(result) == thing) caused by ReverseBytesNode::Value() Message-ID: Fixes an assertion when passing an int larger than short/char to the corresponding reverseBytes method in a constant-folding scenario. By just using static_cast, we can ignore the upper bytes and just swap the lower bytes. Using jasm, I added a test case that covers such inputs. It felt easier to test this way than the other scenarios mentioned in the bug report. I also removed the redundant checked_cast calls from the int/long case; we already have the correct type there. Please review. Thanks. 
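The behaviour described above (ignore the upper bytes, swap only the lower two) can be illustrated at the Java level. This is a sketch of the expected result when an int that does not fit in short/char reaches `reverseBytes`, not the C2 constant-folding code itself; the class and method names are hypothetical.

```java
final class ReverseBytesTruncationSketch {
    // Keep only the low 16 bits of the int, then swap those two bytes.
    static short reverseAsShort(int value) {
        return Short.reverseBytes((short) value);
    }

    // Same idea for char, which is an unsigned 16-bit type.
    static char reverseAsChar(int value) {
        return Character.reverseBytes((char) value);
    }

    public static void main(String[] args) {
        int wide = 0x12345678; // does not fit in short or char
        System.out.println(Integer.toHexString(reverseAsShort(wide) & 0xffff));
        System.out.println(Integer.toHexString(reverseAsChar(wide)));
    }
}
```

The narrowing casts here play the role that `static_cast` plays in the fix: the upper bytes are simply dropped instead of being treated as an error.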
------------- Commit messages: - fix - test Changes: https://git.openjdk.org/jdk/pull/25988/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25988&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8359678 Stats: 83 lines in 3 files changed: 74 ins; 2 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/25988.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25988/head:pull/25988 PR: https://git.openjdk.org/jdk/pull/25988 From missa at openjdk.org Wed Jun 25 21:14:29 2025 From: missa at openjdk.org (Mohamed Issa) Date: Wed, 25 Jun 2025 21:14:29 GMT Subject: RFR: 8358179: Performance regression in Math.cbrt In-Reply-To: <45l5EvxoRINI1_Ep2_snJzKNMPo4-dPXADalLN1fq1Y=.9f697a35-ee7b-4e7a-9e5e-ff33911b3b21@github.com> References: <45l5EvxoRINI1_Ep2_snJzKNMPo4-dPXADalLN1fq1Y=.9f697a35-ee7b-4e7a-9e5e-ff33911b3b21@github.com> Message-ID: On Tue, 24 Jun 2025 22:33:56 GMT, Mohamed Issa wrote: > The changes described below are meant to resolve the performance regression introduced by the **x86_64 cbrt** double precision floating point scalar intrinsic in #24470. > > 1. Check for +0, -0, +INF, -INF, and NaN before any other input values. > 2. If these special values are found, return immediately with minimal modifications to the result register. > > The commands to run all relevant micro-benchmarks are posted below. > > `make test TEST="micro:CbrtPerf.CbrtPerfRanges"` > `make test TEST="micro:CbrtPerf.CbrtPerfSpecialValues"` > > The results of all tests posted below were captured with an [Intel? Xeon 8488C](https://www.intel.com/content/www/us/en/products/sku/231730/intel-xeon-platinum-8480c-processor-105m-cache-2-00-ghz/specifications.html) using [OpenJDK v26-b1](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B1) as the baseline version. The term _baseline1_ refers to runs with the intrinsic enabled and _baseline2_ refers to runs with the intrinsic disabled. 
> > Each result is the mean of 8 individual runs, and the input ranges used match those from the original Java implementation. Overall, the changes provide a significant uplift over _baseline1_ except for a mild regression in the (**2^(-1022) <= |x| < INF**) input range, which is expected due to the extra checks. When comparing against _baseline2_, the modified intrinsic significantly still outperforms for the inputs (**-INF < x < INF**) that require heavy compute. However, the special value inputs that trigger fast path returns still perform better with _baseline2_. > > | Input range(s) | Baseline1 (ops/ms) | Change (ops/ms) | Change vs baseline1 (%) | > | :-------------------------------------: | :-------------------: | :------------------: | :--------------------------: | > | [-2^(-1022), 2^(-1022)] | 18470 | 20847 | +12.87 | > | (-INF, -2^(-1022)], [2^(-1022), INF) | 210538 | 198925 | -5.52 | > | [0] | 344990 | 627561 | +81.91 | > | [-0] | 291983 | 629941 | +115.75 | > | [INF] | 382685 | 542211 | +41.68 | > | [-INF... @eme64 There seems to be an environment configuration issue (unrelated to code changes) on the Windows machine(s) this PR is landing on when running pre-submit tests. Could you run the internal tests on this PR to verify everything is ok? ------------- PR Comment: https://git.openjdk.org/jdk/pull/25962#issuecomment-3006162351 From duke at openjdk.org Wed Jun 25 22:32:24 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Wed, 25 Jun 2025 22:32:24 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v32] In-Reply-To: References: Message-ID: > This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). > > When an nmethod is replaced, a deep copy is performed. 
The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. > > This change only slightly modifies existing code paths and therefore does not benefit much from existing tests. New tests were created to test the new functionality > > Additional Testing: > - [ ] Linux x64 fastdebug all > - [ ] Linux aarch64 fastdebug all > - [ ] ... Chad Rakoczy has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 90 commits: - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final - Update how call sites are fixed - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final - Fix pointer printing - Use set_destination_mt_safe - Print address as pointer - Use new _metadata_size instead of _jvmci_data_size - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final - Only check branch distance for aarch64 and riscv - Move far branch fix to fix_relocation_after_move - ... and 80 more: https://git.openjdk.org/jdk/compare/f799cf18...70e4164e ------------- Changes: https://git.openjdk.org/jdk/pull/23573/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=31 Stats: 1654 lines in 34 files changed: 1586 ins; 2 del; 66 mod Patch: https://git.openjdk.org/jdk/pull/23573.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23573/head:pull/23573 PR: https://git.openjdk.org/jdk/pull/23573 From duke at openjdk.org Wed Jun 25 22:53:39 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Wed, 25 Jun 2025 22:53:39 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v31] In-Reply-To: References: Message-ID: On Mon, 23 Jun 2025 23:22:04 GMT, Dean Long wrote: > The extra work that CallRelocation::fix_relocation_after_move and Relocation::pd_set_call_destination are now doing is only needed for nmethod relocation, right? 
We could avoid doing extra work for the old CodeBuffer::relocate_code_to() path by adding a flag to CallRelocation::fix_relocation_after_move() (or creating a new function) and then pass that flag on to pd_set_call_destination. In particular, it would be nice to avoid unnecessary calls to CodeCache::find_blob(). I think this is a good idea. Like you mentioned it is only needed for nmethod relocation so it does not make sense to alter the logic for other things that may call these functions. I added a flag so code path for `CodeBuffer::relocate_code_to()` should be unchanged ------------- PR Comment: https://git.openjdk.org/jdk/pull/23573#issuecomment-3006461511 From syan at openjdk.org Thu Jun 26 01:49:38 2025 From: syan at openjdk.org (SendaoYan) Date: Thu, 26 Jun 2025 01:49:38 GMT Subject: RFR: 8359678: C2: assert(static_cast(result) == thing) caused by ReverseBytesNode::Value() In-Reply-To: References: Message-ID: On Wed, 25 Jun 2025 19:34:40 GMT, Hannes Greule wrote: > Fixes an assertion when passing an int larger than short/char to the corresponding reverseBytes method in a constant-folding scenario. By just using static_cast, we can ignore the upper bytes and just swap the lower bytes. > > Using jasm, I added a test case that covers such inputs. It felt easier to test this way than the other scenarios mentioned in the bug report. > > I also removed the redundant checked_cast calls from the int/long case; we already have the correct type there. > > Please review. Thanks. test/hotspot/jtreg/compiler/c2/gvn/ReverseBytesConstantsHelper.jasm line 25: > 23: */ > 24: > 25: public class ReverseBytesConstantsHelper version 65:0 { Maybe we can remove 'version 65:0' ? 
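For readers following the 8359678 thread: the behaviour the quoted description relies on (only the low two bytes take part in the swap when the value comes from a wider int) can be illustrated at the Java level. This is a minimal sketch, not code from the PR; the class name and the `swapLow16` helper are hypothetical and exist only for illustration.

```java
final class ReverseBytesSketch {
    // Hypothetical reference helper: swap the two low bytes and ignore the upper ones.
    static int swapLow16(int v) {
        return ((v & 0xFF) << 8) | ((v >>> 8) & 0xFF);
    }

    // The narrowing cast drops the upper 16 bits before the bytes are swapped.
    static short reverseAsShort(int v) {
        return Short.reverseBytes((short) v);
    }

    static char reverseAsChar(int v) {
        return Character.reverseBytes((char) v);
    }

    public static void main(String[] args) {
        int wide = 0x12345678; // does not fit in short/char
        System.out.printf("%04x%n", reverseAsShort(wide) & 0xFFFF); // 7856
        System.out.printf("%04x%n", (int) reverseAsChar(wide));     // 7856
        System.out.printf("%04x%n", swapLow16(wide));               // 7856
    }
}
```

A constant-folded result is expected to agree with these library calls for any int input, which is what the jasm test mentioned in the description appears to exercise.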
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25988#discussion_r2167926976 From jkarthikeyan at openjdk.org Thu Jun 26 03:55:30 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Thu, 26 Jun 2025 03:55:30 GMT Subject: RFR: 8350177: C2 SuperWord: Integer.numberOfLeadingZeros, numberOfTrailingZeros, reverse and bitCount have input types wrongly turncated for byte and short [v6] In-Reply-To: References: Message-ID: On Wed, 25 Jun 2025 06:44:49 GMT, Emanuel Peter wrote: >> Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: >> >> Add more nodes to the non-truncating list > > src/hotspot/share/opto/superword.cpp line 2552: > >> 2550: switch (opc) { >> 2551: case Op_ExtractS: >> 2552: case Op_ExtractC: > > Are there any tests for these somewhere? I ended up encountering `ExtractS` when running `Short128VectorTests.java`, but I thought I should add `ExtractC` to the list as well. Unfortunately I don't think we have tests for that one yet because `char` doesn't vectorize yet. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25440#discussion_r2168044808 From jkarthikeyan at openjdk.org Thu Jun 26 03:55:31 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Thu, 26 Jun 2025 03:55:31 GMT Subject: RFR: 8350177: C2 SuperWord: Integer.numberOfLeadingZeros, numberOfTrailingZeros, reverse and bitCount have input types wrongly turncated for byte and short [v6] In-Reply-To: References: Message-ID: On Wed, 25 Jun 2025 06:48:30 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/superword.cpp line 2600: >> >>> 2598: case Op_XorReductionV: >>> 2599: case Op_MaxReductionV: >>> 2600: case Op_MinReductionV: >> >> Why do these vector nodes even end up here? Is that expected? > > Additionally, it may be good to say why each operation here is not truncatable. We can also file a follow-up RFE here, to add more investigation.
After all, this is a bug-fix and we don't want to work in it too long now. It looks like this was because the test has loops that call the vector API, so when the loop gets unrolled it seems that reduction nodes that were created can show up here: https://github.com/openjdk/jdk/blob/1ca008fd02496dc33e2707c102560cae1690fba5/test/jdk/jdk/incubator/vector/Byte128VectorTests.java#L3710-L3721 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25440#discussion_r2168044574 From epeter at openjdk.org Thu Jun 26 05:52:28 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 26 Jun 2025 05:52:28 GMT Subject: RFR: 8350177: C2 SuperWord: Integer.numberOfLeadingZeros, numberOfTrailingZeros, reverse and bitCount have input types wrongly turncated for byte and short [v6] In-Reply-To: References: Message-ID: On Thu, 26 Jun 2025 03:52:20 GMT, Jasmine Karthikeyan wrote: >> Additionallly, it may be good to say why each one operation here is not truncatable. We can also file a follow-up RFE here, to add more investigation. After all, this is a bug-fix and we don't want to work in it too long now. > > It looks like this was because the test has loops that call the vector API, so when the loop gets unrolled it seems that reduction nodes that were created can show up here: https://github.com/openjdk/jdk/blob/1ca008fd02496dc33e2707c102560cae1690fba5/test/jdk/jdk/incubator/vector/Byte128VectorTests.java#L3710-L3721 Should we just handle all vector nodes together, and prevent that any of them are truncated? 
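As background for why operations such as `numberOfLeadingZeros` and `bitCount` are treated as non-truncating in this thread: their result depends on the full width of the int input, so computing them on a narrower byte or short lane gives a different answer. The sketch below only illustrates that general point at the Java level; the 8-bit helpers are hypothetical stand-ins for a narrow-lane operation and are not taken from the PR or from C2.

```java
final class TruncationSketch {
    // Java semantics: the byte is sign-extended to int before counting.
    static int leadingZerosAsInt(byte b) {
        return Integer.numberOfLeadingZeros(b);
    }

    // Hypothetical 8-bit variant, standing in for the same operation on a byte lane.
    static int leadingZerosIn8Bits(byte b) {
        int v = b & 0xFF;
        return v == 0 ? 8 : Integer.numberOfLeadingZeros(v) - 24;
    }

    static int bitCountAsInt(byte b) {
        return Integer.bitCount(b);
    }

    // Hypothetical 8-bit variant: only the low 8 bits are counted.
    static int bitCountIn8Bits(byte b) {
        return Integer.bitCount(b & 0xFF);
    }

    public static void main(String[] args) {
        byte one = 1;
        byte minusOne = -1;
        // The int-width and 8-bit results differ, so the input must not be narrowed.
        System.out.println(leadingZerosAsInt(one) + " vs " + leadingZerosIn8Bits(one));      // 31 vs 7
        System.out.println(bitCountAsInt(minusOne) + " vs " + bitCountIn8Bits(minusOne));    // 32 vs 8
    }
}
```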
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25440#discussion_r2168174361 From epeter at openjdk.org Thu Jun 26 05:56:33 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 26 Jun 2025 05:56:33 GMT Subject: RFR: 8350177: C2 SuperWord: Integer.numberOfLeadingZeros, numberOfTrailingZeros, reverse and bitCount have input types wrongly turncated for byte and short In-Reply-To: References: Message-ID: On Wed, 25 Jun 2025 06:00:40 GMT, Jasmine Karthikeyan wrote: >> @jaskarth I just checked the results, and there is a series of failing tests. >> >> ------------------------ >> `compiler/c2/Test6958485.java` >> >> Flags: `-server -Xcomp` or `-XX:+UnlockExperimentalVMOptions -XX:PerMethodSpecTrapLimit=0 -XX:PerMethodTrapLimit=0`. >> >> `# assert(false) failed: Unexpected node in SuperWord truncation: Conv2B` >> >> -------------------------- >> >> `compiler/intrinsics/TestDoubleIsInfinite.java` -> D >> `compiler/intrinsics/TestFloatIsInfinite.java` -> F >> >> Flags: `-XX:UseAVX=3` or `-XX:-TieredCompilation -XX:+StressReflectiveCode -XX:-ReduceInitialCardMarks -XX:-ReduceBulkZeroing -XX:-ReduceFieldZeroing` ... not sure if any are necessary actually. >> >> `# assert(false) failed: Unexpected node in SuperWord truncation: IsInfiniteD` >> and >> `# assert(false) failed: Unexpected node in SuperWord truncation: IsInfiniteF` >> >> ----------------------------- >> >> `jdk/incubator/vector/Byte128VectorTests.java` (same issue with all related vector tests, just reporting one here) >> >> Flag: `-XX:UseAVX=2` >> >> `# assert(false) failed: Unexpected node in SuperWord truncation: AddReductionVI` > > @eme64 Thanks for the test results! I've added these nodes to the non-truncating list, as well as the other reduction nodes that showed up when running the vector tests. @jaskarth the tests look better now. 
I still saw this failure: `jdk/incubator/vector/Byte64VectorTests.java` Flags: `-XX:UseAVX=3 -XX:+UnlockDiagnosticVMOptions -XX:+UseKNLSetting` or `-XX:UseAVX=0` or `-XX:UseAVX=2` ... probably no flags are actually required. `# assert(false) failed: Unexpected node in SuperWord truncation: ExtractB` ------------- PR Comment: https://git.openjdk.org/jdk/pull/25440#issuecomment-3007166988 From epeter at openjdk.org Thu Jun 26 06:03:29 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 26 Jun 2025 06:03:29 GMT Subject: RFR: 8350177: C2 SuperWord: Integer.numberOfLeadingZeros, numberOfTrailingZeros, reverse and bitCount have input types wrongly turncated for byte and short [v6] In-Reply-To: References: Message-ID: On Thu, 26 Jun 2025 03:52:32 GMT, Jasmine Karthikeyan wrote: >> src/hotspot/share/opto/superword.cpp line 2552: >> >>> 2550: switch (opc) { >>> 2551: case Op_ExtractS: >>> 2552: case Op_ExtractC: >> >> Are there any tests for these somewhere? > > I ended up encountering `ExtractS` when running `Short128VectorTests.java`, but I thought I should add `ExtractC` to the list as well. Unfortunately I don't think we have tests for that one yet because `char` doesn't vectorize yet. Does it even make sense to vectorize Extract? Is there any case where this actually succeeds? It would be worth filing a follow-up RFE if we don't want to spend the time on this now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25440#discussion_r2168195317 From epeter at openjdk.org Thu Jun 26 06:12:46 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 26 Jun 2025 06:12:46 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v16] In-Reply-To: References: Message-ID: On Wed, 25 Jun 2025 12:13:04 GMT, Daniel Lundén wrote: >> @eme64 Thanks for the comments, I'll start addressing them soon!
I'm certainly not discouraged (rather the opposite), keep the comments coming :slightly_smiling_face: > >>@dlunde I responded to a few more issues above at my previous comments. I have not yet looked at the code itself again. I can do that once we have discussed the current topics :) > > Thank @eme64! Sure, sounds good. > >> But then: why not just increase the test timeout? > > I have now reinvestigated (it's been a while) the `java/lang/invoke` tests, and increasing the timeout instead of limiting the number of nodes is indeed a viable (and, I agree, cleaner) option. To investigate, I reset `BigArityTest.java`, `TestCatchExceptionWithVarargs.java`, and `VarargsArrayTest.java` to their current mainline state, and reran testing with many different flag combinations. > - The main issue I encountered earlier on with `BigArityTest.java` was memory consumption, but since I published this PR a `-XX:CompileCommand=memlimit,*.*,0` has been added to the `@run`. Therefore, the current mainline version of this test (with no reduced `MaxNodeLimit`) now passes after this changeset, although it still takes significantly longer to run with `-Xcomp` compared to mainline (due to the additional enabled compilations). `BigArityTest.java` also already has a (very) generous timeout, so it seems expected that this test will take a long time to run. > - `TestCatchExceptionWithVarargs.java` no longer times out. I'm not really sure what changed, but it's now fine to remove the reduced `MaxNodeLimit`. The test still takes longer to run with `-Xcomp` compared to mainline, so we still do the additional compilations. > - `VarargsArrayTest.java` still times out, but setting a more generous timeout works fine. Again, it takes significantly longer to run with `-Xcomp` compared to mainline. > >> To me, this looks like a possible cause for (compile time) regressions. Imagine, someone has such a method in the startup/warmup of their application. 
And now this change all of the sudden delays compilation by 40seconds. That would be quite bad! >> >> Hence, I wonder if we should not already investigate this now, so we are a bit more sure we do not see 40sec compilations in the wild. > > Yes, I agree that would be unfortunate. The motivation for investigating further in a follow-up issue is that these tests are artificial and do not really occur in practice. We have run extensive performance testing to justify this. While I would prefer to investigate in a follow-up issue, I don't strongly object to investigate as part of this issue. As I mentioned in my response to @robcasloz earlier, I think what we need is to add bai... @dlunde > Yes, I agree that would be unfortunate. The motivation for investigating further in a follow-up issue is that these tests are artificial and do not really occur in practice. We have run extensive performance testing to justify this. While I would prefer to investigate in a follow-up issue, I don't strongly object to investigate as part of this issue. As I mentioned in my response to @robcasloz earlier, I think what we need is to add bailouts related to interference graph size during register allocation. Permitting compilation with many arguments exposes cases where just bailing out on node count (as we do now) is not sufficient. What prevents us from adding such a bailout first? How hard would it be to implement it now, with some threshold that you can lower with a diagnostic flag? That would allow you to test it now independently, and verify if it handles at least the known issues you encountered now? I don't want to block you here either. What do you think @vnkozlov ? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20404#issuecomment-3007219530 From dlunden at openjdk.org Thu Jun 26 07:17:42 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Thu, 26 Jun 2025 07:17:42 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v16] In-Reply-To: References: Message-ID: <6e8IlDRLl2qYsUiuNb383PwuJ4nh0UQIDoNohxLCp1Y=.7a2b98f7-8d07-4792-a780-562b50b3ac36@github.com> On Thu, 26 Jun 2025 06:09:44 GMT, Emanuel Peter wrote: > What prevents us from adding such a bailout first? How hard would it be to implement it now, with some threshold that you can lower with a diagnostic flag? That would allow you to test it now independently, and verify if it handles at least the known issues you encountered now? I started investigating a bit more yesterday and it seems quite doable. It indeed looks like the number of interference graph edges is unreasonable in the cases with long-running compilations, and that we should add a bailout based on this. I guess you mean that we do this in a separate changeset that we integrate _before_ this PR? That sounds good to me. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20404#issuecomment-3007433973 From epeter at openjdk.org Thu Jun 26 07:39:32 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 26 Jun 2025 07:39:32 GMT Subject: RFR: 8358179: Performance regression in Math.cbrt In-Reply-To: References: <45l5EvxoRINI1_Ep2_snJzKNMPo4-dPXADalLN1fq1Y=.9f697a35-ee7b-4e7a-9e5e-ff33911b3b21@github.com> Message-ID: On Wed, 25 Jun 2025 21:12:20 GMT, Mohamed Issa wrote: >> The changes described below are meant to resolve the performance regression introduced by the **x86_64 cbrt** double precision floating point scalar intrinsic in #24470. >> >> 1. Check for +0, -0, +INF, -INF, and NaN before any other input values. >> 2. If these special values are found, return immediately with minimal modifications to the result register.
>> >> The commands to run all relevant micro-benchmarks are posted below. >> >> `make test TEST="micro:CbrtPerf.CbrtPerfRanges"` >> `make test TEST="micro:CbrtPerf.CbrtPerfSpecialValues"` >> >> The results of all tests posted below were captured with an [Intel® Xeon 8488C](https://www.intel.com/content/www/us/en/products/sku/231730/intel-xeon-platinum-8480c-processor-105m-cache-2-00-ghz/specifications.html) using [OpenJDK v26-b1](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B1) as the baseline version. The term _baseline1_ refers to runs with the intrinsic enabled and _baseline2_ refers to runs with the intrinsic disabled. >> >> Each result is the mean of 8 individual runs, and the input ranges used match those from the original Java implementation. Overall, the changes provide a significant uplift over _baseline1_ except for a mild regression in the (**2^(-1022) <= |x| < INF**) input range, which is expected due to the extra checks. When comparing against _baseline2_, the modified intrinsic still significantly outperforms it for the inputs (**-INF < x < INF**) that require heavy compute. However, the special value inputs that trigger fast path returns still perform better with _baseline2_. >> >> | Input range(s) | Baseline1 (ops/ms) | Change (ops/ms) | Change vs baseline1 (%) | >> | :-------------------------------------: | :-------------------: | :------------------: | :--------------------------: | >> | [-2^(-1022), 2^(-1022)] | 18470 | 20847 | +12.87 | >> | (-INF, -2^(-1022)], [2^(-1022), INF) | 210538 | 198925 | -5.52 | >> | [0] | 344990 | 627561 | +81.91 | >> | [-0] | 291983 | 629941 | +115.75 | >> | [INF] | 382685 | 542211 | +4... > > @eme64 There seems to be an environment configuration issue (unrelated to code changes) on the Windows machine(s) this PR is landing on when running pre-submit tests. Could you run the internal tests on this PR to verify everything is ok? @missa-prime I just launched some testing.
Yes, it seems the windows failures are unrelated - I've seen them on other PRs too. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25962#issuecomment-3007497526 From hgreule at openjdk.org Thu Jun 26 07:55:23 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Thu, 26 Jun 2025 07:55:23 GMT Subject: RFR: 8359678: C2: assert(static_cast(result) == thing) caused by ReverseBytesNode::Value() [v2] In-Reply-To: References: Message-ID: > Fixes an assertion when passing an int larger than short/char to the corresponding reverseBytes method in a constant-folding scenario. By just using static_cast, we can ignore the upper bytes and just swap the lower bytes. > > Using jasm, I added a test case that covers such inputs. It felt easier to test this way than the other scenarios mentioned in the bug report. > > I also removed the redundant checked_cast calls from the int/long case; we already have the correct type there. > > Please review. Thanks. Hannes Greule has updated the pull request incrementally with one additional commit since the last revision: remove classfile version ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25988/files - new: https://git.openjdk.org/jdk/pull/25988/files/b0ff150b..6822cca0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25988&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25988&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25988.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25988/head:pull/25988 PR: https://git.openjdk.org/jdk/pull/25988 From hgreule at openjdk.org Thu Jun 26 07:55:24 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Thu, 26 Jun 2025 07:55:24 GMT Subject: RFR: 8359678: C2: assert(static_cast(result) == thing) caused by ReverseBytesNode::Value() [v2] In-Reply-To: References: Message-ID: On Thu, 26 Jun 2025 01:47:19 GMT, SendaoYan wrote: >> Hannes Greule has updated the pull request incrementally with 
one additional commit since the last revision: >> >> remove classfile version > > test/hotspot/jtreg/compiler/c2/gvn/ReverseBytesConstantsHelper.jasm line 25: > >> 23: */ >> 24: >> 25: public class ReverseBytesConstantsHelper version 65:0 { > > Maybe we can remove 'version 65:0' ? That works! Thanks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25988#discussion_r2168431457 From aph at openjdk.org Thu Jun 26 07:58:30 2025 From: aph at openjdk.org (Andrew Haley) Date: Thu, 26 Jun 2025 07:58:30 GMT Subject: RFR: 8359435: AArch64: add support for SB instruction to MacroAssembler::spin_wait In-Reply-To: References: Message-ID: <2fVSyaAsO9gfi9RpmqTBa0JXrhkrAGkNPyArpUB-gu8=.499c45d5-5015-4bfa-a997-d75b860504b0@github.com> On Wed, 25 Jun 2025 16:52:32 GMT, Aleksey Shipilev wrote: > > I'm still waiting for a use for this thing. > > The other project reports Evgeny linked in PR body look pretty convincing, as well as `ThreadOnSpinWait` microbenchmarks we have as well. This PR does not propose to _switch_ to `SB` for spin-waits, AFAICS. Just having `SB` as the spin-wait option does look fine to me. I understand, but I see above here, "I am flexible to have it either way." Are you changing your mind? There is no reason not to add the instruction with the patch that uses it, as has been the practice for this port. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25801#issuecomment-3007551920 From xgong at openjdk.org Thu Jun 26 07:59:30 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 26 Jun 2025 07:59:30 GMT Subject: RFR: 8356760: VectorAPI: Optimize VectorMask.fromLong for all-true/all-false cases In-Reply-To: References: Message-ID: <1mmwSiX2OCyFw8bKOj6U1yabINpsZiNblYbvAF8l6dM=.00a75235-c87a-4f04-b863-1f6dc046e4e4@github.com> On Fri, 13 Jun 2025 08:33:09 GMT, erifan wrote: > If the input long value `l` of `VectorMask.fromLong(SPECIES, l)` would set or unset all lanes, `VectorMask.fromLong(SPECIES, l)` is equivalent to `maskAll(true)` or `maskAll(false)`. But the cost of `maskAll` is relative smaller than that of `fromLong`. This patch does the conversion for these cases if `l` is a compile time constant. > > And this conversion also enables further optimizations that recognize maskAll patterns, see [1]. > > Some JTReg test cases are added to ensure the optimization is effective. > > I tried many different ways to write a JMH benchmark, but failed. Since the input of `VectorMask.fromLong(SPECIES, l)` needs to be a specific compile-time constant, the statement will be hoisted out of the loop. If we don't use a loop, the hotspot will become other instructions, and no obvious performance change was observed. However, combined with the optimization of [1], we can observe a performance improvement of about 7% on both aarch64 and x64. > > The patch was tested on both aarch64 and x64, all of tier1 tier2 and tier3 tests passed. > > [1] https://github.com/openjdk/jdk/pull/24674 src/hotspot/share/opto/vectorIntrinsics.cpp line 706: > 704: opc = Op_Replicate; > 705: elem_bt = converted_elem_bt; > 706: bits = gvn().longcon(bits_type->get_con() == 0L ? 0L : -1L); Code style. 
Suggest: if (opc == Op_VectorLongToMask && is_maskall_type(bits_type, num_elem) && arch_supports_vector(Op_Replicate, num_elem, converted_elem_bt, checkFlags, true /*has_scalar_args*/)) { opc = Op_Replicate; elem_bt = converted_elem_bt; bits = gvn().longcon(bits_type->get_con() == 0L ? 0L : -1L); } else if ( ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25793#discussion_r2168425682 From aph at openjdk.org Thu Jun 26 08:05:31 2025 From: aph at openjdk.org (Andrew Haley) Date: Thu, 26 Jun 2025 08:05:31 GMT Subject: RFR: 8359435: AArch64: add support for SB instruction to MacroAssembler::spin_wait In-Reply-To: <2fVSyaAsO9gfi9RpmqTBa0JXrhkrAGkNPyArpUB-gu8=.499c45d5-5015-4bfa-a997-d75b860504b0@github.com> References: <2fVSyaAsO9gfi9RpmqTBa0JXrhkrAGkNPyArpUB-gu8=.499c45d5-5015-4bfa-a997-d75b860504b0@github.com> Message-ID: On Thu, 26 Jun 2025 07:55:50 GMT, Andrew Haley wrote: >>> I'm still waiting for a use for this thing. >> >> The other project reports Evgeny linked in PR body look pretty convincing, as well as `ThreadOnSpinWait` microbenchmarks we have as well. This PR does not propose to _switch_ to `SB` for spin-waits, AFAICS. Just having `SB` as the spin-wait option does look fine to me. > >> > I'm still waiting for a use for this thing. >> >> The other project reports Evgeny linked in PR body look pretty convincing, as well as `ThreadOnSpinWait` microbenchmarks we have as well. This PR does not propose to _switch_ to `SB` for spin-waits, AFAICS. Just having `SB` as the spin-wait option does look fine to me. > > I understand, but I see above here, "I am flexible to have it either way." Are you changing your mind? There is no reason not to add the instruction with the patch that uses it, as has been the practice for this port. > > > Looks okay to me. @theRealAph should also take a look. > > > > > > I'm still waiting for a use for this thing. Then we'll be able to see it in action. 
> > Do you mean we need real-life workloads relying on `j.l.Thread.onSpinWait` to show improvements? There is no need for this PR to be pushed until the onSpinWait change: there is literally no other use for it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25801#issuecomment-3007570142 From epeter at openjdk.org Thu Jun 26 08:07:54 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 26 Jun 2025 08:07:54 GMT Subject: RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check Message-ID: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> This is a big patch, but about 3.5k lines are tests. And a large part of the VM changes is comments / proofs. I am adding a dynamic (runtime) aliasing check to the auto-vectorizer (SuperWord). We use the infrastructure from https://github.com/openjdk/jdk/pull/22016: - Use the auto-vectorization `predicate` when available: we speculate that there is no aliasing, else we trap and re-compile without the predicate. - If the predicate is not available, we use `multiversioning`, i.e. we have a `fast_loop` where there is no aliasing, and hence vectorization. And a `slow_loop` if the check fails, with no vectorization. -------------------------- **Where to start reviewing** - `src/hotspot/share/opto/mempointer.hpp`: - Read the class comment for `MemPointerRawSummand`. - Familiarize yourself with the `MemPointer Linearity Corrolary`. We need it for the proofs of the aliasing runtime checks. - `src/hotspot/share/opto/vectorization.cpp`: - Read the explanations and proofs above `VPointer::can_make_speculative_aliasing_check_with`. It explains how the aliasing runtime check works. - `src/hotspot/share/opto/vtransform.hpp`: - Understand the difference between weak and strong edges. If you need to see some examples, then look at the tests: - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasing.java`: simple array cases. 
IR rules that check for vectors and in some cases whether we used multiversioning. - `test/micro/org/openjdk/bench/vm/compiler/VectorAliasing.java`: the micro-benchmarks I show below. Simple array cases. - `test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegmentAliasing.java`: a bit advanced, but similar cases. - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.java`: very large and rather complex. Generates random loops, some with and some without aliasing at runtime. IR verification, but mostly currently only for array cases, MemorySegment cases have some issues (see comments). -------------------------- **Details** Most fundamentally: - I had to refactor / extend `MemPointer` so that we have access to `MemPointerRawSummand`s. - These raw summands allow us to reconstruct the `VPointer` at any `iv` value with `VPointer::make_pointer_expression(Node* iv_value)`. - With the raw summands, a pointer may look like this: `p = base + ConvI2L(x + 2) + ConvI2L(y + 2)` - With "regular" summands, this gets simplified to `p = base + 4L + ConvI2L(x) + ConvI2L(y)` - For aliasing analysis (adjacency and overlap), the "regular" summands are sufficient. But for reconstructing the pointer expression, this could lead to overflow issues. - We need to evaluate the pointer expression at `init` to create the check in `VPointer::make_speculative_aliasing_check_with`. - I wrote up a `MemPointer Linearity Corrolary` that I need for the guarantees in the runtime checks. I also had to enhance the `VLoopDependencyGraph`: - We define `weak` and `strong` memory edges: `strong` are edges that cannot be removed. `weak` are edges that can be removed, and the operations can be reordered, but if reordered we need a runtime check. - `MemPointer::always_overlaps_with`: allows us to check if a memory edge is always strict, because it always aliases (= overlaps). Further: - I added flags `UseAutoVectorizationPredicate` and `UseAutoVectorizationSpeculativeAliasingChecks`.
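To make the idea of a runtime aliasing check with a `fast_loop` and a `slow_loop` concrete, here is a source-level analogy in Java. It is only a sketch under stated assumptions: C2 performs the check on its IR rather than in Java source, and the class, the `rangesDisjoint` helper, and the returned labels are hypothetical names chosen for illustration.

```java
final class AliasingCheckSketch {
    // Hypothetical check: true if the two accessed ranges can never overlap.
    static boolean rangesDisjoint(int[] a, int aOff, int[] b, int bOff, int n) {
        if (a != b) {
            return true; // distinct arrays never alias
        }
        // Same array: compare the index ranges, using long to avoid int overflow.
        return (long) aOff + n <= bOff || (long) bOff + n <= aOff;
    }

    // Computes a[aOff + i] = b[bOff + i] + 1 and reports which loop version ran.
    static String addOne(int[] a, int aOff, int[] b, int bOff, int n) {
        if (rangesDisjoint(a, aOff, b, bOff, n)) {
            // Analogue of the fast_loop: iterations are independent, so the
            // loads and stores could be reordered (and hence vectorized).
            for (int i = 0; i < n; i++) {
                a[aOff + i] = b[bOff + i] + 1;
            }
            return "fast_loop";
        }
        // Analogue of the slow_loop: possible overlap, keep strict iteration order.
        for (int i = 0; i < n; i++) {
            a[aOff + i] = b[bOff + i] + 1;
        }
        return "slow_loop";
    }

    public static void main(String[] args) {
        int[] x = new int[8];
        System.out.println(addOne(x, 1, x, 0, 4));                   // overlapping ranges
        System.out.println(addOne(new int[4], 0, new int[4], 0, 4)); // distinct arrays
    }
}
```

In the overlapping call each iteration reads the value written by the previous one, which is exactly the dependency that forbids reordering; the disjoint call has no such dependency, so the check lets it take the reorderable version.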
--------------------------------------- **Benchmark** ![image](https://github.com/user-attachments/assets/1a97d9b0-f6c2-46d4-b896-7390864dbfc3) Labels / Columns: - `no_check` = `-XX:-UseAutoVectorizationSpeculativeAliasingChecks` - like before this patch. - `normal` = `-XX:+UseSuperWord` - `no_slow_opt` = `-XX:-LoopMultiversioningOptimizeSlowLoop` - to prove that we need to optimize the slow loop, for the case where the dynamic check fails. - `no_sw` = `-XX:-UseSuperWord` - No vectorization, also has different unrolling. - `not_profitable` = `-XX:AutoVectorizationOverrideProfitability=0` - No vectorization, but keep unrolling the same. Can lead to severe performance regressions especially for byte cases. We have seen similar issues before, e.g. https://github.com/openjdk/jdk/pull/25387 for `byte`, `char` and `short` cases in reduction loops. Discussion: - `?_sameIndex_alias` and `?_sameIndex_noalias`: Since we have `sameIndex`, we already can prove that we can vectorize without checks. We already vectorized these before this patch. - `?_differentIndex_noalias`, `?_half`, `?_partial_overlap`: only vectorizes with dynamic aliasing check. - `?_differentIndex_alias`: cannot use vectorized loop. We now use the `slow_loop`, and if it is not optimized (unrolled), we get a heavy slowdown (`0.35`). **Regular performance testing**: no significant change, except for some possible improvements in `Crypto-SecureRandomBench_nextBytes`. A quick investigation showed that it had at least one loop where the load and the store have different invariants, which requires aliasing analysis runtime checks to prove that the load and store do not alias.
![image](https://github.com/user-attachments/assets/ee9245d4-1e1e-421d-a97a-2b7d5738e7e2) ------------------------------------------ **Follow-up Work** ResourceMark could not be added in `VTransform::apply_speculative_aliasing_runtime_checks`; it would require that `_idom` and `_dom_depth` in `PhaseIdealLoop::set_idom` are not ResourceArea allocated. Related issue: - [JDK-8337015](https://bugs.openjdk.org/browse/JDK-8337015) Revisit resource arena allocations in C2 ------------- Commit messages: - fix include order - manual merge with master - rm multiversioning testing - more comments cleanu - comment cleanup - more descriptions / proof - improve comments - fix test and code - small comment addition - small fix and more documentation - ... and 179 more: https://git.openjdk.org/jdk/compare/fe7ec312...c260df26 Changes: https://git.openjdk.org/jdk/pull/24278/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24278&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8324751 Stats: 5306 lines in 24 files changed: 5063 ins; 16 del; 227 mod Patch: https://git.openjdk.org/jdk/pull/24278.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24278/head:pull/24278 PR: https://git.openjdk.org/jdk/pull/24278 From epeter at openjdk.org Thu Jun 26 08:09:42 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 26 Jun 2025 08:09:42 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v23] In-Reply-To: References: Message-ID: On Mon, 23 Jun 2025 14:31:24 GMT, Daniel Lundén wrote: >> If a method has a large number of parameters, we currently bail out from C2 compilation. >> >> ### Changeset >> >> Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots.
This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes. >> >> Changes: >> - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed. >> - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable. >> - Remove all `can_represent` checks and bailouts. >> - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`. >> - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450) >> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. >> - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement. >> - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). 
I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, no... > > Daniel Lund?n has updated the pull request incrementally with one additional commit since the last revision: > > Add clarifying comments at definitions of register mask sizes > I guess you mean that we do this in a separate changeset that we integrate before this PR? That sounds good to me. Yes, exactly :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/20404#issuecomment-3007584256 From aph at openjdk.org Thu Jun 26 08:16:28 2025 From: aph at openjdk.org (Andrew Haley) Date: Thu, 26 Jun 2025 08:16:28 GMT Subject: RFR: 8359435: AArch64: add support for SB instruction to MacroAssembler::spin_wait [v4] In-Reply-To: <1tCoafZVhWpgCaR-CRDRhRMuJlIAMbujEsUv7dbptQ8=.1b987ec6-942e-4992-8485-f1f23a0159ab@github.com> References: <1tCoafZVhWpgCaR-CRDRhRMuJlIAMbujEsUv7dbptQ8=.1b987ec6-942e-4992-8485-f1f23a0159ab@github.com> Message-ID: On Wed, 25 Jun 2025 13:56:09 GMT, Evgeny Astigeevich wrote: >> There is data SB-based spin pauses are less disruptive then ISB-based one on them, so performance is better: >> - https://github.com/mysql/mysql-server/pull/611 >> - https://github.com/facebook/folly/pull/2390 >> >> There are discussions regarding using it for spin pauses: >> - https://github.com/gperftools/gperftools/pull/1594 >> - https://github.com/haproxy/haproxy/pull/2974 >> >> Instruction support: https://developer.arm.com/documentation/109697/2025_03/Feature-descriptions/The-Armv8-5-architecture-extension >> >> CPUs supporting it: >> - Apple M2+ >> - Neoverse-N2 >> - Neoverse-V2 >> >> Tests: >> - Gtests passed. >> - `test/hotspot/jtreg/compiler/onSpinWait/TestOnSpinWaitAArch64.java` passed. >> - `test/hotspot/jtreg/compiler/onSpinWait/TestOnSpinWaitNoneAArch64.java` passed. 
>>
>> Micro-benchmarks (Graviton 4, c8g.16xlarge (64 CPU), Neoverse-V2):
>>
>>
>> Benchmark             Mode  Cnt   Score   Error  Units  Diff
>> ThreadOnSpinWait.ISB  avgt   15  11.875 ± 0.129  ns/op
>> ThreadOnSpinWait.SB   avgt   15   6.930 ± 0.054  ns/op  -42%
>>
>> Benchmark                          (maxNum)  (threadCount)  Mode  Cnt    Score    Error  Units  Diff
>> ThreadOnSpinWaitSharedCounter.ISB   1000000              4  avgt   15   49.874 ± 10.160  ms/op
>> ThreadOnSpinWaitSharedCounter.SB    1000000              4  avgt   15   26.948 ±  4.036  ms/op  -46%
>> ThreadOnSpinWaitSharedCounter.ISB   1000000              8  avgt   15   65.173 ±  7.228  ms/op
>> ThreadOnSpinWaitSharedCounter.SB    1000000              8  avgt   15   44.476 ±  1.292  ms/op  -31%
>> ThreadOnSpinWaitSharedCounter.ISB   1000000             16  avgt   15  177.805 ± 44.925  ms/op
>> ThreadOnSpinWaitSharedCounter.SB    1000000             16  avgt   15   67.267 ± 13.814  ms/op  -62%
>> ThreadOnSpinWaitSharedCounter.ISB   1000000             32  avgt   15  265.149 ±  5.353  ms/op
>> ThreadOnSpinWaitSharedCounter.SB    1000000             32  avgt   15   42.297 ±  3.436  ms/op  -84%
>> ThreadOnSpinWaitSharedCounter.ISB   1000000             48  avgt   15  125.231 ±  9.272  ms/op
>> ThreadOnSpinWaitSharedCounter.SB    1000000             48  avgt   15   83.504 ±  8.561  ms/op  -33%
>> ThreadOnSpinWaitSharedCounter.ISB   1000000             64  avgt   15  124.505 ±  7.543  ms/op
>> ThreadOnSpinWaitSharedCounter.SB    1000000             64  avgt   15   86.588 ±  9.519  ms/op  -30%
>
> Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision:
>
>   Restore check of non-zero exit value; make spinWaitInstCount always provided in test cmd options

Very sorry, my mistake. This is fine. :-)

-------------

Marked as reviewed by aph (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/25801#pullrequestreview-2961173101 From aph at openjdk.org Thu Jun 26 08:27:37 2025 From: aph at openjdk.org (Andrew Haley) Date: Thu, 26 Jun 2025 08:27:37 GMT Subject: RFR: 8359435: AArch64: add support for SB instruction to MacroAssembler::spin_wait In-Reply-To: References: Message-ID: On Wed, 25 Jun 2025 16:52:32 GMT, Aleksey Shipilev wrote: > The other project reports Evgeny linked in PR body look pretty convincing Yes. Again, my apologies. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25801#issuecomment-3007634413 From bkilambi at openjdk.org Thu Jun 26 08:27:53 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Thu, 26 Jun 2025 08:27:53 GMT Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v6] In-Reply-To: References: Message-ID: > This patch adds aarch64 backend support for SelectFromTwoVector operation which was recently introduced in VectorAPI. > > It implements this operation using a two table vector lookup instruction - "tbl" which is available only in Neon and SVE2. > > For 128-bit vector length : Neon tbl instruction is generated if UseSVE < 2 and SVE2 "tbl" instruction is generated if UseSVE == 2. > > For > 128-bit vector length : Currently there are no machines which have vector length > 128-bit and support SVE2. For all those machines with vector length > 128-bit and UseSVE < 2, this operation is not supported. The inline expander for this operation would fail and lowered IR will be generated which is a mix of two rearrange and one blend operation. > > This patch also adds a boolean "need_load_shuffle" in the inline expander for this operation to test if the platform requires VectorLoadShuffle operation to be generated. Without this, the lowering IR was not being generated on aarch64 and the performance was quite poor. 
>
> Performance numbers with this patch on a 128-bit, SVE2 supporting machine is shown below -
>
>
> Benchmark                                    (size)  Mode   Cnt   Gain
> SelectFromBenchmark.selectFromByteVector       1024  thrpt    9   1.43
> SelectFromBenchmark.selectFromByteVector       2048  thrpt    9   1.48
> SelectFromBenchmark.selectFromDoubleVector     1024  thrpt    9  68.55
> SelectFromBenchmark.selectFromDoubleVector     2048  thrpt    9  72.07
> SelectFromBenchmark.selectFromFloatVector      1024  thrpt    9   1.69
> SelectFromBenchmark.selectFromFloatVector      2048  thrpt    9   1.52
> SelectFromBenchmark.selectFromIntVector        1024  thrpt    9   1.50
> SelectFromBenchmark.selectFromIntVector        2048  thrpt    9   1.52
> SelectFromBenchmark.selectFromLongVector       1024  thrpt    9  85.38
> SelectFromBenchmark.selectFromLongVector       2048  thrpt    9  80.93
> SelectFromBenchmark.selectFromShortVector      1024  thrpt    9   1.48
> SelectFromBenchmark.selectFromShortVector      2048  thrpt    9   1.49
>
>
> Gain column refers to the ratio of thrpt between this patch and the master branch after applying changes in the inline expander.
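
The two-table lookup that `tbl` performs for SelectFromTwoVector can be modelled in scalar code. The sketch below is only a reference model under stated assumptions: `select_from_two_vectors` is a hypothetical helper name, lanes are held in `std::array`, and every index is assumed to already lie in `[0, 2N)`. It does not reproduce the Neon or SVE2 code generated by the patch.

```c++
#include <array>
#include <cstddef>

// Hypothetical scalar reference model for illustration; not HotSpot code.
// Lane i of the result is taken from src1 when index[i] < N,
// and from src2 (at position index[i] - N) otherwise.
template <typename T, std::size_t N>
std::array<T, N> select_from_two_vectors(const std::array<std::size_t, N>& index,
                                         const std::array<T, N>& src1,
                                         const std::array<T, N>& src2) {
  std::array<T, N> dst{};
  for (std::size_t i = 0; i < N; ++i) {
    const std::size_t k = index[i];  // assumption: 0 <= k < 2 * N
    dst[i] = (k < N) ? src1[k] : src2[k - N];
  }
  return dst;
}
```

Treating the two sources as one concatenated table of `2N` entries is what lets a single lookup replace the two rearranges and one blend of the lowered IR mentioned in the description.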
Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: Addressed review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23570/files - new: https://git.openjdk.org/jdk/pull/23570/files/956518ec..234d40c7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23570&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23570&range=04-05 Stats: 327 lines in 6 files changed: 80 ins; 141 del; 106 mod Patch: https://git.openjdk.org/jdk/pull/23570.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23570/head:pull/23570 PR: https://git.openjdk.org/jdk/pull/23570 From bkilambi at openjdk.org Thu Jun 26 08:27:53 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Thu, 26 Jun 2025 08:27:53 GMT Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v4] In-Reply-To: <8a0CrcYjL8j1IWWhnPjKzgZWrzhOsjOLL1NcOCSEoWE=.2b574e39-7783-45e6-b7c5-0c95aa3ec75c@github.com> References: <8a0CrcYjL8j1IWWhnPjKzgZWrzhOsjOLL1NcOCSEoWE=.2b574e39-7783-45e6-b7c5-0c95aa3ec75c@github.com> Message-ID: On Wed, 25 Jun 2025 10:02:40 GMT, Bhavana Kilambi wrote: >> "as" -> "because". >> Consider the difference between "as I was walking home, the Perseid meteor shower produced a spectacular display" and "because I was walking home, the Perseid meteor shower produced a spectacular display". > > Got it. thanks! Done. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2168496211 From bkilambi at openjdk.org Thu Jun 26 08:27:53 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Thu, 26 Jun 2025 08:27:53 GMT Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v5] In-Reply-To: References: Message-ID: On Wed, 25 Jun 2025 09:53:44 GMT, Bhavana Kilambi wrote: >> src/hotspot/cpu/aarch64/aarch64_vector_ad.m4 line 5288: >> >>> 5286: %} >>> 5287: ins_pipe(pipe_slow); >>> 5288: %} >> >> You should use the macro processor here, not cut and paste. > > Oh right! apologies should have thought of it. I'll update this. Thanks! Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2168494989 From bkilambi at openjdk.org Thu Jun 26 08:27:54 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Thu, 26 Jun 2025 08:27:54 GMT Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v4] In-Reply-To: References: Message-ID: On Wed, 25 Jun 2025 08:49:26 GMT, Bhavana Kilambi wrote: >> src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2887: >> >>> 2885: // Generate Neon tbl when UseSVE == 0 or UseSVE == 1 with vector length of 16B >>> 2886: >>> 2887: bool useNeon = (UseSVE == 0) || (UseSVE == 1 && isQ); >> >> The function name is `select_from_two_vectors_HS_Neon`, but we still have to check whether to use NEON inside it. It looks confusing. Is it better to split the special `!isQ && UseSVE >=1` cases and combine it to below `select_from_two_vectors` method? >> >> Combining `!isQ` to the sve rule may also make the rule's predicate simpler? > > Thanks, I can do that. Earlier I was trying to keep in line with VectorRearrange which had similar name and if I remember correctly had both Neon and SVE implementation in the one named as "Neon" (before you optimized it recently) and I continued keeping the same format. I agree, this can be simplified further. 
Done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2168495661 From aph at openjdk.org Thu Jun 26 08:30:33 2025 From: aph at openjdk.org (Andrew Haley) Date: Thu, 26 Jun 2025 08:30:33 GMT Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v6] In-Reply-To: References: Message-ID: On Thu, 26 Jun 2025 08:27:53 GMT, Bhavana Kilambi wrote: >> This patch adds aarch64 backend support for SelectFromTwoVector operation which was recently introduced in VectorAPI. >> >> It implements this operation using a two table vector lookup instruction - "tbl" which is available only in Neon and SVE2. >> >> For 128-bit vector length : Neon tbl instruction is generated if UseSVE < 2 and SVE2 "tbl" instruction is generated if UseSVE == 2. >> >> For > 128-bit vector length : Currently there are no machines which have vector length > 128-bit and support SVE2. For all those machines with vector length > 128-bit and UseSVE < 2, this operation is not supported. The inline expander for this operation would fail and lowered IR will be generated which is a mix of two rearrange and one blend operation. >> >> This patch also adds a boolean "need_load_shuffle" in the inline expander for this operation to test if the platform requires VectorLoadShuffle operation to be generated. Without this, the lowering IR was not being generated on aarch64 and the performance was quite poor. 
>>
>> Performance numbers with this patch on a 128-bit, SVE2 supporting machine is shown below -
>>
>>
>> Benchmark                                    (size)  Mode   Cnt   Gain
>> SelectFromBenchmark.selectFromByteVector       1024  thrpt    9   1.43
>> SelectFromBenchmark.selectFromByteVector       2048  thrpt    9   1.48
>> SelectFromBenchmark.selectFromDoubleVector     1024  thrpt    9  68.55
>> SelectFromBenchmark.selectFromDoubleVector     2048  thrpt    9  72.07
>> SelectFromBenchmark.selectFromFloatVector      1024  thrpt    9   1.69
>> SelectFromBenchmark.selectFromFloatVector      2048  thrpt    9   1.52
>> SelectFromBenchmark.selectFromIntVector        1024  thrpt    9   1.50
>> SelectFromBenchmark.selectFromIntVector        2048  thrpt    9   1.52
>> SelectFromBenchmark.selectFromLongVector       1024  thrpt    9  85.38
>> SelectFromBenchmark.selectFromLongVector       2048  thrpt    9  80.93
>> SelectFromBenchmark.selectFromShortVector      1024  thrpt    9   1.48
>> SelectFromBenchmark.selectFromShortVector      2048  thrpt    9   1.49
>>
>>
>> Gain column refers to the ratio of thrpt between this patch and the master branch after applying changes in the inline expander.
>
> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision:
>
>   Addressed review comments

src/hotspot/cpu/aarch64/aarch64_vector_ad.m4 line 5188:

> 5186: dnl SELECT_FROM_TWO_VECTORS_SVE(rule_number, first_reg, second_reg)
> 5187: define(`SELECT_FROM_TWO_VECTORS_SVE', `
> 5188: instruct vselect_from_two_vectors_SVE_$1(vReg dst, vReg_V$2 src1, vReg_V$3 src2,

Suggestion:

instruct vselect_from_two_vectors_SVE_$2_$3(vReg dst, vReg_V$2 src1, vReg_V$3 src2,

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2168501379

From jbhateja at openjdk.org Thu Jun 26 08:47:12 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Thu, 26 Jun 2025 08:47:12 GMT
Subject: RFR: 8360116: Add support for AVX10 floating point minmax instruction [v3]
In-Reply-To:
References:
Message-ID:

> Intel® AVX10 ISA [1] extensions added new floating point MIN/MAX instructions which comply with definitions in IEEE-754-2019 standard section 9.6 and can directly emulate Math.min/max semantics without the need for any special handling for NaN, +0.0 or -0.0 detection.
>
> **The following pseudo-code describes the existing algorithm for min/max[FD]:**
>
> Move the non-negative value to the second operand; this will ensure that we correctly handle 0.0 and -0.0 values, if values being compared are both 0.0s (of either sign), the value in the second operand (source operand) is returned. Existing MINPS and MAXPS semantics only check for NaN as the second operand; hence, we need special handling to check for NaN at the first operand.
>
> btmp = (b < +0.0) ? a : b
> atmp = (b < +0.0) ? b : a
> Tmp = Max_Float(atmp , btmp)
> Res = (atmp == NaN) ? atmp : Tmp
>
> For min[FD] we need a small tweak in the above algorithm, i.e., move the non-negative value to the first operand, this will ensure that we correctly select -0.0 if both the operands being compared are 0.0 or -0.0.
>
> btmp = (b < +0.0) ? b : a
> atmp = (b < +0.0) ? a : b
> Tmp = Min_Float(atmp , btmp)
> Res = (atmp == NaN) ? atmp : Tmp
>
> Thus, we need additional special handling for NaNs and +/-0.0 to compute floating-point min/max values to comply with the semantics of Math.max/min APIs using existing MINPS / MAXPS instructions. AVX10.2 added a new instruction, VMINMAX[SH,SS,SD]/[PH,PS,PD], which comprehensively handles special cases, thereby eliminating the need for special handling.
>
> Patch emits new instructions for reduction and non-reduction operations for single, double, and Float16 type.
>
> Kindly review and share your feedback.
>
> Best Regards,
> Jatin
>
> [1] https://www.intel.com/content/www/us/en/content-details/856721/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html?wapkw=AVX10

Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision:

 - Review comments resolutions
 - Merge branch 'master' of https://github.com/openjdk/jdk into JDK-8360116
 - Update comments
 - Extending the patch to cover reduction operations
 - 8360116: Add support for AVX10 floating point minmax instruction

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/25914/files
  - new: https://git.openjdk.org/jdk/pull/25914/files/b6e55157..382c9b9e

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=25914&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25914&range=01-02

  Stats: 6650 lines in 365 files changed: 3468 ins; 1485 del; 1697 mod
  Patch: https://git.openjdk.org/jdk/pull/25914.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/25914/head:pull/25914

PR: https://git.openjdk.org/jdk/pull/25914

From jbhateja at openjdk.org Thu Jun 26 08:47:13 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Thu, 26 Jun 2025 08:47:13 GMT
Subject: RFR: 8360116: Add support for AVX10 floating point
minmax instruction [v2] In-Reply-To: References: Message-ID: <-BcDsdCnWIW95ESaZ5UIRIFDVOqEy7vDTW4e5xWfTe8=.42c9fe64-2cb3-46d7-99a2-25ae08239f17@github.com> On Wed, 25 Jun 2025 15:31:46 GMT, Manuel H?ssig wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Update comments > > src/hotspot/cpu/x86/assembler_x86.hpp line 2752: > >> 2750: void eminmaxss(XMMRegister dst, XMMRegister nds, XMMRegister src, int imm8); >> 2751: void eminmaxsd(XMMRegister dst, XMMRegister nds, XMMRegister src, int imm8); >> 2752: void evminmaxph(XMMRegister dst, KRegister mask, XMMRegister nds, XMMRegister src, bool merge, int imm8, int vector_len); > > Is there a reason `evminmaxph` does not have a version where `src` has type `Address`? Currently, we do not have a matcher pattern to consume it, as the MIN/MAX sequence was anyway, a bulky one. I have added a new pattern for memory operand flavor of the pattern specifically for AVX-10, along with this patch. Patch has been regressed over the following tests using Intel SDE https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html (Version 9.53). - test/jdk/jdk/incubator/vector/Double*VectorTests:: (min/max all variants including reduction) - test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java - test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java e.g. command line /home/jatinbha/softwares/sde-external-9.53.0-2025-03-16-lin/sde64 -future -ptr_raise -icount -- java > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1241: > >> 1239: } >> 1240: >> 1241: void C2_MacroAssembler::vminmax_fp(int opc, BasicType elem_bt, XMMRegister dst, KRegister mask, > > Line 1122 mentions the differences between `vminps/vmaxps` and Java semantics. Perhaps a mention of the new instructions introduced in this PR might help people who are confused about the fact that `vminmax_fp` is overloaded. 
Details on insturction semantics can be found in section 11.2 of AVX10 manual https://www.intel.com/content/www/us/en/content-details/856721/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html?wapkw=AVX10 > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1246: > >> 1244: opc == Op_MaxV || opc == Op_MaxReductionV, "sanity"); >> 1245: if (elem_bt == T_FLOAT) { >> 1246: evminmaxps(dst, mask, src1, src2, true, opc == Op_MinV || opc == Op_MinReductionV ? 0x4 : 0x5, vlen_enc); > > Perhaps `0x4` and `0x5` should be factored into named constants since they are used in multiple places and it would also help readability if one does not have the documentation handy when reading the code. Hi @mhaessig , Command bits are in accordance with Tables 11.1 and 11.2 of section 11.2. First 2 bits [1:0] signify the operation kind, 00 for min and 01 for max. Next two bits [3:2] signify the sign selection logic and 4th bit 0 for both min/max, with this command word we can emulate the semantics of Math.max/min using a single AVX10 instruciton. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25914#discussion_r2168533731 PR Review Comment: https://git.openjdk.org/jdk/pull/25914#discussion_r2168533872 PR Review Comment: https://git.openjdk.org/jdk/pull/25914#discussion_r2168533554 From jbhateja at openjdk.org Thu Jun 26 08:47:31 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 26 Jun 2025 08:47:31 GMT Subject: RFR: 8352635: Improve inferencing of Float16 operations with constant inputs [v8] In-Reply-To: References: <44nVQBYgzCOB2mAB9xtAPvkUcOMJOITA2VjMdDFgm1g=.48266693-48bf-41db-8871-a7dcafe93509@github.com> <60kkIRL2XznEXyYukVXVOoeixm2iGhoOxAbKJi5X0cY=.0268090e-a0d3-45fb-93f4-94caaf9b8497@github.com> <2F9hnA72JKqq9hJchevuSQ8XHveZ51F6tJnb7IcNw30=.1da69bab-23be-473b-92c1-1786916364b9@github.com> Message-ID: On Wed, 25 Jun 2025 06:57:00 GMT, Emanuel Peter wrote: >>> @jatin-bhateja The code looks good to me! 
I'll run some testing before approving. >>> >>> But someone else could already get started with a second review. >> >> Thanks @eme64 , let us know once you are through with testing. > > @jatin-bhateja The last testing I did passed. I'm now waiting for @sviswa7 to give the approval, then run testing again on our side. I don't have any machine that supports float16, so we are relying on you to run sufficient testing. Hi @eme64, let us know if it's good to land now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24179#issuecomment-3007692276 From mhaessig at openjdk.org Thu Jun 26 09:09:36 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Thu, 26 Jun 2025 09:09:36 GMT Subject: RFR: 8360116: Add support for AVX10 floating point minmax instruction [v2] In-Reply-To: <-BcDsdCnWIW95ESaZ5UIRIFDVOqEy7vDTW4e5xWfTe8=.42c9fe64-2cb3-46d7-99a2-25ae08239f17@github.com> References: <-BcDsdCnWIW95ESaZ5UIRIFDVOqEy7vDTW4e5xWfTe8=.42c9fe64-2cb3-46d7-99a2-25ae08239f17@github.com> Message-ID: <8PZvUMxqZPm_wCGlcNEJbVTzueBZynS0mHLbpROOMDg=.855b2b72-2106-46b5-a4b7-79b0d77c1d6c@github.com> On Thu, 26 Jun 2025 08:43:27 GMT, Jatin Bhateja wrote: >> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1246: >> >>> 1244: opc == Op_MaxV || opc == Op_MaxReductionV, "sanity"); >>> 1245: if (elem_bt == T_FLOAT) { >>> 1246: evminmaxps(dst, mask, src1, src2, true, opc == Op_MinV || opc == Op_MinReductionV ? 0x4 : 0x5, vlen_enc); >> >> Perhaps `0x4` and `0x5` should be factored into named constants since they are used in multiple places and it would also help readability if one does not have the documentation handy when reading the code. > > Hi @mhaessig , > Command bits are in accordance with Tables 11.1 and 11.2 of section 11.2. First 2 bits [1:0] signify the operation kind, 00 for min and 01 for max. 
Next two bits [3:2] signify the sign selection logic and 4th bit 0 for both min/max, with this command word we can emulate the semantics of Math.max/min using a single AVX10 instruciton. I got that from the documentation you kindly linked in the description. My question was rather to define a constant like `AVX10_MINMAX_MAX_COMPARE_SIGN = 0x5` that can be used instead of the plain magic numbers. Because people looking at the code later will not have the "luxury" of being provided a link to the relevant documentation right when puzzling about what `0x4` means in this case. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25914#discussion_r2168582443 From eastigeevich at openjdk.org Thu Jun 26 10:13:35 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Thu, 26 Jun 2025 10:13:35 GMT Subject: RFR: 8359435: AArch64: add support for SB instruction to MacroAssembler::spin_wait [v4] In-Reply-To: <1tCoafZVhWpgCaR-CRDRhRMuJlIAMbujEsUv7dbptQ8=.1b987ec6-942e-4992-8485-f1f23a0159ab@github.com> References: <1tCoafZVhWpgCaR-CRDRhRMuJlIAMbujEsUv7dbptQ8=.1b987ec6-942e-4992-8485-f1f23a0159ab@github.com> Message-ID: <_-jCuGxjWZmB2432JGbz3taWJOOuT7RVK5HcywAB9B8=.c99ff87b-2f59-45cd-a85c-ce3cce3d120c@github.com> On Wed, 25 Jun 2025 13:56:09 GMT, Evgeny Astigeevich wrote: >> There is data SB-based spin pauses are less disruptive then ISB-based one on them, so performance is better: >> - https://github.com/mysql/mysql-server/pull/611 >> - https://github.com/facebook/folly/pull/2390 >> >> There are discussions regarding using it for spin pauses: >> - https://github.com/gperftools/gperftools/pull/1594 >> - https://github.com/haproxy/haproxy/pull/2974 >> >> Instruction support: https://developer.arm.com/documentation/109697/2025_03/Feature-descriptions/The-Armv8-5-architecture-extension >> >> CPUs supporting it: >> - Apple M2+ >> - Neoverse-N2 >> - Neoverse-V2 >> >> Tests: >> - Gtests passed. 
>> - `test/hotspot/jtreg/compiler/onSpinWait/TestOnSpinWaitAArch64.java` passed.
>> - `test/hotspot/jtreg/compiler/onSpinWait/TestOnSpinWaitNoneAArch64.java` passed.
>>
>> Micro-benchmarks (Graviton 4, c8g.16xlarge (64 CPU), Neoverse-V2):
>>
>>
>> Benchmark             Mode  Cnt   Score   Error  Units  Diff
>> ThreadOnSpinWait.ISB  avgt   15  11.875 ± 0.129  ns/op
>> ThreadOnSpinWait.SB   avgt   15   6.930 ± 0.054  ns/op  -42%
>>
>> Benchmark                          (maxNum)  (threadCount)  Mode  Cnt    Score    Error  Units  Diff
>> ThreadOnSpinWaitSharedCounter.ISB   1000000              4  avgt   15   49.874 ± 10.160  ms/op
>> ThreadOnSpinWaitSharedCounter.SB    1000000              4  avgt   15   26.948 ±  4.036  ms/op  -46%
>> ThreadOnSpinWaitSharedCounter.ISB   1000000              8  avgt   15   65.173 ±  7.228  ms/op
>> ThreadOnSpinWaitSharedCounter.SB    1000000              8  avgt   15   44.476 ±  1.292  ms/op  -31%
>> ThreadOnSpinWaitSharedCounter.ISB   1000000             16  avgt   15  177.805 ± 44.925  ms/op
>> ThreadOnSpinWaitSharedCounter.SB    1000000             16  avgt   15   67.267 ± 13.814  ms/op  -62%
>> ThreadOnSpinWaitSharedCounter.ISB   1000000             32  avgt   15  265.149 ±  5.353  ms/op
>> ThreadOnSpinWaitSharedCounter.SB    1000000             32  avgt   15   42.297 ±  3.436  ms/op  -84%
>> ThreadOnSpinWaitSharedCounter.ISB   1000000             48  avgt   15  125.231 ±  9.272  ms/op
>> ThreadOnSpinWaitSharedCounter.SB    1000000             48  avgt   15   83.504 ±  8.561  ms/op  -33%
>> ThreadOnSpinWaitSharedCounter.ISB   1000000             64  avgt   15  124.505 ±  7.543  ms/op
>> ThreadOnSpinWaitSharedCounter.SB    1000000             64  avgt   15   86.588 ±  9.519  ms/op  -30%
>
> Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision:
>
>   Restore check of non-zero exit value; make spinWaitInstCount always provided in test cmd options

Thank you, Andrew.
------------- PR Comment: https://git.openjdk.org/jdk/pull/25801#issuecomment-3007931567 From yzheng at openjdk.org Thu Jun 26 10:16:30 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Thu, 26 Jun 2025 10:16:30 GMT Subject: RFR: 8358179: Performance regression in Math.cbrt In-Reply-To: References: <45l5EvxoRINI1_Ep2_snJzKNMPo4-dPXADalLN1fq1Y=.9f697a35-ee7b-4e7a-9e5e-ff33911b3b21@github.com> Message-ID: <8ArJnnigTKgHyowtxD2CrmQAC0JKCfL_2k_Nv_5grl8=.c3f53098-2b80-4fa4-ae50-049911d974e5@github.com> On Wed, 25 Jun 2025 17:01:33 GMT, Mohamed Issa wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64_cbrt.cpp line 51: >> >>> 49: ATTRIBUTE_ALIGNED(16) static const juint _ABS_MASK[] = >>> 50: { >>> 51: 4294967295, 2147483647 >> >> How about `0xffffffff, 0x7fffffff` instead. > > I agree it's better to use hex format, but the other constants are in decimal. So, I'd like to have a separate PR that covers all of them rather than having a mismatch between this mask and all the others. We have done that in Graal port for the old constants https://github.com/oracle/graal/blob/master/compiler/src/jdk.graal.compiler/src/jdk/graal/compiler/lir/amd64/AMD64MathCbrtOp.java#L88-L218 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25962#discussion_r2168709047 From eastigeevich at openjdk.org Thu Jun 26 10:43:45 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Thu, 26 Jun 2025 10:43:45 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v32] In-Reply-To: References: Message-ID: On Wed, 25 Jun 2025 22:32:24 GMT, Chad Rakoczy wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). >> >> When an nmethod is replaced, a deep copy is performed. 
The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation.
>>
>> This change only slightly modifies existing code paths and therefore does not benefit much from existing tests. New tests were created to test the new functionality
>>
>> Additional Testing:
>> - [ ] Linux x64 fastdebug all
>> - [ ] Linux aarch64 fastdebug all
>> - [ ] ...
>
> Chad Rakoczy has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 90 commits:
>
>  - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final
>  - Update how call sites are fixed
>  - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final
>  - Fix pointer printing
>  - Use set_destination_mt_safe
>  - Print address as pointer
>  - Use new _metadata_size instead of _jvmci_data_size
>  - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final
>  - Only check branch distance for aarch64 and riscv
>  - Move far branch fix to fix_relocation_after_move
>  - ... and 80 more: https://git.openjdk.org/jdk/compare/f799cf18...70e4164e

src/hotspot/cpu/aarch64/nativeInst_aarch64.cpp line 90:

> 88:   // Patch the constant in the call's trampoline stub.
> 89:   address trampoline_stub_addr = get_trampoline();
> 90:   if (trampoline_stub_addr != nullptr && dest != trampoline_stub_addr) {

I think you will not need the checks if you rewrite the code as follows:

```c++
address addr_call = ...;
assert();
if (!Assembler::reachable_from_branch_at(addr_call, dest)) {
  address trampoline_stub_addr = get_trampoline();
  assert (trampoline_stub_addr != nullptr, "we need a trampoline");
  assert (! is_NativeCallTrampolineStub_at(dest), "chained trampolines");
  nativeCallTrampolineStub_at(trampoline_stub_addr)->set_destination(dest);
  dest = trampoline_stub_addr;
}
set_destination(dest);
ICache::invalidate_range(addr_call, instruction_size);
```

If `dest` is a trampoline in the current nmethod, it is always reachable. So you will not go into setting trampoline's target to itself. Also we will call `get_trampoline`, which involves `CodeCache::find_blob` and a traversal of relocations, only if we need a trampoline.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2168763051

From epeter at openjdk.org Thu Jun 26 11:43:32 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 26 Jun 2025 11:43:32 GMT
Subject: RFR: 8352635: Improve inferencing of Float16 operations with constant inputs [v14]
In-Reply-To:
References: <44nVQBYgzCOB2mAB9xtAPvkUcOMJOITA2VjMdDFgm1g=.48266693-48bf-41db-8871-a7dcafe93509@github.com>
Message-ID:

On Wed, 25 Jun 2025 04:44:11 GMT, Jatin Bhateja wrote:

>> This is a follow-up PR#22755 to improve Float16 operations inferencing.
>>
>> The existing scheme to detect Float16 operations for some operations is based on pattern matching which expects to receive inputs through ConvHF2F IR, this patch extends matching to accept constant floating point inputs within the Float16 value range.
>>
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
>   Review resolutions

All tests are passing, very nice.
Thanks @jatin-bhateja for all the work you put into this

-------------

Marked as reviewed by epeter (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/24179#pullrequestreview-2961783314 From epeter at openjdk.org Thu Jun 26 11:48:28 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 26 Jun 2025 11:48:28 GMT Subject: RFR: 8358179: Performance regression in Math.cbrt In-Reply-To: References: <45l5EvxoRINI1_Ep2_snJzKNMPo4-dPXADalLN1fq1Y=.9f697a35-ee7b-4e7a-9e5e-ff33911b3b21@github.com> Message-ID: On Wed, 25 Jun 2025 21:12:20 GMT, Mohamed Issa wrote: >> The changes described below are meant to resolve the performance regression introduced by the **x86_64 cbrt** double precision floating point scalar intrinsic in #24470. >> >> 1. Check for +0, -0, +INF, -INF, and NaN before any other input values. >> 2. If these special values are found, return immediately with minimal modifications to the result register. >> >> The commands to run all relevant micro-benchmarks are posted below. >> >> `make test TEST="micro:CbrtPerf.CbrtPerfRanges"` >> `make test TEST="micro:CbrtPerf.CbrtPerfSpecialValues"` >> >> The results of all tests posted below were captured with an [Intel? Xeon 8488C](https://www.intel.com/content/www/us/en/products/sku/231730/intel-xeon-platinum-8480c-processor-105m-cache-2-00-ghz/specifications.html) using [OpenJDK v26-b1](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B1) as the baseline version. The term _baseline1_ refers to runs with the intrinsic enabled and _baseline2_ refers to runs with the intrinsic disabled. >> >> Each result is the mean of 8 individual runs, and the input ranges used match those from the original Java implementation. Overall, the changes provide a significant uplift over _baseline1_ except for a mild regression in the (**2^(-1022) <= |x| < INF**) input range, which is expected due to the extra checks. When comparing against _baseline2_, the modified intrinsic significantly still outperforms for the inputs (**-INF < x < INF**) that require heavy compute. 
However, the special value inputs that trigger fast path returns still perform better with _baseline2_. >> >> | Input range(s) | Baseline1 (ops/ms) | Change (ops/ms) | Change vs baseline1 (%) | >> | :-------------------------------------: | :-------------------: | :------------------: | :--------------------------: | >> | [-2^(-1022), 2^(-1022)] | 18470 | 20847 | +12.87 | >> | (-INF, -2^(-1022)], [2^(-1022), INF) | 210538 | 198925 | -5.52 | >> | [0] | 344990 | 627561 | +81.91 | >> | [-0] | 291983 | 629941 | +115.75 | >> | [INF] | 382685 | 542211 | +4... > > @eme64 There seems to be an environment configuration issue (unrelated to code changes) on the Windows machine(s) this PR is landing on when running pre-submit tests. Could you run the internal tests on this PR to verify everything is ok? @missa-prime All tests passed in my internal testing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25962#issuecomment-3008198469 From epeter at openjdk.org Thu Jun 26 11:48:29 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 26 Jun 2025 11:48:29 GMT Subject: RFR: 8358179: Performance regression in Math.cbrt In-Reply-To: <45l5EvxoRINI1_Ep2_snJzKNMPo4-dPXADalLN1fq1Y=.9f697a35-ee7b-4e7a-9e5e-ff33911b3b21@github.com> References: <45l5EvxoRINI1_Ep2_snJzKNMPo4-dPXADalLN1fq1Y=.9f697a35-ee7b-4e7a-9e5e-ff33911b3b21@github.com> Message-ID: On Tue, 24 Jun 2025 22:33:56 GMT, Mohamed Issa wrote: > The changes described below are meant to resolve the performance regression introduced by the **x86_64 cbrt** double precision floating point scalar intrinsic in #24470. > > 1. Check for +0, -0, +INF, -INF, and NaN before any other input values. > 2. If these special values are found, return immediately with minimal modifications to the result register. > > The commands to run all relevant micro-benchmarks are posted below. 
> > `make test TEST="micro:CbrtPerf.CbrtPerfRanges"` > `make test TEST="micro:CbrtPerf.CbrtPerfSpecialValues"` > > The results of all tests posted below were captured with an [Intel® Xeon 8488C](https://www.intel.com/content/www/us/en/products/sku/231730/intel-xeon-platinum-8480c-processor-105m-cache-2-00-ghz/specifications.html) using [OpenJDK v26-b1](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B1) as the baseline version. The term _baseline1_ refers to runs with the intrinsic enabled and _baseline2_ refers to runs with the intrinsic disabled. > > Each result is the mean of 8 individual runs, and the input ranges used match those from the original Java implementation. Overall, the changes provide a significant uplift over _baseline1_ except for a mild regression in the (**2^(-1022) <= |x| < INF**) input range, which is expected due to the extra checks. When comparing against _baseline2_, the modified intrinsic still significantly outperforms for the inputs (**-INF < x < INF**) that require heavy compute. However, the special value inputs that trigger fast path returns still perform better with _baseline2_. > > | Input range(s) | Baseline1 (ops/ms) | Change (ops/ms) | Change vs baseline1 (%) | > | :-------------------------------------: | :-------------------: | :------------------: | :--------------------------: | > | [-2^(-1022), 2^(-1022)] | 18470 | 20847 | +12.87 | > | (-INF, -2^(-1022)], [2^(-1022), INF) | 210538 | 198925 | -5.52 | > | [0] | 344990 | 627561 | +81.91 | > | [-0] | 291983 | 629941 | +115.75 | > | [INF] | 382685 | 542211 | +41.68 | > | [-INF... I'll hold off on approval until someone else who is more knowledgeable has reviewed first. But feel free to ping me for a second review.
------------- PR Comment: https://git.openjdk.org/jdk/pull/25962#issuecomment-3008201171 From duke at openjdk.org Thu Jun 26 13:14:45 2025 From: duke at openjdk.org (duke) Date: Thu, 26 Jun 2025 13:14:45 GMT Subject: Withdrawn: 8324751: C2 SuperWord: Aliasing Analysis runtime check In-Reply-To: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> References: <2r_uZpbgYMypJFPIgI_t3NuTg1_S40mbeGrsvqi7IvE=.c4771662-0c55-4c02-96bb-99d5cfdb3697@github.com> Message-ID: On Thu, 27 Mar 2025 13:00:20 GMT, Emanuel Peter wrote: > This is a big patch, but about 3.5k lines are tests. And a large part of the VM changes is comments / proofs. > > I am adding a dynamic (runtime) aliasing check to the auto-vectorizer (SuperWord). We use the infrastructure from https://github.com/openjdk/jdk/pull/22016: > - Use the auto-vectorization `predicate` when available: we speculate that there is no aliasing, else we trap and re-compile without the predicate. > - If the predicate is not available, we use `multiversioning`, i.e. we have a `fast_loop` where there is no aliasing, and hence vectorization. And a `slow_loop` if the check fails, with no vectorization. > > -------------------------- > > **Where to start reviewing** > > - `src/hotspot/share/opto/mempointer.hpp`: > - Read the class comment for `MemPointerRawSummand`. > - Familiarize yourself with the `MemPointer Linearity Corrolary`. We need it for the proofs of the aliasing runtime checks. > > - `src/hotspot/share/opto/vectorization.cpp`: > - Read the explanations and proofs above `VPointer::can_make_speculative_aliasing_check_with`. It explains how the aliasing runtime check works. > > - `src/hotspot/share/opto/vtransform.hpp`: > - Understand the difference between weak and strong edges. > > If you need to see some examples, then look at the tests: > - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasing.java`: simple array cases. 
IR rules that check for vectors and in some cases whether we used multiversioning. > - `test/micro/org/openjdk/bench/vm/compiler/VectorAliasing.java`: the micro-benchmarks I show below. Simple array cases. > - `test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegmentAliasing.java`: a bit advanced, but similar cases. > - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.java`: very large and rather complex. Generates random loops, some with and some without aliasing at runtime. IR verification, but mostly currently only for array cases, MemorySegment cases have some issues (see comments). > -------------------------- > > **Details** > > Most fundamentally: > - I had to refactor / extend `MemPointer` so that we have access to `MemPointerRawSummand`s. > - These raw summands allow us to reconstruct the `VPointer` at any `iv` value with `VPointer::make_pointer_expression(Node* iv_value)`. > - With the raw summands, a pointer may look like this: `p = base + ConvI2L(x + 2) + ConvI2L(y + 2)` > - With "regular" summands, this gets simplified to `p = base + 4L + ConvI2L(x) + ConvI2L(y)` > - For aliasing analysis (adjacency and overlap), the "regu... This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/24278 From bkilambi at openjdk.org Thu Jun 26 14:20:50 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Thu, 26 Jun 2025 14:20:50 GMT Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v7] In-Reply-To: References: Message-ID: > This patch adds aarch64 backend support for SelectFromTwoVector operation which was recently introduced in VectorAPI. > > It implements this operation using a two table vector lookup instruction - "tbl" which is available only in Neon and SVE2. > > For 128-bit vector length : Neon tbl instruction is generated if UseSVE < 2 and SVE2 "tbl" instruction is generated if UseSVE == 2.
> > For > 128-bit vector length : Currently there are no machines which have vector length > 128-bit and support SVE2. For all those machines with vector length > 128-bit and UseSVE < 2, this operation is not supported. The inline expander for this operation would fail and lowered IR will be generated which is a mix of two rearrange and one blend operation. > > This patch also adds a boolean "need_load_shuffle" in the inline expander for this operation to test if the platform requires VectorLoadShuffle operation to be generated. Without this, the lowering IR was not being generated on aarch64 and the performance was quite poor. > > Performance numbers with this patch on a 128-bit, SVE2 supporting machine is shown below - > > > Benchmark (size) Mode Cnt Gain > SelectFromBenchmark.selectFromByteVector 1024 thrpt 9 1.43 > SelectFromBenchmark.selectFromByteVector 2048 thrpt 9 1.48 > SelectFromBenchmark.selectFromDoubleVector 1024 thrpt 9 68.55 > SelectFromBenchmark.selectFromDoubleVector 2048 thrpt 9 72.07 > SelectFromBenchmark.selectFromFloatVector 1024 thrpt 9 1.69 > SelectFromBenchmark.selectFromFloatVector 2048 thrpt 9 1.52 > SelectFromBenchmark.selectFromIntVector 1024 thrpt 9 1.50 > SelectFromBenchmark.selectFromIntVector 2048 thrpt 9 1.52 > SelectFromBenchmark.selectFromLongVector 1024 thrpt 9 85.38 > SelectFromBenchmark.selectFromLongVector 2048 thrpt 9 80.93 > SelectFromBenchmark.selectFromShortVector 1024 thrpt 9 1.48 > SelectFromBenchmark.selectFromShortVector 2048 thrpt 9 1.49 > > > Gain column refers to the ratio of thrpt between this patch and the master branch after applying changes in the inline expander. 
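For readers unfamiliar with the operation, here is a minimal scalar sketch of what a two-table select computes. It is only a reference model and not the code from the patch: the name `select_from_two` is hypothetical, and the wrapping of indices into `[0, 2N-1]` is an assumption made for this illustration (a hardware `tbl` may treat out-of-range indices differently).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar reference for a two-table lookup over N-lane vectors.
// Assumption: indices are wrapped into [0, 2N-1] before the lookup.
template <typename T, std::size_t N>
std::array<T, N> select_from_two(const std::array<uint32_t, N>& idx,
                                 const std::array<T, N>& v1,
                                 const std::array<T, N>& v2) {
  static_assert(N != 0 && (N & (N - 1)) == 0, "lane count must be a power of two");
  std::array<T, N> out{};
  for (std::size_t i = 0; i < N; ++i) {
    std::size_t k = idx[i] & (2 * N - 1);   // wrap into the combined 2N-entry table
    out[i] = (k < N) ? v1[k] : v2[k - N];   // low half reads v1, high half reads v2
  }
  return out;
}
```

The fallback mentioned above (two rearranges plus one blend) computes the same result in three steps, which is one way to read why a single table-lookup instruction can be faster.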
Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: Addressed review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23570/files - new: https://git.openjdk.org/jdk/pull/23570/files/234d40c7..f8978870 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23570&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23570&range=05-06 Stats: 42 lines in 2 files changed: 0 ins; 0 del; 42 mod Patch: https://git.openjdk.org/jdk/pull/23570.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23570/head:pull/23570 PR: https://git.openjdk.org/jdk/pull/23570 From bkilambi at openjdk.org Thu Jun 26 14:20:50 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Thu, 26 Jun 2025 14:20:50 GMT Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v6] In-Reply-To: References: Message-ID: On Thu, 26 Jun 2025 08:28:02 GMT, Andrew Haley wrote: >> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: >> >> Addressed review comments > > src/hotspot/cpu/aarch64/aarch64_vector_ad.m4 line 5188: > >> 5186: dnl SELECT_FROM_TWO_VECTORS_SVE(rule_number, first_reg, second_reg) >> 5187: define(`SELECT_FROM_TWO_VECTORS_SVE', ` >> 5188: instruct vselect_from_two_vectors_SVE_$1(vReg dst, vReg_V$2 src1, vReg_V$3 src2, > > Suggestion: > > instruct vselect_from_two_vectors_SVE_$2_$3(vReg dst, vReg_V$2 src1, vReg_V$3 src2, Thanks. Updated my patch. Please review. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2169188652 From duke at openjdk.org Thu Jun 26 14:23:46 2025 From: duke at openjdk.org (Samuel Chee) Date: Thu, 26 Jun 2025 14:23:46 GMT Subject: RFR: 8360654: AArch64: Remove redundant dmb from C1 compareAndSet Message-ID: <3ZAHrkER2pU6L346Y40fDkndJhkFGjBrQQ4xX7cx80w=.527c87b5-f2ec-43eb-be68-d0b802c76940@github.com> AtomicLong.compareAndSet has the following assembly dump snippet which gets emitted from the intermediary LIRGenerator::atomic_cmpxchg: ;; cmpxchg { 0x0000e708d144cf60: mov x8, x2 0x0000e708d144cf64: casal x8, x3, [x0] 0x0000e708d144cf68: cmp x8, x2 ;; 0x1F1F1F1F1F1F1F1F 0x0000e708d144cf6c: mov x8, #0x1f1f1f1f1f1f1f1f ;; } cmpxchg 0x0000e708d144cf70: cset x8, ne // ne = any 0x0000e708d144cf74: dmb ish According to the Oracle Java Specification, AtomicLong.compareAndSet [1] has the same memory effects as specified by VarHandle.compareAndSet which has the following effects: [2] > Atomically sets the value of a variable to the > newValue with the memory semantics of setVolatile if > the variable's current value, referred to as the witness > value, == the expectedValue, as accessed with the memory > semantics of getVolatile. Hence the release on the store due to setVolatile only occurs if the compare is successful. Since casal already satisfies these requirements, the dmb does not need to occur to ensure memory ordering in case the compare fails and a release does not happen. Hence we remove the dmb from both casl and casw (same logic applies to the non-long variant). This is also reflected by C2 not having a dmb for the same respective method. [1] https://docs.oracle.com/en/java/javase/24/docs/api/java.base/java/util/concurrent/atomic/AtomicLong.html#compareAndSet(long,long) [2] https://docs.oracle.com/en/java/javase/24/docs/api/java.base/java/lang/invoke/VarHandle.html#compareAndSet(java.lang.Object...)
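The point that the store (and hence its release) only happens when the compare succeeds can be illustrated with a small portable C++ sketch. This is not the C1 code: `compare_and_set` is a hypothetical helper, and using `std::atomic` with `memory_order_seq_cst` as a stand-in for the Java volatile semantics is an assumption of this example.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for AtomicLong.compareAndSet. Sequentially consistent
// ordering is an assumed analogue of getVolatile/setVolatile, not a claim about
// how HotSpot maps the operation.
bool compare_and_set(std::atomic<int64_t>& var, int64_t expected, int64_t new_value) {
  // compare_exchange_strong writes the witness value back into 'expected' on
  // failure; 'expected' is a by-value parameter, so the caller's copy is untouched.
  // On failure nothing is stored to 'var', so there is no store to order.
  return var.compare_exchange_strong(expected, new_value,
                                     std::memory_order_seq_cst,
                                     std::memory_order_seq_cst);
}
```

With `var` holding 5, `compare_and_set(var, 5, 7)` succeeds and stores 7, while a following `compare_and_set(var, 5, 9)` fails and leaves 7 in place.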
------------- Commit messages: - 8360654: AArch64: Remove redundant dmb from C1 compareAndSet Changes: https://git.openjdk.org/jdk/pull/26000/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26000&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8360654 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/26000.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26000/head:pull/26000 PR: https://git.openjdk.org/jdk/pull/26000 From fjiang at openjdk.org Thu Jun 26 14:27:21 2025 From: fjiang at openjdk.org (Feilong Jiang) Date: Thu, 26 Jun 2025 14:27:21 GMT Subject: RFR: 8360520: RISC-V: C1: Fix primitive array clone intrinsic regression after JDK-8333154 [v2] In-Reply-To: References: Message-ID: > Hi, please consider. > [JDK-8333154](https://bugs.openjdk.org/browse/JDK-8333154) Implemented C1 clone intrinsic that reuses arraycopy code for primitive arrays for RISC-V. > The new instruction flag `OmitChecksFlag` (introduced by [JDK-8302850](https://bugs.openjdk.org/browse/JDK-8302850)) is used to avoid instantiation of array copy stubs for primitive array clones. > If `OmitChecksFlag` is set, all flags (including the `unaligned` flag) will be cleared before generating the `LIR_OpArrayCopy` node. > This may lead to incorrect selection of the arraycopy function when `unaligned` flag for arraycopy is set. > We observed a performance regression on P550 SBC through the corresponding JMH tests when COH is enabled. > > This PR adds additional checks for the unaligned case on RISC-V to ensure the arraycopy function is selected correctly. > > Test on linux-riscv64: > - [x] Tier1-3 > > JMH data on P550 SBC for reference (w/o and w/ the patch): > > Before: > > Without COH: > > Benchmark (size) Mode Cnt Score Error Units > ArrayClone.byteArraycopy 0 avgt 15 50.854 ± 0.379 ns/op > ArrayClone.byteArraycopy 10 avgt 15 74.294 ± 0.449 ns/op > ArrayClone.byteArraycopy 100 avgt 15 81.847 ±
0.082 ns/op > ArrayClone.byteArraycopy 1000 avgt 15 480.106 ± 0.369 ns/op > ArrayClone.byteClone 0 avgt 15 90.146 ± 0.299 ns/op > ArrayClone.byteClone 10 avgt 15 130.525 ± 0.384 ns/op > ArrayClone.byteClone 100 avgt 15 251.942 ± 0.122 ns/op > ArrayClone.byteClone 1000 avgt 15 407.580 ± 0.318 ns/op > ArrayClone.intArraycopy 0 avgt 15 49.984 ± 0.436 ns/op > ArrayClone.intArraycopy 10 avgt 15 76.302 ± 1.388 ns/op > ArrayClone.intArraycopy 100 avgt 15 267.487 ± 0.329 ns/op > ArrayClone.intArraycopy 1000 avgt 15 1157.444 ± 1.588 ns/op > ArrayClone.intClone 0 avgt 15 90.130 ± 0.257 ns/op > ArrayClone.intClone 10 avgt 15 183.619 ± 0.588 ns/op > ArrayClone.intClone 100 avgt 15 296.491 ± 0.246 ns/op > ArrayClone.intClone 1000 avgt 15 828.695 ± 1.501 ns/op > > ------------------------------------------------------------------------- > With COH: > > Benchmark (size) Mode Cnt Score Error Units > ArrayClone.byteArraycopy 0 avgt 15 50.667 ± 0.622 ns/op > ArrayClone.byteArraycopy 10 avgt 15 76.917 ± 0.914 ns/op > ArrayClone.byteArraycopy 100 avgt 15 82.928 ± 0.056 n... Feilong Jiang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains three additional commits since the last revision: - Merge branch 'master' of https://github.com/openjdk/jdk into riscv-fix-c1-primitive-clone - check unaligned flag at LIR_OpArrayCopy to avoid using AvoidUnalignedAccesses - riscv: fix c1 primitive array clone intrinsic regression ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25976/files - new: https://git.openjdk.org/jdk/pull/25976/files/445f903b..be980424 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25976&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25976&range=00-01 Stats: 2968 lines in 162 files changed: 639 ins; 1861 del; 468 mod Patch: https://git.openjdk.org/jdk/pull/25976.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25976/head:pull/25976 PR: https://git.openjdk.org/jdk/pull/25976 From jbhateja at openjdk.org Thu Jun 26 15:45:36 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 26 Jun 2025 15:45:36 GMT Subject: RFR: 8352635: Improve inferencing of Float16 operations with constant inputs [v8] In-Reply-To: References: <44nVQBYgzCOB2mAB9xtAPvkUcOMJOITA2VjMdDFgm1g=.48266693-48bf-41db-8871-a7dcafe93509@github.com> <60kkIRL2XznEXyYukVXVOoeixm2iGhoOxAbKJi5X0cY=.0268090e-a0d3-45fb-93f4-94caaf9b8497@github.com> <2F9hnA72JKqq9hJchevuSQ8XHveZ51F6tJnb7IcNw30=.1da69bab-23be-473b-92c1-1786916364b9@github.com> Message-ID: <75KXKPVwTxv0x9CBopaNTr6bXrUPGMXVJ1Hw45zkd2Q=.720c2c00-4a12-4c69-bcb5-734cb46d2ee2@github.com> On Wed, 25 Jun 2025 06:57:00 GMT, Emanuel Peter wrote: >>> @jatin-bhateja The code looks good to me! I'll run some testing before approving. >>> >>> But someone else could already get started with a second review. >> >> Thanks @eme64 , let us know once you are through with testing. > > @jatin-bhateja The last testing I did passed. I'm now waiting for @sviswa7 to give the approval, then run testing again on our side. I don't have any machine that supports float16, so we are relying on you to run sufficient testing. 
Thanks @eme64 and @sviswa7 for your reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24179#issuecomment-3008932721 From jbhateja at openjdk.org Thu Jun 26 15:45:37 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 26 Jun 2025 15:45:37 GMT Subject: Integrated: 8352635: Improve inferencing of Float16 operations with constant inputs In-Reply-To: <44nVQBYgzCOB2mAB9xtAPvkUcOMJOITA2VjMdDFgm1g=.48266693-48bf-41db-8871-a7dcafe93509@github.com> References: <44nVQBYgzCOB2mAB9xtAPvkUcOMJOITA2VjMdDFgm1g=.48266693-48bf-41db-8871-a7dcafe93509@github.com> Message-ID: On Sun, 23 Mar 2025 19:16:22 GMT, Jatin Bhateja wrote: > This is a follow-up PR#22755 to improve Float16 operations inferencing. > > The existing scheme to detect Float16 operations for some operations is based on pattern matching which expects to receive inputs through ConvHF2F IR, this patch extends matching to accept constant floating point inputs within the Float16 value range. > > Best Regards, > Jatin This pull request has now been integrated. Changeset: a49ecb26 Author: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/a49ecb26c5ff2f949851937f3bb036d7946a103e Stats: 522 lines in 7 files changed: 380 ins; 67 del; 75 mod 8352635: Improve inferencing of Float16 operations with constant inputs Reviewed-by: epeter, sviswanathan ------------- PR: https://git.openjdk.org/jdk/pull/24179 From duke at openjdk.org Thu Jun 26 16:24:41 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Thu, 26 Jun 2025 16:24:41 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v32] In-Reply-To: References: Message-ID: On Thu, 26 Jun 2025 10:40:57 GMT, Evgeny Astigeevich wrote: >> Chad Rakoczy has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 90 commits: >> >> - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final >> - Update how call sites are fixed >> - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final >> - Fix pointer printing >> - Use set_destination_mt_safe >> - Print address as pointer >> - Use new _metadata_size instead of _jvmci_data_size >> - Merge remote-tracking branch 'origin/master' into JDK-8316694-Final >> - Only check branch distance for aarch64 and riscv >> - Move far branch fix to fix_relocation_after_move >> - ... and 80 more: https://git.openjdk.org/jdk/compare/f799cf18...70e4164e > > src/hotspot/cpu/aarch64/nativeInst_aarch64.cpp line 90: > >> 88: // Patch the constant in the call's trampoline stub. >> 89: address trampoline_stub_addr = get_trampoline(); >> 90: if (trampoline_stub_addr != nullptr && dest != trampoline_stub_addr) { > > I think you will not need the checks if you rewrite the code as follows: > ```c++ > address addr_call = ...; > assert(); > > if (!Assembler::reachable_from_branch_at(addr_call, dest)) { > address trampoline_stub_addr = get_trampoline(); > assert (trampoline_stub_addr != nullptr, "we need a trampoline"); > assert (! is_NativeCallTrampolineStub_at(dest), "chained trampolines"); > nativeCallTrampolineStub_at(trampoline_stub_addr)->set_destination(dest); > dest = trampoline_stub_addr; > } > set_destination(dest); > ICache::invalidate_range(addr_call, instruction_size); > ``` > > If `dest` is a trampoline in the current nmethod, it is always reachable. So you will not go into setting trampoline's target to itself. Also we will call `get_trampoline`, which involves `CodeCache::find_blob` and a traversal of relocations, only if we need a trampoline. I would need to check the assumptions that other callers make about this function. In the current state it updates the trampoline regardless of whether the branch is reachable or not.
With your change it would require the caller to also update the trampoline to make sure it is not stale. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2169438056 From snatarajan at openjdk.org Thu Jun 26 20:36:53 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Thu, 26 Jun 2025 20:36:53 GMT Subject: RFR: 8342941: IGV: Add new graph dumps for post loop, empty loop removal, and one iteration removal [v2] In-Reply-To: References: Message-ID: On Thu, 12 Jun 2025 06:17:55 GMT, Christian Hagedorn wrote: >> Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: >> >> Addressing review comments > > Thanks for adding those! > > You are currently dumping the `CountedLoop` for the "before" and "after" dump. I think we could improve the "after" dump to show the actual change: > - In the "after" dump of the post loop, we could dump the new post loop instead of the old one. > - For the empty loop removal we could dump the `final_iv` instead: > https://github.com/openjdk/jdk/blob/c5a1543ee3e68775f09ca29fb07efd9aebfdb33e/src/hotspot/share/opto/loopTransform.cpp#L3172 > - For the one iteration removal, I'm not sure if it's worth printing anything. I think we can either print nothing or the init which replaces the iv: > https://github.com/openjdk/jdk/blob/c5a1543ee3e68775f09ca29fb07efd9aebfdb33e/src/hotspot/share/opto/loopTransform.cpp#L3334 @chhagedorn Thank you for the comments. I have addressed them. Could you see if this is okay?
------------- PR Comment: https://git.openjdk.org/jdk/pull/25756#issuecomment-2987108747 From sviswanathan at openjdk.org Thu Jun 26 22:56:40 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 26 Jun 2025 22:56:40 GMT Subject: RFR: 8358179: Performance regression in Math.cbrt In-Reply-To: <45l5EvxoRINI1_Ep2_snJzKNMPo4-dPXADalLN1fq1Y=.9f697a35-ee7b-4e7a-9e5e-ff33911b3b21@github.com> References: <45l5EvxoRINI1_Ep2_snJzKNMPo4-dPXADalLN1fq1Y=.9f697a35-ee7b-4e7a-9e5e-ff33911b3b21@github.com> Message-ID: On Tue, 24 Jun 2025 22:33:56 GMT, Mohamed Issa wrote: > The changes described below are meant to resolve the performance regression introduced by the **x86_64 cbrt** double precision floating point scalar intrinsic in #24470. > > 1. Check for +0, -0, +INF, -INF, and NaN before any other input values. > 2. If these special values are found, return immediately with minimal modifications to the result register. > > The commands to run all relevant micro-benchmarks are posted below. > > `make test TEST="micro:CbrtPerf.CbrtPerfRanges"` > `make test TEST="micro:CbrtPerf.CbrtPerfSpecialValues"` > > The results of all tests posted below were captured with an [Intel® Xeon 8488C](https://www.intel.com/content/www/us/en/products/sku/231730/intel-xeon-platinum-8480c-processor-105m-cache-2-00-ghz/specifications.html) using [OpenJDK v26-b1](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B1) as the baseline version. The term _baseline1_ refers to runs with the intrinsic enabled and _baseline2_ refers to runs with the intrinsic disabled. > > Each result is the mean of 8 individual runs, and the input ranges used match those from the original Java implementation. Overall, the changes provide a significant uplift over _baseline1_ except for a mild regression in the (**2^(-1022) <= |x| < INF**) input range, which is expected due to the extra checks.
When comparing against _baseline2_, the modified intrinsic still significantly outperforms for the inputs (**-INF < x < INF**) that require heavy compute. However, the special value inputs that trigger fast path returns still perform better with _baseline2_. > > | Input range(s) | Baseline1 (ops/ms) | Change (ops/ms) | Change vs baseline1 (%) | > | :-------------------------------------: | :-------------------: | :------------------: | :--------------------------: | > | [-2^(-1022), 2^(-1022)] | 18470 | 20847 | +12.87 | > | (-INF, -2^(-1022)], [2^(-1022), INF) | 210538 | 198925 | -5.52 | > | [0] | 344990 | 627561 | +81.91 | > | [-0] | 291983 | 629941 | +115.75 | > | [INF] | 382685 | 542211 | +41.68 | > | [-INF... src/hotspot/cpu/x86/stubGenerator_x86_64_cbrt.cpp line 51: > 49: ATTRIBUTE_ALIGNED(16) static const juint _ABS_MASK[] = > 50: { > 51: 4294967295, 2147483647 This should be a 128 bit constant as we are using it with andpd. Also please add the hex value in comments. src/hotspot/cpu/x86/stubGenerator_x86_64_cbrt.cpp line 217: > 215: __ bind(B1_1); > 216: __ ucomisd(xmm0, ExternalAddress(ZERON), r11 /*rscratch*/); > 217: __ jcc(Assembler::zero, L_2TAG_PACKET_1_0_1); // Branch only if x is +/- zero or NaN This could be Assembler::equal to be consistent. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25962#discussion_r2170180412 PR Review Comment: https://git.openjdk.org/jdk/pull/25962#discussion_r2170181404 From fyang at openjdk.org Fri Jun 27 00:48:40 2025 From: fyang at openjdk.org (Fei Yang) Date: Fri, 27 Jun 2025 00:48:40 GMT Subject: RFR: 8360520: RISC-V: C1: Fix primitive array clone intrinsic regression after JDK-8333154 [v2] In-Reply-To: References: Message-ID: <0KGnclomCZzNwJ4G7FsTygFg8xI6bl05uk7MlOGB5ps=.4eb8c6b3-5e25-4f84-9e11-2d276eb810a9@github.com> On Thu, 26 Jun 2025 14:27:21 GMT, Feilong Jiang wrote: >> Hi, please consider.
>> [JDK-8333154](https://bugs.openjdk.org/browse/JDK-8333154) Implemented C1 clone intrinsic that reuses arraycopy code for primitive arrays for RISC-V. >> The new instruction flag `OmitChecksFlag` (introduced by [JDK-8302850](https://bugs.openjdk.org/browse/JDK-8302850)) is used to avoid instantiation of array copy stubs for primitive array clones. >> If `OmitChecksFlag` is set, all flags (including the `unaligned` flag) will be cleared before generating the `LIR_OpArrayCopy` node. >> This may lead to incorrect selection of the arraycopy function when `unaligned` flag for arraycopy is set. >> We observed a performance regression on P550 SBC through the corresponding JMH tests when COH is enabled. >> >> This PR keeps the `unaligned` flag on RISC-V to ensure the arraycopy function is selected correctly, and keeps the flags on other platforms at `0` when `OmitChecksFlag` is true. >> >> Test on linux-riscv64: >> - [ ] Tier1-3 >> >> JMH data on P550 SBC for reference (w/o and w/ the patch): >> >> Before: >> >> Without COH: >> >> Benchmark (size) Mode Cnt Score Error Units >> ArrayClone.byteArraycopy 0 avgt 15 50.854 ± 0.379 ns/op >> ArrayClone.byteArraycopy 10 avgt 15 74.294 ± 0.449 ns/op >> ArrayClone.byteArraycopy 100 avgt 15 81.847 ± 0.082 ns/op >> ArrayClone.byteArraycopy 1000 avgt 15 480.106 ± 0.369 ns/op >> ArrayClone.byteClone 0 avgt 15 90.146 ± 0.299 ns/op >> ArrayClone.byteClone 10 avgt 15 130.525 ± 0.384 ns/op >> ArrayClone.byteClone 100 avgt 15 251.942 ± 0.122 ns/op >> ArrayClone.byteClone 1000 avgt 15 407.580 ± 0.318 ns/op >> ArrayClone.intArraycopy 0 avgt 15 49.984 ± 0.436 ns/op >> ArrayClone.intArraycopy 10 avgt 15 76.302 ± 1.388 ns/op >> ArrayClone.intArraycopy 100 avgt 15 267.487 ± 0.329 ns/op >> ArrayClone.intArraycopy 1000 avgt 15 1157.444 ± 1.588 ns/op >> ArrayClone.intClone 0 avgt 15 90.130 ± 0.257 ns/op >> ArrayClone.intClone 10 avgt 15 183.619 ± 0.588 ns/op >> ArrayClone.intClone 100 avgt 15 296.491 ±
0.246 ns/op >> ArrayClone.intClone 1000 avgt 15 828.695 ± 1.501 ns/op >> >> ------------------------------------------------------------------------- >> With COH: >> >> Benchmark (size) Mode Cnt Score Error Units >> ArrayClone.byteArraycopy 0 avgt 15 50.667 ± 0.622 ... > > Feilong Jiang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge branch 'master' of https://github.com/openjdk/jdk into riscv-fix-c1-primitive-clone > - check unaligned flag at LIR_OpArrayCopy to avoid using AvoidUnalignedAccesses > - riscv: fix c1 primitive array clone intrinsic regression Seems fine to me. You need another reviewer. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25976#pullrequestreview-2964220817 From fjiang at openjdk.org Fri Jun 27 01:03:48 2025 From: fjiang at openjdk.org (Feilong Jiang) Date: Fri, 27 Jun 2025 01:03:48 GMT Subject: RFR: 8360520: RISC-V: C1: Fix primitive array clone intrinsic regression after JDK-8333154 [v2] In-Reply-To: References: Message-ID: <-dPDS9Gq54rx3fOr4NCUfR5VAZVpT_zH9RTO6onhDxM=.a7a718f9-aa75-4594-9dee-ae0934bb40a3@github.com> On Thu, 26 Jun 2025 14:27:21 GMT, Feilong Jiang wrote: >> Hi, please consider. >> [JDK-8333154](https://bugs.openjdk.org/browse/JDK-8333154) Implemented C1 clone intrinsic that reuses arraycopy code for primitive arrays for RISC-V. >> The new instruction flag `OmitChecksFlag` (introduced by [JDK-8302850](https://bugs.openjdk.org/browse/JDK-8302850)) is used to avoid instantiation of array copy stubs for primitive array clones. >> If `OmitChecksFlag` is set, all flags (including the `unaligned` flag) will be cleared before generating the `LIR_OpArrayCopy` node.
>> This may lead to incorrect selection of the arraycopy function when `-XX:+UseCompactObjectHeaders` is enabled, causing the `unaligned` flag to be set for arraycopy. >> We observed a performance regression on P550 SBC through the corresponding JMH tests when COH is enabled. >> >> This PR keeps the `unaligned` flag on RISC-V to ensure the arraycopy function is selected correctly, and keeps the flags on other platforms at `0` when `OmitChecksFlag` is true. >> >> Test on linux-riscv64: >> - [ ] Tier1-3 >> >> JMH data on P550 SBC for reference (w/o and w/ the patch): >> >> Before: >> >> Without COH: >> >> Benchmark (size) Mode Cnt Score Error Units >> ArrayClone.byteArraycopy 0 avgt 15 50.854 ± 0.379 ns/op >> ArrayClone.byteArraycopy 10 avgt 15 74.294 ± 0.449 ns/op >> ArrayClone.byteArraycopy 100 avgt 15 81.847 ± 0.082 ns/op >> ArrayClone.byteArraycopy 1000 avgt 15 480.106 ± 0.369 ns/op >> ArrayClone.byteClone 0 avgt 15 90.146 ± 0.299 ns/op >> ArrayClone.byteClone 10 avgt 15 130.525 ± 0.384 ns/op >> ArrayClone.byteClone 100 avgt 15 251.942 ± 0.122 ns/op >> ArrayClone.byteClone 1000 avgt 15 407.580 ± 0.318 ns/op >> ArrayClone.intArraycopy 0 avgt 15 49.984 ± 0.436 ns/op >> ArrayClone.intArraycopy 10 avgt 15 76.302 ± 1.388 ns/op >> ArrayClone.intArraycopy 100 avgt 15 267.487 ± 0.329 ns/op >> ArrayClone.intArraycopy 1000 avgt 15 1157.444 ± 1.588 ns/op >> ArrayClone.intClone 0 avgt 15 90.130 ± 0.257 ns/op >> ArrayClone.intClone 10 avgt 15 183.619 ± 0.588 ns/op >> ArrayClone.intClone 100 avgt 15 296.491 ± 0.246 ns/op >> ArrayClone.intClone 1000 avgt 15 828.695 ± 1.501 ns/op >> >> ------------------------------------------------------------------------- >> With COH: >> >> Benchmark (size) Mode Cnt Score Error Units >> ArrayClon... > > Feilong Jiang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains three additional commits since the last revision: > > - Merge branch 'master' of https://github.com/openjdk/jdk into riscv-fix-c1-primitive-clone > - check unaligned flag at LIR_OpArrayCopy to avoid using AvoidUnalignedAccesses > - riscv: fix c1 primitive array clone intrinsic regression Since this changed C1 shared code, can I have another reviewer, please? Maybe the original author of this work @galderz @rwestrel could take a look? ------------- PR Comment: https://git.openjdk.org/jdk/pull/25976#issuecomment-3010950199 From missa at openjdk.org Fri Jun 27 01:43:16 2025 From: missa at openjdk.org (Mohamed Issa) Date: Fri, 27 Jun 2025 01:43:16 GMT Subject: RFR: 8358179: Performance regression in Math.cbrt [v2] In-Reply-To: References: <45l5EvxoRINI1_Ep2_snJzKNMPo4-dPXADalLN1fq1Y=.9f697a35-ee7b-4e7a-9e5e-ff33911b3b21@github.com> Message-ID: <0SSFi9-QrQXIJMkJgqAsEq3vi6MBzzjjKiStSLwrMuw=.b1f730f0-337d-474e-9d0a-e89ab4cf32f1@github.com> On Thu, 26 Jun 2025 22:52:39 GMT, Sandhya Viswanathan wrote: >> Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision: >> >> Ensure ABS_MASK is a 128-bit memory sized location and only use equal enum for UCOMISD checks > > src/hotspot/cpu/x86/stubGenerator_x86_64_cbrt.cpp line 51: > >> 49: ATTRIBUTE_ALIGNED(16) static const juint _ABS_MASK[] = >> 50: { >> 51: 4294967295, 2147483647 > > This should be a 128 bit constant as we are using it with andpd. Also please add in comments the hex value. I changed it into a 128 bit constant and added comments showing the hex value. > src/hotspot/cpu/x86/stubGenerator_x86_64_cbrt.cpp line 217: > >> 215: __ bind(B1_1); >> 216: __ ucomisd(xmm0, ExternalAddress(ZERON), r11 /*rscratch*/); >> 217: __ jcc(Assembler::zero, L_2TAG_PACKET_1_0_1); // Branch only if x is +/- zero or NaN > > This could be Assembler::equal to be consistent. I switched it to Assembler::equal. 
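
For readers following the `_ABS_MASK` comment: the two quoted `juint` values are 0xFFFFFFFF and 0x7FFFFFFF. A minimal Java sketch, assuming the low word comes first (little-endian, as on x86_64), shows that together they form the 64-bit mask that clears the sign bit of a double. This is not the HotSpot stub code; the class and method names are hypothetical.

```java
class AbsMaskSketch {
    // The two 32-bit words quoted in the review, low word first (assumed little-endian layout).
    static final long LOW_WORD = 4294967295L;   // 0xFFFFFFFF
    static final long HIGH_WORD = 2147483647L;  // 0x7FFFFFFF

    // Combine the two words into one 64-bit lane of the mask.
    static long absMask() {
        return (HIGH_WORD << 32) | LOW_WORD;
    }

    // Clear the sign bit of a double by AND-ing its raw bits with the mask,
    // which models what a bitwise AND does on one 64-bit lane.
    static double absViaMask(double x) {
        return Double.longBitsToDouble(Double.doubleToRawLongBits(x) & absMask());
    }

    public static void main(String[] args) {
        System.out.println(Long.toHexString(absMask()));
        System.out.println(absViaMask(-8.0));
    }
}
```

Because `andpd` reads a full 128-bit operand, the constant in the stub has to cover two such 64-bit lanes, which is the point of the review comment.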
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25962#discussion_r2170483184 PR Review Comment: https://git.openjdk.org/jdk/pull/25962#discussion_r2170483818 From missa at openjdk.org Fri Jun 27 01:43:16 2025 From: missa at openjdk.org (Mohamed Issa) Date: Fri, 27 Jun 2025 01:43:16 GMT Subject: RFR: 8358179: Performance regression in Math.cbrt [v2] In-Reply-To: <45l5EvxoRINI1_Ep2_snJzKNMPo4-dPXADalLN1fq1Y=.9f697a35-ee7b-4e7a-9e5e-ff33911b3b21@github.com> References: <45l5EvxoRINI1_Ep2_snJzKNMPo4-dPXADalLN1fq1Y=.9f697a35-ee7b-4e7a-9e5e-ff33911b3b21@github.com> Message-ID: > The changes described below are meant to resolve the performance regression introduced by the **x86_64 cbrt** double precision floating point scalar intrinsic in #24470. > > 1. Check for +0, -0, +INF, -INF, and NaN before any other input values. > 2. If these special values are found, return immediately with minimal modifications to the result register. > > The commands to run all relevant micro-benchmarks are posted below. > > `make test TEST="micro:CbrtPerf.CbrtPerfRanges"` > `make test TEST="micro:CbrtPerf.CbrtPerfSpecialValues"` > > The results of all tests posted below were captured with an [Intel® Xeon 8488C](https://www.intel.com/content/www/us/en/products/sku/231730/intel-xeon-platinum-8480c-processor-105m-cache-2-00-ghz/specifications.html) using [OpenJDK v26-b1](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B1) as the baseline version. The term _baseline1_ refers to runs with the intrinsic enabled and _baseline2_ refers to runs with the intrinsic disabled. > > Each result is the mean of 8 individual runs, and the input ranges used match those from the original Java implementation. Overall, the changes provide a significant uplift over _baseline1_ except for a mild regression in the (**2^(-1022) <= |x| < INF**) input range, which is expected due to the extra checks.
When comparing against _baseline2_, the modified intrinsic significantly still outperforms for the inputs (**-INF < x < INF**) that require heavy compute. However, the special value inputs that trigger fast path returns still perform better with _baseline2_. > > | Input range(s) | Baseline1 (ops/ms) | Change (ops/ms) | Change vs baseline1 (%) | > | :-------------------------------------: | :-------------------: | :------------------: | :--------------------------: | > | [-2^(-1022), 2^(-1022)] | 18470 | 20847 | +12.87 | > | (-INF, -2^(-1022)], [2^(-1022), INF) | 210538 | 198925 | -5.52 | > | [0] | 344990 | 627561 | +81.91 | > | [-0] | 291983 | 629941 | +115.75 | > | [INF] | 382685 | 542211 | +41.68 | > | [-INF... Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision: Ensure ABS_MASK is a 128-bit memory sized location and only use equal enum for UCOMISD checks ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25962/files - new: https://git.openjdk.org/jdk/pull/25962/files/59201ed9..615169d8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25962&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25962&range=00-01 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25962.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25962/head:pull/25962 PR: https://git.openjdk.org/jdk/pull/25962 From xgong at openjdk.org Fri Jun 27 01:43:47 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 27 Jun 2025 01:43:47 GMT Subject: RFR: 8356760: VectorAPI: Optimize VectorMask.fromLong for all-true/all-false cases In-Reply-To: References: Message-ID: On Fri, 13 Jun 2025 08:33:09 GMT, erifan wrote: > If the input long value `l` of `VectorMask.fromLong(SPECIES, l)` would set or unset all lanes, `VectorMask.fromLong(SPECIES, l)` is equivalent to `maskAll(true)` or `maskAll(false)`. But the cost of `maskAll` is relative smaller than that of `fromLong`. 
This patch does the conversion for these cases if `l` is a compile time constant. > > And this conversion also enables further optimizations that recognize maskAll patterns, see [1]. > > Some JTReg test cases are added to ensure the optimization is effective. > > I tried many different ways to write a JMH benchmark, but failed. Since the input of `VectorMask.fromLong(SPECIES, l)` needs to be a specific compile-time constant, the statement will be hoisted out of the loop. If we don't use a loop, the hotspot will become other instructions, and no obvious performance change was observed. However, combined with the optimization of [1], we can observe a performance improvement of about 7% on both aarch64 and x64. > > The patch was tested on both aarch64 and x64, all of tier1 tier2 and tier3 tests passed. > > [1] https://github.com/openjdk/jdk/pull/24674 src/hotspot/share/opto/vectorIntrinsics.cpp line 80: > 78: return false; > 79: } > 80: long mask = (0xFFFFFFFFFFFFFFFFULL >> (64 - vlen)); The higher bits of long input should be cleared. So we should generate an unsigned right shift instead of the signed one? src/hotspot/share/opto/vectorIntrinsics.cpp line 706: > 704: opc = Op_Replicate; > 705: elem_bt = converted_elem_bt; > 706: bits = gvn().longcon(bits_type->get_con() == 0L ? 0L : -1L); The `maskAll(false)` will be generated only if the input bits is `0L`. So if `bits = 0xffff0000`, and the `vlen = 4`, what is the expected mask? Is it expected to all false or all true? From the API design, I guess it is all false. But from the code above, it will be all true, right? Could you please double check and add a test for that? 
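
To make the question concrete, here is a plain-Java sketch of the check being discussed. It is not the C2 code, and the class, enum, and method names are hypothetical. It assumes only the low `vlen` bits of the `fromLong` input are significant, so `0xffff0000` with `vlen = 4` is classified as all-false rather than all-true.

```java
class FromLongMaskSketch {
    enum Kind { ALL_FALSE, ALL_TRUE, MIXED }

    // Classify a constant fromLong input by looking only at its low vlen bits.
    static Kind classify(long bits, int vlen) {
        if (vlen < 1 || vlen > 64) {
            throw new IllegalArgumentException("vlen must be in [1, 64]");
        }
        // Unsigned shift, the Java counterpart of 0xFFFFFFFFFFFFFFFFULL >> (64 - vlen).
        long mask = -1L >>> (64 - vlen);
        long low = bits & mask;
        if (low == 0L) {
            return Kind.ALL_FALSE;
        }
        if (low == mask) {
            return Kind.ALL_TRUE;
        }
        return Kind.MIXED;
    }

    public static void main(String[] args) {
        System.out.println(classify(0xffff0000L, 4));
        System.out.println(classify(-1L, 4));
        System.out.println(classify(0x5L, 4));
    }
}
```

Under that assumption, comparing the unmasked constant against `0L` is not enough; the high bits have to be cleared first.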
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25793#discussion_r2170481682 PR Review Comment: https://git.openjdk.org/jdk/pull/25793#discussion_r2170486973 From xgong at openjdk.org Fri Jun 27 03:01:53 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 27 Jun 2025 03:01:53 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v9] In-Reply-To: References: Message-ID: On Wed, 25 Jun 2025 10:08:23 GMT, erifan wrote: >> This patch optimizes the following patterns: >> For integer types: >> >> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) >> => (VectorMaskCmp src1 src2 ncond) >> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) >> => (VectorMaskCmp src1 src2 ncond) >> >> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the negative comparison of cond. >> >> For float and double types: >> >> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> >> cond can be eq or ne. 
>> >> Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option `-XX:UseSVE=2`: >> >> Benchmark Unit Before Score Error After Score Error Uplift >> testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 10266136.26 8955.008548 1.29 >> testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 1179760.772 448.031844 1.33 >> testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 2359520.803 896.305743 1.33 >> testCompareEQMaskNotInt ops/s 1787221.411 977.743935 2353952.519 960.069976 1.31 >> testCompareEQMaskNotLong ops/s 895297.1974 673.44808 1178449.02 323.804205 1.31 >> testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 4712761.965 2110.862053 1.41 >> testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 10251646.9 9486.699831 1.29 >> testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 2352855.205 1251.952546 1.39 >> testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 1177811.493 521.1229 1.37 >> testCompareGEMaskNotShort ops/s 3341860.309 1578.975338 4714008.434 1681.10365 1.41 >> testCompareGTMaskNotByte ops/s 7910823.674 2993.367032 10245063.58 9774.75138 1.29 >> testCompareGTMaskNotInt ops/s 1673393.928 3153.099431 2353654.521 1190.848583 1.4 >> testCompareGTMaskNotLong ops/s 849405.9159 2432.858159 1177952.041 359.96413 1.38 >> testCompareGTMaskNotShort ops/s 3339509.141 3339.976585 4711442.496 2673.364893 1.41 >> testCompareLEMaskNotByte ops/s 7911340.004 3114.69191 10231626.5 27134.20035 1.29 >> testCompareLEMaskNotInt ops/s 1675812.113 1340.969885 2353255.341 1452.4522 1.4 >> testCompareLEMaskNotLong ops/s 848862.8036 6564.841731 1177763.623 539.290106 1.38 >> testCompareLEMaskNotShort ops/s 3324951.54 2380.29473 4712116.251 1544.559684 1.41 >> testCompareLTMaskNotByte ops/s 7910390.844 2630.861436 10239567.69 6487.441672 1.29 >> testCompareLTMaskNotInt ops/s 16721... > > erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 15 additional commits since the last revision: > > - Address more comments > > ATT. > - Merge branch 'master' into JDK-8354242 > - Support negating unsigned comparison for BoolTest::mask > > Added a static method `negate_mask(mask btm)` into BoolTest class to > negate both signed and unsigned comparison. > - Addressed some review comments > - Merge branch 'master' into JDK-8354242 > - Refactor the JTReg tests for compare.xor(maskAll) > > Also made a bit change to support pattern `VectorMask.fromLong()`. > - Merge branch 'master' into JDK-8354242 > - Refactor code > > Add a new function XorVNode::Ideal_XorV_VectorMaskCmp to do this > optimization, making the code more modular. > - Merge branch 'master' into JDK-8354242 > - Update the jtreg test > - ... and 5 more: https://git.openjdk.org/jdk/compare/57743205...5ebdc572 LGTM! Thanks for your updating! ------------- Marked as reviewed by xgong (Committer). PR Review: https://git.openjdk.org/jdk/pull/24674#pullrequestreview-2964538307 From xgong at openjdk.org Fri Jun 27 03:17:46 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 27 Jun 2025 03:17:46 GMT Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v7] In-Reply-To: References: Message-ID: On Thu, 26 Jun 2025 14:20:50 GMT, Bhavana Kilambi wrote: >> This patch adds aarch64 backend support for SelectFromTwoVector operation which was recently introduced in VectorAPI. >> >> It implements this operation using a two table vector lookup instruction - "tbl" which is available only in Neon and SVE2. >> >> For 128-bit vector length : Neon tbl instruction is generated if UseSVE < 2 and SVE2 "tbl" instruction is generated if UseSVE == 2. >> >> For > 128-bit vector length : Currently there are no machines which have vector length > 128-bit and support SVE2. For all those machines with vector length > 128-bit and UseSVE < 2, this operation is not supported. 
The inline expander for this operation would fail and lowered IR will be generated which is a mix of two rearrange and one blend operation. >> >> This patch also adds a boolean "need_load_shuffle" in the inline expander for this operation to test if the platform requires VectorLoadShuffle operation to be generated. Without this, the lowering IR was not being generated on aarch64 and the performance was quite poor. >> >> Performance numbers with this patch on a 128-bit, SVE2 supporting machine is shown below - >> >> >> Benchmark (size) Mode Cnt Gain >> SelectFromBenchmark.selectFromByteVector 1024 thrpt 9 1.43 >> SelectFromBenchmark.selectFromByteVector 2048 thrpt 9 1.48 >> SelectFromBenchmark.selectFromDoubleVector 1024 thrpt 9 68.55 >> SelectFromBenchmark.selectFromDoubleVector 2048 thrpt 9 72.07 >> SelectFromBenchmark.selectFromFloatVector 1024 thrpt 9 1.69 >> SelectFromBenchmark.selectFromFloatVector 2048 thrpt 9 1.52 >> SelectFromBenchmark.selectFromIntVector 1024 thrpt 9 1.50 >> SelectFromBenchmark.selectFromIntVector 2048 thrpt 9 1.52 >> SelectFromBenchmark.selectFromLongVector 1024 thrpt 9 85.38 >> SelectFromBenchmark.selectFromLongVector 2048 thrpt 9 80.93 >> SelectFromBenchmark.selectFromShortVector 1024 thrpt 9 1.48 >> SelectFromBenchmark.selectFromShortVector 2048 thrpt 9 1.49 >> >> >> Gain column refers to the ratio of thrpt between this patch and the master branch after applying changes in the inline expander. > > Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: > > Addressed review comments LGTM except some minor code style issues. Thanks so much for your updating! 
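
For reference, a scalar Java sketch of what a two-table select computes, lane by lane. It is only a model of the operation, not the AArch64 backend or the Vector API implementation; the names are hypothetical, and it assumes out-of-range indexes wrap modulo `2 * VLENGTH`.

```java
class SelectFromTwoSketch {
    // For each lane n, pick from the logical concatenation of v1 and v2
    // using index[n], wrapped into [0, 2 * vlen).
    static int[] selectFromTwo(int[] index, int[] v1, int[] v2) {
        int vlen = index.length;
        if (v1.length != vlen || v2.length != vlen) {
            throw new IllegalArgumentException("all vectors must have the same length");
        }
        int[] result = new int[vlen];
        for (int n = 0; n < vlen; n++) {
            int i = Math.floorMod(index[n], 2 * vlen);
            result[n] = (i < vlen) ? v1[i] : v2[i - vlen];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] r = selectFromTwo(new int[] {0, 5, 2, 7},
                                new int[] {10, 11, 12, 13},
                                new int[] {20, 21, 22, 23});
        System.out.println(java.util.Arrays.toString(r));
    }
}
```

A two-register table lookup such as `tbl` does this selection in one instruction, whereas the fallback IR mentioned above needs two rearranges and a blend.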
src/hotspot/cpu/aarch64/aarch64_vector_ad.m4 line 5168: > 5166: define(`SELECT_FROM_TWO_VECTORS_NEON', ` > 5167: instruct vselect_from_two_vectors_Neon_$1_$2(vReg dst, vReg_V$1 src1, vReg_V$2 src2, > 5168: vReg index, vReg tmp1) %{ Suggestion: vReg index, vReg tmp) %{ src/hotspot/cpu/aarch64/aarch64_vector_ad.m4 line 5177: > 5175: uint length_in_bytes = Matcher::vector_length_in_bytes(this); > 5176: __ select_from_two_vectors_Neon($dst$$FloatRegister, $src1$$FloatRegister, > 5177: $src2$$FloatRegister,$index$$FloatRegister, Suggestion: $src2$$FloatRegister, $index$$FloatRegister, src/hotspot/cpu/aarch64/aarch64_vector_ad.m4 line 5189: > 5187: define(`SELECT_FROM_TWO_VECTORS_SVE', ` > 5188: instruct vselect_from_two_vectors_SVE_$1_$2(vReg dst, vReg_V$1 src1, vReg_V$2 src2, > 5189: vReg index, vReg tmp1) %{ Suggestion: vReg index, vReg tmp) %{ src/hotspot/cpu/aarch64/aarch64_vector_ad.m4 line 5198: > 5196: uint length_in_bytes = Matcher::vector_length_in_bytes(this); > 5197: __ select_from_two_vectors_SVE($dst$$FloatRegister, $src1$$FloatRegister, > 5198: $src2$$FloatRegister,$index$$FloatRegister, Suggestion: $src2$$FloatRegister, $index$$FloatRegister, src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2859: > 2857: void C2_MacroAssembler::select_from_two_vectors_Neon(FloatRegister dst, FloatRegister src1, > 2858: FloatRegister src2, FloatRegister index, > 2859: FloatRegister tmp1, BasicType bt, bool isQ) { Suggestion: FloatRegister tmp, BasicType bt, bool isQ) { src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2927: > 2925: void C2_MacroAssembler::select_from_two_vectors_SVE(FloatRegister dst, FloatRegister src1, > 2926: FloatRegister src2, FloatRegister index, > 2927: FloatRegister tmp1, BasicType bt, Suggestion: FloatRegister tmp, BasicType bt, src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2938: > 2936: } else { // UseSVE == 2 and vector_length_in_bytes > 8 > 2937: assert(UseSVE == 2, "must be sve2"); > 2938: sve_tbl(dst, size, 
src1, 2, index); Suggestion: assert(UseSVE == 2, "must be sve2"); sve_tbl(dst, size, src1, 2, index); ------------- Marked as reviewed by xgong (Committer). PR Review: https://git.openjdk.org/jdk/pull/23570#pullrequestreview-2964548012 PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2170627995 PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2170626178 PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2170630105 PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2170629730 PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2170631738 PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2170638885 PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2170638159 From swen at openjdk.org Fri Jun 27 03:18:39 2025 From: swen at openjdk.org (Shaojin Wen) Date: Fri, 27 Jun 2025 03:18:39 GMT Subject: RFR: 8356044: Use Double::hashCode and Long::hashCode in java.vm.ci.meta In-Reply-To: <8SlBOjUBPGyZbR9GxEBZlLzOiNPbdws1GTZ4gGY8v9c=.fdefa26b-52ee-48f9-b814-3981b79f6012@github.com> References: <8SlBOjUBPGyZbR9GxEBZlLzOiNPbdws1GTZ4gGY8v9c=.fdefa26b-52ee-48f9-b814-3981b79f6012@github.com> Message-ID: On Thu, 1 May 2025 16:05:15 GMT, Shaojin Wen wrote: > Similar to #24959 and #24971 and #24987, AbstractProfiledItem/PrimitiveConstant in java.vm.ci.meta can also be simplified similarly. > > Replace manual bitwise operations in hashCode implementations of java.vm.ci.meta.AbstractProfiledItem/java.vm.ci.meta.PrimitiveConstant with Long::hashCode/Double.hashCode. 
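
A small sketch of the equivalence this kind of cleanup relies on: `Long.hashCode(v)` is specified as `(int)(v ^ (v >>> 32))`, and `Double.hashCode(d)` is the same expression over `Double.doubleToLongBits(d)`. The class below is hypothetical and is not the JVMCI code.

```java
class HashCodeSketch {
    // The manual form that such hashCode implementations typically spell out.
    static int manualLongHash(long v) {
        return (int) (v ^ (v >>> 32));
    }

    // The manual form for a double, via its canonical bit pattern.
    static int manualDoubleHash(double d) {
        long bits = Double.doubleToLongBits(d);
        return (int) (bits ^ (bits >>> 32));
    }

    public static void main(String[] args) {
        long v = 0x123456789ABCDEF0L;
        double d = 3.25;
        System.out.println(manualLongHash(v) == Long.hashCode(v));
        System.out.println(manualDoubleHash(d) == Double.hashCode(d));
    }
}
```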
keep alive ------------- PR Comment: https://git.openjdk.org/jdk/pull/24988#issuecomment-3011263985 From xgong at openjdk.org Fri Jun 27 03:48:44 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 27 Jun 2025 03:48:44 GMT Subject: RFR: 8357554: Enable vectorization of Bool -> CMove with different type size (on riscv) In-Reply-To: References: Message-ID: On Tue, 20 May 2025 19:39:30 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > This PR is split from https://github.com/openjdk/jdk/pull/25341, and contains only the shared code change. > > This patch enables the vectorization of statements like `fd_1 bop fd_2 ? res_1 : res_2` in a loop. > > The current behaviour on other platforms supports vectorization of `fd_1 bop fd_2 ? res_1 : res_2` in a loop only when `fd` and `res` have the same size, but this constraint seems unnecessary, at least on riscv, so I relax this constraint on riscv. Maybe it can be relaxed on other platforms too, but currently I only made it work on riscv. > Besides this, I also relax the constraint on transforming Op_CMoveI/L to Op_VectorBlend on riscv; this brings some extra benefit when `res` is not a float or double type. > Both relaxations bring performance benefits via vectorization. > > Compared with other runs (master, master with `-XX:+UseVectorCmov -XX:+UseCMoveUnconditionally` turned on, patch without flags turned on), the average improvement introduced by the patch with `-XX:+UseVectorCmov -XX:+UseCMoveUnconditionally` turned on is more than 2.1 times; in some cases it can bring more than 4 times improvement. > When `-XX:-UseVectorCmov -XX:-UseCMoveUnconditionally` are used (flags turned off), there is no regression on average. > > Check more details at: https://github.com/openjdk/jdk/pull/25341.
> > Thanks src/hotspot/share/opto/vectornode.cpp line 438: > 436: if (vopc == Op_VectorBlend) { > 437: return VectorBlendNode::implemented(opc); > 438: } To keep the same code style with other ops, maybe we can directly use the body of `VectorBlendNode::implemented()` here. Or making `VectorBlendNode::implemented()` as a static method of `VectorNode` like other ops. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25336#discussion_r2170692996 From xgong at openjdk.org Fri Jun 27 06:07:39 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 27 Jun 2025 06:07:39 GMT Subject: RFR: 8356760: VectorAPI: Optimize VectorMask.fromLong for all-true/all-false cases In-Reply-To: References: Message-ID: <34p1DHverqucTroSmERaeSx94Knl2FMfVWxedlij0JA=.a4ab7090-8a1c-421a-bc4b-7e1c17f03246@github.com> On Fri, 27 Jun 2025 01:36:44 GMT, Xiaohong Gong wrote: >> If the input long value `l` of `VectorMask.fromLong(SPECIES, l)` would set or unset all lanes, `VectorMask.fromLong(SPECIES, l)` is equivalent to `maskAll(true)` or `maskAll(false)`. But the cost of `maskAll` is relative smaller than that of `fromLong`. This patch does the conversion for these cases if `l` is a compile time constant. >> >> And this conversion also enables further optimizations that recognize maskAll patterns, see [1]. >> >> Some JTReg test cases are added to ensure the optimization is effective. >> >> I tried many different ways to write a JMH benchmark, but failed. Since the input of `VectorMask.fromLong(SPECIES, l)` needs to be a specific compile-time constant, the statement will be hoisted out of the loop. If we don't use a loop, the hotspot will become other instructions, and no obvious performance change was observed. However, combined with the optimization of [1], we can observe a performance improvement of about 7% on both aarch64 and x64. >> >> The patch was tested on both aarch64 and x64, all of tier1 tier2 and tier3 tests passed. 
>> >> [1] https://github.com/openjdk/jdk/pull/24674 > > src/hotspot/share/opto/vectorIntrinsics.cpp line 80: > >> 78: return false; >> 79: } >> 80: long mask = (0xFFFFFFFFFFFFFFFFULL >> (64 - vlen)); > > The higher bits of long input should be cleared. So we should generate an unsigned right shift instead of the signed one? I noticed that you used `ULL` suffix. So it should be fine. Please ignore above comment. Thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25793#discussion_r2170871085 From shade at openjdk.org Fri Jun 27 07:33:40 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 27 Jun 2025 07:33:40 GMT Subject: RFR: 8360654: AArch64: Remove redundant dmb from C1 compareAndSet In-Reply-To: <3ZAHrkER2pU6L346Y40fDkndJhkFGjBrQQ4xX7cx80w=.527c87b5-f2ec-43eb-be68-d0b802c76940@github.com> References: <3ZAHrkER2pU6L346Y40fDkndJhkFGjBrQQ4xX7cx80w=.527c87b5-f2ec-43eb-be68-d0b802c76940@github.com> Message-ID: On Thu, 26 Jun 2025 12:13:19 GMT, Samuel Chee wrote: > AtomicLong.CompareAndSet has the following assembly dump snippet which gets emitted from the intermediary LIRGenerator::atomic_cmpxchg: > > ;; cmpxchg { > 0x0000e708d144cf60: mov x8, x2 > 0x0000e708d144cf64: casal x8, x3, [x0] > 0x0000e708d144cf68: cmp x8, x2 > ;; 0x1F1F1F1F1F1F1F1F > 0x0000e708d144cf6c: mov x8, #0x1f1f1f1f1f1f1f1f > ;; } cmpxchg > 0x0000e708d144cf70: cset x8, ne // ne = any > 0x0000e708d144cf74: dmb ish > > > According to the Oracle Java Specification, AtomicLong.CompareAndSet [1] has the same memory effects as specified by VarHandle.compareAndSet which has the following effects: [2] > >> Atomically sets the value of a variable to the >> newValue with the memory semantics of setVolatile if >> the variable's current value, referred to as the witness >> value, == the expectedValue, as accessed with the memory >> semantics of getVolatile. > > > > Hence the release on the store due to setVolatile only occurs if the compare is successful. 
Since casal already satisfies these requirements, the dmb does not need to occur to ensure memory ordering in case the compare fails and a release does not happen. > > Hence we remove the dmb from both casl and casw (same logic applies to the non-long variant) > > This is also reflected by C2 not having a dmb for the same respective method. > > [1] https://docs.oracle.com/en/java/javase/24/docs/api/java.base/java/util/concurrent/atomic/AtomicLong.html#compareAndSet(long,long) > [2] https://docs.oracle.com/en/java/javase/24/docs/api/java.base/java/lang/invoke/VarHandle.html#compareAndSet(java.lang.Object...) I heavily suspect some of these extra barriers has to do with compatibility across different barrier schemes in interpreter, C1 and C2, especially when `cmpxchg` is expanded to LL/SC (happens, AFAICS, without LSE). That, and that is historically we believed CAS should has release memory semantics even on failure. There are other instances of this, see e.g. `MacroAssembler::cmpxchg*`: https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp#L3380 You need to verify that jcstress does not fail with this patch, also with `-XX:-UseLSE`. Take the latest bundle from here: https://builds.shipilev.net/jcstress/ ------------- PR Review: https://git.openjdk.org/jdk/pull/26000#pullrequestreview-2965119245 From shade at openjdk.org Fri Jun 27 08:26:10 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 27 Jun 2025 08:26:10 GMT Subject: RFR: 8360783: CTW: Skip deoptimization between tiers Message-ID: When profiling CTW runs, I noticed we spend a lot of time dealing with deoptimization. We do this excessively, deoptimizing before compilation on every tier. This is excessive: Hotspot honors compilation requests on subsequent levels without the need for explicit deoptimization. Not doing deopt between tiers greatly improves CTW performance. 
A taste of improvements, about 15% less CPU spent: $ time make test TEST=applications/ctw/modules # Current real 5m1.616s user 79m41.398s sys 14m39.607s # Patched real 3m55.411s user 69m19.227s sys 5m24.323s The compilation still works as expected, progressing through tiers 1..4: $ JAVA_OPTIONS="-XX:+PrintCompilation -XX:CICompilerCount=2" ./ctw.sh modules:jdk.compiler | tee out ... $ grep sun.tools.serialver.resources.serialver_de::getContents out 101783 55033 b 1 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) 101785 55036 b 2 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) 101786 55033 1 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) made not entrant: not used 101786 55038 b 3 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) 101787 55036 2 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) made not entrant: not used 101792 55040 b 4 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) 101797 55038 3 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) made not entrant: not used 101798 55040 4 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) made not entrant: marked for deoptimization ------------- Commit messages: - Deopt Changes: https://git.openjdk.org/jdk/pull/26013/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26013&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8360783 Stats: 8 lines in 1 file changed: 6 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/26013.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26013/head:pull/26013 PR: https://git.openjdk.org/jdk/pull/26013 From shade at openjdk.org Fri Jun 27 08:38:31 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 27 Jun 2025 08:38:31 GMT Subject: RFR: 8360783: CTW: Skip deoptimization between tiers [v2] In-Reply-To: References: Message-ID: > When profiling CTW runs, I noticed we spend a lot of time dealing with 
deoptimization. We do this excessively, deoptimizing before compilation on every tier. This is excessive: Hotspot honors compilation requests on subsequent levels without the need for explicit deoptimization. Not doing deopt between tiers greatly improves CTW performance. > > A taste of improvements, about 15% less CPU spent: > > > $ time make test TEST=applications/ctw/modules > > # Current > real 5m1.616s > user 79m41.398s > sys 14m39.607s > > # Patched > real 3m55.411s > user 69m19.227s > sys 5m24.323s > > > The compilation still works as expected, progressing through tiers 1..4: > > > $ JAVA_OPTIONS="-XX:+PrintCompilation -XX:CICompilerCount=2" ./ctw.sh modules:jdk.compiler | tee out > ... > $ grep sun.tools.serialver.resources.serialver_de::getContents out > 101783 55033 b 1 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) > 101785 55036 b 2 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) > 101786 55033 1 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) made not entrant: not used > 101786 55038 b 3 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) > 101787 55036 2 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) made not entrant: not used > 101792 55040 b 4 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) > 101797 55038 3 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) made not entrant: not used > 101798 55040 4 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) made not entrant: marked for deoptimization Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Update test/hotspot/jtreg/testlibrary/ctw/src/sun/hotspot/tools/ctw/Compiler.java Co-authored-by: Tobias Hartmann ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26013/files - new: https://git.openjdk.org/jdk/pull/26013/files/7dce90f7..34913a48 Webrevs: - full: 
https://webrevs.openjdk.org/?repo=jdk&pr=26013&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26013&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/26013.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26013/head:pull/26013 PR: https://git.openjdk.org/jdk/pull/26013 From thartmann at openjdk.org Fri Jun 27 08:38:32 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 27 Jun 2025 08:38:32 GMT Subject: RFR: 8360783: CTW: Skip deoptimization between tiers [v2] In-Reply-To: References: Message-ID: On Fri, 27 Jun 2025 08:34:57 GMT, Aleksey Shipilev wrote: >> When profiling CTW runs, I noticed we spend a lot of time dealing with deoptimization. We do this excessively, deoptimizing before compilation on every tier. This is excessive: Hotspot honors compilation requests on subsequent levels without the need for explicit deoptimization. Not doing deopt between tiers greatly improves CTW performance. >> >> A taste of improvements, about 15% less CPU spent: >> >> >> $ time make test TEST=applications/ctw/modules >> >> # Current >> real 5m1.616s >> user 79m41.398s >> sys 14m39.607s >> >> # Patched >> real 3m55.411s >> user 69m19.227s >> sys 5m24.323s >> >> >> The compilation still works as expected, progressing through tiers 1..4: >> >> >> $ JAVA_OPTIONS="-XX:+PrintCompilation -XX:CICompilerCount=2" ./ctw.sh modules:jdk.compiler | tee out >> ... 
>> $ grep sun.tools.serialver.resources.serialver_de::getContents out >> 101783 55033 b 1 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) >> 101785 55036 b 2 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) >> 101786 55033 1 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) made not entrant: not used >> 101786 55038 b 3 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) >> 101787 55036 2 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) made not entrant: not used >> 101792 55040 b 4 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) >> 101797 55038 3 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) made not entrant: not used >> 101798 55040 4 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) made not entrant: marked for deoptimization > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/testlibrary/ctw/src/sun/hotspot/tools/ctw/Compiler.java > > Co-authored-by: Tobias Hartmann Looks good to me. test/hotspot/jtreg/testlibrary/ctw/src/sun/hotspot/tools/ctw/Compiler.java line 174: > 172: public final void run() { > 173: // Make sure method is not compiled at any level before starting > 174: // progressive compilations. No deopt in-between tiers deopt is needed, Suggestion: // progressive compilations. No deopt in-between tiers is needed, ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/26013#pullrequestreview-2965403686 PR Review Comment: https://git.openjdk.org/jdk/pull/26013#discussion_r2171220451 From jbhateja at openjdk.org Fri Jun 27 08:54:23 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 27 Jun 2025 08:54:23 GMT Subject: RFR: 8360116: Add support for AVX10 floating point minmax instruction [v4] In-Reply-To: References: Message-ID: > Intel® AVX10 ISA [1] extensions added new floating point MIN/MAX instructions which comply with definitions in IEEE-754-2019 standard section 9.6 and can directly emulate Math.min/max semantics without the need for any special handling for NaN, +0.0 or -0.0 detection. > > **The following pseudo-code describes the existing algorithm for min/max[FD]:** > > Move the non-negative value to the second operand; this will ensure that we correctly handle 0.0 and -0.0 values, if values being compared are both 0.0s (of either sign), the value in the second operand (source operand) is returned. Existing MINPS and MAXPS semantics only check for NaN as the second operand; hence, we need special handling to check for NaN at the first operand. > > btmp = (b < +0.0) ? a : b > atmp = (b < +0.0) ? b : a > Tmp = Max_Float(atmp , btmp) > Res = (atmp == NaN) ? atmp : Tmp > > For min[FD] we need a small tweak in the above algorithm, i.e., move the non-negative value to the first operand, this will ensure that we correctly select -0.0 if both the operands being compared are 0.0 or -0.0. > > btmp = (b < +0.0) ? b : a > atmp = (b < +0.0) ? a : b > Tmp = Min_Float(atmp , btmp) > Res = (atmp == NaN) ? atmp : Tmp > > Thus, we need additional special handling for NaNs and +/-0.0 to compute floating-point min/max values to comply with the semantics of Math.max/min APIs using existing MINPS / MAXPS instructions. AVX10.2 added a new instruction, VMINMAX[SH,SS,SD]/[PH,PS,PD], which comprehensively handles special cases, thereby eliminating the need for special handling.
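For readers following along, the fix-up described above can be modelled in plain Java. This is only a sketch under stated assumptions, not the HotSpot code: the class and method names are made up for illustration, `maxps`/`minps` model the x86 rule that the second operand is returned when either input is NaN or both inputs are zero, `(b < +0.0)` is read as a sign-bit test so that -0.0 counts as negative, and the min variant is assumed to end in a MINPS-style operation.

```java
public class MinMaxFixupSketch {
    // Model of one MAXPS/MAXSS lane: the second operand is returned when either
    // input is NaN (every comparison is false) or when both inputs are zero.
    static float maxps(float src1, float src2) {
        return (src1 > src2) ? src1 : src2;
    }

    // Model of one MINPS/MINSS lane, with the same "second operand wins" rule.
    static float minps(float src1, float src2) {
        return (src1 < src2) ? src1 : src2;
    }

    // Assumption: the blend is driven by the sign bit, so -0.0 is "negative".
    static boolean signBitSet(float f) {
        return Float.floatToRawIntBits(f) < 0;
    }

    // max: bias the non-negative value towards the second operand.
    static float max(float a, float b) {
        float btmp = signBitSet(b) ? a : b;
        float atmp = signBitSet(b) ? b : a;
        float tmp = maxps(atmp, btmp);
        // A NaN in the first operand is not propagated by maxps, so patch it up.
        return Float.isNaN(atmp) ? atmp : tmp;
    }

    // min: bias the negative value towards the second operand.
    static float min(float a, float b) {
        float btmp = signBitSet(b) ? b : a;
        float atmp = signBitSet(b) ? a : b;
        float tmp = minps(atmp, btmp);
        return Float.isNaN(atmp) ? atmp : tmp;
    }

    public static void main(String[] args) {
        float[] vals = {0.0f, -0.0f, 1.0f, -1.0f, Float.NaN};
        for (float a : vals) {
            for (float b : vals) {
                System.out.println("max(" + a + ", " + b + ") = " + max(a, b)
                        + "   min(" + a + ", " + b + ") = " + min(a, b));
            }
        }
    }
}
```

The two blends and the NaN patch-up are the extra work that a single instruction with Math.min/max semantics would make unnecessary.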
> > Patch emits new instructions for reduction and non-reduction operations for single, double, and Float16 type. > > Kindly review and share your feedback. > > Best Regards, > Jatin > > [1] https://www.intel.com/content/www/us/en/content-details/856721/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html?wapkw=AVX10 Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review resolutions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25914/files - new: https://git.openjdk.org/jdk/pull/25914/files/382c9b9e..89697983 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25914&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25914&range=02-03 Stats: 31 lines in 5 files changed: 14 ins; 0 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/25914.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25914/head:pull/25914 PR: https://git.openjdk.org/jdk/pull/25914 From mhaessig at openjdk.org Fri Jun 27 09:17:42 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Fri, 27 Jun 2025 09:17:42 GMT Subject: RFR: 8360783: CTW: Skip deoptimization between tiers [v2] In-Reply-To: References: Message-ID: On Fri, 27 Jun 2025 08:38:31 GMT, Aleksey Shipilev wrote: >> When profiling CTW runs, I noticed we spend a lot of time dealing with deoptimization. We do this excessively, deoptimizing before compilation on every tier. This is excessive: Hotspot honors compilation requests on subsequent levels without the need for explicit deoptimization. Not doing deopt between tiers greatly improves CTW performance. 
>> >> A taste of improvements, about 15% less CPU spent: >> >> >> $ time make test TEST=applications/ctw/modules >> >> # Current >> real 5m1.616s >> user 79m41.398s >> sys 14m39.607s >> >> # Patched >> real 3m55.411s >> user 69m19.227s >> sys 5m24.323s >> >> >> The compilation still works as expected, progressing through tiers 1..4: >> >> >> $ JAVA_OPTIONS="-XX:+PrintCompilation -XX:CICompilerCount=2" ./ctw.sh modules:jdk.compiler | tee out >> ... >> $ grep sun.tools.serialver.resources.serialver_de::getContents out >> 101783 55033 b 1 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) >> 101785 55036 b 2 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) >> 101786 55033 1 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) made not entrant: not used >> 101786 55038 b 3 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) >> 101787 55036 2 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) made not entrant: not used >> 101792 55040 b 4 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) >> 101797 55038 3 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) made not entrant: not used >> 101798 55040 4 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) made not entrant: marked for deoptimization > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/testlibrary/ctw/src/sun/hotspot/tools/ctw/Compiler.java > > Co-authored-by: Tobias Hartmann Those are quite some time savings! Thank you for fixing this. Looks good to me. ------------- Marked as reviewed by mhaessig (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/26013#pullrequestreview-2965590557 From aph at openjdk.org Fri Jun 27 09:49:43 2025 From: aph at openjdk.org (Andrew Haley) Date: Fri, 27 Jun 2025 09:49:43 GMT Subject: RFR: 8360654: AArch64: Remove redundant dmb from C1 compareAndSet In-Reply-To: <3ZAHrkER2pU6L346Y40fDkndJhkFGjBrQQ4xX7cx80w=.527c87b5-f2ec-43eb-be68-d0b802c76940@github.com> References: <3ZAHrkER2pU6L346Y40fDkndJhkFGjBrQQ4xX7cx80w=.527c87b5-f2ec-43eb-be68-d0b802c76940@github.com> Message-ID: On Thu, 26 Jun 2025 12:13:19 GMT, Samuel Chee wrote: > AtomicLong.CompareAndSet has the following assembly dump snippet which gets emitted from the intermediary LIRGenerator::atomic_cmpxchg: > > ;; cmpxchg { > 0x0000e708d144cf60: mov x8, x2 > 0x0000e708d144cf64: casal x8, x3, [x0] > 0x0000e708d144cf68: cmp x8, x2 > ;; 0x1F1F1F1F1F1F1F1F > 0x0000e708d144cf6c: mov x8, #0x1f1f1f1f1f1f1f1f > ;; } cmpxchg > 0x0000e708d144cf70: cset x8, ne // ne = any > 0x0000e708d144cf74: dmb ish > > > According to the Oracle Java Specification, AtomicLong.CompareAndSet [1] has the same memory effects as specified by VarHandle.compareAndSet which has the following effects: [2] > >> Atomically sets the value of a variable to the >> newValue with the memory semantics of setVolatile if >> the variable's current value, referred to as the witness >> value, == the expectedValue, as accessed with the memory >> semantics of getVolatile. > > > > Hence the release on the store due to setVolatile only occurs if the compare is successful. Since casal already satisfies these requirements, the dmb does not need to occur to ensure memory ordering in case the compare fails and a release does not happen. > > Hence we remove the dmb from both casl and casw (same logic applies to the non-long variant) > > This is also reflected by C2 not having a dmb for the same respective method. 
> > [1] https://docs.oracle.com/en/java/javase/24/docs/api/java.base/java/util/concurrent/atomic/AtomicLong.html#compareAndSet(long,long) > [2] https://docs.oracle.com/en/java/javase/24/docs/api/java.base/java/lang/invoke/VarHandle.html#compareAndSet(java.lang.Object...) I'm not sure this is safe. In C1, if we had a stlxr that succeeded, followed by a volatile load, then we'd generate: stlxr status, data, [addr] cbnz status, retry ldr r1, [something] dmb ish I think there's nothing to prevent the `ldr` from being reordered with the `stlxr`, violating sequential consistency. You could argue that subsequent memory operations are control dependent on the `cbnz` so can't be reordered, but if a microarchitecture predicts that the `stlxr` can never fail, the control dependency can be folded away. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26000#issuecomment-3012390049 From duke at openjdk.org Fri Jun 27 09:59:40 2025 From: duke at openjdk.org (Samuel Chee) Date: Fri, 27 Jun 2025 09:59:40 GMT Subject: RFR: 8360654: AArch64: Remove redundant dmb from C1 compareAndSet In-Reply-To: <3ZAHrkER2pU6L346Y40fDkndJhkFGjBrQQ4xX7cx80w=.527c87b5-f2ec-43eb-be68-d0b802c76940@github.com> References: <3ZAHrkER2pU6L346Y40fDkndJhkFGjBrQQ4xX7cx80w=.527c87b5-f2ec-43eb-be68-d0b802c76940@github.com> Message-ID: On Thu, 26 Jun 2025 12:13:19 GMT, Samuel Chee wrote: > AtomicLong.CompareAndSet has the following assembly dump snippet which gets emitted from the intermediary LIRGenerator::atomic_cmpxchg: > > ;; cmpxchg { > 0x0000e708d144cf60: mov x8, x2 > 0x0000e708d144cf64: casal x8, x3, [x0] > 0x0000e708d144cf68: cmp x8, x2 > ;; 0x1F1F1F1F1F1F1F1F > 0x0000e708d144cf6c: mov x8, #0x1f1f1f1f1f1f1f1f > ;; } cmpxchg > 0x0000e708d144cf70: cset x8, ne // ne = any > 0x0000e708d144cf74: dmb ish > > > According to the Oracle Java Specification, AtomicLong.CompareAndSet [1] has the same memory effects as specified by VarHandle.compareAndSet which has the following effects: [2] > >> Atomically 
sets the value of a variable to the >> newValue with the memory semantics of setVolatile if >> the variable's current value, referred to as the witness >> value, == the expectedValue, as accessed with the memory >> semantics of getVolatile. > > > > Hence the release on the store due to setVolatile only occurs if the compare is successful. Since casal already satisfies these requirements, the dmb does not need to occur to ensure memory ordering in case the compare fails and a release does not happen. > > Hence we remove the dmb from both casl and casw (same logic applies to the non-long variant) > > This is also reflected by C2 not having a dmb for the same respective method. > > [1] https://docs.oracle.com/en/java/javase/24/docs/api/java.base/java/util/concurrent/atomic/AtomicLong.html#compareAndSet(long,long) > [2] https://docs.oracle.com/en/java/javase/24/docs/api/java.base/java/lang/invoke/VarHandle.html#compareAndSet(java.lang.Object...) Thanks for the feedback, could it be sufficient then to emit the dmb when not using LSE? And when using LSE, to not do so? I have tested it with jcstress, although I have not done so using the -XX:-UseLSE flag. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26000#issuecomment-3012422039 From thartmann at openjdk.org Fri Jun 27 10:19:40 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 27 Jun 2025 10:19:40 GMT Subject: RFR: 8359120: Improve warning message when fail to load hsdis library [v2] In-Reply-To: <-i6UPk-bhy9RnnCus_JbJ1nQ63nMX9djubON9WBbHQ8=.a2305566-563e-4171-b526-bcd645de51a3@github.com> References: <-i6UPk-bhy9RnnCus_JbJ1nQ63nMX9djubON9WBbHQ8=.a2305566-563e-4171-b526-bcd645de51a3@github.com> Message-ID: <7UkSbnceEz4PY3UDwyR9iOseuvS4sD8FBBGl96mG_lk=.e94b4126-9df5-406b-a3f3-b21439d848e6@github.com> On Mon, 23 Jun 2025 08:56:12 GMT, Taizo Kurashige wrote: >> This PR is improvement of warning message when fail to load hsdis library. 
>> >> [JDK-8287001](https://bugs.openjdk.org/browse/JDK-8287001) introduced a warning on hsdis library load failure. This is useful when the user executes -XX:+PrintAssembly, etc. >> >> However, I think that when hs_err occurs, users might be confused by this warning printed by Xlog. Because users are not likely to know that hsdis is loaded for the [MachCode] section of the hs_err report, they may wonder, for example, "Why do I get warnings about hsdis load errors when -XX:+PrintAssembly is not specified?." >> >> To clear up this confusion, I suggest printing a warning just before [MachCode]. >> >>
>> >> sample output >> >> If hs_err occurs and hsdis load fails without the option to specify where the hs_err report should be output, the following is output to the hs_err_pir log file: >> >> . >> . >> native method entry point (kind = native) [0x000001ae8753cec0, 0x000001ae8753dac0] 3072 bytes >> >> Loading hsdis library failed, so undisassembled code is printed in the below [MachCode] section >> [MachCode] >> 0x000001ae8753cec0: 488b 4b08 | 0fb7 492e | 584c 8d74 | ccf8 6800 | 0000 0068 | 0000 0000 | 5055 488b | ec41 5548 >> 0x000001ae8753cee0: 8b43 084c | 8d68 3848 | 8b40 0868 | 0000 0000 | 5348 8b50 | 18 >> . >> . >> >> >> If -XX:+PrintAssembly is specified and hsdis load fails, the following is output to the stdout >> >> $ java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly -version >> OpenJDK 64-Bit Server VM warning: PrintAssembly is enabled; turning on DebugNonSafepoints to gain additional output >> >> ============================= C1-compiled nmethod ============================== >> ----------------------------------- Assembly ----------------------------------- >> >> Compiled method (c1) 57 2 3 java.lang.Object:: (1 bytes) >> total in heap [0x0000024a08a00008,0x0000024a08a00208] = 512 >> . >> . >> >> [Constant Pool (empty)] >> >> >> Loading hsdis library failed, so undisassembled code is printed in the below [MachCode] section >> [MachCode] >> [Instructions begin] >> 0x0000024a08a00100: 6666 660f | 1f84 0000 | 0000 0066 | 6666 9066 | 6690 448b | 5208 443b >> . >> . >> [Constant Pool (empty)] >> >> >> Loading hsdis library failed, so undisassembled code is printed in the below [MachCode] section >> [MachCode] >> [Verified Entry Point] >> # {method} {0x00000000251a1898} 'toUnsignedInt' '(B)I' in 'java/lang/Byte >> . >> . >> >> >>
>> >> Since... > > Taizo Kurashige has updated the pull request incrementally with one additional commit since the last revision: > > Fix message and revert lines for Xlog Don't we already have such error reporting code in `Disassembler::load_library`? https://github.com/openjdk/jdk/blob/c33c1cfe7349ac657cd7bf54861227709d3c8f1b/src/hotspot/share/compiler/disassembler.cpp#L863 ------------- PR Comment: https://git.openjdk.org/jdk/pull/25726#issuecomment-3012486620 From aph at openjdk.org Fri Jun 27 10:19:39 2025 From: aph at openjdk.org (Andrew Haley) Date: Fri, 27 Jun 2025 10:19:39 GMT Subject: RFR: 8360654: AArch64: Remove redundant dmb from C1 compareAndSet In-Reply-To: References: <3ZAHrkER2pU6L346Y40fDkndJhkFGjBrQQ4xX7cx80w=.527c87b5-f2ec-43eb-be68-d0b802c76940@github.com> Message-ID: <6fTzfD2NqLEi-QSQVfZqZgbDl9aa36D8w_DkqGAzprY=.26033209-8c49-4ef8-a2c7-b44b443ac08b@github.com> On Fri, 27 Jun 2025 09:56:33 GMT, Samuel Chee wrote: > Thanks for the feedback, could it be sufficient then to emit the dmb when not using LSE? And when using LSE, to not do so? I don't think that LSE helps here. There's nothing to stop an unordered load moving before a `casal`. See _atomic-ordered-before_ in _The AArch64 Application Level Memory Model_. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/26000#issuecomment-3012484247 From aph at openjdk.org Fri Jun 27 10:28:39 2025 From: aph at openjdk.org (Andrew Haley) Date: Fri, 27 Jun 2025 10:28:39 GMT Subject: RFR: 8360654: AArch64: Remove redundant dmb from C1 compareAndSet In-Reply-To: <3ZAHrkER2pU6L346Y40fDkndJhkFGjBrQQ4xX7cx80w=.527c87b5-f2ec-43eb-be68-d0b802c76940@github.com> References: <3ZAHrkER2pU6L346Y40fDkndJhkFGjBrQQ4xX7cx80w=.527c87b5-f2ec-43eb-be68-d0b802c76940@github.com> Message-ID: On Thu, 26 Jun 2025 12:13:19 GMT, Samuel Chee wrote: > AtomicLong.CompareAndSet has the following assembly dump snippet which gets emitted from the intermediary LIRGenerator::atomic_cmpxchg: > > ;; cmpxchg { > 0x0000e708d144cf60: mov x8, x2 > 0x0000e708d144cf64: casal x8, x3, [x0] > 0x0000e708d144cf68: cmp x8, x2 > ;; 0x1F1F1F1F1F1F1F1F > 0x0000e708d144cf6c: mov x8, #0x1f1f1f1f1f1f1f1f > ;; } cmpxchg > 0x0000e708d144cf70: cset x8, ne // ne = any > 0x0000e708d144cf74: dmb ish > > > According to the Oracle Java Specification, AtomicLong.CompareAndSet [1] has the same memory effects as specified by VarHandle.compareAndSet which has the following effects: [2] > >> Atomically sets the value of a variable to the >> newValue with the memory semantics of setVolatile if >> the variable's current value, referred to as the witness >> value, == the expectedValue, as accessed with the memory >> semantics of getVolatile. > > > > Hence the release on the store due to setVolatile only occurs if the compare is successful. Since casal already satisfies these requirements, the dmb does not need to occur to ensure memory ordering in case the compare fails and a release does not happen. > > Hence we remove the dmb from both casl and casw (same logic applies to the non-long variant) > > This is also reflected by C2 not having a dmb for the same respective method. 
> > [1] https://docs.oracle.com/en/java/javase/24/docs/api/java.base/java/util/concurrent/atomic/AtomicLong.html#compareAndSet(long,long) > [2] https://docs.oracle.com/en/java/javase/24/docs/api/java.base/java/lang/invoke/VarHandle.html#compareAndSet(java.lang.Object...) The best way to check this stuff is with the [herd7](https://diy.inria.fr/www/) memory model simulator, which can find reorderings you wouldn't believe. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26000#issuecomment-3012506227 From duke at openjdk.org Fri Jun 27 12:05:39 2025 From: duke at openjdk.org (Samuel Chee) Date: Fri, 27 Jun 2025 12:05:39 GMT Subject: RFR: 8360654: AArch64: Remove redundant dmb from C1 compareAndSet In-Reply-To: <3ZAHrkER2pU6L346Y40fDkndJhkFGjBrQQ4xX7cx80w=.527c87b5-f2ec-43eb-be68-d0b802c76940@github.com> References: <3ZAHrkER2pU6L346Y40fDkndJhkFGjBrQQ4xX7cx80w=.527c87b5-f2ec-43eb-be68-d0b802c76940@github.com> Message-ID: On Thu, 26 Jun 2025 12:13:19 GMT, Samuel Chee wrote: > AtomicLong.CompareAndSet has the following assembly dump snippet which gets emitted from the intermediary LIRGenerator::atomic_cmpxchg: > > ;; cmpxchg { > 0x0000e708d144cf60: mov x8, x2 > 0x0000e708d144cf64: casal x8, x3, [x0] > 0x0000e708d144cf68: cmp x8, x2 > ;; 0x1F1F1F1F1F1F1F1F > 0x0000e708d144cf6c: mov x8, #0x1f1f1f1f1f1f1f1f > ;; } cmpxchg > 0x0000e708d144cf70: cset x8, ne // ne = any > 0x0000e708d144cf74: dmb ish > > > According to the Oracle Java Specification, AtomicLong.CompareAndSet [1] has the same memory effects as specified by VarHandle.compareAndSet which has the following effects: [2] > >> Atomically sets the value of a variable to the >> newValue with the memory semantics of setVolatile if >> the variable's current value, referred to as the witness >> value, == the expectedValue, as accessed with the memory >> semantics of getVolatile. > > > > Hence the release on the store due to setVolatile only occurs if the compare is successful. 
Since casal already satisfies these requirements, the dmb does not need to occur to ensure memory ordering in case the compare fails and a release does not happen. > > Hence we remove the dmb from both casl and casw (same logic applies to the non-long variant) > > This is also reflected by C2 not having a dmb for the same respective method. > > [1] https://docs.oracle.com/en/java/javase/24/docs/api/java.base/java/util/concurrent/atomic/AtomicLong.html#compareAndSet(long,long) > [2] https://docs.oracle.com/en/java/javase/24/docs/api/java.base/java/lang/invoke/VarHandle.html#compareAndSet(java.lang.Object...) I can double check with the herd7 simulator, but since the `casal` will always produce an acquire, to me it seems impossible that a load can be moved before the `casal` due to the acquire within the `casal`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26000#issuecomment-3012781394 From galder at openjdk.org Fri Jun 27 12:45:41 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Fri, 27 Jun 2025 12:45:41 GMT Subject: RFR: 8360520: RISC-V: C1: Fix primitive array clone intrinsic regression after JDK-8333154 [v2] In-Reply-To: References: Message-ID: On Thu, 26 Jun 2025 14:27:21 GMT, Feilong Jiang wrote: >> Hi, please consider. >> [JDK-8333154](https://bugs.openjdk.org/browse/JDK-8333154) Implemented C1 clone intrinsic that reuses arraycopy code for primitive arrays for RISC-V. >> The new instruction flag `OmitChecksFlag` (introduced by [JDK-8302850](https://bugs.openjdk.org/browse/JDK-8302850)) is used to avoid instantiation of array copy stubs for primitive array clones. >> If `OmitChecksFlag` is set, all flags (including the `unaligned` flag) will be cleared before generating the `LIR_OpArrayCopy` node. >> This may lead to incorrect selection of the arraycopy function when `-XX:+UseCompactObjectHeaders` is enabled, causing the `unaligned` flag to be set for arraycopy.
>> We observed performance regression on P550 SBC through the corresponding JMH tests when COH is enabled. >> >> This pr keeps the `unaligned` flag on RISC-V to ensure the arraycopy function is selected correctly. >> The other platforms are not affected as the flag is always `0` when `OmitChecksFlag` is true. >> >> Test on linux-riscv64: >> - [x] Tier1-3 >> >> JMH data on P550 SBC for reference (w/o and w/ the patch): >> >> Before: >> >> Without COH: >> >> Benchmark (size) Mode Cnt Score Error Units >> ArrayClone.byteArraycopy 0 avgt 15 50.854 ? 0.379 ns/op >> ArrayClone.byteArraycopy 10 avgt 15 74.294 ? 0.449 ns/op >> ArrayClone.byteArraycopy 100 avgt 15 81.847 ? 0.082 ns/op >> ArrayClone.byteArraycopy 1000 avgt 15 480.106 ? 0.369 ns/op >> ArrayClone.byteClone 0 avgt 15 90.146 ? 0.299 ns/op >> ArrayClone.byteClone 10 avgt 15 130.525 ? 0.384 ns/op >> ArrayClone.byteClone 100 avgt 15 251.942 ? 0.122 ns/op >> ArrayClone.byteClone 1000 avgt 15 407.580 ? 0.318 ns/op >> ArrayClone.intArraycopy 0 avgt 15 49.984 ? 0.436 ns/op >> ArrayClone.intArraycopy 10 avgt 15 76.302 ? 1.388 ns/op >> ArrayClone.intArraycopy 100 avgt 15 267.487 ? 0.329 ns/op >> ArrayClone.intArraycopy 1000 avgt 15 1157.444 ? 1.588 ns/op >> ArrayClone.intClone 0 avgt 15 90.130 ? 0.257 ns/op >> ArrayClone.intClone 10 avgt 15 183.619 ? 0.588 ns/op >> ArrayClone.intClone 100 avgt 15 296.491 ? 0.246 ns/op >> ArrayClone.intClone 1000 avgt 15 828.695 ? 1.501 ns/op >> >> ------------------------------------------------------------------------- >> With COH: >> >> Benchmark (size) Mode Cnt Score Error Un... > > Feilong Jiang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: > > - Merge branch 'master' of https://github.com/openjdk/jdk into riscv-fix-c1-primitive-clone > - check unaligned flag at LIR_OpArrayCopy to avoid using AvoidUnalignedAccesses > - riscv: fix c1 primitive array clone intrinsic regression I can't really review it since I'm not familiar with riscv, the flag, or the COH logic. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25976#issuecomment-3012953001 From eastigeevich at openjdk.org Fri Jun 27 12:51:45 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Fri, 27 Jun 2025 12:51:45 GMT Subject: Integrated: 8359435: AArch64: add support for SB instruction to MacroAssembler::spin_wait In-Reply-To: References: Message-ID: <2RO-3DnjMv2gBDmTQ3levV1wrDYR4MhuUKzafgicGnE=.b8f704bc-4eb8-41eb-a94c-d68a98a26acd@github.com> On Fri, 13 Jun 2025 14:00:08 GMT, Evgeny Astigeevich wrote: > There is data showing that SB-based spin pauses are less disruptive than ISB-based ones, so performance is better: > - https://github.com/mysql/mysql-server/pull/611 > - https://github.com/facebook/folly/pull/2390 > > There are discussions regarding using it for spin pauses: > - https://github.com/gperftools/gperftools/pull/1594 > - https://github.com/haproxy/haproxy/pull/2974 > > Instruction support: https://developer.arm.com/documentation/109697/2025_03/Feature-descriptions/The-Armv8-5-architecture-extension > > CPUs supporting it: > - Apple M2+ > - Neoverse-N2 > - Neoverse-V2 > > Tests: > - Gtests passed. > - `test/hotspot/jtreg/compiler/onSpinWait/TestOnSpinWaitAArch64.java` passed. > - `test/hotspot/jtreg/compiler/onSpinWait/TestOnSpinWaitNoneAArch64.java` passed. > > Micro-benchmarks (Graviton 4, c8g.16xlarge (64 CPU), Neoverse-V2): > > > Benchmark Mode Cnt Score Error Units Diff > ThreadOnSpinWait.ISB avgt 15 11.875 ± 0.129 ns/op > ThreadOnSpinWait.SB avgt 15 6.930 ±
0.054 ns/op -42% > > Benchmark (maxNum) (threadCount) Mode Cnt Score Error Units Diff > ThreadOnSpinWaitSharedCounter.ISB 1000000 4 avgt 15 49.874 ± 10.160 ms/op > ThreadOnSpinWaitSharedCounter.SB 1000000 4 avgt 15 26.948 ± 4.036 ms/op -46% > ThreadOnSpinWaitSharedCounter.ISB 1000000 8 avgt 15 65.173 ± 7.228 ms/op > ThreadOnSpinWaitSharedCounter.SB 1000000 8 avgt 15 44.476 ± 1.292 ms/op -31% > ThreadOnSpinWaitSharedCounter.ISB 1000000 16 avgt 15 177.805 ± 44.925 ms/op > ThreadOnSpinWaitSharedCounter.SB 1000000 16 avgt 15 67.267 ± 13.814 ms/op -62% > ThreadOnSpinWaitSharedCounter.ISB 1000000 32 avgt 15 265.149 ± 5.353 ms/op > ThreadOnSpinWaitSharedCounter.SB 1000000 32 avgt 15 42.297 ± 3.436 ms/op -84% > ThreadOnSpinWaitSharedCounter.ISB 1000000 48 avgt 15 125.231 ± 9.272 ms/op > ThreadOnSpinWaitSharedCounter.SB 1000000 48 avgt 15 83.504 ± 8.561 ms/op -33% > ThreadOnSpinWaitSharedCounter.ISB 1000000 64 avgt 15 124.505 ± 7.543 ms/op > ThreadOnSpinWaitSharedCounter.SB 1000000 64 avgt 15 86.588 ± 9.519 ms/op -30% This pull request has now been integrated. Changeset: ecd2d830 Author: Evgeny Astigeevich URL: https://git.openjdk.org/jdk/commit/ecd2d83096a1fea7d5086736306770bcffa4fdb6 Stats: 332 lines in 12 files changed: 34 ins; 1 del; 297 mod 8359435: AArch64: add support for SB instruction to MacroAssembler::spin_wait Reviewed-by: shade, aph ------------- PR: https://git.openjdk.org/jdk/pull/25801 From aph at openjdk.org Fri Jun 27 12:56:44 2025 From: aph at openjdk.org (Andrew Haley) Date: Fri, 27 Jun 2025 12:56:44 GMT Subject: RFR: 8360654: AArch64: Remove redundant dmb from C1 compareAndSet In-Reply-To: References: <3ZAHrkER2pU6L346Y40fDkndJhkFGjBrQQ4xX7cx80w=.527c87b5-f2ec-43eb-be68-d0b802c76940@github.com> Message-ID: On Fri, 27 Jun 2025 12:03:13 GMT, Samuel Chee wrote: > Clause 9 of before-barrier-ordering in the Arm Architecture reference manual also supports this. Which clause is that?
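At the Java level, the ordering being debated in this thread has the shape of the classic store-buffering litmus test: each thread does a successful compareAndSet on its own variable and then a volatile read of the other one, and sequential consistency forbids both reads returning 0. The following is a minimal sketch of that shape, with hypothetical class and method names. It only exercises whatever code the running JVM happens to generate, so a clean run proves nothing about a particular instruction sequence; herd7, as suggested earlier in the thread, is the tool for that.

```java
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

public class CasThenLoadLitmus {
    // Runs the two-thread shape `rounds` times and counts how often both
    // threads read 0, which sequential consistency does not allow.
    static int countForbidden(int rounds) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        int forbidden = 0;
        try {
            for (int i = 0; i < rounds; i++) {
                AtomicLong x = new AtomicLong();
                AtomicLong y = new AtomicLong();
                CyclicBarrier start = new CyclicBarrier(2);
                // Thread 1: CAS x from 0 to 1, then volatile-read y.
                Future<Long> r1 = pool.submit(() -> {
                    start.await();
                    x.compareAndSet(0, 1);
                    return y.get();
                });
                // Thread 2: CAS y from 0 to 1, then volatile-read x.
                Future<Long> r2 = pool.submit(() -> {
                    start.await();
                    y.compareAndSet(0, 1);
                    return x.get();
                });
                if (r1.get() == 0 && r2.get() == 0) {
                    forbidden++;
                }
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
        return forbidden;
    }

    public static void main(String[] args) {
        System.out.println("forbidden (0, 0) outcomes: " + countForbidden(20_000));
    }
}
```

jcstress is the better harness for this kind of test, since it is built to provoke the interesting interleavings far more often than a simple loop like this one.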
------------- PR Comment: https://git.openjdk.org/jdk/pull/26000#issuecomment-3012987115 From bkilambi at openjdk.org Fri Jun 27 15:02:55 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Fri, 27 Jun 2025 15:02:55 GMT Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v8] In-Reply-To: References: Message-ID: > This patch adds aarch64 backend support for SelectFromTwoVector operation which was recently introduced in VectorAPI. > > It implements this operation using a two table vector lookup instruction - "tbl" which is available only in Neon and SVE2. > > For 128-bit vector length : Neon tbl instruction is generated if UseSVE < 2 and SVE2 "tbl" instruction is generated if UseSVE == 2. > > For > 128-bit vector length : Currently there are no machines which have vector length > 128-bit and support SVE2. For all those machines with vector length > 128-bit and UseSVE < 2, this operation is not supported. The inline expander for this operation would fail and lowered IR will be generated which is a mix of two rearrange and one blend operation. > > This patch also adds a boolean "need_load_shuffle" in the inline expander for this operation to test if the platform requires VectorLoadShuffle operation to be generated. Without this, the lowering IR was not being generated on aarch64 and the performance was quite poor. 
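To make the two code shapes in this description concrete, here is a scalar Java sketch of the direct two-table lookup next to the "two rearrange plus one blend" lowering. It is an illustration with hypothetical names working on plain `int[]` arrays, not the Vector API or the C2 code, and it assumes that out-of-range indices wrap into `[0, 2 * VLEN)`.

```java
import java.util.Arrays;

public class SelectFromTwoVectorSketch {
    // Direct form: one lookup into the concatenation of v1 and v2,
    // which is what a two-table "tbl" provides in a single instruction.
    static int[] selectFrom(int[] idx, int[] v1, int[] v2) {
        int n = v1.length;
        int[] r = new int[n];
        for (int i = 0; i < n; i++) {
            int j = Math.floorMod(idx[i], 2 * n); // assumption: indices wrap
            r[i] = (j < n) ? v1[j] : v2[j - n];
        }
        return r;
    }

    // Single-table rearrange with indices wrapped into [0, n).
    static int[] rearrange(int[] v, int[] idx) {
        int n = v.length;
        int[] r = new int[n];
        for (int i = 0; i < n; i++) {
            r[i] = v[Math.floorMod(idx[i], n)];
        }
        return r;
    }

    // Lowered form: rearrange both inputs, then blend lane by lane
    // depending on which table each index refers to.
    static int[] selectFromLowered(int[] idx, int[] v1, int[] v2) {
        int n = v1.length;
        int[] r1 = rearrange(v1, idx);
        int[] r2 = rearrange(v2, idx);
        int[] r = new int[n];
        for (int i = 0; i < n; i++) {
            boolean fromSecond = Math.floorMod(idx[i], 2 * n) >= n;
            r[i] = fromSecond ? r2[i] : r1[i];
        }
        return r;
    }

    public static void main(String[] args) {
        int[] v1 = {10, 11, 12, 13};
        int[] v2 = {20, 21, 22, 23};
        int[] idx = {0, 5, 2, 7};
        System.out.println(Arrays.toString(selectFrom(idx, v1, v2)));        // [10, 21, 12, 23]
        System.out.println(Arrays.toString(selectFromLowered(idx, v1, v2))); // [10, 21, 12, 23]
    }
}
```

Both forms give the same result; the lowered one simply needs three operations where the two-table lookup needs one.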
> > Performance numbers with this patch on a 128-bit, SVE2 supporting machine is shown below - > > > Benchmark (size) Mode Cnt Gain > SelectFromBenchmark.selectFromByteVector 1024 thrpt 9 1.43 > SelectFromBenchmark.selectFromByteVector 2048 thrpt 9 1.48 > SelectFromBenchmark.selectFromDoubleVector 1024 thrpt 9 68.55 > SelectFromBenchmark.selectFromDoubleVector 2048 thrpt 9 72.07 > SelectFromBenchmark.selectFromFloatVector 1024 thrpt 9 1.69 > SelectFromBenchmark.selectFromFloatVector 2048 thrpt 9 1.52 > SelectFromBenchmark.selectFromIntVector 1024 thrpt 9 1.50 > SelectFromBenchmark.selectFromIntVector 2048 thrpt 9 1.52 > SelectFromBenchmark.selectFromLongVector 1024 thrpt 9 85.38 > SelectFromBenchmark.selectFromLongVector 2048 thrpt 9 80.93 > SelectFromBenchmark.selectFromShortVector 1024 thrpt 9 1.48 > SelectFromBenchmark.selectFromShortVector 2048 thrpt 9 1.49 > > > Gain column refers to the ratio of thrpt between this patch and the master branch after applying changes in the inline expander. 
Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: code style issues fixed ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23570/files - new: https://git.openjdk.org/jdk/pull/23570/files/f8978870..22e42de3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23570&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23570&range=06-07 Stats: 75 lines in 4 files changed: 0 ins; 0 del; 75 mod Patch: https://git.openjdk.org/jdk/pull/23570.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23570/head:pull/23570 PR: https://git.openjdk.org/jdk/pull/23570 From aph at openjdk.org Fri Jun 27 15:33:41 2025 From: aph at openjdk.org (Andrew Haley) Date: Fri, 27 Jun 2025 15:33:41 GMT Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector [v8] In-Reply-To: References: Message-ID: On Fri, 27 Jun 2025 15:02:55 GMT, Bhavana Kilambi wrote: >> This patch adds aarch64 backend support for SelectFromTwoVector operation which was recently introduced in VectorAPI. >> >> It implements this operation using a two table vector lookup instruction - "tbl" which is available only in Neon and SVE2. >> >> For 128-bit vector length : Neon tbl instruction is generated if UseSVE < 2 and SVE2 "tbl" instruction is generated if UseSVE == 2. >> >> For > 128-bit vector length : Currently there are no machines which have vector length > 128-bit and support SVE2. For all those machines with vector length > 128-bit and UseSVE < 2, this operation is not supported. The inline expander for this operation would fail and lowered IR will be generated which is a mix of two rearrange and one blend operation. >> >> This patch also adds a boolean "need_load_shuffle" in the inline expander for this operation to test if the platform requires VectorLoadShuffle operation to be generated. Without this, the lowering IR was not being generated on aarch64 and the performance was quite poor. 
>> >> Performance numbers with this patch on a 128-bit, SVE2 supporting machine is shown below - >> >> >> Benchmark (size) Mode Cnt Gain >> SelectFromBenchmark.selectFromByteVector 1024 thrpt 9 1.43 >> SelectFromBenchmark.selectFromByteVector 2048 thrpt 9 1.48 >> SelectFromBenchmark.selectFromDoubleVector 1024 thrpt 9 68.55 >> SelectFromBenchmark.selectFromDoubleVector 2048 thrpt 9 72.07 >> SelectFromBenchmark.selectFromFloatVector 1024 thrpt 9 1.69 >> SelectFromBenchmark.selectFromFloatVector 2048 thrpt 9 1.52 >> SelectFromBenchmark.selectFromIntVector 1024 thrpt 9 1.50 >> SelectFromBenchmark.selectFromIntVector 2048 thrpt 9 1.52 >> SelectFromBenchmark.selectFromLongVector 1024 thrpt 9 85.38 >> SelectFromBenchmark.selectFromLongVector 2048 thrpt 9 80.93 >> SelectFromBenchmark.selectFromShortVector 1024 thrpt 9 1.48 >> SelectFromBenchmark.selectFromShortVector 2048 thrpt 9 1.49 >> >> >> Gain column refers to the ratio of thrpt between this patch and the master branch after applying changes in the inline expander. > > Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: > > code style issues fixed src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 4231: > 4229: > 4230: // SVE/SVE2 Programmable table lookup in one or two vector table (zeroing) > 4231: void sve_tbl(FloatRegister Zd, SIMD_RegVariant T, FloatRegister Zn, unsigned reg_count, FloatRegister Zm) { This would be better: private: void _sve_tbl(FloatRegister Zd, SIMD_RegVariant T, FloatRegister Zn, unsigned reg_count, FloatRegister Zm) { ... then 2 patterns ... public: void sve_tbl(FloatRegister Zd, SIMD_RegVariant T, FloatRegister Zn1, FloatRegister Zn2, unsigned reg_count, FloatRegister Zm) void sve_tbl(FloatRegister Zd, SIMD_RegVariant T, FloatRegister Zn1, FloatRegister Zm) { ... 
and make sure that `Zn1+ 1 == Zn2` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r2172275666 From sviswanathan at openjdk.org Fri Jun 27 16:00:45 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 27 Jun 2025 16:00:45 GMT Subject: RFR: 8358179: Performance regression in Math.cbrt [v2] In-Reply-To: References: <45l5EvxoRINI1_Ep2_snJzKNMPo4-dPXADalLN1fq1Y=.9f697a35-ee7b-4e7a-9e5e-ff33911b3b21@github.com> Message-ID: On Fri, 27 Jun 2025 01:43:16 GMT, Mohamed Issa wrote: >> The changes described below are meant to resolve the performance regression introduced by the **x86_64 cbrt** double precision floating point scalar intrinsic in #24470. >> >> 1. Check for +0, -0, +INF, -INF, and NaN before any other input values. >> 2. If these special values are found, return immediately with minimal modifications to the result register. >> >> The commands to run all relevant micro-benchmarks are posted below. >> >> `make test TEST="micro:CbrtPerf.CbrtPerfRanges"` >> `make test TEST="micro:CbrtPerf.CbrtPerfSpecialValues"` >> >> The results of all tests posted below were captured with an [Intel? Xeon 8488C](https://www.intel.com/content/www/us/en/products/sku/231730/intel-xeon-platinum-8480c-processor-105m-cache-2-00-ghz/specifications.html) using [OpenJDK v26-b1](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B1) as the baseline version. The term _baseline1_ refers to runs with the intrinsic enabled and _baseline2_ refers to runs with the intrinsic disabled. >> >> Each result is the mean of 8 individual runs, and the input ranges used match those from the original Java implementation. Overall, the changes provide a significant uplift over _baseline1_ except for a mild regression in the (**2^(-1022) <= |x| < INF**) input range, which is expected due to the extra checks. 
When comparing against _baseline2_, the modified intrinsic still significantly outperforms it for the inputs (**-INF < x < INF**) that require heavy compute. However, the special value inputs that trigger fast path returns still perform better with _baseline2_.
>>
>> | Input range(s) | Baseline1 (ops/ms) | Change (ops/ms) | Change vs baseline1 (%) |
>> | :-------------------------------------: | :-------------------: | :------------------: | :--------------------------: |
>> | [-2^(-1022), 2^(-1022)] | 18470 | 20847 | +12.87 |
>> | (-INF, -2^(-1022)], [2^(-1022), INF) | 210538 | 198925 | -5.52 |
>> | [0] | 344990 | 627561 | +81.91 |
>> | [-0] | 291983 | 629941 | +115.75 |
>> | [INF] | 382685 | 542211 | +4...
>
> Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision:
>
>   Ensure ABS_MASK is a 128-bit memory sized location and only use equal enum for UCOMISD checks

Looks good to me.

-------------

Marked as reviewed by sviswanathan (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/25962#pullrequestreview-2967055776

From missa at openjdk.org  Fri Jun 27 16:00:46 2025
From: missa at openjdk.org (Mohamed Issa)
Date: Fri, 27 Jun 2025 16:00:46 GMT
Subject: RFR: 8358179: Performance regression in Math.cbrt
In-Reply-To: 
References: <45l5EvxoRINI1_Ep2_snJzKNMPo4-dPXADalLN1fq1Y=.9f697a35-ee7b-4e7a-9e5e-ff33911b3b21@github.com>
Message-ID: 

On Thu, 26 Jun 2025 11:46:12 GMT, Emanuel Peter wrote:

> I'll hold off with approval until someone else who is more knowledgeable has reviewed first. But feel free to ping me for a second review.

@eme64 Second review with the latest changes?
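
For readers following the cbrt thread, the special inputs that the PR routes to a fast path (+0, -0, +INF, -INF, NaN) can be observed from plain Java. The following is only a minimal sketch: the class and helper names are made up for illustration and are not part of the PR. It relies solely on the documented `Math.cbrt` contract that zeros and infinities come back with the same sign and NaN stays NaN.

```java
// Illustrative only: CbrtSpecialValues and its helpers are hypothetical names.
final class CbrtSpecialValues {
    /** True when both doubles share the same bit pattern, so +0.0 and -0.0 are told apart. */
    static boolean sameBits(double a, double b) {
        return Double.doubleToRawLongBits(a) == Double.doubleToRawLongBits(b);
    }

    /** True when cbrt(x) hands back x itself, or NaN for a NaN input. */
    static boolean isPassThrough(double x) {
        double r = Math.cbrt(x);
        return Double.isNaN(x) ? Double.isNaN(r) : sameBits(x, r);
    }

    public static void main(String[] args) {
        double[] specials = {
            0.0, -0.0, Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY, Double.NaN
        };
        for (double x : specials) {
            System.out.println("cbrt(" + x + ") = " + Math.cbrt(x)
                    + ", pass-through: " + isPassThrough(x));
        }
        // An ordinary input such as 8.0 needs the full computation instead.
        System.out.println("cbrt(8.0) = " + Math.cbrt(8.0)
                + ", pass-through: " + isPassThrough(8.0));
    }
}
```

Comparing bit patterns rather than using `==` matters here because `0.0 == -0.0` is true in Java, which would hide a lost sign on the zero results.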
-------------

PR Comment: https://git.openjdk.org/jdk/pull/25962#issuecomment-3013524947

From roland at openjdk.org  Fri Jun 27 16:29:45 2025
From: roland at openjdk.org (Roland Westrelin)
Date: Fri, 27 Jun 2025 16:29:45 GMT
Subject: [jdk25] RFR: 8356708: C2: loop strip mining expansion doesn't take sunk stores into account
In-Reply-To: 
References: 
Message-ID: 

On Mon, 23 Jun 2025 08:21:33 GMT, Tobias Hartmann wrote:

>> Hi all,
>>
>> This pull request contains a backport of commit [c11f36e6](https://github.com/openjdk/jdk/commit/c11f36e6200b6c39fd59530f28e9318c4153db49) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository.
>>
>> The commit being backported was authored by Roland Westrelin on 20 Jun 2025 and was reviewed by Roberto Castañeda Lozano and Emanuel Peter.
>>
>> Thanks!
>
> Looks good to me.

@TobiHartmann @eme64 thanks for the reviews.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25929#issuecomment-3013637864

From roland at openjdk.org  Fri Jun 27 16:29:46 2025
From: roland at openjdk.org (Roland Westrelin)
Date: Fri, 27 Jun 2025 16:29:46 GMT
Subject: [jdk25] Integrated: 8356708: C2: loop strip mining expansion doesn't take sunk stores into account
In-Reply-To: 
References: 
Message-ID: 

On Mon, 23 Jun 2025 07:19:27 GMT, Roland Westrelin wrote:

> Hi all,
>
> This pull request contains a backport of commit [c11f36e6](https://github.com/openjdk/jdk/commit/c11f36e6200b6c39fd59530f28e9318c4153db49) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository.
>
> The commit being backported was authored by Roland Westrelin on 20 Jun 2025 and was reviewed by Roberto Castañeda Lozano and Emanuel Peter.
>
> Thanks!

This pull request has now been integrated.
Changeset: eaaaae5b
Author:    Roland Westrelin
URL:       https://git.openjdk.org/jdk/commit/eaaaae5be95c16049f2cf8b50fbc55784f00fdda
Stats:     282 lines in 3 files changed: 255 ins; 1 del; 26 mod

8356708: C2: loop strip mining expansion doesn't take sunk stores into account

Reviewed-by: thartmann, epeter
Backport-of: c11f36e6200b6c39fd59530f28e9318c4153db49

-------------

PR: https://git.openjdk.org/jdk/pull/25929

From mhaessig at openjdk.org  Fri Jun 27 18:28:34 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Fri, 27 Jun 2025 18:28:34 GMT
Subject: RFR: 8360641: TestCompilerCounts fails after 8354727
Message-ID: <3mMrDF_446r7HudbsHIpdoWByBlnUpjFo7YzIty0KG8=.facc058f-3975-44c4-b2d4-93b8c64db185@github.com>

After integrating #25872 the calculation of the `CICompilerCount` ergonomic became dependent on the size of `NonNMethodCodeHeapSize`, which itself is an ergonomic based on the available memory. Thus, depending on the system, the test `compiler/arguments/TestCompilerCounts.java` failed, i.e. locally this failed, but not on CI servers.

This PR changes the test to reflect the changes introduced in #25872.

Testing:
 - [ ] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15932906313)
 - [ ] tier1,tier2 plus Oracle internal testing

-------------

Commit messages:
 - Fix test

Changes: https://git.openjdk.org/jdk/pull/26024/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26024&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8360641
  Stats: 39 lines in 1 file changed: 26 ins; 0 del; 13 mod
  Patch: https://git.openjdk.org/jdk/pull/26024.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26024/head:pull/26024

PR: https://git.openjdk.org/jdk/pull/26024

From kvn at openjdk.org  Fri Jun 27 19:45:39 2025
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Fri, 27 Jun 2025 19:45:39 GMT
Subject: RFR: 8360641: TestCompilerCounts fails after 8354727
In-Reply-To: <3mMrDF_446r7HudbsHIpdoWByBlnUpjFo7YzIty0KG8=.facc058f-3975-44c4-b2d4-93b8c64db185@github.com>
References: <3mMrDF_446r7HudbsHIpdoWByBlnUpjFo7YzIty0KG8=.facc058f-3975-44c4-b2d4-93b8c64db185@github.com>
Message-ID: 

On Fri, 27 Jun 2025 18:09:23 GMT, Manuel Hässig wrote:

> After integrating #25872 the calculation of the `CICompilerCount` ergonomic became dependent on the size of `NonNMethodCodeHeapSize`, which itself is an ergonomic based on the available memory. Thus, depending on the system, the test `compiler/arguments/TestCompilerCounts.java` failed, i.e. locally this failed, but not on CI servers.
>
> This PR changes the test to reflect the changes introduced in #25872.
>
> Testing:
>  - [ ] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15932906313)
>  - [ ] tier1,tier2 plus Oracle internal testing

test/hotspot/jtreg/compiler/arguments/TestCompilerCounts.java line 30:

> 28:  * @requires vm.flagless
> 29:  * @requires vm.bits == "64"
> 30:  * @requires vm.debug

Why did you limit it to debug VMs?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26024#discussion_r2172755127

From dholmes at openjdk.org  Sat Jun 28 04:49:48 2025
From: dholmes at openjdk.org (David Holmes)
Date: Sat, 28 Jun 2025 04:49:48 GMT
Subject: RFR: 8359435: AArch64: add support for SB instruction to MacroAssembler::spin_wait [v4]
In-Reply-To: <1tCoafZVhWpgCaR-CRDRhRMuJlIAMbujEsUv7dbptQ8=.1b987ec6-942e-4992-8485-f1f23a0159ab@github.com>
References: <1tCoafZVhWpgCaR-CRDRhRMuJlIAMbujEsUv7dbptQ8=.1b987ec6-942e-4992-8485-f1f23a0159ab@github.com>
Message-ID: <1TH-Fqu1PA7CGeVMxSjLBPEnQhs7BdFqJObm-C-rk8M=.b5bd0f46-a556-4491-b580-9339d512720f@github.com>

On Wed, 25 Jun 2025 13:56:09 GMT, Evgeny Astigeevich wrote:

>> There is data showing that SB-based spin pauses are less disruptive than ISB-based ones, so performance is better:
>> - https://github.com/mysql/mysql-server/pull/611
>> - https://github.com/facebook/folly/pull/2390
>>
>> There are discussions regarding using it for spin pauses:
>> - https://github.com/gperftools/gperftools/pull/1594
>> - https://github.com/haproxy/haproxy/pull/2974
>>
>> Instruction support: https://developer.arm.com/documentation/109697/2025_03/Feature-descriptions/The-Armv8-5-architecture-extension
>>
>> CPUs supporting it:
>> - Apple M2+
>> - Neoverse-N2
>> - Neoverse-V2
>>
>> Tests:
>> - Gtests passed.
>> - `test/hotspot/jtreg/compiler/onSpinWait/TestOnSpinWaitAArch64.java` passed.
>> - `test/hotspot/jtreg/compiler/onSpinWait/TestOnSpinWaitNoneAArch64.java` passed.
>>
>> Micro-benchmarks (Graviton 4, c8g.16xlarge (64 CPU), Neoverse-V2):
>>
>>
>> Benchmark               Mode  Cnt   Score   Error  Units  Diff
>> ThreadOnSpinWait.ISB    avgt   15  11.875 ± 0.129  ns/op
>> ThreadOnSpinWait.SB     avgt   15   6.930 ± 0.054  ns/op  -42%
>>
>> Benchmark                          (maxNum)  (threadCount)  Mode  Cnt    Score    Error  Units  Diff
>> ThreadOnSpinWaitSharedCounter.ISB   1000000              4  avgt   15   49.874 ± 10.160  ms/op
>> ThreadOnSpinWaitSharedCounter.SB    1000000              4  avgt   15   26.948 ±  4.036  ms/op  -46%
>> ThreadOnSpinWaitSharedCounter.ISB   1000000              8  avgt   15   65.173 ±  7.228  ms/op
>> ThreadOnSpinWaitSharedCounter.SB    1000000              8  avgt   15   44.476 ±  1.292  ms/op  -31%
>> ThreadOnSpinWaitSharedCounter.ISB   1000000             16  avgt   15  177.805 ± 44.925  ms/op
>> ThreadOnSpinWaitSharedCounter.SB    1000000             16  avgt   15   67.267 ± 13.814  ms/op  -62%
>> ThreadOnSpinWaitSharedCounter.ISB   1000000             32  avgt   15  265.149 ±  5.353  ms/op
>> ThreadOnSpinWaitSharedCounter.SB    1000000             32  avgt   15   42.297 ±  3.436  ms/op  -84%
>> ThreadOnSpinWaitSharedCounter.ISB   1000000             48  avgt   15  125.231 ±  9.272  ms/op
>> ThreadOnSpinWaitSharedCounter.SB    1000000             48  avgt   15   83.504 ±  8.561  ms/op  -33%
>> ThreadOnSpinWaitSharedCounter.ISB   1000000             64  avgt   15  124.505 ±  7.543  ms/op
>> ThreadOnSpinWaitSharedCounter.SB    1000000             64  avgt   15   86.588 ±  9.519  ms/op  -30%
>
> Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision:
>
>   Restore check of non-zero exit value; make spinWaitInstCount always provided in test cmd options

The modified test is now failing in our CI:

java.lang.RuntimeException: Missing compiler output for Thread.onSpinWait intrinsic

will file a bug when able to (might not be till Monday).

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25801#issuecomment-3014953926

From jkarthikeyan at openjdk.org  Mon Jun 30 03:55:41 2025
From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan)
Date: Mon, 30 Jun 2025 03:55:41 GMT
Subject: RFR: 8350177: C2 SuperWord: Integer.numberOfLeadingZeros, numberOfTrailingZeros, reverse and bitCount have input types wrongly turncated for byte and short [v7]
In-Reply-To: 
References: 
Message-ID: 

> Hi all,
> This patch fixes cases in SuperWord when compiling subword types where vectorized code would be given a narrower type than expected, leading to miscompilation due to truncation. This fix is a generalization of the same fix applied for `Integer.reverseBytes` in [JDK-8305324](https://bugs.openjdk.org/browse/JDK-8305324).
The patch introduces a check for nodes that are known to tolerate truncation, so that any future cases of subword truncation will avoid creating miscompiled code. > > The patch reuses the existing logic to set the type of the vectors to int, which currently disables vectorization for the affected patterns entirely. Once [JDK-8342095](https://bugs.openjdk.org/browse/JDK-8342095) is merged and automatic casting support is added the autovectorizer should automatically insert casts to and from int, maintaining correctness. > > I've added an IR test that checks for correctly compiled outputs. Thoughts and reviews would be appreciated! Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: Add case for bytes and prevent vector ops from truncating ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25440/files - new: https://git.openjdk.org/jdk/pull/25440/files/c219e38e..3701c534 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25440&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25440&range=05-06 Stats: 13 lines in 1 file changed: 12 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25440.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25440/head:pull/25440 PR: https://git.openjdk.org/jdk/pull/25440 From jkarthikeyan at openjdk.org Mon Jun 30 03:55:41 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Mon, 30 Jun 2025 03:55:41 GMT Subject: RFR: 8350177: C2 SuperWord: Integer.numberOfLeadingZeros, numberOfTrailingZeros, reverse and bitCount have input types wrongly turncated for byte and short In-Reply-To: References: Message-ID: On Thu, 26 Jun 2025 05:53:44 GMT, Emanuel Peter wrote: >> @eme64 Thanks for the test results! I've added these nodes to the non-truncating list, as well as the other reduction nodes that showed up when running the vector tests. > > @jaskarth the tests look better now. 
I still saw this failure: > > `jdk/incubator/vector/Byte64VectorTests.java` > > Flags: `-XX:UseAVX=3 -XX:+UnlockDiagnosticVMOptions -XX:+UseKNLSetting` or `-XX:UseAVX=0` or `-XX:UseAVX=2` ... probably no flags are actually required. > > `# assert(false) failed: Unexpected node in SuperWord truncation: ExtractB` @eme64 Thanks for running it again! I've pushed a fix for the `ExtractB` assert, and a proactive fix marking any nodes with `TypeVect` as their base type as non-truncating. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25440#issuecomment-3017685880 From jkarthikeyan at openjdk.org Mon Jun 30 03:55:42 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Mon, 30 Jun 2025 03:55:42 GMT Subject: RFR: 8350177: C2 SuperWord: Integer.numberOfLeadingZeros, numberOfTrailingZeros, reverse and bitCount have input types wrongly turncated for byte and short [v6] In-Reply-To: References: Message-ID: <9nzGK8I9TMCTrAkvPln1xoEwpA2Ex_Sjr886PZgLF6w=.c480e2fd-0146-4821-b0c7-ded85360fea0@github.com> On Thu, 26 Jun 2025 06:00:24 GMT, Emanuel Peter wrote: >> I ended up encountering `ExtractS` when running `Short128VectorTests.java`, but I thought I should add `ExtractC` to the list as well. Unfortunately I don't think we don't have tests for that one yet because `char` doesn't vectorize yet. > > Does it even make sense to vectorize Extract? Is there any case where this actually succeeds? It would be worth filing a follow-up RFE if we don't want to spend the time on this now. I don't think there's a case where vectorizing Extract can succeed since it's already a vector op (extract scalar out of vector), so this case would only be needed to prevent assert failures. 
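
The truncation hazard this thread keeps returning to can be shown in scalar Java, without any vectorization. The sketch below is an illustration under stated assumptions, not code from the patch: `nlzOnByteLane` is a hypothetical model of what an 8-bit lane would compute if the operation were wrongly narrowed, and all names are invented for this example. Addition is included as a contrast, because its low 8 bits do not depend on the upper bits and it therefore tolerates truncation.

```java
// Illustrative only: models a wrongly narrowed operation next to Java's int semantics.
final class SubwordTruncationDemo {
    /** Java semantics: the byte is widened to int, then 32 bits are inspected. */
    static int nlzOnInt(byte b) {
        return Integer.numberOfLeadingZeros(b);
    }

    /** Hypothetical 8-bit-lane result: leading zeros of the low 8 bits only. */
    static int nlzOnByteLane(byte b) {
        return Integer.numberOfLeadingZeros(b & 0xFF) - 24;
    }

    /** Java semantics for addition followed by a narrowing cast. */
    static byte addOnInt(byte a, byte b) {
        return (byte) (a + b);
    }

    /** The same addition carried out on 8 bits only. */
    static byte addOnByteLane(byte a, byte b) {
        return (byte) (((a & 0xFF) + (b & 0xFF)) & 0xFF);
    }

    public static void main(String[] args) {
        byte one = 1;
        // Differs: 31 with int semantics, 7 when only 8 bits are considered.
        System.out.println("nlz int=" + nlzOnInt(one) + " lane=" + nlzOnByteLane(one));
        // Agrees: the narrowing cast discards the upper bits either way.
        byte x = 100;
        System.out.println("add int=" + addOnInt(x, x) + " lane=" + addOnByteLane(x, x));
    }
}
```

This is why an allow-list of truncation-tolerant nodes is a conservative design: an operation missing from the list loses vectorization, whereas an operation wrongly treated as tolerant would silently produce results like the `lane` value above.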
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25440#discussion_r2174161341 From jkarthikeyan at openjdk.org Mon Jun 30 03:55:42 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Mon, 30 Jun 2025 03:55:42 GMT Subject: RFR: 8350177: C2 SuperWord: Integer.numberOfLeadingZeros, numberOfTrailingZeros, reverse and bitCount have input types wrongly turncated for byte and short [v6] In-Reply-To: References: Message-ID: On Thu, 26 Jun 2025 05:50:07 GMT, Emanuel Peter wrote: >> It looks like this was because the test has loops that call the vector API, so when the loop gets unrolled it seems that reduction nodes that were created can show up here: https://github.com/openjdk/jdk/blob/1ca008fd02496dc33e2707c102560cae1690fba5/test/jdk/jdk/incubator/vector/Byte128VectorTests.java#L3710-L3721 > > Should we just handle all vector nodes together, and prevent that any of them are truncated? I think that's a good idea. I think rather than adding nodes specifically by opcode it might be better to mark any nodes with a base type of TypeVect as non-truncating. I've pushed a commit that should handle this. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25440#discussion_r2174159984 From jbhateja at openjdk.org Mon Jun 30 05:34:43 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 30 Jun 2025 05:34:43 GMT Subject: RFR: 8360116: Add support for AVX10 floating point minmax instruction [v2] In-Reply-To: References: Message-ID: On Wed, 25 Jun 2025 15:43:14 GMT, Manuel H?ssig wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Update comments > > Thank you for implementing these new instructions! I had a look at your changes and have a few minor suggestions and questions. I am quite new to this part of the codebase, so feel free to disagree if I am way off base. > > How did you test these changes? 
> > Also, if you merge the current master branch, the Windows build failures in the Github Actions will be fixed. Hi @mhaessig , your comments have been addressed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25914#issuecomment-3017856854 From thartmann at openjdk.org Mon Jun 30 05:44:45 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 30 Jun 2025 05:44:45 GMT Subject: Integrated: 8361032: Problem list TestOnSpinWaitAArch64 until JDK-8360936 is fixed Message-ID: The test fails consistently in our testing, let's problem list it. Thanks, Tobias ------------- Commit messages: - 8361032: Problem list TestOnSpinWaitAArch64 until JDK-8360936 is fixed Changes: https://git.openjdk.org/jdk/pull/26036/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26036&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8361032 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/26036.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26036/head:pull/26036 PR: https://git.openjdk.org/jdk/pull/26036 From alanb at openjdk.org Mon Jun 30 05:44:45 2025 From: alanb at openjdk.org (Alan Bateman) Date: Mon, 30 Jun 2025 05:44:45 GMT Subject: Integrated: 8361032: Problem list TestOnSpinWaitAArch64 until JDK-8360936 is fixed In-Reply-To: References: Message-ID: <3tVN_5fI1EBM5fPnbIhpFiMzmeRKADrtNp3D23NH5bc=.3271d0c1-258e-45bd-9bc2-cfa0d185b0f3@github.com> On Mon, 30 Jun 2025 05:15:27 GMT, Tobias Hartmann wrote: > The test fails consistently in our testing, let's problem list it. > > Thanks, > Tobias Marked as reviewed by alanb (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/26036#pullrequestreview-2969954002 From thartmann at openjdk.org Mon Jun 30 05:44:46 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 30 Jun 2025 05:44:46 GMT Subject: Integrated: 8361032: Problem list TestOnSpinWaitAArch64 until JDK-8360936 is fixed In-Reply-To: References: Message-ID: On Mon, 30 Jun 2025 05:15:27 GMT, Tobias Hartmann wrote: > The test fails consistently in our testing, let's problem list it. > > Thanks, > Tobias Thanks for the quick review Alan! ------------- PR Comment: https://git.openjdk.org/jdk/pull/26036#issuecomment-3017872539 From thartmann at openjdk.org Mon Jun 30 05:44:46 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 30 Jun 2025 05:44:46 GMT Subject: Integrated: 8361032: Problem list TestOnSpinWaitAArch64 until JDK-8360936 is fixed In-Reply-To: References: Message-ID: On Mon, 30 Jun 2025 05:15:27 GMT, Tobias Hartmann wrote: > The test fails consistently in our testing, let's problem list it. > > Thanks, > Tobias This pull request has now been integrated. Changeset: c2d76f98 Author: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/c2d76f9844aadf77a0b213a9169a7c5c8c8f1ffb Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod 8361032: Problem list TestOnSpinWaitAArch64 until JDK-8360936 is fixed Reviewed-by: alanb ------------- PR: https://git.openjdk.org/jdk/pull/26036 From epeter at openjdk.org Mon Jun 30 05:54:42 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 30 Jun 2025 05:54:42 GMT Subject: RFR: 8358179: Performance regression in Math.cbrt In-Reply-To: References: <45l5EvxoRINI1_Ep2_snJzKNMPo4-dPXADalLN1fq1Y=.9f697a35-ee7b-4e7a-9e5e-ff33911b3b21@github.com> Message-ID: On Fri, 27 Jun 2025 15:53:06 GMT, Mohamed Issa wrote: >> I'll hold off with approval until someone else who is more knowledgeable has reviewed first. But feel free to ping me for a second review. 
> >> I'll hold off with approval until someone else who is more knowledgeable has reviewed first. But feel free to ping me for a second review. > > @eme64 Second review with the latest changes? @missa-prime The patch still looks good, though I ran testing again because of the new changes. Should complete in about 24h. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25962#issuecomment-3017889529 From epeter at openjdk.org Mon Jun 30 05:58:48 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 30 Jun 2025 05:58:48 GMT Subject: RFR: 8350177: C2 SuperWord: Integer.numberOfLeadingZeros, numberOfTrailingZeros, reverse and bitCount have input types wrongly turncated for byte and short [v6] In-Reply-To: <9nzGK8I9TMCTrAkvPln1xoEwpA2Ex_Sjr886PZgLF6w=.c480e2fd-0146-4821-b0c7-ded85360fea0@github.com> References: <9nzGK8I9TMCTrAkvPln1xoEwpA2Ex_Sjr886PZgLF6w=.c480e2fd-0146-4821-b0c7-ded85360fea0@github.com> Message-ID: On Mon, 30 Jun 2025 03:51:32 GMT, Jasmine Karthikeyan wrote: >> Does it even make sense to vectorize Extract? Is there any case where this actually succeeds? It would be worth filing a follow-up RFE if we don't want to spend the time on this now. > > I don't think there's a case where vectorizing Extract can succeed since it's already a vector op (extract scalar out of vector), so this case would only be needed to prevent assert failures. I see. But then why not return `false`, just like for the other vector operations? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25440#discussion_r2174280490 From epeter at openjdk.org Mon Jun 30 05:58:50 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 30 Jun 2025 05:58:50 GMT Subject: RFR: 8350177: C2 SuperWord: Integer.numberOfLeadingZeros, numberOfTrailingZeros, reverse and bitCount have input types wrongly turncated for byte and short [v7] In-Reply-To: References: Message-ID: <1dG5B8Kxa52lGzda1yNH3kY1M1-MXtNp_bJrjQyG9rs=.7edb3a73-ef05-48b5-b202-78077b001970@github.com> On Mon, 30 Jun 2025 03:55:41 GMT, Jasmine Karthikeyan wrote: >> Hi all, >> This patch fixes cases in SuperWord when compiling subword types where vectorized code would be given a narrower type than expected, leading to miscompilation due to truncation. This fix is a generalization of the same fix applied for `Integer.reverseBytes` in [JDK-8305324](https://bugs.openjdk.org/browse/JDK-8305324). The patch introduces a check for nodes that are known to tolerate truncation, so that any future cases of subword truncation will avoid creating miscompiled code. >> >> The patch reuses the existing logic to set the type of the vectors to int, which currently disables vectorization for the affected patterns entirely. Once [JDK-8342095](https://bugs.openjdk.org/browse/JDK-8342095) is merged and automatic casting support is added the autovectorizer should automatically insert casts to and from int, maintaining correctness. >> >> I've added an IR test that checks for correctly compiled outputs. Thoughts and reviews would be appreciated! > > Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: > > Add case for bytes and prevent vector ops from truncating src/hotspot/share/opto/superword.cpp line 2612: > 2610: case Op_XorReductionV: > 2611: case Op_MaxReductionV: > 2612: case Op_MinReductionV: Can we now remove those, since you are already handling all vectors explicitly above? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25440#discussion_r2174277639 From dfenacci at openjdk.org Mon Jun 30 06:53:44 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 30 Jun 2025 06:53:44 GMT Subject: RFR: 8360641: TestCompilerCounts fails after 8354727 In-Reply-To: <3mMrDF_446r7HudbsHIpdoWByBlnUpjFo7YzIty0KG8=.facc058f-3975-44c4-b2d4-93b8c64db185@github.com> References: <3mMrDF_446r7HudbsHIpdoWByBlnUpjFo7YzIty0KG8=.facc058f-3975-44c4-b2d4-93b8c64db185@github.com> Message-ID: On Fri, 27 Jun 2025 18:09:23 GMT, Manuel H?ssig wrote: > After integrating #25872 the calculation of the`CICompilerCount` ergonomic became dependent on the size of `NonNMethodCodeHeapSize`, which itself is an ergonomic based on the available memory. Thus, depending on the system, the test `compiler/arguments/TestCompilerCounts.java` failed, i.e. locally this failed, but not on CI servers. > > This PR changes the test to reflect the changes introduced in #25872. > > Testing: > - [ ] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15932906313) > - [ ] tier1,tier2 plus Oracle internal testing Thanks a lot for fixing this @mhaessig! I left a couple of inline comments and just noticed you might want to add the copyright note as well ? test/hotspot/jtreg/compiler/arguments/TestCompilerCounts.java line 138: > 136: // Non-tiered modes > 137: int c1OnlyCount = heuristicCount(cpus, Compilation.C1Only); > 138: pass(c1OnlyCount, opt, "-XX:TieredStopAtLevel=1", "-XX:NonNMethodCodeHeapSize="+NonNMethodCodeHeapSize, "-XX:CodeCacheMinimumUseSpace="+CodeCacheMinimumUseSpace); Very minor style thing: maybe we can put whitespaces around the `+` signs. test/hotspot/jtreg/compiler/arguments/TestCompilerCounts.java line 164: > 162: > 163: // Buffer sizes for caclulating the maximum number of compiler threads. > 164: static final int NonNMethodCodeHeapSize = 5 * 1024 * 1024; Is the `NonNMethodCodeHeapSize` value chosen "on purpose"? 

-------------

PR Review: https://git.openjdk.org/jdk/pull/26024#pullrequestreview-2970074252
PR Review Comment: https://git.openjdk.org/jdk/pull/26024#discussion_r2174335731
PR Review Comment: https://git.openjdk.org/jdk/pull/26024#discussion_r2174345082

From mhaessig at openjdk.org  Mon Jun 30 08:07:41 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Mon, 30 Jun 2025 08:07:41 GMT
Subject: RFR: 8360116: Add support for AVX10 floating point minmax instruction [v4]
In-Reply-To: 
References: 
Message-ID: 

On Fri, 27 Jun 2025 08:54:23 GMT, Jatin Bhateja wrote:

>> Intel® AVX10 ISA [1] extensions added new floating point MIN/MAX instructions which comply with definitions in IEEE-754-2019 standard section 9.6 and can directly emulate Math.min/max semantics without the need for any special handling for NaN, +0.0 or -0.0 detection.
>>
>> **The following pseudo-code describes the existing algorithm for min/max[FD]:**
>>
>> Move the non-negative value to the second operand; this will ensure that we correctly handle 0.0 and -0.0 values: if the values being compared are both 0.0s (of either sign), the value in the second operand (source operand) is returned. Existing MINPS and MAXPS semantics only check for NaN as the second operand; hence, we need special handling to check for NaN at the first operand.
>>
>> btmp = (b < +0.0) ? a : b
>> atmp = (b < +0.0) ? b : a
>> Tmp = Max_Float(atmp , btmp)
>> Res = (atmp == NaN) ? atmp : Tmp
>>
>> For min[FD] we need a small tweak in the above algorithm, i.e., move the non-negative value to the first operand; this will ensure that we correctly select -0.0 if both the operands being compared are 0.0 or -0.0.
>>
>> btmp = (b < +0.0) ? b : a
>> atmp = (b < +0.0) ? a : b
>> Tmp = Min_Float(atmp , btmp)
>> Res = (atmp == NaN) ? atmp : Tmp
>>
>> Thus, we need additional special handling for NaNs and +/-0.0 to compute floating-point min/max values to comply with the semantics of Math.max/min APIs using existing MINPS / MAXPS instructions. AVX10.2 added a new instruction, VMINMAX[SH,SS,SD]/[PH,PS,PD], which comprehensively handles special cases, thereby eliminating the need for special handling.
>>
>> Patch emits new instructions for reduction and non-reduction operations for single, double, and Float16 type.
>>
>> Kindly review and share your feedback.
>>
>> Best Regards,
>> Jatin
>>
>> [1] https://www.intel.com/content/www/us/en/content-details/856721/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html?wapkw=AVX10
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
>   Review resolutions

Hi @jatin-bhateja, thank you for addressing my feedback. It looks good to me now.

src/hotspot/cpu/x86/x86_64.ad line 4529:

> 4527:   predicate(VM_Version::supports_avx10_2());
> 4528:   match(Set dst (MinF a b));
> 4529:   format %{ "maxF $dst, $a, $b" %}

Suggestion:

  format %{ "minF $dst, $a, $b" %}

The format string should match the instruction, if I understand this correctly.

src/hotspot/cpu/x86/x86_64.ad line 4565:

> 4563:   predicate(VM_Version::supports_avx10_2());
> 4564:   match(Set dst (MinD a b));
> 4565:   format %{ "maxD $dst, $a, $b" %}

Suggestion:

  format %{ "minD $dst, $a, $b" %}

-------------

Marked as reviewed by mhaessig (Committer).
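
The special cases that make the existing MINPS/MAXPS sequence necessary can be checked against the `Math.min`/`Math.max` contract in plain Java. This is a sketch only: `naiveMax` is a hypothetical helper that loosely imitates the "second operand is returned" behaviour described in the PR text; it is not the hardware instruction and not code from the patch.

```java
// Illustrative only: contrasts Math.max/min with a plain comparison-based selection.
final class MinMaxSpecialCases {
    /** True when both doubles share the same bit pattern (+0.0 and -0.0 differ). */
    static boolean sameBits(double a, double b) {
        return Double.doubleToRawLongBits(a) == Double.doubleToRawLongBits(b);
    }

    /**
     * Hypothetical comparison-based max: yields the second operand whenever
     * the comparison is false, which includes NaN inputs and equal zeros.
     */
    static double naiveMax(double a, double b) {
        return a > b ? a : b;
    }

    public static void main(String[] args) {
        // Math.max treats +0.0 as greater than -0.0; the naive form returns -0.0 here.
        System.out.println("Math.max(0.0, -0.0) = " + Math.max(0.0, -0.0));
        System.out.println("naiveMax(0.0, -0.0) = " + naiveMax(0.0, -0.0));
        // Math.max propagates NaN from either operand; the naive form drops it here.
        System.out.println("Math.max(NaN, 1.0)  = " + Math.max(Double.NaN, 1.0));
        System.out.println("naiveMax(NaN, 1.0)  = " + naiveMax(Double.NaN, 1.0));
        // Math.min selects -0.0 when both operands are zeros of different sign.
        System.out.println("Math.min(0.0, -0.0) = " + Math.min(0.0, -0.0));
    }
}
```

The two mismatches printed by `naiveMax` correspond to the two fix-ups in the quoted pseudo-code: the operand swap based on the sign of `b`, and the final NaN check on the first operand.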

PR Review: https://git.openjdk.org/jdk/pull/25914#pullrequestreview-2970276248
PR Review Comment: https://git.openjdk.org/jdk/pull/25914#discussion_r2174469497
PR Review Comment: https://git.openjdk.org/jdk/pull/25914#discussion_r2174467431

From mdoerr at openjdk.org  Mon Jun 30 08:09:39 2025
From: mdoerr at openjdk.org (Martin Doerr)
Date: Mon, 30 Jun 2025 08:09:39 GMT
Subject: RFR: 8360641: TestCompilerCounts fails after 8354727
In-Reply-To: <3mMrDF_446r7HudbsHIpdoWByBlnUpjFo7YzIty0KG8=.facc058f-3975-44c4-b2d4-93b8c64db185@github.com>
References: <3mMrDF_446r7HudbsHIpdoWByBlnUpjFo7YzIty0KG8=.facc058f-3975-44c4-b2d4-93b8c64db185@github.com>
Message-ID: 

On Fri, 27 Jun 2025 18:09:23 GMT, Manuel Hässig wrote:

> After integrating #25872 the calculation of the `CICompilerCount` ergonomic became dependent on the size of `NonNMethodCodeHeapSize`, which itself is an ergonomic based on the available memory. Thus, depending on the system, the test `compiler/arguments/TestCompilerCounts.java` failed, i.e. locally this failed, but not on CI servers.
>
> This PR changes the test to reflect the changes introduced in #25872.
>
> Testing:
>  - [ ] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15932906313)
>  - [ ] tier1,tier2 plus Oracle internal testing

Thank you for fixing it! I have tested it successfully. Looks good besides what other reviewers have already commented.

test/hotspot/jtreg/compiler/arguments/TestCompilerCounts.java line 186:

> 184:         };
> 185:         return Math.max(Math.min(count, max_count), min_count);
> 186:

Maybe remove the extra newline?

-------------

PR Review: https://git.openjdk.org/jdk/pull/26024#pullrequestreview-2970286688
PR Review Comment: https://git.openjdk.org/jdk/pull/26024#discussion_r2174473933

From mhaessig at openjdk.org  Mon Jun 30 08:14:39 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Mon, 30 Jun 2025 08:14:39 GMT
Subject: RFR: 8360641: TestCompilerCounts fails after 8354727
In-Reply-To: 
References: <3mMrDF_446r7HudbsHIpdoWByBlnUpjFo7YzIty0KG8=.facc058f-3975-44c4-b2d4-93b8c64db185@github.com>
Message-ID: <4a-adIXpIblw3j2mp4t6k1XyEhllAkgDLhl-oifRNK0=.37436a51-0044-4c3f-ad45-4253669722aa@github.com>

On Mon, 30 Jun 2025 06:47:41 GMT, Damon Fenacci wrote:

>> After integrating #25872 the calculation of the `CICompilerCount` ergonomic became dependent on the size of `NonNMethodCodeHeapSize`, which itself is an ergonomic based on the available memory. Thus, depending on the system, the test `compiler/arguments/TestCompilerCounts.java` failed, i.e. locally this failed, but not on CI servers.
>>
>> This PR changes the test to reflect the changes introduced in #25872.
>>
>> Testing:
>>  - [ ] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15932906313)
>>  - [ ] tier1,tier2 plus Oracle internal testing
>
> test/hotspot/jtreg/compiler/arguments/TestCompilerCounts.java line 164:
>
>> 162:
>> 163:     // Buffer sizes for caclulating the maximum number of compiler threads.
>> 164:     static final int NonNMethodCodeHeapSize = 5 * 1024 * 1024;
>
> Is the `NonNMethodCodeHeapSize` value chosen "on purpose"?

Not really. We could choose it to be larger. That would just move the cutoff to a higher CPU count. With 5 MiB, the cutoff also kicks in with fewer CPUs, so both code paths should be tested on more machines.
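
The effect described in this reply, a smaller code heap making the cap apply at lower thread counts, can be sketched as follows. Only the clamping expression mirrors the line quoted from the test (`Math.max(Math.min(count, max_count), min_count)`). Everything else is an assumption made for illustration: the way the cap is derived in `maxCountFor`, the reserved size, and the per-thread buffer size are hypothetical and are not the values or formula used by HotSpot or by the test.

```java
// Illustrative only: the cap derivation and all sizes below are hypothetical.
final class CompilerCountClamp {
    static final long MIB = 1024L * 1024L;

    /** Same clamping shape as the expression quoted from the test. */
    static int clamp(int count, int minCount, int maxCount) {
        return Math.max(Math.min(count, maxCount), minCount);
    }

    /** Hypothetical cap: how many per-thread buffers fit after a reserved part. */
    static int maxCountFor(long heapBytes, long reservedBytes, long bufferBytesPerThread) {
        return (int) Math.max((heapBytes - reservedBytes) / bufferBytesPerThread, 0L);
    }

    public static void main(String[] args) {
        int wanted = 12;                     // count suggested by some CPU-based heuristic
        long reserved = 1 * MIB;             // assumed reserved space
        long perThread = MIB / 2;            // assumed buffer size per compiler thread

        int smallHeapCap = maxCountFor(5 * MIB, reserved, perThread);
        int largeHeapCap = maxCountFor(9 * MIB, reserved, perThread);

        // With the smaller heap the cap cuts the wanted count; with the larger one it does not.
        System.out.println("5 MiB heap: cap=" + smallHeapCap + " -> " + clamp(wanted, 2, smallHeapCap));
        System.out.println("9 MiB heap: cap=" + largeHeapCap + " -> " + clamp(wanted, 2, largeHeapCap));
    }
}
```

Under these made-up numbers, fixing the heap size on the command line makes the expected count reproducible across machines, which is the point of passing an explicit `-XX:NonNMethodCodeHeapSize=` value in the test.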
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26024#discussion_r2174487239

From jbhateja at openjdk.org Mon Jun 30 08:38:27 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Mon, 30 Jun 2025 08:38:27 GMT
Subject: RFR: 8360116: Add support for AVX10 floating point minmax instruction [v5]
In-Reply-To:
References:
Message-ID:

> Intel® AVX10 ISA [1] extensions added new floating point MIN/MAX instructions which comply with definitions in IEEE-754-2019 standard section 9.6 and can directly emulate Math.min/max semantics without the need for any special handling for NaN, +0.0 or -0.0 detection.
>
> **The following pseudo-code describes the existing algorithm for min/max[FD]:**
>
> Move the non-negative value to the second operand; this will ensure that we correctly handle 0.0 and -0.0 values, if values being compared are both 0.0s (of either sign), the value in the second operand (source operand) is returned. Existing MINPS and MAXPS semantics only check for NaN as the second operand; hence, we need special handling to check for NaN at the first operand.
>
> btmp = (b < +0.0) ? a : b
> atmp = (b < +0.0) ? b : a
> Tmp = Max_Float(atmp , btmp)
> Res = (atmp == NaN) ? atmp : Tmp
>
> For min[FD] we need a small tweak in the above algorithm, i.e., move the non-negative value to the first operand, this will ensure that we correctly select -0.0 if both the operands being compared are 0.0 or -0.0.
>
> btmp = (b < +0.0) ? b : a
> atmp = (b < +0.0) ? a : b
> Tmp = Min_Float(atmp , btmp)
> Res = (atmp == NaN) ? atmp : Tmp
>
> Thus, we need additional special handling for NaNs and +/-0.0 to compute floating-point min/max values to comply with the semantics of Math.max/min APIs using existing MINPS / MAXPS instructions. AVX10.2 added a new instruction, VPMINMAX[SH,SS,SD]/[PH,PS,PD], which comprehensively handles special cases, thereby eliminating the need for special handling.
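
The quoted max/min sequence can be modelled in scalar C++ to see why both the operand swap and the NaN fix-up are needed. This is a sketch under assumptions: `hw_max` and `hw_min` are hypothetical stand-ins for the legacy scalar max/min behaviour described above (the second operand is returned when both inputs are zeros or when either input is NaN), `b < +0.0` is read as a sign-bit test so that `-0.0` counts as negative, and the min variant is assumed to use the hardware min in its third step. It is not the code emitted by the patch.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical model of the legacy scalar max: the second operand wins for
// NaN inputs and for two zeros of either sign.
float hw_max(float src1, float src2) {
  if (std::isnan(src1) || std::isnan(src2)) return src2;
  if (src1 == 0.0f && src2 == 0.0f) return src2;
  return (src1 > src2) ? src1 : src2;
}

// Same model for the legacy scalar min.
float hw_min(float src1, float src2) {
  if (std::isnan(src1) || std::isnan(src2)) return src2;
  if (src1 == 0.0f && src2 == 0.0f) return src2;
  return (src1 < src2) ? src1 : src2;
}

// Max: move the non-negative value into the second operand, then patch NaN.
float emulated_max(float a, float b) {
  const bool b_negative = std::signbit(b);  // assumed reading of "b < +0.0"
  const float btmp = b_negative ? a : b;
  const float atmp = b_negative ? b : a;
  const float tmp = hw_max(atmp, btmp);
  return std::isnan(atmp) ? atmp : tmp;
}

// Min: move the non-negative value into the first operand, then patch NaN.
float emulated_min(float a, float b) {
  const bool b_negative = std::signbit(b);
  const float btmp = b_negative ? b : a;
  const float atmp = b_negative ? a : b;
  const float tmp = hw_min(atmp, btmp);
  return std::isnan(atmp) ? atmp : tmp;
}

// Small self-check covering the signed-zero and NaN cases.
void check_special_cases() {
  const float nan = std::numeric_limits<float>::quiet_NaN();
  assert(!std::signbit(emulated_max(0.0f, -0.0f)));  // max(+0, -0) is +0
  assert(std::signbit(emulated_min(-0.0f, 0.0f)));   // min(-0, +0) is -0
  assert(std::isnan(emulated_max(nan, 1.0f)));       // NaN in first operand
  assert(std::isnan(emulated_min(1.0f, nan)));       // NaN in second operand
}
```

Without the swap, `hw_max(+0.0, -0.0)` would return `-0.0`, and without the final select a NaN in the first operand would be lost. These are the two special cases the description says the new instruction handles directly.
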
>
> Patch emits new instructions for reduction and non-reduction operations for single, double, and Float16 type.
>
> Kindly review and share your feedback.
>
> Best Regards,
> Jatin
>
> [1] https://www.intel.com/content/www/us/en/content-details/856721/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html?wapkw=AVX10

Jatin Bhateja has updated the pull request incrementally with two additional commits since the last revision:

 - Update src/hotspot/cpu/x86/x86_64.ad

   Co-authored-by: Manuel Hässig
 - Update src/hotspot/cpu/x86/x86_64.ad

   Co-authored-by: Manuel Hässig

-------------

Changes:
 - all: https://git.openjdk.org/jdk/pull/25914/files
 - new: https://git.openjdk.org/jdk/pull/25914/files/89697983..5597b615

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=25914&range=04
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25914&range=03-04

Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod
Patch: https://git.openjdk.org/jdk/pull/25914.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/25914/head:pull/25914

PR: https://git.openjdk.org/jdk/pull/25914

From mhaessig at openjdk.org Mon Jun 30 08:40:56 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Mon, 30 Jun 2025 08:40:56 GMT
Subject: RFR: 8360641: TestCompilerCounts fails after 8354727 [v2]
In-Reply-To: <3mMrDF_446r7HudbsHIpdoWByBlnUpjFo7YzIty0KG8=.facc058f-3975-44c4-b2d4-93b8c64db185@github.com>
References: <3mMrDF_446r7HudbsHIpdoWByBlnUpjFo7YzIty0KG8=.facc058f-3975-44c4-b2d4-93b8c64db185@github.com>
Message-ID:

> After integrating #25872 the calculation of the `CICompilerCount` ergonomic became dependent on the size of `NonNMethodCodeHeapSize`, which itself is an ergonomic based on the available memory. Thus, depending on the system, the test `compiler/arguments/TestCompilerCounts.java` failed, i.e. locally this failed, but not on CI servers.
>
> This PR changes the test to reflect the changes introduced in #25872.
>
> Testing:
> - [ ] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15932906313)
> - [x] tier1,tier2 plus Oracle internal testing

Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision:

  Make it work for product and address review comments

-------------

Changes:
 - all: https://git.openjdk.org/jdk/pull/26024/files
 - new: https://git.openjdk.org/jdk/pull/26024/files/53530bda..ab1c7d9a

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26024&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26024&range=00-01

Stats: 33 lines in 1 file changed: 12 ins; 5 del; 16 mod
Patch: https://git.openjdk.org/jdk/pull/26024.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/26024/head:pull/26024

PR: https://git.openjdk.org/jdk/pull/26024

From mhaessig at openjdk.org Mon Jun 30 08:40:56 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Mon, 30 Jun 2025 08:40:56 GMT
Subject: RFR: 8360641: TestCompilerCounts fails after 8354727
In-Reply-To: <3mMrDF_446r7HudbsHIpdoWByBlnUpjFo7YzIty0KG8=.facc058f-3975-44c4-b2d4-93b8c64db185@github.com>
References: <3mMrDF_446r7HudbsHIpdoWByBlnUpjFo7YzIty0KG8=.facc058f-3975-44c4-b2d4-93b8c64db185@github.com>
Message-ID:

On Fri, 27 Jun 2025 18:09:23 GMT, Manuel Hässig wrote:

> After integrating #25872 the calculation of the `CICompilerCount` ergonomic became dependent on the size of `NonNMethodCodeHeapSize`, which itself is an ergonomic based on the available memory. Thus, depending on the system, the test `compiler/arguments/TestCompilerCounts.java` failed, i.e. locally this failed, but not on CI servers.
>
> This PR changes the test to reflect the changes introduced in #25872.
>
> Testing:
> - [ ] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15932906313)
> - [x] tier1,tier2 plus Oracle internal testing

@vnkozlov, @dafedafe, and @TheRealMDoerr, thank you for having a look! I addressed all your comments in ab1c7d9.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26024#issuecomment-3018287043

From mhaessig at openjdk.org Mon Jun 30 08:40:56 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Mon, 30 Jun 2025 08:40:56 GMT
Subject: RFR: 8360641: TestCompilerCounts fails after 8354727 [v2]
In-Reply-To:
References: <3mMrDF_446r7HudbsHIpdoWByBlnUpjFo7YzIty0KG8=.facc058f-3975-44c4-b2d4-93b8c64db185@github.com>
Message-ID:

On Fri, 27 Jun 2025 19:42:49 GMT, Vladimir Kozlov wrote:

>> Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Make it work for product and address review comments
>
> test/hotspot/jtreg/compiler/arguments/TestCompilerCounts.java line 30:
>
>> 28: * @requires vm.flagless
>> 29: * @requires vm.bits == "64"
>> 30: * @requires vm.debug
>
> Why you limited it to debug VM?

Originally, I put this restriction there because `-XX:CodeCacheMinimumUseSpace` is a debug flag. But your comment prompted me to check if that flag has an ergonomic, which it does not. Hence, I made the test work for both debug and product in ab1c7d9.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26024#discussion_r2174534585

From mhaessig at openjdk.org Mon Jun 30 08:40:56 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Mon, 30 Jun 2025 08:40:56 GMT
Subject: RFR: 8360641: TestCompilerCounts fails after 8354727 [v2]
In-Reply-To:
References: <3mMrDF_446r7HudbsHIpdoWByBlnUpjFo7YzIty0KG8=.facc058f-3975-44c4-b2d4-93b8c64db185@github.com>
Message-ID:

On Mon, 30 Jun 2025 06:41:00 GMT, Damon Fenacci wrote:

>> Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Make it work for product and address review comments
>
> test/hotspot/jtreg/compiler/arguments/TestCompilerCounts.java line 138:
>
>> 136: // Non-tiered modes
>> 137: int c1OnlyCount = heuristicCount(cpus, Compilation.C1Only);
>> 138: pass(c1OnlyCount, opt, "-XX:TieredStopAtLevel=1", "-XX:NonNMethodCodeHeapSize="+NonNMethodCodeHeapSize, "-XX:CodeCacheMinimumUseSpace="+CodeCacheMinimumUseSpace);
>
> Very minor style thing: maybe we can put whitespaces around the `+` signs.

Fixed in ab1c7d9

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26024#discussion_r2174530732

From mhaessig at openjdk.org Mon Jun 30 08:40:57 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Mon, 30 Jun 2025 08:40:57 GMT
Subject: RFR: 8360641: TestCompilerCounts fails after 8354727 [v2]
In-Reply-To:
References: <3mMrDF_446r7HudbsHIpdoWByBlnUpjFo7YzIty0KG8=.facc058f-3975-44c4-b2d4-93b8c64db185@github.com>
Message-ID:

On Mon, 30 Jun 2025 08:04:55 GMT, Martin Doerr wrote:

>> Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Make it work for product and address review comments
>
> test/hotspot/jtreg/compiler/arguments/TestCompilerCounts.java line 186:
>
>> 184: };
>> 185: return Math.max(Math.min(count, max_count), min_count);
>> 186:
>
> Maybe remove the extra newline?

Fixed in ab1c7d9

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26024#discussion_r2174530267

From bmaillard at openjdk.org Mon Jun 30 08:42:57 2025
From: bmaillard at openjdk.org (Benoît Maillard)
Date: Mon, 30 Jun 2025 08:42:57 GMT
Subject: RFR: 8359602: VerifyIterativeGVN fails with assert(!VerifyHashTableKeys || _hash_lock == 0) failed: remove node from hash table before modifying it
Message-ID: <-_MqCH6QmE-o_d7c9-aet-Cq-ptZJ6CZV6rodpDNWq0=.173e6f7a-3cfe-4791-8253-36e06d892069@github.com>

This PR prevents some missed ideal optimizations in IGVN by notifying users of type refinements made during CCP, addressing a missed optimization that caused a verification failure with `-XX:VerifyIterativeGVN=1110`.

### Context

During the compilation of the input program (obtained from the fuzzer, then simplified and added as a test) by C2, we end up with node `591 ModI` that takes `138 Phi` as its divisor input. An existing `Ideal` optimization is to get rid of the control input of a `ModINode` when we can prove that the divisor is never `0`.

In this specific case, the type of the `PhiNode` gets refined during CCP, but the refinement fails to propagate to its users for the IGVN phase and the ideal optimization for the `ModINode` never happens. This results in a missed optimization and hits an assert in the verification phase of IGVN (when using `-XX:VerifyIterativeGVN=1110`).

![IGV screenshot](https://github.com/user-attachments/assets/5dee1ae6-9146-4115-922d-df33b7ccbd37)

### Detailed Analysis

In `PhaseCCP::analyze`, we call `Value` for the `PhiNode`, which results in a type refinement: the range gets restricted to `int:-13957..-1191`.

```c++
// Pull from worklist; compute new value; push changes out.
// This loop is the meat of CCP.
while (worklist.size() != 0) {
  Node* n = fetch_next_node(worklist);
  DEBUG_ONLY(worklist_verify.push(n);)
  if (n->is_SafePoint()) {
    // Make sure safepoints are processed by PhaseCCP::transform even if they are
    // not reachable from the bottom. Otherwise, infinite loops would be removed.
    _root_and_safepoints.push(n);
  }
  const Type* new_type = n->Value(this);
  if (new_type != type(n)) {
    DEBUG_ONLY(verify_type(n, new_type, type(n));)
    dump_type_and_node(n, new_type);
    set_type(n, new_type);
    push_child_nodes_to_worklist(worklist, n);
  }
  if (KillPathsReachableByDeadTypeNode && n->is_Type() && new_type == Type::TOP) {
    // Keep track of Type nodes to kill CFG paths that use Type
    // nodes that become dead.
    _maybe_top_type_nodes.push(n);
  }
}
DEBUG_ONLY(verify_analyze(worklist_verify);)
```

At the end of `PhaseCCP::analyze`, we obtain the following types in the side table:
- `int` for node `591` (`ModINode`)
- `int:-13957..-1191` for node `138` (`PhiNode`)

If we call `find_node(138)->bottom_type()`, we get:
- `int` for both nodes

There is no progress on the type of `ModINode` during CCP, because `ModINode::Value` is not able to refine the type in this case.

Then, in `PhaseCCP::transform_once`, nodes get added to the worklist based on the progress made in `PhaseCCP::analyze`.

The type from the side table is compared to the one returned by `n->bottom_type()`:
- For the `PhiNode`, this is the `_type` field for `TypeNode`
- For the `ModINode`, this is always `TypeInt::INT`

If these types are not equal for a given node, the node gets added to the IGVN worklist for later processing.

```c++
// If x is a TypeNode, capture any more-precise type permanently into Node
if (t != n->bottom_type()) {
  hash_delete(n);             // changing bottom type may force a rehash
  n->raise_bottom_type(t);
  _worklist.push(n);          // n re-enters the hash table via the worklist
}
```

Because `Value` was able to refine the type for the `PhiNode` but not for the `ModINode`, only the `PhiNode` gets added to the IGVN worklist. The `n->raise_bottom_type(t)` call also has the effect that the `_type` field is set for the `PhiNode`.

At the end of CCP, the `PhiNode` is in the worklist, but the `ModINode` is not. An IGVN phase takes place after CCP. In `PhaseIterGVN::optimize`, we pop nodes from the worklist and apply ideal, value, and identity optimizations. If any of these lead to progress on a given node, the users of the node get added to the worklist.

For the `PhiNode`, however, no progress can be made at this phase (the type was already refined to the maximum during CCP). As a result, the `ModINode` never makes it to the worklist and `ModINode::Value` is never called. Before returning from `PhaseIterGVN::optimize`, we run the verifications with `PhaseIterGVN::verify_optimize`. There, we call `Value` on the `ModINode` to attempt to optimize further.

In `ModINode::Ideal`, we check the type of the divisor and get rid of the control input if we can prove that the divisor can never be `0`:

```c++
// Check for useless control input
// Check for excluding mod-zero case
if (in(0) && (ti->_hi < 0 || ti->_lo > 0)) {
  set_req(0, nullptr);        // Yank control input
  return this;
}
```

Because the type range of the `PhiNode` was refined during CCP to `int:-13957..-1191`, the condition evaluates to `true`, and the node gets modified with `set_req(0, nullptr)`. Since an optimization is not expected to take place during verification, the assert gets triggered.

In summary, this optimization is missed because it is an `Ideal` optimization that depends on the type of the input, and because that type changes during CCP, right before the IGVN phase. Since there is no direct notification from the input to its users in CCP when the type changes, the optimization never gets triggered.

### Proposed Fix

There currently exists no mechanism to propagate a type update in CCP to an ideal optimization in IGVN. In many cases, a type refinement on the input will propagate to the type of the users, but it is not the case when no value optimization is available for the user. This can happen with a `ModINode`, but could probably also happen in other scenarios.

The proposed fix is simply to notify users by adding them to the IGVN worklist when a type refinement happens in CCP. We simply add a call to `add_users_to_worklist(n)` in `PhaseCCP::transform_once`:

```c++
// If x is a TypeNode, capture any more-precise type permanently into Node
if (t != n->bottom_type()) {
  hash_delete(n);             // changing bottom type may force a rehash
  n->raise_bottom_type(t);
  _worklist.push(n);          // n re-enters the hash table via the worklist
  add_users_to_worklist(n);   // if ideal or identity optimizations depend on the input type, users need to be notified
}
```

### Testing

- [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8359602)
- [x] tier1-3, plus some internal testing
- [x] Added (modified) test from the fuzzer with `-XX:VerifyIterativeGVN=1110` to make sure the optimization is not missed anymore
- [x] Manual testing with `-XX:+CITime` flag to make sure compilation time does not increase

Thank you for reviewing!

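To make the missing notification concrete, here is a small self-contained C++ toy model of the situation described above. It is not HotSpot code: `ToyNode`, `toy_ideal`, `toy_igvn`, and `control_survives` are hypothetical names, types are reduced to an integer range, and the only "Ideal" rule modelled is dropping the control input when the divisor range excludes zero. The range `-13957..-1191` is taken from the description.

```cpp
#include <cassert>
#include <deque>
#include <limits>
#include <vector>

// Toy model only: every name in this block is hypothetical.
struct ToyNode {
  int lo = std::numeric_limits<int>::min();  // type range [lo, hi]
  int hi = std::numeric_limits<int>::max();
  bool is_mod = false;         // true for the toy "ModI" node
  bool has_control = false;    // control input still attached
  ToyNode* divisor = nullptr;  // divisor input of the toy "ModI" node
  std::vector<ToyNode*> users;
};

// Stand-in for an Ideal step: drop the control input once the divisor's
// range excludes zero. Returns true if the node changed.
bool toy_ideal(ToyNode& n) {
  assert(!n.is_mod || n.divisor != nullptr);
  if (n.is_mod && n.has_control && (n.divisor->hi < 0 || n.divisor->lo > 0)) {
    n.has_control = false;
    return true;
  }
  return false;
}

// Stand-in for the IGVN loop: users are enqueued only when a node makes progress.
void toy_igvn(std::deque<ToyNode*>& worklist) {
  while (!worklist.empty()) {
    ToyNode* n = worklist.front();
    worklist.pop_front();
    if (toy_ideal(*n)) {
      for (ToyNode* u : n->users) worklist.push_back(u);
    }
  }
}

// Runs "refine in CCP, then IGVN" and reports whether the mod node still has
// its control input afterwards.
bool control_survives(bool notify_users) {
  ToyNode phi;  // starts with the full int range
  ToyNode mod;
  mod.is_mod = true;
  mod.has_control = true;
  mod.divisor = &phi;
  phi.users.push_back(&mod);

  // "CCP": the phi's range is refined; the mod node's own type does not change.
  phi.lo = -13957;
  phi.hi = -1191;

  std::deque<ToyNode*> worklist;
  worklist.push_back(&phi);  // the refined node itself is always enqueued
  if (notify_users) {
    // The extra step: also enqueue the users of the refined node.
    for (ToyNode* u : phi.users) worklist.push_back(u);
  }
  toy_igvn(worklist);
  return mod.has_control;
}
```

In this model, `control_survives(false)` leaves the control input in place because the phi makes no further progress and so never enqueues its user, while `control_survives(true)` lets the toy "Ideal" step run and remove it.
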
-------------

Commit messages:
 - 8359602: add comment
 - 8359602: add test summary and comments
 - 8359602: tag requires vm.debug == true
 - 8359602: Add test from fuzzer
 - 8359602: Add users to IGVN worklist when type is refined in CCP

Changes: https://git.openjdk.org/jdk/pull/26017/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26017&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8359602
Stats: 62 lines in 2 files changed: 62 ins; 0 del; 0 mod
Patch: https://git.openjdk.org/jdk/pull/26017.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/26017/head:pull/26017

PR: https://git.openjdk.org/jdk/pull/26017

From mdoerr at openjdk.org Mon Jun 30 08:46:40 2025
From: mdoerr at openjdk.org (Martin Doerr)
Date: Mon, 30 Jun 2025 08:46:40 GMT
Subject: RFR: 8360641: TestCompilerCounts fails after 8354727 [v2]
In-Reply-To:
References: <3mMrDF_446r7HudbsHIpdoWByBlnUpjFo7YzIty0KG8=.facc058f-3975-44c4-b2d4-93b8c64db185@github.com>
Message-ID:

On Mon, 30 Jun 2025 08:34:26 GMT, Manuel Hässig wrote:

>> test/hotspot/jtreg/compiler/arguments/TestCompilerCounts.java line 186:
>>
>>> 184: };
>>> 185: return Math.max(Math.min(count, max_count), min_count);
>>> 186:
>>
>> Maybe remove the extra newline?
>
> Fixed in ab1c7d9

That was another newline.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26024#discussion_r2174547966

From mhaessig at openjdk.org Mon Jun 30 08:58:07 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Mon, 30 Jun 2025 08:58:07 GMT
Subject: RFR: 8360641: TestCompilerCounts fails after 8354727 [v3]
In-Reply-To: <3mMrDF_446r7HudbsHIpdoWByBlnUpjFo7YzIty0KG8=.facc058f-3975-44c4-b2d4-93b8c64db185@github.com>
References: <3mMrDF_446r7HudbsHIpdoWByBlnUpjFo7YzIty0KG8=.facc058f-3975-44c4-b2d4-93b8c64db185@github.com>
Message-ID:

> After integrating #25872 the calculation of the `CICompilerCount` ergonomic became dependent on the size of `NonNMethodCodeHeapSize`, which itself is an ergonomic based on the available memory. Thus, depending on the system, the test `compiler/arguments/TestCompilerCounts.java` failed, i.e. locally this failed, but not on CI servers.
>
> This PR changes the test to reflect the changes introduced in #25872.
>
> Testing:
> - [ ] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15932906313)
> - [x] tier1,tier2 plus Oracle internal testing

Manuel Hässig has updated the pull request incrementally with two additional commits since the last revision:

 - Remove superfluous newline
 - Add copyright

-------------

Changes:
 - all: https://git.openjdk.org/jdk/pull/26024/files
 - new: https://git.openjdk.org/jdk/pull/26024/files/ab1c7d9a..8beb5898

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26024&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26024&range=01-02

Stats: 2 lines in 1 file changed: 1 ins; 1 del; 0 mod
Patch: https://git.openjdk.org/jdk/pull/26024.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/26024/head:pull/26024

PR: https://git.openjdk.org/jdk/pull/26024

From mhaessig at openjdk.org Mon Jun 30 08:58:07 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Mon, 30 Jun 2025 08:58:07 GMT
Subject: RFR: 8360641: TestCompilerCounts fails after 8354727 [v3]
In-Reply-To:
References: <3mMrDF_446r7HudbsHIpdoWByBlnUpjFo7YzIty0KG8=.facc058f-3975-44c4-b2d4-93b8c64db185@github.com>
Message-ID:

On Mon, 30 Jun 2025 08:43:44 GMT, Martin Doerr wrote:

>> Fixed in ab1c7d9
>
> That was another newline.

8beb589 should do it.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26024#discussion_r2174569366

From aph at openjdk.org Mon Jun 30 09:07:40 2025
From: aph at openjdk.org (Andrew Haley)
Date: Mon, 30 Jun 2025 09:07:40 GMT
Subject: RFR: 8358179: Performance regression in Math.cbrt [v2]
In-Reply-To:
References: <45l5EvxoRINI1_Ep2_snJzKNMPo4-dPXADalLN1fq1Y=.9f697a35-ee7b-4e7a-9e5e-ff33911b3b21@github.com>
Message-ID:

On Fri, 27 Jun 2025 01:43:16 GMT, Mohamed Issa wrote:

>> The changes described below are meant to resolve the performance regression introduced by the **x86_64 cbrt** double precision floating point scalar intrinsic in #24470.
>>
>> 1. Check for +0, -0, +INF, -INF, and NaN before any other input values.
>> 2. If these special values are found, return immediately with minimal modifications to the result register.
>>
>> The commands to run all relevant micro-benchmarks are posted below.
>>
>> `make test TEST="micro:CbrtPerf.CbrtPerfRanges"`
>> `make test TEST="micro:CbrtPerf.CbrtPerfSpecialValues"`
>>
>> The results of all tests posted below were captured with an [Intel® Xeon 8488C](https://www.intel.com/content/www/us/en/products/sku/231730/intel-xeon-platinum-8480c-processor-105m-cache-2-00-ghz/specifications.html) using [OpenJDK v26-b1](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B1) as the baseline version. The term _baseline1_ refers to runs with the intrinsic enabled and _baseline2_ refers to runs with the intrinsic disabled.
>>
>> Each result is the mean of 8 individual runs, and the input ranges used match those from the original Java implementation. Overall, the changes provide a significant uplift over _baseline1_ except for a mild regression in the (**2^(-1022) <= |x| < INF**) input range, which is expected due to the extra checks. When comparing against _baseline2_, the modified intrinsic still significantly outperforms for the inputs (**-INF < x < INF**) that require heavy compute. However, the special value inputs that trigger fast path returns still perform better with _baseline2_.
>>
>> | Input range(s) | Baseline1 (ops/ms) | Change (ops/ms) | Change vs baseline1 (%) |
>> | :-------------------------------------: | :-------------------: | :------------------: | :--------------------------: |
>> | [-2^(-1022), 2^(-1022)] | 18470 | 20847 | +12.87 |
>> | (-INF, -2^(-1022)], [2^(-1022), INF) | 210538 | 198925 | -5.52 |
>> | [0] | 344990 | 627561 | +81.91 |
>> | [-0] | 291983 | 629941 | +115.75 |
>> | [INF] | 382685 | 542211 | +4...
>
> Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision:
>
> Ensure ABS_MASK is a 128-bit memory sized location and only use equal enum for UCOMISD checks

> The changes described below are meant to resolve the performance regression introduced by the **x86_64 cbrt** double precision floating point scalar intrinsic in #24470.

Please add the performance for arguments in the normal range to this list.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25962#issuecomment-3018370027

From mchevalier at openjdk.org Mon Jun 30 09:30:40 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Mon, 30 Jun 2025 09:30:40 GMT
Subject: RFR: 8359602: VerifyIterativeGVN fails with assert(!VerifyHashTableKeys || _hash_lock == 0) failed: remove node from hash table before modifying it
In-Reply-To: <-_MqCH6QmE-o_d7c9-aet-Cq-ptZJ6CZV6rodpDNWq0=.173e6f7a-3cfe-4791-8253-36e06d892069@github.com>
References: <-_MqCH6QmE-o_d7c9-aet-Cq-ptZJ6CZV6rodpDNWq0=.173e6f7a-3cfe-4791-8253-36e06d892069@github.com>
Message-ID:

On Fri, 27 Jun 2025 10:59:57 GMT, Benoît Maillard wrote:

> This PR prevents some missed ideal optimizations in IGVN by notifying users of type refinements made during CCP, addressing a missed optimization that caused a verification failure with `-XX:VerifyIterativeGVN=1110`.
>
> ### Context
> During the compilation of the input program (obtained from the fuzzer, then simplified and added as a test) by C2, we end up with node `591 ModI` that takes `138 Phi` as its divisor input. An existing `Ideal` optimization is to get rid of the control input of a `ModINode` when we can prove that the divisor is never `0`.
>
> In this specific case, the type of the `PhiNode` gets refined during CCP, but the refinement fails to propagate to its users for the IGVN phase and the ideal optimization for the `ModINode` never happens. This results in a missed optimization and hits an assert in the verification phase of IGVN (when using `-XX:VerifyIterativeGVN=1110`).
>
> ![IGV screenshot](https://github.com/user-attachments/assets/5dee1ae6-9146-4115-922d-df33b7ccbd37)
>
> ### Detailed Analysis
>
> In `PhaseCCP::analyze`, we call `Value` for the `PhiNode`, which
> results in a type refinement: the range gets restricted to `int:-13957..-1191`.
>
> ```c++
> // Pull from worklist; compute new value; push changes out.
> // This loop is the meat of CCP.
> while (worklist.size() != 0) {
>   Node* n = fetch_next_node(worklist);
>   DEBUG_ONLY(worklist_verify.push(n);)
>   if (n->is_SafePoint()) {
>     // Make sure safepoints are processed by PhaseCCP::transform even if they are
>     // not reachable from the bottom. Otherwise, infinite loops would be removed.
>     _root_and_safepoints.push(n);
>   }
>   const Type* new_type = n->Value(this);
>   if (new_type != type(n)) {
>     DEBUG_ONLY(verify_type(n, new_type, type(n));)
>     dump_type_and_node(n, new_type);
>     set_type(n, new_type);
>     push_child_nodes_to_worklist(worklist, n);
>   }
>   if (KillPathsReachableByDeadTypeNode && n->is_Type() && new_type == Type::TOP) {
>     // Keep track of Type nodes to kill CFG paths that use Type
>     // nodes that become dead.
>     _maybe_top_type_nodes.push(n);
>   }
> }
> DEBUG_ONLY(verify_analyze(worklist_verify);)
> ```
>
> At the end of `PhaseCCP::analyze`, we obtain the following types in the side table:
> - `int` for node `591` (`ModINode`)
> - `int:-13957..-1191` for node `138` (`PhiNode`)
>
> If we call `find_node(138)->bottom_type()`, we get:
> - `int` for both nodes
>
> There is no progress on the type of `ModINode` during CCP, because `ModINode::Value`
> is not able to...

Wow, nice explanation, I like it! And the solution seems good. We already enqueue users for IGVN when a value is refined in IGVN, so it is not surprising that we also need to enqueue users when the type improvement does not come from IGVN: we need to idealize the users because the abstract value improved, not because it was improved during IGVN in particular.

A little question, though.

test/hotspot/jtreg/compiler/c2/TestPropagateTypeRefinementToIGVN.java line 30:

> 28: * possible to prove that the divisor can never be 0.
> 29: * VerifyIterativeGVN checks that this optimization was applied
> 30: * @requires vm.debug == true

Is that really required? Can't you get away with a `-XX:+IgnoreUnrecognizedVMOptions` if the problem is unrecognized VM options? Would be nice to know if it works in product too.

-------------

PR Review: https://git.openjdk.org/jdk/pull/26017#pullrequestreview-2970532176
PR Review Comment: https://git.openjdk.org/jdk/pull/26017#discussion_r2174630495

From mdoerr at openjdk.org Mon Jun 30 10:08:39 2025
From: mdoerr at openjdk.org (Martin Doerr)
Date: Mon, 30 Jun 2025 10:08:39 GMT
Subject: RFR: 8360641: TestCompilerCounts fails after 8354727 [v3]
In-Reply-To:
References: <3mMrDF_446r7HudbsHIpdoWByBlnUpjFo7YzIty0KG8=.facc058f-3975-44c4-b2d4-93b8c64db185@github.com>
Message-ID:

On Mon, 30 Jun 2025 08:58:07 GMT, Manuel Hässig wrote:

>> After integrating #25872 the calculation of the `CICompilerCount` ergonomic became dependent on the size of `NonNMethodCodeHeapSize`, which itself is an ergonomic based on the available memory. Thus, depending on the system, the test `compiler/arguments/TestCompilerCounts.java` failed, i.e. locally this failed, but not on CI servers.
>>
>> This PR changes the test to reflect the changes introduced in #25872.
>>
>> Testing:
>> - [ ] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15932906313)
>> - [x] tier1,tier2 plus Oracle internal testing
>
> Manuel Hässig has updated the pull request incrementally with two additional commits since the last revision:
>
> - Remove superfluous newline
> - Add copyright

Thanks!

-------------

Marked as reviewed by mdoerr (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/26024#pullrequestreview-2970654855

From tkurashige at openjdk.org Mon Jun 30 11:11:42 2025
From: tkurashige at openjdk.org (Taizo Kurashige)
Date: Mon, 30 Jun 2025 11:11:42 GMT
Subject: RFR: 8359120: Improve warning message when fail to load hsdis library [v2]
In-Reply-To: <7UkSbnceEz4PY3UDwyR9iOseuvS4sD8FBBGl96mG_lk=.e94b4126-9df5-406b-a3f3-b21439d848e6@github.com>
References: <-i6UPk-bhy9RnnCus_JbJ1nQ63nMX9djubON9WBbHQ8=.a2305566-563e-4171-b526-bcd645de51a3@github.com> <7UkSbnceEz4PY3UDwyR9iOseuvS4sD8FBBGl96mG_lk=.e94b4126-9df5-406b-a3f3-b21439d848e6@github.com>
Message-ID:

On Fri, 27 Jun 2025 10:17:26 GMT, Tobias Hartmann wrote:

> Don't we already have such error reporting code in `Disassembler::load_library`?

Yes, that reporting code already exists, but since `nullptr` is passed at [src/hotspot/share/compiler/disassembler.hpp#L66](https://github.com/openjdk/jdk/blob/c2d76f9844aadf77a0b213a9169a7c5c8c8f1ffb/src/hotspot/share/compiler/disassembler.hpp#L66), that reporting doesn't actually work.

I thought about using this reporting code, but decided that it should not be used for the following reasons:

- If -XX:+PrintAssembly is not specified and the message `PrintAssembly defaults to abstract disassembly` is printed, the user may be confused: "Why do I get this message when -XX:+PrintAssembly is not specified?"
- Since the message is not always printed immediately before [MachCode] and may appear some distance away, it may be hard to associate it with [MachCode], and so hard for the user to understand why the warning was issued.
- Even if several [MachCode] blocks are printed, the message is printed only once, which again makes it hard for the user to associate [MachCode] with the message.

Thanks.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25726#issuecomment-3018739931

From bmaillard at openjdk.org Mon Jun 30 11:56:41 2025
From: bmaillard at openjdk.org (Benoît Maillard)
Date: Mon, 30 Jun 2025 11:56:41 GMT
Subject: RFR: 8359602: VerifyIterativeGVN fails with assert(!VerifyHashTableKeys || _hash_lock == 0) failed: remove node from hash table before modifying it
In-Reply-To:
References: <-_MqCH6QmE-o_d7c9-aet-Cq-ptZJ6CZV6rodpDNWq0=.173e6f7a-3cfe-4791-8253-36e06d892069@github.com>
Message-ID:

On Mon, 30 Jun 2025 09:25:35 GMT, Marc Chevalier wrote:

>> This PR prevents some missed ideal optimizations in IGVN by notifying users of type refinements made during CCP, addressing a missed optimization that caused a verification failure with `-XX:VerifyIterativeGVN=1110`.
>>
>> ### Context
>> During the compilation of the input program (obtained from the fuzzer, then simplified and added as a test) by C2, we end up with node `591 ModI` that takes `138 Phi` as its divisor input. An existing `Ideal` optimization is to get rid of the control input of a `ModINode` when we can prove that the divisor is never `0`.
>>
>> In this specific case, the type of the `PhiNode` gets refined during CCP, but the refinement fails to propagate to its users for the IGVN phase and the ideal optimization for the `ModINode` never happens. This results in a missed optimization and hits an assert in the verification phase of IGVN (when using `-XX:VerifyIterativeGVN=1110`).
>>
>> ![IGV screenshot](https://github.com/user-attachments/assets/5dee1ae6-9146-4115-922d-df33b7ccbd37)
>>
>> ### Detailed Analysis
>>
>> In `PhaseCCP::analyze`, we call `Value` for the `PhiNode`, which
>> results in a type refinement: the range gets restricted to `int:-13957..-1191`.
>>
>> ```c++
>> // Pull from worklist; compute new value; push changes out.
>> // This loop is the meat of CCP.
>> while (worklist.size() != 0) {
>>   Node* n = fetch_next_node(worklist);
>>   DEBUG_ONLY(worklist_verify.push(n);)
>>   if (n->is_SafePoint()) {
>>     // Make sure safepoints are processed by PhaseCCP::transform even if they are
>>     // not reachable from the bottom. Otherwise, infinite loops would be removed.
>>     _root_and_safepoints.push(n);
>>   }
>>   const Type* new_type = n->Value(this);
>>   if (new_type != type(n)) {
>>     DEBUG_ONLY(verify_type(n, new_type, type(n));)
>>     dump_type_and_node(n, new_type);
>>     set_type(n, new_type);
>>     push_child_nodes_to_worklist(worklist, n);
>>   }
>>   if (KillPathsReachableByDeadTypeNode && n->is_Type() && new_type == Type::TOP) {
>>     // Keep track of Type nodes to kill CFG paths that use Type
>>     // nodes that become dead.
>>     _maybe_top_type_nodes.push(n);
>>   }
>> }
>> DEBUG_ONLY(verify_analyze(worklist_verify);)
>> ```
>>
>> At the end of `PhaseCCP::analyze`, we obtain the following types in the side table:
>> - `int` for node `591` (`ModINode`)
>> - `int:-13957..-1191` for node `138` (`PhiNode`)
>>
>> If we call `find_node(138)->bottom_type()`, we get:
>> - `int` for both nodes
>>
>> The...
>
> test/hotspot/jtreg/compiler/c2/TestPropagateTypeRefinementToIGVN.java line 30:
>
>> 28: * possible to prove that the divisor can never be 0.
>> 29: * VerifyIterativeGVN checks that this optimization was applied
>> 30: * @requires vm.debug == true
>
> Is that really required? Can't you get away with a `-XX:+IgnoreUnrecognizedVMOptions` if the problem is unrecognized VM options? Would be nice to know if it works in product too.

The entire `verify_PhaseIterGVN()` call hides behind a `NOT_PRODUCT` macro, so unless I am missing something fundamental here, I think we really need the debug mode. You can find the call [here](https://github.com/benoitmaillard/jdk/blob/2ac5e725e472578618c5ab47425c52843dc119bb/src/hotspot/share/opto/phaseX.cpp#L1069).

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26017#discussion_r2174896663 From thartmann at openjdk.org Mon Jun 30 12:06:39 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 30 Jun 2025 12:06:39 GMT Subject: RFR: 8359602: VerifyIterativeGVN fails with assert(!VerifyHashTableKeys || _hash_lock == 0) failed: remove node from hash table before modifying it In-Reply-To: <-_MqCH6QmE-o_d7c9-aet-Cq-ptZJ6CZV6rodpDNWq0=.173e6f7a-3cfe-4791-8253-36e06d892069@github.com> References: <-_MqCH6QmE-o_d7c9-aet-Cq-ptZJ6CZV6rodpDNWq0=.173e6f7a-3cfe-4791-8253-36e06d892069@github.com> Message-ID: On Fri, 27 Jun 2025 10:59:57 GMT, Benoît Maillard wrote: > This PR prevents some missed ideal optimizations in IGVN by notifying users of type refinements made during CCP, addressing a missed optimization that caused a verification failure with `-XX:VerifyIterativeGVN=1110`. > > ### Context > During the compilation of the input program (obtained from the fuzzer, then simplified and added as a test) by C2, we end up with node `591 ModI` that takes `138 Phi` as its divisor input. An existing `Ideal` optimization is to get rid of the control input of a `ModINode` when we can prove that the divisor is never `0`. > > In this specific case, the type of the `PhiNode` gets refined during CCP, but the refinement fails to propagate to its users for the IGVN phase and the ideal optimization for the `ModINode` never happens. This results in a missed optimization and hits an assert in the verification phase of IGVN (when using `-XX:VerifyIterativeGVN=1110`). > > ![IGV screenshot](https://github.com/user-attachments/assets/5dee1ae6-9146-4115-922d-df33b7ccbd37) > > ### Detailed Analysis > > In `PhaseCCP::analyze`, we call `Value` for the `PhiNode`, which > results in a type refinement: the range gets restricted to `int:-13957..-1191`. > > ```c++ > // Pull from worklist; compute new value; push changes out. > // This loop is the meat of CCP.
> while (worklist.size() != 0) { > Node* n = fetch_next_node(worklist); > DEBUG_ONLY(worklist_verify.push(n);) > if (n->is_SafePoint()) { > // Make sure safepoints are processed by PhaseCCP::transform even if they are > // not reachable from the bottom. Otherwise, infinite loops would be removed. > _root_and_safepoints.push(n); > } > const Type* new_type = n->Value(this); > if (new_type != type(n)) { > DEBUG_ONLY(verify_type(n, new_type, type(n));) > dump_type_and_node(n, new_type); > set_type(n, new_type); > push_child_nodes_to_worklist(worklist, n); > } > if (KillPathsReachableByDeadTypeNode && n->is_Type() && new_type == Type::TOP) { > // Keep track of Type nodes to kill CFG paths that use Type > // nodes that become dead. > _maybe_top_type_nodes.push(n); > } > } > DEBUG_ONLY(verify_analyze(worklist_verify);) > > > At the end of `PhaseCCP::analyze`, we obtain the following types in the side table: > - `int` for node `591` (`ModINode`) > - `int:-13957..-1191` for node `138` (`PhiNode`) > > If we call `find_node(138)->bottom_type()`, we get: > - `int` for both nodes > > There is no progress on the type of `ModINode` during CCP, because `ModINode::Value` > is not able to... I agree with Marc, really nice write-up! The fix looks good to me. Did you check if this allows enabling any of the other disabled verifications from [JDK-8347273](https://bugs.openjdk.org/browse/JDK-8347273)? Please change the title of this issue to something more descriptive. test/hotspot/jtreg/compiler/c2/TestPropagateTypeRefinementToIGVN.java line 35: > 33: * -XX:CompileCommand=compileonly,compiler.c2.TestPropagateTypeRefinementToIGVN::test > 34: * compiler.c2.TestPropagateTypeRefinementToIGVN > 35: * @run main compiler.c2.TestPropagateTypeRefinementToIGVN The test name is a bit too generic, maybe change to `TestModControlFoldedAfterCCP` or something. ------------- Changes requested by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/26017#pullrequestreview-2970961669 PR Review Comment: https://git.openjdk.org/jdk/pull/26017#discussion_r2174907075 From bmaillard at openjdk.org Mon Jun 30 12:06:40 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Mon, 30 Jun 2025 12:06:40 GMT Subject: RFR: 8359602: VerifyIterativeGVN fails with assert(!VerifyHashTableKeys || _hash_lock == 0) failed: remove node from hash table before modifying it In-Reply-To: References: <-_MqCH6QmE-o_d7c9-aet-Cq-ptZJ6CZV6rodpDNWq0=.173e6f7a-3cfe-4791-8253-36e06d892069@github.com> Message-ID: On Mon, 30 Jun 2025 11:59:33 GMT, Tobias Hartmann wrote: >> This PR prevents some missed ideal optimizations in IGVN by notifying users of type refinements made during CCP, addressing a missed optimization that caused a verification failure with `-XX:VerifyIterativeGVN=1110`. >> >> ### Context >> During the compilation of the input program (obtained from the fuzzer, then simplified and added as a test) by C2, we end up with node `591 ModI` that takes `138 Phi` as its divisor input. An existing `Ideal` optimization is to get rid of the control input of a `ModINode` when we can prove that the divisor is never `0`. >> >> In this specific case, the type of the `PhiNode` gets refined during CCP, but the refinement fails to propagate to its users for the IGVN phase and the ideal optimization for the `ModINode` never happens. This results in a missed optimization and hits an assert in the verification phase of IGVN (when using `-XX:VerifyIterativeGVN=1110`). >> >> ![IGV screenshot](https://github.com/user-attachments/assets/5dee1ae6-9146-4115-922d-df33b7ccbd37) >> >> ### Detailed Analysis >> >> In `PhaseCCP::analyze`, we call `Value` for the `PhiNode`, which >> results in a type refinement: the range gets restricted to `int:-13957..-1191`. >> >> ```c++ >> // Pull from worklist; compute new value; push changes out. >> // This loop is the meat of CCP. 
>> while (worklist.size() != 0) { >> Node* n = fetch_next_node(worklist); >> DEBUG_ONLY(worklist_verify.push(n);) >> if (n->is_SafePoint()) { >> // Make sure safepoints are processed by PhaseCCP::transform even if they are >> // not reachable from the bottom. Otherwise, infinite loops would be removed. >> _root_and_safepoints.push(n); >> } >> const Type* new_type = n->Value(this); >> if (new_type != type(n)) { >> DEBUG_ONLY(verify_type(n, new_type, type(n));) >> dump_type_and_node(n, new_type); >> set_type(n, new_type); >> push_child_nodes_to_worklist(worklist, n); >> } >> if (KillPathsReachableByDeadTypeNode && n->is_Type() && new_type == Type::TOP) { >> // Keep track of Type nodes to kill CFG paths that use Type >> // nodes that become dead. >> _maybe_top_type_nodes.push(n); >> } >> } >> DEBUG_ONLY(verify_analyze(worklist_verify);) >> >> >> At the end of `PhaseCCP::analyze`, we obtain the following types in the side table: >> - `int` for node `591` (`ModINode`) >> - `int:-13957..-1191` for node `138` (`PhiNode`) >> >> If we call `find_node(138)->bottom_type()`, we get: >> - `int` for both nodes >> >> The... > > test/hotspot/jtreg/compiler/c2/TestPropagateTypeRefinementToIGVN.java line 35: > >> 33: * -XX:CompileCommand=compileonly,compiler.c2.TestPropagateTypeRefinementToIGVN::test >> 34: * compiler.c2.TestPropagateTypeRefinementToIGVN >> 35: * @run main compiler.c2.TestPropagateTypeRefinementToIGVN > > The test name is a bit too generic, maybe change to `TestModControlFoldedAfterCCP` or something. 
Thanks for the comment, that sounds good. It is always tricky to summarize an issue in 4-5 words :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26017#discussion_r2174916439 From bmaillard at openjdk.org Mon Jun 30 12:31:29 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Mon, 30 Jun 2025 12:31:29 GMT Subject: RFR: 8359602: VerifyIterativeGVN fails with assert(!VerifyHashTableKeys || _hash_lock == 0) failed: remove node from hash table before modifying it [v2] In-Reply-To: <-_MqCH6QmE-o_d7c9-aet-Cq-ptZJ6CZV6rodpDNWq0=.173e6f7a-3cfe-4791-8253-36e06d892069@github.com> References: <-_MqCH6QmE-o_d7c9-aet-Cq-ptZJ6CZV6rodpDNWq0=.173e6f7a-3cfe-4791-8253-36e06d892069@github.com> Message-ID: <3cLLB7fms3S4WgqOVeb7D_ZDRFsJ_-ca3qfALlmzFeU=.1002ac91-1e35-4499-9d88-6d1f76c955d0@github.com> > This PR prevents some missed ideal optimizations in IGVN by notifying users of type refinements made during CCP, addressing a missed optimization that caused a verification failure with `-XX:VerifyIterativeGVN=1110`. > > ### Context > During the compilation of the input program (obtained from the fuzzer, then simplified and added as a test) by C2, we end up with node `591 ModI` that takes `138 Phi` as its divisor input. An existing `Ideal` optimization is to get rid of the control input of a `ModINode` when we can prove that the divisor is never `0`. > > In this specific case, the type of the `PhiNode` gets refined during CCP, but the refinement fails to propagate to its users for the IGVN phase and the ideal optimization for the `ModINode` never happens. This results in a missed optimization and hits an assert in the verification phase of IGVN (when using `-XX:VerifyIterativeGVN=1110`).
> > ![IGV screenshot](https://github.com/user-attachments/assets/5dee1ae6-9146-4115-922d-df33b7ccbd37) > > ### Detailed Analysis > > In `PhaseCCP::analyze`, we call `Value` for the `PhiNode`, which > results in a type refinement: the range gets restricted to `int:-13957..-1191`. > > ```c++ > // Pull from worklist; compute new value; push changes out. > // This loop is the meat of CCP. > while (worklist.size() != 0) { > Node* n = fetch_next_node(worklist); > DEBUG_ONLY(worklist_verify.push(n);) > if (n->is_SafePoint()) { > // Make sure safepoints are processed by PhaseCCP::transform even if they are > // not reachable from the bottom. Otherwise, infinite loops would be removed. > _root_and_safepoints.push(n); > } > const Type* new_type = n->Value(this); > if (new_type != type(n)) { > DEBUG_ONLY(verify_type(n, new_type, type(n));) > dump_type_and_node(n, new_type); > set_type(n, new_type); > push_child_nodes_to_worklist(worklist, n); > } > if (KillPathsReachableByDeadTypeNode && n->is_Type() && new_type == Type::TOP) { > // Keep track of Type nodes to kill CFG paths that use Type > // nodes that become dead. > _maybe_top_type_nodes.push(n); > } > } > DEBUG_ONLY(verify_analyze(worklist_verify);) > > > At the end of `PhaseCCP::analyze`, we obtain the following types in the side table: > - `int` for node `591` (`ModINode`) > - `int:-13957..-1191` for node `138` (`PhiNode`) > > If we call `find_node(138)->bottom_type()`, we get: > - `int` for both nodes > > There is no progress on the type of `ModINode` during CCP, because `ModINode::Value` > is not able to... 
Benoît Maillard has updated the pull request incrementally with two additional commits since the last revision: - 8359602: rename test - 8359602: remove requires.debug=true and add -XX:+IgnoreUnrecognizedVMOptions flag ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26017/files - new: https://git.openjdk.org/jdk/pull/26017/files/2ac5e725..e05eb749 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26017&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26017&range=00-01 Stats: 3 lines in 1 file changed: 0 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/26017.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26017/head:pull/26017 PR: https://git.openjdk.org/jdk/pull/26017 From jkarthikeyan at openjdk.org Mon Jun 30 12:47:02 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Mon, 30 Jun 2025 12:47:02 GMT Subject: RFR: 8350177: C2 SuperWord: Integer.numberOfLeadingZeros, numberOfTrailingZeros, reverse and bitCount have input types wrongly turncated for byte and short [v8] In-Reply-To: References: Message-ID: > Hi all, > This patch fixes cases in SuperWord when compiling subword types where vectorized code would be given a narrower type than expected, leading to miscompilation due to truncation. This fix is a generalization of the same fix applied for `Integer.reverseBytes` in [JDK-8305324](https://bugs.openjdk.org/browse/JDK-8305324). The patch introduces a check for nodes that are known to tolerate truncation, so that any future cases of subword truncation will avoid creating miscompiled code. > > The patch reuses the existing logic to set the type of the vectors to int, which currently disables vectorization for the affected patterns entirely. Once [JDK-8342095](https://bugs.openjdk.org/browse/JDK-8342095) is merged and automatic casting support is added the autovectorizer should automatically insert casts to and from int, maintaining correctness.
> > I've added an IR test that checks for correctly compiled outputs. Thoughts and reviews would be appreciated! Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: Move extracts and check for reductions explicitly ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25440/files - new: https://git.openjdk.org/jdk/pull/25440/files/3701c534..f779b9dd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25440&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25440&range=06-07 Stats: 17 lines in 1 file changed: 0 ins; 13 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/25440.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25440/head:pull/25440 PR: https://git.openjdk.org/jdk/pull/25440 From jkarthikeyan at openjdk.org Mon Jun 30 12:47:02 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Mon, 30 Jun 2025 12:47:02 GMT Subject: RFR: 8350177: C2 SuperWord: Integer.numberOfLeadingZeros, numberOfTrailingZeros, reverse and bitCount have input types wrongly turncated for byte and short [v7] In-Reply-To: <1dG5B8Kxa52lGzda1yNH3kY1M1-MXtNp_bJrjQyG9rs=.7edb3a73-ef05-48b5-b202-78077b001970@github.com> References: <1dG5B8Kxa52lGzda1yNH3kY1M1-MXtNp_bJrjQyG9rs=.7edb3a73-ef05-48b5-b202-78077b001970@github.com> Message-ID: On Mon, 30 Jun 2025 05:53:53 GMT, Emanuel Peter wrote: >> Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: >> >> Add case for bytes and prevent vector ops from truncating > > src/hotspot/share/opto/superword.cpp line 2612: > >> 2610: case Op_XorReductionV: >> 2611: case Op_MaxReductionV: >> 2612: case Op_MinReductionV: > > Can we now remove those, since you are already handling all vectors explicitly above? I couldn't move it earlier since reduction nodes return a regular type instead of a vector type, but I've pushed a commit that moves it to an `isa_Reduction()` check. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25440#discussion_r2174986411 From jkarthikeyan at openjdk.org Mon Jun 30 12:59:24 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Mon, 30 Jun 2025 12:59:24 GMT Subject: RFR: 8350177: C2 SuperWord: Integer.numberOfLeadingZeros, numberOfTrailingZeros, reverse and bitCount have input types wrongly turncated for byte and short [v9] In-Reply-To: References: Message-ID: > Hi all, > This patch fixes cases in SuperWord when compiling subword types where vectorized code would be given a narrower type than expected, leading to miscompilation due to truncation. This fix is a generalization of the same fix applied for `Integer.reverseBytes` in [JDK-8305324](https://bugs.openjdk.org/browse/JDK-8305324). The patch introduces a check for nodes that are known to tolerate truncation, so that any future cases of subword truncation will avoid creating miscompiled code. > > The patch reuses the existing logic to set the type of the vectors to int, which currently disables vectorization for the affected patterns entirely. Once [JDK-8342095](https://bugs.openjdk.org/browse/JDK-8342095) is merged and automatic casting support is added the autovectorizer should automatically insert casts to and from int, maintaining correctness. > > I've added an IR test that checks for correctly compiled outputs. Thoughts and reviews would be appreciated! 
Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: Explicit nullptr checks ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25440/files - new: https://git.openjdk.org/jdk/pull/25440/files/f779b9dd..893bebd9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25440&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25440&range=07-08 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25440.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25440/head:pull/25440 PR: https://git.openjdk.org/jdk/pull/25440 From jkarthikeyan at openjdk.org Mon Jun 30 12:59:24 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Mon, 30 Jun 2025 12:59:24 GMT Subject: RFR: 8350177: C2 SuperWord: Integer.numberOfLeadingZeros, numberOfTrailingZeros, reverse and bitCount have input types wrongly turncated for byte and short [v6] In-Reply-To: References: <9nzGK8I9TMCTrAkvPln1xoEwpA2Ex_Sjr886PZgLF6w=.c480e2fd-0146-4821-b0c7-ded85360fea0@github.com> Message-ID: On Mon, 30 Jun 2025 05:56:19 GMT, Emanuel Peter wrote: >> I don't think there's a case where vectorizing Extract can succeed since it's already a vector op (extract scalar out of vector), so this case would only be needed to prevent assert failures. > > I see. But then why not return `false`, just like for the other vector operations? Whoops, I think that's my mistake! I put the nodes in the switch with the subword types since the base type was short/char, but I should have put it in the switch with the assert. It ended up not causing problems since the Extract nodes can't vectorize, but I've moved it to the assert switch in the latest commit. 
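For readers following the thread, here is a minimal Java sketch (not taken from the PR) of why a bit-counting operation cannot simply be narrowed to a 16-bit lane. Java widens a `short` argument to `int` before `Integer.numberOfLeadingZeros` runs, so the count covers 32 bits. The class name `SubwordTruncationSketch` and the method `truncatedLane`, which models a hypothetical 16-bit lane, are illustrative assumptions only and do not correspond to any HotSpot code.

```java
// Illustration only: models the effect of narrowing the input to 16 bits.
// The names below are hypothetical and are not part of HotSpot or the PR.
final class SubwordTruncationSketch {

    // What Java semantics require: the short is widened to int first,
    // so leading zeros are counted over 32 bits.
    static short expected(short s) {
        return (short) Integer.numberOfLeadingZeros(s);
    }

    // Assumed model of a wrongly narrowed lane: leading zeros are counted
    // within 16 bits only, as if the operation ran on the short directly.
    static short truncatedLane(short s) {
        int lane = s & 0xFFFF;                      // keep only the low 16 bits
        return (short) (Integer.numberOfLeadingZeros(lane) - 16);
    }

    public static void main(String[] args) {
        short[] inputs = {0, 1, 255, Short.MAX_VALUE};
        for (short s : inputs) {
            System.out.println(s + ": expected=" + expected(s)
                    + " truncated=" + truncatedLane(s));
        }
    }
}
```

In this model the two results differ for every non-negative input (for example 31 versus 15 for the value 1), which is the kind of mismatch an IR or correctness test can detect. Negative inputs happen to agree because sign extension sets the top bit in both widths.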
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25440#discussion_r2175010877 From shade at openjdk.org Mon Jun 30 13:00:59 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 30 Jun 2025 13:00:59 GMT Subject: RFR: 8360783: CTW: Skip deoptimization between tiers [v2] In-Reply-To: References: Message-ID: <2-UJphsSUYCdbDsdZ-4h3V00FVVjkpftKoBYOCta25Q=.e2a22a7e-d39a-4e98-9f02-c13c9222c16f@github.com> On Fri, 27 Jun 2025 08:38:31 GMT, Aleksey Shipilev wrote: >> When profiling CTW runs, I noticed we spend a lot of time dealing with deoptimization. We do this excessively, deoptimizing before compilation on every tier. This is excessive: Hotspot honors compilation requests on subsequent levels without the need for explicit deoptimization. Not doing deopt between tiers greatly improves CTW performance. >> >> A taste of improvements, about 15% less CPU spent: >> >> >> $ time make test TEST=applications/ctw/modules >> >> # Current >> real 5m1.616s >> user 79m41.398s >> sys 14m39.607s >> >> # Patched >> real 3m55.411s >> user 69m19.227s >> sys 5m24.323s >> >> >> The compilation still works as expected, progressing through tiers 1..4: >> >> >> $ JAVA_OPTIONS="-XX:+PrintCompilation -XX:CICompilerCount=2" ./ctw.sh modules:jdk.compiler | tee out >> ... 
>> $ grep sun.tools.serialver.resources.serialver_de::getContents out >> 101783 55033 b 1 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) >> 101785 55036 b 2 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) >> 101786 55033 1 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) made not entrant: not used >> 101786 55038 b 3 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) >> 101787 55036 2 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) made not entrant: not used >> 101792 55040 b 4 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) >> 101797 55038 3 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) made not entrant: not used >> 101798 55040 4 sun.tools.serialver.resources.serialver_de::getContents (108 bytes) made not entrant: marked for deoptimization > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/testlibrary/ctw/src/sun/hotspot/tools/ctw/Compiler.java > > Co-authored-by: Tobias Hartmann Thanks, I think I need a quick re-review after the amendment. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26013#issuecomment-3019057304 From mablakatov at openjdk.org Mon Jun 30 13:25:09 2025 From: mablakatov at openjdk.org (Mikhail Ablakatov) Date: Mon, 30 Jun 2025 13:25:09 GMT Subject: RFR: 8343689: AArch64: Optimize MulReduction implementation [v4] In-Reply-To: References: Message-ID: > Add a reduce_mul intrinsic SVE specialization for >= 256-bit long vectors. It multiplies halves of the source vector using SVE instructions to get to a 128-bit long vector that fits into a SIMD&FP register. After that point, existing ASIMD implementation is used. > > Nothing changes for <= 128-bit long vectors as for those the existing ASIMD implementation is used directly still. 
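The halving strategy described above can be modeled in plain Java. This is only a scalar sketch under stated assumptions, not the SVE/ASIMD code from the PR: the class `MulReductionHalvingSketch`, its methods, and the `baseLanes` parameter (standing in for the number of lanes that fit in a 128-bit register) are hypothetical names introduced for illustration.

```java
// Scalar model of a halving multiply reduction; all names are illustrative.
final class MulReductionHalvingSketch {

    // Reference: multiply lanes in order (int arithmetic wraps modulo 2^32).
    static int sequential(int[] v) {
        int acc = 1;
        for (int x : v) {
            acc *= x;
        }
        return acc;
    }

    // Multiply the upper half onto the lower half until at most baseLanes
    // lanes remain, then finish with a sequential reduction.
    static int halving(int[] v, int baseLanes) {
        if (v.length == 0 || Integer.bitCount(v.length) != 1) {
            throw new IllegalArgumentException("lane count must be a power of two");
        }
        if (baseLanes < 1) {
            throw new IllegalArgumentException("baseLanes must be positive");
        }
        int[] cur = v.clone();          // work on a copy, leave the source intact
        int len = cur.length;
        while (len > baseLanes) {
            int half = len / 2;
            for (int i = 0; i < half; i++) {
                cur[i] *= cur[i + half];
            }
            len = half;
        }
        int acc = 1;
        for (int i = 0; i < len; i++) {
            acc *= cur[i];
        }
        return acc;
    }

    public static void main(String[] args) {
        int[] lanes = {1, 2, 3, 4, 5, 6, 7, 8};   // eight int lanes model 256 bits
        System.out.println(sequential(lanes));
        System.out.println(halving(lanes, 4));    // four int lanes model 128 bits
    }
}
```

For integral types both orders give the same result because wrapping multiplication is associative and commutative. Floating-point multiplication is not associative, so a reordered reduction can round differently from a strictly ordered one.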
> > The benchmarks below are from [panama-vector/vectorIntrinsics:test/micro/org/openjdk/bench/jdk/incubator/vector/operation](https://github.com/openjdk/panama-vector/tree/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation). To the best of my knowledge, openjdk/jdk is missing VectorAPI reducion micro-benchmarks. > > Benchmarks results: > > Neoverse-V1 (SVE 256-bit) > > Benchmark (size) Mode master PR Units > ByteMaxVector.MULLanes 1024 thrpt 5447.643 11455.535 ops/ms > ShortMaxVector.MULLanes 1024 thrpt 3388.183 7144.301 ops/ms > IntMaxVector.MULLanes 1024 thrpt 3010.974 4911.485 ops/ms > LongMaxVector.MULLanes 1024 thrpt 1539.137 2562.835 ops/ms > FloatMaxVector.MULLanes 1024 thrpt 1355.551 4158.128 ops/ms > DoubleMaxVector.MULLanes 1024 thrpt 1715.854 3284.189 ops/ms > > > Fujitsu A64FX (SVE 512-bit): > > Benchmark (size) Mode master PR Units > ByteMaxVector.MULLanes 1024 thrpt 1091.692 2887.798 ops/ms > ShortMaxVector.MULLanes 1024 thrpt 597.008 1863.338 ops/ms > IntMaxVector.MULLanes 1024 thrpt 510.642 1348.651 ops/ms > LongMaxVector.MULLanes 1024 thrpt 468.878 878.620 ops/ms > FloatMaxVector.MULLanes 1024 thrpt 376.284 2237.564 ops/ms > DoubleMaxVector.MULLanes 1024 thrpt 431.343 1646.792 ops/ms Mikhail Ablakatov has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits: - cleanup: address nits, rename several symbols - cleanup: remove unreferenced definitions - Address review comments. 
- fixup: disable FP mul reduction auto-vectorization for all targets - fixup: add a tmp vReg to reduce_mul_integral_gt128b and reduce_non_strict_order_mul_fp_gt128bto keep vsrc unmodified - cleanup: replace a complex lambda in the above methods with a loop - cleanup: rename symbols to follow the existing naming convention - cleanup: add asserts to SVE only instructions - split mul FP reduction instructions into strictly-ordered (default) and explicitly non strictly-ordered - remove redundant conditions in TestVectorFPReduction.java Benchmarks results: Neoverse-V1 (SVE 256-bit) | Benchmark | Before | After | Units | Diff | |---------------------------|----------|----------|--------|-------| | ByteMaxVector.MULLanes | 619.156 | 9884.578 | ops/ms | 1496% | | DoubleMaxVector.MULLanes | 184.693 | 2712.051 | ops/ms | 1368% | | FloatMaxVector.MULLanes | 277.818 | 3388.038 | ops/ms | 1119% | | IntMaxVector.MULLanes | 371.225 | 4765.434 | ops/ms | 1183% | | LongMaxVector.MULLanes | 205.149 | 2672.975 | ops/ms | 1203% | | ShortMaxVector.MULLanes | 472.804 | 5122.917 | ops/ms | 984% | - Merge branch 'master' into 8343689-rebase - fixup: don't modify the value in vsrc Fix reduce_mul_integral_gt128b() so it doesn't modify vsrc. With this change, the result of recursive folding is held in vtmp1. To be able to pass this intermediate result to reduce_mul_integral_le128b(), we would have to use another temporary FloatRegister, as vtmp1 would essentially act as vsrc. It's possible to get around this however: reduce_mul_integral_le128b() is modified so it's possible to pass matching vsrc and vtmp2 arguments. By doing this, we save ourselves a temporary register in rules that match to reduce_mul_integral_gt128b(). 
- cleanup: revert an unnecessary change to reduce_mul_fp_le128b() formating - Use EXT instead of COMPACT to split a vector into two halves Benchmarks results: Neoverse-V1 (SVE 256-bit) Benchmark (size) Mode master PR Units ByteMaxVector.MULLanes 1024 thrpt 5447.643 11455.535 ops/ms ShortMaxVector.MULLanes 1024 thrpt 3388.183 7144.301 ops/ms IntMaxVector.MULLanes 1024 thrpt 3010.974 4911.485 ops/ms LongMaxVector.MULLanes 1024 thrpt 1539.137 2562.835 ops/ms FloatMaxVector.MULLanes 1024 thrpt 1355.551 4158.128 ops/ms DoubleMaxVector.MULLanes 1024 thrpt 1715.854 3284.189 ops/ms Fujitsu A64FX (SVE 512-bit) Benchmark (size) Mode master PR Units ByteMaxVector.MULLanes 1024 thrpt 1091.692 2887.798 ops/ms ShortMaxVector.MULLanes 1024 thrpt 597.008 1863.338 ops/ms IntMaxVector.MULLanes 1024 thrpt 510.642 1348.651 ops/ms LongMaxVector.MULLanes 1024 thrpt 468.878 878.620 ops/ms FloatMaxVector.MULLanes 1024 thrpt 376.284 2237.564 ops/ms DoubleMaxVector.MULLanes 1024 thrpt 431.343 1646.792 ops/ms - 8343689: AArch64: Optimize MulReduction implementation Add a reduce_mul intrinsic SVE specialization for >= 256-bit long vectors. It multiplies halves of the source vector using SVE instructions to get to a 128-bit long vector that fits into a SIMD&FP register. After that point, existing ASIMD implementation is used. 
Benchmarks results for an AArch64 CPU with support for SVE with 256-bit vector length: Benchmark (size) Mode Old New Units Byte256Vector.MULLanes 1024 thrpt 502.498 10222.717 ops/ms Double256Vector.MULLanes 1024 thrpt 172.116 3130.997 ops/ms Float256Vector.MULLanes 1024 thrpt 291.612 4164.138 ops/ms Int256Vector.MULLanes 1024 thrpt 362.276 3717.213 ops/ms Long256Vector.MULLanes 1024 thrpt 184.826 2054.345 ops/ms Short256Vector.MULLanes 1024 thrpt 379.231 5716.223 ops/ms Benchmarks results for an AArch64 CPU with support for SVE with 512-bit vector length: Benchmark (size) Mode Old New Units Byte512Vector.MULLanes 1024 thrpt 160.129 2630.600 ops/ms Double512Vector.MULLanes 1024 thrpt 51.229 1033.284 ops/ms Float512Vector.MULLanes 1024 thrpt 84.617 1658.400 ops/ms Int512Vector.MULLanes 1024 thrpt 109.419 1180.310 ops/ms Long512Vector.MULLanes 1024 thrpt 69.036 704.144 ops/ms Short512Vector.MULLanes 1024 thrpt 131.029 1629.632 ops/ms ------------- Changes: https://git.openjdk.org/jdk/pull/23181/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23181&range=03 Stats: 499 lines in 9 files changed: 346 ins; 2 del; 151 mod Patch: https://git.openjdk.org/jdk/pull/23181.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23181/head:pull/23181 PR: https://git.openjdk.org/jdk/pull/23181 From mablakatov at openjdk.org Mon Jun 30 13:25:10 2025 From: mablakatov at openjdk.org (Mikhail Ablakatov) Date: Mon, 30 Jun 2025 13:25:10 GMT Subject: RFR: 8343689: AArch64: Optimize MulReduction implementation [v4] In-Reply-To: References: Message-ID: <6NGndk2GSPYLD5JMsNOR07ok4YtGtDeHzXrum-0wwno=.1be60480-4b10-4786-83db-0cae4223c833@github.com> On Mon, 30 Jun 2025 13:22:34 GMT, Mikhail Ablakatov wrote: >> Add a reduce_mul intrinsic SVE specialization for >= 256-bit long vectors. It multiplies halves of the source vector using SVE instructions to get to a 128-bit long vector that fits into a SIMD&FP register. After that point, existing ASIMD implementation is used. 
>> >> Nothing changes for <= 128-bit long vectors as for those the existing ASIMD implementation is used directly still. >> >> The benchmarks below are from [panama-vector/vectorIntrinsics:test/micro/org/openjdk/bench/jdk/incubator/vector/operation](https://github.com/openjdk/panama-vector/tree/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation). To the best of my knowledge, openjdk/jdk is missing VectorAPI reducion micro-benchmarks. >> >> Benchmarks results: >> >> Neoverse-V1 (SVE 256-bit) >> >> Benchmark (size) Mode master PR Units >> ByteMaxVector.MULLanes 1024 thrpt 5447.643 11455.535 ops/ms >> ShortMaxVector.MULLanes 1024 thrpt 3388.183 7144.301 ops/ms >> IntMaxVector.MULLanes 1024 thrpt 3010.974 4911.485 ops/ms >> LongMaxVector.MULLanes 1024 thrpt 1539.137 2562.835 ops/ms >> FloatMaxVector.MULLanes 1024 thrpt 1355.551 4158.128 ops/ms >> DoubleMaxVector.MULLanes 1024 thrpt 1715.854 3284.189 ops/ms >> >> >> Fujitsu A64FX (SVE 512-bit): >> >> Benchmark (size) Mode master PR Units >> ByteMaxVector.MULLanes 1024 thrpt 1091.692 2887.798 ops/ms >> ShortMaxVector.MULLanes 1024 thrpt 597.008 1863.338 ops/ms >> IntMaxVector.MULLanes 1024 thrpt 510.642 1348.651 ops/ms >> LongMaxVector.MULLanes 1024 thrpt 468.878 878.620 ops/ms >> FloatMaxVector.MULLanes 1024 thrpt 376.284 2237.564 ops/ms >> DoubleMaxVector.MULLanes 1024 thrpt 431.343 1646.792 ops/ms > > Mikhail Ablakatov has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits: > > - cleanup: address nits, rename several symbols > - cleanup: remove unreferenced definitions > - Address review comments. 
> > - fixup: disable FP mul reduction auto-vectorization for all targets > - fixup: add a tmp vReg to reduce_mul_integral_gt128b and > reduce_non_strict_order_mul_fp_gt128bto keep vsrc unmodified > - cleanup: replace a complex lambda in the above methods with a loop > - cleanup: rename symbols to follow the existing naming convention > - cleanup: add asserts to SVE only instructions > - split mul FP reduction instructions into strictly-ordered (default) > and explicitly non strictly-ordered > - remove redundant conditions in TestVectorFPReduction.java > > Benchmarks results: > > Neoverse-V1 (SVE 256-bit) > > | Benchmark | Before | After | Units | Diff | > |---------------------------|----------|----------|--------|-------| > | ByteMaxVector.MULLanes | 619.156 | 9884.578 | ops/ms | 1496% | > | DoubleMaxVector.MULLanes | 184.693 | 2712.051 | ops/ms | 1368% | > | FloatMaxVector.MULLanes | 277.818 | 3388.038 | ops/ms | 1119% | > | IntMaxVector.MULLanes | 371.225 | 4765.434 | ops/ms | 1183% | > | LongMaxVector.MULLanes | 205.149 | 2672.975 | ops/ms | 1203% | > | ShortMaxVector.MULLanes | 472.804 | 5122.917 | ops/ms | 984% | > - Merge branch 'master' into 8343689-rebase > - fixup: don't modify the value in vsrc > > Fix reduce_mul_integral_gt128b() so it doesn't modify vsrc. With this > change, the result of recursive folding is held in vtmp1. To be able to > pass this intermediate result to reduce_mul_integral_le128b(), we would > have to use another temporary FloatRegister, as vtmp1 would essentially > act as vsrc. It's possible to get around this however: > reduce_mul_integral_le128b() is modified so it's possible to pass > matching vsrc and vtmp2 arguments. By doing this, we save ourselves a > temporary register in rules that match to reduce_mul_integral_gt128b(). 
> - cleanup: revert an unnecessary change to reduce_mul_fp_le128b() formating > - Use EXT instead of COMPACT to split a vector into two halves > > Benchmarks results: > > Neoverse-V1 (SVE 256-bit) > > Benchmark (size) Mode master PR Units > ByteMaxVector.MULLanes 1024 thrpt 5447.643 11455.535 ops/ms > Short... Thank you for a review! There are a couple more nits I've missed, I'll submit an update to resolve them shortly. ------------- PR Review: https://git.openjdk.org/jdk/pull/23181#pullrequestreview-2970941468 From mablakatov at openjdk.org Mon Jun 30 13:25:10 2025 From: mablakatov at openjdk.org (Mikhail Ablakatov) Date: Mon, 30 Jun 2025 13:25:10 GMT Subject: RFR: 8343689: AArch64: Optimize MulReduction implementation [v3] In-Reply-To: References: Message-ID: On Wed, 26 Feb 2025 14:54:45 GMT, Mikhail Ablakatov wrote: >> Add a reduce_mul intrinsic SVE specialization for >= 256-bit long vectors. It multiplies halves of the source vector using SVE instructions to get to a 128-bit long vector that fits into a SIMD&FP register. After that point, existing ASIMD implementation is used. >> >> Nothing changes for <= 128-bit long vectors as for those the existing ASIMD implementation is used directly still. >> >> The benchmarks below are from [panama-vector/vectorIntrinsics:test/micro/org/openjdk/bench/jdk/incubator/vector/operation](https://github.com/openjdk/panama-vector/tree/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation). To the best of my knowledge, openjdk/jdk is missing VectorAPI reducion micro-benchmarks. 
>>
>> Benchmarks results:
>>
>> Neoverse-V1 (SVE 256-bit)
>>
>> Benchmark                 (size)   Mode    master         PR   Units
>> ByteMaxVector.MULLanes      1024  thrpt  5447.643  11455.535  ops/ms
>> ShortMaxVector.MULLanes     1024  thrpt  3388.183   7144.301  ops/ms
>> IntMaxVector.MULLanes       1024  thrpt  3010.974   4911.485  ops/ms
>> LongMaxVector.MULLanes      1024  thrpt  1539.137   2562.835  ops/ms
>> FloatMaxVector.MULLanes     1024  thrpt  1355.551   4158.128  ops/ms
>> DoubleMaxVector.MULLanes    1024  thrpt  1715.854   3284.189  ops/ms
>>
>>
>> Fujitsu A64FX (SVE 512-bit):
>>
>> Benchmark                 (size)   Mode    master         PR   Units
>> ByteMaxVector.MULLanes      1024  thrpt  1091.692   2887.798  ops/ms
>> ShortMaxVector.MULLanes     1024  thrpt   597.008   1863.338  ops/ms
>> IntMaxVector.MULLanes       1024  thrpt   510.642   1348.651  ops/ms
>> LongMaxVector.MULLanes      1024  thrpt   468.878    878.620  ops/ms
>> FloatMaxVector.MULLanes     1024  thrpt   376.284   2237.564  ops/ms
>> DoubleMaxVector.MULLanes    1024  thrpt   431.343   1646.792  ops/ms
>
> Mikhail Ablakatov has updated the pull request incrementally with two additional commits since the last revision:
>
> - fixup: don't modify the value in vsrc
>
>   Fix reduce_mul_integral_gt128b() so it doesn't modify vsrc. With this
>   change, the result of recursive folding is held in vtmp1. To be able to
>   pass this intermediate result to reduce_mul_integral_le128b(), we would
>   have to use another temporary FloatRegister, as vtmp1 would essentially
>   act as vsrc. It's possible to get around this however:
>   reduce_mul_integral_le128b() is modified so it's possible to pass
>   matching vsrc and vtmp2 arguments. By doing this, we save ourselves a
>   temporary register in rules that match to reduce_mul_integral_gt128b().
> - cleanup: revert an unnecessary change to reduce_mul_fp_le128b() formating

This patch improves the performance of mul reduction VectorAPIs on SVE targets with 256b or wider vectors.
This comment also provides performance numbers for NEON / SVE 128b platforms that aren't expected to benefit from these implementations and for auto-vectorization benchmarks.

### Neoverse N1 (NEON)

#### Auto-vectorization

| Benchmark | Before | After | Units | Diff |
|---------------------------|----------|----------|-------|------|
| mulRedD | 739.699 | 740.884 | ns/op | ~ |
| byteAddBig | 2670.248 | 2670.562 | ns/op | ~ |
| byteAddSimple | 1639.796 | 1639.940 | ns/op | ~ |
| byteMulBig | 2707.900 | 2708.063 | ns/op | ~ |
| byteMulSimple | 2452.939 | 2452.906 | ns/op | ~ |
| charAddBig | 2772.363 | 2772.269 | ns/op | ~ |
| charAddSimple | 1639.867 | 1639.751 | ns/op | ~ |
| charMulBig | 2796.533 | 2796.375 | ns/op | ~ |
| charMulSimple | 2453.034 | 2453.004 | ns/op | ~ |
| doubleAddBig | 2943.613 | 2936.897 | ns/op | ~ |
| doubleAddSimple | 1635.031 | 1634.797 | ns/op | ~ |
| doubleMulBig | 3001.937 | 3003.240 | ns/op | ~ |
| doubleMulSimple | 2448.154 | 2448.117 | ns/op | ~ |
| floatAddBig | 2963.086 | 2962.215 | ns/op | ~ |
| floatAddSimple | 1634.987 | 1634.798 | ns/op | ~ |
| floatMulBig | 3022.442 | 3021.356 | ns/op | ~ |
| floatMulSimple | 2447.976 | 2448.091 | ns/op | ~ |
| intAddBig | 832.346 | 832.382 | ns/op | ~ |
| intAddSimple | 841.276 | 841.287 | ns/op | ~ |
| intMulBig | 1245.155 | 1245.095 | ns/op | ~ |
| intMulSimple | 1638.762 | 1638.826 | ns/op | ~ |
| longAddBig | 4924.541 | 4924.328 | ns/op | ~ |
| longAddSimple | 841.623 | 841.625 | ns/op | ~ |
| longMulBig | 9848.954 | 9848.807 | ns/op | ~ |
| longMulSimple | 3427.169 | 3427.279 | ns/op | ~ |
| shortAddBig | 2670.027 | 2670.345 | ns/op | ~ |
| shortAddSimple | 1639.869 | 1639.876 | ns/op | ~ |
| shortMulBig | 2750.812 | 2750.562 | ns/op | ~ |
| shortMulSimple | 2453.030 | 2452.937 | ns/op | ~ |

#### VectorAPI

| Benchmark | Before | After | Units | Diff |
|---------------------------|----------|----------|--------|------|
| ByteMaxVector.MULLanes | 3935.178 | 3935.776 | ops/ms | ~ |
| DoubleMaxVector.MULLanes | 971.911 | 973.142 | ops/ms | ~ |
| FloatMaxVector.MULLanes | 1196.405 | 1196.222 | ops/ms | ~ |
| IntMaxVector.MULLanes | 1218.301 | 1218.224 | ops/ms | ~ |
| LongMaxVector.MULLanes | 541.793 | 541.805 | ops/ms | ~ |
| ShortMaxVector.MULLanes | 2332.916 | 2428.970 | ops/ms | 4% |

### Neoverse V1 (SVE 256-bit)

#### Auto-vectorization

| Benchmark | Before | After | Units | Diff |
|---------------------------|----------|----------|-------|------|
| mulRedD | 401.696 | 401.699 | ns/op | ~ |
| byteAddBig | 2365.921 | 2365.726 | ns/op | ~ |
| byteAddSimple | 1569.524 | 1583.595 | ns/op | ~ |
| byteMulBig | 2368.362 | 2369.144 | ns/op | ~ |
| byteMulSimple | 2357.183 | 2356.961 | ns/op | ~ |
| charAddBig | 2262.944 | 2262.851 | ns/op | ~ |
| charAddSimple | 1569.399 | 1568.549 | ns/op | ~ |
| charMulBig | 2365.594 | 2365.540 | ns/op | ~ |
| charMulSimple | 2353.000 | 2356.285 | ns/op | ~ |
| doubleAddBig | 1640.613 | 1640.747 | ns/op | ~ |
| doubleAddSimple | 1549.028 | 1549.056 | ns/op | ~ |
| doubleMulBig | 2352.374 | 2365.366 | ns/op | ~ |
| doubleMulSimple | 2321.318 | 2321.273 | ns/op | ~ |
| floatAddBig | 1078.672 | 1078.641 | ns/op | ~ |
| floatAddSimple | 1549.075 | 1549.028 | ns/op | ~ |
| floatMulBig | 2351.251 | 2355.657 | ns/op | ~ |
| floatMulSimple | 2321.234 | 2321.205 | ns/op | ~ |
| intAddBig | 225.647 | 225.631 | ns/op | ~ |
| intAddSimple | 789.430 | 789.409 | ns/op | ~ |
| intMulBig | 785.971 | 403.520 | ns/op | -49% |
| intMulSimple | 1569.131 | 1569.542 | ns/op | ~ |
| longAddBig | 819.702 | 819.898 | ns/op | ~ |
| longAddSimple | 789.597 | 789.573 | ns/op | ~ |
| longMulBig | 2460.433 | 2465.883 | ns/op | ~ |
| longMulSimple | 1560.933 | 1560.738 | ns/op | ~ |
| shortAddBig | 2268.769 | 2268.879 | ns/op | ~ |
| shortAddSimple | 1569.829 | 1577.502 | ns/op | ~ |
| shortMulBig | 2368.849 | 2369.381 | ns/op | ~ |
| shortMulSimple | 2353.986 | 2353.620 | ns/op | ~ |

#### VectorAPI

| Benchmark | Before | After | Units | Diff |
|---------------------------|----------|----------|--------|-------|
| ByteMaxVector.MULLanes | 619.156 | 9884.578 | ops/ms | 1496% |
| DoubleMaxVector.MULLanes | 184.693 | 2712.051 | ops/ms | 1368% |
| FloatMaxVector.MULLanes | 277.818 | 3388.038 | ops/ms | 1119% |
| IntMaxVector.MULLanes | 371.225 | 4765.434 | ops/ms | 1183% |
| LongMaxVector.MULLanes | 205.149 | 2672.975 | ops/ms | 1203% |
| ShortMaxVector.MULLanes | 472.804 | 5122.917 | ops/ms | 984% |

### Neoverse V2 (SVE 128-bit)

#### Auto-vectorization

| Benchmark | Before | After | Units | Diff |
|---------------------------|----------|----------|-------|------|
| mulRedD | 326.590 | 326.367 | ns/op | ~ |
| byteAddBig | 1889.745 | 1894.973 | ns/op | ~ |
| byteAddSimple | 1251.112 | 1255.026 | ns/op | ~ |
| byteMulBig | 1891.615 | 1896.814 | ns/op | ~ |
| byteMulSimple | 1871.912 | 1873.334 | ns/op | ~ |
| charAddBig | 1892.921 | 1894.729 | ns/op | ~ |
| charAddSimple | 1260.088 | 1260.200 | ns/op | ~ |
| charMulBig | 1895.881 | 1892.268 | ns/op | ~ |
| charMulSimple | 1871.443 | 1877.403 | ns/op | ~ |
| doubleAddBig | 1325.652 | 1323.546 | ns/op | ~ |
| doubleAddSimple | 1229.101 | 1232.291 | ns/op | ~ |
| doubleMulBig | 1872.655 | 1873.624 | ns/op | ~ |
| doubleMulSimple | 1843.787 | 1842.049 | ns/op | ~ |
| floatAddBig | 1093.144 | 1093.687 | ns/op | ~ |
| floatAddSimple | 1229.396 | 1229.058 | ns/op | ~ |
| floatMulBig | 1862.449 | 1873.624 | ns/op | ~ |
| floatMulSimple | 1841.839 | 1846.539 | ns/op | ~ |
| intAddBig | 316.076 | 316.111 | ns/op | ~ |
| intAddSimple | 629.235 | 630.857 | ns/op | ~ |
| intMulBig | 615.185 | 616.652 | ns/op | ~ |
| intMulSimple | 1258.883 | 1262.365 | ns/op | ~ |
| longAddBig | 1145.601 | 1146.965 | ns/op | ~ |
| longAddSimple | 633.978 | 634.034 | ns/op | ~ |
| longMulBig | 1834.331 | 1854.264 | ns/op | ~ |
| longMulSimple | 1264.152 | 1261.659 | ns/op | ~ |
| shortAddBig | 1889.645 | 1890.173 | ns/op | ~ |
| shortAddSimple | 1251.094 | 1250.808 | ns/op | ~ |
| shortMulBig | 1893.699 | 1895.171 | ns/op | ~ |
| shortMulSimple | 1871.791 | 1876.445 | ns/op | ~ |

#### VectorAPI

| Benchmark | Before | After | Units | Diff |
|---------------------------|----------|----------|--------|------|
| ByteMaxVector.MULLanes | 7210.809 | 7229.774 | ops/ms | ~ |
| DoubleMaxVector.MULLanes | 1333.230 | 1330.399 | ops/ms | ~ |
| FloatMaxVector.MULLanes | 1762.932 | 1767.859 | ops/ms | ~ |
| IntMaxVector.MULLanes | 3690.901 | 3699.748 | ops/ms | ~ |
| LongMaxVector.MULLanes | 1994.725 | 1991.539 | ops/ms | ~ |
| ShortMaxVector.MULLanes | 4648.878 | 4669.074 | ops/ms | ~ |

------------- PR Comment: https://git.openjdk.org/jdk/pull/23181#issuecomment-3018988067 From mablakatov at openjdk.org Mon Jun 30 13:25:10 2025 From: mablakatov at openjdk.org (Mikhail Ablakatov) Date: Mon, 30 Jun 2025 13:25:10 GMT Subject: RFR: 8343689: AArch64: Optimize MulReduction implementation [v3] In-Reply-To: References: Message-ID: On Thu, 27 Feb 2025 03:28:51 GMT, Xiaohong Gong wrote: >> Mikhail Ablakatov has updated the pull request incrementally with two additional commits since the last revision: >> >> - fixup: don't modify the value in vsrc >> >> Fix reduce_mul_integral_gt128b() so it doesn't modify vsrc. With this >> change, the result of recursive folding is held in vtmp1. To be able to >> pass this intermediate result to reduce_mul_integral_le128b(), we would >> have to use another temporary FloatRegister, as vtmp1 would essentially >> act as vsrc. It's possible to get around this however: >> reduce_mul_integral_le128b() is modified so it's possible to pass >> matching vsrc and vtmp2 arguments. By doing this, we save ourselves a >> temporary register in rules that match to reduce_mul_integral_gt128b(). >> - cleanup: revert an unnecessary change to reduce_mul_fp_le128b() formating > > src/hotspot/cpu/aarch64/aarch64_vector.ad line 3012: > >> 3010: vReg tmp1, vReg tmp2) %{ >> 3011: predicate(Matcher::vector_length_in_bytes(n->in(2)) == 8 || >> 3012: Matcher::vector_length_in_bytes(n->in(2)) == 16); > > Suggestion: > > predicate(Matcher::vector_length_in_bytes(n->in(2)) <= 16); The patch doesn't add or modifies these lines. I'd prefer to leave the footprint as small as possible and leave it as is.. > ` sve_ext(vtmp1, vtmp2, vector_length_in_bytes / 2);` This doesn't look right. `vtmp1` is both an src and dest register operand for `EXT` but it contains an undefined value at the first iteration of the loop. I agree that the implementation of the loop is confusing. I'll rework and (hopefully) simplify it, thank you for pointing this out. 
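
To make the concern about `sve_ext(vtmp1, vtmp2, vector_length_in_bytes / 2)` concrete, here is a byte-level model of the destructive SVE `EXT` form as described in the Arm instruction reference: the result is taken from the concatenation of the first operand (low bytes) and the second operand (high bytes), starting at byte `imm`. The function name `ext_model` is hypothetical, and the model assumes both inputs have the same length and `imm` is smaller than that length.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Byte-level model of destructive EXT: out[i] = (zdn : zm)[imm + i],
// where zdn occupies the low positions of the concatenation.
// Assumes zdn.size() == zm.size() and imm < zdn.size().
std::vector<uint8_t> ext_model(const std::vector<uint8_t>& zdn,
                               const std::vector<uint8_t>& zm,
                               std::size_t imm) {
  const std::size_t vl = zdn.size();
  std::vector<uint8_t> out(vl);
  for (std::size_t i = 0; i < vl; i++) {
    const std::size_t idx = imm + i;
    out[i] = (idx < vl) ? zdn[idx] : zm[idx - vl];
  }
  return out;
}
```

With `imm` equal to half the vector length, the low half of the result comes from the upper half of the first operand. In the quoted call that first operand is `vtmp1`, which is why its contents before the first loop iteration matter.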
Hi @XiaohongGong , I hope I've been able to simplify the implementation. It doesn't utilize a lambda anymore, but I had to peel the first iteration out of the loop anyway. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23181#discussion_r2154422774 PR Review Comment: https://git.openjdk.org/jdk/pull/23181#discussion_r2154415974 PR Review Comment: https://git.openjdk.org/jdk/pull/23181#discussion_r2174988523 From mablakatov at openjdk.org Mon Jun 30 13:25:11 2025 From: mablakatov at openjdk.org (Mikhail Ablakatov) Date: Mon, 30 Jun 2025 13:25:11 GMT Subject: RFR: 8343689: AArch64: Optimize MulReduction implementation [v3] In-Reply-To: References: Message-ID: On Thu, 27 Feb 2025 03:17:59 GMT, Hao Sun wrote: >> Mikhail Ablakatov has updated the pull request incrementally with two additional commits since the last revision: >> >> - fixup: don't modify the value in vsrc >> >> Fix reduce_mul_integral_gt128b() so it doesn't modify vsrc. With this >> change, the result of recursive folding is held in vtmp1. To be able to >> pass this intermediate result to reduce_mul_integral_le128b(), we would >> have to use another temporary FloatRegister, as vtmp1 would essentially >> act as vsrc. It's possible to get around this however: >> reduce_mul_integral_le128b() is modified so it's possible to pass >> matching vsrc and vtmp2 arguments. By doing this, we save ourselves a >> temporary register in rules that match to reduce_mul_integral_gt128b(). >> - cleanup: revert an unnecessary change to reduce_mul_fp_le128b() formating > > src/hotspot/cpu/aarch64/aarch64_vector.ad line 3033: > >> 3031: format %{ "reduce_mulI_gt128b $dst, $isrc, $vsrc\t# vector (> 128 bits). KILL $tmp1, $tmp2, $pgtmp" %} >> 3032: ins_encode %{ >> 3033: BasicType bt = Matcher::vector_element_basic_type(this, $vsrc); > > I suggest adding `assert(UseSVE > 0, "must be sve");` assertion here and for the other three `*_gt128b` rules. Done, thanks. 
> src/hotspot/cpu/aarch64/aarch64_vector.ad line 3043: > >> 3041: %} >> 3042: >> 3043: instruct reduce_mulL_le128b(iRegLNoSp dst, iRegL isrc, vReg vsrc) %{ > > I suggest using `_128b` here since only `2L` is matched here. > > Suggestion: > > instruct reduce_mulL_128b(iRegLNoSp dst, iRegL isrc, vReg vsrc) %{ Done, thank you! > src/hotspot/cpu/aarch64/aarch64_vector.ad line 3055: > >> 3053: %} >> 3054: >> 3055: instruct reduce_mulL_gt128b(iRegLNoSp dst, iRegL isrc, vReg vsrc, vReg tmp1, > > nit: only one tmp vReg is used here > > Suggestion: > > instruct reduce_mulL_gt128b(iRegLNoSp dst, iRegL isrc, vReg vsrc, vReg tmp, Done, thanks! > src/hotspot/cpu/aarch64/aarch64_vector.ad line 3099: > >> 3097: %} >> 3098: >> 3099: instruct reduce_mulD_le128b(vRegD dst, vRegD dsrc, vReg vsrc, vReg tmp) %{ > > Similar to `long` type, I suggest using `_128b` as only `2D` is matched here. Done! > src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 3: > >> 1: /* >> 2: * Copyright (c) 1997, 2025, Oracle and/or its affiliates. All rights reserved. >> 3: * Copyright (c) 2014, 2025, Red Hat Inc. All rights reserved. > > nit: I don't think the copyright year for Red Hat needs to be updated Done, thanks. > src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 3677: > >> 3675: INSN(sve_nots, sve_eors); // Bitwise invert predicate, setting the condition flags; an alias of sve_eors >> 3676: #undef INSN >> 3677: > > These instructions are not used any more after the follow-up commit of using `EXT`. I suggest removing them. > > Besides, could you also share the benchmark data after using `EXT`? I don't have >=256-bit SVE on hand and cannot test that. Thanks. Done, thanks! 
> src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2002: > >> 2000: assert(vector_length_in_bytes == 8 || vector_length_in_bytes == 16, "unsupported"); >> 2001: assert_different_registers(vtmp1, vsrc); >> 2002: assert_different_registers(vtmp1, vtmp2); > > nit: would be neat to use > Suggestion: > > assert_different_registers(vsrc, vtmp1, vtmp2); `vsrc` and `vtmp2` are allowed to match. > src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2087: > >> 2085: assert(vector_length_in_bytes > FloatRegister::neon_vl, "ASIMD impl should be used instead"); >> 2086: assert(vector_length_in_bytes <= FloatRegister::sve_vl_max, "unsupported vector length"); >> 2087: assert(is_power_of_2(vector_length_in_bytes), "unsupported vector length"); > > Better to compare with `MaxVectorSize`. > > I suggest using `assert(length_in_bytes == MaxVectorSize, "invalid vector length");` and putting this assertion in `aarch64_vector.ad` file, i.e. inside the matching rule. Why is it better that way? Currently the assertions check that we end up here if there computations that can be done only using SVE (length > neon && length <= sve). What would happen if a user operates 256b VectorAPI vectors on a 512b SVE platform? 
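
The difference between the two styles of check discussed here can be sketched in isolation. The constants and function names below are assumptions for illustration: 16 bytes for a 128-bit NEON register, 256 bytes for the 2048-bit architectural SVE maximum, and `max_vector_size` standing in for `MaxVectorSize`. The sketch makes no claim about whether a 32-byte vector actually reaches this code path on a 64-byte SVE machine; it only shows how the two predicates would differ if it did.

```cpp
#include <cstddef>

// Assumed sizes in bytes (not taken from the patch).
constexpr std::size_t kNeonBytes = 16;     // 128-bit NEON register
constexpr std::size_t kSveMaxBytes = 256;  // 2048-bit SVE maximum

constexpr bool is_power_of_2(std::size_t x) {
  return x != 0 && (x & (x - 1)) == 0;
}

// Range-style check, in the spirit of the three quoted asserts.
constexpr bool range_check(std::size_t len) {
  return len > kNeonBytes && len <= kSveMaxBytes && is_power_of_2(len);
}

// Equality-style check, in the spirit of the reviewer's suggestion.
constexpr bool equals_max_check(std::size_t len, std::size_t max_vector_size) {
  return len == max_vector_size;
}
```

For a 32-byte vector with `max_vector_size == 64`, `range_check` accepts the length while `equals_max_check` rejects it.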
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23181#discussion_r2174893039 PR Review Comment: https://git.openjdk.org/jdk/pull/23181#discussion_r2175064198 PR Review Comment: https://git.openjdk.org/jdk/pull/23181#discussion_r2175062794 PR Review Comment: https://git.openjdk.org/jdk/pull/23181#discussion_r2175063706 PR Review Comment: https://git.openjdk.org/jdk/pull/23181#discussion_r2174918739 PR Review Comment: https://git.openjdk.org/jdk/pull/23181#discussion_r2174926322 PR Review Comment: https://git.openjdk.org/jdk/pull/23181#discussion_r2154442438 PR Review Comment: https://git.openjdk.org/jdk/pull/23181#discussion_r2174917565 From mablakatov at openjdk.org Mon Jun 30 13:25:11 2025 From: mablakatov at openjdk.org (Mikhail Ablakatov) Date: Mon, 30 Jun 2025 13:25:11 GMT Subject: RFR: 8343689: AArch64: Optimize MulReduction implementation [v4] In-Reply-To: References: <2jvFY4hq9FPdk9e4Zg6LRPdRVhDTGgxofL-we8c-mns=.4e6ce509-67a4-4e46-a661-2b0951f88731@github.com> Message-ID: On Thu, 27 Feb 2025 03:49:41 GMT, Hao Sun wrote: >> src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2139: >> >>> 2137: // source vector to get to a 128b vector that fits into a SIMD&FP register. After that point ASIMD >>> 2138: // instructions are used. >>> 2139: void C2_MacroAssembler::reduce_mul_fp_gt128b(FloatRegister dst, BasicType bt, FloatRegister fsrc, >> >> Drive-by question: >> This is recursive folding: take halve the vector and add it that way. >> >> What about the linear reduction, is that also implemented somewhere? We need that for vector reduction when we come from SuperWord, and have strict order requirement, to avoid rounding divergences. > > I have the same concern about the order issue with @eme64. > Should we only enable this only for VectorAPI case, which doesn't require strict-order? 
FP reductions have been disabled for auto-vectorization, please see the following comment: https://github.com/openjdk/jdk/pull/23181/files#diff-edf6d70f65d81dc12a483088e0610f4e059bd40697f242aedfed5c2da7475f1aR130 . You can also check https://github.com/openjdk/jdk/pull/23181#issuecomment-3018988067 to see how the patch affects auto-vectorization performance. The only benchmarks that saw a performance uplift on a 256b SVE platform is `VectorReduction2.WithSuperword.intMulBig` (which is fine since it's an integer benchmark). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23181#discussion_r2174943784 From epeter at openjdk.org Mon Jun 30 13:54:45 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 30 Jun 2025 13:54:45 GMT Subject: RFR: 8359602: Ideal optimizations depending on input type are missed because of missing notification mechanism from CCP [v2] In-Reply-To: <3cLLB7fms3S4WgqOVeb7D_ZDRFsJ_-ca3qfALlmzFeU=.1002ac91-1e35-4499-9d88-6d1f76c955d0@github.com> References: <-_MqCH6QmE-o_d7c9-aet-Cq-ptZJ6CZV6rodpDNWq0=.173e6f7a-3cfe-4791-8253-36e06d892069@github.com> <3cLLB7fms3S4WgqOVeb7D_ZDRFsJ_-ca3qfALlmzFeU=.1002ac91-1e35-4499-9d88-6d1f76c955d0@github.com> Message-ID: <0MJe_8nA-ILWqoVG-9rzuq5Pe9xX-FG2LN3k9Cy8nqU=.d724c6cf-cb02-45c4-95a4-5bd1fef7462b@github.com> On Mon, 30 Jun 2025 12:31:29 GMT, Beno?t Maillard wrote: >> This PR prevents some missed ideal optimizations in IGVN by notifying users of type refinements made during CCP, addressing a missed optimization that caused a verification failure with `-XX:VerifyIterativeGVN=1110`. >> >> ### Context >> During the compilation of the input program (obtained from the fuzzer, then simplified and added as a test) by C2, we end up with node `591 ModI` that takes `138 Phi` as its divisor input. An existing `Ideal` optimization is to get rid of the control input of a `ModINode` when we can prove that the divisor is never `0`. 
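
The condition "the divisor is never `0`" can be illustrated with a toy interval type. This is a sketch only: `IntRange` and `excludes_zero` are hypothetical stand-ins and not C2's type classes. The range `-13957..-1191` used in the usage note is the refined `PhiNode` type reported in the same PR description.

```cpp
#include <cstdint>

// Hypothetical stand-in for an integer type with inclusive bounds.
struct IntRange {
  int32_t lo;
  int32_t hi;
};

// A divisor with this range can never be zero if the whole interval lies
// strictly above or strictly below zero.
constexpr bool excludes_zero(IntRange r) {
  return r.lo > 0 || r.hi < 0;
}
```

With the unrefined type (the full `int` range) the check fails, so the zero check cannot be dropped; with the refined range `-13957..-1191` it succeeds. The optimization therefore depends on the user of the node being revisited after the refinement becomes visible.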
>> >> In this specific case, the type of the `PhiNode` gets refined during CCP, but the refinement fails to propagate to its users for the IGVN phase and the ideal optimization for the `ModINode` never happens. This results in a missed optimization and hits an assert in the verification phase of IGVN (when using `-XX:VerifyIterativeGVN=1110`). >> >> ![IGV screenshot](https://github.com/user-attachments/assets/5dee1ae6-9146-4115-922d-df33b7ccbd37) >> >> ### Detailed Analysis >> >> In `PhaseCCP::analyze`, we call `Value` for the `PhiNode`, which >> results in a type refinement: the range gets restricted to `int:-13957..-1191`. >> >> ```c++ >> // Pull from worklist; compute new value; push changes out. >> // This loop is the meat of CCP. >> while (worklist.size() != 0) { >> Node* n = fetch_next_node(worklist); >> DEBUG_ONLY(worklist_verify.push(n);) >> if (n->is_SafePoint()) { >> // Make sure safepoints are processed by PhaseCCP::transform even if they are >> // not reachable from the bottom. Otherwise, infinite loops would be removed. >> _root_and_safepoints.push(n); >> } >> const Type* new_type = n->Value(this); >> if (new_type != type(n)) { >> DEBUG_ONLY(verify_type(n, new_type, type(n));) >> dump_type_and_node(n, new_type); >> set_type(n, new_type); >> push_child_nodes_to_worklist(worklist, n); >> } >> if (KillPathsReachableByDeadTypeNode && n->is_Type() && new_type == Type::TOP) { >> // Keep track of Type nodes to kill CFG paths that use Type >> // nodes that become dead. >> _maybe_top_type_nodes.push(n); >> } >> } >> DEBUG_ONLY(verify_analyze(worklist_verify);) >> >> >> At the end of `PhaseCCP::analyze`, we obtain the following types in the side table: >> - `int` for node `591` (`ModINode`) >> - `int:-13957..-1191` for node `138` (`PhiNode`) >> >> If we call `find_node(138)->bottom_type()`, we get: >> - `int` for both nodes >> >> The... 
> > Beno?t Maillard has updated the pull request incrementally with two additional commits since the last revision: > > - 8359602: rename test > - 8359602: remove requires.debug=true and add -XX:+IgnoreUnrecognizedVMOptions flag @benoitmaillard Very nice work, and great description :) >Did you check if this allows enabling any of the other disabled verifications from [JDK-8347273](https://bugs.openjdk.org/browse/JDK-8347273)? That may be a lot of work. Not sure if it is worth checking all of them now. @TobiHartmann how much should he invest in this now? An alternative is just tackling all the other cases later. What do you think? @benoitmaillard One more open question for me: `raise_bottom_type` only sets the node internal `_type`. But in IGVN, we do not read from `_type` but `phase->type(in(2))`. Do you know when the `phase->type(in(2))` value changes? Is that also during CCP? Before or after the `_type` is modified? ------------- PR Comment: https://git.openjdk.org/jdk/pull/26017#issuecomment-3019253499 From mablakatov at openjdk.org Mon Jun 30 15:29:49 2025 From: mablakatov at openjdk.org (Mikhail Ablakatov) Date: Mon, 30 Jun 2025 15:29:49 GMT Subject: RFR: 8359963: compiler/c2/aarch64/TestStaticCallStub.java fails with for code cache > 250MB the static call stub is expected to be implemented using far branch Message-ID: The test assumed that hsdis is always available which is not the case. Make the test accept and scan either real or pseudo disassembly. 
------------- Commit messages: - 8359963: compiler/c2/aarch64/TestStaticCallStub.java fails with for code cache > 250MB the static call stub is expected to be implemented using far branch Changes: https://git.openjdk.org/jdk/pull/26047/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26047&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8359963 Stats: 265 lines in 2 files changed: 215 ins; 24 del; 26 mod Patch: https://git.openjdk.org/jdk/pull/26047.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26047/head:pull/26047 PR: https://git.openjdk.org/jdk/pull/26047 From missa at openjdk.org Mon Jun 30 15:34:39 2025 From: missa at openjdk.org (Mohamed Issa) Date: Mon, 30 Jun 2025 15:34:39 GMT Subject: RFR: 8358179: Performance regression in Math.cbrt [v2] In-Reply-To: References: <45l5EvxoRINI1_Ep2_snJzKNMPo4-dPXADalLN1fq1Y=.9f697a35-ee7b-4e7a-9e5e-ff33911b3b21@github.com> Message-ID: On Mon, 30 Jun 2025 09:05:01 GMT, Andrew Haley wrote: > > The changes described below are meant to resolve the performance regression introduced by the **x86_64 cbrt** double precision floating point scalar intrinsic in #24470. > > Please add the performance for arguments in the normal range to this list. Sure, I added a line covering this. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25962#issuecomment-3019645253 From mhaessig at openjdk.org Mon Jun 30 15:41:13 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Mon, 30 Jun 2025 15:41:13 GMT Subject: RFR: 8361092: Remove trailing spaces in x86 ad files Message-ID: This PR fixes some trailing spaces in `x86_64.ad`. 
Testing: - [ ] Github Actions ------------- Commit messages: - Fix trailing spaces Changes: https://git.openjdk.org/jdk/pull/26048/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26048&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8361092 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/26048.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26048/head:pull/26048 PR: https://git.openjdk.org/jdk/pull/26048 From eastigeevich at openjdk.org Mon Jun 30 15:41:43 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Mon, 30 Jun 2025 15:41:43 GMT Subject: RFR: 8358183: [JVMCI] crash accessing nmethod::jvmci_name in CodeCache::aggregate In-Reply-To: References: Message-ID: On Tue, 3 Jun 2025 06:39:18 GMT, Boris Ulasevich wrote: > This change addresses an intermittent crash in CompileBroker::print_heapinfo() when accessing JVMCI metadata after a CodeBlob::purge(). > > The issue is a regression after: > - JDK-8343789: JVMCI metadata was moved from nmethod into a separate blob. > - JDK-8352112: CodeBlob::purge() was updated to set _mutable_data to blob_end(). > > The change zeroes out _mutable_data_size, _relocation_size, and _metadata_size in purge() so that after purge jvmci_data_size() returns 0 and CompileBroker::print_heapinfo() won?t touch an invalid _metadata. lgtm ------------- Marked as reviewed by eastigeevich (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/25608#pullrequestreview-2971750695 From bmaillard at openjdk.org Mon Jun 30 15:42:01 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Mon, 30 Jun 2025 15:42:01 GMT Subject: RFR: 8359602: Ideal optimizations depending on input type are missed because of missing notification mechanism from CCP [v3] In-Reply-To: <-_MqCH6QmE-o_d7c9-aet-Cq-ptZJ6CZV6rodpDNWq0=.173e6f7a-3cfe-4791-8253-36e06d892069@github.com> References: <-_MqCH6QmE-o_d7c9-aet-Cq-ptZJ6CZV6rodpDNWq0=.173e6f7a-3cfe-4791-8253-36e06d892069@github.com> Message-ID: > This PR prevents some missed ideal optimizations in IGVN by notifying users of type refinements made during CCP, addressing a missed optimization that caused a verification failure with `-XX:VerifyIterativeGVN=1110`. > > ### Context > During the compilation of the input program (obtained from the fuzzer, then simplified and added as a test) by C2, we end up with node `591 ModI` that takes `138 Phi` as its divisor input. An existing `Ideal` optimization is to get rid of the control input of a `ModINode` when we can prove that the divisor is never `0`. > > In this specific case, the type of the `PhiNode` gets refined during CCP, but the refinement fails to propagate to its users for the IGVN phase and the ideal optimization for the `ModINode` never happens. This results in a missed optimization and hits an assert in the verification phase of IGVN (when using `-XX:VerifyIterativeGVN=1110`). > > ![IGV screenshot](https://github.com/user-attachments/assets/5dee1ae6-9146-4115-922d-df33b7ccbd37) > > ### Detailed Analysis > > In `PhaseCCP::analyze`, we call `Value` for the `PhiNode`, which > results in a type refinement: the range gets restricted to `int:-13957..-1191`. > > ```c++ > // Pull from worklist; compute new value; push changes out. > // This loop is the meat of CCP. 
> while (worklist.size() != 0) { > Node* n = fetch_next_node(worklist); > DEBUG_ONLY(worklist_verify.push(n);) > if (n->is_SafePoint()) { > // Make sure safepoints are processed by PhaseCCP::transform even if they are > // not reachable from the bottom. Otherwise, infinite loops would be removed. > _root_and_safepoints.push(n); > } > const Type* new_type = n->Value(this); > if (new_type != type(n)) { > DEBUG_ONLY(verify_type(n, new_type, type(n));) > dump_type_and_node(n, new_type); > set_type(n, new_type); > push_child_nodes_to_worklist(worklist, n); > } > if (KillPathsReachableByDeadTypeNode && n->is_Type() && new_type == Type::TOP) { > // Keep track of Type nodes to kill CFG paths that use Type > // nodes that become dead. > _maybe_top_type_nodes.push(n); > } > } > DEBUG_ONLY(verify_analyze(worklist_verify);) > > > At the end of `PhaseCCP::analyze`, we obtain the following types in the side table: > - `int` for node `591` (`ModINode`) > - `int:-13957..-1191` for node `138` (`PhiNode`) > > If we call `find_node(138)->bottom_type()`, we get: > - `int` for both nodes > > There is no progress on the type of `ModINode` during CCP, because `ModINode::Value` > is not able to... 
Beno?t Maillard has updated the pull request incrementally with one additional commit since the last revision: Fix bad test class name ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26017/files - new: https://git.openjdk.org/jdk/pull/26017/files/e05eb749..75de51df Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26017&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26017&range=01-02 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/26017.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26017/head:pull/26017 PR: https://git.openjdk.org/jdk/pull/26017 From kbarrett at openjdk.org Mon Jun 30 16:26:01 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 30 Jun 2025 16:26:01 GMT Subject: RFR: 8361086: JVMCIGlobals::check_jvmci_flags_are_consistent has incorrect format string Message-ID: Please review this trivial fix of a format string. The value being printed is TieredStopAtLevel, which is of type intx, so "%zd" should be used instead of "%d". 
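
The mismatch described in this request can be shown outside HotSpot. The sketch below is not the fix itself: it assumes `intx` is a pointer-sized signed integer (modelled by the hypothetical alias `intx_like`), uses an invented message text, and swaps in the standard `PRIdPTR` macro from `<cinttypes>` rather than the `"%zd"` chosen in the PR. On an LP64 platform, passing such a value to `"%d"` would supply a 64-bit argument where `printf` expects an `int`.

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Stand-in for HotSpot's intx; assumed to be pointer-sized and signed.
using intx_like = intptr_t;

// Formats the value with a conversion that matches its width.
// The message text is illustrative, not the one used in the JVM.
std::string format_level(intx_like level) {
  char buf[64];
  std::snprintf(buf, sizeof(buf), "TieredStopAtLevel = %" PRIdPTR, level);
  return std::string(buf);
}
```
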
Testing: mach5 tier1 ------------- Commit messages: - TieredStopAtLevel printing Changes: https://git.openjdk.org/jdk/pull/26051/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26051&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8361086 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/26051.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26051/head:pull/26051 PR: https://git.openjdk.org/jdk/pull/26051 From kvn at openjdk.org Mon Jun 30 16:28:43 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 30 Jun 2025 16:28:43 GMT Subject: RFR: 8325478: Restructure the macro expansion compiler phase to not include macro elimination [v7] In-Reply-To: References: Message-ID: On Tue, 24 Jun 2025 16:32:11 GMT, Saranya Natarajan wrote: >> This changeset restructures the macro expansion phase to not include macro elimination and also adds a flag StressMacroElimination which randomizes macro elimination ordering for stress testing purposes. >> >> Changes: >> - Implemented a method `eliminate_opaque_looplimit_macro_nodes` that removes the functionality for eliminating Opaque and LoopLimit nodes from the `expand_macro_nodes ` method. >> - Introduced compiler phases` PHASE_AFTER_MACRO_ELIMINATION` >> - Added a new Ideal phase for individual macro elimination steps. >> - Implemented the flag `StressMacroElimination`. Added functionality tests for `StressMacroElimination`, similar to previous stress flag `StressMacroExpansion` ([JDK-8317349](https://bugs.openjdk.org/browse/JDK-8317349)). >> >> Below is a sample screenshot (IGV print level 4 ) mainly showing the new phase . >> ![image](https://github.com/user-attachments/assets/16013cd4-6ec6-4939-ac66-33bb03d59cd6) >> >> Questions to reviewers: >> - Is the new macro elimination phase OK, or should we change anything? >> - In `compile.cpp `, `PHASE_ITER_GVN_AFTER_ELIMINATION` follows `PHASE_AFTER_MACRO_ELIMINATION` in the current fix. 
Should `PHASE_ITER_GVN_AFTER_ELIMINATION` be removed ? >> >> Testing: >> GitHub Actions >> tier1 to tier5 on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. >> Tested that thousands of graphs are correctly opened and visualized with IGV using the same test used in ([JDK-8317349](https://bugs.openjdk.org/browse/JDK-8317349)) > > Saranya Natarajan has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > merge with master > Merge branch 'master' of https://git.openjdk.org/jdk into JDK-8325478 I have few comments. src/hotspot/share/opto/compile.cpp line 2533: > 2531: { > 2532: TracePhase tp(_t_macroExpand); > 2533: print_method(PHASE_BEFORE_MACRO_EXPANSION, 3); Should we move it before `mex.expand_macro_nodes()` call? src/hotspot/share/opto/phasetype.hpp line 94: > 92: flags(AFTER_LOOP_OPTS, "After Loop Optimizations") \ > 93: flags(AFTER_MERGE_STORES, "After Merge Stores") \ > 94: flags(AFTER_MACRO_ELIMINATION_STEP, "After Macro Elimination Step") \ What is the reason to not have `BEFORE_MACRO_ELIMINATION`? src/utils/IdealGraphVisualizer/README.md line 36: > 34: * `N=3`: additionally, after every minor phase > 35: * `N=4`: additionally, after every loop optimization > 36: * `N=5`: additionally, after every effective IGVN, macro elimination, and expansion step (slow) Typo `,` before `, and`. 
test/hotspot/jtreg/compiler/arguments/TestStressOptions.java line 59: > 57: * @run main/othervm -XX:+UnlockDiagnosticVMOptions -XX:+StressUnstableIfTraps -XX:StressSeed=42 > 58: * compiler.arguments.TestStressOptions > 59: * * @run main/othervm -XX:+UnlockDiagnosticVMOptions -XX:+StressMacroElimination typo - double stars `* *` ------------- PR Review: https://git.openjdk.org/jdk/pull/25682#pullrequestreview-2971878935 PR Review Comment: https://git.openjdk.org/jdk/pull/25682#discussion_r2175466142 PR Review Comment: https://git.openjdk.org/jdk/pull/25682#discussion_r2175462641 PR Review Comment: https://git.openjdk.org/jdk/pull/25682#discussion_r2175450582 PR Review Comment: https://git.openjdk.org/jdk/pull/25682#discussion_r2175448407 From dlunden at openjdk.org Mon Jun 30 16:46:41 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Mon, 30 Jun 2025 16:46:41 GMT Subject: RFR: 8325478: Restructure the macro expansion compiler phase to not include macro elimination [v7] In-Reply-To: References: Message-ID: On Mon, 30 Jun 2025 16:15:18 GMT, Vladimir Kozlov wrote: >> Saranya Natarajan has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: >> >> merge with master >> Merge branch 'master' of https://git.openjdk.org/jdk into JDK-8325478 > > src/utils/IdealGraphVisualizer/README.md line 36: > >> 34: * `N=3`: additionally, after every minor phase >> 35: * `N=4`: additionally, after every loop optimization >> 36: * `N=5`: additionally, after every effective IGVN, macro elimination, and expansion step (slow) > > Typo `,` before `, and`. @sarannat: I also think you should write this out in full: `... macro elimination, and macro expansion step (slow)`.
The `,` before `and` in itself is not incorrect, but a question of style (see [Oxford comma](https://en.wikipedia.org/wiki/Serial_comma)). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25682#discussion_r2175500831 From kvn at openjdk.org Mon Jun 30 17:23:45 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 30 Jun 2025 17:23:45 GMT Subject: RFR: 8361086: JVMCIGlobals::check_jvmci_flags_are_consistent has incorrect format string In-Reply-To: References: Message-ID: On Mon, 30 Jun 2025 16:14:08 GMT, Kim Barrett wrote: > Please review this trivial fix of a format string. The value being printed is > TieredStopAtLevel, which is of type intx, so "%zd" should be used instead of "%d". > > Testing: mach5 tier1 Or we can cast it to (int). Or we can change flag's declaration to `int` type. ------------- PR Review: https://git.openjdk.org/jdk/pull/26051#pullrequestreview-2972076516 From kvn at openjdk.org Mon Jun 30 17:47:38 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 30 Jun 2025 17:47:38 GMT Subject: RFR: 8361092: Remove trailing spaces in x86 ad files In-Reply-To: References: Message-ID: On Mon, 30 Jun 2025 15:34:18 GMT, Manuel Hässig wrote: > This PR fixes some trailing spaces in `x86_64.ad`. > > Testing: > - [ ] Github Actions Trivial ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26048#pullrequestreview-2972139107 From sviswanathan at openjdk.org Mon Jun 30 18:17:37 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 30 Jun 2025 18:17:37 GMT Subject: RFR: 8361092: Remove trailing spaces in x86 ad files In-Reply-To: References: Message-ID: On Mon, 30 Jun 2025 15:34:18 GMT, Manuel Hässig wrote: > This PR fixes some trailing spaces in `x86_64.ad`. > > Testing: > - [ ] Github Actions Looks good to me as well. ------------- Marked as reviewed by sviswanathan (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/26048#pullrequestreview-2972212799 From asmehra at openjdk.org Mon Jun 30 19:50:21 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Mon, 30 Jun 2025 19:50:21 GMT Subject: RFR: 8361101: AOTCodeAddressTable::_stubs_addr not initialized/freed properly Message-ID: Please review this patch to fix initialization and freeing of `AOTCodeAddressTable::_stubs_addr`. Changes are trivial ------------- Commit messages: - 8361101: AOTCodeAddressTable::_stubs_addr not initialized/freed properly Changes: https://git.openjdk.org/jdk/pull/26053/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26053&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8361101 Stats: 4 lines in 2 files changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/26053.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26053/head:pull/26053 PR: https://git.openjdk.org/jdk/pull/26053 From aturbanov at openjdk.org Mon Jun 30 19:51:49 2025 From: aturbanov at openjdk.org (Andrey Turbanov) Date: Mon, 30 Jun 2025 19:51:49 GMT Subject: RFR: 8360641: TestCompilerCounts fails after 8354727 [v3] In-Reply-To: References: <3mMrDF_446r7HudbsHIpdoWByBlnUpjFo7YzIty0KG8=.facc058f-3975-44c4-b2d4-93b8c64db185@github.com> Message-ID: On Mon, 30 Jun 2025 08:58:07 GMT, Manuel Hässig wrote: >> After integrating #25872 the calculation of the `CICompilerCount` ergonomic became dependent on the size of `NonNMethodCodeHeapSize`, which itself is an ergonomic based on the available memory. Thus, depending on the system, the test `compiler/arguments/TestCompilerCounts.java` failed, i.e. locally this failed, but not on CI servers. >> >> This PR changes the test to reflect the changes introduced in #25872.
>> >> Testing: >> - [ ] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15932906313) >> - [x] tier1,tier2 plus Oracle internal testing > > Manuel Hässig has updated the pull request incrementally with two additional commits since the last revision: > > - Remove superfluous newline > - Add copyright test/hotspot/jtreg/compiler/arguments/TestCompilerCounts.java line 159: > 157: // Tiered modes > 158: int tieredCount = heuristicCount(cpus, Compilation.Tiered, debug); > 159: pass(tieredCount, opt, "-XX:NonNMethodCodeHeapSize=" + NonNMethodCodeHeapSize); Suggestion: pass(tieredCount, opt, "-XX:NonNMethodCodeHeapSize=" + NonNMethodCodeHeapSize); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26024#discussion_r2175779955 From asmehra at openjdk.org Mon Jun 30 20:26:38 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Mon, 30 Jun 2025 20:26:38 GMT Subject: RFR: 8361101: AOTCodeAddressTable::_stubs_addr not initialized/freed properly In-Reply-To: References: Message-ID: On Mon, 30 Jun 2025 19:45:49 GMT, Ashutosh Mehra wrote: > Please review this patch to fix initialization and freeing of `AOTCodeAddressTable::_stubs_addr`. Changes are trivial @vnkozlov can you please review ------------- PR Comment: https://git.openjdk.org/jdk/pull/26053#issuecomment-3020578453 From kvn at openjdk.org Mon Jun 30 21:06:38 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 30 Jun 2025 21:06:38 GMT Subject: RFR: 8361101: AOTCodeAddressTable::_stubs_addr not initialized/freed properly In-Reply-To: References: Message-ID: On Mon, 30 Jun 2025 19:45:49 GMT, Ashutosh Mehra wrote: > Please review this patch to fix initialization and freeing of `AOTCodeAddressTable::_stubs_addr`. Changes are trivial Looks fine but please wait with integration - I am working on merge from mainline.
------------- PR Review: https://git.openjdk.org/jdk/pull/26053#pullrequestreview-2972721545 From kbarrett at openjdk.org Mon Jun 30 22:50:40 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 30 Jun 2025 22:50:40 GMT Subject: RFR: 8361086: JVMCIGlobals::check_jvmci_flags_are_consistent has incorrect format string In-Reply-To: References: Message-ID: On Mon, 30 Jun 2025 17:21:20 GMT, Vladimir Kozlov wrote: > Or we can cast it to (int). Or we can change flag's declaration to `int` type. I think it's better to change the format string than to cast the value. If we later change the value type we'll (hopefully by then) get an appropriate warning about needing to change the format string again. Adding a cast now becomes a useless or otherwise confusing cast (that is hard to find, because that's the nature of casts) after a change of type. I tried changing the type of the option to int and that got messy, because there are a number of places that are accessing it by name from Java as an intx, such as using WhiteBox::getIntxVMFlag. I wasn't entirely confident I'd found all of them, since I was getting into unfamiliar code. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26051#issuecomment-3021070917 From kvn at openjdk.org Mon Jun 30 23:18:38 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 30 Jun 2025 23:18:38 GMT Subject: RFR: 8361086: JVMCIGlobals::check_jvmci_flags_are_consistent has incorrect format string In-Reply-To: References: Message-ID: On Mon, 30 Jun 2025 16:14:08 GMT, Kim Barrett wrote: > Please review this trivial fix of a format string. The value being printed is > TieredStopAtLevel, which is of type intx, so "%zd" should be used instead of "%d". > > Testing: mach5 tier1 Thank you for checking other solutions. Current fix is good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26051#pullrequestreview-2973046167
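For readers following the 8361086 thread, here is a minimal standalone sketch of the format-string point discussed above. It is not the actual HotSpot code: the `intx` typedef is a stand-in that assumes `intx` is a pointer-sized signed integer, and `describe_level` and its message text are hypothetical names made up for illustration. It only shows why "%zd" matches a pointer-sized signed value where "%d" would not on 64-bit targets.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Assumption: stand-in for HotSpot's intx (a pointer-sized signed integer).
typedef intptr_t intx;

// "%zd" expects the signed counterpart of size_t, so this sketch relies on
// intx having the same width as size_t on the target platform.
static_assert(sizeof(intx) == sizeof(size_t), "intx must be size_t-sized for %zd");

// Hypothetical helper: formats a flag value the way the fix does, with "%zd".
// With "%d" the argument would be read as an int, which is narrower than
// intx on 64-bit platforms.
std::string describe_level(intx level) {
  char buf[64];
  int n = std::snprintf(buf, sizeof(buf), "TieredStopAtLevel = %zd", level);
  assert(n > 0 && static_cast<size_t>(n) < sizeof(buf));
  (void)n;  // keep release builds (NDEBUG) free of unused-variable warnings
  return std::string(buf);
}
```

A call such as `describe_level(4)` yields the formatted text for level 4. The alternative raised in the thread, casting the value to `int` and keeping "%d", would also produce correct output for small values, but as Kim Barrett notes it would leave behind a cast that becomes confusing if the flag's type changes later, whereas a mismatched format string has a chance of being flagged by the compiler.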